说明

本文所涉及的spring data elasticsearch 4.1.x，该版本对应的es版本是7.9.3，由于7.x版本改变差异很大，望周知！

@Document

@Document源码

@Persistent
@Inherited
@Retention(RetentionPolicy.RUNTIME)
@Target({ ElementType.TYPE })
public @interface Document {

	/**
	 * Name of the Elasticsearch index.
	 * <ul>
	 * <li>Lowercase only</li>
	 * <li><Cannot include \, /, *, ?, ", <, >, |, ` ` (space character), ,, #/li>
	 * <li>Cannot start with -, _, +</li>
	 * <li>Cannot be . or ..</li>
	 * <li>Cannot be longer than 255 bytes (note it is bytes, so multi-byte characters will count towards the 255 limit
	 * faster)</li>
	 * </ul>
	 */
        // 索引库名称
	String indexName();

	/**
	 * Use server-side settings when creating the index.
	 */
	boolean useServerConfiguration() default false;

	/**
	 * Number of shards for the index {@link #indexName()}. Used for index creation. <br/>
	 * With version 4.0, the default value is changed from 5 to 1 to reflect the change in the default settings of
	 * Elasticsearch which changed to 1 as well in Elasticsearch 7.0.
	 */
        // 默认分片数1
	short shards() default 1;

	/**
	 * Number of replicas for the index {@link #indexName()}. Used for index creation.
	 */
        // 默认副本数1
	short replicas() default 1;

	/**
	 * Refresh interval for the index {@link #indexName()}. Used for index creation.
	 */
        // 刷新间隔
	String refreshInterval() default "1s";

	/**
	 * Index storage type for the index {@link #indexName()}. Used for index creation.
	 */
        // 索引文件存储类型
	String indexStoreType() default "fs";

	/**
	 * Configuration whether to create an index on repository bootstrapping.
	 */
        // 是否创建索引
	boolean createIndex() default true;

	/**
	 * Configuration of version management.
	 */
        // 版本管理的配置
	VersionType versionType() default VersionType.EXTERNAL;
}

@Document注解使用

indexName：对应索引库名称
type：对应在索引库中的类型(7.x以后去除原来的分类，现在默认的分类都是_doc)
shards：分片数(7.x以后默认的分片数都变成1了)
replicas：副本数

@Field

@Field源码

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@Documented
@Inherited
public @interface Field {

	/**
	 * Alias for {@link #name}.
	 *
	 * @since 3.2
	 */
	@AliasFor("name")
	String value() default "";

	/**
	 * The <em>name</em> to be used to store the field inside the document.
	 * <p>
	 * √5 If not set, the name of the annotated property is used.
	 *
	 * @since 3.2
	 */
	@AliasFor("value")
	String name() default "";

        //自动检测属性类型
	FieldType type() default FieldType.Auto;

	boolean index() default true;
        
        //时间类型的字段格式化
	DateFormat format() default DateFormat.none;

	String pattern() default "";

        //默认情况下不存储
	boolean store() default false;

	boolean fielddata() default false;

        //指定字段使用搜索时的分词
	String searchAnalyzer() default "";

	String analyzer() default "";

	String normalizer() default "";

        //如果某个字段需要被忽略
	String[] ignoreFields() default {};

	boolean includeInParent() default false;

	String[] copyTo() default {};

	/**
	 * @since 4.0
	 */
	int ignoreAbove() default -1;

	/**
	 * @since 4.0
	 */
	boolean coerce() default true;

	/**
	 * @since 4.0
	 */
	boolean docValues() default true;

	/**
	 * @since 4.0
	 */
	boolean ignoreMalformed() default false;

	/**
	 * @since 4.0
	 */
	IndexOptions indexOptions() default IndexOptions.none;

	/**
	 * @since 4.0
	 */
	boolean indexPhrases() default false;

	/**
	 * implemented as array to enable the empty default value
	 *
	 * @since 4.0
	 */
	IndexPrefixes[] indexPrefixes() default {};

	/**
	 * @since 4.0
	 */
	boolean norms() default true;

	/**
	 * @since 4.0
	 */
	String nullValue() default "";

	/**
	 * @since 4.0
	 */
	int positionIncrementGap() default -1;

	/**
	 * @since 4.0
	 */
	Similarity similarity() default Similarity.Default;

	/**
	 * @since 4.0
	 */
	TermVector termVector() default TermVector.none;

	/**
	 * @since 4.0
	 */
	double scalingFactor() default 1;

	/**
	 * @since 4.0
	 */
	int maxShingleSize() default -1;

	/**
	 * if true, the field will be stored in Elasticsearch even if it has a null value
	 * 
	 * @since 4.1
	 */
	boolean storeNullValue() default false;

	/**
	 * to be used in combination with {@link FieldType#Rank_Feature}
	 * 
	 * @since 4.1
	 */
	boolean positiveScoreImpact() default true;

	/**
	 * to be used in combination with {@link FieldType#Object}
	 *
	 * @since 4.1
	 */
	boolean enabled() default true;

	/**
	 * @since 4.1
	 */
	boolean eagerGlobalOrdinals() default false;

	/**
	 * @since 4.1
	 */
	NullValueType nullValueType() default NullValueType.String;
}

枚举类FieldType

public enum FieldType {
	Auto, //
	Text, //
	Keyword, //
	Long, //
	Integer, //
	Short, //
	Byte, //
	Double, //
	Float, //
	Half_Float, //
	Scaled_Float, //
	Date, //
	Date_Nanos, //
	Boolean, //
	Binary, //
	Integer_Range, //
	Float_Range, //
	Long_Range, //
	Double_Range, //
	Date_Range, //
	Ip_Range, //
	Object, //
	Nested, //
	Ip, //
	TokenCount, //
	Percolator, //
	Flattened, //
	Search_As_You_Type, //
	/** @since 4.1 */
	Rank_Feature, //
	/** @since 4.1 */
	Rank_Features //
}

@Field注解使用

@Field作用在成员变量，标记为文档的字段，并制定映射属性；

@Id：作用在成员变量，标记一个字段为id主键；一般id字段或是域不需要存储也不需要分词；
type：字段的类型，取值是枚举，FieldType；
index：是否索引，布尔值类型，默认是true；
store：是否存储，布尔值类型，默认值是false；
analyzer：分词器名称

Keyword && Text 的区别

@Field(type = FieldType.Keyword)和 @Field(type = FieldType.Text)区别
在早期elasticsearch5.x之前的版本存储字符串只有string字段；但是在elasticsearch5.x之后的版本存储了Keyword和Text，都是存储字符串的。FieldType.Keyword存储字符串数据时，不会建立索引；而FieldType.Text在存储字符串数据的时候，会自动建立索引，也会占用部分空间资源。

@Field(store = true)

其实不管我们将store值设置为true或false，elasticsearch都会将该字段存储到Field域中；
但是他们的区别是什么？
（1）store = false时，默认设置；那么给字段只存储在"_source"的Field域中；
（2）store = true时，该字段的value会存储在一个跟_source平级的独立Field域中；同时也会存储在_source中，所以有两份拷贝。
那么我们在什么样的业务场景下使用store field功能？
（1）_source field在索引的mapping 中disable了。这种情况下，如果不将某个field定义成store=true，那些将无法在返回的查询结果中看到这个field。
（2）_source的内容非常大。这时候如果我们想要在返回的_source document中解释出某个field的值的话，开销会很大（当然你也可以定义source filtering将减少network overhead），比例某个document中保存的是一本书，所以document中可能有这些field: title, date, content。假如我们只是想查询书的title 跟date信息，而不需要解释整个_source（非常大），这个时候我们可以考虑将title, date这些field设置成store=true。

需要注意的是，看起来将field store可以减少查询的开销，但其实这样也会加大disk的访问频率。假如你将_source中的10个field都定义store，那么在你查询这些field的时候会将会有10次disk seek的操作。而返回_source只有一次disk seek的操作。所以这个也是我们在定义的时候需要blance的。

实体类代码

@Document(indexName = "item",type = "docs",shards = 1,replicas = 0)
public class Item {
    @Id
    private Long id;
    @Field(type = FieldType.Text,analyzer = "ik_max_word")
    private String title;
    @Field(type=FieldType.Keyword)
    private String category;
    @Field(type=FieldType.Keyword)
    private String brand;
    @Field(type=FieldType.Double)
    private Double price;
    @Field(index = false,type = FieldType.Keyword)
    private String images;
}

（1）@Field(index=true)表示是否索引，如果是索引表示该字段(或者叫域)能能够搜索。
（2）@Field(analyzer="ik_max_word",searchAnalyzer="ik_max_word")表示是否分词，如果是分词就会按照分词的单词搜索，如果不是分词就按照整体搜索。
（3）@Field(store=true)是否存储，也就是页面上显示。

目录CONTENT

elasticsearch笔记 - 实体类注解

说明