Spring Cloud integration with Elasticsearch: ES-related concepts

ES has several basic concepts: index, type, document, mapping, and so on.

The main concepts of the ES data architecture, compared with a relational database (MySQL):



(1) A database (DataBase) in a relational database corresponds to an index (Index) in ES.  
(2) A database contains N tables (Table), which corresponds to one index containing N types (Type).  
(3) The data in a table (Table) is made up of rows (ROW) and columns (attributes), which corresponds to a Type being made up of documents (Document), each with multiple fields.
(4) In a relational database, the schema defines the tables, each table's fields, and the relationships between tables and fields. Correspondingly, in ES the Mapping defines how the fields of a Type under an index are handled: how the index is built, the field types, whether the original JSON document is stored, whether it is compressed, whether and how it is analyzed (word segmentation), and so on.
(5) Insert, delete, update and select in a database correspond to PUT/POST, DELETE, _update and GET in ES (a small sketch follows).
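As a rough illustration of the correspondence in (5), here is a minimal sketch using Python with the requests library against a hypothetical local node at http://localhost:9200; ES 7.x-style typeless endpoints are assumed, and the blog index and its fields are made up:

```python
import requests

BASE = "http://localhost:9200"

# Insert -> PUT/POST: index a document under id 1
requests.put(f"{BASE}/blog/_doc/1", json={"title": "hello es", "views": 1})

# Select -> GET: fetch the document by id
print(requests.get(f"{BASE}/blog/_doc/1").json())

# Update -> _update: partially update the document
requests.post(f"{BASE}/blog/_update/1", json={"doc": {"views": 2}})

# Delete -> DELETE: remove the document
requests.delete(f"{BASE}/blog/_doc/1")
```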

Index:


An index is ES's logical unit of storage, corresponding to a database in a relational database. ES can store an index's data on one server or, via sharding, spread it across multiple servers. Each index has one or more shards, and each shard can have multiple replicas.
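For example, a minimal sketch of creating an index with explicit shard and replica counts (Python with requests; the products index name and the local node URL are assumptions):

```python
import requests

resp = requests.put(
    "http://localhost:9200/products",
    json={
        "settings": {
            "number_of_shards": 3,    # how many primary shards the index is split into
            "number_of_replicas": 2   # how many extra copies of each primary shard
        }
    },
)
print(resp.json())
```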

Type:


In ES, an index can store objects used for different purposes, and type is what distinguishes those objects inside an index, corresponding to a table in a relational database. However, types were deprecated starting with ES 6.0 and removed completely in ES 7. The reasons for removing types:

We have long treated the ES "index" as analogous to a relational "database" and the "type" as analogous to a table. The ES developers consider this a poor analogy. For example, two tables in a relational database are independent: even if they contain columns with the same name, it does not affect how they are used. That is not the case in ES.

Elasticsearch is a search engine built on Lucene, and fields with the same name under different types of an ES index are ultimately handled the same way inside Lucene. For example, two user_name fields under two different types are actually treated as the same field within the same ES index, so their mappings must be defined consistently in both types; otherwise, same-named fields in different types conflict during processing and Lucene's efficiency drops.

Removing types means each kind of data is stored in its own index, so even identical field names no longer conflict. As the first line on the Elasticsearch site says, "You know, for search": removing types is about improving ES's efficiency in processing data.

In addition, storing entities with different sets of fields under different types of the same index makes the stored data sparse, which hurts Lucene's ability to compress documents and reduces ES query efficiency.

Document:


The main entity stored in ES is the document, which can be thought of as a row in a relational database table. Each document consists of multiple fields. Unlike a relational database, ES is not rigidly structured: each document can have a different set of fields, and each document has a unique identifier.

Mapping:


A mapping defines the fields of an index and their data types, similar to the table structure in a relational database. By default, ES creates index (and type) mappings dynamically, as if a relational database needed no predefined table structure and no declared column types. Of course, you can also specify the mapping manually.
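A minimal sketch of specifying a mapping manually instead of relying on dynamic mapping (ES 7.x-style typeless mapping assumed; the articles index and its fields are invented for illustration):

```python
import requests

mapping = {
    "mappings": {
        "properties": {
            "title":     {"type": "text"},     # analyzed, full-text searchable
            "tags":      {"type": "keyword"},  # exact value, not analyzed
            "published": {"type": "date"},
            "views":     {"type": "integer"},
        }
    }
}
print(requests.put("http://localhost:9200/articles", json=mapping).json())
```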

ES cluster core concepts:

Near Real Time (NRT)  
Elasticsearch is a near-real-time search platform: there is a slight delay (usually about one second) between indexing a document and that document becoming searchable.


1. Cluster

Cluster architecture diagram:

Principles of Distributed Architecture:

The basic unit for storing data in ES is the index, which corresponds roughly to a table in MySQL. An index can contain multiple types, and each type has a mapping, which is that type's table structure. An index can be split into multiple shards, and each shard stores part of the data. Each shard's data also has multiple copies: there is one primary shard, responsible for writing data, plus several replica shards. After the primary shard writes data, it synchronizes the data to the replica shards (high availability).

ES can run as a single standalone search server. However, to handle large data sets and achieve fault tolerance and high availability, ES can run on many servers that cooperate with each other; this collection of servers is called a cluster.
An ES cluster is made up of multiple nodes, and each cluster has a shared cluster name as its identifier.

The name defaults to "elasticsearch". This name matters because a node can only join a cluster by specifying that cluster's name. Explicitly setting the name is a good habit in production, while the default value is fine for testing/development.

2. Node


An ES instance is a node, and one machine can run multiple nodes; in normal use, each instance should be deployed on a different machine. The node type is set in the ES configuration file through node.master and node.data.

node.master: true/false indicates whether the node is eligible to be elected as the master node.
node.data: true/false indicates whether the node stores data.

Combinations of node types (a configuration sketch follows this list):

Master node + data node: by default, a node can act as the master node and also store data.
Data node: the node only stores data and does not take part in master election.
Client node: never becomes the master node and stores no data; it is mainly used to load-balance requests.
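As a sketch, the relevant elasticsearch.yml settings might look like the following (the cluster and node names are placeholders; node.master/node.data are the pre-7.9 style flags, later versions use node.roles instead):

```yaml
cluster.name: my-es-cluster   # a node joins a cluster by matching this name
node.name: node-1

# master-eligible node + data node (the default combination)
node.master: true   # eligible to be elected master
node.data: true     # stores data

# a client (coordinating-only) node would instead set both to false:
# node.master: false
# node.data: false
```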


3. Shard:


If an index holds a large amount of data that exceeds what a single node's hardware can store, query requests slow down. ES addresses this with sharding. A shard is itself a complete search engine. Documents are stored in shards, and shards are distributed across the nodes of the cluster. As the cluster grows or shrinks, ES automatically migrates shards between nodes to keep the cluster balanced. Sharding has the following characteristics:

An ES index can contain multiple shards;
each shard is the smallest unit of work and carries part of the data;
each shard is a Lucene instance, with the complete ability to build an index and handle requests;
when nodes are added or removed, shards are automatically rebalanced across the nodes;
a complete document is stored in exactly one shard;
the number of shards in an index defaults to 5 and is fixed when the index is created, it cannot be changed afterwards.
Advantages: horizontal splitting expands the storage capacity of the index, and distributing operations across shards and running them in parallel improves performance/throughput;
the number of replicas associated with each shard defaults to 1, and this setting can be modified at any time (a sketch follows this list).
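For instance, a minimal sketch of changing the replica count on an existing index (the primary shard count cannot be changed this way; the products index name is assumed):

```python
import requests

requests.put(
    "http://localhost:9200/products/_settings",
    json={"index": {"number_of_replicas": 2}},  # replicas can be adjusted at any time
)
```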
4. Replica:
A replica shard is a redundant copy of a primary shard. Its main functions:

Redundant backup to prevent data loss;
fault tolerance and load balancing when a shard fails.
ES features:
fast, easy to scale, flexible, simple to operate, multi-language clients, X-Pack, strong Hadoop/Spark integration, works out of the box.

Distributed: horizontal scaling is very flexible.
Full-text search: powerful Lucene-based full-text search capabilities.
Near-real-time search and analysis: once data enters ES, it can be searched and aggregated in near real time.
High availability: fault-tolerant mechanisms automatically detect new or failed nodes and reorganize and rebalance the data.
Schema-free: ES's dynamic mapping mechanism automatically detects the structure and types of the data, creates the index, and makes the data searchable.
RESTful API: JSON + HTTP.

Several important concepts related to ES's underlying principles

1. Elasticsearch's complex distributed mechanisms are transparent and hidden from the user

  • Sharding mechanism
  • Shard replicas
  • Cluster discovery mechanism
  • Shard load balancing

2. Vertical expansion and horizontal expansion of Elasticsearch

  • Vertical scaling: buying more powerful servers is very expensive and eventually hits a bottleneck. Suppose the most powerful server in the world holds 10T; when your total data volume reaches 5000T, how many such servers would you have to buy?
  • Horizontal scaling: the scheme commonly used in industry. Buy more and more ordinary servers with fairly average performance; grouped together, many ordinary servers provide powerful computing and storage capacity.

3. Distributed architecture with equal nodes

  • Nodes are peers; every node can receive any request
  • Automatic request routing
  • Response collection

4. Master node

  • Create or delete an index
  • Add or delete nodes

  • Optimistic locking concurrency control based on _version (sketched below)
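As a sketch of the idea, here is version-based optimistic locking using external versioning (version_type=external), where a write is rejected with HTTP 409 unless its version is greater than the stored one; the counters index is made up:

```python
import requests

URL = "http://localhost:9200/counters/_doc/1"

# first write, stored with version 1
requests.put(URL, params={"version": 1, "version_type": "external"}, json={"value": 10})

# a concurrent, out-of-date write reusing version 1 is rejected instead of silently lost
resp = requests.put(URL, params={"version": 1, "version_type": "external"}, json={"value": 99})
print(resp.status_code)  # expected: 409 (version conflict)
```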

ES write data process

  1. The client sends the request to a node of its choice; that node becomes the coordinating node.
  2. The coordinating node routes the document and forwards the request to the node holding the corresponding primary shard.
  3. The primary shard on that node processes the request and then synchronizes the data to the replica shards.
  4. Once the coordinating node sees that the primary shard and all replica shards are done, it returns the response to the client.

This routing is essentially a modulo algorithm. For example, with 3 shards and a document id of 5, 5 % 3 = 2, so the document is placed on shard number 2, as the sketch below illustrates.
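A toy sketch of that routing rule (ES actually hashes the routing value, the document id by default, with murmur3; crc32 is used here only to make the modulo idea concrete):

```python
import zlib

def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    # shard = hash(routing) % number_of_primary_shards
    return zlib.crc32(routing_value.encode()) % num_primary_shards

print(pick_shard("5", 3))  # the same id always lands on the same shard
```

This is also why the number of primary shards is fixed at index creation: changing it would change the result of the modulo and break the routing of existing documents.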

The underlying principle of writing data:

  1. Data is first written to an in-memory buffer; data in the buffer cannot be searched yet. At the same time, the operation is written to the translog log file.
  2. When the buffer is nearly full, or after a fixed interval (by default every 1 second), the buffer's data is refreshed into a new segment file in the OS cache; if the buffer holds no new data during that second, no new segment file is created. Once data has been refreshed into the OS cache, it can be searched. You can also trigger a refresh manually through the REST API or the Java API, flushing the buffer into the OS cache so the data becomes searchable immediately. Whenever the data reaches the OS cache, the buffer is emptied. Meanwhile, after the data reaches the shard it is also written to the translog, and the translog is persisted to disk every 5 seconds.
  3. The above steps repeat: each piece of data written to the buffer is also logged to the translog. The translog file keeps growing, and when it reaches a certain size, a commit is triggered.
  4. A commit point is written to a disk file; it identifies all the segment files belonging to this commit.
  5. All the data in the OS cache is forcibly fsync'ed to the disk files.
  6. Explanation of the translog's role: before a commit, all the data lives in the buffer or the OS cache, both of which are memory. If the machine dies, the data in memory is lost, so the corresponding operations must also be written to a dedicated log file. After a crash and restart, ES reads the translog and restores the data into the in-memory buffer and the OS cache.
  7. The existing translog file is cleared and a new one is started. By default a commit happens every 30 minutes, but it is also triggered when the translog grows too large. The entire commit process is called a flush. You can also trigger a flush manually through the ES API: fsync the OS cache data to disk, record a commit point, and clear the translog (a sketch of the manual refresh and flush calls follows this list).
  8. Note: translog data is itself written to the OS cache first and flushed to disk every 5 seconds by default. This means up to 5 seconds of data may exist only in the buffer or in the translog's OS cache; if the machine crashes at that moment, those 5 seconds of data are lost, but performance is better. Alternatively, you can fsync every operation directly to disk, at the cost of performance.
  9. For a delete, a .del file is generated at commit time and the doc is marked with a delete state; at search time, the .del file tells ES that the doc has been deleted.
  10. For an update, the original doc is marked with a delete state and a new piece of data is written.
  11. Each refresh of the buffer produces a new segment file, so by default many segment files accumulate, and merge operations run periodically.
  12. Each merge combines multiple segment files into one and physically drops the docs marked as deleted; the new segment file is written to disk, a commit point is written identifying all the new segment files, and the new segment file is opened for searching.
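A minimal sketch of triggering the refresh and flush operations described above by hand through the REST API (the blog index name is assumed):

```python
import requests

BASE = "http://localhost:9200"

# refresh: write the in-memory buffer into a new searchable segment (OS cache)
requests.post(f"{BASE}/blog/_refresh")

# flush: fsync the segments to disk, write a commit point, and clear the translog
requests.post(f"{BASE}/blog/_flush")
```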

ES read data process

This is the GET-by-id path. When a document is written, ES automatically assigns it a globally unique id (you can also set the id manually), and that id is hashed to route the document to the corresponding primary shard.

  1. The client sends the request to any node, which becomes the coordinating node.
  2. The coordinating node routes the document and forwards the request to the corresponding node. A round-robin rotation over the primary shard and all its replicas is used here to balance the load of read requests.
  3. The node that receives the request returns the document to the coordinating node.
  4. The coordinating node returns the document to the client.

ES search data process

  1. The client sends a request to a node, which acts as the coordinating node.
  2. The coordinating node forwards the search request to all shards (to either the primary shard or a replica shard of each).
  3. Query phase: each shard returns its own search results (in fact, just unique identifiers and sort information) to the coordinating node, which merges, sorts and paginates them to produce the final result.
  4. Fetch phase: the coordinating node then pulls the actual documents from the nodes by those unique identifiers and finally returns them to the client (see the sketch below).
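A minimal sketch of a search request that goes through this query-then-fetch flow (the blog index and title field are invented; from/size control which page of the global result is returned):

```python
import requests

body = {
    "query": {"match": {"title": "elasticsearch"}},
    "from": 0,    # offset of the first hit in the globally sorted result
    "size": 10,   # number of hits to return
}
resp = requests.post("http://localhost:9200/blog/_search", json=body)
for hit in resp.json()["hits"]["hits"]:
    print(hit["_id"], hit["_score"])
```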

In short, the core concepts of the write path are segment, refresh, flush, translog, and merge.

The underlying principle of search

A search is roughly divided into two phases: query and fetch. The search request is broadcast to all relevant shards, and their responses are merged into one globally sorted result set, which is returned to the client.

Query phase

  1. When a node receives a search request, it becomes the coordinating node. The first step is to broadcast the request to a shard copy on each relevant node; the query can be handled by either a primary shard or a replica shard, and on subsequent requests the coordinating node rotates through all the shard copies to share the load.
  2. Each shard builds a priority queue locally. If the client asks for size results starting at offset from in the sorted result, each shard has to produce from + size candidates, so the priority queue has size from + size. The shard returns only a lightweight result to the coordinating node: the id of each document in the queue plus the information needed for sorting.
  3. The coordinating node gathers all the results, performs a global sort, and produces the final ordering.

Fetch phase

  1. The query phase determines the sort order and marks which documents meet the request, but those documents still have to be retrieved and returned to the client.
  2. The coordinating node determines which documents actually need to be returned and sends get requests to the shards that contain them; each shard returns its documents to the coordinating node, which returns the result to the client.

Inverted index

An inverted index maps terms (produced by word segmentation/analysis) to the documents that contain them; in an inverted index, the data is organized around terms rather than around documents (see the toy sketch below).
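A toy sketch of the idea (plain Python, with whitespace tokenization standing in for ES's analyzers): each term points to the set of document ids that contain it.

```python
from collections import defaultdict

docs = {
    1: "elasticsearch is a search engine",
    2: "lucene powers the search engine",
}

inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():       # real ES runs an analyzer/tokenizer here
        inverted[term].add(doc_id)

print(inverted["search"])  # {1, 2}: every document containing the term "search"
```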

How to improve query efficiency with massive amounts of data


  1. Filesystem cache: as a search engine, ES depends heavily on the underlying filesystem cache. Give the filesystem cache as much memory as possible, ideally enough to hold all of the index segment files (the index data files).
  2. Data warm-up
    For data you know is hot and frequently accessed, it is best to build a dedicated cache warm-up subsystem: periodically query the hot data in advance so it is pulled into the filesystem cache, and the next access will perform much better.
  3. Hot and cold separation

Also split the data for ES performance: move the large number of fields that are never searched into other storage. This is similar to the vertical splitting used in MySQL's database and table sharding.

  4. Document model design

Do not perform complex operations at search time; try to complete them at write time, when designing the document model. Beyond that, try to avoid complex operations altogether.

  5. Paging performance optimization

The deeper you page, the more data each shard has to return and the longer the coordinating node takes to process it. The usual answer is scroll: scroll takes a snapshot of the data once, and each later page is fetched by moving a cursor forward, so the API simply walks through the data page by page (see the sketch below).
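A minimal sketch of scroll-based paging (the blog index is assumed; each page is fetched by passing back the scroll_id instead of using ever-deeper from/size):

```python
import requests

BASE = "http://localhost:9200"

# open a scroll context kept alive for 1 minute between requests
resp = requests.post(f"{BASE}/blog/_search", params={"scroll": "1m"},
                     json={"size": 100, "query": {"match_all": {}}}).json()

while resp["hits"]["hits"]:
    for hit in resp["hits"]["hits"]:
        print(hit["_id"])
    # move the cursor forward: fetch the next page of the snapshot
    resp = requests.post(f"{BASE}/_search/scroll",
                         json={"scroll": "1m", "scroll_id": resp["_scroll_id"]}).json()
```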