Architecture Decryption, from Distributed to Microservices: A Comprehensive Explanation of Full-Text Retrieval Middleware

Full-Text Search and Message Queue Middleware

In the previous chapters we learned the basic knowledge and skills needed to build a distributed system, such as the fundamental theory of distributed systems, network programming, RPC architecture, in-memory computing, distributed file systems, and distributed computing frameworks. But mastering these topics alone is not enough; we also need to learn some middleware commonly used in distributed systems. Such middleware mainly targets common business scenarios in distributed systems: full-text data retrieval, log and message processing, database sharding, load balancing of websites, and so on. Due to limited space, this chapter gives a comprehensive introduction to only two kinds of middleware, full-text search and message queues, which are widely used and relatively complex.

We are accustomed to using online search to quickly learn and solve technical problems, which relies on Internet search engines. Accurately and quickly finding, among massive amounts of web page (text) information, all the pages that contain the keywords we searched for, and displaying them in a sensible order, is indeed a very challenging problem.

In addition to the search engines we use every day, a large number of Internet applications require keyword search (i.e., full-text search) functions. To understand the value of keyword search, we need to understand the limitations of relational database indexes. When we use a condition like "%keyword%" in a SQL query, the database index does not work. The search then degenerates into a traversal similar to flipping through a book page by page, consisting almost entirely of I/O operations, so the negative impact on performance is significant; if we need fuzzy matching on multiple keywords, such as like "%keyword1%" and like "%keyword2%", the query efficiency can be imagined.

Keyword retrieval essentially analyzes the contents of a series of text files in units of phrases (keywords) and generates corresponding index records. The index stores the mapping relationship between keywords and articles; in this mapping, key information such as the article number, occurrence count, and frequency of the keyword is recorded, and it can even include the starting positions of the keyword in the article, which is why the keyword can be shown "highlighted" on the query result page.
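To make the idea concrete, here is a minimal illustrative sketch (not how Lucene actually stores its index) of an inverted index that maps each keyword to the documents containing it, together with occurrence counts and positions:

import java.util.*;

// A minimal illustrative inverted index: keyword -> postings.
// Each posting records the document id, the occurrence count and
// the positions of the keyword inside that document.
public class TinyInvertedIndex {
    static class Posting {
        final int docId;
        final List<Integer> positions = new ArrayList<>();
        Posting(int docId) { this.docId = docId; }
        int frequency() { return positions.size(); }
    }

    private final Map<String, Map<Integer, Posting>> index = new HashMap<>();

    // Tokenize on whitespace (real engines use an Analyzer) and index each term.
    public void addDocument(int docId, String text) {
        String[] terms = text.toLowerCase().split("\\s+");
        for (int pos = 0; pos < terms.length; pos++) {
            index.computeIfAbsent(terms[pos], k -> new HashMap<>())
                 .computeIfAbsent(docId, Posting::new)
                 .positions.add(pos);
        }
    }

    public Collection<Posting> search(String keyword) {
        return index.getOrDefault(keyword.toLowerCase(),
                Collections.<Integer, Posting>emptyMap()).values();
    }

    public static void main(String[] args) {
        TinyInvertedIndex idx = new TinyInvertedIndex();
        idx.addDocument(1, "full text search with an inverted index");
        idx.addDocument(2, "an index maps keywords to documents");
        for (Posting p : idx.search("index")) {
            System.out.println("doc=" + p.docId + " freq=" + p.frequency()
                    + " positions=" + p.positions);
        }
    }
}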

The first step of keyword search is to segment the entire document (Document) to obtain every word in the text. This is not difficult for English, because the words in an English sentence are separated by spaces; but in Chinese, characters and words are two different concepts, so Chinese word segmentation becomes a big problem. For example, how should "北京天安门" (Beijing Tiananmen) be segmented? As "Beijing, Tiananmen" or as "Beijing Tiananmen"? The best way to solve this problem is to combine a Chinese thesaurus with the word segmenter. Well-known Chinese word segmenters include IK (IKAnalyzer) and Paoding (PaodingAnalyzer), both of which are very convenient to use together with the open source Lucene.
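Besides the third-party IK and Paoding packages, Lucene itself ships a SmartChineseAnalyzer in its lucene-analyzers-smartcn module. The following sketch (assuming that module is on the classpath) prints the tokens an analyzer produces, which makes it easy to see how a sentence such as "北京天安门" gets segmented:

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class SegmentationDemo {
    // Print every token the given analyzer produces for the text.
    static void printTokens(Analyzer analyzer, String text) throws IOException {
        try (TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                System.out.print("[" + term.toString() + "] ");
            }
            ts.end();
            System.out.println();
        }
    }

    public static void main(String[] args) throws IOException {
        // should come out as meaningful words, not isolated characters
        printTokens(new SmartChineseAnalyzer(), "北京天安门");
    }
}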

Lucene

A well-known full-text search open source project in the Java ecosystem is Apache Lucene (hereinafter referred to as Lucene), which became an Apache open source project in 2001. Lucene's original contributor, Doug Cutting, is a senior expert in the field of full-text retrieval. He was once a main developer of the V-Twin search engine (one of the achievements of Apple's Copland operating system), and he contributed Lucene in order to add full-text search functions to all kinds of small and medium-sized applications. The Lucene-related open source projects currently officially maintained by Apache are as follows.

  • Lucene Core: a core class library written in Java that provides the underlying APIs and SDK for the full-text search function.
  • Solr: a high-performance search service built on Lucene Core; it provides a high-level REST API encapsulation and a web management interface.
  • PyLucene: a Python port of Lucene Core.

To index a document, Lucene provides five basic classes: Document, Field, IndexWriter, Analyzer, and Directory. Document is used to describe any document to be searched, such as an HTML page, an email, or a text file. A document may have multiple attributes; an email, for example, has attributes such as reception date, sender, recipient, subject, and body, and each attribute can be described by a Field object. We can also think of a Document object as a record in a database, with each Field object being one field of that record. Before a Document can be queried, its content must be segmented to find out which keywords it contains; this work is done by the Analyzer object, which passes the segmented content to IndexWriter for indexing. IndexWriter is one of the core classes Lucene uses to create an index (Index); it adds each Document object to the index and persists the index into a Directory. Directory represents the storage location of a Lucene index and currently has two implementations: FSDirectory, which stores the index in the file system, and RAMDirectory, which stores it in memory.

After understanding the classes needed to build a Lucene index, we can create an index over any documents. The source code below indexes all text files in a specified directory:

import java.io.File;
import java.io.FileReader;
import java.io.Reader;
import java.nio.file.Paths;
import java.util.Date;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// directory where the index files are stored
Directory indexDir = FSDirectory.open(Paths.get("index-dir"));
Analyzer analyzer = new StandardAnalyzer();
IndexWriterConfig config = new IndexWriterConfig(analyzer);
IndexWriter indexWriter = new IndexWriter(indexDir, config);
// directory containing the files to be indexed
String dataDir = ".";
File[] dataFiles = new File(dataDir).listFiles();
long startTime = new Date().getTime();
for (int i = 0; i < dataFiles.length; i++) {
    if (dataFiles[i].isFile() && dataFiles[i].getName().endsWith(".txt")) {
        System.out.println("Indexing file " + dataFiles[i].getCanonicalPath());
        Reader txtReader = new FileReader(dataFiles[i]);
        Document doc = new Document();
        // the file name is also stored as a Field, so the original file can be located
        doc.add(new StringField("filename", dataFiles[i].getName(), Field.Store.YES));
        doc.add(new TextField("body", txtReader));
        indexWriter.addDocument(doc);
    }
}
indexWriter.close();
long endTime = new Date().getTime();
System.out.println("It takes " + (endTime - startTime)
        + " milliseconds to create index for the files in directory " + dataDir);

You can put any text file containing English sentences (such as English lyrics) in the root directory of the project and run the above program to complete the index creation. If everything is normal, a prompt similar to the following appears:

Indexing file D:\project\leader-study-search\lemon-tree.txt
It takes 337 milliseconds to create index for the files in directory .

Next, we try a keyword query: find all documents whose content (the body field) contains "good" and output the results. To do this, we first need to open the index files, then construct a Query object and execute the query logic, and finally output the query results. The corresponding source code follows:

// (imports analogous to the previous listing, plus org.apache.lucene.index.DirectoryReader,
//  org.apache.lucene.queryparser.classic.QueryParser and org.apache.lucene.search.*)
// open the specified index files
Directory indexDir = FSDirectory.open(Paths.get("index-dir"));
IndexReader reader = DirectoryReader.open(indexDir);
IndexSearcher searcher = new IndexSearcher(reader);
// the query
String queryStr = "good";
Analyzer analyzer = new StandardAnalyzer();
QueryParser parser = new QueryParser("body", analyzer);
Query q = parser.parse(queryStr);
int hitsPerPage = 10;
TopDocs docs = searcher.search(q, hitsPerPage);
ScoreDoc[] hits = docs.scoreDocs;
// output the query results
System.out.println("Found " + hits.length + " hits.");
for (int i = 0; i < hits.length; ++i) {
    int docId = hits[i].doc;
    Document d = searcher.doc(docId);
    System.out.println((i + 1) + ". " + d.get("filename"));
}

If the keyword you search for happens to appear in one of the text files, running this code produces console output similar to the following:

Found 1 hits.
1. lemon-tree.txt

Through the study of the above examples, we have initially mastered the basic usage of Lucene. The entire flow of Lucene programming is shown in the figure below.

[Figure: the overall flow of Lucene programming]

The entire process of Lucene programming can be summarized as the following three independent steps.

  • Modeling: according to the structure and information of the document to be indexed (the original document), model the corresponding Document object and the related Lucene index fields (there can be multiple fields). This step is similar to database modeling. One key point is to determine which information in the original document needs to be stored in the Document object as a Field; usually the ID or full-path file name of the document is stored (Field.Store.YES) so that the user can view or download the original document from the search results.
  • Ingestion: write a program that scans each target document to be retrieved, converts it into a corresponding Document object, creates the related index, and finally stores it in Lucene's index repository (Directory). This step can be compared to initializing data (importing data in batches).
  • Retrieval: use the Lucene API, in a way similar to a SQL query, to write full-text retrieval conditions, query the eligible Documents from Lucene's index repository, and output them to the user. This step is completely analogous to a SQL query.

Lucene is also often combined with web crawler technology to provide full-text search over Internet resources. For example, many information websites that offer commodity price comparison and best-deal shopping use crawlers to collect product information, feed it into a Lucene index library, and then provide retrieval services to users. The following is a typical architecture diagram of this type of system.

[Figure: typical architecture of a crawler-based Lucene search system]

Solr

If you compare Lucene with MySQL, you will find that Lucene is like a MySQL storage engine, such as InnoDB or MyISAM. Lucene provides only basic full-text-search-related APIs: it is not an independent middleware product, its features are not rich enough, and its APIs are relatively complex and not easy to use. In addition, Lucene lacks one more critical feature, distribution; when the number of documents to be retrieved is particularly large, we inevitably hit capacity and availability bottlenecks. Hence Solr and ElasticSearch, which are both feature-rich distributed full-text search middleware built on top of Lucene.

The following is a schematic diagram of Solr's architecture. As we can see, Solr has developed many enterprise-level enhancements on top of Lucene: it provides a powerful Data Schema to help users define the structure of documents; it adds efficient and flexible caching; it adds a web-based management interface and centralized configuration management; Solr index data can be sharded across multiple nodes, and system reliability can be improved through multi-replica replication.

[Figure: Solr architecture]

Solr's distributed cluster mode is also called SolrCloud; it is a very flexible distributed indexing and retrieval system. SolrCloud is a decentralized cluster: there is no special Master node; instead it relies on ZooKeeper to coordinate the cluster. An index (Collection) in SolrCloud can be divided into multiple shards stored on different nodes (called Solr Cores, or simply Cores). While index data is sharded, SolrCloud can also replicate each shard (Replication) to improve cluster availability. All state information of a SolrCloud cluster is maintained in ZooKeeper. When a client accesses the SolrCloud cluster, it first queries ZooKeeper for the address list of the Core nodes holding the index data (Collection), and can then connect to any Core node to perform all index operations (CRUD).
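With SolrJ, Solr's official Java client, this ZooKeeper-first handshake is transparent: the client is configured with the ZooKeeper address rather than with any Solr node, and it discovers the Cores from the cluster state itself. A minimal sketch, assuming a SolrJ 6.x dependency and a collection named myCollection with fields id and title:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class SolrCloudDemo {
    public static void main(String[] args) throws Exception {
        // connect via ZooKeeper; the client reads the cluster state from there
        try (CloudSolrClient client = new CloudSolrClient.Builder()
                .withZkHost("localhost:2181").build()) {
            client.setDefaultCollection("myCollection");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "1");
            doc.addField("title", "hello solrcloud");
            client.add(doc);
            client.commit();

            QueryResponse rsp = client.query(new SolrQuery("title:hello"));
            System.out.println("hits: " + rsp.getResults().getNumFound());
        }
    }
}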

The figure below shows a SolrCloud reference deployment plan. The index data (Collection) in this plan is divided into two shards, and each shard has three copies of its data: the Core node where one copy resides is called the Leader, and the other two Core nodes are called Replicas. All the index data is thus distributed across 6 Cores located on 3 independent servers, so no single machine going down affects the availability of the system. If a server goes down during operation, SolrCloud automatically triggers a Leader re-election, which is implemented through the distributed lock function provided by ZooKeeper.

[Figure: SolrCloud reference deployment with two shards and three replicas each]

As mentioned before, each shard in SolrCloud consists of a Leader and N Replicas, and a client can connect to any Core node to perform index operations. How, then, is multi-copy synchronization of index data achieved? The figure below shows the answer.

[Figure: multi-replica synchronization of index data in SolrCloud]

If the Core the client connects to is not the Leader, that node forwards the request to the Leader node of the shard it belongs to.

The Leader routes the data (Document) to every Replica node of its shard. If the document routing rules determine that the target shard is a different shard, the Leader forwards the data to that shard's Leader node for processing.

Next, let's talk about another important issue: what algorithm does SolrCloud use to shard index data? To choose a suitable sharding algorithm, SolrCloud put forward the following two key requirements.

(1) The sharding algorithm must be fast, because it is used frequently both while indexing and while accessing the index.

(2) The sharding algorithm must ensure that index data is distributed evenly across the shards. A SolrCloud query is a scatter-gather process: the query is sent to every shard and the partial results are then merged. If one shard holds far more indexed documents than the others, querying that shard takes significantly longer than the rest; in other words, the query speed of the slowest shard determines the overall query speed.


Based on the above two points, SolrCloud chose a consistent hashing algorithm to implement index sharding.
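Concretely, SolrCloud's default router hashes each document id (using MurmurHash) into a 32-bit hash space and assigns every shard a contiguous range of that space, which satisfies both requirements above. The sketch below only illustrates the idea and is not Solr's actual implementation; CRC32 stands in for the real hash function:

import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustrative only: route a document id into one of N shards by hashing it
// into a 32-bit space and giving each shard a contiguous hash range.
public class ShardRouter {
    private final int numShards;

    ShardRouter(int numShards) { this.numShards = numShards; }

    int route(String docId) {
        CRC32 crc = new CRC32();                 // fast, stable hash (Solr uses MurmurHash)
        crc.update(docId.getBytes(StandardCharsets.UTF_8));
        long hash = crc.getValue();              // value in [0, 2^32)
        long rangeSize = (1L << 32) / numShards; // size of each shard's hash range
        return (int) Math.min(hash / rangeSize, numShards - 1);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(2);
        for (String id : new String[]{"doc-1", "doc-2", "doc-3", "doc-4"}) {
            System.out.println(id + " -> shard" + router.route(id));
        }
    }
}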

At the end of this section, let's look at the advanced "near real-time search" feature supported by SolrCloud. Near real-time search makes newly added Documents visible and searchable within a short period of time; it is mainly based on Solr's Soft Commit mechanism. As mentioned in Section 8.1.2, when Lucene creates an index, data is written to disk on commit. This is a Hard Commit: it ensures that data is not lost even on power failure, but it increases latency, and newly added Documents remain invisible to previously opened Searchers. To provide more real-time retrieval, Solr offers the Soft Commit mode, in which data is committed only to memory and not yet written to the disk index files, but the index becomes visible: Solr opens a new Searcher, making the new Documents searchable, and also warms the caches and queries so that the cached data is visible too. To ensure that data is eventually persisted to disk, a Hard Commit can be triggered automatically every 1 to 10 minutes, with a Soft Commit triggered automatically every second. Soft Commit is also a double-edged sword: the more frequent the commits, the better the real-time visibility of queries, but also the higher the load on Solr, because more frequent commits produce more numerous and smaller index segments (Segments), making Solr's merge operations more frequent. In actual projects, it is recommended to set the Soft Commit frequency according to business needs and tolerance.
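In Solr, these two commit frequencies are usually configured in the updateHandler section of solrconfig.xml. A minimal sketch matching the frequencies suggested above (the exact interval values should follow your own tolerance):

<autoCommit>
  <maxTime>600000</maxTime>          <!-- hard commit every 10 minutes: flush index to disk -->
  <openSearcher>false</openSearcher> <!-- durability only; visibility comes from soft commits -->
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>            <!-- soft commit every second: new Documents become searchable -->
</autoSoftCommit>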

ElasticSearch

ElasticSearch (hereafter referred to as ES) is not an Apache project. It is similar to Solr: a distributed indexing and retrieval middleware based on Lucene. ES appeared later than Solr, but judging from its current development, its momentum and popularity are much higher than its predecessor's. It is worth mentioning that in the field of log analysis, the ELK Stack, with ES at its core, has become a de facto standard. ELK is not a single piece of software but a complete solution; it is an acronym for ES, Logstash, and Kibana. All three are open source, usually used together, and all now belong to Elastic.co, hence the abbreviation ELK Stack. An article found via Google mentioned that the ELK Stack has reached 500,000 downloads per month, making it the most popular log management platform in the world, and on the currently popular PaaS platforms based on Docker and Kubernetes, ELK is also standard equipment. The reason why ES, not an Apache product, could come from behind is inextricably linked to the popularity and influence of ELK.

In fact, the log module is where full-text search is needed most in any distributed system. If you have ever tried to troubleshoot a distributed system with more than 5 nodes, you will understand the importance and urgency of centralized log collection and full-text search. Without a log subsystem like the ELK Stack, we have to log in to each host to query its logs and "splice" all the related query results together to trace a call chain and analyze the cause of a failure. This work does not look complicated, but it is actually very energy-consuming, because each host may have multiple log files to analyze, and merely locating the logs at a certain point in time is a headache.

The following is an architectural diagram of the ELK Stack. Logstash is a data collection engine with real-time pipeline capabilities; it collects log data and writes it as index data into the ES cluster. We can also develop custom log collection probes and write logs into the ES cluster in ELK's log index format. Kibana provides ES with a web platform for data analysis and visualization; it can search the data in ES indexes and generate tables and charts of various dimensions.

[Figure: architecture of the ELK Stack]
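To make the Logstash side concrete, here is a minimal pipeline configuration sketch; the log path and the grok pattern are illustrative assumptions:

# logstash.conf: a minimal sketch that tails application logs and indexes them into ES
input {
  file {
    path => "/var/log/myapp/*.log"     # assumed log location
    start_position => "beginning"
  }
}
filter {
  grok {
    # illustrative pattern for lines like "2017-12-12 10:00:00 INFO some message"
    match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
}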

ES hides the complexity of Lucene behind a simple RESTful API, making full-text search simple. It provides near real-time indexing, search, analysis, and other functions. We can understand and describe ES as follows.

  • Distributed real-time document storage, each field in the document can be indexed and searched.
  • Distributed real-time analysis search engine.
  • It can be extended to hundreds of servers to process PB-level structured or unstructured data.

ES adds the concept of Type. If we compare an Index to a Database, then a Type is roughly a Table, but this analogy is not very accurate: different Tables can have completely different structures, while all Documents in one Index are structurally highly consistent. A Type in ES is actually just a special field of the Document, used to filter different documents when querying. For example, when building a B2C e-commerce platform, we need to index the products of every store, and we can use Type to distinguish different shops. In practice there are very few scenarios where Type is really needed, which is something we should pay attention to.

Like SolrCloud, ES is a distributed system, but instead of using ZooKeeper as the cluster coordinator, ES implements its own module, called Zen Discovery, which is mainly responsible for automatic discovery of the nodes in the cluster and for Master node election. The Master node maintains the global state of the cluster, for example redistributing shards when nodes join or leave, and the nodes of the cluster communicate directly with one another (P2P), so there is no single point of failure. One advantage of ES not using ZooKeeper is that deployment and operations are simpler; the disadvantage is that the so-called split-brain problem may occur. To prevent split-brain, one parameter we need to pay attention to is discovery.zen.minimum_master_nodes, which determines how many master-eligible nodes must be in communication during a Master election. A basic principle is to set it to N/2+1 (rounded up), where N is the number of nodes in the cluster. For example, in a three-node cluster, minimum_master_nodes should be set to 3/2+1 = 2. If the cluster is then partitioned, an isolated node loses its Master status and cannot be elected Master, stops accepting index or search requests, and no shard is left in an inconsistent state. ES has made many improvements to the Zen Discovery algorithm to solve the split-brain problem; the GitHub issue about split-brain was closed in 2014, with the following related description:

2. Zen Discovery
Pinging after master loss (no local elects)
Fixes the split brain issue: #2488
Batching join requests
More resilient joining process (wait on a publish from master)
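Returning to the three-node example, the corresponding entry in each node's elasticsearch.yml (in ES versions before 7.0, which replaced the Zen Discovery quorum setting) is simply:

# elasticsearch.yml on each of the 3 nodes (ES versions before 7.0)
discovery.zen.minimum_master_nodes: 2   # N/2 + 1 with N = 3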

There is another major difference between an ES cluster and SolrCloud: an ES cluster has more than one type of node. The main types are as follows.

  • Master node: eligible to be elected as the master node, which controls the entire cluster.
  • Data node: holds index data and performs data-related operations such as create, delete, update, search, and aggregation.
  • Load balance node: handles only routing requests, distributing searches and index operations; in essence it behaves like an intelligent load balancer. Load balance nodes are very useful in larger clusters: after joining the cluster they obtain the cluster state and can route requests directly according to it (a configuration sketch follows this list).
  • Tribe node: a special Load balance node that can connect to multiple clusters and perform searches and other operations across all connected clusters.
  • Ingest node: a node type new in ES 5.0 that greatly simplifies the preprocessing work previously required when adding data to an ES cluster.
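These roles are controlled by per-node settings in elasticsearch.yml. A sketch for ES 5.x (a node with all three flags set to false becomes a coordinating-only "load balance" node):

# dedicated master-eligible node
node.master: true
node.data: false
node.ingest: false

# dedicated data node: node.master: false, node.data: true, node.ingest: false
# load balance (coordinating-only) node: all three flags set to false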

The following is an ELK deployment scheme in which Tribe nodes connect to multiple ES clusters for centralized log display. It is said that Meizu adopted this scheme to solve the problem of centrally displaying the logs of its various IDC machine rooms.

[Figure: ELK deployment using Tribe nodes to connect multiple ES clusters]

At the end of this section, we will install ES and write some simple examples to deepen our understanding of ES and get an initial grasp of its API. We can download the binary version of ES from the ES official website (on Windows, download the ZIP package). After decompression, there are executable scripts in the bin directory, such as elasticsearch.bat. After executing the startup script, visit http://localhost:9200 in a browser; if information like the following is displayed, ES has started normally:

{
  "name" : "Y8 klCx",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "X3tmO4iXSKa8l_ADWNh_g",
  "version" : {
    "number" : "5.3.0",
    "build_hash" : "3adb13b",
    "build_date" : "2017-03-23T03:31:50.652Z",
    "build_snapshot" : false,
    "lucene_version" : "6.4.1"
  },
  "tagline" : "You Know, for Search"
}

ES provides a REST interface that lets us easily add a Document to an index. Before learning this API, we need to know that a Document in ES is uniquely identified by the following three fields.

  • _index: The index where the document is located.
  • _type: The type of document.
  • _id: The string ID of the document, which can be specified when inserting the document, or it can be randomly generated by ES.

Now it is easy to understand the URL pattern of the Document CRUD REST interface: http://localhost:9200/{index}/{type}/[{id}].

In addition, a Document in ES is represented by a JSON structure. Since JSON itself hints at field types (for example, strings and numbers are written differently), ES can automatically derive a Document Schema from the JSON, which is why ES is called a Schema-less system. Schema-less does not mean there is no Schema, however; if you are not satisfied with the Schema that ES automatically infers, you can use a custom Mapping to design a more reasonable one.
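For example, a custom Mapping for the blog documents used later in this section could be defined as follows (ES 5.x syntax; the field type choices are our own illustration):

PUT /blogs
{
  "mappings": {
    "blog": {
      "properties": {
        "user":      { "type": "keyword" },
        "post_date": { "type": "date" },
        "message":   { "type": "text" }
      }
    }
  }
}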

Starting from version 5.0, ES has provided a brand-new Java client API. The biggest goal of this API is to remove the dependency on the ES and Lucene class libraries and become more lightweight. It adopts a layered design: the base layer includes only an HTTP communication layer and a Sniffer for discovering other nodes in the cluster, while higher layers add functions such as the Query DSL. In this section we use this new Java API to perform Document operations.

We need to reference this API in Maven:

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>sniffer</artifactId>
    <version>5.3.0</version>
</dependency>

The following code obtains the health information of ES, similar to visiting http://localhost:9200/_cluster/health in a browser:

// assumes imports of RestClient, Response, HttpHost, HttpEntity,
// EntityUtils and java.util.Collections
RestClient client = RestClient
        .builder(new HttpHost("localhost", 9200)).build();
Response response = client.performRequest(
        "GET", "/_cluster/health",
        Collections.singletonMap("pretty", "true"));
HttpEntity entity = response.getEntity();
System.out.println(EntityUtils.toString(entity));

After running, it outputs the following content, in which the status attribute is the most important one; green indicates that the cluster is healthy:

"cluster name" :"elasticsearch",
"status" :"green",
"timed out" : false,"number of nodes" :1,
"number of_data_nodes" :1,
"active_primary_shards" :0,"active shards" :0,
"relocating_shards" :0,"initializing_shards" :0,"unassigned shards" :0,
"delayed unassigned_shards" :0,"number of pending tasks" :0,"number of in_flight_fetch" :0,
"task max _waiting_in_queue_millis" :0,"active_shards percent as_number" :100.0
}

Next, we insert a Document into the Index named blogs; the Type of the Document is blog, and its id is 1. Below is the JSON content of the Document:

{
  "user" : "Leader us",
  "post_date" : "2017-12-12",
  "message" : "Mycat 2.0 is coming!"
}

The corresponding code is as follows:

// index a document
String docJson = "{\n" +
        "  \"user\" : \"Leader us\",\n" +
        "  \"post_date\" : \"2017-12-12\",\n" +
        "  \"message\" : \"Mycat 2.0 is coming!\"\n" +
        "}";
System.out.println(docJson);
HttpEntity entity = new NStringEntity(docJson, ContentType.APPLICATION_JSON);
Response indexResponse = client.performRequest("PUT", "/blogs/blog/1",
        Collections.singletonMap("pretty", "true"), entity);
System.out.println(EntityUtils.toString(indexResponse.getEntity()));

After running the above code, an index document is successfully inserted in ES, and the console will output the following information:

{
  "_index" : "blogs",
  "_type" : "blog",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}

If you run the above code a second time, the result value in the output changes from created to updated, indicating an update of the Document, and _version is incremented at the same time. We use the following code to add 100 more documents for testing:

String[] products = {"Mycat", "Mydog", "Mybear", "MyAllice"};
for (int i = 0; i < 100; i++) {
    // index a document
    String docJson = "{\n" +
            "  \"user\" : \"Leader us\",\n" +
            "  \"post_date\" : \"" + (2017 + i) + "-12-12\",\n" +
            "  \"message\" : \"" + products[i % products.length] + i + " is coming!\"\n" +
            "}";
    HttpEntity entity = new NStringEntity(docJson, ContentType.APPLICATION_JSON);
    Response indexResponse = client.performRequest("PUT", "/blogs/blog/" + i,
            Collections.singletonMap("pretty", "true"), entity);
    System.out.println(EntityUtils.toString(indexResponse.getEntity()));
}
Now open a browser and issue the following query command:
http://localhost:9200/blogs/blog/_search?pretty=true

The following query information will appear:

{
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 100,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "blogs",
        "_type" : "blog",
        "_id" : "19",
        "_score" : 1.0,
        "_source" : {
          "user" : "Leader us",
          "post_date" : "2036-12-12",
          "message" : "MyAllice19 is coming!"
        }
      },
      {
        "_index" : "blogs",
        "_type" : "blog",
        "_id" : "22",
        "_score" : 1.0,
        "_source" : {
          "user" : "Leader us",
          "post_date" : "2039-12-12",
          "message" : "Mybear22 is coming!"
        }
      },

From the above information we know that the blogs index has 5 shards. The hits section lists the documents matching the query (100 documents in total), and the _source part is the original document content we entered. To view the content of a particular Document, just specify the document ID in the URL, for example:
http://localhost:9200/blogs/blog/1

What if we want to query documents that contain a certain keyword? ES provides the Query DSL, a query syntax described in JSON format that is quite convenient to use. For example, the following DSL statement queries for Documents in which any field contains the keyword mycat:

{
  "query" : {
    "query_string" : {
      "query" : "mycat"
    }
  }
}

We only need to POST the above DSL as the JSON request body to the _search URL of the index. The following is the specific code:

// search documents
String dsl = "{\"query\":{" +
        "\"query_string\": {" +
        "\"query\": \"mycat\"" +
        "}}}";
System.out.println(dsl);
HttpEntity entity = new NStringEntity(dsl, ContentType.APPLICATION_JSON);
Response response = client.performRequest("POST", "/blogs/blog/_search",
        Collections.singletonMap("pretty", "true"), entity);
System.out.println(EntityUtils.toString(response.getEntity()));

We will not dig deeper into other ES programming topics here, mainly the syntax details of the Query DSL, such as highlighting matched results, filtering query results, controlling result-set caching, and compound queries; readers can consult the official documentation for these.
