gist index related

1. Brief introduction

GiST (Generalized Search Tree) is a balanced, tree structured access method. It is equivalent to a basic template in the system, and it can be used to implement almost any indexing mode. B-tree, R-trees and many other indexing modes can all be implemented through GiSt. It can establish an extensible index structure, including the expansion of data types and query predicates.

Gist allows users (not database experts) to develop their own data types and use gist indexes on that data type through corresponding access methods. Generally, implementing a new index access method means a lot of hard work. Because you must understand the internal working mechanism of the database, such as the lock mechanism and write-ahead log.

The gist interface provides a high-level abstraction. As long as the access method implements the semantics of the accessed data type, the gist itself will handle the tasks of concurrency, logging, and search tree structure.

The source code of gist is distributed in the \src\backend\access\gist directory, including the source code of gist index creation, search, and deletion

2. The scalability of gist

pgsql supports standard search trees such as the expandable b-tree, but don't confuse the extensibility of gist with the extensibility of other standard search trees, such as the data types they can handle. For example, the b-tree index supports the creation of indexes for multiple data types, but only supports range queries

This means that pgsql can be used to build b-trees on multiple data types. But b-tree only supports range predicates (=><). Therefore, if pgsql's b-tree indexes an image set, can only the similar image a and the image b be equal? Or is image a smaller than image b? Or is image a larger than image b? Such a query. Then in these contexts, greater than or equal to less than may be meaningful or meaningless.

And using an opportunity gist index, you can create some methods to issue questions related to the field of the data type, such as: "find all horse images" or "find all exposed images"

Gist can build an extensible index structure, including the expansion of data types and query predicates. This structure supports developers to quickly develop indexing methods for new data types, and is characterized by expanding the predicate while expanding the data type.

For example, colors cannot be absolutely sorted, but you can define predicates such as red redthan blue and blue redthan green. When extending data predicates, add predicate redthan. Of course, the data type can also be a set of data such as extended to b-tree.

3. Index organization structure of gist

Gist is a balanced tree, except that the subtree of the root node is written between 2-M, the number of subtrees of each node is between k*M, and the constant k is called the minimum fill factor of the tree, which satisfies 2/M <=k<=1/2, M is the maximum number of index items that a node can hold.

The index entry form is (p, ptr), where p is the search predicate. In leaf nodes, ptr is a pointer to a tuple in the database; and in non-leaf nodes, ptr is a pointer to its subtree node.

A typical gist structure is shown in the figure:

In the figure above, SP1 and SP2 (subtree predicates) refer to the predicates that the user separates data. It can be seen that the structure of gist is similar to the structure of b-tree index.

Gist has built-in algorithms for querying, inserting, and deleting index items. Users can realize a specific index structure by defining index items and providing methods related to index item management. These methods include:

1) Consistent

2) Union

3) Same

4) Penalty

5) PickSplit

6) Compress

7) Decompress