[Redis in-depth study] Still talking about the five data structures that everyone knows? Let’s take a look at how the bottom layer is implemented. It’s definitely hard-core

Speaking of redis, the five data structures are almost well-known things. This is an important feature that distinguishes memcached from redis. So how does the bottom layer of redis implement these five data structures? There is a lot of knowledge in it, and with the blogger, it is definitely worth your time to collect and take a closer look.

Insert picture description here


4D long text takes you enough to eat the underlying structure

One: First, we first understand a few pre-knowledge

1. All the underlying main data structures in redis

Simple dynamic string SDS, double-ended linked list, dictionary, compressed list, integer set, etc.

2. Redis uses object storage key-value pairs

We must understand that both the key and the value in Redis are stored by objects. Whenever we add a key-value pair to the Redis database, at least two objects will be created at the bottom. The following is the C language for the objects in Redis Definition, students who are familiar with the C language may see that this is the basic structure

	typedef struct redisObject{
		// 类型
		unsigned type:4;
		// 编码
		unsigned encoding:4;
		// 指向底层实现数据结构的指针
		void *ptr;
	}robj;

About what is type and what is encoding, let’s first give a brief introduction

Insert picture description here
  • Type gets the five basic types of redis, namely string, set, zset, list, hash
  • Encoding obtains the basic structure that implements the five basic types, embstr here is one of them, which will be described in detail later, redis uses object encodinginstructions to obtain the corresponding encoding

3. The underlying structure corresponding to the five types

Insert picture description here

Each type corresponds to at least two underlying structures. Depending on the amount of data and various performance trade-offs, each data structure of redis may switch the corresponding underlying structure, which is the so-called "coding conversion". Let me give you a brief example. To help everyone understand

  • For the String type, the integer encoding type that can be saved with the long type is int
  • Floating point number that can be saved in long double type, encoding type is embstr or raw
  • String value, or integer that cannot be represented by long because the length is too long, or floating-point number that cannot be represented by long double, encoding type is embstr or raw

Second: Walk into the deep water area and formally enter the underlying structure

Insert picture description here

1. Simple dynamic string (SDS)

Redis is written in C language, so for the processing of strings, Redis retains some of the characteristics of C, and at the same time carries out a package upgrade.More suitable for the performance pursued by Redis is king

Insert picture description here


In a simple instruction, the key of the key-value pair is a string object, and the underlying implementation of the object stores the SDS of the string "k1", and the value of the key-value pair is the same.

A. Definition of SDS

The code of the underlying C language

struct sdshdr{
    // 记录buf数组中已经使用的子节的数量
    int len;
    // buf数组中未使用的子节数量
    int free;
    // 子节数组,用于保存字符串
    char buf[]
}

Assuming that a "Redis" SDS object is stored, the schematic diagram of the underlying structure is as follows

Insert picture description here
  • The value of len is 5, excluding the end character'\0'
  • The value of free is 1.'\0' occupies a subsection
  • Save'\0' to call library functions such as printf at the bottom of the c language, without repeatedly making wheels
A. The advantages of SDS compared to c string
  • Since the len field is stored in SDS, the length can be obtained within the complexity of o(1), but the original c string must be traversed within the complexity of o(n) to obtain the length, ensuring that the string is obtained The length operation will not affect the performance, even if it is to obtain the length of a very long string, exchange the event with space
  • To prevent buffer overflow, friends who are familiar with the C language must understand that strcat(char *dest,char *src)for string manipulation functions, you need to ensure that dest can store the spliced ​​string. If you don’t pay attention to it, buffer overflow errors may occur, but SDS will first Check if there is enough space for dynamic adjustment
  • Reduce the number of memory redistributions caused by string changes. For C language, if the append splicing operation exceeds the length of the buffer area, you need to apply for memory space again, and the operation of shortening the string similar to trim needs to be released For unused space, SDS optimizes the strategy through the free field above
  • ①. Space pre-allocation, if you want to expand the space, SDS will not only allocate the necessary space, but also allocate the unused space, free is used to record the length of the unused space, if the length of the modified SDS is less than 1M, it will be allocated The same unused space as len, if it is greater than 1M after modification, 1M of unused space will be allocated
  • ②. Inert space release. If the SDS is shortened, the extra subsections will not be reclaimed immediately, but free to record the reclaimed space, which is convenient for the next use. For example, it may come in handy during the growth operation.
B. Support binary security

For the C language, the'\0' character cannot appear in the middle of the string, otherwise it will be mistaken for the end of the string, so the c string can only store text files, not pictures, etc., but the buf array of Redis does not store Characters are a series of binary data, which is why buf is called a subsection array, so there is no problem with SDS storing special data, which is the binary security of the stomach lock.

C. A picture comparing the string of SDS and C
Insert picture description here

2. Linked list

The linked list at the bottom of Redis is not much different from the linked list we usually use. Here we only talk about some of the features in Redis.

  • Redis bottom-level linked list is a double-ended ringless linked list
  • Since the linked list is not set to a specific type, the Redis linked list can store a variety of different types
  • Linked lists are widely used to implement various functions of Redis, such as list keys, publish and subscribe, slow queries, monitors, etc.

3. Jump table

The jump table is an ordered data structure, one of the underlying structure of Zset, the average complexity of the node lookup of the jump table is o(logn), the worst is o(N), the purpose is to speed up the increase of the ordered set Delete, modify, and check. In order not to make the article too long, you can read an article by the blogger to explain the jump list in detail, and you can understand it in one article.

This article takes you to understand the underlying structure of Redis-the jump table

4. Compress the list

When the amount of list item data you add to redis is relatively small, or each list item is either a small integer value or a string with a relatively short length, then in order to save space and make the storage more compact, Redis will use compression List, List, Zset, Hash bottom layer all use compressed list

A. Analysis of the components of the compressed list
Insert picture description here

Therefore, for the entire compressed list, the storage space is very compact, and there is no waste of space, and the number of nodes in the compressed list is obtained by attributes such as zltail, zllen, and the length complexity is controlled within o(1)

B. The structure of each entry
struct entry{

        prev_entry_length;
        
        encoding;
        
        contents;
}
  • prev_entry_length: Since the bottom layer is subsection storage, it is used to encode the length of the pre-node, which is used to traverse from back to front.
  • If the length of the front node is less than 254 bytes, then 1 byte is used to save the length value
  • If the length of the front node is greater than 254 bytes, 5 bytes are used to store the length value. Among them, the first byte is set to 0xFE (254) to indicate that the length is greater than 254 bytes, and the following four Bytes are used to store the length value of the preceding node
  • encoding: encoding attribute, here is not the five basic types, but the underlying implementation type. The nodes of the ziplist can store string values ​​and integer values. The encoding attributes of the two are listed below.
  • (1) Save the value of the string: essentially a byte array, if the node saves the string value, then the encoding size may be 1 byte, 2 bytes or 5 bytes, which is related to the length of the string related. The first two digits of the coding part are 00, 01 or 10, which correspond to the above three sizes respectively, and the following digits represent the length value.
  • (2) The node saves an integer value: If the node saves an integer value, its encoding length is fixed at 1 byte, and the first two digits of the byte are fixed at 11, which is used to indicate that the node saves an integer value. Here is also a table to illustrate.
  • contents: Responsible for saving the value of the node
C: Basic operations on compressed tables

Since Redis is implemented in pure C language, so the implementation of specific methods are based on C language, let's take a look at the insertion operation

static unsigned char *__ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) {
    size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), reqlen; // 当前长度和插入节点后需要的长度
    unsigned int prevlensize, prevlen = 0; // 前置节点长度和编码该长度值所需的长度
    size_t offset;
    int nextdiff = 0;
    unsigned char encoding = 0;
    long long value = 123456789; // 为了避免警告,初始化其值
    zlentry tail;

    // 找出待插入节点的前置节点长度
    // 如果p[0]不指向列表末端,说明列表非空,并且p指向其中一个节点
    if (p[0] != ZIP_END) {
        // 解码前置节点p的长度和编码该长度需要的字节
        ZIP_DECODE_PREVLEN(p, prevlensize, prevlen);
    } else {
        // 如果p指向列表末端,表示列表为空
        unsigned char *ptail = ZIPLIST_ENTRY_TAIL(zl);
        
        if (ptail[0] != ZIP_END) {
            // 计算尾节点的长度
            prevlen = zipRawEntryLength(ptail);
        }

    }

    // 判断是否能够编码为整数
    if (zipTryEncoding(s,slen,&value,&encoding)) {
        // 该节点已经编码为整数,通过encoding来获取编码长度
        reqlen = zipIntSize(encoding);
    } else {
        // 采用字符串来编码该节点
        reqlen = slen;
    }
    // 获取前置节点的编码长度
    reqlen += zipPrevEncodeLength(NULL,prevlen);
    // 获取当前节点的编码长度
    reqlen += zipEncodeLength(NULL,encoding,slen);

    // 只要不是插入到列表的末端,都需要判断当前p所指向的节点header是否能存放新节点的长度编码
    // nextdiff保存新旧编码之间的字节大小差,如果这个值大于0
    // 那就说明当前p指向的节点的header进行扩展
    nextdiff = (p[0] != ZIP_END) ? zipPrevLenByteDiff(p,reqlen) : 0;

    // 存储p相对于列表zl的偏移地址
    offset = p-zl;
    // 重新分配空间,curlen当前列表的长度
    // reqlen 新节点的全部长度
    // nextdiff 新节点的后继节点扩展header的长度
    zl = ziplistResize(zl,curlen+reqlen+nextdiff);
    // 重新获取p的值
    p = zl+offset;

    // 非表尾插入,需要重新计算表尾的偏移量
    if (p[0] != ZIP_END) {
        // 移动现有元素,为新元素的插入提供空间
        memmove(p+reqlen,p-nextdiff,curlen-offset-1+nextdiff);

        // p+reqlen为新节点前置节点移动后的位置,将新节点的长度编码至前置节点
        zipPrevEncodeLength(p+reqlen,reqlen);

        // 更新列表尾相对于表头的偏移量,将新节点的长度算上
        ZIPLIST_TAIL_OFFSET(zl) =
            intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+reqlen);

        // 如果新节点后面有多个节点,那么表尾的偏移量需要算上nextdiff的值
        zipEntry(p+reqlen, &tail);
        if (p[reqlen+tail.headersize+tail.len] != ZIP_END) {
            ZIPLIST_TAIL_OFFSET(zl) =
                intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
        }
    } else {
        // 表尾插入,直接计算偏移量
        ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(p-zl);
    }

    // 当nextdiff不为0时,表示需要新节点的后继节点对头部进行扩展
    if (nextdiff != 0) {
        offset = p-zl;
        // 需要对p所指向的机电header进行扩展更新
        // 有可能会引起连锁更新
        zl = __ziplistCascadeUpdate(zl,p+reqlen);
        p = zl+offset;
    }

    // 将新节点前置节点的长度写入新节点的header
    p += zipPrevEncodeLength(p,prevlen);
    // 将新节点的值长度写入新节点的header
    p += zipEncodeLength(p,encoding,slen);
    // 写入节点值
    if (ZIP_IS_STR(encoding)) {
        memcpy(p,s,slen);
    } else {
        zipSaveInteger(p,value,encoding);
    }
    // 更新列表节点计数
    ZIPLIST_INCR_LENGTH(zl,1);
    return zl;
}

In simple terms, it is almost the same as the ordinary insertion operation. It is basically to find the insertion position and modify the node information. It is nothing more than the realization of the details. For the study of the source code, we should not dwell on the details, we only need to have a macro control.

D: Chain update

What is a chain update? As mentioned above, each node has an prev_entry_lengthattribute to store the length of the previous node. The length is 1 subsection or 5 subsections to store the length of the previous node.

Now suppose such a situation

If there are multiple consecutive nodes, the length is solved in the nodes from 250 to 253 subsections, which means that these nodes prev_entry_lengthonly need one subsection for storage. If you want to insert a length greater than 254 in the head of these nodes at this time The node of the section, then prev_entry_lengthit needs to be expanded to 5 subsections, and after adding 5 subsections, it is greater than 254 subsections, so the latter node also needs to be expanded, prev_entry_lengthleading to a series of expansion operations

Similarly, if the node is deleted, there may also be continuous compression, which is cascaded update

But don't worryIn fact, it is not common that there are multiple nodes in the compression list that meet the chain update. Even if the number of nodes is small, there will be no essential impact on the performance, so there is no need to worry about the chain update affecting the compression. List performance

E: Summary
  • The compressed list is a sequential data structure used to save memory development
  • Compressed list as one of the underlying implementations of list keys and hash keys
  • The compressed list can contain multiple values, and each node can be used to store an array of subsections or integer values
  • Adding or deleting may cause chain updates, but don’t worry about the impact on performance

5. Dictionary structure

The dictionary structure is complex. It is the underlying structure of the entire key-value database of redis. It is also the underlying structure of set, Zset, and Hash. It mainly involves the rehash process and the implementation mechanism of the dictionary underlying structure Hash. Due to space reasons, you can jump to the blog. The blog specially written by the Lord: Portal , absolutely easy to understand

6. Object encoding conversion

As mentioned above, all structures in Redis are objects, so the five data structures can be divided into five major objects: string objects, list objects, hash objects, collection objects, ordered collection objects, and the corresponding structure of each object The text has also been explained in detail

The bottom layer of each object corresponds to at least two encodings. Redis decides which data structure to use at the bottom based on multiple aspects of space and performance, that isEncoding conversion, And then make a big summary of the encoding conversion of each object

A: String object
  • For the String type, the integer encoding type that can be saved with the long type is int
  • Floating point number that can be saved in long double type, encoding type is embstr or raw
  • String value, or integer that cannot be represented by long because the length is too long, or floating-point number that cannot be represented by long double, encoding type is embstr or raw
B: List object
  • The string length saved by the list object is less than 64 subsections, or the saved elements are less than 512, the list object uses zipList encoding
  • If the above two conditions cannot be met, use linkedList encoding
C: hash object

The conditions for using zipList are the same as above, if the above two conditions cannot be met, use hashtable encoding

D: Collection object
  • When all the elements saved by the collection object are integer values ​​and the number of saved elements does not exceed 512, use intset encoding
  • Otherwise use hashtable encoding
E: ordered set
  • The number of elements stored in the ordered set is less than 128, and the length of all elements is less than 64 subsections, and the bottom layer uses zipList
  • Otherwise use skiplist encoding

The above is all the content, the creation is not easy, I think it is good, your triptych is the biggest encouragement to bloggers!