Redis supported data structure and underlying implementation


The maximum supported size of Redis Key is 512MB

Redis Key should not be too large. First, it will take up more memory. Secondly, it will be time-consuming to perform multiple Key comparisons in the process of finding elements.

Redis Key should not be too small, which will lead to poor readability.

Redis Value can support different data types, as shown in the figure below

Insert picture description here

1. String

Use simple dynamic character (SDS) string to save String type value. SDS has a special structure to record the used space length, so the value length can be obtained in O(1) time.

Pre-allocated space is reserved for future value growth

Insert picture description here

Insert picture description here

2. List

The order of the saved elements is based on the insertion order

When the amount of data is relatively small or a short string is stored, Redis will use ZipList to save the elements, where

zlbytes (the number of bytes occupied by the compressed linked list)
zltail (the offset of the last element from the starting position, which is convenient to find the last element)
zllength (number of elements)
entries (list of elements)
zlend (end flag)

ZipList can easily find the first element and the last element, the element list entries are stored in an array

Insert picture description here

 Implementation of entry

When the amount of data is large, the LinkList double-ended queue will be used to save the elements, including the head and tail pointers, and the time complexity of insertion and deletion is O(1)

LinkList is easy to operate on both ends, but the memory overhead is large, because additional pointers before and after the nodes need to be saved. Secondly, the memory addresses between nodes are not continuous, which is prone to memory fragmentation.

ZipList is a whole piece of contiguous memory, but each modification operation will carry out the realloc of the memory, and the performance is poor.

So after Redis 3.2, the List implementation was changed to QuickList , which is a combination of LinkList and ZipList

Insert picture description here

3. Hash

The bottom layer is ZipList and HashTable. When the length of the key and value strings of all key-value pairs saved by the hash object is less than 64 bytes and the number of key-value pairs saved by the hash object is less than 512, use ZipList, otherwise use HashTable

You can set the values ​​of hash_max_ziplist_entries and hash_max_ziplist_value to control the storage conversion between the two.


Insert picture description here

3.1 Redis Rehash

When the hash table needs to be expanded or contracted, the rehash operation is completed by putting the data of ht[0] in the position of ht[1]

Load factor = number of nodes saved in the hash table / hash table size
load_factor = ht[0].used / ht[0].size

When the server does not execute the BGSAVE or BGREWRITEAOF command, if the load factor >=1, the hash table will perform the expansion operation

When the server executes the BGSAVE or BGREWRITEAOF command, if the load factor >= 5, the hash table will perform the expansion operation

The rehash process is not a one-time, but the calculation work required by rehash is allocated to each operation of adding, deleting, modifying, and checking the dictionary.

The dictionary maintains the rehashidx field, starting from 0, rehash the key-value pairs above rehashidx to ht[1] each time, then rehashidx++, and finally all key-value pairs in ht[0] are moved to ht[1], then rehashidx = -1, Indicates that the rehash is complete.

In the progressive rehash process, Redis will use ht[0] and ht[1] at the same time. The delete, find, and update operations will run on the two hash tables. During the rehash process, the newly added key-value pairs are saved to ht[1] to ensure that when rehash is executed, ht[0] is an empty table, and then ht[1] = ht[0].

4. Set

The saved element is unique and unordered. When the saved element is an integer value and the number of elements does not exceed 512, use intset to save the element, otherwise use hashtable to save the element.

  • Intset

A data structure used to store integer values, which can store integer values ​​of int16, int32, and int64 types. The data items are arranged in order, and there will be no duplicate elements.

typedef struct intset{
     uint32_t encoding;
     uint32_t length;
     int8_t contents[];

  • HashTable, at this time key is the value of set, value = null

4.1 intSet upgrade and downgrade

5. ZSet

Unlike the Set type, the element is also associated with a float type score, and the elements are sorted by the score. Another difference from Set is that it supports removing elements within a range (top 10 or bottom 10)

 * 有序集合
typedef struct zset {

    // 字典,键为成员,值为分值
    // 用于支持 O(1) 复杂度的按成员取分值操作
    dict *dict;

    // 跳跃表,按分值排序成员
    // 用于支持平均复杂度为 O(log N) 的按分值定位成员操作
    // 以及范围操作
    zskiplist *zsl;

} zset;

When the amount of data is small, use ZipList to achieve, otherwise use SkipList + Dict to save elements

SkipList is an ordered data structure. Each node maintains multiple pointers to other nodes. Data can be found in less than O(N) time complexity

Insert picture description here

5.1 Why use skip tables instead of AVL trees

  1. The range lookup is more difficult than the jump table
  2. Inserting and deleting nodes triggers tree adjustments, and jumping the table only needs to change the pointer
Insert picture description here

6. Bit array (bitmaps)

It is not an actual data type, but a bit operation based on the String type. The maximum supported length of String is 512MB, so the supported bit operations can reach 2^32.

The advantage of using a bitmap is that it can use a very small space to convey more information

7. HyperLogLogs

A data structure for calculating probability, used to estimate the cardinality of the set,

8. Stream


Meituan's exploration and practice of Redis Rehash mechanism

Detailed explanation of Redis internal data structure (6)-skiplist