Simple dynamic string "Redis 5 design and source code analysis"

1 Simple Dynamic Strings (SDS) is one of the basic data structures of Redis, used to store strings and integer data.

2 The structure of sds generally has these 4 variables:

  • len: indicates the number of bytes occupied in buf
  • alloc: indicates the number of bytes allocated in buf, that is, the total length of buf allocation
  • flags: Identifies the type of the current structure, and flags is used to indicate which of the 5 types of structure is a string
  • buf: flexible array, the data space where the string is actually stored

As shown above, redis provides 5 types of string structure (1 byte, 2 bytes, 4 bytes, 8 bytes and less than 1 byte). variable

I guess this is why it is called a simple dynamic string. It will use the corresponding string type to store according to the content. For example, use the following structure to store short strings with a length less than 32

struct __attribute__ ((__packed__)) sdshdr5 {    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */    char buf[];};

Note: 3 lsb of type, and 5 msb of string length indicate the storage type of the low 3 digits, and the storage length of the high 5 digits

3 The function of __attribute__ ((__packed__)) in the source code: Generally, the structure will be byte-aligned according to the least common multiple of the size of all its variables. After being modified with packed, the structure becomes aligned by 1 byte .

4 What sds returns to the upper layer is not the first address of the structure, but the buf pointer that points to the content . Because the alignment is 1 byte at this time, after the sds is created successfully, whether it is sdshdr8, sdshdr16 or sdshdr32, the buf pointer address can be obtained through (char*)sh + hdrlen (where hdrlen is the length of the structure, calculated by sizeof) .

So after modification, no matter sdshdr8, sdshdr16 or sdshdr32, flags can be found through buf[-1]

5 Create sds : Redis creates sds through the sdsnewlen function. sdsnewlen is an alias for the hi_sdsnewlen function. In the function, the appropriate type will be selected according to the length of the string.

hisds hi_sdsnewlen(const void *init, size_t initlen) {    void *sh;    hisds s;    char type = hi_sdsReqType(initlen);    /* Empty strings are usually created in order to append. Use type 8     * since type 5 is not good at this. */    if (type == HI_SDS_TYPE_5 && initlen == 0) type = HI_SDS_TYPE_8;    int hdrlen = hi_sdsHdrSize(type);    unsigned char *fp; /* 指向flags的指针 */     sh = hi_s_malloc(hdrlen+initlen+1); // +1是为了结束符'\0'    if (sh == NULL) return NULL;    if (!init)        memset(sh, 0, hdrlen+initlen+1);    s = (char*)sh+hdrlen;   // s是指向buf的指针    fp = ((unsigned char*)s)-1; //     switch(type) {        case HI_SDS_TYPE_5: {            *fp = type | (initlen << HI_SDS_TYPE_BITS);            break;        }        case HI_SDS_TYPE_8: {            HI_SDS_HDR_VAR(8,s);        ...     if (initlen && init)        memcpy(s, init, initlen);    s[initlen] = '\0';    return s;}

6 The sdshdr5 type is basically abandoned, because this may be frequently updated and cause expansion, so it is directly created as sdshdr8

7 Release sds : The sdsfree function, a function that directly releases the memory, can locate the head of the sds structure by offsetting s, and then call s_free to release the memory

/* Free an hisds string. No operation is performed if 's' is NULL. */void hi_sdsfree(hisds s) {    if (s == NULL) return;    hi_s_free((char*)s-hi_sdsHdrSize(s[-1]));}

In order to optimize performance, sds provides a method to achieve the purpose of clearing by resetting the statistics instead of directly releasing the memory-sdsclear

void hi_sdsclear(hisds s) {    hi_sdssetlen(s, 0);    s[0] = '\0';}

8 Splicing strings: sdscatsds function, the real work inside is the hi_sdscatlen function

hisds hi_sdscatsds(hisds s, const hisds t) {    return hi_sdscatlen(s, t, hi_sdslen(t));} 
hisds hi_sdscatlen(hisds s, const void *t, size_t len) {    size_t curlen = hi_sdslen(s);     s = hi_sdsMakeRoomFor(s,len);    if (s == NULL) return NULL;    memcpy(s+curlen, t, len);    hi_sdssetlen(s, curlen+len);    s[curlen+len] = '\0';    return s;}

9 The expansion function hi_sdsMakeRoomFor has the following strategies, which can be seen from the code

hisds hi_sdsMakeRoomFor(hisds s, size_t addlen) {    void *sh, *newsh;    size_t avail = hi_sdsavail(s);    size_t len, newlen;    char type, oldtype = s[-1] & HI_SDS_TYPE_MASK;    int hdrlen;      // 1)若sds中剩余空闲长度avail大于新增内容的长度addlen,直接返回,无需扩容    if (avail >= addlen) return s;     len = hi_sdslen(s);    sh = (char*)s-hi_sdsHdrSize(oldtype);    newlen = (len+addlen);     // 2)若新增后总长度len+addlen < 1MB,按新长度的2倍扩容;    //    如果len+addlen > 1MB,新长度再加上1MB扩容    if (newlen < HI_SDS_MAX_PREALLOC)        newlen *= 2;    else        newlen += HI_SDS_MAX_PREALLOC;     // 3)根据新长度重新选取存储类型,并分配空间    type = hi_sdsReqType(newlen);     /* Don't use type 5: the user is appending to the string and type 5 is     * not able to remember empty space, so hi_sdsMakeRoomFor() must be called     * at every appending operation. */    if (type == HI_SDS_TYPE_5) type = HI_SDS_TYPE_8;     hdrlen = hi_sdsHdrSize(type);    if (oldtype==type) {        newsh = hi_s_realloc(sh, hdrlen+newlen+1);        if (newsh == NULL) return NULL;        s = (char*)newsh+hdrlen;    } else {        /* Since the header size changes, need to move the string forward,         * and can't use realloc */        newsh = hi_s_malloc(hdrlen+newlen+1);        if (newsh == NULL) return NULL;        memcpy((char*)newsh+hdrlen, s, len+1);        hi_s_free(sh);        s = (char*)newsh+hdrlen;        s[-1] = type;        hi_sdssetlen(s, len);    }    hi_sdssetalloc(s, newlen);    return s;}

10 Other API

11 sds has two points to note

  • The sds exposed to the upper layer is a pointer to the flexible array buf
  • The complexity of read operations is mostly O(1), and member variables are read directly; write operations may trigger capacity expansion