Hardcore resources! Redis five data structures and three advanced data structure analysis (detailed)

The last article shared "In-depth understanding of JVM", this article will share "Redis Five Data Structures and Three Advanced Data Structure Analysis".

Preface

The most important and most fundamental basis of Redis is its rich data structure. The big reason why Redis can stand out is that its rich data structure can support a variety of scenarios. And Redis data structure implementation and application scenarios are quite common in interviews. Next, let's talk about Redis data structure.

Redis data structures are: string, list, hash, set, sorted set,  these five are known to everyone, but Redis also has more advanced data structures, such as: HyperLogLog, Geo, BloomFilter,  these data structures, let’s talk about Talk about these data structures of Redis.

String

Basic concepts : String  is the simplest and most commonly used data structure in Redis, and it is also the only data structure in Memcached. In the usual development, String  can be said to be the most frequently used.
Low-level implementation :

  • If a string object stores an integer value, and this integer value can be represented by the long type, then the string object will store the integer value in the ptr attribute of the string object structure (convert void* to long), and Set the encoding of the string object to int.
  • If the string object saves a string value, and the length of the string value is greater than 39 bytes, then the string object will use a simple dynamic string (SDS) to save the string value, and the encoding of the object Set to raw.
  • If the string object saves a string value, and the length of the string value is less than or equal to 39 bytes, then the string object will use embstr encoding to save the string value.

Use :

> redis_cli            # 启动redis-cli 客户端> set hello world      # 将键 hello 的值设置为 world   OK                     # set 命令成功后 会返回 OK> get hello            # 通过 get 命令获取 键为 hello 的值"world"                # 获得到的值> del hello            # 删除键为 hello 的值(integer) 1            # 返回的是删除的数量> mset a 10 b 20 c 30  # 批量的设置值OK> mget a b c           # 批量的返回值1)"10"2)"20"3)"30"> exists hello         # 是否存在该键(integer) 1            # 1 表示存在,0 表示不存在> expire hello 10      # 给 hello 设置过期时间,单位,秒(integer) 1            # 返回1代表成功,0代表key不存在或无法设置过期时间> pexpire hello 10     # 给 hello 设置过期时间,单位,毫秒(integer) 1            # 返回1代表成功,0代表key不存在或无法设置过期时间

Next, I will focus on the set key value [EX seconds] [PX milliseconds] [NX|XX] series of commands, which is still very important and easy to confuse.
Each time reids overwrites the previous value, the TLL value will be cleared. (TTL is the expiration time)

  • EX second: Set the expiration time of the key to second seconds. SET key value EX second has the same effect as SETEX key second value.
  • PX millisecond: Set the expiration time of the key to millisecond milliseconds. SET key value PX millisecond has the same effect as PSETEX key millisecond value.
  • NX: Only when the key does not exist, can the key be set. SET key value NX has the same effect as SETNX key value.
  • XX: Only when the key already exists, can the key be set.
# 使用 EX 选项> set key1 hello EX 1000   # 设置 过期时间 1000sOK> ttl hello                # 获取 hello 的过期时间(integer) 1000# 使用 PX 选项> set key1 hello PX 1000   # 设置 过期时间 1000msOK> ttl hello                # 获取 hello 的过期时间(integer) 1000# 使用 NX 选项> set hello world NXOK                         # 键不存在,设置成功> get hello"value"> set hello world NX(nil)                      # 键已经存在,设置失败> get hello"world"                    # 维持原值不变# 使用 XX 选项> exists hello             # 先确定 hello 不存在(integer) 0> set hello world XX(nil)                      # 因为键不存在,设置失败> set hello wolrd          # 先给 hello 设置一个值OK> set hello newWolrd XXOK                         # 这回设置成功了> get hello"newWorld"# NX 或 XX 可以和 EX 或者 PX 组合使用> set hello world EX 1000 NXOK> get hello"world"> ttl hello(integer)1000> set hello wolrd PX 30000 NXOK> pttl hello(integer)30000          # 实际操作中 这个值肯定小于 30000,这次是为了效果才这么写的# EX 和 PX 可以同时出现,但后面给出的选项会覆盖前面给出的选项> set hello wolrd EX 1000 PX 30000OK> ttl hello(integer)30            # 这个是 PX 设置的参数,> pttl hello(integer)30000> set number 1OK> incr number          # 对 number 做自增操作(integer) 2

In the development process, using redis to implement locks is a very common operation. Combined with NX and EX to achieve.

> set hello world NX EX 10     # 成功加锁,过期时间是 10sOK> set hello wolrd NX EX 10     # 在10s内执行这个命令返回错误,因为上一次的锁还没有释放(nil)> del hello                    # 释放了锁OK> set hello world NX EX 10     # 成功加锁,过期时间是 10sOK> setnx hello world            # 也可以这么写> setex hello 10 wolrd

The lock can be released by setting the expiration time and manually del delete.
The string commands are more commonly used, so I will introduce more points, and I will focus on the following commands.

Application scenarios :

  • Cache function: The most commonly used string is the cache function, which caches some data that is not frequently updated but frequently queried to reduce the pressure on the DB.
  • Counter: It can be used to count and operate through incr, such as counting website visits, article visits, etc.
    -

List

Basic concepts : List is an ordered and repeatable list, which is quite similar to Java's List. The query speed is fast and can be queried by index; insert and delete are slow.

Low-level implementation :

  • The encoding of the list object can be ziplist or linkedlist.
  • The length of all string elements stored in the list object is less than 64 bytes and the number of stored elements is less than 512, using ziplist encoding; otherwise, using linkedlist;

Use :

> lpush mylist a     # 从左边插入数据(ineteger)1> lpush mylist b(integer)1> rpush mylist c     # 从右边插入数据(integer)1> lrange mylist 0 -1 # 检索数据,lrange 需要两个索引,左闭右闭;0 就是从第 0 个,-1 是倒数第一个,-2 倒数第二个...以此类推1)"b"2)"a"3)"c"> lrange mylist 0 -2 # 0 到 倒数第 2 个 1)"b"2)"a"> lpush mylist a b c # 批量插入(integer)3> lpop mylist        # 从左侧弹出元素"b"> rpop mylist        # 从右侧弹出元素"c"> rpop mylist        # 当列表中没有元素时返回 null(nil)> brpoop mylist 5    # 从右侧弹出元素,如果列表没有元素,会阻塞住,如果 5 s后还是没有元素则返回1)"mylist"   # 列表名        2)"b"        # 弹出元素> del mylist         # 删除列表 (integer)1

Usage scenarios :

  • Message queue: Redis's list is an ordered list structure, which can implement blocking queues, using left-in and right-out methods. Lpush is used to produce data inserted from the left side, and Brpop is used to consume data that is blocked from the right side  .
  • Data paging display: The lrange command needs two indexes to obtain data. This can be used to implement paging. You can calculate two index values ​​in the code, and then fetch the data from redis.
  • It can be used to implement functions such as a fan list and the latest news ranking.
    -

Hash

Introduction : Redis hash can store the mapping between multiple key-value pairs. Like character strings, the value stored in the hash can be either a character string or a numeric value, and the user can also perform an increment or decrement operation on the numeric value stored in the hash. This is very similar to Java's HashMap. Each HashMap has its own name and can store multiple k/v pairs at the same time.
Low-level implementation :

  • The encoding of the hash object can be ziplist or hashtable.
  • The key and value string length of all key-value pairs saved by the hash object is less than 64 bytes and the number of saved key-value pairs is less than 512, use ziplist encoding; otherwise, use hashtable;

Use :

> hset student name 张三    # 可以理解为忘名叫student的map中添加 kv 键值对(integer)1            # 返回1 代表 不存在这个key,并且添加成功> hset student sex 男(integer)1> hset student name 张三(integer)0            # 返回0 因为这个key已经存在> hgetall student1)"name"2)"张三"3)"sex"4)"男"> hdel student name       #删除这key(integer)1           # 返回 1 同样代表整个 key 存在 并且删除成功> hdel student name(integer)0           # 返回 0 是因为 该 key 已经不存在

Application scenarios :

  • Hash is more suitable for storing structured data, such as objects in Java; in fact, objects in Java can also be stored as strings. You only need to serialize the object into a json string, but if a certain attribute of the object is updated frequently , Then you need to re-serialize and store the entire object each time, which consumes a lot of overhead. But if you use hash to store each attribute of the object, you only need to update the attribute to be updated each time.
  • Shopping cart scenario: You can use the user's id as the key, the product id as the stored field, and the number of the product as the value of the key-value pair, which constitutes the three elements of the shopping cart.
    -

Set

Basic concept : Redis set and list can store multiple strings. The difference between them is that list is ordered and repeatable, while set is disordered and non-repeatable.
Low-level implementation :

  • The encoding of the collection object can be intset or hashtable.
  • All elements stored in the collection object are integer values ​​and the number of stored elements does not exceed 512, using intset encoding; otherwise, hashtable is used;

Use :

> sadd family mother          # 尝试将 mother 添加进 family 集合中(integer)1       # 返回 1 表示添加成功,0 表示元素已经存在集合中> sadd family father(integer)1> sadd family father(intger)0> smembers family             # 获取集合中所有的元素1)"mother"2)"father"> sismember family father     # 判断 father 是否在 family 集合中 (integer)1      # 1 存在;0 不存在> sismber family son(integer)0> srem family son             # 移除 family 集合中元素 son(integer)1     # 1 表示存在并且移除成功;0 表示存在该元素> srem family som(integer)0> sadd family1 mother(integer)1> smembers family 1)"mother"2)"father"> smember family11)"mother"> sinter family family1     # 获取 family 和 family1 的交集1)"mother"> sadd family1 son(integer)1> sunion family family1     # 获取 family 和 family1 的并集1)"mother"2)"father"> sdiff family family1      # 获取 family 和 family1 的差集(就是family有但是family1没有的元素)1)"father"

Application scenarios :

  • Tags: The tags of each person on the blog site can be stored in a set collection, and then users can be grouped according to each tag.
  • Store friends/fans: set has a de-duplication function; you can also use the set union function to get common friends and other functions.
    -

Sorted Set

Basic concept : The ordered set is the same as the hash, which is used to store key-value pairs: each key of the ordered set is called a member, which is unique, and each value of the ordered set is called a score. The value (score) must be a floating point number. It can be sorted by score. Ordered set is the only structure in Redis that can access elements based on members (this is the same as hashing), but also based on scores and the order in which they are arranged. Like other structures of Redis, users can perform operations such as adding, removing, and getting to an ordered collection.
Low-level implementation :

  • The encoding of ordered sets can be ziplist or skiplist
  • The number of elements saved in an ordered set is less than 128 and the length of all element members saved is less than 64 bytes. Use ziplist encoding; otherwise use skiplist;

Use :

> zadd class 100 member1 # 将member1元素及其score值100加入到 有序集合 class中(integer)1> zadd class 90 member2 80 member3 # 批量添加(integer)2> zrange class 0 -1 withscores  # 获取有序集合中的值与score,并按 score 排序1)"member3" 2)"80"3)"member2"4)"90"5)"member1"6)"100"> zrem class member1   # 删除 class 中 的member1(integer)1

Application scenarios :

  • Leaderboard: An orderly collection of the most commonly used scenes. For example, a news website sorts hot news, for example, according to the amount of clicks, the amount of likes, and so on.
  • Weighted message queue: The score of important messages is larger, and the score of ordinary messages is smaller, so that tasks with higher priority can be executed first. Basic concepts of HyperLogLog :
    Redis added the HyperLogLog structure in version 2.8.9.

Redis HyperLogLog is an algorithm for base statistics. The advantage of HyperLogLog is that when the number or volume of input elements is very very large, the space required to calculate the base is always fixed and small.

In Redis, each HyperLogLog key only needs 12 KB of memory to calculate the base of close to 2^64 different elements. This is in sharp contrast to a collection where more elements consume more memory when calculating cardinality.

However, because HyperLogLog only calculates the cardinality based on the input element, and does not store the input element itself, HyperLogLog cannot return the input elements like a collection.
Usage :
Here is an example of how many users logged in on a statistical website on May 23, 2021

> pfadd user_login_20210523 tom #  user_login_20210523是key;tom 是登录的用户(integer)1> pfadd user_login_20210523 tom jack lilei 的用户(integer)1> pfcount user_login_20210523  # 获取 key 对应值的数量,同一个用户多次登录只统计一次(integer) 3  > pfadd user_login_20210522 sira (integer)1> pfcount user_login_20210523 user_login_20210522 # 统计22号和23号一共有多少登陆的用户(integer)4>pfmerge user_login_20210522_23 user_login_20210522 user_login_20210523 # 将连个键内容合并"OK"> pfcount user_login_20210522_23(integer)4

Application scenarios :

  • It can be used to count the number of logins on the website and other indicators

GEO

Basic concept :
In Redis 3.2, a new data structure called geo is added, which is mainly used to store geographic location information and operate on the stored information.
Use :
geoadd is used to store the specified geographic location, and one or more longitude (longitude), latitude (latitude), location name (member) can be added to the specified key.

> GEOADD beijing 116.405285 39.912835 "蘑菇睡不着"(integer)2

geopos is used to return all the positions (longitude and latitude) of the specified name (member) from the given key, and return nil if it does not exist.

> GEOPOS beijing "蘑菇睡不着" "故宫"1) 1)116.405285   2)39.9128352)(nil)   

geodist is used to return the distance between two given locations.
Unit parameters:
m: meter, the default unit.
km: kilometer.
mi: Miles.
ft: Feet.

> GEOADD beijing 116.403681 39.921156 "故宫"(integer)1> GEODIST beijing "蘑菇睡不着" "故宫" km"0.936"

Application scenarios : scenarios
used to store geographic information and operate on geographic information.
** A small geographical knowledge of popular science:
longitude range: -180-180. Counting from the 0° longitude, the east and west are divided into 180°, 180° to the east belongs to the east longitude, and “E” is customarily used as the code, and 180° to the west belongs to the west longitude, and “W” is customarily used as the longitude. Codename. The position of 0° is: The meridian of the meridian center of the Greenwich Observatory in Greenwich, England.
Latitude range: -90-90. The latitude of a point located north of the equator is called north latitude, which is N; the latitude of a point located south of the equator is called southern latitude, which is S. For the convenience of studying the problem, people divide the latitudes into low, middle and high latitudes. 0°~30° is the low latitude, 30°~60° is the middle latitude, and 60~90° is the high latitude. **

BloomFilter

Basic concept :
A data structure is composed of a string of very long binary vectors, which can be regarded as a binary
array. Since it is binary, it stores either 0 or 1, but the initial default value is all 0. Its main function is to determine whether an element is in a certain set . For example, I want to judge whether there is a certain number among the 2 billion numbers. If you insert the DB directly, the amount of data will be too large and the time will be very slow; if you put 2 billion data in the cache, the cache will not fit. At this time, the Bloom filter is most suitable. The principle of the Bloom filter is:

  • Adding elements
    When we want to add an element key to the Bloom filter, we use multiple hash functions to calculate a value, and then set the square where this value is located to 1.
  • Judging whether the element exists: To
    determine whether the element exists, the element is calculated through multiple hash functions to calculate multiple subscript values, and then determine whether the element values ​​corresponding to these subscripts are all 1, and if it exists, it is not 1, then The elements are definitely not in the set; if they are all 1, then the elements are in the set with a high probability, and it is not 100% sure that the elements are in the set, because the results of multiple different data calculated by the hash function will be repeated, so there will be A certain position is set to 1 by other data through the hash function.
    In general: Bloom filter can determine that a certain data must not exist, but it cannot determine that it must exist.
  • The advantages and disadvantages of Bloom filters:
  • Advantages: The advantages are obvious. The binary array takes up very little memory, and the insertion and query speeds are fast enough.
  • Disadvantages: With the increase of data, the rate of misjudgment will increase; there is also the inability to determine that the data must exist; there is also an important shortcoming, the data cannot be deleted.

Use : After
redis 4.0, you can use the Bloom filter plug-in RedisBloom , the command is as follows:

bf.add 添加元素到布隆过滤器bf.exists 判断元素是否在布隆过滤器bf.madd 添加多个元素到布隆过滤器,bf.add只能添加一个bf.mexists 判断多个元素是否在布隆过滤器 > bf.add boomFilter tc01(integer) 1            # 1:存在;0:不存在> bf.add boomFilter tc02(integer) 1> bf.add boomFilter tc03(integer) 1> bf.exists boomFilter tc01(integer) 1> bf.exists boomFilter tc02(integer) 1> bf.exists boomFilter tc03(integer) 1> bf.exists boomFilter tc04(integer) 0> bf.madd boomFilter tc05 tc06 tc071) (integer) 12) (integer) 13) (integer) 1> bf.mexists boomFilter tc05 tc06 tc07 tc081) (integer) 12) (integer) 13) (integer) 14) (integer) 0
  1. Redisson uses Bloom filters:
public static void main(String[] args) {         Config config = new Config();        config.useSingleServer().setAddress("redis://192.168.15.105:6379");        config.useSingleServer().setPassword("password123");                //构造Redisson        RedissonClient redisson = Redisson.create(config);         RBloomFilter<String> bloomFilter = redisson.getBloomFilter("userPhones");                //初始化布隆过滤器:预计元素为500000000L,误差率为3%        bloomFilter.tryInit(500000000L,0.03);                //将号码10086插入到布隆过滤器中        bloomFilter.add("18846014678");          //判断下面号码是否在布隆过滤器中        System.out.println(bloomFilter.contains("18846014678")); //true        System.out.println(bloomFilter.contains("1111111222")); //false }
  1. Guava uses Bloom filters:
    Guava is a Java toolkit provided by Google, which is very powerful
public static void main(String[] args) {        BloomFilter<String> bloomFilter = BloomFilter.create(Funnels.stringFunnel(Charsets.UTF_8), 500000, 0.01);         bloomFilter.put("18846047789");         System.out.println(bloomFilter.mightContain("18846047789")); // true        System.out.println(bloomFilter.mightContain("1122222"));     //false    }}

Application scenarios :

  • Solve the cache penetration problem: The general query scenario is to query the cache first. If the cache is not available, then go to the DB query. If it is found, it will be stored in the cache first, and then returned to the caller. If it can't be found, it will return to empty. In this case, if someone frequently requests data that is not available in the cache, such as id = -1, it will cause great pressure on the DB. In this case, you can use the redis bloom filter, you can first check the possibility The id is stored in the bloom filter, when the query comes, go to the bloom filter first, if it is not found, return directly, do not request the cache and DB, if it exists in the bloom filter, then go to the cache Get data.
  • Blacklist verification: You can put the ip in the blacklist into the Bloom filter, so you don't need to check it in the db every time you come.

to sum up

The rich data structure of Redis is one of the important cornerstones supporting Redis. He enables Redis to adapt to many complex scenarios. This piece of content can be said to be a must in the interview, so more work must be done in this area.

  • The above is the sharing of "Redis Five Data Structures and Three Advanced Data Structure Analysis".
  • Everyone is also welcome to exchange and discuss. If there are any errors in this article, I hope you can forgive me.
  • It is not easy to create, your support is my biggest motivation, if you are helpful to everyone, please give me a thumbs up~~~