MySQL caching scheme

Read-write separation

Why read-write separation is needed

MySQL works mainly on disk. Although it has some internal buffers, reads that go to disk are slow. In most projects, reads far outnumber writes, so separating reads from writes (writes go to the master, reads are spread across the slaves) improves overall read and write performance.
However, read-write separation brings its own problems: data read from a slave may be inconsistent with what was just written to the master, and deployment and maintenance become harder.

MySQL master-slave replication

binlog

The binary log (binlog) is written when a transaction commits. It is maintained by the MySQL server layer and is independent of the storage engine. The redo log, by contrast, belongs to InnoDB and is also flushed to disk at commit time. In row-based format, the binlog records the modified row data.

  1. The master writes data-changing events (insert, update, delete) to its binlog when each transaction commits;
  2. The slave requests the binlog from the master, and the slave's I/O thread writes the received events to its local relay log;
  3. The slave's SQL thread reads the relay log and replays the events on the slave.

After a transaction is committed and recorded in the binlog, the slave starts an I/O thread that copies the master's binlog and writes it to a relay log on the slave. A second thread, the SQL thread, then reads the relay log and replays it, that is, it applies again the changes that were already executed on the master. This is how the master and the slave stay consistent.
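
The three steps can be modeled as a toy, single-process simulation. This is only a sketch for intuition. Real MySQL uses a binlog dump thread on the master plus separate I/O and SQL threads on the slave. Here each "thread" is a plain method, and the event tuples `(op, key, value)` are a made-up format.

```python
from dataclasses import dataclass, field

# Toy model of master-slave replication; event format is hypothetical.

@dataclass
class Master:
    data: dict = field(default_factory=dict)
    binlog: list = field(default_factory=list)  # committed events, in order

    def commit(self, op, key, value=None):
        # Apply the change, then record it in the binlog at commit time.
        if op == "delete":
            self.data.pop(key, None)
        else:  # "insert" or "update"
            self.data[key] = value
        self.binlog.append((op, key, value))

@dataclass
class Slave:
    data: dict = field(default_factory=dict)
    relay_log: list = field(default_factory=list)
    io_pos: int = 0   # how much of the master's binlog has been copied
    sql_pos: int = 0  # how much of the relay log has been replayed

    def io_thread(self, master):
        # Copy new binlog events from the master into the local relay log.
        new_events = master.binlog[self.io_pos:]
        self.relay_log.extend(new_events)
        self.io_pos += len(new_events)

    def sql_thread(self):
        # Replay relay-log events that have not been applied yet.
        for op, key, value in self.relay_log[self.sql_pos:]:
            if op == "delete":
                self.data.pop(key, None)
            else:
                self.data[key] = value
        self.sql_pos = len(self.relay_log)
```

Until `io_thread` and `sql_thread` run, the slave lags behind the master. This gap is the replication delay that the consistency discussion below is about.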

The more slave databases there are, the slower synchronization becomes (the master has to send its binlog to every slave).

Eventual consistency vs. strong consistency

Eventual consistency

Write to the master, read from the slaves

Strong consistency

Write to the master. Reads with strict consistency requirements go to the master, while reads with relaxed consistency requirements can go to the slaves.
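
A minimal routing sketch for this rule, assuming connections are represented by plain placeholder strings. `ReadWriteRouter`, `for_write`, and `for_read` are hypothetical names, not a real driver API. Writes always go to the master. Reads flagged `strong=True` also go to the master, and other reads rotate round-robin over the slaves.

```python
import itertools

class ReadWriteRouter:
    """Hypothetical router: picks a master or slave connection per query."""

    def __init__(self, master, slaves):
        if not slaves:
            raise ValueError("at least one slave is required")
        self.master = master
        self._slaves = itertools.cycle(slaves)  # round-robin over slaves

    def for_write(self):
        # All writes go to the master.
        return self.master

    def for_read(self, strong=False):
        # Strongly consistent reads hit the master; others may hit a slave
        # that is still catching up on replication.
        return self.master if strong else next(self._slaves)
```

Round-robin is only one possible slave-selection policy. It is used here for simplicity.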

Caching scheme

Prerequisite: reads far outnumber writes, and a single master node can handle the project's data volume.

Keep a copy of frequently read (hot) data in a cache database, so that reads of hot data are served from the cache instead of the database.

Application scenario analysis

  1. Memory access is roughly one hundred thousand times faster than disk access;
  2. In large business scenarios, reads often outnumber writes by more than ten to one.

So read performance is what needs to be optimized.

MySQL has its own internal cache (such as the InnoDB buffer pool), but that cache mainly serves MySQL's own needs: bridging the speed gap between memory access and disk access. That is different from our business needs.

For example, for an on-the-hour flash-sale (spike) event, some user data can be cached in advance, so that a flood of users logging in does not degrade MySQL read performance.

A relational database is usually the primary database. It is convenient for analysis and holds the full data set, because memory is much smaller than disk.
A cache database (an in-memory database) stores the hot data.

Synchronization problem analysis

Introducing a cache database requires a synchronization strategy that keeps the cache and MySQL consistent.

The cache may go down or behave abnormally, but the system must keep running.

Possible states

MySQL has the data, cache does not

This is acceptable, but a strategy is needed to synchronize the data from MySQL into the cache, so that the next read can be served directly from the cache.

MySQL does not have the data, cache does

Not acceptable. MySQL is the source of truth for the primary data. If the data is missing from MySQL but a reader checks the cache first and finds it there, the system as a whole becomes inconsistent. A strategy is needed to prevent this state.

MySQL and the cache both have the data, but the values differ

Also unacceptable, for the same reason; it must likewise be prevented by a strategy.

Both MySQL and the cache have the data, and it is consistent

Acceptable

Neither MySQL nor the cache has the data

Acceptable

Target scenarios

Single data center

Some game backends, which have a separate DBServer

Multiple data centers

In most web projects, the server does not cache data itself but operates on the database directly; if each server cached data locally, updates could be lost.

Eventual consistency solution

With read-write separation, it takes time for the master to synchronize data to the slaves, so during synchronization the master and the slaves hold different data.

Read path: read the cache first and return directly on a hit; on a miss, read from MySQL and then write the result to Redis.
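
This read path is the usual cache-aside pattern. The sketch below uses a plain `dict` in place of Redis and a caller-supplied `load_from_mysql` function in place of a real query. Both are stand-ins, not a real client API.

```python
from typing import Callable, Dict, Optional

def read(key: str,
         cache: Dict[str, str],
         load_from_mysql: Callable[[str], Optional[str]]) -> Optional[str]:
    """Cache-aside read: try the cache, fall back to MySQL, then fill the cache."""
    value = cache.get(key)
    if value is not None:          # cache hit: return directly
        return value
    value = load_from_mysql(key)   # cache miss: query MySQL
    if value is not None:
        cache[key] = value         # write back so the next read hits the cache
    return value
```

A key missing from both stores is not cached here, so every such read reaches MySQL. That is exactly the cache penetration problem discussed later.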

There are two options for the write path:

  1. Write to MySQL directly and wait for MySQL to synchronize the data to Redis;
  2. Write to Redis first with a short expiration on the key (200 ms, an empirical value), then write to MySQL. When MySQL's change is synchronized back to Redis, it overwrites the key and sets a longer expiration. In the meantime, anyone else who needs this data can read it straight from Redis instead of going to MySQL, which is faster. The 200 ms default estimates how long it takes from writing MySQL until the change reaches Redis, and should be tuned to the actual environment. This option has two drawbacks. First, MySQL may be read more than once: if synchronization has not finished within 200 ms, the key has already expired, so readers miss Redis and go to MySQL again. Second, updates may be lost: when multiple servers work on the same data, one server may finish writing to Redis while another server modifies the same data during synchronization, leaving the in-memory data inconsistent.
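
Option 2 can be sketched with an in-memory stand-in for Redis keys that carry a millisecond TTL (similar in spirit to `SET key value PX ms`). `TTLCache`, `write`, and `on_sync_from_mysql` are hypothetical names. The one-hour long TTL is an assumed value, and the injectable clock exists only so the expiry behavior can be tested.

```python
import time

class TTLCache:
    """In-memory stand-in for Redis keys with a millisecond expiration."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}

    def set(self, key, value, ttl_ms):
        self._data[key] = (value, self._clock() + ttl_ms / 1000.0)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:  # expired: behave like a missing key
            del self._data[key]
            return None
        return value

SHORT_TTL_MS = 200        # empirical guess at the MySQL -> Redis sync delay
LONG_TTL_MS = 3_600_000   # assumed longer TTL once the sync has landed

def write(key, value, cache, mysql):
    cache.set(key, value, SHORT_TTL_MS)  # readers see the new value at once
    mysql[key] = value                   # durable write goes to MySQL

def on_sync_from_mysql(key, value, cache):
    # Called by the synchronization process when MySQL's change reaches Redis:
    # overwrite the key and extend its lifetime.
    cache.set(key, value, LONG_TTL_MS)
```

If the sync takes longer than `SHORT_TTL_MS`, `get` returns `None` in the gap. Readers then fall back to MySQL, which is the "multiple accesses to MySQL" drawback described above.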

Strong consistency solution

Read path: the same as for eventual consistency. Read the cache first and return directly on a hit; on a miss, read from MySQL and then write the result to Redis.

Write path: delete the cache entry first, then write to MySQL, and wait for MySQL to synchronize the new value to the cache. Otherwise, while synchronization has not yet succeeded, the cache would keep serving the old, inconsistent value.
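
The write path can be sketched with `dict` stand-ins for Redis and MySQL. `write_strong` and `sync_to_cache` are hypothetical names, and `sync_to_cache` plays the role of the synchronization process described in the next section. The sketch is single-threaded and does not model concurrent readers.

```python
from typing import Dict

def write_strong(key: str, value: str,
                 cache: Dict[str, str], mysql: Dict[str, str]) -> None:
    # 1. Delete the cached copy first so no reader can hit the old value.
    cache.pop(key, None)
    # 2. Write the source of truth.
    mysql[key] = value
    # 3. Do not write the cache here: the sync process repopulates it.

def sync_to_cache(key: str, cache: Dict[str, str], mysql: Dict[str, str]) -> None:
    """Hypothetical hook run by the sync process after MySQL commits."""
    if key in mysql:
        cache[key] = mysql[key]
    else:
        cache.pop(key, None)
```

Between the delete and the sync, a read misses the cache and falls back to MySQL through the normal read path.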

Data synchronization scheme

Whether the goal is strong or eventual consistency, a synchronization process is needed to propagate MySQL data to Redis. Several synchronization approaches are described below.

Option one (canal)

Following the master-slave replication protocol, canal disguises itself as a slave and implements the slave's side of replication, namely requesting and reading the binlog.
The canal client then parses this binary binlog into a readable format.
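
Once row events have been parsed, the consumer only has to map each event onto a cache operation. The sketch below uses a made-up `RowEvent` type and a `table:pk` key naming scheme, both hypothetical. Canal's real message format and client API differ and are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

@dataclass
class RowEvent:
    """Hypothetical parsed binlog row event (not canal's actual format)."""
    table: str
    event_type: str              # "INSERT", "UPDATE" or "DELETE"
    pk: str
    row: Optional[dict] = None   # row after the change; None for DELETE

def cache_key(table: str, pk: str) -> str:
    return f"{table}:{pk}"       # hypothetical key naming scheme

def apply_events(events: Iterable[RowEvent], cache: Dict[str, dict]) -> None:
    for ev in events:
        key = cache_key(ev.table, ev.pk)
        if ev.event_type in ("INSERT", "UPDATE"):
            cache[key] = ev.row  # overwrite with the committed row
        elif ev.event_type == "DELETE":
            cache.pop(key, None)
        # Other event types (DDL, etc.) are ignored in this sketch.
```

Because events arrive in binlog (commit) order, applying them sequentially leaves the cache matching MySQL once the consumer has caught up.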

Option two

Not recommended: it is error-prone in transactional scenarios, although it does guarantee true strong consistency.
This implementation has to open a Redis connection for every insert or update and release the connection once the operation completes.

Cache failure

All three problems below occur when a read misses the cache and the fallback read from MySQL then runs into trouble.

Cache penetration

Problem: the data exists in neither the cache nor MySQL. The read path reads the cache first, falls back to MySQL on a miss, and returns directly if MySQL has nothing either. If an attacker keeps requesting keys that do not exist in MySQL, every request reaches MySQL and all the pressure accumulates there. This may crash MySQL, or leave it in a hung ("suspended animation") state in which it cannot serve requests.

Solutions:

  1. Cache <key, nil> with an expiration time (so that <key, nil> entries do not pile up); the next access to that key no longer reaches MySQL.
  2. Deploy a Bloom filter and write every key that exists in MySQL into it. A Bloom filter supports only insertion, not deletion, which is quite restrictive.
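
Both remedies can be combined in one read path. The sketch below is a minimal Bloom filter built on `hashlib`. The sizing (1024 bits, 3 hash functions) is arbitrary, and the `NULL` sentinel, `read`, and the `stats` counter are hypothetical names. In Redis the nil placeholder would be a real value with a short TTL; here a sentinel object stands in for it.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: add and might_contain only (no delete)."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

NULL = object()  # sentinel for <key, nil>: MySQL confirmed the key is absent

def read(key, cache, mysql, bloom, stats):
    if not bloom.might_contain(key):  # definitely not in MySQL: skip both stores
        return None
    if key in cache:
        value = cache[key]
        return None if value is NULL else value
    stats["mysql_reads"] += 1
    value = mysql.get(key)
    cache[key] = NULL if value is None else value  # with Redis, give NULL a short TTL
    return value
```

The nil entry still matters with a Bloom filter in place. A row deleted from MySQL stays in the filter (it cannot be removed), and false positives also pass the filter. Caching nil stops both cases from repeatedly reaching MySQL.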

Cache breakdown

Problem: some data is missing from Redis but present in MySQL (typically a very hot key that has just expired), and a large number of concurrent requests for it arrive at once, which also puts excessive pressure on MySQL.

Solutions:

  1. Locking: acquire a lock before requesting the data from MySQL. If acquisition succeeds, read MySQL, synchronize the result to Redis, and finally release the lock. If it fails, sleep for a while (about 200 ms) and try again. Once the lock is released, other processes read from Redis concurrently, and concurrent access to Redis is not a problem.
  2. Set hot keys to never expire.
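
The locking fix can be sketched within one process. Here `threading.Lock` stands in for a distributed lock (for example one built on Redis `SET` with `NX`), and a `dict` stands in for Redis. `HotKeyLoader` is a hypothetical name. After acquiring the lock, the holder re-checks the cache, so a waiter that wins the lock after the first loader has finished does not query MySQL again.

```python
import threading
import time

class HotKeyLoader:
    """Sketch of the lock-based fix for cache breakdown (single process)."""

    def __init__(self, cache, load_from_mysql, retry_sleep=0.2):
        self.cache = cache                      # dict standing in for Redis
        self.load_from_mysql = load_from_mysql
        self.retry_sleep = retry_sleep          # 200 ms by default
        self._lock = threading.Lock()           # stand-in for a distributed lock

    def get(self, key):
        while True:
            value = self.cache.get(key)
            if value is not None:
                return value
            if self._lock.acquire(blocking=False):
                try:
                    # Re-check: an earlier holder may have filled the cache.
                    value = self.cache.get(key)
                    if value is None:
                        value = self.load_from_mysql(key)
                        self.cache[key] = value  # synchronize to Redis
                    return value
                finally:
                    self._lock.release()
            time.sleep(self.retry_sleep)        # lock busy: sleep, then retry
```

With ten concurrent callers for the same expired key, only one of them reaches MySQL; the rest wait and then read the refilled cache.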

Cache avalanche

Problem: a large portion of the cache becomes invalid within a short period (the data is in MySQL but no longer in Redis), so all requests go to MySQL. This may bring down the database and take the entire service down with it.

Solutions:

  1. If the cache database goes down and all traffic flows to MySQL: use a highly available cluster solution, such as Sentinel mode (an election promotes another node to serve as the master cache) or Cluster mode;
  2. If many keys share the same expiration time, they expire together: add a random offset to the expiration time, or use some other mechanism, to stagger expirations;
  3. If cached data is lost when the system restarts (Redis is an in-memory database): for short restarts, enable Redis persistence (expiration information is persisted too); for long restarts, load the hot data into Redis in advance.
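
Items 2 and 3 can be sketched together. `jittered_ttl` adds a random offset to a base TTL, and `warm_up` preloads hot keys using those staggered TTLs. Both names, and the 3600 s base with up to 300 s of jitter, are illustrative assumptions. `cache_set` is a caller-supplied function. With the redis-py client it could wrap `r.set(key, value, ex=ttl)`.

```python
import random

def jittered_ttl(base_seconds: int, max_jitter_seconds: int, rng=random) -> int:
    """Base TTL plus a random offset so keys written together expire apart."""
    return base_seconds + rng.randint(0, max_jitter_seconds)

def warm_up(hot_keys, load_from_mysql, cache_set, base=3600, jitter=300):
    """Preload hot keys from MySQL into the cache before traffic arrives."""
    for key in hot_keys:
        value = load_from_mysql(key)
        if value is not None:
            cache_set(key, value, jittered_ttl(base, jitter))
```

Passing a seeded `random.Random` as `rng` makes the offsets reproducible in tests.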