Read-write separation
Why do we need read-write separation?
MySQL keeps its data on disk. Although MySQL has some internal buffers, reads that have to go to disk are slow. In most projects reads far outnumber writes, so separating reads from writes can improve both read and write performance.
However, read-write separation also introduces problems, such as inconsistency between the data that is written and the data that is read, and more difficult deployment and maintenance.
MySQL master-slave replication
The binary log (binlog) is written when a transaction commits and is independent of the storage engine. InnoDB's redo log is also flushed to disk when the transaction commits.
In row-based format, the binlog records the modified rows.
- The master writes update events (UPDATE, INSERT, DELETE) to its binlog;
- The slave requests the binlog from the master, and the slave's IO thread writes the events to a local relay log;
- The slave's SQL thread reads the relay log and replays the update events on the slave;
After a transaction is committed and written to the binlog, the slave's IO thread copies the master's binlog to the slave and writes it into a relay log. A second thread, the SQL thread, reads the relay log and replays it, that is, re-executes on the slave the changes that were already executed on the master. This is how the master and slave databases stay consistent.
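The IO-thread/SQL-thread split can be illustrated with a toy, single-process simulation. This is a sketch under stated assumptions, not MySQL's implementation: the `Master`/`Slave` classes and the `(op, key, value)` event tuples are hypothetical, and the two threads are modeled as step functions called one after another.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    data: dict = field(default_factory=dict)
    binlog: list = field(default_factory=list)  # committed row events, in commit order

    def commit(self, op, key, value=None):
        # Apply the change, then append it to the binlog at commit time.
        if op == "delete":
            self.data.pop(key, None)
        else:  # "insert" or "update"
            self.data[key] = value
        self.binlog.append((op, key, value))


@dataclass
class Slave:
    data: dict = field(default_factory=dict)
    relay_log: list = field(default_factory=list)
    read_pos: int = 0     # how far the IO thread has copied the master's binlog
    applied_pos: int = 0  # how far the SQL thread has replayed the relay log

    def io_thread_step(self, master):
        # IO thread: copy new binlog events from the master into the local relay log.
        new_events = master.binlog[self.read_pos:]
        self.relay_log.extend(new_events)
        self.read_pos += len(new_events)

    def sql_thread_step(self):
        # SQL thread: replay relay-log events that have not been applied yet.
        for op, key, value in self.relay_log[self.applied_pos:]:
            if op == "delete":
                self.data.pop(key, None)
            else:
                self.data[key] = value
        self.applied_pos = len(self.relay_log)
```

Between `io_thread_step` and `sql_thread_step` the slave has the events in its relay log but not yet in its data; that window is the replication lag that the consistency discussion below is about.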
The more slaves there are, the slower synchronization becomes.
Eventual consistency and strong consistency
Write to the master, read from the slave
Writes go to the master. If the consistency requirement is high, read from the master; if it is not, read from a slave.
Prerequisites: the workload is read-heavy and write-light, and a single master node can handle the project's data volume.
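The routing rule above fits in a few lines. This is a minimal sketch, assuming a hypothetical `Router` class that hands back connection handles (plain strings here); a real setup would hold connection pools and might pick slaves by load rather than at random.

```python
import random


class Router:
    """Hypothetical router: writes always go to the master; reads go to the master
    only when the caller needs strong consistency, otherwise to a random slave."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)

    def for_write(self):
        return self.master

    def for_read(self, strong=False):
        # Fall back to the master if strong consistency is required or no slave exists.
        if strong or not self.slaves:
            return self.master
        return random.choice(self.slaves)
```

For example, `Router("master-db", ["slave-1", "slave-2"]).for_read(strong=True)` returns `"master-db"`, while `for_read()` returns one of the two slaves.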
Keep a copy of hot read data in a cache database, so that reads of hot data are served from the cache.
Application scenario analysis
- Memory access is roughly one hundred thousand times faster than disk access
- In large business scenarios, reads outnumber writes by more than ten to one
So read performance is what we need to optimize.
MySQL has its own internal cache, but it mainly serves MySQL's own needs: bridging the speed gap between memory access and disk access. That is different from our business needs.
For example, before an hourly flash-sale event, we can cache some user data in advance so that a flood of users logging in at once does not degrade MySQL's read performance.
A relational database is usually the primary database, because it is convenient for analysis and, being disk-based, can hold far more data than memory can.
A cache database (in-memory database) stores the hot data.
Synchronization problem analysis
Introducing a cache database requires a synchronization strategy to keep the cache and MySQL data consistent.
The cache may go down or behave abnormally, but the system must keep running.
Data in MySQL, not in the cache
Acceptable, but we need a strategy to synchronize the data from MySQL to the cache, so that the next read can fetch it directly from the cache.
Data in the cache, not in MySQL
Not acceptable. MySQL is the basis and benchmark for our primary data. If the data is missing from MySQL but present in the cache, anything that reads the cache first will still see it, and the system as a whole becomes inconsistent. A strategy is needed to avoid this situation.
Data in both MySQL and the cache, but inconsistent
Also unacceptable, for the same reason, and it must likewise be prevented by a strategy.
Data in both MySQL and the cache, and consistent
Data in neither MySQL nor the cache
Single data center
Some game business scenarios use a separate DBServer.
Multiple data centers
Most web projects do not cache data on the server side but operate on the database directly; if the servers cached the data, updates could be lost.
Eventual consistency solution
With read-write separation, it takes time for the master to replicate data to the slave, so during replication the master and slave hold different data.
Read path: read the cache first and return directly on a hit; on a miss, read from MySQL and then write the result to Redis.
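The read path is the usual cache-aside pattern. A minimal sketch, assuming a plain dict stands in for Redis and a hypothetical `FakeDB` object stands in for MySQL (it only counts queries so the effect of the cache is visible):

```python
class FakeDB:
    """Hypothetical stand-in for MySQL that counts how often it is queried."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.queries = 0

    def get(self, key):
        self.queries += 1
        return self.rows.get(key)


def read(key, cache, db):
    # 1. Try the cache first; a hit is returned directly.
    if key in cache:
        return cache[key]
    # 2. On a miss, read from the database ...
    value = db.get(key)
    # 3. ... and populate the cache so the next read is a hit.
    #    Missing rows are not cached here; see the <key, nil> discussion further down.
    if value is not None:
        cache[key] = value
    return value
```

Reading the same key twice queries the database only once; a key that does not exist in the database is queried every time, which is exactly the weakness discussed under the first problem below.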
Write path: there are two options:
- Write to MySQL directly, and wait for MySQL to synchronize the data to Redis;
- Write to Redis first with a key expiration of 200ms (an empirical value), then write to MySQL and wait for MySQL to write the data back to Redis, overwriting the key and setting a longer expiration. In the meantime, anyone else who needs this data can read it directly from Redis instead of MySQL, which is faster. The 200ms default is the time it takes for a write to MySQL to be synchronized to Redis; tune it for your actual environment. Drawbacks: MySQL may be accessed more than once (if synchronization has not finished within 200ms, the key has already expired, so the Redis lookup misses and MySQL is queried again), and updates may be lost (if multiple servers handle the same record, another server may modify it after we have written to Redis but before synchronization finishes, leaving inconsistent data in the cache).
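The second write option can be sketched with a tiny TTL cache. Assumptions: `TTLCache` is a hypothetical in-memory stand-in for Redis `SET key value PX ttl`, a dict stands in for the MySQL table, `on_mysql_synced` is a hypothetical hook called by the MySQL-to-Redis sync process, and the one-hour long TTL is an arbitrary placeholder. The clock is injectable so the 200ms window can be demonstrated deterministically.

```python
import time


class TTLCache:
    """Minimal in-memory stand-in for Redis keys with a millisecond TTL."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.store = {}  # key -> (value, expires_at in seconds)

    def set(self, key, value, ttl_ms):
        self.store[key] = (value, self.clock() + ttl_ms / 1000)

    def get(self, key):
        item = self.store.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self.clock() >= expires_at:
            del self.store[key]  # expired: drop it and report a miss
            return None
        return value


SHORT_TTL_MS = 200             # empirical value from the text; tune per environment
LONG_TTL_MS = 60 * 60 * 1000   # hypothetical "longer" expiration (1 hour)


def write(key, value, cache, db_rows):
    # Step 1: put the new value in Redis with a short TTL so readers see it immediately.
    cache.set(key, value, SHORT_TTL_MS)
    # Step 2: write to MySQL (a dict stands in for the table here).
    db_rows[key] = value


def on_mysql_synced(key, db_rows, cache):
    # Step 3: the sync process overwrites the key with the MySQL value and a long TTL.
    cache.set(key, db_rows[key], LONG_TTL_MS)
```

If `on_mysql_synced` runs later than 200ms after `write`, readers in between get a miss and fall through to MySQL, which is the "multiple accesses to MySQL" drawback described above.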
Strong consistency solution
Read path: same as for eventual consistency. Read the cache first and return directly on a hit; on a miss, read from MySQL and then write the result to Redis.
Write path: delete the cache entry first, then write to MySQL, and wait for MySQL to synchronize the data to the cache (otherwise, if synchronization fails, the data becomes inconsistent).
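A sketch of the delete-then-write order, assuming dicts stand in for Redis and MySQL and `sync` is a hypothetical callback representing the MySQL-to-Redis synchronization process:

```python
def write_strong(key, value, cache, db_rows, sync):
    # 1. Delete the cached copy so no reader can get the old value from Redis.
    cache.pop(key, None)
    # 2. Write the new value to MySQL (a dict stands in for the table).
    db_rows[key] = value
    # 3. Let the MySQL-to-Redis sync process repopulate the cache.
    sync(key)
```

The point of deleting first is that a failed `sync` leaves a cache miss, so the next read falls back to MySQL, rather than leaving a stale value in the cache. This single-threaded sketch does not address a concurrent reader re-caching the old value between steps 1 and 2.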
Data synchronization scheme
Whether we choose strong or eventual consistency, a synchronization process is needed to copy MySQL data into Redis. The following describes several synchronization schemes.
Option one (canal)
Following the principle of master-slave replication, canal disguises itself as a slave database and implements the slave's side of replication, namely reading the binlog. The canal client parses this binary binlog file into a readable format.
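Once the binlog is parsed, the sync process only has to map each row event onto a cache operation. The sketch below is not canal's client API: the event dict shape (`type`, `table`, `before`, `after`), the `table:id` key scheme, and the JSON serialization are all hypothetical, and a dict stands in for Redis.

```python
import json


def apply_row_event(event, cache):
    """Apply one parsed binlog row event to the cache.

    `event` is a hypothetical dict, e.g.
    {"type": "UPDATE", "table": "user", "before": {...}, "after": {...}};
    it is NOT canal's actual message format.
    """
    kind = event["type"]
    # DELETE events carry the old row; INSERT/UPDATE carry the new row.
    row = event["before"] if kind == "DELETE" else event["after"]
    key = f'{event["table"]}:{row["id"]}'  # hypothetical key scheme: table:primary key
    if kind == "DELETE":
        cache.pop(key, None)
    elif kind in ("INSERT", "UPDATE"):
        cache[key] = json.dumps(row, sort_keys=True)
    else:
        raise ValueError(f"unsupported event type: {kind}")
```

Because events arrive in binlog order, applying them one by one keeps the cache in step with MySQL's committed state.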
Not recommended: it is error-prone in transactional scenarios, although it does guarantee genuine strong consistency.
This implementation must establish a Redis connection for every insert or update, and release the connection once the operation completes.
The following three problems all arise when a read misses the cache and something goes wrong while reading MySQL.
Problem (cache penetration): the data exists in neither the cache nor MySQL. A read checks the cache first, then MySQL, and returns directly if MySQL has nothing either. If an attacker keeps requesting keys that do not exist in MySQL, every request lands on MySQL, and the accumulated pressure may bring it down (including hanging in a state where the database can no longer serve requests).
- Cache <key, nil> with an expiration time (the expiration keeps <key, nil> entries from piling up), so the next access to that key no longer reaches MySQL.
- Deploy a Bloom filter and add every key that exists in MySQL to it (a Bloom filter supports only insertion, not deletion, which is quite restrictive).
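Both countermeasures can be combined in one read function. A minimal sketch, assuming dicts stand in for Redis and MySQL; `BloomFilter`, `NIL` and `read_guarded` are hypothetical names, the filter uses SHA-256 with a per-hash salt, and the TTL on `<key, nil>` entries is omitted for brevity.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives, no deletion."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive k bit positions by hashing the key with k different salts.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


NIL = object()  # sentinel stored in the cache for "MySQL has no such row"


def read_guarded(key, cache, db_rows, bloom):
    # Bloom filter: a key that was never added is definitely absent from MySQL.
    if not bloom.might_contain(key):
        return None
    if key in cache:
        value = cache[key]
        return None if value is NIL else value
    value = db_rows.get(key)
    # Cache the miss too (<key, nil>) so repeated lookups skip MySQL.
    # A real deployment would give NIL entries a short expiration time.
    cache[key] = NIL if value is None else value
    return value
```

The filter stops most requests for keys that never existed; the `<key, nil>` entry covers false positives and keys that passed the filter but have since been removed from MySQL.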
Problem (cache breakdown): some data is missing from Redis but present in MySQL, typically because a very hot key has just expired. A large number of concurrent requests for this data all go to MySQL and put excessive pressure on it.
- Locking: acquire a lock before requesting the data from MySQL. If the lock is acquired, query MySQL, synchronize the result to Redis, and finally release the lock; if not, sleep for a while (200ms) and try again. After the lock is released, other processes can read Redis concurrently, and concurrent access to Redis is not a problem.
- Set hot keys to never expire
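The locking countermeasure can be sketched with a non-blocking lock acquire. Assumptions: a dict stands in for Redis, `load_from_mysql` is a hypothetical loader, and a single in-process `threading.Lock` replaces what would be a per-key distributed lock when several servers are involved. The re-check after acquiring the lock ensures only one caller actually queries MySQL.

```python
import threading
import time

# One process-wide lock; with several servers this would be a distributed lock (assumption).
rebuild_lock = threading.Lock()


def read_hot(key, cache, load_from_mysql, retry_sleep=0.2):
    while True:
        value = cache.get(key)
        if value is not None:
            return value
        # Cache miss: only the lock holder may query MySQL.
        if rebuild_lock.acquire(blocking=False):
            try:
                value = cache.get(key)  # re-check: another thread may have filled it
                if value is None:
                    value = load_from_mysql(key)
                    cache[key] = value  # synchronize to Redis
                return value
            finally:
                rebuild_lock.release()
        # Lock not acquired: sleep (200ms in the text) and try again.
        time.sleep(retry_sleep)
```

With ten concurrent readers of an expired hot key, the loader runs once and the other nine readers pick the value up from the cache after a short sleep.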
Problem (cache avalanche): a large number of cache entries become invalid within a short period (the data is in MySQL but not in Redis), so all requests go to MySQL, which may bring down the database and take the whole service down with it.
- If the cache database goes down and all traffic flows to MySQL: use a highly available deployment, such as sentinel mode (an election promotes another node to be the primary cache database) or cluster mode;
- If many keys share the same expiration time and expire together: set a random expiration value, or use another mechanism, to stagger expirations;
- If cached data disappears when the system restarts (Redis is an in-memory database): for short restarts, enable Redis persistence (expiration information is persisted too); for long restarts, preload the hot data into Redis in advance;
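Staggered expiration and preloading can be sketched together. The base TTL and jitter values are hypothetical placeholders, and `cache_set` is a hypothetical callback; with redis-py it could wrap `r.set(key, value, ex=ttl)`.

```python
import random

BASE_TTL_S = 3600  # hypothetical base expiration (1 hour)
JITTER_S = 300     # hypothetical spread (up to 5 minutes)


def ttl_with_jitter(base=BASE_TTL_S, jitter=JITTER_S, rng=random):
    # Add a random offset so keys cached at the same moment do not all expire together.
    return base + rng.randint(0, jitter)


def prewarm(hot_keys, cache_set, load):
    # Preload hot data before traffic arrives, each key with its own jittered TTL.
    for key in hot_keys:
        cache_set(key, load(key), ttl_with_jitter())
```

Spreading expirations over a window turns one spike of MySQL reads into a trickle, at the cost of keys living slightly longer than the base TTL.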