In the last issue, we covered indexing, choosing between MyISAM and InnoDB, and other common database interview questions. How well have you all mastered them? In this issue, Sloth shares interview questions about splitting databases and splitting tables (sharding). These are very classic interview questions~
First of all, be clear that splitting databases and splitting tables are two different things; don't confuse them. It is entirely possible to split only the databases without splitting the tables, or to split only the tables without splitting the databases. Now straight to the questions!
1. What is MySQL's partitioning scheme?
A partitioned table is implemented as multiple related underlying tables. These underlying tables are also represented by handle objects, so we can access each partition directly. The storage engine manages the partitions' underlying tables just as it manages ordinary tables (all underlying tables must use the same storage engine), and the index of a partitioned table is simply the same index defined on each underlying table. This scheme hides the details from the user: even if the query condition does not include the partitioning column, the query still works normally.
2. What can MySQL partitioning do?
- Split data logically
- Improve the speed of individual write and read operations.
- Improve the speed of read queries that span a partition range.
- Store the split data under multiple different physical file paths.
- Store historical data efficiently.
3. Types of partitions
- RANGE partition: rows are assigned to partitions based on column values falling within a given continuous interval. MySQL places the data into different table files according to the specified split strategy; it is equivalent to breaking one large file into smaller pieces. To the client, however, it still appears as a single table, and the partitioning is transparent.
- Range-based splitting keeps contiguous data in each partition, usually by time range; for example, transaction tables and sales tables can be partitioned by year and month. There may be hot-spot issues, since most of the traffic lands on the newest data.
- The advantage of range-based splitting is that it is easy to expand.
- LIST partition: similar to RANGE partitions, each partition must be explicitly defined. The main difference is that a LIST partition is defined and selected based on whether a column's value belongs to a set of discrete values, while a RANGE partition is based on a continuous range of values.
- HASH partition: the partition is selected based on the return value of a user-defined expression, which is computed on the column values of the rows to be inserted. The expression can be any expression that is valid in MySQL and yields a non-negative integer value.
- The advantage of hash-based splitting is that it evenly distributes the data volume and request load across databases. The disadvantage is that expansion is troublesome: there is a data migration process, because existing data must be rehashed and redistributed to different databases and tables.
- KEY partition: similar to the HASH partition, except that a KEY partition only supports computing over one or more columns, and the MySQL server provides its own hash function, so the columns are not required to be integers.
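The four partition types above can be illustrated with small routing functions. This is a Python sketch of the selection logic only, not MySQL internals; all partition names and boundaries here are invented for illustration, and `md5` merely stands in for the server-supplied hash used by KEY partitioning.

```python
import hashlib

def range_partition(year: int) -> str:
    """RANGE: route by a continuous interval of the column value."""
    if year < 2015:
        return "p_old"
    elif year < 2020:
        return "p_2015_2019"
    return "p_recent"

def list_partition(region: str) -> str:
    """LIST: route by membership in a discrete set of values."""
    north = {"beijing", "tianjin"}
    south = {"shanghai", "guangzhou"}
    if region in north:
        return "p_north"
    if region in south:
        return "p_south"
    raise ValueError("no LIST partition matches: " + region)

def hash_partition(user_id: int, n: int = 4) -> str:
    """HASH: route by a non-negative integer expression modulo the partition count."""
    return f"p{user_id % n}"

def key_partition(key: str, n: int = 4) -> str:
    """KEY: like HASH, but the server supplies the hash function, so the
    column need not be an integer (md5 stands in for MySQL's internal hash)."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return f"p{digest % n}"
```

Note how RANGE and LIST require the partitions to be defined explicitly, while HASH and KEY only need a partition count.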
4. Why do most Internet companies not use partitioning, but instead split databases and tables?
Many resources are limited on a single machine, such as the number of connections and network throughput, and a partitioned table still lives on one instance, so partitioning alone cannot spread that load across machines. How to split the data is therefore one of the most critical decisions in practice.
5. Why split databases and tables?
From the performance perspective
As the amount of data in a single database becomes larger and larger, the QPS of database queries becomes higher and higher, and the time required for database reads and writes is also increasing. The read and write performance of the database may become a bottleneck for business development. Accordingly, the performance of the database needs to be optimized. This article only discusses database-level optimization, and does not discuss application-level optimization methods such as caching.
If the database query QPS is too high, consider splitting the database, spreading the connection pressure of a single instance across multiple databases. For example, if the query QPS is 3500 and we assume a single database can support 1000 connections, you can split into 4 databases to spread the query connection pressure.
When the amount of data in a single table is too large and exceeds a certain level, both queries and updates may still hit performance problems even after conventional database-level optimizations such as index tuning. This is quantitative change producing qualitative change, and at that point the way of thinking about the problem has to change: solve it at the source, where data is produced and handled. Since the data volume is huge, divide and conquer. This is what gives rise to table splitting: distribute the data across multiple tables according to certain rules, solving the access performance problems that a single table cannot.
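The arithmetic in that example is just a ceiling division. A minimal sketch (the 1000-connections-per-database figure is the article's assumption, not a MySQL constant):

```python
import math

def databases_needed(query_qps: int, per_db_capacity: int) -> int:
    """How many databases are needed to absorb the given query QPS,
    assuming each instance handles per_db_capacity connections."""
    return math.ceil(query_qps / per_db_capacity)

print(databases_needed(3500, 1000))  # 4, matching the example in the text
```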
From the availability perspective
If a single database fails, all of the data may be lost. This is especially true in the cloud era, where many databases run on virtual machines; if the virtual machine or its host has an accident, the loss may be irreparable. Therefore, besides traditional Master-Slave, Master-Master, and similar deployment schemes, you can also consider addressing this problem at the data-splitting level.
Here we take the database downtime as an example:
- With a single-database deployment, if the database goes down, the failure impact is 100%, and recovery may take a long time.
- If we split into 2 databases deployed on different machines and one of them goes down, the failure impact is 50%, and 50% of the data can continue to be served.
- If we split into 4 databases deployed on different machines and one of them goes down, the failure impact is 25%, 75% of the data can continue to be served, and recovery time will be very short.
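The downtime arithmetic in the list above fits in a one-line helper. It assumes data and traffic are spread evenly across the databases (the article's assumption), so one instance going down affects roughly 1/n of the service:

```python
def failure_impact_percent(num_databases: int) -> float:
    """Percentage of traffic affected when one of num_databases goes down,
    assuming an even split of data and traffic."""
    return 100.0 / num_databases

print(failure_impact_percent(1))  # 100.0 — single database, total outage
print(failure_impact_percent(2))  # 50.0
print(failure_impact_percent(4))  # 25.0
```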
Of course, we cannot split databases without limit. Splitting is a way of trading storage resources for performance and availability, and resources are always finite.
6. How to split? (Databases only? Tables only? Or both databases and tables?)
Sub-database sub-table schemes fall into the following 3 types: splitting only the databases, splitting only the tables, and splitting both the databases and the tables.
7. How to split the data?
Data is usually split in one of two ways: vertical splitting and horizontal splitting. Of course, some complex business scenarios may combine the two.
Vertical split
A vertical table split usually groups the hot, frequently used fields together as the main table, based on how often each business function uses them. The rarely used fields are then grouped by their business attributes and split into different secondary tables. The relationship between the main table and a secondary table is generally one-to-one.
Horizontal split (data sharding)
It is recommended to keep a single table under roughly 5 million rows; beyond that, consider splitting. Copy one table into multiple tables with the same structure, and store the data in these tables according to certain rules, so that no single table grows too large and performance improves. These identically structured tables can, of course, be placed in one database or spread across several databases.
Several methods of horizontal split:
- Hash the UID with MD5, take the first few characters (the first two here), and route UIDs with different hash prefixes into different user tables.
- Split into different tables by time. For example, article_201601, article_201602.
- Split by popularity: entries with a high click-through rate get their own tables, low-popularity entries go into one large table, and when the low-popularity entries reach a certain volume, they are split into separate tables as well.
- Route to the corresponding table by ID range: the first million users go into the first table, user_0000, and the second million into the second table, user_0001. As users increase, simply add more user tables.
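The three deterministic routing rules above can be sketched as small functions. The table names (`user_0000`, `article_201601`) follow the article's examples; the one-million-rows-per-table figure is the article's, the MD5 rule keeps the first two hex characters as described, and zero-based user IDs are assumed. (The popularity rule depends on runtime click statistics, so it is not shown.)

```python
import hashlib
from datetime import date

def table_by_md5(uid: str) -> str:
    """Hash the UID with MD5 and use the first two hex characters as the suffix."""
    prefix = hashlib.md5(uid.encode()).hexdigest()[:2]
    return f"user_{prefix}"

def table_by_month(d: date) -> str:
    """Time-based tables such as article_201601, article_201602."""
    return f"article_{d.year}{d.month:02d}"

def table_by_id_range(user_id: int, rows_per_table: int = 1_000_000) -> str:
    """IDs 0..999999 go to user_0000, the next million to user_0001, and so on
    (assumes zero-based user IDs)."""
    return f"user_{user_id // rows_per_table:04d}"
```

With routing functions like these, adding capacity for the ID-range rule means simply creating the next table; the MD5 rule, by contrast, fixes the table count up front (256 tables for a two-character prefix).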