This article explains how varchar values are stored. The material is fairly difficult and involves some calculation; if you are interested, read on.
- 1. What does InnoDB do?
- 2. How does InnoDB read and write data?
- 3. So many varchar questions: the InnoDB row format
- 3.1 dynamic: the InnoDB default row format
- 3.2 How does InnoDB know how long a varchar really is? The variable-length field length list
- 3.3 How many characters can varchar(M) store, and why is the maximum shown as 16383?
- 3.4 If a value is NULL, how does InnoDB handle it? The NULL value list
- 3.5 What happens if a column's data occupies a lot of bytes? Overflow columns in the dynamic row format
1. What does InnoDB do?
InnoDB is a storage engine that stores the data in tables on disk.
2. How does InnoDB read and write data?
InnoDB processes data in memory, so data on disk must first be loaded into memory. When a write or modify request is handled, the changed content in memory must also be flushed back to disk.
Disk reads and writes are several orders of magnitude slower than memory reads and writes, so when we fetch records from a table, InnoDB does not go to disk one record at a time. Instead, the InnoDB storage engine divides the data into pages and uses the page as the basic unit of exchange between disk and memory. The default page size in InnoDB is 16 KB. That is, under normal circumstances, at least 16 KB is read from disk into memory at a time, and at least 16 KB is flushed from memory to disk at a time.
The page size is set by the innodb_page_size variable, which cannot be changed while the server is running; it can only be specified when the MySQL data directory is first initialized.
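To make the page granularity concrete, here is a minimal Python sketch. The helper name pages_touched is hypothetical, and it ignores page headers and all other per-page overhead; it only illustrates that I/O happens in whole 16 KB pages.

```python
PAGE_SIZE = 16 * 1024  # default innodb_page_size, in bytes


def pages_touched(offset: int, length: int, page_size: int = PAGE_SIZE) -> int:
    """Number of whole pages covering the byte range [offset, offset + length).

    Hypothetical helper for illustration only; real InnoDB pages also carry
    headers and trailers, which are ignored here.
    """
    if length <= 0:
        return 0
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    return last_page - first_page + 1


# Reading even a 100-byte record means loading one full 16 KB page.
print(pages_touched(0, 100))              # 1
# A byte range that straddles a page boundary touches two pages.
print(pages_touched(PAGE_SIZE - 10, 20))  # 2
```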
3. So many varchar questions: the InnoDB row format
At this point you probably have the same questions I had. How should the maximum length, such as varchar(255), be chosen? Why is varchar(65535) not allowed, and why is the largest possible declaration varchar(16383)? Let me show you!
We usually insert data into a table one record at a time. The way these records are laid out on disk is called the row format or record format. InnoDB has four row formats: Compact, Redundant, Dynamic, and Compressed. The default row format in MySQL 5.7 and later is Dynamic; I verified on MySQL 8 that this is indeed the case:
SHOW VARIABLES LIKE "innodb_default_row_format"
Hardly anyone changes or even looks at the InnoDB row format in day-to-day business use, so I will focus only on the default row format, dynamic, which gives the understanding most relevant to everyday development.
Please remember this table structure; we will keep coming back to it:
CREATE TABLE test ( c1 VARCHAR(10), c2 VARCHAR(10) NOT NULL, c3 CHAR(10), c4 VARCHAR(10)) CHARSET = utf8mb4;
Business databases today almost all use the utf8mb4 character set, so I use it here to keep the examples close to what you already know.
INSERT INTO test ( c1, c2, c3, c4 ) VALUES('aaaa', '你好啊', 'cc', 'd'),('eeee', 'fff', NULL, NULL);
Now the records in the table look like this:
| c1 | c2 | c3 | c4 |
|---|---|---|---|
| aaaa | 你好啊 | cc | d |
| eeee | fff | NULL | NULL |
3.1 dynamic: the InnoDB default row format
A record consists of two parts: the record's real data, and additional information that the server has to add in order to describe the record. The additional information falls into 3 categories: the variable-length field length list, the NULL value list, and the record header information.
Here I only cover the variable-length field length list and the NULL value list. The record header information is quite intricate and is not needed for this article.
3.2 How does InnoDB know how long a varchar really is? The variable-length field length list
For variable-length data types such as varchar and the BLOB types, the number of bytes stored in a column is not fixed, so when the real data is stored, the number of bytes it occupies must be saved as well.
The storage space occupied by such a variable-length field (for example varchar) is therefore divided into two parts:
- The real data content, stored in the column's own position in the record
- The number of bytes actually occupied, stored in the variable-length field length list
Take the first record in the test table as an example. The c1, c2, and c4 columns of the test table are all of type VARCHAR(10), which holds at most 10 characters, so the lengths of these three columns' values must be stored at the beginning of the record. All columns in the test table use the utf8mb4 character set, in which each character needs up to 4 bytes to encode. Let's look at the lengths of the variable-length fields in the first record:
| Column name | Stored content | Content length (decimal) | Content length (hexadecimal) |
|---|---|---|---|
| c1 | aaaa | 4 | 0x04 |
| c2 | 你好啊 | 9 | 0x09 |
| c4 | d | 1 | 0x01 |
The byte counts of the variable-length fields are stored in the reverse of the column order!
Because the strings in the variable-length columns of the first record, such as c4, are short, their contents occupy few bytes and each length can be expressed in 1 byte. If the content of a variable-length column occupies more bytes, 2 bytes may be needed to express its length. So how many characters can a varchar actually hold? Read on.
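The difference between character counts and byte counts, and the reversed storage order, can be checked with a short Python sketch. It follows this article's simplification of looking only at the three VARCHAR columns of the first record; Python's utf-8 codec yields the same bytes as utf8mb4 for these strings, and the variable names are my own.

```python
# Values of the VARCHAR columns in the first record, in column order.
first_record = {"c1": "aaaa", "c2": "你好啊", "c4": "d"}

# Character count versus byte count for each value.
for col, val in first_record.items():
    print(col, len(val), "chars,", len(val.encode("utf-8")), "bytes")
# c1 4 chars, 4 bytes
# c2 3 chars, 9 bytes
# c4 1 chars, 1 bytes

# Byte length of each value, keyed by column name.
byte_lengths = {col: len(val.encode("utf-8")) for col, val in first_record.items()}

# The list stores the lengths in reverse column order: c4, c2, c1.
length_list = bytes(reversed(list(byte_lengths.values())))
print(length_list.hex(" "))  # 01 09 04
```

Note that the three Chinese characters take 3 bytes each here, which is why the length list records bytes rather than characters.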
3.3 How many characters can varchar(M) store, and why is the maximum shown as 16383?
First, we must understand that M is a number of characters, not a number of bytes.
Why does a varchar of 20000 characters not fit? Why does the error message allow at most 16383 characters? How is this number calculated?
Let's talk it through!
varchar is variable length. In a varchar(64) column I may store fewer than 64 characters, or exactly 64. Who knows how many characters a particular value holds?
InnoDB's design takes this into account, but it works in bytes: InnoDB must record the actual number of bytes L occupied by each variable-length field. Of course, L cannot be arbitrarily large, because at most 2 bytes of space are used to record it.
InnoDB has its own set of rules for this. To state them, we introduce W, M, and L:
- Assume a character set needs at most W bytes to represent one character. For example, W is 4 for the utf8mb4 character set, 3 for the utf8 character set, 2 for the gbk character set, and 1 for the ascii character set.
- For the variable-length type VARCHAR(M), M is the maximum number of characters that can be stored (note: characters, not bytes), so the maximum number of bytes a string of this type can occupy is M × W.
- Assume the string actually stored occupies L bytes.
Consider the extreme boundary. To record how many bytes a varchar actually stores, InnoDB provides at most 2 bytes of space, which is 16 bits. With every bit set to 1, the largest number that can be recorded is 65535, so the largest number of bytes InnoDB can record for a varchar is 65535.
In the utf8mb4 character set a character takes up to 4 bytes, and 65535 / 4 = 16383.75. As long as the varchar holds no more than 16383 characters, InnoDB can record the length it actually occupies; anything longer cannot be recorded! This explains the error message seen earlier: varchar(20000) is rejected, and the maximum is only 16383.
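The same arithmetic can be written as a small Python sketch. It reproduces only this article's reasoning (a 2-byte length field divided by the per-character maximum W); other limits that MySQL applies to a row are not modeled, and the function name is my own.

```python
MAX_RECORDABLE_BYTES = 2 ** 16 - 1  # 16 bits, all set to 1 -> 65535

# W: maximum bytes per character for a few common character sets.
MAX_BYTES_PER_CHAR = {"utf8mb4": 4, "utf8": 3, "gbk": 2, "ascii": 1}


def max_varchar_chars(charset: str) -> int:
    """Largest M such that M * W does not exceed MAX_RECORDABLE_BYTES."""
    return MAX_RECORDABLE_BYTES // MAX_BYTES_PER_CHAR[charset]


print(MAX_RECORDABLE_BYTES / 4)          # 16383.75
print(max_varchar_chars("utf8mb4"))      # 16383
print(20000 * 4 > MAX_RECORDABLE_BYTES)  # True: varchar(20000) cannot fit
```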
Now for the rules (the explanation uses the utf8mb4 character set, so W = 4).
Rule 1: If the maximum number of bytes the column may store satisfies M × W <= 255, that is, the maximum number of characters in varchar(M) satisfies M <= 63, then 1 byte is used to record the actual number of bytes the varchar occupies.
When InnoDB reads the variable-length field length list, it consults the table structure. If a variable-length field allows at most 255 bytes, that is, at most 63 characters, InnoDB knows that a single byte records the number of bytes the real data occupies.
Rule 2: If the maximum number of bytes the column may store satisfies M × W > 255, that is, the maximum number of characters in varchar(M) satisfies M > 63, there are two cases:
- If the actual number of bytes stored satisfies L <= 127, which for 4-byte characters means at most ⌊127 / 4⌋ = 31 characters, then 1 byte records the actual number of bytes the varchar occupies. (⌊…⌋ means rounding down.)
- If the actual number of bytes stored satisfies L > 127, which for 4-byte characters means more than 31 characters, then 2 bytes record the actual number of bytes the varchar occupies.
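Rules 1 and 2 can be condensed into one function. This is a minimal sketch of the rules exactly as stated in this article, not InnoDB's actual source code; the function and parameter names are hypothetical.

```python
def length_field_size(m: int, w: int, actual_bytes: int) -> int:
    """Bytes used to record the length of one VARCHAR(m) value.

    m: declared maximum number of characters (M)
    w: maximum bytes per character in the column's character set (W)
    actual_bytes: bytes the stored value really occupies (L)
    """
    if actual_bytes > m * w:
        raise ValueError("value does not fit in the column")
    if m * w <= 255:         # Rule 1: the column can never need more than 1 byte
        return 1
    if actual_bytes <= 127:  # Rule 2, short value
        return 1
    return 2                 # Rule 2, long value


W = 4  # utf8mb4
print(length_field_size(63, W, 252))  # 1  (63 * 4 = 252 <= 255)
print(length_field_size(64, W, 127))  # 1  (64 * 4 = 256 > 255, but L <= 127)
print(length_field_size(64, W, 128))  # 2  (L > 127)
```

The declared maximum decides whether the 2-byte form is possible at all, while the stored value decides whether it is actually used.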
Another thing to note: the variable-length field length list only stores the lengths of columns whose values are not NULL.
Recall the two records in the table. In the second record the c4 column value is NULL, so its variable-length field length list only needs to store the lengths of the c1 and c2 columns.
The variable-length field length list of the first record occupies 3 bytes, and that of the second record occupies 2 bytes.
Of course, not every record has a variable-length field length list. If none of the columns in the table has a variable-length data type, or the values of all such columns are NULL, this part is not needed. In real business development almost every table uses varchar, so records in practice will have a variable-length field length list.
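A short Python sketch shows how skipping NULL columns changes the size of the list for the two records. As in the rest of this article, only the VARCHAR columns are considered, None stands for SQL NULL, and the helper name is my own.

```python
VARCHAR_COLUMNS = ["c1", "c2", "c4"]

# The two rows from the INSERT statement.
rows = [
    {"c1": "aaaa", "c2": "你好啊", "c3": "cc", "c4": "d"},
    {"c1": "eeee", "c2": "fff", "c3": None, "c4": None},
]


def length_list(row: dict) -> bytes:
    """Length bytes of the non-NULL VARCHAR columns, in reverse column order."""
    lengths = [
        len(row[col].encode("utf-8"))
        for col in reversed(VARCHAR_COLUMNS)
        if row[col] is not None  # NULL columns get no entry at all
    ]
    return bytes(lengths)  # every value here is short, so 1 byte per length


for row in rows:
    encoded = length_list(row)
    print(encoded.hex(" "), "->", len(encoded), "byte(s)")
# 01 09 04 -> 3 byte(s)
# 03 04 -> 2 byte(s)
```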
3.4 If a value is NULL, how does InnoDB handle it? The NULL value list
If you have read carefully this far, you are clearly taking this seriously. If, like me, you work under development guidelines that discourage NULL, you generally declare columns NOT NULL. In that case the record has no NULL value list at all, which also saves space.
If some columns in the table may store NULL, putting those NULL values into the record's real data would take up a lot of space. Therefore, the row format manages the columns whose values are NULL in one place and stores them in the NULL value list. The process is as follows.
First, determine which columns in the table are allowed to store NULL.
Primary key columns and columns declared NOT NULL cannot store NULL values, so they are left out of the count. In the test table, for example, the 3 columns c1, c3, and c4 are allowed to store NULL values, while c2 is declared NOT NULL and is not allowed to store NULL.
Next, if the table has no columns that allow NULL, the NULL value list does not exist. Otherwise, each column that allows NULL gets one bit, and the bits are arranged in reverse column order. A bit value of 1 means the column's value is NULL; a bit value of 0 means the column's value is not NULL. Since c1, c3, and c4 in the test table all allow NULL, the correspondence between these 3 columns and the bits is: c4 takes the highest of the three bits, c3 the middle bit, and c1 the lowest bit.
Finally, the NULL value list must occupy a whole number of bytes. If the number of bits used is not a multiple of 8, the high-order bits of the byte are filled with 0.
By the same logic, if a table has 9 columns that are all allowed to be NULL, the NULL value list of a record needs 2 bytes.
For the first record, none of c1, c3, and c4 is NULL, so every corresponding bit is 0 and the NULL value list in hexadecimal is 0x00.
For the second record, c3 and c4 are NULL, so their corresponding bits are 1. The three bits are 110 (c4, c3, c1), padded with high-order zeros to 00000110, so the NULL value list in hexadecimal is 0x06.
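The bit assignment can be sketched in Python as follows. This is an illustration of the procedure described in this section, with hypothetical names. The byte order used when more than 8 columns are nullable is an assumption of this sketch and does not affect the one-byte results shown.

```python
NULLABLE_COLUMNS = ["c1", "c3", "c4"]  # c2 is NOT NULL, so it gets no bit


def null_value_list(row: dict) -> bytes:
    """One bit per nullable column; the first column sits in the lowest bit."""
    bits = 0
    for position, col in enumerate(NULLABLE_COLUMNS):
        if row[col] is None:  # None stands for SQL NULL
            bits |= 1 << position
    n_bytes = (len(NULLABLE_COLUMNS) + 7) // 8  # round up to whole bytes
    # Unused high-order bits stay 0. Big-endian is an assumption that only
    # matters once more than 8 columns are nullable.
    return bits.to_bytes(n_bytes, "big")


first = {"c1": "aaaa", "c2": "你好啊", "c3": "cc", "c4": "d"}
second = {"c1": "eeee", "c2": "fff", "c3": None, "c4": None}

print(null_value_list(first).hex())   # 00
print(null_value_list(second).hex())  # 06

# Nine nullable columns no longer fit in one byte.
print((9 + 7) // 8)  # 2
```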
3.5 What happens if a column's data occupies a lot of bytes? Overflow columns in the dynamic row format
If the data stored in a column occupies a very large number of bytes, the column may become what is called an overflow column.
For such a column, the record's real data uses only 20 bytes for it, and those 20 bytes do not hold the data itself, because the data is spread across several other pages. The 20 bytes store the address of those pages and the number of bytes the data occupies. The pages holding the data form a singly linked list.
Follow-up: if you are interested in the storage structure of the other InnoDB row formats, or in the record header information that I skipped, you can read the book "How MySQL Works". My treatment differs from the book in two ways. The book describes the Compact format with the ascii character set, while I chose the dynamic format that is the default in everyday development, with the utf8mb4 character set. And because of the character set change, I recalculated all the data in the text and diagrams. Most people never look at the row format, so understanding it through the dynamic format stays closest to real development.