JSON enhancements in MySQL 8.0

Click the "blue word" above

Follow us and enjoy more dry goods!

Now many application environments can see the flexible shadow of JSON. The recursive level of the data level at each stage can be well distinguished. I have been looking forward to MySQL's JSON, and I have only recently had time to study it.

JSON understanding

JSON is just a string of strings, but the elements are marked with specific symbols. such as:

  • {} Double brackets indicate objects
  • [] Brackets indicate an array
  • "" Enclosed in double quotation marks are attributes or values
  • : The colon indicates that the latter is the value of the former

The difficulty in implementing JSON for relational databases is that relational databases need to define the database and table structure. To cope with this, starting from MySQL 5.7, MySQL supports the JavaScript Object Notation (JSON) data type. Previously, this type of data was not a separate data type and would be stored as a string. The new JSON data type provides an automatically validated JSON document and an optimized storage format.


JSON documents in MySQL are stored in binary format, which provides the following functions:

  • Automatically validate JSON documents stored in JSON columns. An invalid document generates an error.
  • Optimized storage format. The JSON document stored in the JSON column is converted to an internal format that allows quick read access to document elements. JSON value stored in binary format.
  • Fast read access to document elements. When the server reads the JSON document again, it does not need to re-parse the text to obtain the value. Directly find sub-objects or nested values ​​by key or array index without reading all the values ​​in the document.
  • The space required to store JSON documents is roughly the same as LONGBLOB or LONGTEXT.
  • The size of any JSON document stored in a JSON column is limited to the value of the max_allowed_packet system variable.
  • Before MySQL 8.0.13, JSON columns cannot have non-NULL default values.

JSON operation

Data is saved to MySQL, what kind of support is provided in terms of operation? At present, the JSON of MySQL 8.0 version supports a total of 32 ordinary functions and 2 spatial functions:

1. Index:

  • JSON columns, like other binary type columns, are not directly indexed; instead, you can create indexes on the generated columns to extract scalar values ​​from JSON columns. For a detailed example, see Indexing the generated columns to provide a JSON column index.
  • The MySQL optimizer will also look for compatible indexes on virtual columns that match the JSON expression.
  • In MySQL 8.0.17 and later versions, the InnoDB storage engine supports multi-value indexes on JSON arrays. See the multi-value index.
  • MySQL NDB Cluster 8.0 supports JSON columns and MySQL JSON functions, including creating indexes on columns generated from JSON columns, as a solution that cannot index JSON columns. Each NDB table supports up to 3 JSON columns.

2. Comparison and sorting of JSON values:

  • JSON values ​​can be compared using =, <, <=, >, >=, <>, != and <=> operators.
  • JSON values ​​do not support the following comparison operators and functions:
    BETWEEN
    IN()
    GREATEST()
    LEAST()
  • For the comparison operators and functions listed, a workaround is to convert the JSON value to a native MySQL numeric or string data type so that they have a consistent non-JSON scalar type. That is to say, the conversion to the required MySQL field continues to be converted, which can be regarded as a compromise solution.
  • The comparison of JSON values ​​is divided into two levels. The first level of comparison is based on the JSON type of the comparison value. If the types are different, only which type has a higher priority determines the comparison result. If two values ​​have the same JSON type, the second-level comparison is performed using the rules of the specific type.
    BLOB> BIT> OPAQUE> DATETIME> TIME> DATE> BOOLEAN> ARRAY> OBJECT> STRING> INTEGER, DOUBLE> NULL.

3. Conversion between JSON and non-JSON values:

The rules that MySQL follows when converting between JSON values ​​and other types of values:

CAST(other type AS JSON)

The result is a NULL value of JSON type.

mysql>SET @j5 = '{"id":123, "name":"kevin","age":20, "time":"2021-06-01 01:00:00"}';Query OK, 0 rows affected (0.00 sec) mysql>SELECT CAST(JSON_EXTRACT(@j5, '$.age') AS UNSIGNED);+----------------------------------------------+| CAST(JSON_EXTRACT(@j5, '$.age') AS UNSIGNED) |+----------------------------------------------+|                                           20 |+----------------------------------------------+1 row in set (0.00 sec) 

4.JSON value aggregation:

For aggregation of JSON values, NULL values ​​are ignored like other data types. Except for MIN(), MAX() and GROUP_CONCAT(), non-NULL values ​​are converted to numeric types and aggregated. For numeric scalar JSON values, (depending on the value) truncation and loss of precision may occur.

JSON uses the index method:

Indexes cannot be created on MySQL JSON columns. The index is created by extracting scalar values ​​from JSON columns. This can more effectively combine the advantages of MySQL.

  • The MySQL optimizer will look for compatible indexes on virtual columns that match the JSON expression.
  • In MySQL 8.0.17 and later versions, the InnoDB storage engine supports multi-value indexes on JSON arrays
  • MySQL NDB Cluster 8.0 supports JSON columns and MySQL JSON functions, including creating indexes on columns generated from JSON columns, as a solution that cannot index JSON columns. Each NDB table supports up to 3 JSON columns.

1. Virtual column index:

col_name data_type [GENERATED ALWAYS] AS (expr)  [VIRTUAL | STORED] [NOT NULL | NULL]  [UNIQUE [KEY]] [[PRIMARY] KEY]  [COMMENT 'string']

The VIRTUAL or STORED keywords indicate how the column value is stored, which has a great impact on the use of the column:

  • VIRTUAL: does not store the column value, but when reading the row, the column value is calculated immediately after any [BEFORE trigger]. Virtual column does not occupy storage space, but temporarily resides in memory. Currently, this limit is not set in the government.
  • STORED: When inserting or updating a row, the column value will be calculated and stored. The stored columns require storage space and can be indexed.
  • If no keyword is specified, the default is VIRTUAL.
mysql> DROP TABLE IF EXISTS `jemp`;mysql> CREATE TABLE  `jemp` (        id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,        c JSON,	d JSON,        g INT GENERATED ALWAYS AS  (c->"$.id") STORED,        INDEX i (g)    );Query OK, 0 rows affected (0.02 sec)mysql> INSERT INTO jemp (c,d) VALUES ('{"id": "1", "name": "Fred"}' , '{"user":"Fred",   "user_id":1, "zipcode":"[14471,14531]"}'),('{"id": "2", "name": "Wilma"}', '{"user":"Wilma",  "user_id":2, "zipcode":[24472,24532]}'  ),('{"id": "3", "name": "Jack"}' , '{"user":"Jack", "user_id":3, "zipcode":[34473,34533]}'    ), ('{"id": "4", "name": "Betty"}', '{"user":"Betty",  "user_id":4, "zipcode":[44474,44534]}'  ); Query OK, 4 rows affected (0.02 sec)Records: 4  Duplicates: 0  Warnings: 0 mysql> EXPLAIN SELECT c->>"$.name" AS name  FROM jemp WHERE g > 2\G;*************************** 1. row ***************************           id: 1  select_type: SIMPLE        table: jemp   partitions: NULL         type: rangepossible_keys: i          key: i      key_len: 5          ref: NULL         rows: 2     filtered: 100.00        Extra: Using where1 row in set, 1 warning (0.00 sec)ERROR:No query specified mysql> SHOW WARNINGS\G*************************** 1. row *************************** Level: Note   Code: 1003Message: /* select#1 */ select json_unquote(json_extract(`db1`.`jemp`.`c`,'$.name')) AS `name` from `db1`.`jemp` where (`db1`.`jemp`.`g` > 2)1 row in set (0.00 sec) 

2. Use multi-value index

Direct interface: MEMBER OF(), JSON_CONTAINS(), JSON_OVERLAPS()

mysql> ALTER TABLE jemp ADD INDEX zips( (CAST(d->'$.zipcode' AS UNSIGNED ARRAY)) ); #MEMBER  OFmysql> EXPLAIN  SELECT * FROM jemp    WHERE  24472 MEMBER  OF(d->'$.zipcode')\G*************************** 1. row ***************************           id: 1  select_type: SIMPLE        table: jemp   partitions: NULL         type: refpossible_keys: zips          key: zips      key_len: 9          ref: const        rows: 1     filtered: 100.00        Extra: Using where1 row in set, 1 warning (0.00 sec)#JSON_CONTAINSmysql> EXPLAIN SELECT * FROM jemp   WHERE JSON_CONTAINS(d->'$.zipcode', CAST('[14471,14531]' AS JSON))\G;*************************** 1. row ***************************           id: 1  select_type: SIMPLE        table: jemp   partitions: NULL         type: rangepossible_keys: zips          key: zips      key_len: 9          ref: NULL         rows: 2     filtered: 100.00        Extra: Using where1 row in set, 1 warning (0.00 sec) #JSON_OVERLAPSmysql> EXPLAIN SELECT * FROM jemp   WHERE JSON_OVERLAPS(d->'$.zipcode', CAST('[44474,94582]' AS JSON))\G;*************************** 1. row ***************************           id: 1  select_type: SIMPLE        table: jemp   partitions: NULL         type: rangepossible_keys: zips          key: zips      key_len: 9          ref: NULL         rows: 2     filtered: 100.00        Extra: Using where1 row in set, 1 warning (0.00 sec)

From the above example, the data query is still based on MySQL B+tree, and JSON is just a mechanism for data storage. By providing fast access to the virtual column mode, it solves the JSON support problem very well.

to sum up

  • The combination of JSON in MySQL is very practical, and the virtual column index solves the performance problem of the query.
  • The size of JSON is indeed a hard problem. Use it with caution (the space is roughly the same as LONGBLOB or LONGTEXT, and the size of the document is limited to the value of the max_allowed_packet system variable).
  • In actual scenarios, you can only choose a moderate JSON length, and you can consider using it with large pages.

About the author

Cui Hulong, Yunhe Enmo MySQL technical consultant, has long served data centers in the financial, gaming, logistics and other industries, designing data storage architecture, and is familiar with the process and specifications of data center operation and management, and automated operation and maintenance. Good at MySQL, Redis, MongoDB database high availability design and operation and maintenance fault handling, backup and recovery, upgrade migration, performance optimization. Self-study has passed MySQL OCP 5.6 and MySQL OCP 5.7 certification. More than 2 years of development experience, 10 years of database operation and maintenance work experience, including 8 years of full-time MySQL work; once served as project manager, database manager, data warehouse architect, MySQL technical expert, DBA and other positions; involved industry: finance (bank , Financial management), logistics, games, medical care, heavy industry, etc.

Original link of Mo Tianlun: https://www.modb.pro/db/68425 (copy to browser or click "read original" to view immediately)

END

Recommended reading: 267 pages! 2020 Annual Database Technology Annual

Recommended download: 2020 Data Technology Carnival PPT download


Nearly 50 PPT downloads and video playbacks of the 2020 Data Technology Carnival have been uploaded to the Motianlun platform, which can be obtained by replying to the keyword " 2020DTC " on the " Data and Cloud " official account !

The brand-new video program "Three Minutes of Data" jointly produced by ACDU (China DBA Alliance) and Mo Tianlun has been released for multiple issues. Quickly understand the dynamics of the data industry, please pay attention to our video number! ↓↓↓

Click the picture below to see more ↓

Yunhe Enmo Lecture Hall| A place to share and communicate

Long press to identify the QR code and join the 10,000-person exchange community

Please note: Yunhe Enmo Lecture Hall

Click "Looking"

Your likes will be seen ❤