Precautions when grouping data with GROUP BY in MySQL

Pitfalls of GROUP BY in MySQL

As introduced at the beginning of this article, only two kinds of columns may appear after select in a grouping query:

Columns that appear after group by

Columns wrapped in an aggregate function

Oracle, SQL Server, and DB2 follow this rule as well.

This article uses MySQL 5.7, which enforces the rule by default.

Some earlier versions of MySQL do not enforce it by default, so select can be followed by any valid column.
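As a small illustration of the two rules, here is a sketch of a query that satisfies both. It assumes the t_order table (with user_id, price and the_year columns) used in this article's examples; the alias max_price is only illustrative.

```sql
-- Assumes the t_order table (user_id, price, the_year) from this article's examples.
-- user_id and the_year appear after GROUP BY; price is wrapped in an aggregate function.
SELECT user_id, the_year, MAX(price) AS max_price
FROM t_order
GROUP BY user_id, the_year;
```

Note that this returns one row per user per year, so it answers a different question from "the year of each user's largest order".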

Example

Requirement: for each user, get the largest order amount and the year of that order. Output the user id, the maximum amount, and the year. A first attempt is written as follows:

mysql> select
          user_id 用户id, max(price) 最大金额, the_year 年份
        FROM t_order t
        GROUP BY t.user_id;
ERROR 1055 (42000): Expression #3 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'javacode2018.t.the_year' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

The SQL above fails because the_year does not satisfy either of the two rules mentioned above (a column after select must appear in the group by or be wrapped in an aggregate function). This rule is enforced through sql_mode. Let's take a look at the sql_mode configuration:

mysql> select @@sql_mode;
+-------------------------------------------------------------------------------------------------------------------------------------------+
| @@sql_mode                                                                                                                                |
+-------------------------------------------------------------------------------------------------------------------------------------------+
| ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION |
+-------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

sql_mode includes ONLY_FULL_GROUP_BY, which means that the columns after select must satisfy the two rules mentioned above.

If you remove ONLY_FULL_GROUP_BY, any column can be added after select. Let's take a look at the effect.

Modify the my.ini file of MySQL:

sql_mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
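For a quick experiment, the same mode list can also be applied to the current connection only, without editing the file. This is a minimal sketch that reuses the MySQL 5.7 mode list shown above; on other versions the list of valid modes may differ (NO_AUTO_CREATE_USER, for example, does not exist in MySQL 8.0).

```sql
-- Applies only to the current session; new connections keep the server default.
-- The mode list is copied from this article's my.ini line (MySQL 5.7).
SET SESSION sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';

-- Confirm that ONLY_FULL_GROUP_BY is no longer listed for this session.
SELECT @@SESSION.sql_mode;
```

A session-level change needs no restart and disappears when the connection closes. The my.ini change above applies to the whole server and takes effect after a restart.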

Restart MySQL and run the query again. The result is as follows:

mysql> select
          user_id 用户id, max(price) 最大金额, the_year 年份
        FROM t_order t
        GROUP BY t.user_id;
+----------+--------------+--------+
| 用户id   | 最大金额     | 年份   |
+----------+--------------+--------+
|     1001 |        88.88 |   2017 |
|     1002 |        44.44 |   2018 |
|     1003 |        66.66 |   2018 |
+----------+--------------+--------+
3 rows in set (0.03 sec)

Look at the result above: the year shown for the first row, with amount 88.88, is 2017. Now let's take a look at the original data:

mysql> select * from t_order;
+----+---------+---------------+-------+----------+
| id | user_id | user_name     | price | the_year |
+----+---------+---------------+-------+----------+
|  1 |    1001 | 路人甲Java    | 11.11 |     2017 |
|  2 |    1001 | 路人甲Java    | 22.22 |     2018 |
|  3 |    1001 | 路人甲Java    | 88.88 |     2018 |
|  4 |    1002 | 刘德华        | 33.33 |     2018 |
|  5 |    1002 | 刘德华        | 12.22 |     2018 |
|  6 |    1002 | 刘德华        | 16.66 |     2018 |
|  7 |    1002 | 刘德华        | 44.44 |     2019 |
|  8 |    1003 | 张学友        | 55.55 |     2018 |
|  9 |    1003 | 张学友        | 66.66 |     2019 |
+----+---------+---------------+-------+----------+
9 rows in set (0.00 sec)
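To reproduce the examples locally, the table above can be recreated with a script along these lines. The rows are copied from the output above. The column types, the primary key, and the character set are assumptions, since the article does not show the table definition.

```sql
-- Hypothetical schema: types, key and charset are assumptions inferred from the output above.
CREATE TABLE t_order (
  id        INT PRIMARY KEY,
  user_id   INT NOT NULL,
  user_name VARCHAR(64) NOT NULL,
  price     DECIMAL(10, 2) NOT NULL,
  the_year  SMALLINT NOT NULL
) DEFAULT CHARSET = utf8mb4;

-- Sample rows, exactly as listed in the query output above.
INSERT INTO t_order (id, user_id, user_name, price, the_year) VALUES
  (1, 1001, '路人甲Java', 11.11, 2017),
  (2, 1001, '路人甲Java', 22.22, 2018),
  (3, 1001, '路人甲Java', 88.88, 2018),
  (4, 1002, '刘德华',     33.33, 2018),
  (5, 1002, '刘德华',     12.22, 2018),
  (6, 1002, '刘德华',     16.66, 2018),
  (7, 1002, '刘德华',     44.44, 2019),
  (8, 1003, '张学友',     55.55, 2018),
  (9, 1003, '张学友',     66.66, 2019);
```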

Comparing the grouping result with the original data: the row with user_id=1001 and price=88.88 is the third record, whose the_year is 2018, but the grouping result shows 2017, which is not what we expected. For a column that breaks the rule, MySQL does not tie the value to the row that holds the maximum. Which value it returns is not guaranteed, and here it happened to come from the first row of the group.
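When any value from the group really is acceptable, MySQL 5.7.5 and later provide the ANY_VALUE() function, which makes that choice explicit while ONLY_FULL_GROUP_BY stays enabled. The sketch below only illustrates the function. It does not solve the requirement in this example, because the returned year is still not tied to the maximum price. The aliases are illustrative.

```sql
-- ANY_VALUE() tells MySQL that an arbitrary the_year from each group is acceptable.
-- The query runs under ONLY_FULL_GROUP_BY, but some_year may not belong to the max_price row.
SELECT user_id,
       MAX(price)          AS max_price,
       ANY_VALUE(the_year) AS some_year
FROM t_order
GROUP BY user_id;
```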

Two correct ways to write the query are given below:

mysql> SELECT
          user_id 用户id,
          price 最大金额,
          the_year 年份
        FROM
          t_order t1
        WHERE
          (t1.user_id , t1.price)
          IN
          (SELECT
             t.user_id, MAX(t.price)
           FROM
             t_order t
           GROUP BY t.user_id);
+----------+--------------+--------+
| 用户id   | 最大金额     | 年份   |
+----------+--------------+--------+
|     1001 |        88.88 |   2018 |
|     1002 |        44.44 |   2019 |
|     1003 |        66.66 |   2019 |
+----------+--------------+--------+
3 rows in set (0.00 sec)

mysql> SELECT
          user_id 用户id,
          price 最大金额,
          the_year 年份
        FROM
          t_order t1,(SELECT
                        t.user_id uid, MAX(t.price) pc
                      FROM
                        t_order t
                      GROUP BY t.user_id) t2
        WHERE
          t1.user_id = t2.uid
        AND  t1.price = t2.pc;
+----------+--------------+--------+
| 用户id   | 最大金额     | 年份   |
+----------+--------------+--------+
|     1001 |        88.88 |   2018 |
|     1002 |        44.44 |   2019 |
|     1003 |        66.66 |   2019 |
+----------+--------------+--------+
3 rows in set (0.00 sec)

The first form above is relatively uncommon: it uses in with a multi-column comparison.
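The multi-column form of in also accepts a literal list of value pairs, which makes it easy to try on its own. A minimal sketch against the same t_order table; the pairs are picked arbitrarily for illustration.

```sql
-- Each pair is compared as a whole: a row matches only if BOTH columns equal one of the pairs.
-- With the sample data in this article, this should match the rows with id 2, 3 and 7.
SELECT id, user_id, price, the_year
FROM t_order
WHERE (user_id, the_year) IN ((1001, 2018), (1002, 2019));
```

One thing to keep in mind for both correct forms above: if a user has several orders that share the same maximum price, each of those orders is returned.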

Suggestion: when writing a grouping query, follow the standard rule. Every column that appears after select must either be in the group by or be wrapped in an aggregate function.

Summary

When writing a grouping query, follow the standard rule: columns appearing after select must be in the group by or be wrapped in an aggregate function.

Clause order in a select statement: select, from, where, group by, having, order by, limit. The order cannot be changed, otherwise an error is reported.
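A sketch that uses every clause in the required order, again against the t_order table from this article. The filter values and the alias order_count are arbitrary choices for illustration.

```sql
SELECT user_id, COUNT(*) AS order_count   -- select
FROM t_order                              -- from
WHERE the_year >= 2018                    -- where: filters rows before grouping
GROUP BY user_id                          -- group by
HAVING COUNT(*) >= 2                      -- having: filters groups after grouping
ORDER BY order_count DESC                 -- order by
LIMIT 10;                                 -- limit
```

Swapping any two clauses, for example placing having before group by, makes MySQL reject the statement with a syntax error.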

in can be used with multi-column queries; try it out yourself.