GROUP BY, or at least to make it harder for query writing developers to be burned by it.
For a long time now, MySQL has contained a notorious nonstandard extension to
GROUP BY, which allows oddball behavior in the name of efficiency. This extension has allowed countless developers around the world to use
GROUP BY in production code without completely understanding what they were doing.
In particular, it's a bad idea to use
SELECT * in a
GROUP BY query, because a standard
GROUP BY clause requires enumerating the columns. Many developers have, unfortunately, done that.
The MySQL team has been trying to fix this misfeature without messing up production code. They added a
sql_mode flag in 5.7.5 named
ONLY_FULL_GROUP_BY to compel standard behavior. In a recent release, they turned on that flag by default. When you upgraded your local MySQL to 5.7.14, the flag got switched on and your production code, dependent on the old extension, stopped working.
If you've recently started getting 1055 errors, what are your choices?
sql_modeto get rid of the newly set
You can change the mode by doing a
SET sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'
should do the trick if you do it right after your application connects to MySQL.
Or, you can find the init file in your MySQL installation, locate the
sql_mode= line, and change it to omit
ONLY_FULL_GROUP_BY, and restart your server.
SELECT item.item_id, item.name, /* not SQL-92 */ COUNT(*) number_of_uses FROM item JOIN uses ON item.item_id, uses.item_id GROUP BY item.item_id
will show the rows in a table called
item, and show the count of related rows in a table called
uses. This works well, but unfortunately it's not standard SQL-92.
Why not? because the
SELECT clause (and the
ORDER BY clause) in
GROUP BY queries must contain columns that are
GROUP BYclause, or
MIN(), and the like.
SELECT clause mentions
item.name, a column that does not meet either of those criteria. MySQL 5.6 and earlier will reject this query if the SQL mode contains
This example query can be made to comply with the SQL-92 standard by changing the
GROUP BY clause, like this.
SELECT item.item_id, item.name, COUNT(*) number_of_uses FROM item JOIN uses ON item.item_id, uses.item_id GROUP BY item.item_id, item.name
The later SQL-99 standard allows a
SELECT statement to omit unaggregated columns from the group key if the DBMS can prove a functional dependence between them and the group key columns. Because
item.name is functionally dependent on
item.item_id, the initial example is valid SQL-99. MySQL gained a functional dependence prover in version 5.7. The original example works under
SELECT item.item_id, uses.category, /* nonstandard */ COUNT(*) number_of_uses FROM item JOIN uses ON item.item_id, uses.item_id GROUP BY item.item_id
will show the rows in a table called item, and show the count of related rows in a table called uses. It will also show the value of a column called
This query works in MySQL (before the
ONLY_FULL_GROUP_BY flag appeared). It uses MySQL's nonstandard extension to
But the query has a problem: if several rows in the
uses table match the
ON condition in the
JOIN clause, MySQL returns the
category column from just one of those rows. Which row? The writer of the query, and the user of the application, doesn't get to know that in advance. Formally speaking, it's unpredictable: MySQL can return any value it wants.
Unpredictable is like random, with one significant difference. One might expect a random choice to change from time to time. Therefore, if a choice were random, you might detect it during debugging or testing. The unpredictable result is worse: MySQL returns the same result each time you use the query, until it doesn't. Sometimes it's a new version of the MySQL server that causes a different result. Sometimes it's a growing table causing the problem. What can go wrong, will go wrong, and when you don't expect it. That's called Murphy's Law.
The MySQL team has been working to make it harder for developers to make this mistake. Newer versions of MySQL in the 5.7 sequence have a
sql_mode flag called
ONLY_FULL_GROUP_BY. When that flag is set, the MySQL server returns the 1055 error and refuses to run this kind of query.
Sometimes a query looks like this, with a
* in the
SELECT item.*, /* nonstandard */ COUNT(*) number_of_uses FROM item JOIN uses ON item.item_id, uses.item_id GROUP BY item.item_id
Such a query needs to be refactored to comply with the
To do this, we need a subquery that uses
GROUP BY correctly to return the
number_of_uses value for each
item_id. This subquery is short and sweet, because it only needs to look at the
SELECT item_id, COUNT(*) number_of_uses FROM uses GROUP BY item_id
Then, we can join that subquery with the
SELECT item.*, usecount.number_of_uses FROM item JOIN ( SELECT item_id, COUNT(*) number_of_uses FROM uses GROUP BY item_id ) usecount ON item.item_id = usecount.item_id
This allows the
GROUP BY clause to be simple and correct, and also allows us to use the
Note: nevertheless, wise developers avoid using the
* specifier in any case. It's usually better to list the columns you want in a query.
SELECT item.item_id, ANY_VALUE(uses.tag) tag, COUNT(*) number_of_uses FROM item JOIN uses ON item.item_id, uses.item_id GROUP BY item.item_id
shows the rows in a table called
item, the count of related rows, and one of the values in the related table called
You can think of this
ANY_VALUE() function as a strange a kind of aggregate function. Instead of returning a count, sum, or maximum, it instructs the MySQL server to choose, arbitrarily, one value from the group in question. It's a way of working around Error 1055.
Be careful when using
ANY_VALUE() in queries in production applications.
It really should be called
SURPRISE_ME(). It returns the value of some row in the GROUP BY group. Which row it returns is indeterminate. That means it's entirely up to the MySQL server. Formally, it returns an unpredictable value.
The server doesn't choose a random value, it's worse than that. It returns the same value every time you run the query, until it doesn't. It can change, or not, when a table grows or shrinks, or when the server has more or less RAM, or when the server version changes, or when Mars is in retrograde (whatever that means), or for no reason at all.
You have been warned.