Don't use DISTINCT and GROUP BY in the same SELECT.
Don't paginate via OFFSET; instead, "remember where you left off".
WHERE (a,b) = (22,33) does not optimize at all.
Explicitly say ALL or DISTINCT after UNION -- it reminds you to pick between the faster ALL and the slower DISTINCT.
Don't use SELECT *, especially if you have TEXT or BLOB columns that you don't need. There is overhead in tmp tables and transmission.
It is faster when the GROUP BY and ORDER BY can have exactly the same list.
Don't use FORCE INDEX; it may help today, but will probably hurt tomorrow.
See also discussions about ORDER BY, LIKE, REGEXP, etc. Note: this needs editing with links and more Topics.
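A minimal sketch of the "remember where you left off" approach to pagination mentioned in the list above. The table `articles` and its columns are hypothetical, and the literal 12345 stands in for the last `id` the application saw on the previous page.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE articles (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    title VARCHAR(200) NOT NULL,
    PRIMARY KEY (id)
);

-- OFFSET pagination: the server must read and discard 10000 rows first.
SELECT id, title
FROM articles
ORDER BY id
LIMIT 10 OFFSET 10000;

-- "Remember where you left off": start right after the last id already shown.
-- 12345 is a placeholder for the last id from the previous page.
SELECT id, title
FROM articles
WHERE id > 12345
ORDER BY id
LIMIT 10;
```

The second form relies on a unique, indexed ordering column (here the PRIMARY KEY), so the cost of fetching a page should not grow with the page number.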
This is a huge topic, but it is also the most important "performance" issue.
The main lesson for a novice is to learn about "composite" indexes. Here's a quick example:
INDEX(last_name, first_name)
is excellent for these:
WHERE last_name = '...'
WHERE first_name = '...' AND last_name = '...' -- (order in WHERE does not matter)
but not for
WHERE first_name = '...' -- order in INDEX _does_ matter
WHERE last_name = '...' OR first_name = '...' -- "OR" is a killer
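The composite-index rules above can be tried out with something like the following sketch. It assumes the index under discussion is `INDEX(last_name, first_name)`; the table `people`, the index name, and the sample values are hypothetical. What `EXPLAIN` reports depends on the server version and the data, so no output is shown.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE people (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    last_name VARCHAR(50) NOT NULL,
    first_name VARCHAR(50) NOT NULL,
    PRIMARY KEY (id),
    INDEX idx_last_first (last_name, first_name)
);

-- Can use the index: the leftmost column is tested.
EXPLAIN SELECT id FROM people WHERE last_name = 'Smith';

-- Can use both index columns; the order in WHERE does not matter.
EXPLAIN SELECT id FROM people
WHERE first_name = 'Ann' AND last_name = 'Smith';

-- Cannot look rows up via the index: the leftmost column (last_name)
-- is not tested. At best the server scans the whole index.
EXPLAIN SELECT id FROM people WHERE first_name = 'Ann';
```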
innodb_buffer_pool_size should be about 70% of available RAM.
Turn x IN ( SELECT ... ) into a JOIN.
When possible, avoid OR.
Do not 'hide' an indexed column in a function, such as WHERE DATE(x) = ...; reformulate as WHERE x = ...
You can generally avoid WHERE LCASE(name1) = LCASE(name2) by having a suitable collation.
Do not use OFFSET for "pagination"; instead, 'remember where you left off'.
Do not use SELECT * ... (unless debugging).
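As a sketch of the `x IN ( SELECT ... )` item in the list above, assuming the intended rewrite is a JOIN: the tables `authors` and `books` and the value 'FR' are hypothetical. Newer MySQL versions may perform a similar transformation on their own, so treat this as an illustration of the idea rather than a guaranteed speedup.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE authors (
    id INT UNSIGNED NOT NULL PRIMARY KEY,
    country CHAR(2) NOT NULL,
    INDEX idx_country (country)
);
CREATE TABLE books (
    id INT UNSIGNED NOT NULL PRIMARY KEY,
    author_id INT UNSIGNED NOT NULL,
    title VARCHAR(200) NOT NULL,
    INDEX idx_author (author_id)
);

-- Subquery form.
SELECT b.id, b.title
FROM books AS b
WHERE b.author_id IN ( SELECT a.id FROM authors AS a WHERE a.country = 'FR' );

-- JOIN form. It returns the same rows because authors.id is unique,
-- so the JOIN cannot duplicate a book.
SELECT b.id, b.title
FROM books AS b
JOIN authors AS a ON a.id = b.author_id
WHERE a.country = 'FR';
```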
Here are some things that are not likely to help performance. They stem from out-of-date information and/or naivety.
PARTITIONing rarely provides performance benefits; it can even hurt performance.
Setting query_cache_size bigger than 100M will usually hurt performance.
Increasing lots of values in my.cnf may lead to 'swapping', which is a serious performance problem.
"Prefix indexes" (such as INDEX(foo(20))) are generally useless.
OPTIMIZE TABLE is almost always useless. (And it involves locking the table.)
The most important thing for speeding up a query on any non-tiny table is to have a suitable index.
WHERE a = 12 --> INDEX(a)
WHERE a > 12 --> INDEX(a)
WHERE a = 12 AND b > 78 --> INDEX(a,b) is more useful than INDEX(b,a)
WHERE a > 12 AND b > 78 --> INDEX(a) or INDEX(b); no way to handle both ranges
ORDER BY x --> INDEX(x)
ORDER BY x, y --> INDEX(x,y) in that order
ORDER BY x DESC, y ASC --> No index helps before MySQL 8.0 (which added descending indexes), because of mixing ASC and DESC
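Two of the mappings above, written out as a sketch. The table `t`, its columns, and the index names are hypothetical. The optimizer makes the final choice based on statistics, so `EXPLAIN` is used here only to inspect what it picks.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE t (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    a INT NOT NULL,
    b INT NOT NULL,
    x INT NOT NULL,
    y INT NOT NULL,
    PRIMARY KEY (id),
    INDEX idx_a_b (a, b),   -- equality column first, then the range column
    INDEX idx_x_y (x, y)    -- same column order as ORDER BY x, y
);

-- Equality on a, range on b: idx_a_b can narrow on both columns.
EXPLAIN SELECT id FROM t WHERE a = 12 AND b > 78;

-- Sorting in index order: idx_x_y can avoid a separate sort step.
EXPLAIN SELECT x, y FROM t ORDER BY x, y;
```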
A common mistake is to hide an indexed column inside a function call. For example, this can't be helped by an index:
WHERE DATE(dt) = '2000-01-01'
If you have INDEX(dt), then these may use the index:
WHERE dt = '2000-01-01' -- if `dt` is datatype `DATE`
This works for DATE, DATETIME, TIMESTAMP, and even DATETIME(6):
WHERE dt >= '2000-01-01' AND dt < '2000-01-01' + INTERVAL 1 DAY
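A side-by-side sketch of the hidden-column problem and the range rewrite described above. The table `events` and the index name are hypothetical.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE events (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    dt DATETIME NOT NULL,
    PRIMARY KEY (id),
    INDEX idx_dt (dt)
);

-- The function call hides dt, so idx_dt cannot be used to narrow the search.
EXPLAIN SELECT id FROM events WHERE DATE(dt) = '2000-01-01';

-- Half-open range on the bare column: idx_dt can be used.
EXPLAIN SELECT id FROM events
WHERE dt >= '2000-01-01'
  AND dt <  '2000-01-01' + INTERVAL 1 DAY;
```

The half-open range (`>=` start, `<` next day) avoids having to spell out the last instant of the day.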
OR in general kills optimization.
WHERE a = 12 OR b = 78
cannot use INDEX(a,b), and may or may not use INDEX(a), INDEX(b) via "index merge". Index merge is better than nothing, but only barely.
WHERE x = 3 OR x = 5
is turned into
WHERE x IN (3, 5)
which may use an index with x in it.
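For the `WHERE a = 12 OR b = 78` case above, one commonly used workaround, not taken from this text, is to split the query into a UNION so that each branch can use its own index. A sketch with a hypothetical table `t2`:

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE t2 (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    a INT NOT NULL,
    b INT NOT NULL,
    PRIMARY KEY (id),
    INDEX idx_a (a),
    INDEX idx_b (b)
);

-- OR across two different columns: at best "index merge".
EXPLAIN SELECT id FROM t2 WHERE a = 12 OR b = 78;

-- Rewritten as a UNION: each branch can use its own single-column index.
-- UNION DISTINCT removes rows that satisfy both conditions.
EXPLAIN
SELECT id FROM t2 WHERE a = 12
UNION DISTINCT
SELECT id FROM t2 WHERE b = 78;
```

Whether this is actually faster depends on the data; compare the two plans before committing to the rewrite.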
Subqueries come in several flavors, and they have different optimization potential. First, note that subqueries can be either "correlated" or "uncorrelated". Correlated means that they depend on some value from outside the subquery. This generally implies that the subquery must be re-evaluated for each outer value.
This correlated subquery is often pretty good. Note: It must return at most 1 value. It is often useful as an alternative to, though not necessarily faster than, a LEFT JOIN.
SELECT a, b, ( SELECT ... FROM t WHERE t.x = u.x ) AS c FROM u ...
SELECT a, b, ( SELECT MAX(x) ... ) AS c FROM u ...
SELECT a, b, ( SELECT x FROM t ORDER BY ... LIMIT 1 ) AS c FROM u ...
This is usually uncorrelated:
SELECT ... FROM ( SELECT ... ) AS a JOIN b ON ...
Notes on the FROM ( SELECT ... ) form:
A good paradigm is for the subquery to be ( SELECT @n := 0 ), thereby initializing an @variable for use in the rest of the query.
If the subquery returns many rows and is JOINed to another ( SELECT ... ) with many rows, then efficiency can be terrible. Pre-5.6, there was no index, so it became a CROSS JOIN; 5.6+ involves deducing the best index on the temp tables and then generating it, only to throw it away when finished with the SELECT.
A common problem that leads to an inefficient query goes something like this:
SELECT ... FROM a JOIN b ON ... WHERE ... GROUP BY a.id
First, the JOIN expands the number of rows; then the GROUP BY whittles it back down to the number of rows in a.
There may not be any good choices to solve this explode-implode problem. One possible option is to turn the JOIN into a correlated subquery in the SELECT. This also eliminates the GROUP BY.
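A sketch of the explode-implode pattern and the correlated-subquery alternative described above. The column definitions for `a` and `b` are hypothetical, and the count is only one example of what the subquery might compute.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE a (
    id INT UNSIGNED NOT NULL PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);
CREATE TABLE b (
    id INT UNSIGNED NOT NULL PRIMARY KEY,
    a_id INT UNSIGNED NOT NULL,
    INDEX idx_a_id (a_id)
);

-- Explode-implode: the JOIN multiplies rows, then GROUP BY collapses them.
SELECT a.id, COUNT(*) AS b_count
FROM a
JOIN b ON b.a_id = a.id
GROUP BY a.id;

-- Correlated subquery in the SELECT list: one row per row of a, no GROUP BY.
-- Note: unlike the JOIN above, rows of a with no match in b
-- appear here with a count of 0.
SELECT a.id,
       ( SELECT COUNT(*) FROM b WHERE b.a_id = a.id ) AS b_count
FROM a;
```

The two queries differ for rows of `a` that have no match in `b`, so check which behavior the application needs before swapping one for the other.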