Mastering the Sort Buffer and sort_buffer_size
October 2, 2015 |
The sort buffer, which is controlled by the
sort_buffer_size setting, is probably one of the most-discussed MySQL settings. Interestingly, in many cases it has a lot less potential to help performance than to hurt it, so configuring it is often about avoiding trouble rather than gaining a lot of performance.
The variable controls the size of a buffer that’s created whenever MySQL has to sort rows. It is per-query, meaning each query gets its own buffer, and it’s allocated to its full size, not as-much-as-needed. This makes large settings potentially dangerous.
The worst abuse of this variable we’ve seen came from a server that was tuned with a script. The script relied on a naive formula that looked at a simplistic ratio of some server variables. Due to the server’s workload the script was never satisfied and continually suggested increasing this variable, which eventually was set to 1GB. The effect was that every time the server needed to sort a few rows, 1GB of memory was allocated. As you might expect, this was not only slow, but crashed the server pretty frequently due to the out-of-memory killer being invoked.
What if the variable is too small? This matters most when there’s a large number of rows to sort (the type of situation you might see with a large analytical query). In this case, the server will generate rows and write them to the buffer in memory, fill the buffer, then write the buffer to a sort file and repeat. The server then sorts these temporary files and merges them to sort the whole result-set. If this happens many times, it can be slow.
This variable can be set per-connection temporarily if such a large query is anticipated, and that’s probably better than setting it large globally. Settings that help large queries can seriously hurt small, fast queries.
You can find out whether a query is potentially a candidate for a larger sort buffer by looking at the number of rows sorted or the sort merge passes required. Database Performance Monitor can use regression to estimate this on a per-query basis (MySQL makes it available as a global status counter, not per-query, unless you are using the Performance Schema in MySQL 5.6 and later). If you identify queries that might benefit from a larger sort buffer, you can then try some experiments by hand. You can look at the
Sort_merge_passeslocal (not global) status counter before and after running the query, and tune and time until you determine whether the setting helps.
If you do configure a large per-connection setting, be sure that you either close the connection or reset the variable before, for example, returning the connection to a connection pool where the setting might impact other queries.