Runaway history list

On one of the clusters at Spil we noticed a sudden increase in the length of the history list and a steep growth of the ibdata file in the MySQL directory.
I posted about this topic earlier for MySQL 5.5, but this cluster is still running 5.1, and unfortunately 5.1 does not have the same configurable options to influence the purging of the undo log…
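To keep an eye on the history list, one option is to pull the "History list length" line out of the SHOW ENGINE INNODB STATUS output. The snippet below is a minimal sketch of that, not the monitoring we actually run: the function name and the sample text are made up for illustration, and fetching the status text from the server is left out.

```python
import re

# The TRANSACTIONS section of SHOW ENGINE INNODB STATUS contains a line
# of the form "History list length <number>".
_HISTORY_RE = re.compile(r"^History list length\s+(\d+)\s*$", re.MULTILINE)


def history_list_length(innodb_status_text):
    """Return the history list length as an int, or None if the line is absent."""
    match = _HISTORY_RE.search(innodb_status_text)
    return int(match.group(1)) if match else None


# Hypothetical excerpt of the status output, for illustration only.
sample = """\
------------
TRANSACTIONS
------------
Trx id counter 0 80157601
History list length 1234567
"""

print(history_list_length(sample))
```

Graphing this number over time makes a runaway history list visible long before the ibdata file has grown out of hand.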

History list

Now I did find a couple of great resources that explain the purge lag problem in detail: Pythian, DimitriK and Marco Tusa.

What it boils down to is that the delay InnoDB imposes on writes is largely determined by the length of the history list (the purge lag) and the configured innodb_max_purge_lag:
((purge_lag/innodb_max_purge_lag)×10)–5 milliseconds.
On 5.5 purging is also influenced by the number of purge threads and the purge batch size. I toyed around with these settings in my earlier post and tuning them helped. However, the only setting I could change on 5.1 is the maximum purge lag (innodb_max_purge_lag), which was already set to 0. In other words: I could not fiddle around with this. This time it wasn’t an upgrade to 5.5 either, so I could not blame that again. 😉
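To get a feel for the formula above, here is a small sketch that plugs in numbers. The function name is mine, and treating a setting of 0, or a history list shorter than the setting, as "no delay" is my reading of how the option behaves, so take it as an assumption rather than a copy of the InnoDB code.

```python
def purge_delay_ms(history_list_length, innodb_max_purge_lag):
    """Delay in milliseconds: ((purge_lag / innodb_max_purge_lag) * 10) - 5."""
    if innodb_max_purge_lag <= 0:
        # Assumption: 0 means the delay mechanism is switched off.
        return 0.0
    ratio = history_list_length / innodb_max_purge_lag
    if ratio <= 1:
        # Assumption: no delay while the history list is within the limit.
        return 0.0
    return ratio * 10 - 5


# With the setting at 0, as on our server, nothing is ever delayed.
print(purge_delay_ms(2_000_000, 0))
# A history list twice as long as the configured maximum.
print(purge_delay_ms(2_000_000, 1_000_000))
```

The second call shows why the setting matters: the further the history list runs ahead of the configured maximum, the longer writes are held back so the purge can catch up.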

So what was different on this server then? Well, the only difference was that it did have “a bit” of disk utilization: around 80% during peak hours. Since it is not used as a front end server this does not affect the end users, only the (background) job processes that process and store data on this server. However, it could be the case that due to the IO utilization the purge started to lag behind and the history list grew too large for it to catch up with under its current configuration.

How did we resolve it then? After I read this quote from Peter Zaitsev in Marco Tusa’s posting the solution became clear:

Running Very Long Transaction If you’re running very long transaction, be it even SELECT, Innodb will be unable to purge records for changes which are done after this transaction has started, in default REPEATABLE-READ isolation mode. This means very long transactions are very bad causing a lot of garbage to be accommodated in the database. It is not limited to undo slots. When we’re speaking about Long Transactions the time is a bad measure. Having transaction in read only database open for weeks does no harm, however if database has very high update rate, say 10K+ rows are modified every second even 5 minute transaction may be considered long as it will be enough to accumulate about 3 million of row changes.
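The arithmetic in that quote is easy to check. A quick sketch (the function name is just for illustration):

```python
def row_changes_accumulated(rows_per_second, transaction_seconds):
    """Row changes that cannot be purged while a transaction stays open."""
    return rows_per_second * transaction_seconds


# 10K rows modified per second during a 5 minute transaction.
print(row_changes_accumulated(10_000, 5 * 60))  # 3000000
```

So at that update rate a single 5 minute transaction already holds back about 3 million row changes from being purged.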

The transaction isolation is set to REPEATABLE-READ by default and we favor it on many of our systems, especially because it performs better than READ-COMMITTED. However, a storage server that only runs background jobs does not need this transaction isolation, especially not if it was blocking the purge from being performed!

So in the end changing the transaction isolation to READ-COMMITTED did the job for us.
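For a job that should only change its own connection rather than the whole server, the statement can be issued per session. The helper below is a hypothetical sketch that only builds the SQL text (SET ... TRANSACTION ISOLATION LEVEL is standard MySQL syntax); running it through your database driver of choice is left out, and this is not necessarily how we applied the change ourselves.

```python
VALID_LEVELS = (
    "READ UNCOMMITTED",
    "READ COMMITTED",
    "REPEATABLE READ",
    "SERIALIZABLE",
)
VALID_SCOPES = ("SESSION", "GLOBAL")


def set_isolation_sql(level, scope="SESSION"):
    """Build a SET ... TRANSACTION ISOLATION LEVEL statement.

    Accepts both the SQL spelling ("READ COMMITTED") and the
    option-file spelling ("READ-COMMITTED").
    """
    level = level.upper().replace("-", " ")
    scope = scope.upper()
    if level not in VALID_LEVELS:
        raise ValueError("unknown isolation level: %s" % level)
    if scope not in VALID_SCOPES:
        raise ValueError("unknown scope: %s" % scope)
    return "SET %s TRANSACTION ISOLATION LEVEL %s" % (scope, level)


print(set_isolation_sql("READ-COMMITTED"))
```

Setting it per session keeps REPEATABLE-READ as the default for everything else that connects to the server.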

Some other things: tomorrow my team is attending the MySQL User Group NL and in three weeks’ time I’ll be speaking at Percona London:
Percona Live London, December 3-4, 2012
So see you there!