The ProxySQL multiplexing wild goose chase

TL;DR – We encountered multiplexing issues with ProxySQL and, after going around in circles for a while, we found that the impact of mysql-auto_increment_delay_multiplex and mysql-connection_delay_multiplex_ms was not documented. Read on to find out why you should check these values in your configuration and how your codebase handles last insert identifiers!

At my present company we use a multi-layer ProxySQL setup to route our traffic to the appropriate database cluster’s primary or replica hosts. For this post it doesn’t matter whether you run a single- or multi-layer setup, or how many backend servers you have configured, so I’ll simplify our architecture to a single ProxySQL layer where the application connects to all three proxies evenly:

proxysql-multiplexing-wild-goose-chase-topology

The reason for having N+2 proxies is that it ensures we retain high availability even after the failure of a single node. I’ve also added a fourth (canary) node and sent a small amount of traffic to it to see how any change would affect multiplexing.

One of the key features of ProxySQL is that it’s able to multiplex connections. In short: ProxySQL will attempt to send queries to your database servers over idle backend connections. In effect, queries from separate incoming connections can be sent to the master over as few backend connections as possible, lowering the connection overhead on your database servers. It also means queries from the same incoming connection may be multiplexed over multiple backend connections. The limitation is that you can’t multiplex queries within a transaction, nor when you hold locks, as these require the frontend connection to stick with the same backend connection. See also the ProxySQL multiplexing documentation for every condition under which multiplexing will be (temporarily) disabled.
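
To make this concrete, here is a minimal sketch (the table and lock names are made up for illustration) of statements that pin a frontend connection to a single backend connection:

BEGIN;                                        -- an open transaction disables multiplexing
SELECT * FROM orders WHERE id = 1 FOR UPDATE; -- row lock held inside the transaction
COMMIT;                                       -- multiplexing can resume afterwards
SELECT GET_LOCK('nightly_batch', 10);         -- a named lock also pins the connection
SELECT RELEASE_LOCK('nightly_batch');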

Problems encountered with high connection counts

After migrating a single database into multiple shards, we found that ProxySQL was hardly multiplexing connections at all, and with every shard adding more connections this wasn’t a scalable solution. The number of ingress connections was about 600 per shard, while the number of egress connections stuck around 400 per shard. That put the egress-to-ingress ratio at about 66%, which is not a good sign when ProxySQL is supposed to be multiplexing; a good and effective ratio would be more along the lines of 5%. We were certain ProxySQL had been multiplexing on other hostgroups before, as the ratio was far more favorable there.
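
For reference, these frontend and backend connection counts can be read from ProxySQL’s stats interface; a minimal sketch:

-- on the ProxySQL admin/stats interface (typically port 6032)
SELECT variable_name, variable_value
FROM stats_mysql_global
WHERE variable_name IN ('Client_Connections_connected',
                        'Server_Connections_connected');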

proxysql-multiplexing-wild-goose-chase-first-layer-high-connections

Incoming and outgoing ProxySQL connections are ridiculously high

Why is this a big issue then? For us, having more than 2,000 active frontend connections and about 1,400 backend connections meant ProxySQL was using a lot of CPU time. On a 4-core machine we noticed our CPUs were over 70% busy all the time, which meant we could no longer afford to lose a single ProxySQL node. To stay within safe margins we first upgraded all ProxySQL hosts to 8 cores, which kept the CPUs in the region of 40% to 50%. Adding a new shard would increase the incoming connections by 600 and the backend connections by 400. That would only allow us to add maybe one or two shards before we would once again be unable to lose a ProxySQL node and would have to upgrade once more. Adding more ProxySQL hosts would work, but as we have a multi-layer ProxySQL architecture we would need to scale all layers to keep up with the incoming connections on the other layers as well. In other words: our sharded environment faced a new challenge, as scaling out was no longer an (easy) option.

If you search for ProxySQL and multiplexing, the top results are the ProxySQL multiplexing documentation and a few blog posts describing the wonders of ProxySQL multiplexing. The documentation describes a few conditions under which multiplexing is disabled, so naturally we examined whether our application was triggering any of them. We examined the general log, the query digest inside ProxySQL and tcpdumps, and concluded it wasn’t triggering any of these conditions, so theoretically multiplexing should have been functioning. But as we had already established from the high incoming and backend connection counts, it clearly wasn’t. Our first dead end.

Investigating ProxySQL bug reports

Then we started to investigate ProxySQL bug reports to see if anything matched there. At first we found issue #1897, where multiplexing was erroneously disabled, and there were a few other solved issues that hinted we should try to upgrade. However, upgrading a production environment without knowing the impact is never favorable, so instead we added a fourth ProxySQL host and sent 1% of the traffic towards it. This allowed us to easily upgrade the host to various ProxySQL versions and see whether this would resolve the issue. Unfortunately none of the upgrades resolved our multiplexing issues. Another dead end.

We did notice the set names metric in ProxySQL was increasing fast, which led us to issue #1019, where a multi-layered ProxySQL setup can have issues with set names if the database has a different character set than ProxySQL’s default character set. This was the case for us, but the documentation doesn’t mention that set names influences multiplexing, and the earlier upgrade to 2.0.8 hadn’t resolved our issues. At least we found out why the number of set names was increasing. Another dead end.

Then one of our developers pointed me towards ProxySQL issue #2339, in which user wjordan requests a documentation update on the temporary conditions under which multiplexing is disabled. He describes the mysql-auto_increment_delay_multiplex and mysql-connection_delay_multiplex_ms variables as missing from that page. I had totally ignored this issue when I searched through the ProxySQL issues because the title contained the word documentation. Insert facepalm here.

The two variables are basically a workaround for issue #1048 (Problem with “LAST_INSERT_ID()” returning 0), issue #1421 (Forward SELECTs on LAST_INSERT_ID) and issue #1452 (select last_insert_id() as ID), and are bundled in issue #1828. Ironically, issue #1828 is the source of issue #1897, so I went full circle there! Insert second facepalm here.

How ProxySQL delays multiplexing

So what do these two variables do then? First we have to understand the origin of the problem: with multiplexing enabled, whenever an insert or update causes an auto increment column to increase, a LAST_INSERT_ID() query is not guaranteed to run on the same connection, or right after the insert happened. Either it would be proxied over another backend connection (returning a wrong identifier or 0) or it would be run out of order due to multiplexing (again returning a wrong identifier or 0). These two variables let you stop multiplexing on the connection for a number of queries or a number of milliseconds after an insert or update touches an auto increment column.
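
A minimal sketch of the failure mode (table and column names are made up): without any delay, the second statement may be routed over a different backend connection than the first and return 0 or a stale value.

INSERT INTO orders (customer_id) VALUES (42); -- handled on backend connection A
SELECT LAST_INSERT_ID();                      -- may be multiplexed onto backend connection B and return 0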

But wait! How does ProxySQL detect that an auto increment column increased then? That’s quite simple: on every successful query the MySQL binary protocol returns an OK packet to the client. Whenever an insert or update triggers an auto increment column, this OK packet also contains the last inserted identifier. This is what triggers ProxySQL to stop multiplexing for the configured number of queries or milliseconds.

But wait! Why did the majority of applications still function correctly before ProxySQL 1.4.14 then? That’s also quite simple: most programming languages use the native MySQL drivers. For instance, PHP PDO makes use of mysqlnd, which uses the native MySQL C driver. Just like ProxySQL, mysqlnd reads the OK packet and stores the last inserted identifier internally. So when you use the lastInsertId function in PDO, it retrieves the value that mysqlnd stored internally, without sending a query to the database. In general you can assume database drivers never run a SELECT LAST_INSERT_ID() against the database. However, you should be cautious with some ORMs, like Hibernate, that actually depend on queries like this.

proxysql-multiplexing-wild-goose-chase-mysqlnd-native-driver

mysqlnd stores the last insert identifier internally

The default for mysql-auto_increment_delay_multiplex is 5 and for mysql-connection_delay_multiplex_ms it is 0. So whenever ProxySQL encounters an OK packet with the last inserted identifier set, it disables multiplexing on that connection for 5 consecutive queries. This basically locks the frontend connection to the backend connection for at least 5 queries, to allow an ORM (or application) to run a LAST_INSERT_ID() query on the same connection. New incoming connections then have to use a different connection from the connection pool. Obviously, whenever ProxySQL encounters another OK packet with a last inserted identifier, it resets the counter on that connection to 5 consecutive queries again.
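
You can check what your own ProxySQL instances are running with on the admin interface; a minimal sketch:

SELECT variable_name, variable_value
FROM global_variables
WHERE variable_name IN ('mysql-auto_increment_delay_multiplex',
                        'mysql-connection_delay_multiplex_ms');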

Back to our own problem statement

So why did this happen only to the sharded databases then? The primary key of our main table in the sharded database is a UUID, not an auto increment. There was, however, an auto increment column present on the table to provide a sequence number for our records. We clearly overlooked this in our investigation. There are various reasons why we currently have to keep this column, so removing it wasn’t feasible for now.

Also, our sharded databases have a read/write ratio of around 3:2. That means 2 out of 5 queries will cause an auto increment to trigger and thus lock the frontend connection to a backend connection for at least 5 consecutive queries! With that ratio we will hardly ever multiplex! Once we changed this variable on our canary ProxySQL host we immediately saw a significant drop in backend connections. First we set mysql-auto_increment_delay_multiplex to 0, which caused all 16 incoming connections to be proxied over an average of 0 backend connections! When we changed the value to 1, it averaged over 3 connections!
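
For completeness, this is roughly how the change was applied on the canary’s admin interface (a sketch; the LOAD and SAVE statements make the change active and persistent):

UPDATE global_variables SET variable_value = '0'
WHERE variable_name = 'mysql-auto_increment_delay_multiplex';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;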

proxysql-multiplexing-wild-goose-chase-connection-drop-canary

Drop in active backend connections after setting mysql-auto_increment_delay_multiplex to 0

Before applying this to all other ProxySQL hosts, there was one more hurdle: both variables are global, so they apply to all hostgroups. Once we had established that our codebase never runs a LAST_INSERT_ID() query anywhere, we changed it on all other ProxySQL hosts. The difference is night and day:

proxysql-multiplexing-wild-goose-chase-first-layer-connection-drop

Effect of setting mysql-auto_increment_delay_multiplex to 0 on first ProxySQL layer

And the CPU usage dropped to a more sane 30%:

proxysql-multiplexing-wild-goose-chase-first-layer-cpu-usage-drop

CPU usage decreases dramatically after multiplexing works again

Conclusion

There are clearly three problems with the introduction of mysql-auto_increment_delay_multiplex and mysql-connection_delay_multiplex_ms:

  1. Both variables are global. If you change them, they apply to all ProxySQL hostgroups, servers, users and connections.
  2. The default of mysql-auto_increment_delay_multiplex is 5. This means multiplexing will be disabled, or far less effective, on any write-intensive workload that touches an auto increment column, regardless of whether your application actually uses the LAST_INSERT_ID statement or not.
  3. Both variables were introduced and documented in version 1.4.14, but their impact on multiplexing was never added to the multiplexing documentation.

By the time this blog post goes public I will already have made a change to the ProxySQL multiplexing documentation. I’ve also created a feature request for ProxySQL for the ability to control the delay per server or per user. But after I submitted the feature request I realized this might actually make things worse: even more variables, configurations and query rules to check, and it doesn’t fix the problem at its root. It would be much cleaner for ProxySQL to store the last inserted identifier returned by the OK packet in a variable bound to the frontend connection that made the request. That identifier could then be returned whenever that frontend connection runs a query containing the LAST_INSERT_ID function.

Relearn about your battery

Some days ago I got a call from our support engineer on duty: MySQL on one of our database servers was lagging more than 1,000 seconds behind in replication and the server got kicked out of the pool because of the delay. He was unable to find out why, and there was absolutely nothing in the MySQL log files. When I got the call it was still lagging behind, but the lag was slowly decreasing again.

After a quick peek in all our monitoring systems I isolated it to this message:
Cache Battery 0 in controller 0 is Charging (Ready) [probably harmless]
Apparently not that harmless! 😛

Obviously we had encountered this situation a couple of times before, but apparently there was no detection on this machine.

The relearn cycle happens every 90 days and gets scheduled for the first time when the machine is powered on. Now imagine this happening in a master-master setup where both machines were powered on at the same time. Luckily you can use omconfig to postpone the cycle by up to 7 days, but then you obviously need to have detection in place.

Why did nobody come up with the idea of a dual battery-backed cache with alternating relearn cycles? That way your battery can relearn without the controller falling back into write-through mode. 😉

Percona Monitoring Plugins fix: Query Response Time

After switching from the Better Cacti Templates to the Percona Monitoring Plugins we suffered from the Query Response Time and the Query Time Histogram not graphing at all.

It took me a while to figure out why our new instance wasn’t graphing: the response time query was running fine and the script was picking up its values. The rrd files were also created, but for some bizarre reason all values were set to “NaN”, in rrd/cacti terms: Not a Number. If you search on that subject you will come across a lot of (forum) postings stating you need to change your graph type to “GAUGE” or change the MIN/MAX values of your data templates. Strange, as these were already set to a minimum of 0 and a maximum of an unsigned 64-bit integer.

After running the ss_get_mysql_stats.php script manually for these graphs I got an error stating that a -1 value was not allowed. Indeed, the output of the script contained a -1 value for the last measurement, and I quickly found the culprit: an uninitialized array key inside the script causes it to return -1. Now why was this array key not initialized? Simply because the query filling the array was capped at 13 rows instead of the expected 14 rows.
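
If you suspect the same issue, a quick sanity check is to count the histogram rows yourself; a sketch, assuming Percona Server with the query response time feature enabled:

SELECT COUNT(*) AS histogram_rows
FROM INFORMATION_SCHEMA.QUERY_RESPONSE_TIME;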

This left me with three options:
1. Change the Cacti templates to allow a -1 value
2. Change the Cacti templates to only contain 13 data points instead of 14
3. Change the query in the ss_get_mysql_stats.php script
Naturally I patched the script and after 10 minutes the graphs started to get colorful again! 😉

So if you have the same problem as we did, you can find the patch attached to my bug report:
Query reponse time and Query time histogram not graphing

Running multiple MySQL instances in parallel

I know, I haven’t been posting much lately. The 5.5 upgrades got postponed because the new storage platform needed my immediate attention, and being a speaker at the Percona Live conference in April also demands a lot of attention.

One of the things I want to try out is running multiple MySQL instances on the same machine. The concept has remained in the back of my mind ever since I attended Ryan Thiessen’s presentation at the MySQL Conference 2011, but we never had a proper use case for it. Well, with the new storage platform it would be really beneficial, so it’s an excellent use case to try it out! So what have I been busy with in the past week? That’s right: running multiple MySQL instances on one single server. 😉

Even though it is not well documented and nobody describes the process in depth, it is not that complicated to get multiple instances running next to each other. It does, however, involve a lot of changes in the surrounding tools, scripts and monitoring. For example, this is what I have changed so far:

1. MySQL startup script
Yes, you really want this baby to support multiple instances. I’ve learned my lesson from the wild growth of copies of the various MMM init scripts.

2. Templating of configs
If you want to maintain the instances well you should definitely start using a fixed template which includes a defaults file. In our case I created one defaults file for all instances, and each and every instance overrides the settings of the defaults file. Some tuning sections are now also separated from the main config.

3. Automation of adding new instances to a host
Apart from creating a bunch of config files and a data directory, you really want some intelligence when adding another instance. For example, the innodb_buffer_pool_size needs to be adjusted for each new instance you add.

4. Automation of removing instances from a host
Part of the step above: if you can add instances, then you need to be able to remove an instance as well. This should be done with care as it is destructive. 😉

After this there is still a long list of things to be taken care of:
1. Automation of replication setup
The plan is to keep things simple and have two hosts replicate all instances to each other. So instance 3307 on host 1 will replicate to instance 3307 on host 2 (and back), instance 3308 on host 1 will replicate to instance 3308 on host 2 (and back), and so on (see the sketch after this list).

2. HA monitoring needs rethinking/replacement
I haven’t found an HA monitoring tool that can handle multiple instances on one host.
Why is this a problem?
If only one of the MySQL instances needs maintenance you can’t use the current tools unless you are willing to make all other instances unavailable as well. Also, what will you do when the connection pool of one instance gets exhausted? Or if one instance dies on both servers?

3. Backup scripts need some changes
Obviously our backup tools (wrapper scripts around xtrabackup) need some alteration. We are now running multiple instances, so we need to back up more than one database.

4. Cloning scripts need some changes
We have a script that can clone a live database (utilizing xtrabackup) to a new host. Apart from the fact that it assumes it only needs to clone one single database, we might also go for full cloning of all instances.

5. Monitoring needs to understand multiple instances
Our current (performance) monitoring tools, like Nagios/Cacti/etc., assume only one MySQL instance per host. At best I can implement the templates multiple times, but that also increases the number of other checks by the same factor.
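
As a sketch of the replication setup mentioned in point 1 above (host names, credentials and binlog coordinates are placeholders), each instance points at the same port on the other host:

-- run against the instance listening on port 3307 of host 2
CHANGE MASTER TO
  MASTER_HOST = 'host1',
  MASTER_PORT = 3307,
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'secret',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 107;
START SLAVE;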

And there are obviously a lot more things I haven’t thought of yet. As you can see I’ll be quite occupied in the upcoming period…

MySQL 5.5 upgrade blues (part two)

Shortly after the MySQL 5.5 upgrade the whole cluster was upgraded with extra RAM. This was a nice test to see how differently 5.1 and 5.5 behave when A) the InnoDB buffer pool is too small and B) the InnoDB buffer pool has enough room to fit everything in memory.

The MySQL 5.5 node showed the same pattern in disk utilization as the other nodes, both before the RAM upgrade (around 30% to 40%) and after it (4% to 5%), so not much difference at all. However, the number of free pages within the buffer pool is significantly lower (about 10%) than on the other nodes. This definitely needs some further investigation.
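
A simple way to compare the buffer pool fill across nodes is to look at the page counters; a minimal sketch:

SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages%';
-- compare Innodb_buffer_pool_pages_free against Innodb_buffer_pool_pages_total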

Apart from that the machine is stable and it seems we will proceed with the upgrade on the whole cluster soon.

A side note: I’m happy to announce that I was selected as a speaker at the Percona Live MySQL Conference & Expo in San Francisco, April 2012. I’ll be talking about Spil Games (the company I work for) and how our new architecture will solve or ease the majority of our database issues.
Percona Live MySQL User's Conference, San Francisco, April 10-12th, 2012

MySQL 5.5 upgrade blues (part one)

At the company I work for we are still running Percona Server 5.1 in production and are slowly heading towards a Percona Server 5.5 rollout. It took a lot of preparation in the past few months (writing a my.cnf conversion script, for example) and a lot of testing. A couple of machines have already been upgraded to 5.5 this week to compare performance and stability. So far the machines have proved stable enough to keep them on 5.5 and, even better, we already see a couple of benefits! However, the title wouldn’t contain “blues” if everything had been a breeze, right? 😉

The first problem we ran into was that our Cacti templates broke due to the changed InnoDB status output. So I headed to the Cacti templates and checked the issue tracker to see whether the issue was already known. Apparently it was, but unfortunately it had not been fixed yet. Luckily writing the fixes myself wasn’t much of a problem.

Secondly, we ran into the issue that the history list grew from a “steady” 200 to 4000 after the upgrade. Searching on this topic revealed a problem with the purge operations, but it was not clear to me what exactly the problem was. According to the MySQL documentation the defaults should suffice. Uhm, right?
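
For reference, the history list length can be watched in the InnoDB status output; a minimal sketch:

SHOW ENGINE INNODB STATUS\G
-- look for the "History list length" line in the TRANSACTIONS section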

Now, I knew some things had changed in 5.5 regarding purging: a separate purge thread was already introduced in 5.1, but could it really be that different? So I tried to find out what each purge variable does:

innodb_max_purge_lag 0
innodb_purge_batch_size 20
innodb_purge_threads 1
relay_log_purge ON

At first I assumed that increasing the batch size would make purging more efficient: the larger the batch, the more it could handle, right?
Wrong: the larger you set it, the later it will purge! I found this in the MySQL documentation about it:

The granularity of changes, expressed in units of redo log records, that trigger a purge operation, flushing the changed buffer pool blocks to disk.

So the name is actually confusing! In our case it went from 20 to 40, making things worse, and then from 40 to 10, making the history list go from 4000 to 1800.

Then I decided to see what the purge lag would do. Changing the purge lag as described by Peter did indeed lower the history list for a short while, but MMM also kicked the 5.5 server out of its pool because it started lagging behind in replication! So this is definitely something to keep in the back of your mind!

I did not change the purge threads to 0 since this machine runs in our production environment. Also confusing is the deprecated innodb_use_purge_thread, which could be set to values other than 0 and 1 but is marked as experimental.

This graph shows best how it worked out:
InnoDB transactions over a week


In the end I lowered the purge batch size to 1 and the history list went back from 4000 to its “normal” 200.

I’m positive a part two will come shortly, so stay tuned. 😉

Duplicate entries while deleting records?

No duplicate content
This weekend I got a call from the support engineer on duty that night. He noticed that in one of our clusters all slave machines were failing replication, but he could not figure out why.

At first I thought it would be an easy case of the master and slaves being out of sync and row-based replication failing on not finding the row, but then I noticed all machines were actually still running statement-based replication. As far as I could recall we did that to circumvent another issue that had already been solved months ago, but for some reason we never switched back to row-based replication.

A simple SHOW SLAVE STATUS revealed something similar to this:

Last_Errno: 1062
Last_Error: Error 'Duplicate entry '272369' for key 'PRIMARY'' on query. Default database: 'userdata'. Query: 'DELETE u, uf1, uf2, up FROM `users` u LEFT JOIN `friends` uf1 ON u.id = uf1.id LEFT JOIN `friends` uf2 ON u.id = uf2.friend_id LEFT JOIN `prefs` up ON u.id = up.id WHERE u.name = 'testuser';

Continue reading “Duplicate entries while deleting records?”

LRU follow up

Because one of the machines was configured a little too close to the edge, we had to restart it to downsize its innodb_buffer_pool_size a bit. We took the opportunity to test the innodb_lru_dump_restore directive on this machine. The results were very promising: it wrote the dump every 5 minutes (we set the variable to 300 seconds) and after MySQL restarted it reloaded the dump.

However: since MySQL has already fully started before it even begins to load the LRU dump, it is already available to the outside world. In an HA environment that means it would perform poorly due to the torrent of queries coming in. So either the loading of the LRU dump needs to be done up front through a change in Percona Server, or we need to alter MMM not to do anything with the server until it has loaded the LRU dump.

Challenges, challenges…

LRU Dump restore

Yesterday one of my colleagues pointed me to the innodb_lru_dump_restore variable available in the Percona MySQL Server distribution. See more about it here:
innodb_lru_dump_restore
LRU means the list is kept in Least Recently Used order, making it easier to evict the oldest pages. Reading the description, it can be a very nice option for already running servers to maintain this list across a restart, but it did not solve my immediate problem: I had to warm up two freshly cloned read slaves on one of our busiest databases.

Normally I use mk-query-digest with the --processlist option on the (active) master, but in our setup this is seriously flawed because Continue reading “LRU Dump restore”

Why MySQL Quicksand?

Why MySQL Quicksand? Well, that’s easy enough: from experience I’ve learned that the margin between a fully functional MySQL database server and one grinding to a halt can be a very thin line. Getting it back to work may be hell, but most of the time it is very possible to save your server.

Quicksand has the same property: it may appear to be quite solid, but a very small disturbance can cause a sudden decrease in viscosity and change it from a solid into a liquid-like substance. You will sink into it but, generally, not drown. 😉

What will be posted here? Not everything will be exclusively about MySQL, but most of it will be related to it. You can expect war stories, brainstorms, reposts, links, etc.

My main goal is that whatever I post here will be useful for someone else. 🙂