Using a statistical approach to analyze / monitor MySQL bottleneck queries
What comes up to your mind when you hear the phrase "MySQL performance tuning?" Most of the discussions covering the topic are things like server parameter tuning and blog entries describing the use of the "EXPLAIN" command. However, it is too time-consuming to check every single SQL query. A better approach is at first to determine the SQL queries that are actually consuming the server resources, and then to optimize those queries.
So the question is how to find out the bottleneck queries. At MySQL Conference & Expo 2009, Mark Callaghan explained in his keynote that Google was taking a statistical approach using the "SHOW PROCESSLIST" as a solution. The same command is mentioned as a rival approach in the webpages of MySQL Query Analyzer as well. Today, having a need for something alike, I wrote a set of tiny script files under the name "mprofile" that does the same thing. It can be used as follows.
Sampling SQL queries
Use mpdump to sample running queries from mysqld periodically. The default setting is to fetch 1,000 samples in 0.1 second interval (in other words, it would take a little more than 100 seconds to execute). Please use the --help option to find out the way to specify the number of samples, the interval, and connection parameters to mysqld can be specified as command line parameters.
% mpdump > mpdump.txt
Merging the queries and generating the report
Use mpfilter command to rewrite the variables in the sampled queries, then feed the result into mpreport command to sort the queries from the heaviest one to the lighter. Each line of the report is in the format "probability:SQL query," where probability is the average percentage of if the SQL query is executed. The example below is taken from the mysqld of Pathtraq. It is obvious that which queries are consuming the resources (and that the heaviest query is running at the probability of 138.3%: it is run by 1.383 threads by average).
% mpfilter < mpdump.txt | mpreport | head -100
138.300:select pt_sum(hits,?) as cnt from url inner join counter on url.id=counter.url where (pt_url_compress(?)<=rurlc and rurlc<pt_url_compress(?,?))
99.100:insert into tt select t1.url,cnt_mixed&? as cnt,********** where url.status_code=?
94.200:select id,pt_url_decompress(rurlc) as rurl from url where crawl_cnt<? and crawl_at>=? and crawl_at<?-?*pow(?,crawl_cnt) limit ?
22.000:select id,status_code from url where rurlc=pt_url_compress(?)
14.300:select id,pt_url_decompress(rurlc) as rurl,cnt from (select url,cnt from counter_by_day **********
9.300:insert into counter (url,referer,hits) values (?,?,pt_cnt_add(null,?,?)) on duplicate key update hits=pt_cnt_add(hits,?,?)
7.400:insert ignore into **********
6.100:select pt_sum(hits,?) as cnt from counter where url=?
5.900:select (crawl_at<? and -crawl_at<? and pt_sum(hits,floor(-crawl_at/?))>?) as recrawl from url inner join counter on url.id=counter.url where id=?
5.700:insert into **********
5.700:select t1.id,pt_url_decompress(rurlc) as rurl,cnt,score from **********
4.600:select id,name,dispname from category_ja order by disporder,id
4.300:select addr from **********
3.200:update url set crawl_at=?,lang=?,status_code=if(status_code=?,?,?) where i
d=?
(snip)
By using the report, we can decide from which query to start optimizing, by using explain, or by letting their responses be cached by memcached, etc. It is also possible to replace mpfilter with a house-made script to customize the report.
Conclusion
There are other ways to find out bottlneck queries, like using the slow query log or MySQL Enterprise Monitor. Comparing with them, the merits of using the described solution using "SHOW PROCESSLIST" are
- no need to change current environment
- practically no overhead
- potential to build a customized performance tuning / monitoring tool
Finally, the source code of mprofile is available from /platform/mysql/mprofile – CodeRepos::Share – Trac. Have fun!
mk-query-digest from Maatkit does similar analysis, and it can use the processlist as a source of queries.
It's good to examine two types of queries: those consuming resources (a lot of execution time in aggregate), and those that will create a poor user experience (slow execution time, even if rarely executed).
Posted by: Xaprb | July 24, 2009 at 08:37 PM
Thank you for your comment. I didn't know that Maakit could use processlist as a source. I will try using it.
Posted by: kazuho | July 26, 2009 at 02:23 AM
Jet Profiler for MySQL uses SHOW PROCESSLIST polling too. I totally agree on the advantages of using a statistical approach. It will give you the top offending queries. I've seen a lot of DBAs fighting (unindexed) queries in the slow query log, resulting only in minor performance improvements. Later, it turns out that the main load is caused by well-formed queries that are issued 10 times too often or so. They will never show up in the slow query log.
Posted by: Bjorn Melinder | July 26, 2009 at 06:53 PM
There's a big issue with continually polling SHOW PROCESSLIST like that though - one of the main reasons that MySQL Query Analyzer doesn't use that approach - it locks a major mutex when generating the processlist, LOCK_thread_count.
This is used pretty widely internally (and is one of the major mutex issues for scalability at the SQL layer imho), just checking the sql directory of a 5.1.35 source tree:
Cerberus:sql mark$ grep -rn LOCK_thread_count ./* | wc -l
163
So anybody reading this - beware. Using it "constantly" in production is *highly* not recommended (at least by me).
It's great for sampling though. :)
Posted by: Mark Leith | July 27, 2009 at 06:44 PM
developer.cybozu.co.jp is very informative. Thank you very much for all your work. AAA Toronto Payday Loans 1172 Bay St #101, Toronto, ON M5S 2B4 (416) 477-2817
Posted by: payday loans toronto | March 16, 2010 at 05:48 AM