{"id":194,"date":"2010-04-27T11:40:15","date_gmt":"2010-04-27T11:40:15","guid":{"rendered":"http:\/\/psyphi.net\/blog\/?p=194"},"modified":"2010-04-28T09:44:12","modified_gmt":"2010-04-28T09:44:12","slug":"a-little-sql-explain-example","status":"publish","type":"post","link":"https:\/\/psyphi.net\/blog\/2010\/04\/a-little-sql-explain-example\/","title":{"rendered":"A Little SQL &#8220;EXPLAIN&#8221; example"},"content":{"rendered":"<figure style=\"width: 300px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\" \" title=\"rosetta stone\" src=\"http:\/\/farm4.static.flickr.com\/3187\/2802823634_58537a01d3.jpg\" alt=\"\" width=\"300\" height=\"400\" \/><figcaption class=\"wp-caption-text\">http:\/\/www.flickr.com\/photos\/banyan_tree\/2802823634\/sizes\/m\/<\/figcaption><\/figure>\n<p>Background: I have a big table called &#8220;channel&#8221; with a few hundred thousand rows in &#8211; nothing vast, but big enough to cause some queries to run slower than I want.<\/p>\n<p>Today I was fixing something else and happened to run<\/p>\n<pre><code>show full processlist;<\/code><\/pre>\n<p>I noticed this taking too long:<\/p>\n<pre><code>SELECT payload FROM channel  LIMIT 398800,100;<\/code><\/pre>\n<p>This is a query used by some web-app paging code. Stupid really &#8211; <em>payload<\/em> isn&#8217;t indexed and there&#8217;s no use of any other keys in that query. Ok &#8211; how to improve it? First of all, see what EXPLAIN says:<\/p>\n<pre><code>mysql&gt; explain SELECT payload FROM channel  LIMIT 398800,100;\r\n+----+-------------+---------+------+---------------+------+---------+------+--------+-------+\r\n| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows   | Extra |\r\n+----+-------------+---------+------+---------------+------+---------+------+--------+-------+\r\n|  1 | SIMPLE      | channel | ALL  | NULL          | NULL | NULL    | NULL | 721303 |       |\r\n+----+-------------+---------+------+---------------+------+---------+------+--------+-------+\r\n1 row in set (0.00 sec)<\/code><\/pre>\n<p>Ok. So the simplest way would be to limit a selection of id_channel (the primary key) then select payloads in that set. First I tried this:<\/p>\n<pre><code>SELECT payload\r\nFROM channel\r\nWHERE id_channel IN (\r\n    SELECT id_channel FROM channel LIMIT 398800,100\r\n);<\/code><\/pre>\n<p>Seems straightforward, right? No, not really.<\/p>\n<pre><code>ERROR 1235 (42000): This version of MySQL doesn't yet support\r\n  'LIMIT &amp; IN\/ALL\/ANY\/SOME subquery'<\/code><\/pre>\n<p>Next!<\/p>\n<p>Second attempt, using a temporary table, selecting and saving the id_channels I&#8217;m interested in then using those in the actual query:<\/p>\n<pre><code>CREATE TEMPORARY TABLE channel_tmp(\r\n  id_channel BIGINT UNSIGNED NOT NULL PRIMARY KEY\r\n) ENGINE=innodb;\r\n\r\nINSERT INTO channel_tmp(id_channel)\r\n  SELECT id_channel\r\n  FROM channel LIMIT 398800,100;\r\n\r\nSELECT payload\r\n  FROM channel\r\n  WHERE id_channel IN (\r\n    SELECT id_channel FROM channel_tmp\r\n  );<\/code><\/pre>\n<pre><code>mysql&gt; explain select id_channel from channel limit 398800,100;\r\n+----+-------------+---------+-------+---------------+---------+---------+------+--------+-------------+\r\n| id | select_type | table   | type  | possible_keys | key     | key_len | ref  | rows   | Extra       |\r\n+----+-------------+---------+-------+---------------+---------+---------+------+--------+-------------+\r\n|  1 | SIMPLE      | channel | index | NULL          | PRIMARY | 8       | NULL | 722583 | Using index |\r\n+----+-------------+---------+-------+---------------+---------+---------+------+--------+-------------+\r\n1 row in set (0.00 sec)<\/code><\/pre>\n<pre><code>mysql&gt; explain select payload from channel where id_channel in (select id_channel from channel_tmp);\r\n+----+--------------+-------------+-------------+---------------+---------+---------+------+--------+-------------+\r\n| id | select_type  | table       | type            | poss_keys | key     | key_len | ref  | rows   | Extra       |\r\n+----+--------------+-------------+-------------+---------------+---------+---------+------+--------+-------------+\r\n|  1 | PRIMARY      | channel     | ALL             | NULL      | NULL    | NULL    | NULL | 722327 | Using where |\r\n|  2 | DEP SUBQUERY | channel_tmp | unique_subquery | PRIMARY   | PRIMARY | 8       | func |      1 | Using index |\r\n+----+--------------+-------------+-------------+---------------+---------+---------+------+--------+-------------+\r\n2 rows in set (0.00 sec)<\/code><\/pre>\n<p>Let&#8217;s try a self-join doing all of the above without explicitly making a temporary table. Self-joins can be pretty powerful &#8211; neat in the right places..<\/p>\n<pre><code>mysql&gt; explain SELECT payload\r\n  FROM channel c1,\r\n       (SELECT id_channel FROM channel limit 398800,100) c2\r\n  WHERE c1.id_channel=c2.id_channel;\r\n+----+-------------+------------+--------+---------------+---------+---------+---------------+--------+-------------+\r\n| id | select_type | table      | type   | possible_keys | key     | key_len | ref           | rows   | Extra       |\r\n+----+-------------+------------+--------+---------------+---------+---------+---------------+--------+-------------+\r\n|  1 | PRIMARY     | derived2   | ALL    | NULL          | NULL    | NULL    | NULL          |    100 |             |\r\n|  1 | PRIMARY     | c1         | eq_ref | PRIMARY       | PRIMARY | 8       | c2.id_channel |      1 |             |\r\n|  2 | DERIVED     | channel    | index  | NULL          | PRIMARY | 8       | NULL          | 721559 | Using index |\r\n+----+-------------+------------+--------+---------------+---------+---------+---------------+--------+-------------+\r\n3 rows in set (0.21 sec)<\/code><\/pre>\n<p>This pulls out the right rows and even works around the &#8220;no limit in subselect&#8221; unsupported mysql feature but that id_channel selection in c2 still isn&#8217;t quite doing the right thing &#8211; I don&#8217;t like all the rows being returned, even if they&#8217;re coming straight out of the primary key index.<\/p>\n<p>A little bit of rudimentary benchmarking appears to suggest that the self-join is the fastest, followed by the original query at approximately one order of magnitude slower and trailing a long way behind at around another four-times slower than that, the temporary table. I&#8217;m not sure how or why the temporary table performance happens to be the slowest &#8211; perhaps down to storage access, or more likely my lack of understanding. Some time I might even try the in-memory table too for comparison.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Background: I have a big table called &#8220;channel&#8221; with a few hundred thousand rows in &#8211; nothing vast, but big enough to cause some queries to run slower than I want. Today I was fixing something else and happened to run show full processlist; I noticed this taking too long: SELECT payload FROM channel LIMIT &hellip; <a href=\"https:\/\/psyphi.net\/blog\/2010\/04\/a-little-sql-explain-example\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;A Little SQL &#8220;EXPLAIN&#8221; example&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[58,11],"tags":[1086,16,14,15],"class_list":["post-194","post","type-post","status-publish","format-standard","hentry","category-databases","category-programming","tag-databases","tag-explain","tag-optimisation","tag-sql"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/posts\/194","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/comments?post=194"}],"version-history":[{"count":21,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/posts\/194\/revisions"}],"predecessor-version":[{"id":215,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/posts\/194\/revisions\/215"}],"wp:attachment":[{"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/media?parent=194"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/categories?post=194"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/tags?post=194"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}