Background: I have a big table called “channel” with a few hundred thousand rows in – nothing vast, but big enough to cause some queries to run slower than I want.
Today I was fixing something else and happened to run
show full processlist;
I noticed this taking too long:
SELECT payload FROM channel LIMIT 398800,100;
This is a query used by some web-app paging code. Stupid really – payload isn’t indexed and there’s no use of any other keys in that query. Ok – how to improve it? First of all, see what EXPLAIN says:
mysql> explain SELECT payload FROM channel LIMIT 398800,100;
+----+-------------+---------+------+---------------+------+---------+------+--------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+--------+-------+
| 1 | SIMPLE | channel | ALL | NULL | NULL | NULL | NULL | 721303 | |
+----+-------------+---------+------+---------------+------+---------+------+--------+-------+
1 row in set (0.00 sec)
Ok. So the simplest way would be to limit a selection of id_channel (the primary key) then select payloads in that set. First I tried this:
SELECT payload
FROM channel
WHERE id_channel IN (
SELECT id_channel FROM channel LIMIT 398800,100
);
Seems straightforward, right? No, not really.
ERROR 1235 (42000): This version of MySQL doesn't yet support
'LIMIT & IN/ALL/ANY/SOME subquery'
Next!
Second attempt, using a temporary table, selecting and saving the id_channels I’m interested in then using those in the actual query:
CREATE TEMPORARY TABLE channel_tmp(
id_channel BIGINT UNSIGNED NOT NULL PRIMARY KEY
) ENGINE=innodb;
INSERT INTO channel_tmp(id_channel)
SELECT id_channel
FROM channel LIMIT 398800,100;
SELECT payload
FROM channel
WHERE id_channel IN (
SELECT id_channel FROM channel_tmp
);
mysql> explain select id_channel from channel limit 398800,100;
+----+-------------+---------+-------+---------------+---------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+---------+---------+------+--------+-------------+
| 1 | SIMPLE | channel | index | NULL | PRIMARY | 8 | NULL | 722583 | Using index |
+----+-------------+---------+-------+---------------+---------+---------+------+--------+-------------+
1 row in set (0.00 sec)
mysql> explain select payload from channel where id_channel in (select id_channel from channel_tmp);
+----+--------------+-------------+-------------+---------------+---------+---------+------+--------+-------------+
| id | select_type | table | type | poss_keys | key | key_len | ref | rows | Extra |
+----+--------------+-------------+-------------+---------------+---------+---------+------+--------+-------------+
| 1 | PRIMARY | channel | ALL | NULL | NULL | NULL | NULL | 722327 | Using where |
| 2 | DEP SUBQUERY | channel_tmp | unique_subquery | PRIMARY | PRIMARY | 8 | func | 1 | Using index |
+----+--------------+-------------+-------------+---------------+---------+---------+------+--------+-------------+
2 rows in set (0.00 sec)
Let’s try a self-join doing all of the above without explicitly making a temporary table. Self-joins can be pretty powerful – neat in the right places..
mysql> explain SELECT payload
FROM channel c1,
(SELECT id_channel FROM channel limit 398800,100) c2
WHERE c1.id_channel=c2.id_channel;
+----+-------------+------------+--------+---------------+---------+---------+---------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+---------+---------+---------------+--------+-------------+
| 1 | PRIMARY | derived2 | ALL | NULL | NULL | NULL | NULL | 100 | |
| 1 | PRIMARY | c1 | eq_ref | PRIMARY | PRIMARY | 8 | c2.id_channel | 1 | |
| 2 | DERIVED | channel | index | NULL | PRIMARY | 8 | NULL | 721559 | Using index |
+----+-------------+------------+--------+---------------+---------+---------+---------------+--------+-------------+
3 rows in set (0.21 sec)
This pulls out the right rows and even works around the “no limit in subselect” unsupported mysql feature but that id_channel selection in c2 still isn’t quite doing the right thing – I don’t like all the rows being returned, even if they’re coming straight out of the primary key index.
A little bit of rudimentary benchmarking appears to suggest that the self-join is the fastest, followed by the original query at approximately one order of magnitude slower and trailing a long way behind at around another four-times slower than that, the temporary table. I’m not sure how or why the temporary table performance happens to be the slowest – perhaps down to storage access, or more likely my lack of understanding. Some time I might even try the in-memory table too for comparison.