{"id":13,"date":"2008-02-03T14:49:03","date_gmt":"2008-02-03T14:49:03","guid":{"rendered":"http:\/\/psyphi.net\/blog\/?p=13"},"modified":"2010-01-24T16:06:21","modified_gmt":"2010-01-24T16:06:21","slug":"great-pieces-of-code","status":"publish","type":"post","link":"https:\/\/psyphi.net\/blog\/2008\/02\/great-pieces-of-code\/","title":{"rendered":"Great pieces of code"},"content":{"rendered":"<p>A lot of what I do day-to-day is related to optimisation. Be it Perl code, SQL queries, Javascript or HTML there are usually at least a couple of cracking examples I find every week. On Friday I came across this:<\/p>\n<pre>\r\n<code>SELECT cycle FROM goldcrest WHERE id_run = ?<\/code>\r\n<\/pre>\n<p>This query is being used to find the number of the latest cycles (between 1 and 37 for each id_run) in a near-real-time tracking system and is used several times whenever a run report is viewed.<\/p>\n<pre>\r\n<code>EXPLAIN SELECT cycle FROM goldcrest WHERE id_run = 231;\r\n  \r\n+----+-------------+-----------+------+---------------+---------+---------+-------+--------+-------------+\r\n| id | select_type | table \u00c2\u00a0 \u00c2\u00a0 | type | possible_keys | key \u00c2\u00a0 \u00c2\u00a0 | key_len | ref \u00c2\u00a0 | rows \u00c2\u00a0 | Extra \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 |\r\n+----+-------------+-----------+------+---------------+---------+---------+-------+--------+-------------+\r\n| \u00c2\u00a01 | SIMPLE \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0| goldcrest | ref \u00c2\u00a0| g_idrun \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 | g_idrun | \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 8 | const | 262792 | Using where |\r\n+----+-------------+-----------+------+---------------+---------+---------+-------+--------+-------------+<\/code>\r\n<\/pre>\n<p>In itself this would be fine but the goldcrest table in this instance contains several thousand rows for each id_run. So, for id_run, let&#8217;s say, 231 this query happens to return approximately 588,000 rows to determine that the latest cycle for run 231 is the number 34.<\/p>\n<p>To clean this up we first try something like this:<\/p>\n<pre>\r\n<code>SELECT MIN(cycle),MAX(cycle) FROM goldcrest WHERE id_run = ?<\/code>\r\n<\/pre>\n<p>which still scans the 588000 rows (keyed on id_run incidentally) but doesn&#8217;t actually return them to the user, only one row containing both values we&#8217;re interested in. Fair enough, the CPU and disk access penalties are similar but the data transfer penalty is significantly improved.<\/p>\n<p>Next I try adding an index against the id_run and cycle columns:<\/p>\n<pre>\r\n<code>ALTER TABLE goldcrest ADD INDEX(id_run,cycle);\r\nQuery OK, 37589514 rows affected (23 min 6.17 sec)\r\nRecords: 37589514 \u00c2\u00a0Duplicates: 0 \u00c2\u00a0Warnings: 0<\/code>\r\n<\/pre>\n<p>Now this of course takes a long time and, because the tuples are fairly redundant, creates a relatively inefficient index, also penalising future INSERTs. However, casually ignoring those facts, our query performance is now radically different:<\/p>\n<pre>\r\n<code>EXPLAIN SELECT MIN(cycle),MAX(cycle) FROM goldcrest WHERE id_run = 231;\r\n  \r\n+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+\r\n| id | select_type | table | type | possible_keys | key \u00c2\u00a0| key_len | ref \u00c2\u00a0| rows | Extra \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0|\r\n+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+\r\n| \u00c2\u00a01 | SIMPLE \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0| NULL \u00c2\u00a0| NULL | NULL \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0| NULL | \u00c2\u00a0 \u00c2\u00a0NULL | NULL | NULL | Select tables optimized away |\r\n+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+\r\n  \r\nSELECT MIN(cycle),MAX(cycle) FROM goldcrest WHERE id_run = 231;\r\n+------------+------------+\r\n| MIN(cycle) | MAX(cycle) |\r\n+------------+------------+\r\n| \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a01 | \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 37 |\r\n+------------+------------+\r\n  \r\n1 row in set (0.01 sec)<\/code>\r\n<\/pre>\n<p>That looks a lot better to me now!<\/p>\n<p>Generally I try to steer clear of the mysterious internal workings of database engines, but with much greater frequency come across examples like this:<\/p>\n<pre>\r\n<code>sub clone_type {\r\n  my ($self, $clone_type, $clone) = @_;\r\n  my %clone_type;\r\n\r\n  if($clone and $clone_type) {\r\n    $clone_type{$clone} = $clone_type;\r\n    return $clone_type{$clone};\r\n  }\r\n\r\n  return;\r\n}<\/code>\r\n<\/pre>\n<p>Thankfully this one&#8217;s pretty quick to figure out &#8211; they&#8217;re usually *much* more convoluted, but still.. Huh??<\/p>\n<p>Pass in a clone_type scalar, create a local hash with the same name (Argh!), store the clone_type scalar in the hash keyed at position $clone, then return the same value we just stored.<\/p>\n<p>I don&#8217;t get it&#8230; maybe a global hash or something else would make sense, but this works out the same:<\/p>\n<pre>\r\n<code>sub clone_type {\r\n  my ($self, $clone_type, $clone) = @_;\r\n\r\n  if($clone and $clone_type) {\r\n    return $clone_type;\r\n  }\r\n  return;\r\n}<\/code>\r\n<\/pre>\n<p>and I&#8217;m still not sure why you&#8217;d want to do that if you have the values on the way in already.<\/p>\n<p>Programmers really need to think around the problem, not just through it. Thinking through may result in functionality but thinking around results in both function and performance which means a whole lot more in my book, and incidentally, why it seems so hard to hire good programmers.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A lot of what I do day-to-day is related to optimisation. Be it Perl code, SQL queries, Javascript or HTML there are usually at least a couple of cracking examples I find every week. On Friday I came across this: SELECT cycle FROM goldcrest WHERE id_run = ? This query is being used to find &hellip; <a href=\"https:\/\/psyphi.net\/blog\/2008\/02\/great-pieces-of-code\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Great pieces of code&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[11],"tags":[13,16,14,1081,15],"class_list":["post-13","post","type-post","status-publish","format-standard","hentry","category-programming","tag-code","tag-explain","tag-optimisation","tag-programming","tag-sql"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/posts\/13","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/comments?post=13"}],"version-history":[{"count":6,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/posts\/13\/revisions"}],"predecessor-version":[{"id":19,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/posts\/13\/revisions\/19"}],"wp:attachment":[{"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/media?parent=13"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/categories?post=13"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/tags?post=13"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}