{"id":571,"date":"2012-05-04T22:45:21","date_gmt":"2012-05-04T22:45:21","guid":{"rendered":"http:\/\/psyphi.net\/blog\/?p=571"},"modified":"2012-05-05T08:43:34","modified_gmt":"2012-05-05T08:43:34","slug":"naive-kmer-sequence-generator","status":"publish","type":"post","link":"https:\/\/psyphi.net\/blog\/2012\/05\/naive-kmer-sequence-generator\/","title":{"rendered":"na\u00c3\u00afve kmer sequence generator"},"content":{"rendered":"<p>This evening, for &#8220;fun&#8221;, I was tinkering with a couple of methods for generating sequences containing diverse, distinct, kmer subsequences. Here&#8217;s a small, unintelligent, brute-force function I came up with.<br \/>\nIts alphabet is set at the top in $bases, as is k, the required length of the distinct subsequences. It keeps going until it&#8217;s been able to hit all distinct combinations, tracked in the $seen hash. The final sequence ends up in $str.<\/p>\n<pre><code>#!\/usr\/local\/bin\/perl\r\nuse strict;\r\nuse warnings;\r\n\r\nmy $bases     = [qw(A C T G)];\r\nmy $k         = 3;\r\nmy $seen      = {};\r\nmy $str       = q[];\r\nmy $max_perms = (scalar @{$bases})**$k;\r\nmy $pos       = -1;\r\n\r\nPOS: while((scalar keys %{$seen}) &lt; $max_perms) {\r\n  $pos ++;\r\n  report();\r\n\r\n  for my $base (@{$bases}) {\r\n    my $triple = sprintf q[%s%s],\r\n                 (substr $str, $pos-($k-1), ($k-1)),\r\n\t\t $base;\r\n    if($pos &lt; ($k-1) ||\r\n       !$seen-&gt;{$triple}++) {\r\n      $str .= $base;\r\n      next POS;\r\n    }\r\n  }\r\n  $str .= $bases-&gt;[-1];\r\n}\r\n\r\nsub report {\r\n  print \"len=@{[length $str]} seen @{[scalar keys %{$seen}]}\/$max_perms kmers\\n\";\r\n}\r\n\r\nreport();\r\nprint $str, \"\\n\";<\/code><\/pre>\n<p>Executing for k=3, bases = ACTG the output looks like this:<\/p>\n<pre><code>elwood:~\/dev rmp$ .\/seqgen\r\nlen=0 seen 0\/64 kmers\r\nlen=1 seen 0\/64 kmers\r\nlen=2 seen 0\/64 kmers\r\nlen=3 seen 1\/64 kmers\r\nlen=4 seen 2\/64 kmers\r\nlen=5 seen 3\/64 kmers\r\nlen=6 seen 4\/64 kmers\r\nlen=7 seen 5\/64 kmers\r\nlen=8 seen 6\/64 kmers\r\nlen=9 seen 7\/64 kmers\r\nlen=10 seen 8\/64 kmers\r\nlen=11 seen 9\/64 kmers\r\nlen=12 seen 10\/64 kmers\r\nlen=13 seen 10\/64 kmers\r\nlen=14 seen 11\/64 kmers\r\nlen=15 seen 12\/64 kmers\r\nlen=16 seen 13\/64 kmers\r\nlen=17 seen 14\/64 kmers\r\nlen=18 seen 15\/64 kmers\r\nlen=19 seen 16\/64 kmers\r\nlen=20 seen 17\/64 kmers\r\nlen=21 seen 18\/64 kmers\r\nlen=22 seen 19\/64 kmers\r\nlen=23 seen 20\/64 kmers\r\nlen=24 seen 21\/64 kmers\r\nlen=25 seen 22\/64 kmers\r\nlen=26 seen 23\/64 kmers\r\nlen=27 seen 24\/64 kmers\r\nlen=28 seen 25\/64 kmers\r\nlen=29 seen 26\/64 kmers\r\nlen=30 seen 27\/64 kmers\r\nlen=31 seen 28\/64 kmers\r\nlen=32 seen 29\/64 kmers\r\nlen=33 seen 30\/64 kmers\r\nlen=34 seen 31\/64 kmers\r\nlen=35 seen 32\/64 kmers\r\nlen=36 seen 33\/64 kmers\r\nlen=37 seen 34\/64 kmers\r\nlen=38 seen 35\/64 kmers\r\nlen=39 seen 36\/64 kmers\r\nlen=40 seen 37\/64 kmers\r\nlen=41 seen 37\/64 kmers\r\nlen=42 seen 38\/64 kmers\r\nlen=43 seen 39\/64 kmers\r\nlen=44 seen 40\/64 kmers\r\nlen=45 seen 41\/64 kmers\r\nlen=46 seen 42\/64 kmers\r\nlen=47 seen 43\/64 kmers\r\nlen=48 seen 44\/64 kmers\r\nlen=49 seen 45\/64 kmers\r\nlen=50 seen 46\/64 kmers\r\nlen=51 seen 47\/64 kmers\r\nlen=52 seen 48\/64 kmers\r\nlen=53 seen 49\/64 kmers\r\nlen=54 seen 50\/64 kmers\r\nlen=55 seen 51\/64 kmers\r\nlen=56 seen 52\/64 kmers\r\nlen=57 seen 53\/64 kmers\r\nlen=58 seen 54\/64 kmers\r\nlen=59 seen 55\/64 kmers\r\nlen=60 seen 56\/64 kmers\r\nlen=61 seen 57\/64 kmers\r\nlen=62 seen 58\/64 kmers\r\nlen=63 seen 59\/64 kmers\r\nlen=64 seen 60\/64 kmers\r\nlen=65 seen 61\/64 kmers\r\nlen=66 seen 62\/64 kmers\r\nlen=67 seen 63\/64 kmers\r\nlen=68 seen 64\/64 kmers\r\nAAACAATAAGAAGCACCATCAGTACTATTAGGACGATGAGGCCCTCCGCTTCTGCGTCGGTTTGTGGG<\/code><\/pre>\n<p>As you can see, it manages to fit 64 distinct base triples in a string of only 68 characters. It could probably be packed a little more efficiently but I don&#8217;t think that&#8217;s too bad for a first attempt.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This evening, for &#8220;fun&#8221;, I was tinkering with a couple of methods for generating sequences containing diverse, distinct, kmer subsequences. Here&#8217;s a small, unintelligent, brute-force function I came up with. Its alphabet is set at the top in $bases, as is k, the required length of the distinct subsequences. It keeps going until it&#8217;s been &hellip; <a href=\"https:\/\/psyphi.net\/blog\/2012\/05\/naive-kmer-sequence-generator\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;na\u00c3\u00afve kmer sequence generator&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[11],"tags":[780,779,21,778],"class_list":["post-571","post","type-post","status-publish","format-standard","hentry","category-programming","tag-distinct","tag-kmer","tag-perl","tag-sequence"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/posts\/571","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/comments?post=571"}],"version-history":[{"count":7,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/posts\/571\/revisions"}],"predecessor-version":[{"id":580,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/posts\/571\/revisions\/580"}],"wp:attachment":[{"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/media?parent=571"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/categories?post=571"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/tags?post=571"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}