{"id":735,"date":"2013-05-15T21:42:58","date_gmt":"2013-05-15T21:42:58","guid":{"rendered":"http:\/\/psyphi.net\/blog\/?p=735"},"modified":"2013-05-15T21:42:58","modified_gmt":"2013-05-15T21:42:58","slug":"unique-overlapping-kmer-strings","status":"publish","type":"post","link":"https:\/\/psyphi.net\/blog\/2013\/05\/unique-overlapping-kmer-strings\/","title":{"rendered":"unique, overlapping kmer strings"},"content":{"rendered":"<p>Tinkering today I wrote a quick toy to generate strings of unique, overlapping kmers. Not particularly efficient, but possibly handy nonetheless.<\/p>\n<p>It takes a given k size, a configurable overlap and optionally the bases to use. First it generates a list of all the kmers then it recursively scans for matching overlapping kmers and extends a seed, terminating the recursion and printing if all kmers have been used.<\/p>\n<p>Run it like so:<\/p>\n<pre><code> .\/kmer-overlap -k=3 -overlap=2 ACTG<\/code><\/pre>\n<pre><code>#!\/usr\/local\/bin\/perl\r\n#########\r\n# Author:        rmp\r\n# Created:       2013-05-15\r\n# Last Modified: $Date$\r\n# Id:            $Id$\r\n# HeadURL:       $HeadURL$\r\n#\r\nuse strict;\r\nuse warnings;\r\nuse Getopt::Long;\r\nuse Readonly;\r\nuse English qw(-no_match_vars);\r\n\r\nReadonly::Scalar our $DEFAULT_K =&gt; 3;\r\nReadonly::Scalar our $DEFAULT_BASES =&gt; [qw(A C T G)];\r\n\r\nmy $opts = {};\r\nGetOptions($opts, qw(k=s rand help));\r\n\r\nif($opts-&gt;{help}) {\r\n  print &lt; &lt;\"EOT\"; $PROGRAM_NAME - rmp 2013-05-15 Usage:  $PROGRAM_NAME -k=3 -overlap=2 -rand ACTG EOT   exit; } my $k       = $opts-&gt;{k}       || $DEFAULT_K;\r\nmy $overlap = $opts-&gt;{overlap} || $k-1;\r\nmy $bases   = $DEFAULT_BASES;\r\n\r\nif(scalar @ARGV) {\r\n  $bases = [grep { $_ } map {split \/\/smx} @ARGV];\r\n}\r\n\r\n#########\r\n# Build all available kmers\r\n#\r\nmy $kmers = [];\r\n\r\nfor my $base1 (@{$bases}) {\r\n  build($base1, $bases, $kmers);\r\n}\r\n\r\n#########\r\n# optionally randomise the seeds\r\n#\r\nif($opts-&gt;{rand}) {\r\n  shuffle($kmers);\r\n}\r\n\r\n#########\r\n# start with a seed\r\n#\r\nfor my $seed (@{$kmers}) {\r\n  my $seen = {\r\n\t      $seed =&gt; 1,\r\n\t     };\r\n  solve($seed, $seen);\r\n}\r\n\r\nsub build {\r\n  my ($seq, $bases, $kmers) = @_;\r\n  if(length $seq == $k) {\r\n    #########\r\n    # reached target k - store &amp; terminate\r\n    #\r\n    push @{$kmers}, $seq;\r\n    return 1;\r\n  }\r\n\r\n  for my $base (@{$bases}) {\r\n    ########\r\n    # extend and descend\r\n    #\r\n    build(\"$seq$base\", $bases, $kmers);\r\n  }\r\n\r\n  return;\r\n}\r\n\r\nsub solve {\r\n  my ($seq_in, $seen) = @_;\r\n\r\n  if(scalar keys %{$seen} == scalar @{$kmers}) {\r\n    #########\r\n    # exhausted all kmers - completed!\r\n    #\r\n    print $seq_in, \"\\n\";\r\n    return 1;\r\n  }\r\n\r\n  my $seq_tail     = substr $seq_in, -$overlap, $overlap;\r\n\r\n  my $overlapping  = [grep { $_ =~ \/^$seq_tail\/smx } # filter in only seqs which overlap the seed tail\r\n\t\t      grep { !$seen-&gt;{$_} }          # filter out kmers we've seen already\r\n\t\t      @{$kmers}];\r\n  if(!scalar @{$overlapping}) {\r\n    #########\r\n    # no available overlapping kmers - terminate!\r\n    #\r\n    return;\r\n  }\r\n\r\n  if($opts-&gt;{rand}) {\r\n    shuffle($overlapping);\r\n  }\r\n\r\n  my $overhang = $k-$overlap;\r\n  for my $overlap_seq (@{$overlapping}) {\r\n    #########\r\n    # extend and descend\r\n    #\r\n    my $seq_out = $seq_in . substr $overlap_seq, -$overhang, $overhang;\r\n    solve($seq_out, {%{$seen}, $overlap_seq =&gt; 1});\r\n  }\r\n\r\n  return;\r\n}\r\n\r\nsub shuffle {\r\n  my ($arr_in) = @_;\r\n  for my $i (0..scalar @{$arr_in}-1) {\r\n    my $j = int rand $i;\r\n    ($arr_in-&gt;[$i], $arr_in-&gt;[$j]) = ($arr_in-&gt;[$j], $arr_in-&gt;[$i]);\r\n  }\r\n}<\/code><\/pre>\n<p>Output looks like this:<\/p>\n<pre><code>epiphyte:~ rmp$ .\/kmer-overlap -k=2 AC\r\nAACCA\r\nACCAA\r\nCAACC\r\nCCAAC<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Tinkering today I wrote a quick toy to generate strings of unique, overlapping kmers. Not particularly efficient, but possibly handy nonetheless. It takes a given k size, a configurable overlap and optionally the bases to use. First it generates a list of all the kmers then it recursively scans for matching overlapping kmers and extends &hellip; <a href=\"https:\/\/psyphi.net\/blog\/2013\/05\/unique-overlapping-kmer-strings\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;unique, overlapping kmer strings&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[11],"tags":[38,920,21,778],"class_list":["post-735","post","type-post","status-publish","format-standard","hentry","category-programming","tag-bioinformatics","tag-kmers","tag-perl","tag-sequence"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/posts\/735","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/comments?post=735"}],"version-history":[{"count":6,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/posts\/735\/revisions"}],"predecessor-version":[{"id":741,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/posts\/735\/revisions\/741"}],"wp:attachment":[{"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/media?parent=735"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/categories?post=735"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/psyphi.net\/blog\/wp-json\/wp\/v2\/tags?post=735"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}