Bookmarks for June 5th through June 26th

These are my links for June 5th through June 26th:

unique, overlapping kmer strings

Tinkering today I wrote a quick toy to generate strings of unique, overlapping kmers. Not particularly efficient, but possibly handy nonetheless.

It takes a given k size, a configurable overlap and optionally the bases to use. First it generates a list of all the kmers then it recursively scans for matching overlapping kmers and extends a seed, terminating the recursion and printing if all kmers have been used.

Run it like so:

 ./kmer-overlap -k=3 -overlap=2 ACTG
#!/usr/local/bin/perl
#########
# Author:        rmp
# Created:       2013-05-15
# Last Modified: $Date$
# Id:            $Id$
# HeadURL:       $HeadURL$
#
use strict;
use warnings;
use Getopt::Long;
use Readonly;
use English qw(-no_match_vars);

Readonly::Scalar our $DEFAULT_K => 3;
Readonly::Scalar our $DEFAULT_BASES => [qw(A C T G)];

my $opts = {};
GetOptions($opts, qw(k=s rand help));

if($opts->{help}) {
  print < <"EOT"; $PROGRAM_NAME - rmp 2013-05-15 Usage:  $PROGRAM_NAME -k=3 -overlap=2 -rand ACTG EOT   exit; } my $k       = $opts->{k}       || $DEFAULT_K;
my $overlap = $opts->{overlap} || $k-1;
my $bases   = $DEFAULT_BASES;

if(scalar @ARGV) {
  $bases = [grep { $_ } map {split //smx} @ARGV];
}

#########
# Build all available kmers
#
my $kmers = [];

for my $base1 (@{$bases}) {
  build($base1, $bases, $kmers);
}

#########
# optionally randomise the seeds
#
if($opts->{rand}) {
  shuffle($kmers);
}

#########
# start with a seed
#
for my $seed (@{$kmers}) {
  my $seen = {
	      $seed => 1,
	     };
  solve($seed, $seen);
}

sub build {
  my ($seq, $bases, $kmers) = @_;
  if(length $seq == $k) {
    #########
    # reached target k - store & terminate
    #
    push @{$kmers}, $seq;
    return 1;
  }

  for my $base (@{$bases}) {
    ########
    # extend and descend
    #
    build("$seq$base", $bases, $kmers);
  }

  return;
}

sub solve {
  my ($seq_in, $seen) = @_;

  if(scalar keys %{$seen} == scalar @{$kmers}) {
    #########
    # exhausted all kmers - completed!
    #
    print $seq_in, "\n";
    return 1;
  }

  my $seq_tail     = substr $seq_in, -$overlap, $overlap;

  my $overlapping  = [grep { $_ =~ /^$seq_tail/smx } # filter in only seqs which overlap the seed tail
		      grep { !$seen->{$_} }          # filter out kmers we've seen already
		      @{$kmers}];
  if(!scalar @{$overlapping}) {
    #########
    # no available overlapping kmers - terminate!
    #
    return;
  }

  if($opts->{rand}) {
    shuffle($overlapping);
  }

  my $overhang = $k-$overlap;
  for my $overlap_seq (@{$overlapping}) {
    #########
    # extend and descend
    #
    my $seq_out = $seq_in . substr $overlap_seq, -$overhang, $overhang;
    solve($seq_out, {%{$seen}, $overlap_seq => 1});
  }

  return;
}

sub shuffle {
  my ($arr_in) = @_;
  for my $i (0..scalar @{$arr_in}-1) {
    my $j = int rand $i;
    ($arr_in->[$i], $arr_in->[$j]) = ($arr_in->[$j], $arr_in->[$i]);
  }
}

Output looks like this:

epiphyte:~ rmp$ ./kmer-overlap -k=2 AC
AACCA
ACCAA
CAACC
CCAAC

restart a script when a new version is deployed

I have a lot of scripts running in a lot of places, doing various little jobs, mostly shuffling data files around and feeding them into pipelines and suchlike. I also use Jenkins CI to automatically run my tests and build deb packages for Debian/Ubuntu Linux. Unfortunately, being a lazy programmer I haven’t read up about all the great things deb and apt can do so I don’t know how to fire shell commands like “service x reload” or “/etc/init.d/x restart” once a package has been deployed. Kicking a script to pick up changes is quite a common thing to do.

Instead I have a little trick that makes use of the build process changing timestamps on files when it rolls up the package. So when the script wakes up, and starts the next iteration of its event loop, the first thing it does is check the timestamp of itself and if it’s different from the last iteration it executes itself, replacing the running process with a fresh one.

One added gotcha is that if you want to run in taint mode you need to satisfy a bunch of extra requirements such as detainting $ENV{PATH} and all commandline arguments before any re-execing occurs.

#!/usr/local/bin/perl
# -*- mode: cperl; tab-width: 8; indent-tabs-mode: nil; basic-offset: 2 -*-
# vim:ts=8:sw=2:et:sta:sts=2
#########
# Author: rpettett
# Last Modified: $Date$
# Id: $Id$
# $HeadURL$
#
use strict;
use warnings;
use Readonly;
use Carp;
use English qw(-no_match_vars);
our $VERSION = q[1.0];

Readonly::Scalar our $SLEEP_LONG  => 600;
Readonly::Scalar our $SLEEP_SHORT => 30;

$OUTPUT_AUTOFLUSH++;

my @original_argv = @ARGV;

#########

# handle SIGHUP restarts
#
local $SIG{HUP} = sub {
  carp q[caught SIGHUP];
  exec $PROGRAM_NAME, @original_argv;
};

my $last_modtime;

while(1) {
  #########
  # handle software-deployment restarts
  #
  my $modtime = -M $PROGRAM_NAME;

  if($last_modtime && $last_modtime ne $modtime) {
    carp q[re-execing];
    exec $PROGRAM_NAME, @original_argv;
  }
  $last_modtime = $modtime;

  my $did_work_flag;
  eval {
    $did_work_flag = do_stuff();
    1;
  } or do {
    $did_work_flag = 0;
  };

  local $SIG{ALRM} = sub {
    carp q[rudely awoken by SIGALRM];
  };

  my $sleep = $did_work_flag ? $SLEEP_SHORT : $SLEEP_LONG;
  carp qq[sleeping for $sleep];
  sleep $sleep;
}

Bookmarks for June 1st through November 3rd

These are my links for June 1st through November 3rd:

Conway’s Game of Life in Perl

I wanted a quick implementation of Conway’s Game of Life this evening to muck about with, with the boys. Whipped this up in simple Perl for running in a terminal / on the commandline. It’s not the fastest implementation on the planet but that’s most likely just the way I’ve coded it.

Throw some 1s and 0s in the DATA block at the end to modify the start state, change the $WIDTH and $HEIGHT of the area, or uncomment the random data line in init() and see what you see.

#!/usr/local/bin/perl
# -*- mode: cperl; tab-width: 8; indent-tabs-mode: nil; basic-offset: 2 -*-
# vim:ts=8:sw=2:et:sta:sts=2
#########
# Author:        rmp
# Created:       2012-10-12
# Last Modified: $Date: 2012-10-12 19:09:00 +0100 (Fri, 12 Oct 2012) $
# Id:            $Id$
# $HeadURL$
#
use strict;
use warnings;
use Time::HiRes qw(sleep);
use Readonly;
use English qw(-no_match_vars);
use Carp;

Readonly::Scalar my $WIDTH      => 78;
Readonly::Scalar my $HEIGHT     => 21;
Readonly::Scalar my $TURN_DELAY => 0.1;

our $VERSION = '0.01';

my $grid  = init();
my $turns = 0;
while(1) {
  render($grid);
  $grid = turn($grid);
  sleep $TURN_DELAY;
  $turns++;
}

sub init {
  #########
  # initialise with a manual input from the DATA block below
  #
  local $RS = undef;
  my $data  = <data>;
  my $out   = [
	       map { [split //smx, $_] }
	       map { split /\n/smx, $_ }
	       $data
	      ];

  #########
  # fill the matrix with space
  #
  for my $y (0..$HEIGHT-1) {
    for my $x (0..$WIDTH-1) {
      $out->[$y]->[$x] ||= 0;
#      $out->[$y]->[$x] = rand >= 0.2 ? 0 : 1; # initialise with some random data
    }
  }
  return $out;
}

#########
# draw to stdout/screen
#
sub render {
  my ($in) = @_;
  system $OSNAME eq 'MSWin32' ? 'cls' : 'clear';

  print q[+], q[-]x$WIDTH, "+\n" or croak qq[Error printing: $ERRNO];
  for my $y (@{$in}) {
    print q[|] or croak qq[Error printing: $ERRNO];
    print map { $_ ? q[O] : q[ ] } @{$y} or croak qq[Error printing: $ERRNO];
    print "|\r\n" or croak qq[Error printing: $ERRNO];
  }
  print q[+], q[-]x$WIDTH, "+\n" or croak qq[Error printing: $ERRNO];

  return 1;
}

#########
# the fundamental Game of Life rules
#
sub turn {
  my ($in) = @_;
  my $out  = [];

  for my $y (0..$HEIGHT-1) {
    for my $x (0..$WIDTH-1) {
      my $topedge    = $y-1;
      my $bottomedge = $y+1;
      my $leftedge   = $x-1;
      my $rightedge  = $x+1;

      my $checks = [
		    grep { $_->[0] >= 0 && $_->[0] < $HEIGHT } # Y boundary checking
		    grep { $_->[1] >= 0 && $_->[1] < $WIDTH }  # X boundary checking
		    [$topedge,    $leftedge],
		    [$topedge,    $x],
		    [$topedge,    $rightedge],
		    [$y,          $leftedge],
		    [$y,          $rightedge],
		    [$bottomedge, $leftedge],
		    [$bottomedge, $x],
		    [$bottomedge, $rightedge],
		   ];

      my $alive = scalar
	          grep { $_ }
	          map { $in->[$_->[0]]->[$_->[1]] }
		  @{$checks};

      $out->[$y]->[$x] = (($in->[$y]->[$x] && $alive == 2) ||
			  $alive == 3);
    }
  }
  return $out;
}

__DATA__
0
0
0
0
0
0
0000000010
0000000111
0000000101
0000000111
0000000010
</data>

p.s. WordPress is merrily swapping “DATA” for “data” on line 38 and adding an extra /data tag at the end of that code snippet. Fix line 38 and don’t copy & paste the close tag. Damn I hate WordPress :(

Haplotype Consensus Clustering

Way back in the annals of history (2002) I wrote a bit of code to perform haplotype groupings for early Ensembl-linked data. Like my recent kmer scanner example, it used one of my favourite bits of Perl – the regex engine. I dug the old script out of an backup and it was, as you’d expect, completely horrible. So for fun I gave it a makeover this evening, in-between bits of Silent Witness.

This is what it looks like now. Down to 52 lines of code from 118 lines in the 2002 version. I guess the last 10 years have made me a little over twice as concise.

#!/usr/local/bin/perl -T
use strict;
use warnings;

#########
# read everything in
#
my $in = [];
while(my $line = <>) {
  chomp $line;
  if(!$line) {
    next;
  }

  #########
  # build regex pattern
  #
  $line =~ s{[X]}{.}smxg;

  #########
  # store
  #
  push @{$in}, uc $line;
}

my $consensii = {};

#########
# iterate over inputs
#
SEQ: for my $seq (sort { srt($a, $b) } @{$in}) {
  #########
  # iterate over consensus sequences so far
  #
  for my $con (sort { srt($a, $b) } keys %{$consensii}) {
    if($seq =~ /^$con$/smx ||
       $con =~ /^$seq$/smx) {
      #########
      # if input matches consensus, store & jump to next sequence
      #
      push @{$consensii->{$con}}, $seq;
      next SEQ;
    }
  }

  #########
  # if no match was found, create a new consensus container
  #
  $consensii->{$seq} = [$seq];
}

#########
# custom sort routine
# - firstly sort by sequence length
# - secondly sort by number of "."s (looseness)
#
sub srt {
  my ($x, $y) = @_;
  my $lx = length $x;
  my $ly = length $y;

  if($lx < $ly) {
    return -1;
  }
  if($ly > $lx) {
    return 1;
  }

  my $nx = $x =~ tr{.}{.};
  my $ny = $y =~ tr{.}{.};

  return $nx < => $ny;
}

#########
# tally and print everything out
#
while(my ($k, $v) = each %{$consensii}) {
  $k =~ s/[.]/X/sxmg;
  print $k, " [", (scalar @{$v}) ,"]\n";
  for my $m (@{$v}) {
    $m =~ s/[.]/X/sxmg;
    print "  $m\n";
  }
}

The input file looks something like this:

ACTGXTGC
ACTGATGC
ACTGTTGC
ACTGCTGC
ACTGGTGC
ACTXCTGC
ACXGCTGC
ACTGXTGC
CTGCTGC
CTGGTGC
CTXCTGC
CXGCTGC
CTGXTGC

ACTGACTGACTGACTGACTG
ACTGACTGACTGACTGACTG
ACTGXTGACTGACTG
ACTGACTGACTXACTG
ACXTGACTGACTGACTG

and the output looks a little like this – consensus [number of sequences] followed by an indented list of matching sequences:

elwood:~/dev rmp$ ./haplotype-sort < haplotype-in.txt 
ACTGXTGACTGACTG [1]
  ACTGXTGACTGACTG
ACTGATGC [1]
  ACTGATGC
CTGCTGC [4]
  CTGCTGC
  CTXCTGC
  CXGCTGC
  CTGXTGC
ACTGCTGC [5]
  ACTGCTGC
  ACTGXTGC
  ACTXCTGC
  ACXGCTGC
  ACTGXTGC
ACTGACTGACTGACTGACTG [2]
  ACTGACTGACTGACTGACTG
  ACTGACTGACTGACTGACTG
ACTGTTGC [1]
  ACTGTTGC
CTGGTGC [1]
  CTGGTGC
ACTGACTGACTXACTG [1]
  ACTGACTGACTXACTG
ACXTGACTGACTGACTG [1]
  ACXTGACTGACTGACTG
ACTGGTGC [1]
  ACTGGTGC

naïve kmer scanner

Another bit of fun, basically the opposite of yesterday’s post, here we’re detecting the number of unique kmers present in a sequence. It’s easy to do this with an iterating substr approach but I like Perl’s regex engine a lot so I wanted to do it using that. Okay, I wanted to do it entirely in one /e regex but it’s slightly trickier and a lot less clear manipulating pos inside a /e substitution function.

#!/usr/local/bin/perl
use strict;
use warnings;

my $str   = q[AAACAATAAGAAGCACCATCAGTACTATTAGGACGATGAGGCCCTCCGCTTCTGCGTCGGTTTGTGGG];
my $k     = 3;
my $match = q[\s*[ACTG]\s*]x$k;
my $seen  = {};

while($str =~ m{($match)}smxgi) {
  my $m = $1;
  $m    =~ s/\s*//smxg;

  $seen->{$m}++;

  pos $str = (pos $str) - $k + 1;
}

{
  local $, = "\n";
  print sort keys %{$seen};
}

printf "\n%d unique ${k}mers\n", scalar keys %{$seen};

$k is the size of the kmers we’re looking for. In this case 3, as we were generating yesterday.
$match attempts to take care of matches across newlines, roughly what one might find inside a FASTA. YMMV.
$seen keeps track of uniques we’ve encountered so far in $str.

The while loop iterates through matches found by the regex engine and pos, a function you don’t see too often, resets the start position for the next match, in this case to the current position minus 1 less than the length of the match (pos – k + 1).

The output looks something like this:


elwood:~/dev rmp$ ./kmers 
AAA
AAC
AAG
AAT
ACA
ACC
ACG
ACT
AGA
AGC
AGG
AGT
ATA
ATC
ATG
ATT
CAA
CAC
CAG
CAT
CCA
CCC
CCG
CCT
CGA
CGC
CGG
CGT
CTA
CTC
CTG
CTT
GAA
GAC
GAG
GAT
GCA
GCC
GCG
GCT
GGA
GGC
GGG
GGT
GTA
GTC
GTG
GTT
TAA
TAC
TAG
TAT
TCA
TCC
TCG
TCT
TGA
TGC
TGG
TGT
TTA
TTC
TTG
TTT
64 unique 3mers

If I were really keen I’d make use this in a regression test for yesterday’s toy.

naïve kmer sequence generator

This evening, for “fun”, I was tinkering with a couple of methods for generating sequences containing diverse, distinct, kmer subsequences. Here’s a small, unintelligent, brute-force function I came up with.
Its alphabet is set at the top in $bases, as is k, the required length of the distinct subsequences. It keeps going until it’s been able to hit all distinct combinations, tracked in the $seen hash. The final sequence ends up in $str.

#!/usr/local/bin/perl
use strict;
use warnings;

my $bases     = [qw(A C T G)];
my $k         = 3;
my $seen      = {};
my $str       = q[];
my $max_perms = (scalar @{$bases})**$k;
my $pos       = -1;

POS: while((scalar keys %{$seen}) < $max_perms) {
  $pos ++;
  report();

  for my $base (@{$bases}) {
    my $triple = sprintf q[%s%s],
                 (substr $str, $pos-($k-1), ($k-1)),
		 $base;
    if($pos < ($k-1) ||
       !$seen->{$triple}++) {
      $str .= $base;
      next POS;
    }
  }
  $str .= $bases->[-1];
}

sub report {
  print "len=@{[length $str]} seen @{[scalar keys %{$seen}]}/$max_perms kmers\n";
}

report();
print $str, "\n";

Executing for k=3, bases = ACTG the output looks like this:

elwood:~/dev rmp$ ./seqgen
len=0 seen 0/64 kmers
len=1 seen 0/64 kmers
len=2 seen 0/64 kmers
len=3 seen 1/64 kmers
len=4 seen 2/64 kmers
len=5 seen 3/64 kmers
len=6 seen 4/64 kmers
len=7 seen 5/64 kmers
len=8 seen 6/64 kmers
len=9 seen 7/64 kmers
len=10 seen 8/64 kmers
len=11 seen 9/64 kmers
len=12 seen 10/64 kmers
len=13 seen 10/64 kmers
len=14 seen 11/64 kmers
len=15 seen 12/64 kmers
len=16 seen 13/64 kmers
len=17 seen 14/64 kmers
len=18 seen 15/64 kmers
len=19 seen 16/64 kmers
len=20 seen 17/64 kmers
len=21 seen 18/64 kmers
len=22 seen 19/64 kmers
len=23 seen 20/64 kmers
len=24 seen 21/64 kmers
len=25 seen 22/64 kmers
len=26 seen 23/64 kmers
len=27 seen 24/64 kmers
len=28 seen 25/64 kmers
len=29 seen 26/64 kmers
len=30 seen 27/64 kmers
len=31 seen 28/64 kmers
len=32 seen 29/64 kmers
len=33 seen 30/64 kmers
len=34 seen 31/64 kmers
len=35 seen 32/64 kmers
len=36 seen 33/64 kmers
len=37 seen 34/64 kmers
len=38 seen 35/64 kmers
len=39 seen 36/64 kmers
len=40 seen 37/64 kmers
len=41 seen 37/64 kmers
len=42 seen 38/64 kmers
len=43 seen 39/64 kmers
len=44 seen 40/64 kmers
len=45 seen 41/64 kmers
len=46 seen 42/64 kmers
len=47 seen 43/64 kmers
len=48 seen 44/64 kmers
len=49 seen 45/64 kmers
len=50 seen 46/64 kmers
len=51 seen 47/64 kmers
len=52 seen 48/64 kmers
len=53 seen 49/64 kmers
len=54 seen 50/64 kmers
len=55 seen 51/64 kmers
len=56 seen 52/64 kmers
len=57 seen 53/64 kmers
len=58 seen 54/64 kmers
len=59 seen 55/64 kmers
len=60 seen 56/64 kmers
len=61 seen 57/64 kmers
len=62 seen 58/64 kmers
len=63 seen 59/64 kmers
len=64 seen 60/64 kmers
len=65 seen 61/64 kmers
len=66 seen 62/64 kmers
len=67 seen 63/64 kmers
len=68 seen 64/64 kmers
AAACAATAAGAAGCACCATCAGTACTATTAGGACGATGAGGCCCTCCGCTTCTGCGTCGGTTTGTGGG

As you can see, it manages to fit 64 distinct base triples in a string of only 68 characters. It could probably be packed a little more efficiently but I don’t think that’s too bad for a first attempt.

3 sorts of sort

I’ve been fiddling around recently with some stuff which I’m sure I covered in my CS degree 16 (Gah! Really?) years ago but have had to re-educate myself about. Namely a few different implementations of sort. I implemented three types in Perl with some reacquaintance of the mechanisms via Wikipedia. I found a few existing examples of sort algorithms in Perl but they generally looked a bit unpleasant so like all programmers I decided to write my own (complete with errors, as an exercise to the reader). Here I’ve also added some notes, mostly to myself, which are largely unsubstantiated because I haven’t measured memory, speed or recursion depth, for example (though these are well-documented elsewhere).

1. Bubble Sort

#!/usr/bin/perl -w
use strict;
use warnings;

my $set    = [map { int rand() * 99 } (0..40)];
print "in:  @{$set}\n";

my $sorted = bubblesort($set);
print "out: @{$sorted}\n";

sub bubblesort {
  my ($in)     = @_;
  my $out      = [@{$in}];
  my $length   = scalar @{$in};
  my $modified = 1;

  while($modified) {
    $modified = 0;
    for my $i (0..$length-2) {
      if($out->[$i] > $out->[$i+1]) {
	($out->[$i], $out->[$i+1]) = ($out->[$i+1], $out->[$i]);
	$modified = 1;
      }
    }
  }

  return $out;
}

Bubblesort iterates through each element of the list up to the last but one, comparing to the next element in the list. If it’s greater the values are swapped. The process repeats until no modifications are made to the list.

Pros: doesn’t use much memory – values are swapped in situ; doesn’t perform deep recursion; is easy to read

Cons: It’s pretty slow. The worst-case complexity is O(n2) passes (for each value in the list each value in the list is processed once).

2. Merge Sort

#!/usr/bin/perl
use strict;
use warnings;

my $set    = [map { int rand() * 99 } (0..40)];
print "in:  @{$set}\n";

my $sorted = mergesort($set);
print "out: @{$sorted}\n";

sub mergesort {
  my ($in) = @_;

  my $length = scalar @{$in};
  if($length < = 1) {
    return $in;
  }

  my $partition = $length / 2;
  my $left      = [@{$in}[0..$partition-1]];
  my $right     = [@{$in}[$partition..$length-1]];

  return merge(mergesort($left), mergesort($right));
}

sub merge {
  my ($left, $right) = @_;
  my $merge = [];

  while(scalar @{$left} || scalar @{$right}) {
    if(scalar @{$left} && scalar @{$right}) {
      if($left->[0] < $right->[0]) {
	push @{$merge}, shift @{$left};
      } else {
	push @{$merge}, shift @{$right};
      }
    } elsif(scalar @{$left}) {
      push @{$merge}, shift @{$left};
    } elsif(scalar @{$right}) {
      push @{$merge}, shift @{$right};
    }
  }
  return $merge;
}

Mergesort recurses through the list, in each iteration breaking the remaining list in half. Once broken down to individual elements, each pair of elements/lists at each depth of recursion is reconstituted into a new ordered list and returned.

Pros: generally quicker than bubblesort; O(n log n) complexity.

Cons: quite difficult to read

3. Quicksort

#!/usr/bin/perl
use strict;
use warnings;

my $set    = [map { int rand() * 99 } (0..40)];
print "in:  @{$set}\n";

my $sorted = quicksort($set);
print "out: @{$sorted}\n";

sub quicksort {
  my ($in) = @_;

  my $length = scalar @{$in};
  if($length < = 1) {
    return $in;
  }

  my $pivot = splice @{$in}, $length / 2, 1;
  my $left  = [];
  my $right = [];

  for my $v (@{$in}) {
    if($v < $pivot) {
      push @{$left}, $v;
    } else {
      push @{$right}, $v;
    }
  }

  return [@{quicksort($left)}, $pivot, @{quicksort($right)}];
}

Quicksort is probably the best known of all the sort algorithms out there. It’s easier to read than Mergesort, though arguably still not as easy as Bubblesort, but it’s a common pattern and its speed makes up for anything lacking in readability. At each iteration a pivot is selected and removed from the list. The remaining list is scanned and for element lower than the pivot is put in a new “left” list and each greater element is put into a new “right” list. The returned result is a merged recursive quicksort of the left list, the pivot and the right list.

In this example I’m picking the middle element of the list as the pivot. I’m sure there are entire branches of discrete mathematics dealing with how to choose the pivot based on the type of input data.

Pros: (One of?) the fastest sort algorithm(s) around; Reasonably efficient memory usage and recursion depth. Average O(n log n) complexity again (worst is O(n2)).

Perhaps it’s worth noting that in 25-odd years of programming computers I’ve only ever had to examine the inner workings of sort routines as part of my degree – never before, nor after, but it’s certainly brought back a few memories.

Neat Perl Gotcha

For a while now I’ve been using Test::Perl::Critic as an integral part of my daily coding. Test::Perl::Critic wraps Perl::Critic into a set of tests which can be automatically run against a distribution. Perl Critic implements Damien Conway’s set of standard Perl Best Practices.

I’m not going to go into the arguments of whether they’re good or bad rules right now. Suffice to say I use nearly all of them and my code has improved because of it. Anyway, one of the rules states you shouldn’t use parentheses for core method calls like length(), int() or rand(), so most of the time I don’t, but today I wanted to do this:

my @array = map { int rand * 45 } (1..10);

The results come out as an array of zeros. Why? Well it’s not a broken PRNG, that’s for sure. It’s a simple case of misunderstood parsing, operator precedence and Perl internals. Looking closely * 45 isn’t what you’d expect, *45 refers to the typeglob named 45 in the main package. I actually have no idea how this is cast into a number in Perl but however it’s done, it evaluates to zero. rand 0 seems to exhibit the same behaviour as rand 1, yielding a random float between zero and 1. int rand 0 will always be zero.

So? Well to stop the parser from taking *45 as an argument to rand you need to upset PerlCritic and add those parentheses back in:

my @array = map { int rand() *45 } (1..10);

and it works now so back to work!

[ The astute amongst you will realise you can just say int rand(45). I know, but that doesn’t work nearly so well as a mildly interesting example. I blame PerlCritic ;) ]