Conway’s Game of Life in Perl

I wanted a quick implementation of Conway’s Game of Life this evening to muck about with, with the boys. Whipped this up in simple Perl for running in a terminal / on the commandline. It’s not the fastest implementation on the planet but that’s most likely just the way I’ve coded it.

Throw some 1s and 0s in the DATA block at the end to modify the start state, change the $WIDTH and $HEIGHT of the area, or uncomment the random data line in init() and see what you see.

#!/usr/local/bin/perl
# -*- mode: cperl; tab-width: 8; indent-tabs-mode: nil; basic-offset: 2 -*-
# vim:ts=8:sw=2:et:sta:sts=2
#########
# Author:        rmp
# Created:       2012-10-12
# Last Modified: $Date: 2012-10-12 19:09:00 +0100 (Fri, 12 Oct 2012) $
# Id:            $Id$
# $HeadURL$
#
use strict;
use warnings;
use Time::HiRes qw(sleep);
use Readonly;
use English qw(-no_match_vars);
use Carp;

Readonly::Scalar my $WIDTH      => 78;
Readonly::Scalar my $HEIGHT     => 21;
Readonly::Scalar my $TURN_DELAY => 0.1;

our $VERSION = '0.01';

my $grid  = init();
my $turns = 0;
while(1) {
  render($grid);
  $grid = turn($grid);
  sleep $TURN_DELAY;
  $turns++;
}

sub init {
  #########
  # initialise with a manual input from the DATA block below
  #
  local $RS = undef;
  my $data  = <DATA>;
  my $out   = [
	       map { [split //smx, $_] }
	       map { split /\n/smx, $_ }
	       $data
	      ];

  #########
  # fill the matrix with space
  #
  for my $y (0..$HEIGHT-1) {
    for my $x (0..$WIDTH-1) {
      $out->[$y]->[$x] ||= 0;
#      $out->[$y]->[$x] = rand >= 0.2 ? 0 : 1; # initialise with some random data
    }
  }
  return $out;
}

#########
# draw to stdout/screen
#
sub render {
  my ($in) = @_;
  system $OSNAME eq 'MSWin32' ? 'cls' : 'clear';

  print q[+], q[-]x$WIDTH, "+\n" or croak qq[Error printing: $ERRNO];
  for my $y (@{$in}) {
    print q[|] or croak qq[Error printing: $ERRNO];
    print map { $_ ? q[O] : q[ ] } @{$y} or croak qq[Error printing: $ERRNO];
    print "|\r\n" or croak qq[Error printing: $ERRNO];
  }
  print q[+], q[-]x$WIDTH, "+\n" or croak qq[Error printing: $ERRNO];

  return 1;
}

#########
# the fundamental Game of Life rules:
# - a live cell with two or three live neighbours survives
# - a dead cell with exactly three live neighbours becomes alive
# - everything else dies or stays empty
#
sub turn {
  my ($in) = @_;
  my $out  = [];

  for my $y (0..$HEIGHT-1) {
    for my $x (0..$WIDTH-1) {
      my $topedge    = $y-1;
      my $bottomedge = $y+1;
      my $leftedge   = $x-1;
      my $rightedge  = $x+1;

      my $checks = [
		    grep { $_->[0] >= 0 && $_->[0] < $HEIGHT } # Y boundary checking
		    grep { $_->[1] >= 0 && $_->[1] < $WIDTH }  # X boundary checking
		    [$topedge,    $leftedge],
		    [$topedge,    $x],
		    [$topedge,    $rightedge],
		    [$y,          $leftedge],
		    [$y,          $rightedge],
		    [$bottomedge, $leftedge],
		    [$bottomedge, $x],
		    [$bottomedge, $rightedge],
		   ];

      my $alive = scalar
	          grep { $_ }
	          map { $in->[$_->[0]]->[$_->[1]] }
		  @{$checks};

      $out->[$y]->[$x] = (($in->[$y]->[$x] && $alive == 2) ||
			  $alive == 3);
    }
  }
  return $out;
}

__DATA__
0
0
0
0
0
0
0000000010
0000000111
0000000101
0000000111
0000000010

p.s. WordPress merrily swaps “DATA” for “data” in that code snippet and adds an extra /data tag at the end. If either survives here, fix line 38 and don’t copy & paste the close tag. Damn I hate WordPress :(
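For a livelier start state, the classic glider (a standard Life pattern; this particular DATA fragment is mine, not from the original post) can be pasted into the DATA block and should crawl diagonally across the grid:

```
0100000000
0010000000
1110000000
```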

Another use for Selenium IDE

A dear friend of mine recently needed to recover all email from his mailbox. Normally this wouldn’t be a problem, there are plenty of options in any sane mail application – export or archive mailbox, select-all messages and “Send Again”/Redirect/Bounce to another address or at the very worst, select-all and forward. Most of these options are available with desktop mail applications – Pine, Squirrelmail, IMP, Outlook, Outlook Express, Windows Mail, Mail.app, Thunderbird, Eudora and I’m sure loads of others.

Unfortunately the only access provided was through Microsoft’s Outlook Web Access (2007). This, whilst being fairly pretty in Lite (non-Internet Explorer browsers) mode and prettier/heavier in MSIE, does not have any useful bulk forwarding or export functionality at all. None. Not desperately handy, to be sure.

Ok, so my first port of call was to connect my Mail.app, which supports Exchange OWA access. No dice – spinning, hanging, no data. Hmm – odd. Ok, second I tried fetchExc, a Java commandline tool which promised everything I needed but in the end delivered pretty obtuse error messages. After an hour’s fiddling I gave up with fetchExc and tried falling back to Perl with Email::Folder::Exchange. This had very similar results to fetchExc but a slightly different set of errors.

After much swearing and a lot more poking, probing and requesting of tips from other friends (thanks Ze) the OWA service was also found to be sitting behind Microsoft’s Internet Security and Acceleration server. This isn’t a product I’ve used before but I can only assume it’s an expensive reverse proxy, the sort of thing I’d compare to inexpensive Apache + mod_proxy + mod_security on a good day. This ISA service happened to block all remote SOAP (2000/2003) and WebDAV (2007/2010) access too. Great! No remote service access whatsoever.

Brute force to the rescue. I could, of course go in and manually forward each and every last mail, but that’s quite tedious and a huge amount of clicking and pasting in the same email address. Enter Selenium IDE.

Selenium is a suite of tools for remote controlling browsers, primarily for writing tests for interactive applications. I use it in my day to day work mostly for checking bits of dynamic javascript, DHTML, forms etc. are doing the right things when clicked/pressed/dragged and generally interacted with. OWA is just a (really badly written) webpage to interact with, after all.

I downloaded the excellent sideflow.js plugin which provides loop functionality not usually required for web app testing and after a bit of DOM inspection on the OWA pages I came up with the following plan –

  • click the subject link
  • click the “forward” button
  • enter the recipient address
  • click the send button
  • select the checkbox
  • press the “delete” button
  • repeat 500 times

The macro looked something like this:

<table cellpadding="1" cellspacing="1" border="1">
<thead>
<tr><td rowspan="1" colspan="3">owa-selenium-macro</td></tr>
</thead><tbody>
<tr>
	<td>getEval</td>
	<td>index=0</td>
	<td></td>
</tr>
<tr>
	<td>while</td>
	<td>index&lt;500</td>
	<td></td>
</tr>
<tr>
	<td>storeEval</td>
	<td>index</td>
	<td>value</td>
</tr>
<tr>
	<td>echo</td>
	<td>${value}</td>
	<td></td>
</tr>
<tr>
	<td>clickAndWait</td>
	<td>//table[1]/tbody/tr[2]/td[3]/table/tbody/tr[2]/td/div/table//tbody/tr[3]/td[6]/h1/a</td>
	<td></td>
</tr>
<tr>
	<td>clickAndWait</td>
	<td>id=lnkHdrforward</td>
	<td></td>
</tr>
<tr>
	<td>type</td>
	<td>id=txtto</td>
	<td>newaddress@gmail.com</td>
</tr>
<tr>
	<td>clickAndWait</td>
	<td>id=lnkHdrsend</td>
	<td></td>
</tr>
<tr>
	<td>click</td>
	<td>name=chkmsg</td>
	<td></td>
</tr>
<tr>
	<td>clickAndWait</td>
	<td>id=lnkHdrdelete</td>
	<td></td>
</tr>
<tr>
	<td>getEval</td>
	<td>index++</td>
	<td></td>
</tr>
<tr>
	<td>endWhile</td>
	<td></td>
	<td></td>
</tr>
</tbody></table>

So I logged in, opened each folder in turn and replayed the macro in Selenium IDE as many times as I needed to. Bingo! Super kludgy but it worked well, was entertaining to watch and ultimately did the job.

SVN Server Integration with HTTPS, Active Directory, PAM & Winbind

Subversion on a whiteboard
Image CC by johntrainor
In this post I’d like to explain how it’s possible to integrate SVN (Subversion) source control using WebDAV and HTTPS using Apache and Active Directory to provide authentication and access control.

It’s generally accepted that SVN over WebDAV/HTTPS provides finer-grained security controls than SVN+SSH. The problem is that SVN+SSH is really easy to set up, requiring knowledge of svnadmin and the filesystem and very little else, but WebDAV+HTTPS requires knowledge of Apache and its modules relating to WebDAV, authentication and authorisation, which is quite a lot more to ask. Add to that authenticating to AD and you have yourself a lovely string of delicate single-point-of-failure components. Ho-hum, not a huge amount you can do about that but at least the Apache components are pretty robust.

For this article I’m using CentOS but everything should be transferable to any distribution with a little tweakage.

Repository Creation

Firstly then, pick a disk or volume with plenty of space and make your repository – same as you would for svn+ssh:

svnadmin create /var/svn/repos

Apache Modules

Install the prerequisite Apache modules:

yum install mod_dav_svn

This should also install mod_authz_svn which we’ll also be making use of. Both should end up in Apache’s module directory, in this case /etc/httpd/modules/.

Download and install mod_authnz_external from its Google Code page. This allows Apache basic authentication to hook into an external authentication mechanism. mod_authnz_external.so should end up in Apache’s module directory but in my case it ended up in its default location of /usr/lib/httpd/modules/.

Download and install the companion pwauth utility from its Google Code page. In my case it installs to /usr/local/sbin/pwauth and needs suexec permissions (granted using chmod +s).

Apache Configuration (HTTP)

ServerName svn.example.com
ServerAdmin me@example.com

Listen		*:80
NameVirtualHost *:80

User		nobody
Group		nobody

LoadModule setenvif_module	modules/mod_setenvif.so
LoadModule mime_module		modules/mod_mime.so
LoadModule log_config_module	modules/mod_log_config.so
LoadModule dav_module		modules/mod_dav.so
LoadModule dav_svn_module	modules/mod_dav_svn.so
LoadModule auth_basic_module    modules/mod_auth_basic.so
LoadModule authz_svn_module	modules/mod_authz_svn.so
LoadModule authnz_external_module modules/mod_authnz_external.so

LogFormat	"%v %A:%p %h %l %u %{%Y-%m-%d %H:%M:%S}t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" clean
CustomLog	/var/log/httpd/access_log	clean

<VirtualHost *:80>
	ServerName	svn.example.com

	AddExternalAuth         pwauth  /usr/local/sbin/pwauth
	SetExternalAuthMethod   pwauth  pipe

	<Location />
		DAV			svn
		SVNPath			/var/svn/repos
		AuthType		Basic
		AuthName		"SVN Repository"
		AuthzSVNAccessFile	/etc/httpd/conf/authz_svn.acl
		AuthBasicProvider	external
		AuthExternal		pwauth
		Satisfy			Any

		<LimitExcept GET PROPFIND OPTIONS REPORT>
			Require valid-user
		</LimitExcept>
	</Location>
</VirtualHost>

Network Time (NTP)

In order to join a Windows domain, accurate and synchronised time is crucial, so you’ll need to be running NTPd.

yum install ntp
chkconfig ntpd on
ntpdate ntp.ubuntu.com
service ntpd start

Samba Configuration

Here’s where AD comes in and in my experience this is by far the most unreliable service. Install and configure samba:

yum install samba
chkconfig winbind on

Edit your /etc/samba/smb.conf to pull information from AD.

[global]
	workgroup = EXAMPLE
	realm = EXAMPLE.COM
	security = ADS
	allow trusted domains = No
	use kerberos keytab = Yes
	log level = 3
	log file = /var/log/samba/%m
	max log size = 50
	printcap name = cups
	idmap backend = idmap_rid:EXAMPLE=600-20000
	idmap uid = 600-20000
	idmap gid = 600-20000
	template shell = /bin/bash
	winbind enum users = Yes
	winbind enum groups = Yes
	winbind use default domain = Yes
	winbind offline logon = yes

Join the machine to the domain – you’ll need an account with domain admin credentials to do this:

net ads join -U administrator

Check the join is behaving ok:

[root@svn conf]# net ads info
LDAP server: 192.168.100.10
LDAP server name: ad00.example.com
Realm: EXAMPLE.COM
Bind Path: dc=EXAMPLE,dc=COM
LDAP port: 389
Server time: Tue, 15 May 2012 22:44:34 BST
KDC server: 192.168.100.10
Server time offset: 130

(Re)start winbind to pick up the new configuration:

service winbind restart

PAM & nsswitch.conf

PAM needs to know where to pull its information from, so we tell it about the new winbind service in /etc/pam.d/system-auth.

#%PAM-1.0
# This file is auto-generated.
# User changes will be destroyed the next time authconfig is run.
auth        required      pam_env.so
auth        sufficient    pam_unix.so nullok try_first_pass
auth        requisite     pam_succeed_if.so uid >= 500 quiet
auth        sufficient    pam_winbind.so try_first_pass
auth        required      pam_deny.so

account     required      pam_unix.so broken_shadow
account     sufficient    pam_localuser.so
account     sufficient    pam_succeed_if.so uid < 500 quiet
account     [default=bad success=ok user_unknown=ignore] pam_winbind.so
account     required      pam_permit.so

password    requisite     pam_cracklib.so try_first_pass retry=3
password    sufficient    pam_unix.so md5 shadow nullok try_first_pass use_authtok
password    sufficient    pam_winbind.so use_authtok
password    required      pam_deny.so

session     optional      pam_keyinit.so revoke
session     required      pam_limits.so
session     [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
session     required      /lib/security/pam_mkhomedir.so 
session     required      pam_unix.so
session     optional      pam_winbind.so

YMMV with PAM. It can take quite a lot of fiddling around to make it work perfectly. This obviously has an extremely close correlation to how flaky users find the authentication service. If you’re running on 64-bit you may find you need to install 64-bit versions of PAM modules, e.g. mkhomedir, which aren’t installed by default.

We also modify nsswitch.conf to tell other, non-pam aspects of the system where to pull information from:

passwd:     files winbind
shadow:     files winbind
group:      files winbind

To check the authentication information is coming back correctly you can use wbinfo but I like seeing data by using getent group or getent passwd. The output of these two commands will contain domain accounts if things are working correctly and only local system accounts otherwise.

External Authentication

We’re actually going to use system accounts for authentication. To stop people continuing to use svn+ssh (and thus bypassing the authorisation controls) we edit /etc/ssh/sshd_config and use AllowUsers or AllowGroups and specify all permitted users. Using AllowGroups will also provide AD group control of permitted logins but as the list is small it’s probably overkill. My sshd_config list looks a lot like this:

AllowUsers	root rmp contractor itadmin

To test external authentication run /usr/local/sbin/pwauth as below. “yay” should be displayed if things are working ok. Note the password here is displayed in clear-text:

[root@svn conf]# pwauth && echo 'yay' || echo 'nay'
rmp
mypassword

Access Controls

/etc/httpd/conf/authz_svn.acl is the only part which should require any modifications over time – the access controls specify who is allowed to read and/or write to each svn project. In fact, as everything’s a URL now, you can arbitrarily restrict subfolders of projects too, but that’s a little OTT. It can be arbitrarily extended and can take local and Active Directory usernames. I’m sure mod_authz_svn has full documentation about what you can and can’t put in here.

#
# Allow anonymous read access to everything by default.
#
[/]
* = r
rmp = rw

[/myproject]
rmp = rw
bob = rw

...

SSL

So far that’s all the basic components. The last piece in the puzzle is enabling SSL for Apache. I use the following /etc/httpd/conf/httpd.conf:

ServerName svn.example.com
ServerAdmin me@example.com

Listen		*:80
NameVirtualHost *:80

User		nobody
Group		nobody

LoadModule setenvif_module	modules/mod_setenvif.so
LoadModule mime_module		modules/mod_mime.so
LoadModule log_config_module	modules/mod_log_config.so
LoadModule proxy_module		modules/mod_proxy.so
LoadModule proxy_http_module	modules/mod_proxy_http.so
LoadModule rewrite_module	modules/mod_rewrite.so
LoadModule dav_module		modules/mod_dav.so
LoadModule dav_svn_module	modules/mod_dav_svn.so
LoadModule auth_basic_module    modules/mod_auth_basic.so
LoadModule authz_svn_module	modules/mod_authz_svn.so
LoadModule ssl_module		modules/mod_ssl.so
LoadModule authnz_external_module modules/mod_authnz_external.so

Include conf.d/ssl.conf

LogFormat	"%v %A:%p %h %l %u %{%Y-%m-%d %H:%M:%S}t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" clean
CustomLog	/var/log/httpd/access_log	clean

<VirtualHost *:80>
	ServerName	svn.example.com

	RewriteEngine	on
	RewriteRule	^/(.*)	https://svn.example.com/$1	[R=permanent,L]
</VirtualHost>

<VirtualHost *:443>
	ServerName	svn.example.com

	AddExternalAuth         pwauth  /usr/local/sbin/pwauth
	SetExternalAuthMethod   pwauth  pipe

	SSLEngine on
	SSLProtocol all -SSLv2

	SSLCipherSuite		ALL:!ADH:!EXPORT:!SSLv2:RC4+RSA:+HIGH:+MEDIUM:+LOW
	SSLCertificateFile	/etc/httpd/conf/svn.crt
	SSLCertificateKeyFile	/etc/httpd/conf/svn.key

	<Location />
		DAV			svn
		SVNPath			/var/svn/repos
		AuthType		Basic
		AuthName		"SVN Repository"
		AuthzSVNAccessFile	/etc/httpd/conf/authz_svn.acl
		AuthBasicProvider	external
		AuthExternal		pwauth
		Satisfy			Any

		<LimitExcept GET PROPFIND OPTIONS REPORT>
			Require valid-user
		</LimitExcept>
	</Location>
</VirtualHost>

/etc/httpd/conf.d/ssl.conf is pretty much the unmodified distribution ssl.conf and looks like this:

LoadModule ssl_module modules/mod_ssl.so

Listen 443

AddType application/x-x509-ca-cert .crt
AddType application/x-pkcs7-crl    .crl

SSLPassPhraseDialog  builtin

SSLSessionCache         shmcb:/var/cache/mod_ssl/scache(512000)
SSLSessionCacheTimeout  300

SSLMutex default

SSLRandomSeed startup file:/dev/urandom  256
SSLRandomSeed connect builtin

SSLCryptoDevice builtin

SetEnvIf User-Agent ".*MSIE.*" \
         nokeepalive ssl-unclean-shutdown \
         downgrade-1.0 force-response-1.0

You’ll need to build yourself a certificate, self-signed if necessary, but that’s a whole other post. I recommend searching the web for “openssl self signed certificate” and you should find what you need. The above httpd.conf references the key and certificate under /etc/httpd/conf/svn.key and /etc/httpd/conf/svn.crt respectively.

The mod_authnz_external+pwauth combination can be avoided if you can persuade mod_authz_ldap to play nicely. There are a few different ldap modules around on the intertubes and after a lot of trial and even more error I couldn’t make any of them work reliably if at all.

And if all this leaves you feeling pretty nauseous it’s quite natural. To remedy this, go use git instead.

Haplotype Consensus Clustering

Way back in the annals of history (2002) I wrote a bit of code to perform haplotype groupings for early Ensembl-linked data. Like my recent kmer scanner example, it used one of my favourite bits of Perl – the regex engine. I dug the old script out of a backup and it was, as you’d expect, completely horrible. So for fun I gave it a makeover this evening, in-between bits of Silent Witness.

This is what it looks like now. Down to 52 lines of code from 118 lines in the 2002 version. I guess the last 10 years have made me a little over twice as concise.

#!/usr/local/bin/perl -T
use strict;
use warnings;

#########
# read everything in
#
my $in = [];
while(my $line = <>) {
  chomp $line;
  if(!$line) {
    next;
  }

  #########
  # build regex pattern
  #
  $line =~ s{[X]}{.}smxg;

  #########
  # store
  #
  push @{$in}, uc $line;
}

my $consensii = {};

#########
# iterate over inputs
#
SEQ: for my $seq (sort { srt($a, $b) } @{$in}) {
  #########
  # iterate over consensus sequences so far
  #
  for my $con (sort { srt($a, $b) } keys %{$consensii}) {
    if($seq =~ /^$con$/smx ||
       $con =~ /^$seq$/smx) {
      #########
      # if input matches consensus, store & jump to next sequence
      #
      push @{$consensii->{$con}}, $seq;
      next SEQ;
    }
  }

  #########
  # if no match was found, create a new consensus container
  #
  $consensii->{$seq} = [$seq];
}

#########
# custom sort routine
# - firstly sort by sequence length
# - secondly sort by number of "."s (looseness)
#
sub srt {
  my ($x, $y) = @_;
  my $lx = length $x;
  my $ly = length $y;

  if($lx < $ly) {
    return -1;
  }
  if($lx > $ly) {
    return 1;
  }

  my $nx = $x =~ tr{.}{.};
  my $ny = $y =~ tr{.}{.};

  return $nx <=> $ny;
}

#########
# tally and print everything out
#
while(my ($k, $v) = each %{$consensii}) {
  $k =~ s/[.]/X/sxmg;
  print $k, " [", (scalar @{$v}) ,"]\n";
  for my $m (@{$v}) {
    $m =~ s/[.]/X/sxmg;
    print "  $m\n";
  }
}

The input file looks something like this:

ACTGXTGC
ACTGATGC
ACTGTTGC
ACTGCTGC
ACTGGTGC
ACTXCTGC
ACXGCTGC
ACTGXTGC
CTGCTGC
CTGGTGC
CTXCTGC
CXGCTGC
CTGXTGC

ACTGACTGACTGACTGACTG
ACTGACTGACTGACTGACTG
ACTGXTGACTGACTG
ACTGACTGACTXACTG
ACXTGACTGACTGACTG

and the output looks a little like this – consensus [number of sequences] followed by an indented list of matching sequences:

elwood:~/dev rmp$ ./haplotype-sort < haplotype-in.txt 
ACTGXTGACTGACTG [1]
  ACTGXTGACTGACTG
ACTGATGC [1]
  ACTGATGC
CTGCTGC [4]
  CTGCTGC
  CTXCTGC
  CXGCTGC
  CTGXTGC
ACTGCTGC [5]
  ACTGCTGC
  ACTGXTGC
  ACTXCTGC
  ACXGCTGC
  ACTGXTGC
ACTGACTGACTGACTGACTG [2]
  ACTGACTGACTGACTGACTG
  ACTGACTGACTGACTGACTG
ACTGTTGC [1]
  ACTGTTGC
CTGGTGC [1]
  CTGGTGC
ACTGACTGACTXACTG [1]
  ACTGACTGACTXACTG
ACXTGACTGACTGACTG [1]
  ACXTGACTGACTGACTG
ACTGGTGC [1]
  ACTGGTGC

naïve kmer scanner

Another bit of fun, basically the opposite of yesterday’s post, here we’re detecting the number of unique kmers present in a sequence. It’s easy to do this with an iterating substr approach but I like Perl’s regex engine a lot so I wanted to do it using that. Okay, I wanted to do it entirely in one /e regex but it’s slightly trickier and a lot less clear manipulating pos inside a /e substitution function.

#!/usr/local/bin/perl
use strict;
use warnings;

my $str   = q[AAACAATAAGAAGCACCATCAGTACTATTAGGACGATGAGGCCCTCCGCTTCTGCGTCGGTTTGTGGG];
my $k     = 3;
my $match = q[\s*[ACTG]\s*]x$k;
my $seen  = {};

while($str =~ m{($match)}smxgi) {
  my $m = $1;
  $m    =~ s/\s*//smxg;

  $seen->{$m}++;

  pos $str = (pos $str) - $k + 1;
}

{
  local $, = "\n";
  print sort keys %{$seen};
}

printf "\n%d unique ${k}mers\n", scalar keys %{$seen};

$k is the size of the kmers we’re looking for. In this case 3, as we were generating yesterday.
$match attempts to take care of matches across newlines, roughly what one might find inside a FASTA. YMMV.
$seen keeps track of uniques we’ve encountered so far in $str.

The while loop iterates through matches found by the regex engine and pos, a function you don’t see too often, resets the start position for the next match – in this case back to the current position minus one less than the length of the match (pos − k + 1), so that overlapping kmers are picked up.

The output looks something like this:


elwood:~/dev rmp$ ./kmers 
AAA
AAC
AAG
AAT
ACA
ACC
ACG
ACT
AGA
AGC
AGG
AGT
ATA
ATC
ATG
ATT
CAA
CAC
CAG
CAT
CCA
CCC
CCG
CCT
CGA
CGC
CGG
CGT
CTA
CTC
CTG
CTT
GAA
GAC
GAG
GAT
GCA
GCC
GCG
GCT
GGA
GGC
GGG
GGT
GTA
GTC
GTG
GTT
TAA
TAC
TAG
TAT
TCA
TCC
TCG
TCT
TGA
TGC
TGG
TGT
TTA
TTC
TTG
TTT
64 unique 3mers

If I were really keen I’d use this in a regression test for yesterday’s toy.
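For completeness, the iterating substr approach mentioned at the top is shorter still. A sketch (mine, assuming a single-line sequence with no embedded whitespace):

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my $str  = q[AAACAATAAGAAGCACCATCAGTACTATTAGGACGATGAGGCCCTCCGCTTCTGCGTCGGTTTGTGGG];
my $k    = 3;
my $seen = {};

#########
# slide a window of width $k along the string, one base at a time
#
for my $i (0 .. (length $str) - $k) {
  $seen->{substr $str, $i, $k}++;
}

printf "%d unique %dmers\n", scalar keys %{$seen}, $k;
```

For the sequence above this finds the same 64 unique 3mers as the regex version.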

naïve kmer sequence generator

This evening, for “fun”, I was tinkering with a couple of methods for generating sequences containing diverse, distinct, kmer subsequences. Here’s a small, unintelligent, brute-force function I came up with.
Its alphabet is set at the top in $bases, as is k, the required length of the distinct subsequences. It keeps going until it’s been able to hit all distinct combinations, tracked in the $seen hash. The final sequence ends up in $str.

#!/usr/local/bin/perl
use strict;
use warnings;

my $bases     = [qw(A C T G)];
my $k         = 3;
my $seen      = {};
my $str       = q[];
my $max_perms = (scalar @{$bases})**$k;
my $pos       = -1;

POS: while((scalar keys %{$seen}) < $max_perms) {
  $pos ++;
  report();

  for my $base (@{$bases}) {
    my $triple = sprintf q[%s%s],
                 (substr $str, $pos-($k-1), ($k-1)),
		 $base;
    if($pos < ($k-1) ||
       !$seen->{$triple}++) {
      $str .= $base;
      next POS;
    }
  }
  $str .= $bases->[-1];
}

sub report {
  print "len=@{[length $str]} seen @{[scalar keys %{$seen}]}/$max_perms kmers\n";
}

report();
print $str, "\n";

Executing for k=3, bases = ACTG the output looks like this:

elwood:~/dev rmp$ ./seqgen
len=0 seen 0/64 kmers
len=1 seen 0/64 kmers
len=2 seen 0/64 kmers
len=3 seen 1/64 kmers
len=4 seen 2/64 kmers
len=5 seen 3/64 kmers
len=6 seen 4/64 kmers
len=7 seen 5/64 kmers
len=8 seen 6/64 kmers
len=9 seen 7/64 kmers
len=10 seen 8/64 kmers
len=11 seen 9/64 kmers
len=12 seen 10/64 kmers
len=13 seen 10/64 kmers
len=14 seen 11/64 kmers
len=15 seen 12/64 kmers
len=16 seen 13/64 kmers
len=17 seen 14/64 kmers
len=18 seen 15/64 kmers
len=19 seen 16/64 kmers
len=20 seen 17/64 kmers
len=21 seen 18/64 kmers
len=22 seen 19/64 kmers
len=23 seen 20/64 kmers
len=24 seen 21/64 kmers
len=25 seen 22/64 kmers
len=26 seen 23/64 kmers
len=27 seen 24/64 kmers
len=28 seen 25/64 kmers
len=29 seen 26/64 kmers
len=30 seen 27/64 kmers
len=31 seen 28/64 kmers
len=32 seen 29/64 kmers
len=33 seen 30/64 kmers
len=34 seen 31/64 kmers
len=35 seen 32/64 kmers
len=36 seen 33/64 kmers
len=37 seen 34/64 kmers
len=38 seen 35/64 kmers
len=39 seen 36/64 kmers
len=40 seen 37/64 kmers
len=41 seen 37/64 kmers
len=42 seen 38/64 kmers
len=43 seen 39/64 kmers
len=44 seen 40/64 kmers
len=45 seen 41/64 kmers
len=46 seen 42/64 kmers
len=47 seen 43/64 kmers
len=48 seen 44/64 kmers
len=49 seen 45/64 kmers
len=50 seen 46/64 kmers
len=51 seen 47/64 kmers
len=52 seen 48/64 kmers
len=53 seen 49/64 kmers
len=54 seen 50/64 kmers
len=55 seen 51/64 kmers
len=56 seen 52/64 kmers
len=57 seen 53/64 kmers
len=58 seen 54/64 kmers
len=59 seen 55/64 kmers
len=60 seen 56/64 kmers
len=61 seen 57/64 kmers
len=62 seen 58/64 kmers
len=63 seen 59/64 kmers
len=64 seen 60/64 kmers
len=65 seen 61/64 kmers
len=66 seen 62/64 kmers
len=67 seen 63/64 kmers
len=68 seen 64/64 kmers
AAACAATAAGAAGCACCATCAGTACTATTAGGACGATGAGGCCCTCCGCTTCTGCGTCGGTTTGTGGG

As you can see, it manages to fit 64 distinct base triples in a string of only 68 characters. It could probably be packed a little more efficiently but I don’t think that’s too bad for a first attempt.
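On the packing point: a classic greedy construction (due to M. H. Martin, 1934; the sketch below is mine, not part of the original post) reaches the theoretical minimum of 4³ + 3 − 1 = 66 characters. Seed with k−1 copies of the first base, then always append the base furthest down the alphabet that forms an unseen kmer:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my $bases = [qw(A C T G)];
my $k     = 3;
my $seen  = {};
my $str   = $bases->[0] x ($k - 1); # seed with k-1 copies of the first base

POS: while(1) {
  #########
  # greedily prefer the base furthest down the alphabet
  #
  for my $base (reverse @{$bases}) {
    my $kmer = (substr $str, -($k - 1)) . $base;
    if(!$seen->{$kmer}++) {
      $str .= $base;
      next POS;
    }
  }
  last POS; # nothing extends the sequence - every kmer has been seen
}

print "len=@{[length $str]} seen @{[scalar keys %{$seen}]} kmers\n";
print $str, "\n";
```

Martin’s theorem guarantees this greedy walk covers all 64 triples, so it reports len=66 – two characters shorter than the prefer-smallest-with-fallback approach above.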

5MHz Propagation Forecasts utilising the Experiment’s Database

Using historical data to predict future conditions

This article, written by Gwyn Williams G4FKH and published in RadCom December 2011 is © RSGB 2011, reproduced with permission.

FIGURE 1: An input form (see text).

BRIEF HISTORY

The RSGB 5MHz Experiment started toward the end of 2002, when the 5MHz Working Group was formed under the chairmanship of John Gould, G3WKL. The current NoVs for access to 5MHz expire on 30 June 2015, but it is possible that Ofcom and MoD will agree to a further extension. The 5MHz Experiment could then run well into the current solar cycle. On the website [1] are various ways in which information can be retrieved from the 5MHz database. In fact there are three separate parts to the database. The first two are concerned with the logging of the three propagation beacons that have been established: the first for manual logging and the second for automated logging. The third part is a compilation of propagation data for the same period. It suddenly occurred to me that the database could be used to forecast propagation.


BASIC CONCEPT

This article and the subsequent web application discuss the automatically logged data and the propagation data. The three beacons in question are GB3RAL, GB3WES and GB3ORK, situated in Maidenhead Locator squares IO91in, IO84qn and IO89ja respectively. They transmit on a frequency of 5.290MHz every quarter of an hour for one minute each, starting on the hour. Each sequence therefore lasts for three minutes. The logging software was designed by Peter Martinez, G3PLX and is available on the web via the experiment’s main page [1]. To date, the automatic logging part of the database has accumulated over 1.3 million lines of data; the propagation part about one third of this. To perform the forecasts it is necessary to interrogate both of these datasets.
It will be appreciated that there are many stations contributing to the database and therefore an equal number of station setups! To explain the potential pitfall, I will describe my own setup. I have a dipole aerial with a radio and PC. The output from the radio couples to the PC’s soundcard and the program manipulates data from the soundcard. My radio does not provide sufficient output to drive the soundcard, so I have installed an inline amplifier. I can, therefore, adjust my own recorded background/signal strength. Not taking care when analysing the data would cause these differences to become exaggerated. The G3PLX program compensates somewhat for this internally and displays the background signal level and the SNR of the station being monitored. This internal rationalisation and the forecasting program’s own scheme mitigate these differences.

I write small computer programs in the Perl language; during a fairly recent local club meeting I discussed this idea with Roger Pettett, G7TKI, a professional systems developer. My personal experience is not up to his standard and, as Roger kindly volunteered to develop the program in his own time, I quickly agreed. Roger is not au fait with propagation so the collaboration is a good one.


5MHz PROPAGATION

Before I go on to discuss the developed program and its usage, it seems expedient to first outline propagation at this frequency. Most of the propagation at this QRG is via NVIS (near vertical incidence skywave) communication. The E-layer (and probably, during some periods, the F-layer) does influence the longer distance circuits. NVIS for most practical purposes has a range of about 400km; therefore, if a station were set up in the middle of the UK, NVIS would be the dominant mode for the whole of the country. However, we do not live in an idealised world, so a method is needed to perform forecasts between any two parts of the UK. To facilitate this, locators held in the database are used to correlate distances. The R12 (twelve-month smoothed relative sunspot number) and the AP (planetary A-index) are used to represent propagation conditions. R12 is automatically retrieved from the internet, but it is necessary to input the AP [2] that represents the propagation conditions under consideration. The best results within the UK will be achieved when using a 1⁄2λ dipole; the program suggests some aerial dimensions.
The ionosphere is fed by ionisation from the sun and this has a direct correlation with sunspot numbers. The ionosphere requires X amount of ionisation to allow for NVIS communication. For simplicity we can view the AP as a measure of disturbance in the ionosphere. The higher the AP, the worse the propagation will become.

Instead of using propagation prediction engines, eg VOACAP, we interrogate the databases to find parity (within parameters) to those conditions input by the user. A more complete introduction to NVIS communications can be found in [3].

5MHz FORECASTING PROGRAM

The completed program can be found at [4] and should be straightforward in use. As mentioned, it is only necessary to input the AP, month, year and locator squares that are to be forecast; the final tick box requests a smoothed graph output. Figure 1 is an example of the input form. It can take a little time to search through the databases to find the correct matches; when the relevant data are found a table like Table 1 is produced along with a graph, Figure 2.

The table, which can be viewed as a median for a whole month, comprises three headed columns. The first column is the hour of the day, the second an SNR (signal to noise ratio) for the hour and the third an explanatory note for the SNR. For the purposes of this dataset an SNR below 20 should be considered as unreadable. The graph is a pictorial view of the table data; the thin green line is where the signal drops out of usability.
It may not be apparent why we need a tool such as the one described; after all, we have experience at this QRG and have other prediction tools at our disposal. What those tools do not provide is the ability to visualise propagation when conditions are disturbed or changing. Entering the appropriate values into the input form allows the user to see what conditions will be like when the ionosphere becomes disturbed, for example by coronal mass ejections and solar flares. This ability is lacking in prediction engines, but this program shows what effect any degree of disturbance should have upon any circuit. However, there is one shortcoming suffered by all forecasting tools, and that is the inability to see things before they happen, ie a short-wave fade from a large flare. This effect lasts for as long as the flare is visible from Earth (utilising the correct viewing techniques), from a minute or two to perhaps an hour or so. These effects have a quick onset and an equally quick fall-off.
Out of interest, the output from this program was compared with REC533, the same prediction engine used to produce the RadCom predictions (to be fair, prediction programs were never really designed for such short distances).
A survey produced for RadCom in 2000 found REC533 to perform best of the engines tested. Comparing the output from REC533 (Table 2, Predicted SNR column) with what was actually copied by the automatic monitoring program (Table 2, Automatic Recording SNR column) showed that REC533 was only 25% accurate. When it is considered that this accuracy is only obtainable when geomagnetic conditions are quiet, this is a severe shortcoming.

The output from our new program (Table 2, Forecast SNR column) achieved 67% accuracy against actual monitoring, under all geomagnetic conditions, showing it to be far superior to anything previously available. The comparison used the criterion of the SNR being within 10dB. To obtain more accurate results, more data are required, as well as data from a greater number of stations.
Table 3 and Figure 3 demonstrate the way in which conditions can change in reality, clearly showing the different conditions when an ionospheric disturbance is at hand. Of course, diurnal conditions also have a bearing on the results shown in the table: at the very end and very beginning of the day, no path is expected for some circuits. What is happening is that the foF2 (F2 critical frequency) is dropping below the required frequency, below which the E and F layers can no longer accommodate communication.

 

SUMMARY

This simple but effective idea can prove useful in all sorts of circumstances: everyday amateur use, emergency situations (RAYNET) and other occasions outside the amateur service where 5MHz is in use, especially when conditions become disturbed during ionospheric upheavals. It is the use of a database covering all varieties of propagation conditions that is the winning ingredient in performing accurate forecasts. As the database builds towards 2015, a whole solar cycle and more of information will have been logged, providing an unsurpassed amount of quality data for this frequency.

 

WEBSEARCH

[1] www.rsgb.org/spectrumforum/hf/5mhz.php
[2] www.swpc.noaa.gov/ftpdir/weekly/27DO.txt
[3] Near Vertical Incidence Skywave Communication: Theory, Techniques and Validation, LTC David M Fiedler (NJ ARNG, Ret) and MAJ Edward J Farmer, PE (CA SMR), Worldradio Books.
[4] www.rsgb-spectrumforum.org.uk/5mhzpf

This article may also be downloaded in PDF format here: 5MHz Propagation Forecasting utilising the Experiment’s Database

3 sorts of sort

I’ve been fiddling around recently with some stuff which I’m sure I covered in my CS degree 16 (Gah! Really?) years ago but have had to re-educate myself about: namely, a few different implementations of sort. I implemented three types in Perl, reacquainting myself with the mechanisms via Wikipedia. I found a few existing examples of sort algorithms in Perl but they generally looked a bit unpleasant, so like all programmers I decided to write my own (complete with errors, as an exercise for the reader). I’ve also added some notes, mostly to myself, which are largely unsubstantiated because I haven’t measured memory, speed or recursion depth, for example (though these are well documented elsewhere).
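(As an aside: if you did want rough speed numbers, the core Benchmark module makes that cheap. A minimal sketch, assuming wall-clock rate is all you care about; the bubble sub here is a throwaway stand-in for timing purposes, not one of the implementations below:)

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# throwaway bubble sort, purely for timing
sub bubble {
  my @list     = @{$_[0]};
  my $modified = 1;
  while($modified) {
    $modified = 0;
    for my $i (0..$#list-1) {
      if($list[$i] > $list[$i+1]) {
        @list[$i, $i+1] = @list[$i+1, $i];
        $modified = 1;
      }
    }
  }
  return \@list;
}

my $data = [map { int rand() * 99 } (0..199)];

# 200 iterations of each; cmpthese prints a rate-comparison table
cmpthese(200, {
  builtin => sub { my @sorted = sort { $a <=> $b } @{$data} },
  bubble  => sub { bubble($data) },
});
```

Numbers will of course vary by machine, but the gap between the built-in sort and a naive O(n²) routine is hard to miss even on tiny lists.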

1. Bubble Sort

#!/usr/bin/perl -w
use strict;
use warnings;

my $set    = [map { int rand() * 99 } (0..40)];
print "in:  @{$set}\n";

my $sorted = bubblesort($set);
print "out: @{$sorted}\n";

sub bubblesort {
  my ($in)     = @_;
  my $out      = [@{$in}];
  my $length   = scalar @{$in};
  my $modified = 1;

  while($modified) {
    $modified = 0;
    for my $i (0..$length-2) {
      if($out->[$i] > $out->[$i+1]) {
	($out->[$i], $out->[$i+1]) = ($out->[$i+1], $out->[$i]);
	$modified = 1;
      }
    }
  }

  return $out;
}

Bubblesort iterates through each element of the list up to the last but one, comparing it to the next element in the list; if it’s greater, the two values are swapped. The process repeats until a pass makes no modifications to the list.

Pros: doesn’t use much memory – values are swapped in situ; doesn’t perform deep recursion; is easy to read

Cons: It’s pretty slow. The worst-case complexity is O(n²) (in the worst case the list needs a full pass for each value it contains).

2. Merge Sort

#!/usr/bin/perl
use strict;
use warnings;

my $set    = [map { int rand() * 99 } (0..40)];
print "in:  @{$set}\n";

my $sorted = mergesort($set);
print "out: @{$sorted}\n";

sub mergesort {
  my ($in) = @_;

  my $length = scalar @{$in};
  if($length <= 1) {
    return $in;
  }

  my $partition = int($length / 2);
  my $left      = [@{$in}[0..$partition-1]];
  my $right     = [@{$in}[$partition..$length-1]];

  return merge(mergesort($left), mergesort($right));
}

sub merge {
  my ($left, $right) = @_;
  my $merge = [];

  while(scalar @{$left} || scalar @{$right}) {
    if(scalar @{$left} && scalar @{$right}) {
      if($left->[0] < $right->[0]) {
	push @{$merge}, shift @{$left};
      } else {
	push @{$merge}, shift @{$right};
      }
    } elsif(scalar @{$left}) {
      push @{$merge}, shift @{$left};
    } elsif(scalar @{$right}) {
      push @{$merge}, shift @{$right};
    }
  }
  return $merge;
}

Mergesort recurses through the list, in each iteration breaking the remaining list in half. Once broken down to individual elements, each pair of elements/lists at each depth of recursion is reconstituted into a new ordered list and returned.

Pros: generally quicker than bubblesort; O(n log n) complexity.

Cons: quite difficult to read

3. Quicksort

#!/usr/bin/perl
use strict;
use warnings;

my $set    = [map { int rand() * 99 } (0..40)];
print "in:  @{$set}\n";

my $sorted = quicksort($set);
print "out: @{$sorted}\n";

sub quicksort {
  my ($in) = @_;

  my $length = scalar @{$in};
  if($length <= 1) {
    return $in;
  }

  my $pivot = splice @{$in}, int($length / 2), 1;
  my $left  = [];
  my $right = [];

  for my $v (@{$in}) {
    if($v < $pivot) {
      push @{$left}, $v;
    } else {
      push @{$right}, $v;
    }
  }

  return [@{quicksort($left)}, $pivot, @{quicksort($right)}];
}

Quicksort is probably the best known of all the sort algorithms out there. It’s easier to read than Mergesort, though arguably still not as easy as Bubblesort, but it’s a common pattern and its speed makes up for anything lacking in readability. At each iteration a pivot is selected and removed from the list. The remaining list is scanned; each element lower than the pivot is put in a new “left” list and each other element is put into a new “right” list. The returned result is a merged recursive quicksort of the left list, the pivot and the right list.

In this example I’m picking the middle element of the list as the pivot. I’m sure there are entire branches of discrete mathematics dealing with how to choose the pivot based on the type of input data.
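One common cheap improvement, for what it’s worth, is the “median of three” rule: sample the first, middle and last elements and pivot on whichever holds the middle value, which guards against degenerate behaviour on already-sorted input. A quick sketch (the helper name is mine, not part of the script above):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the index of whichever of the first, middle and last elements
# holds the median value -- a common pivot-selection heuristic.
sub median_of_three_index {
  my ($in)   = @_;
  my $last   = $#{$in};
  my @sample = sort { $in->[$a] <=> $in->[$b] } (0, int($last / 2), $last);
  return $sample[1];
}

my $list = [9, 1, 8, 2, 7, 3];
my $idx  = median_of_three_index($list);
print "pivot index: $idx (value $list->[$idx])\n"; # pivot index: 2 (value 8)
```

Dropping that into quicksort() in place of the fixed middle-element splice is a one-line change.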

Pros: (One of?) the fastest sort algorithms around; reasonably efficient memory usage and recursion depth. Average O(n log n) complexity again (worst case is O(n²), eg for consistently poor pivot choices).

Perhaps it’s worth noting that in 25-odd years of programming computers I’ve only ever had to examine the inner workings of sort routines as part of my degree – never before, nor after, but it’s certainly brought back a few memories.

Neat Perl Gotcha

For a while now I’ve been using Test::Perl::Critic as an integral part of my daily coding. Test::Perl::Critic wraps Perl::Critic into a set of tests which can be automatically run against a distribution. Perl::Critic implements Damian Conway’s set of standard Perl Best Practices.

I’m not going to go into the arguments of whether they’re good or bad rules right now. Suffice to say I use nearly all of them and my code has improved because of it. Anyway, one of the rules states you shouldn’t use parentheses for core method calls like length(), int() or rand(), so most of the time I don’t, but today I wanted to do this:

my @array = map { int rand * 45 } (1..10);

The results come out as an array of zeros. Why? Well, it’s not a broken PRNG, that’s for sure. It’s a simple case of misunderstood parsing, operator precedence and Perl internals. Looking closely, * 45 isn’t the multiplication you’d expect: *45 refers to the typeglob named 45 in the main package, and rand, being a named unary operator, happily takes it as its argument. I actually have no idea how a typeglob is cast into a number in Perl, but however it’s done, it evaluates to zero. rand 0 exhibits the same behaviour as rand 1, yielding a random float between zero and one, so int rand 0 will always be zero.

So? Well to stop the parser from taking *45 as an argument to rand you need to upset PerlCritic and add those parentheses back in:

my @array = map { int rand() * 45 } (1..10);

and it works now so back to work!

[ The astute amongst you will realise you can just say int rand(45). I know, but that doesn’t work nearly so well as a mildly interesting example. I blame PerlCritic ;) ]
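For the doubters, here’s a minimal side-by-side sketch of the two parses (the typeglob’s “isn’t numeric” warnings are silenced so the output stays readable):

```perl
#!/usr/bin/perl
use strict;
use warnings;
no warnings qw(numeric); # the typeglob triggers "isn't numeric" otherwise

# rand * 45 parses as rand(*45): the typeglob numifies to zero,
# rand 0 behaves like rand 1, and int of a float in [0,1) is 0.
my @broken = map { int rand * 45 } (1..10);

# with parentheses the multiplication applies to rand's result
my @fixed = map { int rand() * 45 } (1..10);

print "broken: @broken\n"; # always ten zeros
print "fixed:  @fixed\n";  # ten integers between 0 and 44
```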

A Simple Continuous Integration (Jenkins) Dashboard

I had 15 minutes today to produce a wall-mounted-screen-compatible dashboard for showing the latest build statuses from our Jenkins continuous integration manager. It’s written in Perl and uses a few CPAN modules – XML::Simple, LWP::Simple and Readonly.

This is what it looks like:

and here’s the code:

#!/usr/local/bin/perl -T
use strict;
use warnings;
use XML::Simple;
use LWP::Simple qw(get);
use Carp;
use English qw(-no_match_vars);
use Readonly;

Readonly::Scalar our $CI      => q[http://my.ci.server/rssLatest];
Readonly::Scalar our $COLUMNS => 6;

my $str     = get($CI);
my $xml     = XMLin($str);
my @entries = map { $xml->{entry}->{$_} } sort keys %{$xml->{entry}};

print <<"EOT" or croak qq[Error printing: $ERRNO];
Content-type: text/html

<html>
 <head>
  <title>Continuous Integration HUD</title>
  <meta http-equiv="refresh" content="120; url=$ENV{SCRIPT_NAME}"/>
  <style type="text/css">
.stable { background-color: green }
.unstable { background-color: yellow }
.broken { background-color: red }
table { margin: 0 auto; }
a { font-size: larger; text-decoration: none; color: black; }
  </style>
  <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.6.4/jquery.min.js"></script>
  <script type="text/javascript">
\$(document).ready(redraw);
\$(window).resize(redraw);

function redraw() {
   \$('tr').height((\$(window).height()-60)/\$('tr').size());
   \$('td').width((\$(window).width()-60)/\$('tr').first().find('td').size());
}
  </script>
 </head>
 <body>
EOT

print qq[<table>\n] or croak qq[Error printing: $ERRNO];
while(scalar @entries) {
  print qq[ <tr>\n] or croak qq[Error printing: $ERRNO];
  for my $j (1..$COLUMNS) {
    my $entry = shift @entries;
    if(!$entry) {
      last;
    }

    my $title = $entry->{title};
    my $class = q[stable];
    $class    = ($title =~ /unstable/smx) ? 'unstable' : $class;
    $class    = ($title =~ /broken/smx)   ? 'broken'   : $class;
    $title    =~ s{\s+[(].*?$}{}smx;

    my $href = $entry->{link}->{href};
    print qq[  <td class="$class"><a href="$href">$title</a></td>] or croak qq[Error printing: $ERRNO];
  }
  print qq[ </tr>\n] or croak qq[Error printing: $ERRNO];
}
print qq[</table>\n] or croak qq[Error printing: $ERRNO];

print <<'EOT' or croak qq[Error printing: $ERRNO];
 </body>
</html>
EOT