Multivariate Charts from HTML tables in D3.js

For a dynamic monovariate (single line) chart, please see my earlier post – http://psyphi.net/blog/2013/04/charts-from-tables-with-d3js-and-jquery/.

Sometimes you just have to plot more than one dataset on the same chart, but you might have a complex data table with some “collections” of single values and some collections of multiple values. Here I’ve put together an example from something I’ve been working on recently. Once your back-end queries (SQL or whatever) are written and your templates convert those data into basic HTML tables, you can plot them straight to SVG/D3 without much extra work.

Nearly all of that extra work is around adding appropriate classes to cells to distinguish columns and collections of columns. The rest is to extract those cells out again and decide which should be plotted together.

In this example, tabs and table headings belong to the classes “collection_#” and “a_c#”, where the collection_# identifies a set of columns to be displayed together and the a_c# identifies the (links for the) columns themselves. Collections with multiple columns therefore have a single collection class but contain more than one a_c# class.

Next, each table tbody td data cell belongs to a c# class, one for each column. Each one is also uniquely identified by a td#_<date> identifier, which allows hovers on the table cell to highlight the SVG data point and vice versa. Finally, each cell contains a span with a “val” class (more on that in the next post).

SVG paths may now be built for each column. Clicks on table-headings and tabs are able to examine which columns co-display because they belong in the same collection and then scale and plot them appropriately.
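
As a rough illustration of that grouping step, here is a minimal sketch in plain JavaScript (no D3 or jQuery calls, so it runs under Node). The class strings, the cell values and the function names groupColumns and sharedExtent are made up for the example and are not taken from the real page. It only shows how columns sharing a collection_# class might be collected and given a common y-range before plotting.

```javascript
// Hypothetical heading class strings following the "collection_# a_c#" convention.
const headerClasses = [
  "collection_0 a_c0",
  "collection_1 a_c1",
  "collection_1 a_c2",
  "collection_2 a_c3",
];

// Map each collection to the column classes (c#) that should be plotted together.
function groupColumns(classStrings) {
  const collections = new Map();
  for (const cls of classStrings) {
    const collection = /\bcollection_(\d+)\b/.exec(cls);
    const column = /\ba_c(\d+)\b/.exec(cls);
    if (!collection || !column) continue; // not a plottable heading
    const key = `collection_${collection[1]}`;
    if (!collections.has(key)) collections.set(key, []);
    collections.get(key).push(`c${column[1]}`);
  }
  return collections;
}

// One [min, max] range across every column in a collection,
// so that co-displayed lines share a single scale.
function sharedExtent(columns, valuesByColumn) {
  const all = columns.flatMap((c) => valuesByColumn[c] || []);
  return [Math.min(...all), Math.max(...all)];
}

const groups = groupColumns(headerClasses);
const values = { c1: [3, 9, 4], c2: [12, 7, 1] }; // made-up cell values
const extent = sharedExtent(groups.get("collection_1"), values);
console.log(groups.get("collection_1"), extent);
```

In the page itself the class strings would presumably come from the tab and table-heading elements and the numbers from the “val” spans. The resulting range could then feed whatever y-scale the chart uses.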

Note that the first and last tabs in this example plot single lines to demonstrate mixed collections in action. The middle two tabs have two lines each but there’s no reason why you couldn’t have more (although there are only seven colours listed at the moment).

Cross-platform automated builds for node-webkit applications

Recently I’ve been extending my “classic” JavaScript knowledge by learning NodeJS. I’m sad to say that writing cross-platform, Desktop-class applications in Perl is just way too much hassle. However, having also discovered node-webkit I’ve been able to accelerate my desktop application development using classic HTML & CSS knowledge and improving my JavaScript techniques, mostly trying to better understand fully asynchronous, non-blocking programming. Apart from some initial mind-bending scoping experiences which maybe I’ll come back to another day, it’s generally been a breeze.

One of the useful things I’ve been able to do is to automate cross-platform application builds for Windows and Mac (Linux to come, not a priority right now but should be easy – feel free to comment). It’s not compilation, but more like application packaging.

My project has the node-webkit distributable zips in “src/”. The target folder is “dist/” and I’m also using a few DOS tools (zip.exe & unzip.exe and the commandline Anolis Resource editor) which live in dist/tools. The targets are built with timestamped filenames, a .exe in “dist/win/” for Windows and a .dmg in “dist/mac/” for OSX. I don’t do anything clever with Info.plist on Mac though I know I should, but the icons are set on both platforms, assuming they’ve been pre-generated and saved in the right places (resources/).

On OSX I’m using system make which presumably came with XCode. On Windows I’m using gmake which on my system came with a previous installation of Strawberry Perl but is also available in a Windows binary/installer.

My Makefile looks something like below (“make” not being one of my strongest skills – apologies for the ugly stuff). It might not be 100% complete as it’s been excised out of the original, much more complicated Makefile, so use with caution. If anyone has any tips on stuffing it all into NSIS automatically as well, please comment.

NW    := node-webkit-v0.7.3
NWWIN := $(NW)-win-ia32
NWMAC := $(NW)-osx-ia32
NWLIN := $(NW)-linux-x64

deps:
	rm -rf node_modules
	npm install

windeps:
	if exist node_modules rmdir node_modules /s /q
	npm install

# zip.exe & unzip.exe from http://stahlworks.com/dev/?tool=zipunzip
# Resourcer.exe from http://anolis.codeplex.com/
win: windeps
	if exist mdc.nw del mdc.nw /q
	if exist dist\win rmdir dist\win /s /q
	if exist tmp rmdir tmp /s /q
	dist\tools\zip -r mdc.nw mdc package.json node_modules
	mkdir dist\win tmp
	dist\tools\unzip -d tmp -o src\$(NWWIN).zip
	dist\tools\Resourcer -op:del -src:tmp\nw.exe -type:14 -name:IDR_MAINFRAME
	dist\tools\Resourcer -op:add -src:tmp\nw.exe -type:14 -name:IDR_MAINFRAME -file:resources\mdc72x72.ico -lang:1033
	copy /b tmp\nw.exe+mdc.nw dist\win\mdc.exe
	copy tmp\icudt.dll dist\win
	copy tmp\nw.pak dist\win
	if exist mdc.nw del mdc.nw /q
	if exist tmp rmdir tmp /q /s
	dist\tools\zip -r dist\win\mdc-$(shell echo %date:~-4,4%%date:~3,2%%date:~0,2%-%time:~0,2%%time:~3,2%).zip dist\win

mac: deps
	[ ! -f mdc.nw ] || rm mdc.nw
	zip -r mdc.nw mdc package.json resources/mdc72x72.png node_modules
	touch node-webkit.app
	rm -rf node-webkit.app dist/mdc.app dist/mac
	mkdir dist/mac
	unzip -o src/$(NWMAC).zip
	mv node-webkit.app dist/mac/mdc.app
	mv mdc.nw dist/mac/mdc.app/Contents/Resources/app.nw
	rm dist/mac/mdc.app/Contents/Resources/nw.icns
	sips -s format icns resources/mdc512x512.png --out dist/mac/mdc.app/Contents/Resources/mdc.icns
	perl -i -pe 's{nw[.]icns}{mdc.icns}smxg' dist/mac/mdc.app/Contents/Info.plist
	perl -i -pe 's{node[-]webkit[ ]App}{MDC}smxg' dist/mac/mdc.app/Contents/Info.plist
	hdiutil create dist/mac/mdc-$(shell date +'%Y%m%d-%H%M').dmg -ov -volname "MDC" -fs HFS+ -srcfolder dist/mac/mdc.app
	rm -rf dist/mac/mdc.app

test:
	@./node_modules/.bin/mocha

.PHONY: deps windeps win mac test

Random Sequence Mutator

Here’s a handy one(ish)-liner to mutate an input sequence using Perl’s RegEx engine:

epiphyte:~ rmp$ perl -e '$seq="ACTAGCTACGACTAGCATCGACT"; $mutants = [qw(A C T G)];
  print "$seq\n";
  $seq =~ s{([ATGC])}{ rand() < 0.5 ? $mutants->[int rand 4] : $1 }smiexg;
  print "$seq\n";'
ACTAGCTACGACTAGCATCGACT
ACAATCGCGGACCAGAATCTCTT

This gives each base in $seq a 50% chance (rand() < 0.5) of being selected for mutation, but as the original base is in the available $mutants array, a selected base has a 25% chance of being replaced with itself, so only about 37.5% of bases actually change. If you wanted to improve it by excluding the original base for each mutation you might do something like:

epiphyte:~ rmp$ perl -e '$seq="ACTAGCTACGACTAGCATCGACT"; $mutants = [qw(A C T G)];
  $mutsize=scalar @{$mutants}; print "$seq\n";
  $seq =~ s{([ATGC])}{ rand() < 0.5 ? [grep { $_ ne $1 } @{$mutants}]->[int rand $mutsize-1] : $1 }smiexg;
  print "$seq\n";'
ACTAGCTACGACTAGCATCGACT
TGTAGATAATGTGATACGAGACT

This (quite inefficiently) builds an array of all available options from $mutants, excluding $1, the matched base at each position.
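
To check the arithmetic behind the two variants, here is a small simulation sketched in JavaScript rather than Perl. The function names (mutateWithSelf, mutateExcludingSelf, observedRate) and the trial count are my own choices for illustration. It mirrors the logic of the two one-liners and measures what fraction of bases actually change.

```javascript
const BASES = ["A", "C", "T", "G"];

// Variant 1: the replacement is drawn from all four bases,
// so it may be the same as the original.
function mutateWithSelf(seq, rate, rng = Math.random) {
  return seq.replace(/[ACTG]/g, (base) =>
    rng() < rate ? BASES[Math.floor(rng() * BASES.length)] : base
  );
}

// Variant 2: the original base is filtered out before choosing a replacement.
function mutateExcludingSelf(seq, rate, rng = Math.random) {
  return seq.replace(/[ACTG]/g, (base) => {
    if (rng() >= rate) return base;
    const others = BASES.filter((b) => b !== base);
    return others[Math.floor(rng() * others.length)];
  });
}

// Fraction of positions that differ from the input, averaged over many runs.
function observedRate(mutate, trials = 5000) {
  const seq = "ACTAGCTACGACTAGCATCGACT";
  let changed = 0;
  for (let t = 0; t < trials; t++) {
    const out = mutate(seq, 0.5);
    for (let i = 0; i < seq.length; i++) {
      if (out[i] !== seq[i]) changed++;
    }
  }
  return changed / (trials * seq.length);
}

console.log(observedRate(mutateWithSelf).toFixed(2));      // expected near 0.5 * 0.75 = 0.375
console.log(observedRate(mutateExcludingSelf).toFixed(2)); // expected near 0.5
```

The exact figures vary from run to run, but the first variant should hover around three-eighths and the second around one half.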

Unrolling it and tidying it up a little for readability looks like this:

my $seq     = 'ACTAGCTACGACTAGCATCGACT';
my $mutants = [qw(A C T G)];
my $mutsize = scalar @{$mutants};

print "$seq\n";

$seq =~ s{([ATGC])}{
   rand() < 0.5
    ?
   [grep { $_ ne $1 } @{$mutants}]->[int rand $mutsize-1]
    :
   $1
 }smiexg;

print "$seq\n";

unique, overlapping kmer strings

Tinkering today I wrote a quick toy to generate strings of unique, overlapping kmers. Not particularly efficient, but possibly handy nonetheless.

It takes a given k size, a configurable overlap and optionally the bases to use. First it generates a list of all the kmers, then it recursively scans for unused kmers which overlap the tail of a seed and extends the seed with them, terminating the recursion and printing the sequence once all kmers have been used.

Run it like so:

 ./kmer-overlap -k=3 -overlap=2 ACTG

#!/usr/local/bin/perl
#########
# Author:        rmp
# Created:       2013-05-15
# Last Modified: $Date$
# Id:            $Id$
# HeadURL:       $HeadURL$
#
use strict;
use warnings;
use Getopt::Long;
use Readonly;
use English qw(-no_match_vars);

Readonly::Scalar our $DEFAULT_K => 3;
Readonly::Scalar our $DEFAULT_BASES => [qw(A C T G)];

my $opts = {};
GetOptions($opts, qw(k=s overlap=s rand help));

if($opts->{help}) {
  print <<"EOT";
$PROGRAM_NAME - rmp 2013-05-15

Usage:
  $PROGRAM_NAME -k=3 -overlap=2 -rand ACTG
EOT
  exit;
}

my $k       = $opts->{k}       || $DEFAULT_K;
my $overlap = $opts->{overlap} || $k-1;
my $bases   = $DEFAULT_BASES;

if(scalar @ARGV) {
  $bases = [grep { $_ } map {split //smx} @ARGV];
}

#########
# Build all available kmers
#
my $kmers = [];

for my $base1 (@{$bases}) {
  build($base1, $bases, $kmers);
}

#########
# optionally randomise the seeds
#
if($opts->{rand}) {
  shuffle($kmers);
}

#########
# start with a seed
#
for my $seed (@{$kmers}) {
  my $seen = {
              $seed => 1,
             };
  solve($seed, $seen);
}

sub build {
  my ($seq, $bases, $kmers) = @_;
  if(length $seq == $k) {
    #########
    # reached target k - store & terminate
    #
    push @{$kmers}, $seq;
    return 1;
  }

  for my $base (@{$bases}) {
    ########
    # extend and descend
    #
    build("$seq$base", $bases, $kmers);
  }

  return;
}

sub solve {
  my ($seq_in, $seen) = @_;

  if(scalar keys %{$seen} == scalar @{$kmers}) {
    #########
    # exhausted all kmers - completed!
    #
    print $seq_in, "\n";
    return 1;
  }

  my $seq_tail     = substr $seq_in, -$overlap, $overlap;

  my $overlapping  = [grep { $_ =~ /^$seq_tail/smx } # filter in only seqs which overlap the seed tail
                      grep { !$seen->{$_} }          # filter out kmers we've seen already
                      @{$kmers}];
  if(!scalar @{$overlapping}) {
    #########
    # no available overlapping kmers - terminate!
    #
    return;
  }

  if($opts->{rand}) {
    shuffle($overlapping);
  }

  my $overhang = $k-$overlap;
  for my $overlap_seq (@{$overlapping}) {
    #########
    # extend and descend
    #
    my $seq_out = $seq_in . substr $overlap_seq, -$overhang, $overhang;
    solve($seq_out, {%{$seen}, $overlap_seq => 1});
  }

  return;
}

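sub shuffle {
  my ($arr_in) = @_;
  #########
  # in-place Fisher-Yates shuffle: swap each element with a random one at or before it
  #
  for my $i (0..scalar @{$arr_in}-1) {
    my $j = int rand($i+1);
    ($arr_in->[$i], $arr_in->[$j]) = ($arr_in->[$j], $arr_in->[$i]);
  }

  return;
}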

Output looks like this:

epiphyte:~ rmp$ ./kmer-overlap -k=2 AC
AACCA
ACCAA
CAACC
CCAAC

restart a script when a new version is deployed

I have a lot of scripts running in a lot of places, doing various little jobs, mostly shuffling data files around and feeding them into pipelines and suchlike. I also use Jenkins CI to automatically run my tests and build deb packages for Debian/Ubuntu Linux. Unfortunately, being a lazy programmer I haven’t read up about all the great things deb and apt can do so I don’t know how to fire shell commands like “service x reload” or “/etc/init.d/x restart” once a package has been deployed. Kicking a script to pick up changes is quite a common thing to do.

Instead I have a little trick that makes use of the build process changing timestamps on files when it rolls up the package. So when the script wakes up, and starts the next iteration of its event loop, the first thing it does is check the timestamp of itself and if it’s different from the last iteration it executes itself, replacing the running process with a fresh one.

One added gotcha is that if you want to run in taint mode you need to satisfy a bunch of extra requirements such as detainting $ENV{PATH} and all commandline arguments before any re-execing occurs.

#!/usr/local/bin/perl
# -*- mode: cperl; tab-width: 8; indent-tabs-mode: nil; basic-offset: 2 -*-
# vim:ts=8:sw=2:et:sta:sts=2
#########
# Author: rpettett
# Last Modified: $Date$
# Id: $Id$
# $HeadURL$
#
use strict;
use warnings;
use Readonly;
use Carp;
use English qw(-no_match_vars);
our $VERSION = q[1.0];

Readonly::Scalar our $SLEEP_LONG  => 600;
Readonly::Scalar our $SLEEP_SHORT => 30;

$OUTPUT_AUTOFLUSH++;

my @original_argv = @ARGV;

#########
# handle SIGHUP restarts
#
local $SIG{HUP} = sub {
  carp q[caught SIGHUP];
  exec $PROGRAM_NAME, @original_argv;
};

my $last_modtime;

while(1) {
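  #########
  # handle software-deployment restarts
  # -M is the script's age in days relative to when this process started,
  # so the value only changes if the file on disk is replaced
  #
  my $modtime = -M $PROGRAM_NAME;

  if(defined $last_modtime && $last_modtime != $modtime) {
    carp q[re-execing];
    exec $PROGRAM_NAME, @original_argv;
  }
  $last_modtime = $modtime;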

  my $did_work_flag;
  eval {
    $did_work_flag = do_stuff();
    1;
  } or do {
    $did_work_flag = 0;
  };

  local $SIG{ALRM} = sub {
    carp q[rudely awoken by SIGALRM];
  };

  my $sleep = $did_work_flag ? $SLEEP_SHORT : $SLEEP_LONG;
  carp qq[sleeping for $sleep];
  sleep $sleep;
}