Signing MacOSX apps with Linux

Do you, like me, develop desktop applications for MacOSX? Do you, like me, do it on Linux because it makes for a much cheaper and easier-to-manage GitLab CI/CD build farm? Do you still sign your apps using a MacOSX machine, or worse (yes, like me), not sign them at all, leaving ugly popups like the one below?

With the impending trustpocalypse next month a lot of third-party (non-app-store) apps for MacOSX are going to start having deeper trust issues than they’ve had previously, no doubt meaning more, uglier popups than that one, or worse, not being able to run at all.

I suspect this trust-tightening issue, whilst arguably a relatively good thing to do in the war against malware, will adversely affect a huge number of open-source Mac applications where the developer(s) wish to provide Mac support for their users but may not wish to pay the annual Apple Developer tax even though it's still relatively light, or may not even own any Apple hardware (though who knows how they do their integration testing?). In particular this is likely to affect very many applications built with Electron or NWJS, into which group this post falls.

Well, this week I’ve been looking into this issue for one of the apps I look after, and I’m pleased to say it’s at a stage where I’m comfortable writing something about it. The limitation is that you don’t sidestep paying the Apple Developer tax, as you do still need valid certs with the Apple trust root. But you can sidestep paying for more Apple hardware than you need, i.e. nothing needed in the build farm.

First I should say all of the directions I used came from a 2016 article, here. Thanks very much to Allin Cottrell.

Below is the (slightly-edited) script now forming part of the build pipeline for my app. Hopefully the comments make it fairly self-explanatory. Before you say so, yes I’ve been lazy and haven’t parameterised directory and package names yet.

#!/bin/bash

#########
# This is a nwjs (node) project so fish the version out of package.json
#
VERSION=$(jq -r .version package.json)

#########
# set up the private key for signing, if present
#
rm -f key.pem
if [ "$APPLE_PRIVATE_KEY" != "" ]; then
    echo "$APPLE_PRIVATE_KEY" > key.pem
fi

#########
# temporary build folder/s for package construction
#
rm -rf build
mkdir build && cd build
mkdir -p flat/base.pkg flat/Resources/en.lproj
mkdir -p root/Applications;

#########
# stage the unsigned application into the build folder
#
cp -pR "../dist/EPI2MEAgent/osx64/EPI2MEAgent.app" root/Applications/

#########
# fix a permissions issue which only manifests after the following cpio stage
# nw.app seems to be built with owner-read only. no good when packaging as root
#
chmod go+r "root/Applications/EPI2MEAgent.app/Contents/Resources/app.nw"

#########
# pack the application payload
#
( cd root && find . | cpio -o --format odc --owner 0:80 | gzip -c ) > flat/base.pkg/Payload

#########
# calculate a few attributes
#
files=$(find root | wc -l)
bytes=$(du -b -s root | awk '{print $1}')
kbytes=$(( $bytes / 1000 ))

#########
# template the Installer PackageInfo
#
cat <<EOT > flat/base.pkg/PackageInfo
<pkg-info format-version="2" identifier="com.metrichor.agent.base.pkg" version="$VERSION" install-location="/" auth="root">
  <payload installKBytes="$kbytes" numberOfFiles="$files"/>
  <scripts>
    <postinstall file="./postinstall"/>
  </scripts>
  <bundle-version>
    <bundle id="com.metrichor.agent" CFBundleIdentifier="com.nw-builder.epimeagent" path="./Applications/EPI2MEAgent.app" CFBundleVersion="$VERSION"/>
  </bundle-version>
</pkg-info>
EOT

#########
# configure the optional post-install script with a popup dialog
#
mkdir -p scripts
cat <<EOT > scripts/postinstall
#!/bin/bash

osascript -e 'tell app "Finder" to activate'
osascript -e 'tell app "Finder" to display dialog "To get the most of EPI2ME please also explore the Nanopore Community https://community.nanoporetech.com/ ."'
EOT

chmod +x scripts/postinstall

#########
# pack the postinstall payload
#
( cd scripts && find . | cpio -o --format odc --owner 0:80 | gzip -c ) > flat/base.pkg/Scripts
mkbom -u 0 -g 80 root flat/base.pkg/Bom

#########
# Template the flat-package Distribution file together with a MacOS version check
#
cat <<EOT > flat/Distribution
<?xml version="1.0" encoding="utf-8"?>
<installer-script minSpecVersion="1.000000" authoringTool="com.apple.PackageMaker" authoringToolVersion="3.0.3" authoringToolBuild="174">
    <title>EPI2MEAgent $VERSION</title>
    <options customize="never" allow-external-scripts="no"/>
    <domains enable_anywhere="true"/>
    <installation-check script="pm_install_check();"/>
    <script>
function pm_install_check() {
  if(!(system.compareVersions(system.version.ProductVersion,'10.12') >= 0)) {
    my.result.title = 'Failure';
    my.result.message = 'You need at least Mac OS X 10.12 to install EPI2MEAgent.';
    my.result.type = 'Fatal';
    return false;
  }
  return true;
}
    </script>
    <choices-outline>
        <line choice="choice1"/>
    </choices-outline>
    <choice id="choice1" title="base">
        <pkg-ref id="com.metrichor.agent.base.pkg"/>
    </choice>
    <pkg-ref id="com.metrichor.agent.base.pkg" installKBytes="$kbytes" version="$VERSION" auth="Root">#base.pkg</pkg-ref>
</installer-script>
EOT

#########
# pack the Installer
#
( cd flat && xar --compression none -cf "../EPI2MEAgent $VERSION Installer.pkg" * )

#########
# check if we have a key for signing
#
if [ ! -f ../key.pem ]; then
    echo "not signing"
    exit
fi

#########
# calculate how long a signature made with our key will be, so xar can
# reserve the right amount of space for it
#
: | openssl dgst -sign ../key.pem -binary | wc -c > siglen.txt

#########
# prepare the Installer package for signing: embed the certs, reserve
# signature space and extract the digest to sign
#
xar --sign -f "EPI2MEAgent $VERSION Installer.pkg" \
    --digestinfo-to-sign digestinfo.dat --sig-size $(cat siglen.txt) \
    --cert-loc ../dist/tools/mac/certs/cert00 --cert-loc ../dist/tools/mac/certs/cert01 --cert-loc ../dist/tools/mac/certs/cert02

#########
# construct the signature
#
openssl rsautl -sign -inkey ../key.pem -in digestinfo.dat \
        -out signature.dat

#########
# add the signature to the installer
#
xar --inject-sig signature.dat -f "EPI2MEAgent $VERSION Installer.pkg"

#########
# clean up
#
rm -f signature.dat digestinfo.dat siglen.txt key.pem

With all that you still need a few assets. I built and published (internally) corresponding debs for xar v1.6.1 and bomutils 0.2. You might want to compile & install those from source – they’re pretty straightforward builds.
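If you'd rather build them yourself, something along these lines should work. Note that the repository URLs and exact build steps below are assumptions rather than the ones I used, so check the upstream instructions first:

# bomutils provides mkbom; repository URL is an assumption
git clone https://github.com/hogliux/bomutils.git
( cd bomutils && make && sudo make install )

# xar; again, repository URL and layout are assumptions
git clone https://github.com/mackyle/xar.git
( cd xar/xar && ./autogen.sh && ./configure && make && sudo make install )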

Next, you need a signing identity. I used Xcode (Preferences => Accounts => Apple ID => Manage Certificates) to add a new Mac Installer Distribution certificate. Then used that to sign my installer package once on MacOS in order to fish out the Apple cert chain (there are probably better ways to do this):

productsign --sign LJXXXXXX58 \
        build/EPI2MEAgent\ 2020.1.14\ Installer.pkg \
        EPI2MEAgent\ 2020.1.14\ Installer.pkg

Then fish out the certs

xar -f EPI2MEAgent\ 2020.1.14\ Installer.pkg \
        --extract-certs certs
mac:~/agent rmp$ ls -l certs/
total 24
-rw-r--r--  1 rmp  Users  1494 15 Jan 12:06 cert00
-rw-r--r--  1 rmp  Users  1062 15 Jan 12:06 cert01
-rw-r--r--  1 rmp  Users  1215 15 Jan 12:06 cert02

Next use Keychain Access to export the .p12 private key for the "3rd Party Mac Developer Installer" identity. Then openssl it a bit to convert to a pem:

openssl pkcs12 -in certs.p12 -nodes | openssl rsa -out key.pem

I set the contents of key.pem up as a GitLab CI/CD environment variable, APPLE_PRIVATE_KEY, so it's never committed to the project source tree.

Once all that's in place it should be possible to run the script (paths permitting – obviously yours will be different) and end up with an installer looking something like this. Look for the closed padlock in the top-right, and the fully validated chain of certificate trust.
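If you want a quick sanity check before shipping, the table of contents of the signed xar should now contain a signature block, and a Mac (if you have one handy) can verify the whole chain. A rough sketch, not part of the pipeline:

# dump the xar table of contents and look for the signature element
xar -f EPI2MEAgent\ 2020.1.14\ Installer.pkg --dump-toc=toc.xml
grep -c "<signature" toc.xml

# on a Mac, if one is available
pkgutil --check-signature EPI2MEAgent\ 2020.1.14\ Installer.pkg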

In conclusion: the cross-platform nwjs application builds (Mac, Windows, Linux) all run using nw-builder on ubuntu:18.04, and the Mac and Windows signing steps (the latter using osslsigncode – maybe more on that later) also run on ubuntu:18.04. That means one docker image for the Linux-based GitLab CI/CD build farm. Nice!
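For the curious, the Windows step is roughly this shape. This is a hedged sketch only – the file names, identity, URL and timestamp service are placeholders, not the real pipeline values:

osslsigncode sign \
    -pkcs12 windows-codesigning.p12 -pass "$WINDOWS_SIGNING_PASS" \
    -n "EPI2MEAgent" -i "https://example.com/" \
    -t http://timestamp.digicert.com \
    -in EPI2MEAgent-unsigned.exe -out EPI2MEAgent.exe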

Remote Power Management using Arduino


2016-03-07 Update: Git Repo available

Recently I've been involved with building a hardware device consisting of a cluster of low-power PC servers. The boards chosen for this particular project aren't enterprise or embedded-style boards with specialist features like out-of-band (power) management (such as Dell's iDRAC or Intel's AMT), so I started thinking about how to approximate something similar.

It’s also a little reminiscent of STONITH (Shoot The Other Node In The Head), used for aspects of the Linux-HA (High Availability) services.

I dug around in a box of goodies and found a couple of handy parts:

  1. Arduino Duemilanove
  2. Seeedstudio Arduino Relay Shield v3

The relays are rated for switching up to 35V at 8A – easily handling the 19V @ 2A for the mini server boards I’m remote managing.

The other handy thing to notice is that the Arduino by its nature is serial-enabled, meaning you can control it very simply using a USB connection to the management system without needing any more shields or adapters.

Lastly it’s worth mentioning that the relays are effectively SPDT switches so have connections for circuit open and closed. In my case this is useful as most of the time I don’t want the relays to be energised, saving power and prolonging the life of the relay.

The example Arduino code below opens a serial port and collects characters in a string variable until a carriage-return (0x0D) before acting, accepting commands “on”, “off” and “reset”. When a command is completed, the code clears the command buffer and flips voltages on the digital pins controlling the relays. Works a treat – all I need to do now is splice the power cables for the cluster compute units and run them through the right connectors on the relay boards. With the draw the cluster nodes pull being well within the specs of the relays it might even be possible to happily run two nodes through each relay.
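For reference, driving it from the Linux management host is as simple as writing to the serial device – a minimal sketch, assuming the board shows up as /dev/ttyUSB0 (yours may be /dev/ttyACM0 or similar):

# configure the port to match the sketch (9600 baud) and send a command
stty -F /dev/ttyUSB0 9600 raw -echo
printf 'reset\r' > /dev/ttyUSB0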

There's no reason why this sort of thing couldn't be used for many other purposes too – home automation or other types of remote management – and it could obviously be activated over ethernet, wifi or bluetooth instead of serial. Goes without saying for a relay board – duh!

int MotorControl1 = 4;
int MotorControl2 = 5;
int MotorControl3 = 6;
int MotorControl4 = 7;
int incomingByte = 0; // for incoming serial data
String input = ""; // for command message

void action (String cmd) {
  if(cmd == "off") {
    digitalWrite(MotorControl1, HIGH); // NO1 + COM1
    digitalWrite(MotorControl2, HIGH); // NO2 + COM2
    digitalWrite(MotorControl3, HIGH); // NO3 + COM3
    digitalWrite(MotorControl4, HIGH); // NO4 + COM4
    return;
  }

  if(cmd == "on") {
    digitalWrite(MotorControl1, LOW); // NC1 + COM1
    digitalWrite(MotorControl2, LOW); // NC2 + COM2
    digitalWrite(MotorControl3, LOW); // NC3 + COM3
    digitalWrite(MotorControl4, LOW); // NC4 + COM4
    return;
  }

  if(cmd == "reset") {
    action("off");
    delay(1000);
    action("on");
    return;
  }

  Serial.println("unknown action");
}

// the setup routine runs once when you press reset:
void setup() {
  pinMode(MotorControl1, OUTPUT);
  pinMode(MotorControl2, OUTPUT);
  pinMode(MotorControl3, OUTPUT);
  pinMode(MotorControl4, OUTPUT);
  Serial.begin(9600); // opens serial port, sets data rate to 9600 bps
  Serial.println("relay controller v0.1 rmp@psyphi.net actions are on|off|reset");
  input = "";
} 

// the loop routine runs over and over again forever:
void loop() {
  if (Serial.available() > 0) {
    incomingByte = Serial.read();

    if(incomingByte == 0x0D) {
      Serial.println("action:" + input);
      action(input);
      input = "";
    } else {
      input.concat(char(incomingByte));
    }
  } else {
    delay(1000); // no need to go crazy
  }
}



Pushing Jenkins Job Build Statuses to Geckoboard


I love using Geckoboard. I love using Jenkins. I do have a few issues connecting the two though.

My Jenkins build cluster sits inside my corporate network and while there is a Jenkins plugin for Geckoboard it will only connect to Jenkins instances it can see on the public internet. I haven’t yet found a Geckoboard plugin for Jenkins to push results out through either. One day soon I’ll be annoyed enough to learn some Java and write one but until then I have a hack.

The core configuration of most of my Jenkins jobs runs approximately along these lines:

make deb && scp *deb deb-repo.my.net:/var/www/apt/incoming/

i.e. build a .deb (for Ubuntu) and if successful, copy and queue it for indexing by reprepro on my .deb repository server.
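On the repository server, reprepro can then fold the package into the index. Here's a sketch using includedeb (the codename, repo path and package name are placeholders, not my real setup; an incoming-queue rule would be the more automated route):

reprepro -b /var/www/apt includedeb trusty /var/www/apt/incoming/packagename.deb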

Now in Geckoboard I can configure a 1×1 Custom Text widget for PUSH data and publish data to it like so:

curl https://push.geckoboard.com/v1/send/F639F1AE-2227-11E4-A773-8FE5A58BF7C4 \
-d "{"api_key":"AC738FE5A58BF7C4","data":{"item":[{"text":"packagename.deb","type":0}]}}"

Let’s make it a little more sustainable. In the main Jenkins configuration I set up a global environment variable called GECKO_APIKEY with a value of AC738FE5A58BF7C4. Now the line reads:

curl https://push.geckoboard.com/v1/send/F639F1AE-2227-11E4-A773-8FE5A58BF7C4 \
-d "{"api_key":"$GECKO_APIKEY","data":{"item":[{"text":"packagename.deb","type":0}]}}"

I know I'll need to change the posted data on failure, which most likely means duplicating some or all of that line, so I'll extract the widget id too. The job is now configured like:

export WIDGET=F639F1AE-2227-11E4-A773-8FE5A58BF7C4
make deb && scp *deb deb-repo.my.net:/var/www/apt/incoming/
curl https://push.geckoboard.com/v1/send/$WIDGET \
 -d "{"api_key":"$GECKO_APIKEY","data":{"item":[{"text":"packagename.deb","type":0}]}}"

But it’s not yet triggered differently on success or failure, so…

export WIDGET=F639F1AE-2227-11E4-A773-8FE5A58BF7C4
make deb && scp *deb deb-repo.my.net:/var/www/apt/incoming/  && \
curl https://push.geckoboard.com/v1/send/$WIDGET \
 -d "{"api_key":"$GECKO_APIKEY","data":{"item":[{"text":"packagename.deb","type":0}]}}" || \
curl https://push.geckoboard.com/v1/send/$WIDGET \
 -d "{"api_key":"$GECKO_APIKEY","data":{"item":[{"text":"packagename.deb","type":1}]}}"

The duplicate URL and packagename.deb are annoying, aren't they? A quick look at the Jenkins docs reveals $JOB_NAME has what we want.

export WIDGET=F639F1AE-2227-11E4-A773-8FE5A58BF7C4
export GECKO_URL=https://push.geckoboard.com/v1/send/$WIDGET
make deb && scp *deb deb-repo.my.net:/var/www/apt/incoming/  && \
curl $GECKO_URL \
 -d "{"api_key":"$GECKO_APIKEY","data":{"item":[{"text":"$JOB_NAME PASS","type":0}]}}" || \
curl $GECKO_URL \
 -d "{"api_key":"$GECKO_APIKEY","data":{"item":[{"text":"$JOB_NAME FAIL","type":1}]}}"

Not too bad. It even works on Windows without too many modifications – “set” instead of “export”, %VAR% instead of $VAR and a Windows curl binary added to %PATH%.

 

Note: All API keys and Widget Ids have been changed to protect the innocent.

Multivariate Charts from HTML tables in D3.js

For a dynamic monovariate (single line) chart, please see my earlier post – http://psyphi.net/blog/2013/04/charts-from-tables-with-d3js-and-jquery/.

Sometimes you just have to plot more than one dataset on the same chart, but you might have a complex data table with some "collections" of single values and some collections of multiple values. Here I've put together an example from something I've been working on recently. Once your back-end queries (SQL or whatever) are written and your templates convert those data into basic HTML tables, you can plot them straight to SVG/D3 without much extra work.

Nearly all of that extra work is around adding appropriate classes to cells to distinguish columns and collections of columns. The rest is to extract those cells out again and decide which should be plotted together.

In this example, tabs and table headings belong to the classes "collection_#" and "a_c#", where collection_# identifies a set of columns to be displayed together and a_c# identifies the (links for the) columns themselves. Collections with multiple columns therefore have a single collection class but contain more than one a_c# class.

Next, each table tbody td data cell belongs to a c# class, one for each column. Each one is also uniquely identified by a td#_<date>, which allows hovers on the table cell to highlight the SVG data point and vice versa. Finally, each cell contains a span with a "val" class (more on that in the next post).

SVG paths may now be built for each column. Clicks on table-headings and tabs are able to examine which columns co-display because they belong in the same collection and then scale and plot them appropriately.

Note that the first and last tabs in this example plot single lines to demonstrate mixed collections in action. The middle two tabs have two lines each but there’s no reason why you couldn’t have more (although there are only seven colours listed at the moment).

Cross-platform automated builds for node-webkit applications

Recently I’ve been extending my “classic” JavaScript knowledge by learning NodeJS. I’m sad to say that writing cross-platform, Desktop-class applications in Perl is just way too much hassle. However, having also discovered node-webkit I’ve been able to accelerate my desktop application development using classic HTML & CSS knowledge and improving my JavaScript techniques, mostly trying to better understand fully asynchronous, non-blocking programming. Apart from some initial mind-bending scoping experiences which maybe I’ll come back to another day, it’s generally been a breeze.

One of the useful things I’ve been able to do is to automate cross-platform application builds for Windows and Mac (Linux to come, not a priority right now but should be easy – feel free to comment). It’s not compilation, but more like application packaging.

My project has the node-webkit distributable zips in “src/”. The target folder is “dist/” and I’m also using a few DOS tools (zip.exe & unzip.exe and the commandline Anolis Resource editor) which live in dist/tools. The targets are built with timestamped filenames, a .exe in “dist/win/” for Windows and a .dmg in “dist/mac/” for OSX. I don’t do anything clever with Info.plist on Mac though I know I should, but the icons are set on both platforms, assuming they’ve been pre-generated and saved in the right places (resources/).

On OSX I'm using system make which presumably came with Xcode. On Windows I'm using gmake, which on my system came with a previous installation of Strawberry Perl but is also available as a Windows binary/installer.

My Makefile looks something like below (“make” not being one of my strongest skills – apologies for the ugly stuff). It might not be 100% complete as it’s been excised out of the original, much more complicated Makefile, so use with caution. If anyone has any tips on stuffing it all into NSIS automatically as well, please comment.

NW    := node-webkit-v0.7.3
NWWIN := $(NW)-win-ia32
NWMAC := $(NW)-osx-ia32
NWLIN := $(NW)-linux-x64

deps:
	rm -rf node_modules
	npm install

windeps:
	if exist node_modules rmdir node_modules /s /q
	npm install

# zip.exe & unzip.exe from http://stahlworks.com/dev/?tool=zipunzip
# Resourcer.exe from http://anolis.codeplex.com/
win: windeps
	if exist mdc.nw del mdc.nw /q
	if exist dist\win rmdir dist\win /s /q
	if exist tmp rmdir tmp /s /q
	dist\tools\zip -r mdc.nw mdc package.json node_modules
	mkdir dist\win tmp
	dist\tools\unzip -d tmp -o src\$(NWWIN).zip
	dist\tools\Resourcer -op:del -src:tmp\nw.exe -type:14 -name:IDR_MAINFRAME
	dist\tools\Resourcer -op:add -src:tmp\nw.exe -type:14 -name:IDR_MAINFRAME -file:resources\mdc72x72.ico -lang:1033
	copy /b tmp\nw.exe+mdc.nw dist\win\mdc.exe
	copy tmp\icudt.dll dist\win
	copy tmp\nw.pak dist\win
	if exist mdc.nw del mdc.nw /q
	if exist tmp rmdir tmp /q /s
	dist\tools\zip -r dist\win\mdc-$(shell echo %date:~-4,4%%date:~3,2%%date:~0,2%-%time:~0,2%%time:~3,2%).zip dist\win

mac: deps
	[ ! -f mdc.nw ] || rm mdc.nw
	zip -r mdc.nw mdc package.json resources/mdc72x72.png node_modules
	touch node-webkit.app
	rm -rf node-webkit.app dist/mdc.app dist/mac
	mkdir dist/mac
	unzip -o src/$(NWMAC).zip
	mv node-webkit.app dist/mac/mdc.app
	mv mdc.nw dist/mac/mdc.app/Contents/Resources/app.nw
	rm dist/mac/mdc.app/Contents/Resources/nw.icns
	sips -s format icns resources/mdc512x512.png --out dist/mac/mdc.app/Contents/Resources/mdc.icns
	perl -i -pe 's{nw[.]icns}{mdc.icns}smxg' dist/mac/mdc.app/Contents/Info.plist
	perl -i -pe 's{node[-]webkit[ ]App}{MDC}smxg' dist/mac/mdc.app/Contents/Info.plist
	hdiutil create dist/mac/mdc-$(shell date +'%Y%m%d-%H%M').dmg -ov -volname "MDC" -fs HFS+ -srcfolder dist/mac/mdc.app
	rm -rf dist/mac/mdc.app

test:
	@./node_modules/.bin/mocha

.PHONY: test

Random Sequence Mutator

Here’s a handy one(ish)-liner to mutate an input sequence using Perl’s RegEx engine:

epiphyte:~ rmp$ perl -e '$seq="ACTAGCTACGACTAGCATCGACT"; $mutants = [qw(A C T G)];
  print "$seq\n";
  $seq =~ s{([ATGC])}{ rand() < 0.5 ? $mutants->[int rand 4] : $1 }smiexg;
  print "$seq\n";'
ACTAGCTACGACTAGCATCGACT
ACAATCGCGGACCAGAATCTCTT

This gives each base in $seq a 50% chance (rand() < 0.5) of mutating to something, but as the original base is also in the $mutants array there is a further 25% chance of it "mutating" back to itself – so overall each base has a 0.5 × 0.75 = 37.5% chance of actually changing. If you wanted to improve it by excluding the original base for each mutation you might do something like:

epiphyte:~ rmp$ perl -e '$seq="ACTAGCTACGACTAGCATCGACT"; $mutants = [qw(A C T G)];
  $mutsize=scalar @{$mutants}; print "$seq\n";
  $seq =~ s{([ATGC])}{ rand() < 0.5 ? [grep { $_ ne $1 } @{$mutants}]->[int rand $mutsize-1] : $1 }smiexg;
  print "$seq\n";'
ACTAGCTACGACTAGCATCGACT
TGTAGATAATGTGATACGAGACT

This (quite inefficiently) builds an array of all available options from $mutants, excluding $1 the matched base at each position.

Unrolling it and tidying it up a little for readability looks like this:

my $seq     = 'ACTAGCTACGACTAGCATCGACT';
my $mutants = [qw(A C T G)];
my $mutsize = scalar @{$mutants};

print "$seq\n";

$seq =~ s{([ATGC])}{
   rand() < 0.5
    ?
   [grep { $_ ne $1 } @{$mutants}]->[int rand $mutsize-1]
    :
   $1
 }smiexg;

print "$seq\n";'

unique, overlapping kmer strings

Tinkering today I wrote a quick toy to generate strings of unique, overlapping kmers. Not particularly efficient, but possibly handy nonetheless.

It takes a given k size, a configurable overlap and optionally the bases to use. First it generates a list of all the kmers then it recursively scans for matching overlapping kmers and extends a seed, terminating the recursion and printing if all kmers have been used.

Run it like so:

 ./kmer-overlap -k=3 -overlap=2 ACTG
#!/usr/local/bin/perl
#########
# Author:        rmp
# Created:       2013-05-15
# Last Modified: $Date$
# Id:            $Id$
# HeadURL:       $HeadURL$
#
use strict;
use warnings;
use Getopt::Long;
use Readonly;
use English qw(-no_match_vars);

Readonly::Scalar our $DEFAULT_K => 3;
Readonly::Scalar our $DEFAULT_BASES => [qw(A C T G)];

my $opts = {};
GetOptions($opts, qw(k=s overlap=s rand help));

if($opts->{help}) {
  print <<"EOT";
$PROGRAM_NAME - rmp 2013-05-15

Usage:
  $PROGRAM_NAME -k=3 -overlap=2 -rand ACTG
EOT
  exit;
}

my $k       = $opts->{k}       || $DEFAULT_K;
my $overlap = $opts->{overlap} || $k-1;
my $bases   = $DEFAULT_BASES;

if(scalar @ARGV) {
  $bases = [grep { $_ } map {split //smx} @ARGV];
}

#########
# Build all available kmers
#
my $kmers = [];

for my $base1 (@{$bases}) {
  build($base1, $bases, $kmers);
}

#########
# optionally randomise the seeds
#
if($opts->{rand}) {
  shuffle($kmers);
}

#########
# start with a seed
#
for my $seed (@{$kmers}) {
  my $seen = {
	      $seed => 1,
	     };
  solve($seed, $seen);
}

sub build {
  my ($seq, $bases, $kmers) = @_;
  if(length $seq == $k) {
    #########
    # reached target k - store & terminate
    #
    push @{$kmers}, $seq;
    return 1;
  }

  for my $base (@{$bases}) {
    ########
    # extend and descend
    #
    build("$seq$base", $bases, $kmers);
  }

  return;
}

sub solve {
  my ($seq_in, $seen) = @_;

  if(scalar keys %{$seen} == scalar @{$kmers}) {
    #########
    # exhausted all kmers - completed!
    #
    print $seq_in, "\n";
    return 1;
  }

  my $seq_tail     = substr $seq_in, -$overlap, $overlap;

  my $overlapping  = [grep { $_ =~ /^$seq_tail/smx } # filter in only seqs which overlap the seed tail
		      grep { !$seen->{$_} }          # filter out kmers we've seen already
		      @{$kmers}];
  if(!scalar @{$overlapping}) {
    #########
    # no available overlapping kmers - terminate!
    #
    return;
  }

  if($opts->{rand}) {
    shuffle($overlapping);
  }

  my $overhang = $k-$overlap;
  for my $overlap_seq (@{$overlapping}) {
    #########
    # extend and descend
    #
    my $seq_out = $seq_in . substr $overlap_seq, -$overhang, $overhang;
    solve($seq_out, {%{$seen}, $overlap_seq => 1});
  }

  return;
}

sub shuffle {
  my ($arr_in) = @_;
  for my $i (0..scalar @{$arr_in}-1) {
    my $j = int rand $i;
    ($arr_in->[$i], $arr_in->[$j]) = ($arr_in->[$j], $arr_in->[$i]);
  }
}

Output looks like this:

epiphyte:~ rmp$ ./kmer-overlap -k=2 AC
AACCA
ACCAA
CAACC
CCAAC

restart a script when a new version is deployed

I have a lot of scripts running in a lot of places, doing various little jobs, mostly shuffling data files around and feeding them into pipelines and suchlike. I also use Jenkins CI to automatically run my tests and build deb packages for Debian/Ubuntu Linux. Unfortunately, being a lazy programmer I haven’t read up about all the great things deb and apt can do so I don’t know how to fire shell commands like “service x reload” or “/etc/init.d/x restart” once a package has been deployed. Kicking a script to pick up changes is quite a common thing to do.

Instead I have a little trick that makes use of the build process changing timestamps on files when it rolls up the package. So when the script wakes up, and starts the next iteration of its event loop, the first thing it does is check the timestamp of itself and if it’s different from the last iteration it executes itself, replacing the running process with a fresh one.

One added gotcha is that if you want to run in taint mode you need to satisfy a bunch of extra requirements such as detainting $ENV{PATH} and all commandline arguments before any re-execing occurs.

#!/usr/local/bin/perl
# -*- mode: cperl; tab-width: 8; indent-tabs-mode: nil; basic-offset: 2 -*-
# vim:ts=8:sw=2:et:sta:sts=2
#########
# Author: rpettett
# Last Modified: $Date$
# Id: $Id$
# $HeadURL$
#
use strict;
use warnings;
use Readonly;
use Carp;
use English qw(-no_match_vars);
our $VERSION = q[1.0];

Readonly::Scalar our $SLEEP_LONG  => 600;
Readonly::Scalar our $SLEEP_SHORT => 30;

$OUTPUT_AUTOFLUSH++;

my @original_argv = @ARGV;

#########
# handle SIGHUP restarts
#
local $SIG{HUP} = sub {
  carp q[caught SIGHUP];
  exec $PROGRAM_NAME, @original_argv;
};

my $last_modtime;

while(1) {
  #########
  # handle software-deployment restarts
  #
  my $modtime = -M $PROGRAM_NAME;

  if($last_modtime && $last_modtime ne $modtime) {
    carp q[re-execing];
    exec $PROGRAM_NAME, @original_argv;
  }
  $last_modtime = $modtime;

  my $did_work_flag;
  eval {
    $did_work_flag = do_stuff();
    1;
  } or do {
    $did_work_flag = 0;
  };

  local $SIG{ALRM} = sub {
    carp q[rudely awoken by SIGALRM];
  };

  my $sleep = $did_work_flag ? $SLEEP_SHORT : $SLEEP_LONG;
  carp qq[sleeping for $sleep];
  sleep $sleep;
}

Middleware and Monorails

Middleware, the point-and-click programmer’s equivalent of Perl. Middleware, yet another layer of largely unnecessary abstraction.

It’s two hours in to the three hour e-commerce process mapping meeting with three sets of consultants in the heart and heat of London with the traffic noise, police sirens and vaguely pleasant Hare Krishna chanting wafting in through the open windows of the meeting room. I’m minding my own business quietly in the corner, designing pipelines, data flows and object models, coding prolifically and generally doing much more useful things in the safe confines of my head when it happens… What’s your middleware? Hmm? What? Was that directed at me? Of course it was. Nuts.

Middleware, middleware, I’d better think fast. Not a term I use but I’m sure I can remember what it actually means. It must sit between things… Oh yes, it’s coming back to me now. Middleware, the point-and-click programmer’s equivalent of Perl. Middleware, yet another layer of largely unnecessary abstraction. An API for APIs. A tool for fooling developers into thinking that they’re not tightly coupling their applications together when instead they’re tightly coupling to a third-party system they have even less control over because they’re incapable of agreeing direct service communication specs with that other application. “It’s ok, it’s standards-based”. Sure it is. Whatever you say… Middleware is the consultants’ friend though – a clever sounding service that does the integration for you but generally requires little direct development and provides easy resale of the same work to multiple clients.

I’ve been developing large-scale, high traffic websites for a few years now and I’ve never had need for a component that specifically markets itself as middleware. Not once. I’ve used plenty of APIs, web services, object brokers, message queues, key-value stores and any other number of components with easily identifiable purposes but I think using middleware for middleware’s sake is still just a little too meta for me. Yessir! It’s a genuine, bona fide, API’d middleware. I see it absolutely as a Springfield Monorail application, but those Shelbyville folk are so much smarter, maybe I should be more like them.

What’s my middleware? None, I don’t have one. I don’t want one, I’m pretty sure I don’t need one. Cue surprised looks, amusement and disbelief.

Ok, twist my arm, I suppose I quite like Zapier. (Ahh, ok, he’s one of us after all)

Update: I tried to find a pretty picture to augment the post with but I couldn’t find anything on Google Images or Flickr that didn’t make me want to punch the screen.

Charts from Tables with D3js and jQuery

I’ve been tinkering with D3js on and off for a couple of months now, purely for generating simple, inline charts in web pages, made from data already dumped into HTML tables. Doing this is easier than building, caching and referencing external bitmap (PNG, GIF or whatever) images with Gnuplot or GD::Graph and also simpler than building bitmap images and serving them base64 encoded inline with <img alt=”” src=”data:…” />.

Using jQuery (or similar) to extract data from an already-present HTML table means there’s almost no code required whenever you want to add and plot a new column that someone might want to report on. Pushing all the work to the client should also mean slightly lighter server loads, though granted it’s already done the heavy lifting during the query to generate the table.

I’ve used examples from a number of sources, mostly from over on the d3js.org website itself and Mike Bostock’s inspiring example gallery. Plus the ever useful jQuery and jQueryUI libraries.

The result is a tabbed (with a jqueryui-themed unordered list) report based on a data table below. Clicking on either a tab or a table heading (all except the date) will animate and redraw the chart above. The data are collected using a jQuery selector on column classes in each.

Feel free to take and reuse it – just pinch the frame source.