psyphi.net blog - Another collection of braingunk and technolint

restart a script when a new version is deployed

I have a lot of scripts running in a lot of places, doing various little jobs, mostly shuffling data files around and feeding them into pipelines and suchlike. I also use Jenkins CI to automatically run my tests and build deb packages for Debian/Ubuntu Linux. Unfortunately, being a lazy programmer I haven’t read up about all the great things deb and apt can do so I don’t know how to fire shell commands like “service x reload” or “/etc/init.d/x restart” once a package has been deployed. Kicking a script to pick up changes is quite a common thing to do.

Instead I have a little trick that makes use of the build process changing timestamps on files when it rolls up the package. So when the script wakes up, and starts the next iteration of its event loop, the first thing it does is check the timestamp of itself and if it’s different from the last iteration it executes itself, replacing the running process with a fresh one.

One added gotcha is that if you want to run in taint mode you need to satisfy a bunch of extra requirements such as detainting $ENV{PATH} and all commandline arguments before any re-execing occurs.

#!/usr/local/bin/perl
# -*- mode: cperl; tab-width: 8; indent-tabs-mode: nil; basic-offset: 2 -*-
# vim:ts=8:sw=2:et:sta:sts=2
#########
# Author: rpettett
# Last Modified: $Date$
# Id: $Id$
# $HeadURL$
#
use strict;
use warnings;
use Readonly;
use Carp;
use English qw(-no_match_vars);
our $VERSION = q[1.0];

Readonly::Scalar our $SLEEP_LONG  => 600;
Readonly::Scalar our $SLEEP_SHORT => 30;

$OUTPUT_AUTOFLUSH++;

my @original_argv = @ARGV;

#########

# handle SIGHUP restarts
#
local $SIG{HUP} = sub {
  carp q[caught SIGHUP];
  exec $PROGRAM_NAME, @original_argv;
};

my $last_modtime;

while(1) {
  #########
  # handle software-deployment restarts
  #
  my $modtime = -M $PROGRAM_NAME;

  if($last_modtime && $last_modtime ne $modtime) {
    carp q[re-execing];
    exec $PROGRAM_NAME, @original_argv;
  }
  $last_modtime = $modtime;

  my $did_work_flag;
  eval {
    $did_work_flag = do_stuff();
    1;
  } or do {
    $did_work_flag = 0;
  };

  local $SIG{ALRM} = sub {
    carp q[rudely awoken by SIGALRM];
  };

  my $sleep = $did_work_flag ? $SLEEP_SHORT : $SLEEP_LONG;
  carp qq[sleeping for $sleep];
  sleep $sleep;
}

Bookmarks for March 8th through April 30th

These are my links for March 8th through April 30th:

NVD3.js :: re-usable charts for d3.js –
Inception | Break & Enter – defeat screenlock, disk encryption for e.g. powered-on & screenlocked devices with physical access
Asterisk for Raspberry Pi –
Storm, distributed and fault-tolerant realtime computation –
Bitcoin Mining Hardware – ASIC Bitcoin Miner – Butterfly Labs –

Middleware and Monorails

Middleware, the point-and-click programmer’s equivalent of Perl. Middleware, yet another layer of largely unnecessary abstraction.

It’s two hours in to the three hour e-commerce process mapping meeting with three sets of consultants in the heart and heat of London with the traffic noise, police sirens and vaguely pleasant Hare Krishna chanting wafting in through the open windows of the meeting room. I’m minding my own business quietly in the corner, designing pipelines, data flows and object models, coding prolifically and generally doing much more useful things in the safe confines of my head when it happens… What’s your middleware? Hmm? What? Was that directed at me? Of course it was. Nuts.

Middleware, middleware, I’d better think fast. Not a term I use but I’m sure I can remember what it actually means. It must sit between things… Oh yes, it’s coming back to me now. Middleware, the point-and-click programmer’s equivalent of Perl. Middleware, yet another layer of largely unnecessary abstraction. An API for APIs. A tool for fooling developers into thinking that they’re not tightly coupling their applications together when instead they’re tightly coupling to a third-party system they have even less control over because they’re incapable of agreeing direct service communication specs with that other application. “It’s ok, it’s standards-based”. Sure it is. Whatever you say… Middleware is the consultants’ friend though – a clever sounding service that does the integration for you but generally requires little direct development and provides easy resale of the same work to multiple clients.

I’ve been developing large-scale, high traffic websites for a few years now and I’ve never had need for a component that specifically markets itself as middleware. Not once. I’ve used plenty of APIs, web services, object brokers, message queues, key-value stores and any other number of components with easily identifiable purposes but I think using middleware for middleware’s sake is still just a little too meta for me. Yessir! It’s a genuine, bona fide, API’d middleware. I see it absolutely as a Springfield Monorail application, but those Shelbyville folk are so much smarter, maybe I should be more like them.

What’s my middleware? None, I don’t have one. I don’t want one, I’m pretty sure I don’t need one. Cue surprised looks, amusement and disbelief.

Ok, twist my arm, I suppose I quite like Zapier. (Ahh, ok, he’s one of us after all)

Update: I tried to find a pretty picture to augment the post with but I couldn’t find anything on Google Images or Flickr that didn’t make me want to punch the screen.

Systems & Security Tools du jour

I’ve been to two events in the past two weeks which have started me thinking harder about the way we protect and measure our enterprise systems.

The first of the two events was the fourth Splunk Live in St. Paul’s, London last week. I’ve been a big fan of Splunk for a few years but I’ve never really tried it out in production. The second was InfoSec at Earl’s Court. More about that one later.

What is Splunk?

To be honest, splunk is different things to different people. Since inception it’s had great value as a log collation and event alerting tool for systems administrators as that was what it was originally designed to do. However as both DJ Skillman and Godfrey Sullivan pointed out, Splunk has grown into a lot more than that. It solved a lot of “Big Data” (how I hate that phrase) problems before Big Data was trendy, taking arbitrary unstructured data sources structuring them in useful ways, indexing the hell out of them and adding friendly, near-real-time reporting and alerting on top. Nowadays, given the right data sources, Splunk is capable of providing across-the-board Operational Intelligence, yielding tremendous opportunities in measuring value of processes and events.

How does it work?

In order to make the most out of a Splunk installation you require at least three basic things :-

A data source -Â anything from a basic syslog or Apache web server log to a live high level ERP logistics event feed or even entire code commits
An enrichment process – something to tag packets, essentially to assign value to indexed fields, allowing the association of fields from different feeds, e.g. tallying new orders with a customer database with stock keeping perhaps.
A report – a canned report, presented on a dashboard for your CFO for example, or an email alert to tell your IT manager that someone squirting 5 day experiments in at the head of the analysis pipeline is going to go over-budget on your AWS analysis pipeline in three days’ time.

How far can you go with it?

Well, here’s a few of the pick ‘n’ mix selection of things I’d like to start indexing as soon as we sort out a) the restricted data limits of our so-far-free Splunk installation and b) what’s legal to do

Door id access (physical site presence)
VPN logins (virtual site presence)
Wifi device registrations (guest, internal, whatever)
VoIP + PSTN call logs (number, duration)
Environmentals – temperature, humidity of labs, offices, server rooms
System logs for everything (syslog, authentication, Apache, FTPd, MySQL connections, Samba, the works)
SGE job logs with user & project accounting
Application logs for anything we’ve written in house
Experimental metadata (who ran what when, where, why)
Domains for all incoming + outgoing mail, plus mail/attachment weights (useful for spotting outliers exfiltrating data)
Firewall: accepted incoming connections
Continuous Integration test results (software project, timings, memory, cpu footprints)
SVN/Git code commits (yes, it’s possible to log the entire change set)
JIRA tickets (who, what, when, project, component, priority)
ERP logs (supply chain, logistics, stock control, manufacturing lead times)
CRM + online store logs (customer info, helpdesk cases, orders)
anything and everything else with vaguely any business value

I think it’s pretty obvious that all this stuff taken together constitutes what most people call Big Data these days. There’s quite a distinction between that sort of mixed relational data and the plainer “lots of data” I deal with day to day, experimental data in the order of a terabyte-plus per device per day.

Charts from Tables with D3js and jQuery

I’ve been tinkering with D3js on and off for a couple of months now, purely for generating simple, inline charts in web pages, made from data already dumped into HTML tables. Doing this is easier than building, caching and referencing external bitmap (PNG, GIF or whatever) images with Gnuplot or GD::Graph and also simpler than building bitmap images and serving them base64 encoded inline with <img alt=”” src=”data:…” />.

Using jQuery (or similar) to extract data from an already-present HTML table means there’s almost no code required whenever you want to add and plot a new column that someone might want to report on. Pushing all the work to the client should also mean slightly lighter server loads, though granted it’s already done the heavy lifting during the query to generate the table.

I’ve used examples from a number of sources, mostly from over on the d3js.org website itself and Mike Bostock’s inspiring example gallery. Plus the ever useful jQuery and jQueryUI libraries.

The result is a tabbed (with a jqueryui-themed unordered list) report based on a data table below. Clicking on either a tab or a table heading (all except the date) will animate and redraw the chart above. The data are collected using a jQuery selector on column classes in each.

Feel free to take and reuse it – just pinch the frame source.

Bookmarks for December 12th through February 18th

These are my links for December 12th through February 18th:

Using the iPod Nano 6th gen with Ubuntu

Today I spent 3 hours wrestling with a secondhand ipod Nano, 6th gen (the “6” is the killer) for a friend, trying to make it work happily with Ubuntu.

Having never actually owned an iPod myself, only iPhone and iPad, it was a vaguely educational experience too. I found nearly no useful information on dozens of fora – all of them only reporting either “it works” without checking the generation, or “it doesn’t work” with no resolution, or “it should work” with no evidence. Yay Linux!

There were two issues to address – firstly making the iPod block storage device visible to Linux and secondly finding something to manage the unconventional media database on the iPod itself.

It turned out that most iPods, certainly early generations, work well with Linux but this one happened not to. Most iPods are supported via libgpod, whether you’re using Banshee, Rhythmbox, even Amarok (I think) and others. I had no luck with Rhythmbox, Banshee, gtkpod, or simple block storage access for synchronising music.

It also turns out that Spotify one of my other favourite music players doesn’t use libgpod, which looked very promising.

So the procedure I used to get this one to work went something like this:

Restore and/or initialise the iPod using the standard procedure with iTunes (I used iTunes v10 and latest iPod firmware 1.2) on a Windows PC. Do not use iTunes on OSX. Using OSX results in the iPod being formatted using a not-well-supported filesystem (hfsplus with journalling). Using Windows results in a FAT filesystem (mounted as vfat under Linux).Having said that, I did have some success making the OSX-initialised device visible to Linux but it required editing fstab and adding:
```
/dev/sdb2 /media/ipod hfsplus user,rw,noauto,force 0 0
```
which is pretty stinky. FAT-based filesystems have been well supported for a long time – best to stick with that. Rhythmbox, the player I was trying at the time, also didn’t support the new media database. It appeared to copy files on but failed every time, complaining about unsupported/invalid database checksums. According to various fora the hashes need reverse engineering.
Install the Ubuntu Spotify Preview using the Ubuntu deb (not the Wine version). I used the instructions here.
I have a freeÂ Spotify account, which I’ve had for ages and might not be possible to make any more. I was worried that not having a premium or unlimited account wouldn’t let me use the iPod sync, but in the end it worked fine. The iPod was seen and available in Spotify straight away and allowed synchronisation of specific playlists or all “Local Files”.Â In the end as long as Spotify was running and the iPod connected, I could just copy files directly into my ~/Music/ folder and Spotify would sync it onto the iPod immediately.

Superb, job done! (I didn’t try syncing any pictures)

Bookmarks for November 27th through December 7th

These are my links for November 27th through December 7th:

Home · kripken/emscripten Wiki – LLVM Javascript target. Yes, compile C to Javascript, using SDL calls for HTML Canvas work.
One of the most bonkers compiler transformations I've come across. Thanks @new299!
WebCL – Heterogeneous parallel computing in HTML5 web browsers –
Neal’s Yard Dairy :: Farm Cheeses From The British Isles –
RStudio – Shiny –
Send beautiful email newsletters with Campaign Monitor –

Bookmarks for November 6th through November 26th

These are my links for November 6th through November 26th:

The Hawksmoor Inheritance – Schools' crypto challenges
WNDW Downloads – Wireless Networking in the Developing World, eBook
GiantShark –
ROSALIND | Problems – codeeval for bioinformatics
s3backer – FUSE-based single file backing store via Amazon S3 – Google Project Hosting – S3-backed block devices

Bookmarks for June 1st through November 3rd

These are my links for June 1st through November 3rd:

Meraki –
Cambridge Wood Works – Recycling and Reusing Waste Wood In Cambridge UK –
web-sorrow – a versatile security scanner for the information disclosure and fingerprinting phases of pentesting. written in perl – Google Project Hosting –
Shades | Software | Charcoal Design –
Welcome – Fritzing –
PLOTS Spectral Workbench: index –
Cubify – Express Yourself in 3D – basic reprap with a bit more polish
LPMT – Little Projection-Mapping Tool | "Projection-Mapping for the masses" –
CircuitLab – online schematic editor & circuit simulator –
Nava Whiteford – SGenomics Ltd –
Scalable Flash Memory Array – Violin Memory Violin Memory –
All The Cheat Sheets That A Web Developer Needs | Top Design Magazine – Web Design and Digital Content – via @jitsukerr . Missing bits of perl and apache other than rewrite
hashcat – advanced password recovery –