Systems & Security Tools du jour

I’ve been to two events in the past two weeks that have set me thinking harder about how we protect and measure our enterprise systems.

The first was the fourth Splunk Live, held in St. Paul’s, London, last week; the second was InfoSec at Earl’s Court (more about that one later). I’ve been a big fan of Splunk for a few years, but I’ve never really tried it out in production.

What is Splunk?

To be honest, Splunk is different things to different people. Since its inception it has been valuable to systems administrators as a log collation and event alerting tool, which is what it was originally designed for. However, as both DJ Skillman and Godfrey Sullivan pointed out, Splunk has grown into a lot more than that. It solved a lot of “Big Data” (how I hate that phrase) problems before Big Data was trendy: taking arbitrary unstructured data sources, structuring them in useful ways, indexing the hell out of them and adding friendly, near-real-time reporting and alerting on top. Nowadays, given the right data sources, Splunk is capable of providing across-the-board Operational Intelligence, yielding tremendous opportunities for measuring the value of processes and events.

How does it work?

To make the most of a Splunk installation you need at least three basic things:

  1. A data source – anything from a basic syslog or Apache web server log to a live, high-level ERP logistics event feed, or even entire code commits.
  2. An enrichment process – something to tag incoming events, essentially assigning value to indexed fields so that fields from different feeds can be associated, e.g. tallying new orders against a customer database and perhaps stock keeping.
  3. A report – a canned report presented on a dashboard for your CFO, for example, or an email alert telling your IT manager that someone squirting five-day experiments into the head of the analysis pipeline is going to take your AWS budget over in three days’ time.
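To make those three ingredients a bit more concrete, here is a toy sketch of the same source/enrich/report shape in plain Perl, with no Splunk involved: Apache access-log lines as the data source, a lookup table that tags each hit with an owning team as the enrichment, and hits per team as the report. The log lines, URL prefixes and team names are all made up for illustration, and the regex only handles the Common Log Format.

```perl
use strict;
use warnings;

# 1. Data source: pull fields out of an Apache Common Log Format line.
#    Returns undef (or an empty list) for lines that don't match.
sub parse_line {
  my ($line) = @_;
  my ($ip, $time, $path, $status) =
    $line =~ m{^(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)[^"]*" (\d{3})};
  return unless defined $status;
  return +{ ip => $ip, time => $time, path => $path, status => $status };
}

# 2. Enrichment: tag each event with a team, using a hypothetical lookup
#    table keyed on URL prefix.
my %OWNER_OF = ('/shop' => 'sales', '/admin' => 'it');

sub enrich {
  my ($event) = @_;
  $event->{team} = 'unknown';
  for my $prefix (sort keys %OWNER_OF) {
    if (index($event->{path}, $prefix) == 0) {
      $event->{team} = $OWNER_OF{$prefix};
      last;
    }
  }
  return $event;
}

# 3. Report: count events per team.
sub report {
  my %hits;
  $hits{ $_->{team} }++ for @_;
  return \%hits;
}

my @log = (
  '10.0.0.1 - - [12/Oct/2012:19:09:00 +0100] "GET /shop/basket HTTP/1.1" 200 512',
  '10.0.0.2 - - [12/Oct/2012:19:09:01 +0100] "POST /admin/login HTTP/1.1" 403 128',
  '10.0.0.3 - - [12/Oct/2012:19:09:02 +0100] "GET /shop/item/42 HTTP/1.1" 200 2048',
);
my $hits = report(map { enrich($_) } map { parse_line($_) } @log);
printf "%-8s %d\n", $_, $hits->{$_} for sort keys %{$hits};
```

Run as-is, it reports two hits for sales and one for it. In Splunk itself, the equivalent pieces would be field extractions, a lookup and a saved search or dashboard panel.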

How far can you go with it?

Well, here’s a pick ‘n’ mix selection of things I’d like to start indexing as soon as we sort out a) the data limits of our so-far-free Splunk installation and b) what’s legal to do:

  • Door id access (physical site presence)
  • VPN logins (virtual site presence)
  • Wifi device registrations (guest, internal, whatever)
  • VoIP + PSTN call logs (number, duration)
  • Environmentals – temperature, humidity of labs, offices, server rooms
  • System logs for everything (syslog, authentication, Apache, FTPd, MySQL connections, Samba, the works)
  • SGE job logs with user & project accounting
  • Application logs for anything we’ve written in house
  • Experimental metadata (who ran what when, where, why)
  • Domains for all incoming + outgoing mail, plus mail/attachment sizes (useful for spotting outliers who may be exfiltrating data)
  • Firewall: accepted incoming connections
  • Continuous Integration test results (software project, timings, memory, cpu footprints)
  • SVN/Git code commits (yes, it’s possible to log the entire change set)
  • JIRA tickets (who, what, when, project, component, priority)
  • ERP logs (supply chain, logistics, stock control, manufacturing lead times)
  • CRM + online store logs (customer info, helpdesk cases, orders)
  • anything and everything else with even vague business value
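The mail-size item hints at a simple statistical test. One way to do it, sketched here as an assumption rather than anything Splunk-specific: total each sender’s outgoing mail per day and flag anyone more than k standard deviations above the mean. The sender names and volumes are invented.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Return the senders whose volume is more than $k population standard
# deviations above the mean volume across all senders.
sub outliers {
  my ($volume_of, $k) = @_;
  my @values = values %{$volume_of};
  return if @values < 2;
  my $mean = sum(@values) / @values;
  my $sd   = sqrt( sum(map { ($_ - $mean) ** 2 } @values) / @values );
  my @flagged = grep { $volume_of->{$_} > $mean + $k * $sd } keys %{$volume_of};
  return sort @flagged;
}

# hypothetical megabytes of mail + attachments sent per user in one day
my %sent_mb = (
  alice => 1, bob   => 2, carol => 3, dave => 2, erin    => 1,
  frank => 2, grace => 3, heidi => 2, ivan => 2, mallory => 500,
);
print "worth a look: $_\n" for outliers(\%sent_mb, 2);
```

One caveat on the design: a single large outlier inflates the very mean and standard deviation it is compared against. With one outlier among n senders its score can never exceed sqrt(n - 1), so at k = 2 nothing can be flagged until there are at least six senders; a median-based measure is more robust if that matters.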

I think it’s pretty obvious that all this stuff, taken together, constitutes what most people call Big Data these days. There’s quite a distinction between that sort of mixed relational data and the plainer “lots of data” I deal with day to day: experimental data on the order of a terabyte or more per device per day.

Charts from Tables with D3js and jQuery

I’ve been tinkering with D3js on and off for a couple of months now, purely for generating simple inline charts in web pages from data already dumped into HTML tables. Doing this is easier than building, caching and referencing external bitmap images (PNG, GIF or whatever) with Gnuplot or GD::Graph, and simpler than building bitmaps and serving them base64-encoded inline with <img alt="" src="data:…" />.

Using jQuery (or similar) to extract data from an already-present HTML table means almost no code is required whenever you want to add and plot a new column that someone might want to report on. Pushing the charting work to the client should also mean a slightly lighter server load, though admittedly the server has already done the heavy lifting in the query that generates the table.

I’ve used examples from a number of sources, mostly from the D3 website itself and Mike Bostock’s inspiring example gallery, plus the ever-useful jQuery and jQuery UI libraries.

The result is a tabbed report (the tabs are a jQuery UI-themed unordered list) built from the data table below it. Clicking either a tab or a table heading (any except the date) animates and redraws the chart above. The data for each chart are collected using a jQuery selector on that column’s class.
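The column-class approach relies on the server stamping every cell with its column’s name as a class. A minimal sketch of that server side in Perl follows; html_table() and esc() are hypothetical helpers, the columns and rows are invented, and on the client a selector along the lines of $('td.orders') would then gather one column for D3 to plot.

```perl
use strict;
use warnings;

# Escape the characters that matter inside HTML text and attribute values.
sub esc {
  my ($s) = @_;
  my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
  $s =~ s/([&<>"])/$entity{$1}/g;
  return $s;
}

# Render rows as an HTML table in which every <th>/<td> carries its column
# name as a CSS class. Column names double as class names, so keep them to
# single words.
sub html_table {
  my ($columns, $rows) = @_;
  my $html = "<table>\n<thead><tr>";
  $html .= join q[], map { sprintf '<th class="%s">%s</th>', esc($_), esc($_) } @{$columns};
  $html .= "</tr></thead>\n<tbody>\n";
  for my $row (@{$rows}) {
    $html .= '<tr>'
           . join(q[], map { sprintf '<td class="%s">%s</td>', esc($columns->[$_]), esc($row->[$_]) }
                       0 .. $#{$columns})
           . "</tr>\n";
  }
  return $html . "</tbody>\n</table>\n";
}

print html_table(
  [qw(date orders returns)],                        # hypothetical columns
  [ ['2012-10-01', 12, 1], ['2012-10-02', 15, 0] ], # hypothetical rows
);
```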

Feel free to take and reuse it – just pinch the frame source.

Using the iPod Nano 6th gen with Ubuntu

Today I spent three hours wrestling with a second-hand iPod Nano, 6th gen (the “6” is the killer), for a friend, trying to make it work happily with Ubuntu.

Having never actually owned an iPod myself, only an iPhone and an iPad, I found it a vaguely educational experience too. There was almost no useful information across dozens of fora: threads either reported “it works” without checking the generation, “it doesn’t work” with no resolution, or “it should work” with no evidence. Yay Linux!

There were two issues to address – firstly making the iPod block storage device visible to Linux and secondly finding something to manage the unconventional media database on the iPod itself.

It turns out that most iPods, certainly the early generations, work well with Linux, but this one happened not to. Most iPods are supported via libgpod, whether you’re using Banshee, Rhythmbox, even Amarok (I think), or others. I had no luck synchronising music with Rhythmbox, Banshee, gtkpod or simple block storage access.

It also turns out that Spotify, another of my favourite music players, doesn’t use libgpod, which looked very promising.

So the procedure I used to get this one to work went something like this:

  1. Restore and/or initialise the iPod using the standard iTunes procedure on a Windows PC (I used iTunes v10 and the latest iPod firmware, 1.2). Do not use iTunes on OSX: it formats the iPod with a poorly supported filesystem (hfsplus with journalling), whereas Windows gives you a FAT filesystem (mounted as vfat under Linux). Having said that, I did have some success making the OSX-initialised device visible to Linux, but it required editing fstab and adding:
    /dev/sdb2 /media/ipod hfsplus user,rw,noauto,force 0 0

    which is pretty stinky. FAT-based filesystems have been well supported for a long time, so it’s best to stick with that. Rhythmbox, the player I was trying at the time, also didn’t support the new media database: it appeared to copy files across but failed every time, complaining about unsupported/invalid database checksums. According to various fora, the hashes need reverse engineering.

  2. Install the Ubuntu Spotify Preview using the Ubuntu deb (not the Wine version). I used the instructions here.
  3. I have a free Spotify account, which I’ve had for ages; it may no longer be possible to create one. I was worried that without a Premium or Unlimited account I wouldn’t be able to use iPod sync, but it worked fine. The iPod was seen and available in Spotify straight away and allowed synchronisation of specific playlists or all “Local Files”. In the end, as long as Spotify was running and the iPod was connected, I could just copy files directly into my ~/Music/ folder and Spotify would sync them onto the iPod immediately.
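To check which filesystem from step 1 the iPod actually came up with, one option on Linux is to look it up in /proc/mounts, where each line starts with the device, mount point and filesystem type. A small Perl sketch; /media/ipod is just the mount point from the fstab line in step 1, and fs_type_at() is a hypothetical helper name.

```perl
use strict;
use warnings;

# Find the filesystem type mounted at $mount_point in a /proc/mounts-style
# table (device, mount point, type, options, ...). Mount points containing
# spaces appear as \040 in /proc/mounts; this sketch doesn't decode them.
sub fs_type_at {
  my ($mount_point, $table) = @_;
  for my $line (split /\n/, $table) {
    my (undef, $dir, $type) = split q[ ], $line;
    return $type if defined $type && $dir eq $mount_point;
  }
  return;
}

my $table = q[];
if (open my $fh, '<', '/proc/mounts') {   # Linux only
  local $/ = undef;                       # slurp the whole file
  $table = <$fh> // q[];
  close $fh;
}
my $type = fs_type_at('/media/ipod', $table);
print defined $type
  ? "/media/ipod is mounted as $type\n"   # vfat for a Windows-initialised iPod
  : "nothing mounted at /media/ipod\n";
```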

Superb, job done! (I didn’t try syncing any pictures)


Conway’s Game of Life in Perl

I wanted a quick implementation of Conway’s Game of Life to muck about with this evening with the boys, so I whipped this up in simple Perl for running in a terminal on the command line. It’s not the fastest implementation on the planet, but that’s most likely just down to the way I’ve coded it.

Throw some 1s and 0s in the DATA block at the end to modify the start state, change the $WIDTH and $HEIGHT of the area, or uncomment the random data line in init() and see what you see.

#!/usr/bin/env perl
# -*- mode: cperl; tab-width: 8; indent-tabs-mode: nil; basic-offset: 2 -*-
# vim:ts=8:sw=2:et:sta:sts=2
# Author:        rmp
# Created:       2012-10-12
# Last Modified: $Date: 2012-10-12 19:09:00 +0100 (Fri, 12 Oct 2012) $
# Id:            $Id$
# $HeadURL$
use strict;
use warnings;
use Time::HiRes qw(sleep);
use Readonly;                   # from CPAN
use English qw(-no_match_vars);
use Carp;

Readonly::Scalar my $WIDTH      => 78;
Readonly::Scalar my $HEIGHT     => 21;
Readonly::Scalar my $TURN_DELAY => 0.1;

our $VERSION = '0.01';

my $grid  = init();
my $turns = 0;
while(1) {
  render($grid);
  $grid = turn($grid);
  $turns++;
  sleep $TURN_DELAY;
}

sub init {
  # initialise with a manual input from the DATA block below
  local $RS = undef;            # slurp the whole DATA block at once
  my $data  = <DATA>;
  my $out   = [
               map { [split //smx, $_] }   # each line becomes a row of 1/0 cells
               map { split /\n/smx, $_ }   # one line per row
               $data
              ];

  # pad the matrix out to $WIDTH x $HEIGHT with dead (0) cells
  for my $y (0..$HEIGHT-1) {
    for my $x (0..$WIDTH-1) {
      $out->[$y]->[$x] ||= 0;
#      $out->[$y]->[$x] = rand >= 0.2 ? 0 : 1; # initialise with some random data
    }
  }
  return $out;
}

# draw to stdout/screen
sub render {
  my ($in)  = @_;
  my $clear = $OSNAME eq 'MSWin32' ? 'cls' : 'clear';
  system $clear;

  print q[+], q[-]x$WIDTH, "+\n" or croak qq[Error printing: $ERRNO];
  for my $y (@{$in}) {
    print q[|] or croak qq[Error printing: $ERRNO];
    print map { $_ ? q[O] : q[ ] } @{$y} or croak qq[Error printing: $ERRNO];
    print "|\r\n" or croak qq[Error printing: $ERRNO];
  }
  print q[+], q[-]x$WIDTH, "+\n" or croak qq[Error printing: $ERRNO];

  return 1;
}

# the fundamental Game of Life rules
sub turn {
  my ($in) = @_;
  my $out  = [];

  for my $y (0..$HEIGHT-1) {
    for my $x (0..$WIDTH-1) {
      my $topedge    = $y-1;
      my $bottomedge = $y+1;
      my $leftedge   = $x-1;
      my $rightedge  = $x+1;

      # the eight neighbouring coordinates, minus any that fall off the grid
      my $checks = [
                    grep { $_->[0] >= 0 && $_->[0] < $HEIGHT } # Y boundary checking
                    grep { $_->[1] >= 0 && $_->[1] < $WIDTH }  # X boundary checking
                    [$topedge,    $leftedge],
                    [$topedge,    $x],
                    [$topedge,    $rightedge],
                    [$y,          $leftedge],
                    [$y,          $rightedge],
                    [$bottomedge, $leftedge],
                    [$bottomedge, $x],
                    [$bottomedge, $rightedge],
                   ];

      # count the live neighbours
      my $alive = scalar
                  grep { $_ }
                  map { $in->[$_->[0]]->[$_->[1]] }
                  @{$checks};

      # survive with 2 or 3 live neighbours, be born with exactly 3
      $out->[$y]->[$x] = (($in->[$y]->[$x] && $alive == 2) ||
                          $alive == 3) ? 1 : 0;
    }
  }
  return $out;
}

# start state for the DATA block: 1 = alive, 0 = dead (this glider is just an example)
__DATA__
00000
00100
00010
01110


p.s. WordPress merrily swaps “DATA” for “data” in the <DATA> read inside init() and adds an extra /data close tag at the end of the code snippet. If you copy the code from the web page, make sure that line reads <DATA> and don’t copy & paste the close tag. Damn, I hate WordPress :(
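On the speed point: turn() builds and bounds-checks eight neighbour coordinates for every one of the $WIDTH x $HEIGHT cells on every generation. A common alternative, sketched below as a standalone function rather than a drop-in patch, keeps only the live cells in a hash and has each of them add one to its neighbours’ counts, so the work scales with the number of live cells instead of the board size. The sparse_turn() name and the "y,x" key format are my own choices; the rules and the non-wrapping edges match the original.

```perl
use strict;
use warnings;

# One generation of Life on a bounded $height x $width board (no wrap-around).
# $live is a hashref whose keys are "y,x" for every live cell.
sub sparse_turn {
  my ($live, $height, $width) = @_;
  my %count;    # "y,x" => number of live neighbours

  # each live cell adds 1 to the count of every on-board neighbour
  for my $cell (keys %{$live}) {
    my ($y, $x) = split /,/, $cell;
    for my $dy (-1 .. 1) {
      for my $dx (-1 .. 1) {
        next if $dy == 0 && $dx == 0;
        my ($ny, $nx) = ($y + $dy, $x + $dx);
        next if $ny < 0 || $ny >= $height || $nx < 0 || $nx >= $width;
        $count{"$ny,$nx"}++;
      }
    }
  }

  # same rules as turn(): survive with 2 or 3 neighbours, born with exactly 3
  my %next = map  { ($_ => 1) }
             grep { $count{$_} == 3 || ($count{$_} == 2 && $live->{$_}) }
             keys %count;
  return \%next;
}

# a blinker: a row of three cells that flips between horizontal and vertical
my $board = { '1,0' => 1, '1,1' => 1, '1,2' => 1 };
$board = sparse_turn($board, 3, 3);
print join(q[ ], sort keys %{$board}), "\n";   # prints: 0,1 1,1 2,1
```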

Another use for Selenium IDE

A dear friend of mine recently needed to recover all the email from his mailbox. Normally this wouldn’t be a problem; there are plenty of options in any sane mail application – export or archive the mailbox, select all messages and “Send Again”/Redirect/Bounce them to another address or, at the very worst, select all and forward. These options are available in most mail clients, desktop or web-based: Pine, SquirrelMail, IMP, Outlook, Outlook Express, Windows Mail, Thunderbird, Eudora and I’m sure loads of others.

Unfortunately the only access provided was through Microsoft’s Outlook Web Access (2007). OWA, whilst fairly pretty in Lite mode (used by non-Internet Explorer browsers) and prettier but heavier in MSIE, has no useful bulk forwarding or export functionality at all. None. Not desperately handy, to be sure.

OK, so my first port of call was to connect my mail client, which supports Exchange OWA access. No dice – spinning, hanging, no data. Hmm, odd. Second, I tried fetchExc, a Java command-line tool which promised everything I needed but in the end delivered only pretty obtuse error messages. After an hour’s fiddling I gave up on fetchExc and fell back to Perl with Email::Folder::Exchange. This had very similar results to fetchExc, just with a slightly different set of errors.

After much swearing and a lot more poking, probing and requesting of tips from other friends (thanks Ze), we found that the OWA service was also sitting behind Microsoft’s Internet Security and Acceleration (ISA) Server. It isn’t a product I’ve used before, but I can only assume it’s an expensive reverse proxy, the sort of thing I’d compare to an inexpensive Apache + mod_proxy + mod_security setup on a good day. This ISA server also happened to block all remote WebDAV (Exchange 2000/2003) and SOAP (Exchange 2007/2010) access. Great! No remote service access whatsoever.

Brute force to the rescue. I could, of course, go in and manually forward each and every last mail, but that’s tedious: a huge amount of clicking and pasting in the same email address. Enter Selenium IDE.

Selenium is a suite of tools for remote-controlling browsers, primarily for writing tests for interactive applications. I use it in my day-to-day work mostly for checking that bits of dynamic JavaScript, DHTML, forms etc. do the right things when clicked, pressed, dragged and generally interacted with. OWA is just a (really badly written) web page to interact with, after all.

I downloaded the excellent sideflow.js plugin, which provides the loop functionality not usually needed for web app testing, and after a bit of DOM inspection on the OWA pages I came up with the following plan:

  • click the subject link
  • click the “forward” button
  • enter the recipient address
  • click the send button
  • select the checkbox
  • press the “delete” button
  • repeat 500 times

The macro looked something like this:

<table cellpadding="1" cellspacing="1" border="1">
<tr><td rowspan="1" colspan="3">owa-selenium-macro</td></tr>
</table>

So I logged in, opened each folder in turn and replayed the macro in Selenium IDE as many times as I needed to. Bingo! Super kludgy, but it worked well, was entertaining to watch and ultimately did the job.
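For anyone wanting to build something similar, here is a sketch (not the macro I actually ran) of a Perl script that writes the plan above out as a Selenese HTML table. It assumes sideflow.js’s while/endWhile commands for the loop and Selenium IDE’s storeEval and storedVars for the counter; every locator and the recipient address are hypothetical placeholders that would need replacing with whatever inspecting OWA’s DOM turns up.

```perl
use strict;
use warnings;

# Minimal HTML escaping for table cell contents.
sub esc {
  my ($s) = @_;
  $s =~ s/&/&amp;/g;
  $s =~ s/</&lt;/g;
  $s =~ s/>/&gt;/g;
  return $s;
}

# Render [command, target, value] rows as a Selenese HTML table, with the
# same table attributes and title row as the macro fragment above.
sub selenese {
  my ($name, @rows) = @_;
  my $html = qq[<table cellpadding="1" cellspacing="1" border="1">\n]
           . qq[<thead><tr><td rowspan="1" colspan="3">] . esc($name) . qq[</td></tr></thead>\n<tbody>\n];
  for my $row (@rows) {
    $html .= '<tr>' . join(q[], map { '<td>' . esc($_) . '</td>' } @{$row}) . "</tr>\n";
  }
  return $html . "</tbody></table>\n";
}

my $recipient = 'someone@example.com';   # hypothetical new address
my @rows = (
  [ 'storeEval',    '0',                                    'i'        ], # counter as a number
  [ 'while',        'storedVars.i < 500',                   ''         ], # sideflow.js (assumed)
  [ 'clickAndWait', 'xpath=(//a[@class="subject"])[1]',     ''         ], # hypothetical locator
  [ 'clickAndWait', 'id=forward',                           ''         ], # hypothetical locator
  [ 'type',         'id=to',                                $recipient ], # hypothetical locator
  [ 'clickAndWait', 'id=send',                              ''         ], # hypothetical locator
  [ 'click',        'xpath=(//input[@type="checkbox"])[1]', ''         ], # hypothetical locator
  [ 'clickAndWait', 'id=delete',                            ''         ], # hypothetical locator
  [ 'storeEval',    'storedVars.i + 1',                     'i'        ],
  [ 'endWhile',     '',                                     ''         ], # sideflow.js (assumed)
);
print selenese('owa-selenium-macro', @rows);
```

Using storeEval rather than store for the initial 0 keeps the counter numeric, so storedVars.i + 1 adds rather than concatenating strings. I believe Selenium IDE’s saved test cases wrap a table like this in a full HTML page, so the output is best pasted into one rather than opened on its own.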

