Bookmarks for March 8th through April 30th

These are my links for March 8th through April 30th:

Middleware and Monorails

Middleware, the point-and-click programmer’s equivalent of Perl. Middleware, yet another layer of largely unnecessary abstraction.

It’s two hours in to the three hour e-commerce process mapping meeting with three sets of consultants in the heart and heat of London with the traffic noise, police sirens and vaguely pleasant Hare Krishna chanting wafting in through the open windows of the meeting room. I’m minding my own business quietly in the corner, designing pipelines, data flows and object models, coding prolifically and generally doing much more useful things in the safe confines of my head when it happens… What’s your middleware? Hmm? What? Was that directed at me? Of course it was. Nuts.

Middleware, middleware, I’d better think fast. Not a term I use but I’m sure I can remember what it actually means. It must sit between things… Oh yes, it’s coming back to me now. Middleware, the point-and-click programmer’s equivalent of Perl. Middleware, yet another layer of largely unnecessary abstraction. An API for APIs. A tool for fooling developers into thinking that they’re not tightly coupling their applications together when instead they’re tightly coupling to a third-party system they have even less control over because they’re incapable of agreeing direct service communication specs with that other application. “It’s ok, it’s standards-based”. Sure it is. Whatever you say… Middleware is the consultants’ friend though – a clever sounding service that does the integration for you but generally requires little direct development and provides easy resale of the same work to multiple clients.

I’ve been developing large-scale, high traffic websites for a few years now and I’ve never had need for a component that specifically markets itself as middleware. Not once. I’ve used plenty of APIs, web services, object brokers, message queues, key-value stores and any other number of components with easily identifiable purposes but I think using middleware for middleware’s sake is still just a little too meta for me. Yessir! It’s a genuine, bona fide, API’d middleware. I see it absolutely as a Springfield Monorail application, but those Shelbyville folk are so much smarter, maybe I should be more like them.

What’s my middleware? None, I don’t have one. I don’t want one, I’m pretty sure I don’t need one. Cue surprised looks, amusement and disbelief.

Ok, twist my arm, I suppose I quite like Zapier. (Ahh, ok, he’s one of us after all)

Update: I tried to find a pretty picture to augment the post with but I couldn’t find anything on Google Images or Flickr that didn’t make me want to punch the screen.

Systems & Security Tools du jour

I’ve been to two events in the past two weeks which have started me thinking harder about the way we protect and measure our enterprise systems.

The first of the two events was the fourth Splunk Live in St. Paul’s, London last week. I’ve been a big fan of Splunk for a few years but I’ve never really tried it out in production. The second was InfoSec at Earl’s Court. More about that one later.

What is Splunk?

To be honest, splunk is different things to different people. Since inception it’s had great value as a log collation and event alerting tool for systems administrators as that was what it was originally designed to do. However as both DJ Skillman and Godfrey Sullivan pointed out, Splunk has grown into a lot more than that. It solved a lot of “Big Data” (how I hate that phrase) problems before Big Data was trendy, taking arbitrary unstructured data sources structuring them in useful ways, indexing the hell out of them and adding friendly, near-real-time reporting and alerting on top. Nowadays, given the right data sources, Splunk is capable of providing across-the-board Operational Intelligence, yielding tremendous opportunities in measuring value of processes and events.

How does it work?

In order to make the most out of a Splunk installation you require at least three basic things :-

  1. A data source - anything from a basic syslog or Apache web server log to a live high level ERP logistics event feed or even entire code commits
  2. An enrichment process – something to tag packets, essentially to assign value to indexed fields, allowing the association of fields from different feeds, e.g. tallying new orders with a customer database with stock keeping perhaps.
  3. A report – a canned report, presented on a dashboard for your CFO for example, or an email alert to tell your IT manager that someone squirting 5 day experiments in at the head of the analysis pipeline is going to go over-budget on your AWS analysis pipeline in three days’ time.

How far can you go with it?

Well, here’s a few of the pick ‘n’ mix selection of things I’d like to start indexing as soon as we sort out a) the restricted data limits of our so-far-free Splunk installation and b) what’s legal to do

  • Door id access (physical site presence)
  • VPN logins (virtual site presence)
  • Wifi device registrations (guest, internal, whatever)
  • VoIP + PSTN call logs (number, duration)
  • Environmentals – temperature, humidity of labs, offices, server rooms
  • System logs for everything (syslog, authentication, Apache, FTPd, MySQL connections, Samba, the works)
  • SGE job logs with user & project accounting
  • Application logs for anything we’ve written in house
  • Experimental metadata (who ran what when, where, why)
  • Domains for all incoming + outgoing mail, plus mail/attachment weights (useful for spotting outliers exfiltrating data)
  • Firewall: accepted incoming connections
  • Continuous Integration test results (software project, timings, memory, cpu footprints)
  • SVN/Git code commits (yes, it’s possible to log the entire change set)
  • JIRA tickets (who, what, when, project, component, priority)
  • ERP logs (supply chain, logistics, stock control, manufacturing lead times)
  • CRM + online store logs (customer info, helpdesk cases, orders)
  • anything and everything else with vaguely any business value

I think it’s pretty obvious that all this stuff taken together constitutes what most people call Big Data these days. There’s quite a distinction between that sort of mixed relational data and the plainer “lots of data” I deal with day to day, experimental data in the order of a terabyte-plus per device per day.

Charts from Tables with D3js and jQuery

I’ve been tinkering with D3js on and off for a couple of months now, purely for generating simple, inline charts in web pages, made from data already dumped into HTML tables. Doing this is easier than building, caching and referencing external bitmap (PNG, GIF or whatever) images with Gnuplot or GD::Graph and also simpler than building bitmap images and serving them base64 encoded inline with <img alt=”” src=”data:…” />.

Using jQuery (or similar) to extract data from an already-present HTML table means there’s almost no code required whenever you want to add and plot a new column that someone might want to report on. Pushing all the work to the client should also mean slightly lighter server loads, though granted it’s already done the heavy lifting during the query to generate the table.

I’ve used examples from a number of sources, mostly from over on the d3js.org website itself and Mike Bostock’s inspiring example gallery. Plus the ever useful jQuery and jQueryUI libraries.

The result is a tabbed (with a jqueryui-themed unordered list) report based on a data table below. Clicking on either a tab or a table heading (all except the date) will animate and redraw the chart above. The data are collected using a jQuery selector on column classes in each.

Feel free to take and reuse it – just pinch the frame source.

Bookmarks for December 12th through February 18th

These are my links for December 12th through February 18th:

Using the iPod Nano 6th gen with Ubuntu

440x330-ipod-nano6gen-frontToday I spent 3 hours wrestling with a secondhand ipod Nano, 6th gen (the “6” is the killer) for a friend, trying to make it work happily with Ubuntu.

Having never actually owned an iPod myself, only iPhone and iPad, it was a vaguely educational experience too. I found nearly no useful information on dozens of fora – all of them only reporting either “it works” without checking the generation, or “it doesn’t work” with no resolution, or “it should work” with no evidence. Yay Linux!

There were two issues to address – firstly making the iPod block storage device visible to Linux and secondly finding something to manage the unconventional media database on the iPod itself.

It turned out that most iPods, certainly early generations, work well with Linux but this one happened not to. Most iPods are supported via libgpod, whether you’re using Banshee, Rhythmbox, even Amarok (I think) and others. I had no luck with Rhythmbox, Banshee, gtkpod, or simple block storage access for synchronising music.

It also turns out that Spotify one of my other favourite music players doesn’t use libgpod, which looked very promising.

So the procedure I used to get this one to work went something like this:

  1. Restore and/or initialise the iPod using the standard procedure with iTunes (I used iTunes v10 and latest iPod firmware 1.2) on a Windows PC. Do not use iTunes on OSX. Using OSX results in the iPod being formatted using a not-well-supported filesystem (hfsplus with journalling). Using Windows results in a FAT filesystem (mounted as vfat under Linux).Having said that, I did have some success making the OSX-initialised device visible to Linux but it required editing fstab and adding:
    /dev/sdb2 /media/ipod hfsplus user,rw,noauto,force 0 0

    which is pretty stinky. FAT-based filesystems have been well supported for a long time – best to stick with that. Rhythmbox, the player I was trying at the time, also didn’t support the new media database. It appeared to copy files on but failed every time, complaining about unsupported/invalid database checksums. According to various fora the hashes need reverse engineering.

  2. Install the Ubuntu Spotify Preview using the Ubuntu deb (not the Wine version). I used the instructions here.
  3. I have a free Spotify account, which I’ve had for ages and might not be possible to make any more. I was worried that not having a premium or unlimited account wouldn’t let me use the iPod sync, but in the end it worked fine. The iPod was seen and available in Spotify straight away and allowed synchronisation of specific playlists or all “Local Files”. In the end as long as Spotify was running and the iPod connected, I could just copy files directly into my ~/Music/ folder and Spotify would sync it onto the iPod immediately.

Superb, job done! (I didn’t try syncing any pictures)

 

Bookmarks for November 27th through December 7th

These are my links for November 27th through December 7th:

Bookmarks for November 6th through November 26th

These are my links for November 6th through November 26th:

Bookmarks for June 1st through November 3rd

These are my links for June 1st through November 3rd:

Conway’s Game of Life in Perl

I wanted a quick implementation of Conway’s Game of Life this evening to muck about with, with the boys. Whipped this up in simple Perl for running in a terminal / on the commandline. It’s not the fastest implementation on the planet but that’s most likely just the way I’ve coded it.

Throw some 1s and 0s in the DATA block at the end to modify the start state, change the $WIDTH and $HEIGHT of the area, or uncomment the random data line in init() and see what you see.

#!/usr/local/bin/perl
# -*- mode: cperl; tab-width: 8; indent-tabs-mode: nil; basic-offset: 2 -*-
# vim:ts=8:sw=2:et:sta:sts=2
#########
# Author:        rmp
# Created:       2012-10-12
# Last Modified: $Date: 2012-10-12 19:09:00 +0100 (Fri, 12 Oct 2012) $
# Id:            $Id$
# $HeadURL$
#
use strict;
use warnings;
use Time::HiRes qw(sleep);
use Readonly;
use English qw(-no_match_vars);
use Carp;

Readonly::Scalar my $WIDTH      => 78;
Readonly::Scalar my $HEIGHT     => 21;
Readonly::Scalar my $TURN_DELAY => 0.1;

our $VERSION = '0.01';

my $grid  = init();
my $turns = 0;
while(1) {
  render($grid);
  $grid = turn($grid);
  sleep $TURN_DELAY;
  $turns++;
}

sub init {
  #########
  # initialise with a manual input from the DATA block below
  #
  local $RS = undef;
  my $data  = <data>;
  my $out   = [
	       map { [split //smx, $_] }
	       map { split /\n/smx, $_ }
	       $data
	      ];

  #########
  # fill the matrix with space
  #
  for my $y (0..$HEIGHT-1) {
    for my $x (0..$WIDTH-1) {
      $out->[$y]->[$x] ||= 0;
#      $out->[$y]->[$x] = rand >= 0.2 ? 0 : 1; # initialise with some random data
    }
  }
  return $out;
}

#########
# draw to stdout/screen
#
sub render {
  my ($in) = @_;
  system $OSNAME eq 'MSWin32' ? 'cls' : 'clear';

  print q[+], q[-]x$WIDTH, "+\n" or croak qq[Error printing: $ERRNO];
  for my $y (@{$in}) {
    print q[|] or croak qq[Error printing: $ERRNO];
    print map { $_ ? q[O] : q[ ] } @{$y} or croak qq[Error printing: $ERRNO];
    print "|\r\n" or croak qq[Error printing: $ERRNO];
  }
  print q[+], q[-]x$WIDTH, "+\n" or croak qq[Error printing: $ERRNO];

  return 1;
}

#########
# the fundamental Game of Life rules
#
sub turn {
  my ($in) = @_;
  my $out  = [];

  for my $y (0..$HEIGHT-1) {
    for my $x (0..$WIDTH-1) {
      my $topedge    = $y-1;
      my $bottomedge = $y+1;
      my $leftedge   = $x-1;
      my $rightedge  = $x+1;

      my $checks = [
		    grep { $_->[0] >= 0 && $_->[0] < $HEIGHT } # Y boundary checking
		    grep { $_->[1] >= 0 && $_->[1] < $WIDTH }  # X boundary checking
		    [$topedge,    $leftedge],
		    [$topedge,    $x],
		    [$topedge,    $rightedge],
		    [$y,          $leftedge],
		    [$y,          $rightedge],
		    [$bottomedge, $leftedge],
		    [$bottomedge, $x],
		    [$bottomedge, $rightedge],
		   ];

      my $alive = scalar
	          grep { $_ }
	          map { $in->[$_->[0]]->[$_->[1]] }
		  @{$checks};

      $out->[$y]->[$x] = (($in->[$y]->[$x] && $alive == 2) ||
			  $alive == 3);
    }
  }
  return $out;
}

__DATA__
0
0
0
0
0
0
0000000010
0000000111
0000000101
0000000111
0000000010
</data>

p.s. WordPress is merrily swapping “DATA” for “data” on line 38 and adding an extra /data tag at the end of that code snippet. Fix line 38 and don’t copy & paste the close tag. Damn I hate WordPress :(