Bookmarks for March 8th through April 30th

These are my links for March 8th through April 30th:

Middleware and Monorails

Middleware, the point-and-click programmer’s equivalent of Perl. Middleware, yet another layer of largely unnecessary abstraction.

It’s two hours in to the three hour e-commerce process mapping meeting with three sets of consultants in the heart and heat of London with the traffic noise, police sirens and vaguely pleasant Hare Krishna chanting wafting in through the open windows of the meeting room. I’m minding my own business quietly in the corner, designing pipelines, data flows and object models, coding prolifically and generally doing much more useful things in the safe confines of my head when it happens… What’s your middleware? Hmm? What? Was that directed at me? Of course it was. Nuts.

Middleware, middleware, I’d better think fast. Not a term I use but I’m sure I can remember what it actually means. It must sit between things… Oh yes, it’s coming back to me now. Middleware, the point-and-click programmer’s equivalent of Perl. Middleware, yet another layer of largely unnecessary abstraction. An API for APIs. A tool for fooling developers into thinking that they’re not tightly coupling their applications together when instead they’re tightly coupling to a third-party system they have even less control over because they’re incapable of agreeing direct service communication specs with that other application. “It’s ok, it’s standards-based”. Sure it is. Whatever you say… Middleware is the consultants’ friend though – a clever sounding service that does the integration for you but generally requires little direct development and provides easy resale of the same work to multiple clients.

I’ve been developing large-scale, high traffic websites for a few years now and I’ve never had need for a component that specifically markets itself as middleware. Not once. I’ve used plenty of APIs, web services, object brokers, message queues, key-value stores and any other number of components with easily identifiable purposes but I think using middleware for middleware’s sake is still just a little too meta for me. Yessir! It’s a genuine, bona fide, API’d middleware. I see it absolutely as a Springfield Monorail application, but those Shelbyville folk are so much smarter, maybe I should be more like them.

What’s my middleware? None, I don’t have one. I don’t want one, I’m pretty sure I don’t need one. Cue surprised looks, amusement and disbelief.

Ok, twist my arm, I suppose I quite like Zapier. (Ahh, ok, he’s one of us after all)

Update: I tried to find a pretty picture to augment the post with but I couldn’t find anything on Google Images or Flickr that didn’t make me want to punch the screen.

Systems & Security Tools du jour

I’ve been to two events in the past two weeks which have started me thinking harder about the way we protect and measure our enterprise systems.

The first of the two events was the fourth Splunk Live in St. Paul’s, London last week. I’ve been a big fan of Splunk for a few years but I’ve never really tried it out in production. The second was InfoSec at Earl’s Court. More about that one later.

What is Splunk?

To be honest, splunk is different things to different people. Since inception it’s had great value as a log collation and event alerting tool for systems administrators as that was what it was originally designed to do. However as both DJ Skillman and Godfrey Sullivan pointed out, Splunk has grown into a lot more than that. It solved a lot of “Big Data” (how I hate that phrase) problems before Big Data was trendy, taking arbitrary unstructured data sources structuring them in useful ways, indexing the hell out of them and adding friendly, near-real-time reporting and alerting on top. Nowadays, given the right data sources, Splunk is capable of providing across-the-board Operational Intelligence, yielding tremendous opportunities in measuring value of processes and events.

How does it work?

In order to make the most out of a Splunk installation you require at least three basic things :-

  1. A data source - anything from a basic syslog or Apache web server log to a live high level ERP logistics event feed or even entire code commits
  2. An enrichment process – something to tag packets, essentially to assign value to indexed fields, allowing the association of fields from different feeds, e.g. tallying new orders with a customer database with stock keeping perhaps.
  3. A report – a canned report, presented on a dashboard for your CFO for example, or an email alert to tell your IT manager that someone squirting 5 day experiments in at the head of the analysis pipeline is going to go over-budget on your AWS analysis pipeline in three days’ time.

How far can you go with it?

Well, here’s a few of the pick ‘n’ mix selection of things I’d like to start indexing as soon as we sort out a) the restricted data limits of our so-far-free Splunk installation and b) what’s legal to do

  • Door id access (physical site presence)
  • VPN logins (virtual site presence)
  • Wifi device registrations (guest, internal, whatever)
  • VoIP + PSTN call logs (number, duration)
  • Environmentals – temperature, humidity of labs, offices, server rooms
  • System logs for everything (syslog, authentication, Apache, FTPd, MySQL connections, Samba, the works)
  • SGE job logs with user & project accounting
  • Application logs for anything we’ve written in house
  • Experimental metadata (who ran what when, where, why)
  • Domains for all incoming + outgoing mail, plus mail/attachment weights (useful for spotting outliers exfiltrating data)
  • Firewall: accepted incoming connections
  • Continuous Integration test results (software project, timings, memory, cpu footprints)
  • SVN/Git code commits (yes, it’s possible to log the entire change set)
  • JIRA tickets (who, what, when, project, component, priority)
  • ERP logs (supply chain, logistics, stock control, manufacturing lead times)
  • CRM + online store logs (customer info, helpdesk cases, orders)
  • anything and everything else with vaguely any business value

I think it’s pretty obvious that all this stuff taken together constitutes what most people call Big Data these days. There’s quite a distinction between that sort of mixed relational data and the plainer “lots of data” I deal with day to day, experimental data in the order of a terabyte-plus per device per day.

Charts from Tables with D3js and jQuery

I’ve been tinkering with D3js on and off for a couple of months now, purely for generating simple, inline charts in web pages, made from data already dumped into HTML tables. Doing this is easier than building, caching and referencing external bitmap (PNG, GIF or whatever) images with Gnuplot or GD::Graph and also simpler than building bitmap images and serving them base64 encoded inline with <img alt=”” src=”data:…” />.

Using jQuery (or similar) to extract data from an already-present HTML table means there’s almost no code required whenever you want to add and plot a new column that someone might want to report on. Pushing all the work to the client should also mean slightly lighter server loads, though granted it’s already done the heavy lifting during the query to generate the table.

I’ve used examples from a number of sources, mostly from over on the website itself and Mike Bostock’s inspiring example gallery. Plus the ever useful jQuery and jQueryUI libraries.

The result is a tabbed (with a jqueryui-themed unordered list) report based on a data table below. Clicking on either a tab or a table heading (all except the date) will animate and redraw the chart above. The data are collected using a jQuery selector on column classes in each.

Feel free to take and reuse it – just pinch the frame source.