Bookmarks for October 7th through October 13th

These are my links for October 7th through October 13th:

Adventures in UTF-8

I think I’m very nearly at the verge of beginning to understand UTF-8.

Internal UTF-8 string, encoded
Wrong:

sentinel:~ rmp$ perl -MHTML::Entities -e 'print encode_entities("°")'
&Acirc;&deg;

Right:

sentinel:~ rmp$ perl -Mutf8 -MHTML::Entities -e 'print encode_entities("°")'
&deg;

External UTF-8 input, encoded
Wrong:

sentinel:~ rmp$ echo "°" | perl -MHTML::Entities -e 'print encode_entities(<>)'
&Acirc;&deg;

Right:

sentinel:~ rmp$ echo "°" | perl -MHTML::Entities -e 'binmode STDIN, ":utf8"; print encode_entities(<>)'
&deg;

External UTF-8 string, as UTF-8 (unencoded)
Wrong:

sentinel:~ rmp$ echo "°" | perl -e 'binmode STDIN, ":utf8"; print <>'
?

Right:

sentinel:~ rmp$ echo "°" | perl -e 'binmode STDIN, ":utf8"; 
binmode STDOUT, ":utf8"; print <>'
°

External Input – Encoding after-the-fact
Wrong:

sentinel:~ rmp$ echo "°" | perl -Mutf8 -e '$in=<>; utf8::upgrade($in);
binmode STDOUT, ":utf8"; print $in'
Â°

Wrong:

sentinel:~ rmp$ echo "°" | perl -Mutf8 -e '$in=<>; utf8::encode($in);
binmode STDOUT, ":utf8"; print $in'
ÃÂ°

Wrong:

sentinel:~ rmp$ echo "°" | perl -Mutf8 -e '$in=<>; utf8::downgrade($in); 
binmode STDOUT, ":utf8"; print $in'
Â°

Right:

sentinel:~ rmp$ echo "°" | perl -Mutf8 -e '$in=<>; utf8::decode($in);
binmode STDOUT, ":utf8"; print $in'
°
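
The difference between these after-the-fact calls is easier to see as code points than as terminal glyphs. A minimal sketch, assuming the input arrives as the two raw UTF-8 bytes 0xC2 0xB0 (what a UTF-8 terminal sends for the degree sign); hex_of is a hypothetical helper introduced only for this illustration:

```perl
use strict;
use warnings;

# hypothetical helper: render each character of a string as a hex code point
sub hex_of {
  my $str = shift;
  return join q[ ], map { sprintf '%02X', ord } split //, $str;
}

# the raw bytes read from STDIN when no :utf8 layer is set
my $raw = "\xC2\xB0";

# decode: interpret the bytes as UTF-8, leaving one character (U+00B0)
my $decoded = $raw;
utf8::decode($decoded);

# encode: treat each byte as a character and UTF-8 encode it again (double encoding)
my $encoded = $raw;
utf8::encode($encoded);

# upgrade: changes the internal storage only; the characters are unchanged
my $upgraded = $raw;
utf8::upgrade($upgraded);

printf "raw:      %s\n", hex_of($raw);      # C2 B0
printf "decoded:  %s\n", hex_of($decoded);  # B0
printf "encoded:  %s\n", hex_of($encoded);  # C3 82 C2 B0
printf "upgraded: %s\n", hex_of($upgraded); # C2 B0
```

Only decode turns the two input bytes into the single character that the :utf8 output layer then writes correctly.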

Generating MSCACHE & NTLM hashes using Perl

I’ve been doing a lot of tinkering recently whilst working on the revised rainbowcracklimited.com website. Naturally it uses Perl on the back end so I’ve had to find out how to make Windows-style hashes of various types using largely non-native means.

On the whole I’ve been able to make good use of the wealth of CPAN modules – Digest::MD4, Digest::MD5, Digest::SHA and Authen::Passphrase but for one reason and another I’ve wanted to find out how to make NTLM and MSCACHE hashes “by-hand”. It turns out this is pretty easy:

NTLM is just an MD4 digest of the password in Unicode, or to be specific UTF-16LE (little-endian 2-byte characters + surrogates):

perl -M"Unicode::String latin1" -M"Digest::MD4 md4_hex" -e 'print md4_hex(latin1("cheese")->utf16le),"\n"'

MSCACHE is a little bit more fiddly: it is an MD4 digest of the raw (binary) NTLM digest followed by the lowercased username, also in UTF-16LE:

perl -M"Unicode::String latin1" -M"Digest::MD4 md4 md4_hex" -e 'print md4_hex(md4(latin1("cheese")->utf16le) . latin1(lc "Administrator")->utf16le),"\n"'
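
The same two hashes can be sketched as small subroutines. This version swaps in the core Encode module for Unicode::String to produce the UTF-16LE bytes, which is an assumption of convenience rather than what the site uses; the names ntlm_hex and mscache_hex are hypothetical:

```perl
use strict;
use warnings;
use Encode qw(encode);
use Digest::MD4 qw(md4 md4_hex);

# NTLM: MD4 over the password encoded as UTF-16LE
sub ntlm_hex {
  my $password = shift;
  return md4_hex(encode('UTF-16LE', $password));
}

# MSCACHE: MD4 over the raw NTLM digest followed by the
# lowercased username, also encoded as UTF-16LE
sub mscache_hex {
  my ($password, $username) = @_;
  my $ntlm = md4(encode('UTF-16LE', $password));
  return md4_hex($ntlm . encode('UTF-16LE', lc $username));
}

print ntlm_hex('cheese'), "\n";
print mscache_hex('cheese', 'Administrator'), "\n";
```

Because the username is lowercased before hashing, mscache_hex gives the same result for "Administrator" and "administrator".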

An Interview Question

I’d like to share a basic interview question I’ve used in the past. I’ve used this in a number of different guises over the years, both at Sanger and at ONT but the (very small!) core remains the same. It still seems to be able to trip up a lot of people who sell themselves as senior developers on their CVs and demand £35k+ salaries.

You have a list of characters.

  1. Remove duplicates

The time taken for the interviewee to scratch their head determines whether they’re a Perl programmer, or at least think like one – this is an idiomatic question in Perl. It’s a fairly standard solution to anyone who uses hashes, maps or associative arrays in any language. It’s certainly a lot harder without them.

The answer I would expect to see would run something like this:

#########
# pass in an array ref of characters, e.g.
# remove_dupes([qw(a e r o i g n o s e w f e r g e r i g e o n k)]);
#
sub remove_dupes {
  my $chars_in  = shift;
  my $seen      = {};
  my $chars_out = [];

  for my $char (@{$chars_in}) {
    if(!$seen->{$char}++) {
      push @{$chars_out}, $char;
    }
  }

  return $chars_out;
}

Or for the more adventurous, using a string rather than an array:

#########
# pass in a string of characters, e.g.
# remove_dupes(q[uyavubnopwemgnisudhjopwenfbuihrpgbwogpnskbjugisjb]);
#
sub remove_dupes {
  my $str  = shift;
  my $seen = {};
  $str     =~ s/(.)/( !$seen->{$1}++ ) ? $1 : q[]/smegx;
  return $str;
}

The natural progression from Q1 then follows. It should be immediately obvious to the interviewee if they answered Q1 inappropriately.

  2. List duplicates

#########
# pass in an array ref of characters, e.g.
# list_dupes([qw(a e r o i g n o s e w f e r g e r i g e o n k)]);
#
sub list_dupes {
  my $chars_in = shift;
  my $seen     = {};

  for my $char (@{$chars_in}) {
    $seen->{$char}++;
  }

  return [ grep { $seen->{$_} > 1 } keys %{$seen} ];
}

and with a string

#########
# pass in a string of characters, e.g.
# list_dupes(q[uyavubnopwemgnisudhjopwenfbuihrpgbwogpnskbjugisjb]);
#
sub list_dupes {
  my $str  = shift;
  my $seen = {};
  # keep each duplicated character once, at its second occurrence
  $str     =~ s/(.)/( $seen->{$1}++ == 1 ) ? $1 : q[]/smegx;
  return $str;
}

The standard follow-up is then “Given more time, what would you do to improve this?”. Well? What would you do? I know what I would do before I even started – WRITE SOME TESTS!

It’s pretty safe to assume that any communicative, personable candidate who starts off writing a test on the board will probably be head and shoulders above any other.
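
For illustration, a first pass at such tests might look like the sketch below. The two subs are compact restatements of the array-based answers so the file runs on its own, and list_dupes sorts its result here (an assumption added for the test) because hash key order is not predictable:

```perl
use strict;
use warnings;
use Test::More tests => 6;

# compact restatement of the array-based remove_dupes
sub remove_dupes {
  my $chars_in = shift;
  my %seen;
  return [ grep { !$seen{$_}++ } @{$chars_in} ];
}

# compact restatement of list_dupes, sorted for a stable result
sub list_dupes {
  my $chars_in = shift;
  my %seen;
  $seen{$_}++ for @{$chars_in};
  my @dupes = grep { $seen{$_} > 1 } keys %seen;
  return [ sort @dupes ];
}

is_deeply(remove_dupes([]),                   [],            'empty list stays empty');
is_deeply(remove_dupes([qw(a b c)]),          [qw(a b c)],   'unique input is unchanged');
is_deeply(remove_dupes([qw(a b a c b a)]),    [qw(a b c)],   'duplicates removed, first-seen order kept');
is_deeply(list_dupes([]),                     [],            'no duplicates in an empty list');
is_deeply(list_dupes([qw(a b c)]),            [],            'no duplicates in unique input');
is_deeply(list_dupes([qw(a b a c b a)]),      [qw(a b)],     'each duplicate listed once');
```

The edge cases (empty input, no duplicates, more than two repeats) are the ones most likely to expose an off-by-one in the counting logic.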

If I’m interviewing you tomorrow and you’re reading this now, it’s also safe to mention it. Interest in the subject and a working knowledge of the intertubes generally comes in handy for a web developer. I’m hiring you as an independent thinker!

Web Frameworking

It seems to be the wrong time to be reading such things, but over on InfoQ there’s a nice article introducing web development of RESTful services using Erlang and the Yaws high performance web server.

I say “the wrong time” as this week has kicked off the “Advancing with Rails” course by David A. Black of Ruby Power and Light fame. The course is fairly advanced in terms of required Rails knowledge so it’s a bit of a baptism by fire for me and a few others who have never written any Ruby before.

Rails is proving moderately easy to pick up but as I’ve remarked to a couple of people, it doesn’t seem any easier coding with Rails than with Perl. Perhaps it’s because I’ve never done it before but I reckon it’s a lot harder spending my time figuring out what the heck DHH meant something to do than it is doing it myself.

Even though it’s nowhere near as mature, I do reckon my ClearPress framework has a lot going for it – it’s pretty feature-complete in terms of ORM, views and templating (TT2). It has similar convention-over-configuration features, meaning it’s not designed for plugging in other alternative layers, but it is absolutely possible to do (and I suspect without as much effort as is required in Rails). I still need to iron out some wrinkles in the autogenerated code from the application builder and provide some default authorisation and authentication mechanisms, some of which may come in the next release. But in the meantime it’s easy to add these features, which is exactly what we’ve done for the new sequencing run tracking app, NPG, to tie it to the WTSI website single sign-on (MySQL and LDAP under the hood).

7 utilities for improving application quality in Perl

I’d like to share with you a list of what are probably my top utilities for improving code quality (style, documentation, testing) with a largely Perl flavour. In loosely important-but-dull to exciting-and-weird order…

Test::More. Billed as yet another framework for writing test scripts, Test::More extends Test::Simple and provides a bunch of more useful methods beyond Simple’s ok(). The ones I use most are use_ok() for testing compilation, is() for testing equality and like() for testing similarity with regexes.
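
A minimal sketch of those three functions in one test script. List::Util is used only because it ships with Perl; any module under test would take its place:

```perl
use strict;
use warnings;
use Test::More tests => 3;

# use_ok(): does the module compile and load?
use_ok('List::Util');

# is(): does the result equal the expected value?
is(List::Util::sum(1, 2, 3), 6, 'sum adds its arguments');

# like(): does the result match a regex?
like(List::Util::max(3, 10, 7), qr/\A \d+ \z/smx, 'max returns a whole number');
```

Declaring the plan up front (tests => 3) means the harness also notices if the script dies before every test has run.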

ExtUtils::MakeMaker. Another one of Mike Schwern’s babies, MakeMaker is used to set up a folder structure and associated ‘make’ paraphernalia when first embarking on writing a module or application. Although developers these days tend to favour Module::Build, I still prefer MakeMaker for some reason (probably fear of change) and get regular mileage out of it.
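
For readers who have not seen one, a bare-bones Makefile.PL looks roughly like the sketch below. The module name My::Module and the version number are hypothetical placeholders; a real distribution would more usually read the version from the module with VERSION_FROM:

```perl
use strict;
use warnings;
use ExtUtils::MakeMaker;

# Writes a Makefile into the current directory.
# My::Module and 0.01 are placeholders, not a real distribution.
WriteMakefile(
  NAME      => 'My::Module',
  VERSION   => '0.01',
  PREREQ_PM => {
    'Test::More' => 0, # any version will do
  },
);
```

Running perl Makefile.PL followed by make test then drives the test scripts under t/.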

Test::Pod::Coverage – what a great module! Check how good your documentation coverage is with respect to the code. No, just a subroutine header won’t do! I tend to use Test::Pod::Coverage as part of…

Test::Distribution. Automatically run a battery of standard tests including pod coverage, manifest integrity, straight compilation and a load of other important things.

perlcritic, Test::Perl::Critic. The Perl::Critic set of tools is amazing. It’s built on PPI and implements the Perl Best Practices book by Damian Conway. Now I realise that not everyone agrees with a lot of what Damian says but the point is that it represents a standard to work to (and it’s not that bad once you’re used to it). Since I discovered perlcritic I’ve been developing all my code as close to perlcritic -1 (the strictest setting, which reports violations at every severity level) as I can. It’s almost instantly made my applications more readable through systematic appearance and made faults easier to spot even before Test::Perl::Critic comes in.

Devel::Cover. I’m almost ashamed to say I only discovered this last week after dipping into Ian Langworth and chromatic’s book ‘Perl Testing’. Devel::Cover gives code exercise metrics, i.e. how much of your module or application was actually executed by your tests. It collates stats from all modules matching a user-specified pattern and dumps them out in a natty coloured table, very suitable for tying into your CI system.

Selenium. Ok, not strictly speaking a tool I’m using right this minute but it’s next on my list of integration tools. Selenium is a non-interactive, automated, browser-testing framework written in JavaScript. This tool definitely has legs and it seems to have come a long way since I first found it in the middle of 2006. I’m hoping to have automated interface testing up and running before the end of the year as part of the Perl CI system I’m planning on putting together for the new sequencing pipeline.

Hiring Perl Developers – how hard can it be?

All the roles I’ve had during my time at Sanger have more or less required the development of production quality Perl code, usually OO and increasingly using MVC patterns. Why is it then that very nearly every Perl developer I’ve interviewed in the past 8 years is woefully lacking, specifically in OO Perl but more generally in half-decent programming skills?

It’s been astonishing, not in a good way, how many have been unable to demonstrate use of hashes. Some have been too scared of them (their words, not mine) and some have never felt the need. For those of you who aren’t Perl programmers, hashes (aka associative arrays) are a pretty crucial feature of the language and fundamental to its OO implementation.

Now I program in Perl sometimes more than 7-8 hours a day. For many years this also involved reworking other people’s code. I can very easily say that if you claim to be a Perl programmer and have never used hashes then you’re not going to get a Perl-related job because of your technical skills. With a good, interactive and engaging personality and a desire for self-improvement you might get away with it, but certainly not on technical merit.

It’s also quite worrying how many of these interviewees are unable to describe the basics of object-oriented programming yet have, for example, developed and sold a commercial ERP system, presumably for big bucks. Man, these people must have awesome marketing!

Frankly a number of the bioinformaticians already working there have similar skills to the interviewees and often worse communication skills, so maybe I’m simply setting my standards too high.

I really hope this situation improves when Perl 6 goes public though I’m sure it’ll take longer to become common parlance. As long as it happens before those smug RoR types take over the world I’ll be happy ;)

Sporting Developments

I recently started reading Agile Software Development with Scrum by Schwaber and Beedle. It’s a great introduction to this branch of the Agile movement. It’s easy to read and contains practical advice and straightforward explanations of the terms and processes involved with Scrum.

Even more satisfying than the read itself was the realisation that I’ve been using a good number of the Scrum techniques in managing projects within my team for the last three years or so. I love the idea of a development team reaching a nirvana-like hyper-productive state though one of the examples of a four-person team at Quattro producing 1000 lines of C++ a week took me aback.

In the middle of last month I moved to a new position at WTSI, Team Leader for the New Sequencing Pipeline development team (currently consisting of me). Since then I’ve been working on what I’ll now call a code sprint and last week I had my first product increment. The product is a smallish system for tracking runs on the new technology sequencing machines but is around 10,000 lines of Perl (excluding templates, CSS & tests) built on a light MVC framework I produced in the same time. A one-man team producing roughly 3,333 lines of code a week (10,000 lines over about three weeks) seems ultra-productive and I can’t believe it’s *purely* down to the fact that Perl is easier to write than C++.

Anyway, I’m on a C++ course all next week, so I’ll soon be able to tell. Shame it’s not about Rails instead ;)