Bookmarks for May 1st through May 22nd

These are my links for May 1st through May 22nd:

SVN Server Integration with HTTPS, Active Directory, PAM & Winbind

Subversion on a whiteboard
Image CC by johntrainor
In this post I’d like to explain how it’s possible to integrate SVN (Subversion) source control using WebDAV and HTTPS using Apache and Active Directory to provide authentication and access control.

It’s generally accepted that SVN over WebDAV/HTTPS  provides finer granulation security controls than SVN+SSH. The problem is that SVN+SSH is really easy to set up, requiring knowledge of svnadmin and the filesystem and very little else but WebDAV+HTTPS requires knowledge of Apache and its modules relating to WebDAV, authentication and authorisation which is quite a lot more to ask. Add to that authenticating to AD and you have yourself a lovely string of delicate single point of failure components. Ho-hum, not a huge amount you can do about that but at least the Apache components are pretty robust.

For this article I’m using CentOS but everything should be transferrable to any distribution with a little tweakage.

Repository Creation

Firstly then, pick a disk or volume with plenty of space, we’re using make your repository – same as you would for svn+ssh:

svnadmin create /var/svn/repos

Apache Modules

Install the prerequisite Apache modules:

yum install mod_dav_svn

This should also install mod_authz_svn which we’ll also be making use of. Both should end up in Apache’s module directory, in this case /etc/httpd/modules/

Download and install mod_authnz_external from its Google Code page. This allows Apache basic authentication to hook into an external authentication mechanism. should end up in Apache’s module directory but in my case it ended up in its default location of /usr/lib/httpd/modules/.

Download and install the companion pwauth utility from its Google Code page. In my case it installs to /usr/local/sbin/pwauth and needs suexec permissions (granted using chmod +s).

Apache Configuration (HTTP)


Listen		*:80
NameVirtualHost *:80

User		nobody
Group		nobody

LoadModule setenvif_module	modules/
LoadModule mime_module		modules/
LoadModule log_config_module	modules/
LoadModule dav_module		modules/
LoadModule dav_svn_module	modules/
LoadModule auth_basic_module    modules/
LoadModule authz_svn_module	modules/
LoadModule authnz_external_module modules/

LogFormat	"%v %A:%p %h %l %u %{%Y-%m-%d %H:%M:%S}t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"" clean
CustomLog	/var/log/httpd/access_log	clean

<virtualhost *:80>

	AddExternalAuth         pwauth  /usr/local/sbin/pwauth
	SetExternalAuthMethod   pwauth  pipe

	<location / >
		DAV			svn
		SVNPath			/var/svn/repos
		AuthType		Basic
		AuthName		"SVN Repository"
		AuthzSVNAccessFile	/etc/httpd/conf/authz_svn.acl
		AuthBasicProvider	external
		AuthExternal		pwauth
		Satisfy			Any

			Require valid-user

Network Time (NTP)

In order to join a Windows domain, accurate and synchronised time is crucial, so you’ll need to be running NTPd.

yum install ntp
chkconfig ntpd on
service ntpd start

Samba Configuration

Here’s where AD comes in and in my experience this is by far the most unreliable service. Install and configure samba:

yum install samba
chkconfig winbind on

Edit your /etc/samba/smb.conf to pull information from AD.

	workgroup = EXAMPLE
	realm = EXAMPLE.COM
	security = ADS
	allow trusted domains = No
	use kerberos keytab = Yes
	log level = 3
	log file = /var/log/samba/%m
	max log size = 50
	printcap name = cups
	idmap backend = idmap_rid:EXAMPLE=600-20000
	idmap uid = 600-20000
	idmap gid = 600-20000
	template shell = /bin/bash
	winbind enum users = Yes
	winbind enum groups = Yes
	winbind use default domain = Yes
	winbind offline logon = yes

Join the machine to the domain – you’ll need an account with domain admin credentials to do this:

net ads join -U administrator

Check the join is behaving ok:

[root@svn conf]# net ads info
LDAP server:
LDAP server name:
Bind Path: dc=EXAMPLE,dc=COM
LDAP port: 389
Server time: Tue, 15 May 2012 22:44:34 BST
KDC server:
Server time offset: 130

(Re)start winbind to pick up the new configuration:

service winbind restart

PAM & nsswitch.conf

PAM needs to know where to pull its information from, so we tell it about the new winbind service in /etc/pam.d/system-auth.

# This file is auto-generated.
# User changes will be destroyed the next time authconfig is run.
auth        required
auth        sufficient nullok try_first_pass
auth        requisite uid >= 500 quiet
auth        sufficient try_first_pass
auth        required

account     required broken_shadow
account     sufficient
account     sufficient uid < 500 quiet
account     [default=bad success=ok user_unknown=ignore]
account     required

password    requisite try_first_pass retry=3
password    sufficient md5 shadow nullok try_first_pass use_authtok
password    sufficient use_authtok
password    required

session     optional revoke
session     required
session     [success=1 default=ignore] service in crond quiet use_uid
session     required      /lib/security/ 
session     required
session     optional

YMMV with PAM. It can take quite a lot of fiddling around to make it work perfectly. This obviously has an extremely close correlation to how flaky users find the authentication service. If you’re running on 64-bit you may find you need to install 64-bit versions of pam modules, e.g. mkhomedir which aren’t installed by default.

We also modify nsswitch.conf to tell other, non-pam aspects of the system where to pull information from:

passwd:     files winbind
shadow:     files winbind
group:      files winbind

To check the authentication information is coming back correctly you can use wbinfo but I like seeing data by using getent group or getent passwd. The output of these two commands will contain domain accounts if things are working correctly and only local system accounts otherwise.

External Authentication

We’re actually going to use system accounts for authentication. To stop people continuing to use svn+ssh (and thus bypassing the authorisation controls) we edit /etc/ssh/sshd_config and use AllowUsers or AllowGroups and specify all permitted users. Using AllowGroups will also provide AD group control of permitted logins but as the list is small it’s probably overkill. My sshd_config list looks a lot like this:

AllowUsers	root rmp contractor itadmin

To test external authentication run /usr/local/sbin/pwauth as below. “yay” should be displayed if things are working ok. Note the password here is displayed in clear-text:

[root@svn conf]# pwauth && echo 'yay' || echo 'nay'

Access Controls

/etc/httpd/authz_svn.conf is the only part which should require any modifications over time – the access controls specify who is allowed to read and/or write to each svn project, in fact as everything’s a URL now you can arbitrarily restrict subfolders of projects too but that’s a little OTT. It can be arbitrarily extended and can take local and active directory usernames. I’m sure mod_authz_svn has full documentation about what you can and can’t put in here.

# Allow anonymous read access to everything by default.
* = r
rmp = rw

rmp = rw
bob = rw



So far that’s all the basic components. The last piece in the puzzle is enabling SSL for Apache. I use the following /etc/httpd/httpd.conf:


Listen		*:80
NameVirtualHost *:80

User		nobody
Group		nobody

LoadModule setenvif_module	modules/
LoadModule mime_module		modules/
LoadModule log_config_module	modules/
LoadModule proxy_module		modules/
LoadModule proxy_http_module	modules/
LoadModule rewrite_module	modules/
LoadModule dav_module		modules/
LoadModule dav_svn_module	modules/
LoadModule auth_basic_module    modules/
LoadModule authz_svn_module	modules/
LoadModule ssl_module		modules/
LoadModule authnz_external_module modules/

Include conf.d/ssl.conf

LogFormat	"%v %A:%p %h %l %u %{%Y-%m-%d %H:%M:%S}t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"" clean
CustomLog	/var/log/httpd/access_log	clean

<virtualhost *:80>

	Rewrite		/	[R=permanent,L]

<virtualhost *:443>

	AddExternalAuth         pwauth  /usr/local/sbin/pwauth
	SetExternalAuthMethod   pwauth  pipe

	SSLEngine on
	SSLProtocol all -SSLv2

	SSLCertificateFile	/etc/httpd/conf/svn.crt
	SSLCertificateKeyFile	/etc/httpd/conf/svn.key

	<location />
		DAV			svn
		SVNPath			/var/svn/repos
		AuthType		Basic
		AuthName		"SVN Repository"
		AuthzSVNAccessFile	/etc/httpd/conf/authz_svn.acl
		AuthBasicProvider	external
		AuthExternal		pwauth
		Satisfy			Any

			Require valid-user

/etc/httpd/conf.d/ssl.conf is pretty much the unmodified distribution ssl.conf and looks like this:

LoadModule ssl_module modules/

Listen 443

AddType application/x-x509-ca-cert .crt
AddType application/x-pkcs7-crl    .crl

SSLPassPhraseDialog  builtin

SSLSessionCache         shmcb:/var/cache/mod_ssl/scache(512000)
SSLSessionCacheTimeout  300

SSLMutex default

SSLRandomSeed startup file:/dev/urandom  256
SSLRandomSeed connect builtin

SSLCryptoDevice builtin

SetEnvIf User-Agent ".*MSIE.*" \
         nokeepalive ssl-unclean-shutdown \
         downgrade-1.0 force-response-1.0

You’ll need to build yourself a certificate, self-signed if necessary, but that’s a whole other post. I recommend searching the web for “openssl self signed certificate” and you should find what you need. The above httpd.conf references the key and certificate under /etc/httpd/conf/svn.key and /etc/httpd/conf/svn.crt respectively.

The mod_authnz_external+pwauth combination can be avoided if you can persuade mod_authz_ldap to play nicely. There are a few different ldap modules around on the intertubes and after a lot of trial and even more error I couldn’t make any of them work reliably if at all.

And if all this leaves you feeling pretty nauseous it’s quite natural. To remedy this, go use git instead.

Thoughts on the WDTV Live Streaming Multimedia Player

A couple of weeks ago I had some Amazon credit to use and I picked up a Western Digital TV Live. I’ve been using it on and off since then and figured I’d jot down some thoughts.


Well how does it look? It’s small for starters, smaller than a double-CD case if you can remember those, around an inch deep. Probably a little larger than the Cyclone players although I don’t have any of those to compare with. It’s also very light indeed – not having a hard disk or power supply built in means the player itself can’t have much more than a motherboard in. I imagine the heaviest component is probably a power regulator heatsink or the case itself. It doesn’t sound like it has any fans in either which means there’s no audible running noise. I’ve wall-wart power bricks which make more running noise than this unit.

Mounting is performed using a couple of recesses on the back. I put a single screw into the VESA mount on the back of the kitchen TV and hung the WDTV from that. The infrared receiver seems pretty receptive just behind the top of the TV, facing upwards and the heaviest component to worry about is the HDMI or component AV cable – not a big deal at all.


The on-screen interface is pleasant and usable once you work your way around the icons and menus. The main screens – Music/Video/Services/Settings are easy enough but the functionality of the coloured menus isn’t too clear until you’ve either played around with them enough, or read the manual (haha). Associating to Wifi is a bit of a pain if you have a long WPA key as the soft keyboard isn’t too great. I did wonder if it’s possible to attach a USB keyboard just to enter passwords etc. but I didn’t try that out.

Connecting to NFS and SMB/CIFS shared drives is relatively easy. It helps if the shares are already configured to allow guest access or have a dedicated account for media players for example. The WDTV Live really wants read-write access for any shares you’re going to use permanently so it can generate its own indices. I like navigating folders and files rather than special device-specific libraries so I’m not particularly keen on this, but if it improves the multimedia experience so be it. I’ve enough multimedia devices in the house now, each with their own method of indexing that remembering which index folders from device A need to be ignored device B is becoming a bit of a nuisance. I haven’t had more than the usual set of problems with sending remote audio to the WDTV Live from a bunch of different Android devices, or using it as a Media Renderer from the DiskStation Audio Station app.

The remote control feels solid, with positive button actions and a responsive receiver. It’s laid out logically I guess, by which I mean it’s laid out in roughly the same way as most other video & multimedia remote controls I’ve used.

Firmware Updates

So normally I expect to buy some sort of gadget like this, use it for a couple of months, find a handful of bugs and never receive any firmware updates for it ever again. However I’ve been pleasantly surprised. In the two weeks I’ve had the WDTV I’ve had two firmware updates, one during the initial installation and the most recent in the last couple of days to address, amongst other things, slow frontend performance when background tasks are running (read “multimedia indexing on network shares” here). I briefly had a scan around the web to see if there was an XBMC port and there didn’t appear to be although there were some requests. I haven’t looked to see what CPU the WDTV has inside but it’s probably a low power ARM or Broadcom or similar so would take some effort to port XBMC to (from memory I seem to recall there is an ARM port in the works though). The regular firmware is downloadable and hackable however and there’s at least one unofficial version around.


Video playback has been smooth on everything I’ve tried. The videos I’ve played back have all been different formats, different container formats, different resolutions etc. and all streamed over 802.11G wifi and ethernet. I didn’t have any trouble with either type of networking so I haven’t checked to see whether the wired port is 100Mbps or 1GbE. I haven’t tried USB playback and there’s no SD card slot, which you might expect.

Audio playback is smooth although the interface took a little getting used to. I’ve been used to the XBMC and Synology DSAudio style of Queue/Play but this device always seems to queue+play which is actually what you want a lot of the time. I don’t have a digital audio receiver so I haven’t tried the SPDIF out.

Picture playback is acceptable but I found the transitions pretty jumpy, at least with 12 and 14Mpx images over wifi.


Overall I’m pretty happy with this device. It’s cheap, small, quiet and unobtrusive but packs a fair punch in terms of features. My biggest gripe is that it’s really slow doing its indexing. I thought the reason could have been because it was running over wifi but even after attaching it to a wired network it’s taken three days solid scanning our family snaps and home videos (a mix of still-camera video captures, miniDV transfers and HD camcorder). It doesn’t give you an idea of how far it’s progressed or how much is left to go so the only option seems to be to leave it and let it run. I did also have an initial problem where the WDTV didn’t detect it had HDMI plugged in, preferring to use the composite video out. Unscientifically, at the same time as I updated the firmware I reversed the cable so I don’t know quite what fixed it but it seems to have been fine since.

If I had to give an overall score for the WDTV Live, I’d probably say somewhere around 8/10.


Haplotype Consensus Clustering

Way back in the annals of history (2002) I wrote a bit of code to perform haplotype groupings for early Ensembl-linked data. Like my recent kmer scanner example, it used one of my favourite bits of Perl – the regex engine. I dug the old script out of an backup and it was, as you’d expect, completely horrible. So for fun I gave it a makeover this evening, in-between bits of Silent Witness.

This is what it looks like now. Down to 52 lines of code from 118 lines in the 2002 version. I guess the last 10 years have made me a little over twice as concise.

#!/usr/local/bin/perl -T
use strict;
use warnings;

# read everything in
my $in = [];
while(my $line = <>) {
  chomp $line;
  if(!$line) {

  # build regex pattern
  $line =~ s{[X]}{.}smxg;

  # store
  push @{$in}, uc $line;

my $consensii = {};

# iterate over inputs
SEQ: for my $seq (sort { srt($a, $b) } @{$in}) {
  # iterate over consensus sequences so far
  for my $con (sort { srt($a, $b) } keys %{$consensii}) {
    if($seq =~ /^$con$/smx ||
       $con =~ /^$seq$/smx) {
      # if input matches consensus, store & jump to next sequence
      push @{$consensii->{$con}}, $seq;
      next SEQ;

  # if no match was found, create a new consensus container
  $consensii->{$seq} = [$seq];

# custom sort routine
# - firstly sort by sequence length
# - secondly sort by number of "."s (looseness)
sub srt {
  my ($x, $y) = @_;
  my $lx = length $x;
  my $ly = length $y;

  if($lx < $ly) {
    return -1;
  if($ly > $lx) {
    return 1;

  my $nx = $x =~ tr{.}{.};
  my $ny = $y =~ tr{.}{.};

  return $nx < => $ny;

# tally and print everything out
while(my ($k, $v) = each %{$consensii}) {
  $k =~ s/[.]/X/sxmg;
  print $k, " [", (scalar @{$v}) ,"]\n";
  for my $m (@{$v}) {
    $m =~ s/[.]/X/sxmg;
    print "  $m\n";

The input file looks something like this:



and the output looks a little like this – consensus [number of sequences] followed by an indented list of matching sequences:

elwood:~/dev rmp$ ./haplotype-sort < haplotype-in.txt 

naïve kmer scanner

Another bit of fun, basically the opposite of yesterday’s post, here we’re detecting the number of unique kmers present in a sequence. It’s easy to do this with an iterating substr approach but I like Perl’s regex engine a lot so I wanted to do it using that. Okay, I wanted to do it entirely in one /e regex but it’s slightly trickier and a lot less clear manipulating pos inside a /e substitution function.

use strict;
use warnings;

my $k     = 3;
my $match = q[\s*[ACTG]\s*]x$k;
my $seen  = {};

while($str =~ m{($match)}smxgi) {
  my $m = $1;
  $m    =~ s/\s*//smxg;


  pos $str = (pos $str) - $k + 1;

  local $, = "\n";
  print sort keys %{$seen};

printf "\n%d unique ${k}mers\n", scalar keys %{$seen};

$k is the size of the kmers we’re looking for. In this case 3, as we were generating yesterday.
$match attempts to take care of matches across newlines, roughly what one might find inside a FASTA. YMMV.
$seen keeps track of uniques we’ve encountered so far in $str.

The while loop iterates through matches found by the regex engine and pos, a function you don’t see too often, resets the start position for the next match, in this case to the current position minus 1 less than the length of the match (pos – k + 1).

The output looks something like this:

elwood:~/dev rmp$ ./kmers 
64 unique 3mers

If I were really keen I’d make use this in a regression test for yesterday’s toy.

naïve kmer sequence generator

This evening, for “fun”, I was tinkering with a couple of methods for generating sequences containing diverse, distinct, kmer subsequences. Here’s a small, unintelligent, brute-force function I came up with.
Its alphabet is set at the top in $bases, as is k, the required length of the distinct subsequences. It keeps going until it’s been able to hit all distinct combinations, tracked in the $seen hash. The final sequence ends up in $str.

use strict;
use warnings;

my $bases     = [qw(A C T G)];
my $k         = 3;
my $seen      = {};
my $str       = q[];
my $max_perms = (scalar @{$bases})**$k;
my $pos       = -1;

POS: while((scalar keys %{$seen}) < $max_perms) {
  $pos ++;

  for my $base (@{$bases}) {
    my $triple = sprintf q[%s%s],
                 (substr $str, $pos-($k-1), ($k-1)),
    if($pos < ($k-1) ||
       !$seen->{$triple}++) {
      $str .= $base;
      next POS;
  $str .= $bases->[-1];

sub report {
  print "len=@{[length $str]} seen @{[scalar keys %{$seen}]}/$max_perms kmers\n";

print $str, "\n";

Executing for k=3, bases = ACTG the output looks like this:

elwood:~/dev rmp$ ./seqgen
len=0 seen 0/64 kmers
len=1 seen 0/64 kmers
len=2 seen 0/64 kmers
len=3 seen 1/64 kmers
len=4 seen 2/64 kmers
len=5 seen 3/64 kmers
len=6 seen 4/64 kmers
len=7 seen 5/64 kmers
len=8 seen 6/64 kmers
len=9 seen 7/64 kmers
len=10 seen 8/64 kmers
len=11 seen 9/64 kmers
len=12 seen 10/64 kmers
len=13 seen 10/64 kmers
len=14 seen 11/64 kmers
len=15 seen 12/64 kmers
len=16 seen 13/64 kmers
len=17 seen 14/64 kmers
len=18 seen 15/64 kmers
len=19 seen 16/64 kmers
len=20 seen 17/64 kmers
len=21 seen 18/64 kmers
len=22 seen 19/64 kmers
len=23 seen 20/64 kmers
len=24 seen 21/64 kmers
len=25 seen 22/64 kmers
len=26 seen 23/64 kmers
len=27 seen 24/64 kmers
len=28 seen 25/64 kmers
len=29 seen 26/64 kmers
len=30 seen 27/64 kmers
len=31 seen 28/64 kmers
len=32 seen 29/64 kmers
len=33 seen 30/64 kmers
len=34 seen 31/64 kmers
len=35 seen 32/64 kmers
len=36 seen 33/64 kmers
len=37 seen 34/64 kmers
len=38 seen 35/64 kmers
len=39 seen 36/64 kmers
len=40 seen 37/64 kmers
len=41 seen 37/64 kmers
len=42 seen 38/64 kmers
len=43 seen 39/64 kmers
len=44 seen 40/64 kmers
len=45 seen 41/64 kmers
len=46 seen 42/64 kmers
len=47 seen 43/64 kmers
len=48 seen 44/64 kmers
len=49 seen 45/64 kmers
len=50 seen 46/64 kmers
len=51 seen 47/64 kmers
len=52 seen 48/64 kmers
len=53 seen 49/64 kmers
len=54 seen 50/64 kmers
len=55 seen 51/64 kmers
len=56 seen 52/64 kmers
len=57 seen 53/64 kmers
len=58 seen 54/64 kmers
len=59 seen 55/64 kmers
len=60 seen 56/64 kmers
len=61 seen 57/64 kmers
len=62 seen 58/64 kmers
len=63 seen 59/64 kmers
len=64 seen 60/64 kmers
len=65 seen 61/64 kmers
len=66 seen 62/64 kmers
len=67 seen 63/64 kmers
len=68 seen 64/64 kmers

As you can see, it manages to fit 64 distinct base triples in a string of only 68 characters. It could probably be packed a little more efficiently but I don’t think that’s too bad for a first attempt.