In this post I’d like to explain how it’s possible to integrate SVN (Subversion) source control using WebDAV and HTTPS using Apache and Active Directory to provide authentication and access control.
It’s generally accepted that SVN over WebDAV/HTTPS Â provides finer granulation security controls than SVN+SSH. The problem is that SVN+SSH is really easy to set up, requiring knowledge of svnadmin and the filesystem and very little else but WebDAV+HTTPS requires knowledge of Apache and its modules relating to WebDAV, authentication and authorisation which is quite a lot more to ask. Add to that authenticating to AD and you have yourself a lovely string of delicate single point of failure components. Ho-hum, not a huge amount you can do about that but at least the Apache components are pretty robust.
For this article I’m using CentOS but everything should be transferrable to any distribution with a little tweakage.
Repository Creation
Firstly then, pick a disk or volume with plenty of space, we’re using make your repository – same as you would for svn+ssh:
svnadmin create /var/svn/repos
Apache Modules
Install the prerequisite Apache modules:
yum install mod_dav_svn
This should also install mod_authz_svn which we’ll also be making use of. Both should end up in Apache’s module directory, in this case /etc/httpd/modules/
Download and install mod_authnz_external from its Google Code page. This allows Apache basic authentication to hook into an external authentication mechanism. mod_authnz_external.so should end up in Apache’s module directory but in my case it ended up in its default location of /usr/lib/httpd/modules/.
Download and install the companion pwauth utility from its Google Code page. In my case it installs to /usr/local/sbin/pwauth and needs suexec permissions (granted using chmod +s).
Join the machine to the domain – you’ll need an account with domain admin credentials to do this:
net ads join -U administrator
Check the join is behaving ok:
[root@svn conf]# net ads info
LDAP server: 192.168.100.10
LDAP server name: ad00.example.com
Realm: EXAMPLE.COM
Bind Path: dc=EXAMPLE,dc=COM
LDAP port: 389
Server time: Tue, 15 May 2012 22:44:34 BST
KDC server: 192.168.100.10
Server time offset: 130
(Re)start winbind to pick up the new configuration:
service winbind restart
PAM & nsswitch.conf
PAM needs to know where to pull its information from, so we tell it about the new winbind service in /etc/pam.d/system-auth.
#%PAM-1.0
# This file is auto-generated.
# User changes will be destroyed the next time authconfig is run.
auth required pam_env.so
auth sufficient pam_unix.so nullok try_first_pass
auth requisite pam_succeed_if.so uid >= 500 quiet
auth sufficient pam_winbind.so try_first_pass
auth required pam_deny.so
account required pam_unix.so broken_shadow
account sufficient pam_localuser.so
account sufficient pam_succeed_if.so uid < 500 quiet
account [default=bad success=ok user_unknown=ignore] pam_winbind.so
account required pam_permit.so
password requisite pam_cracklib.so try_first_pass retry=3
password sufficient pam_unix.so md5 shadow nullok try_first_pass use_authtok
password sufficient pam_winbind.so use_authtok
password required pam_deny.so
session optional pam_keyinit.so revoke
session required pam_limits.so
session [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
session required /lib/security/pam_mkhomedir.so
session required pam_unix.so
session optional pam_winbind.so
YMMV with PAM. It can take quite a lot of fiddling around to make it work perfectly. This obviously has an extremely close correlation to how flaky users find the authentication service. If you’re running on 64-bit you may find you need to install 64-bit versions of pam modules, e.g. mkhomedir which aren’t installed by default.
We also modify nsswitch.conf to tell other, non-pam aspects of the system where to pull information from:
To check the authentication information is coming back correctly you can use wbinfo but I like seeing data by using getent group or getent passwd. The output of these two commands will contain domain accounts if things are working correctly and only local system accounts otherwise.
External Authentication
We’re actually going to use system accounts for authentication. To stop people continuing to use svn+ssh (and thus bypassing the authorisation controls) we edit /etc/ssh/sshd_config and use AllowUsers or AllowGroups and specify all permitted users. Using AllowGroups will also provide AD group control of permitted logins but as the list is small it’s probably overkill. My sshd_config list looks a lot like this:
AllowUsers root rmp contractor itadmin
To test external authentication run /usr/local/sbin/pwauth as below. “yay” should be displayed if things are working ok. Note the password here is displayed in clear-text:
/etc/httpd/authz_svn.conf is the only part which should require any modifications over time – the access controls specify who is allowed to read and/or write to each svn project, in fact as everything’s a URL now you can arbitrarily restrict subfolders of projects too but that’s a little OTT. It can be arbitrarily extended and can take local and active directory usernames. I’m sure mod_authz_svn has full documentation about what you can and can’t put in here.
#
# Allow anonymous read access to everything by default.
#
[/]
* = r
rmp = rw
[/myproject]
rmp = rw
bob = rw
...
SSL
So far that’s all the basic components. The last piece in the puzzle is enabling SSL for Apache. I use the following /etc/httpd/httpd.conf:
You’ll need to build yourself a certificate, self-signed if necessary, but that’s a whole other post. I recommend searching the web for “openssl self signed certificate” and you should find what you need. The above httpd.conf references the key and certificate under /etc/httpd/conf/svn.key and /etc/httpd/conf/svn.crt respectively.
The mod_authnz_external+pwauth combination can be avoided if you can persuade mod_authz_ldap to play nicely. There are a few different ldap modules around on the intertubes and after a lot of trial and even more error I couldn’t make any of them work reliably if at all.
And if all this leaves you feeling pretty nauseous it’s quite natural. To remedy this, go use git instead.
A couple of weeks ago I had some Amazon credit to use and I picked up a Western Digital TV Live. I’ve been using it on and off since then and figured I’d jot down some thoughts.
Looks
Well how does it look? It’s small for starters, smaller than a double-CD case if you can remember those, around an inch deep. Probably a little larger than the Cyclone players although I don’t have any of those to compare with. It’s also very light indeed – not having a hard disk or power supply built in means the player itself can’t have much more than a motherboard in. I imagine the heaviest component is probably a power regulator heatsink or the case itself. It doesn’t sound like it has any fans in either which means there’s no audible running noise. I’ve wall-wart power bricks which make more running noise than this unit.
Mounting is performed using a couple of recesses on the back. I put a single screw into the VESA mount on the back of the kitchen TV and hung the WDTV from that. The infrared receiver seems pretty receptive just behind the top of the TV, facing upwards and the heaviest component to worry about is the HDMI or component AV cable – not a big deal at all.
Interface
The on-screen interface is pleasant and usable once you work your way around the icons and menus. The main screens – Music/Video/Services/Settings are easy enough but the functionality of the coloured menus isn’t too clear until you’ve either played around with them enough, or read the manual (haha). Associating to Wifi is a bit of a pain if you have a long WPA key as the soft keyboard isn’t too great. I did wonder if it’s possible to attach a USB keyboard just to enter passwords etc. but I didn’t try that out.
Connecting to NFS and SMB/CIFS shared drives is relatively easy. It helps if the shares are already configured to allow guest access or have a dedicated account for media players for example. The WDTV Live really wants read-write access for any shares you’re going to use permanently so it can generate its own indices. I like navigating folders and files rather than special device-specific libraries so I’m not particularly keen on this, but if it improves the multimedia experience so be it. I’ve enough multimedia devices in the house now, each with their own method of indexing that remembering which index folders from device A need to be ignored device B is becoming a bit of a nuisance. I haven’t had more than the usual set of problems with sending remote audio to the WDTV Live from a bunch of different Android devices, or using it as a Media Renderer from the DiskStation Audio Station app.
The remote control feels solid, with positive button actions and a responsive receiver. It’s laid out logically I guess, by which I mean it’s laid out in roughly the same way as most other video & multimedia remote controls I’ve used.
Firmware Updates
So normally I expect to buy some sort of gadget like this, use it for a couple of months, find a handful of bugs and never receive any firmware updates for it ever again. However I’ve been pleasantly surprised. In the two weeks I’ve had the WDTV I’ve had two firmware updates, one during the initial installation and the most recent in the last couple of days to address, amongst other things, slow frontend performance when background tasks are running (read “multimedia indexing on network shares” here). I briefly had a scan around the web to see if there was an XBMC port and there didn’t appear to be although there were some requests. I haven’t looked to see what CPU the WDTV has inside but it’s probably a low power ARM or Broadcom or similar so would take some effort to port XBMC to (from memory I seem to recall there is an ARM port in the works though). The regular firmware is downloadable and hackable however and there’s at least one unofficial version around.
Performance
Video playback has been smooth on everything I’ve tried. The videos I’ve played back have all been different formats, different container formats, different resolutions etc. and all streamed over 802.11G wifi and ethernet. I didn’t have any trouble with either type of networking so I haven’t checked to see whether the wired port is 100Mbps or 1GbE. I haven’t tried USB playback and there’s no SD card slot, which you might expect.
Audio playback is smooth although the interface took a little getting used to. I’ve been used to the XBMC and Synology DSAudio style of Queue/Play but this device always seems to queue+play which is actually what you want a lot of the time. I don’t have a digital audio receiver so I haven’t tried the SPDIF out.
Picture playback is acceptable but I found the transitions pretty jumpy, at least with 12 and 14Mpx images over wifi.
Conclusions
Overall I’m pretty happy with this device. It’s cheap, small, quiet and unobtrusive but packs a fair punch in terms of features. My biggest gripe is that it’s really slow doing its indexing. I thought the reason could have been because it was running over wifi but even after attaching it to a wired network it’s taken three days solid scanning our family snaps and home videos (a mix of still-camera video captures, miniDV transfers and HD camcorder). It doesn’t give you an idea of how far it’s progressed or how much is left to go so the only option seems to be to leave it and let it run. I did also have an initial problem where the WDTV didn’t detect it had HDMI plugged in, preferring to use the composite video out. Unscientifically, at the same time as I updated the firmware I reversed the cable so I don’t know quite what fixed it but it seems to have been fine since.
If I had to give an overall score for the WDTV Live, I’d probably say somewhere around 8/10.
Way back in the annals of history (2002) I wrote a bit of code to perform haplotype groupings for early Ensembl-linked data. Like my recent kmer scanner example, it used one of my favourite bits of Perl – the regex engine. I dug the old script out of an backup and it was, as you’d expect, completely horrible. So for fun I gave it a makeover this evening, in-between bits of Silent Witness.
This is what it looks like now. Down to 52 lines of code from 118 lines in the 2002 version. I guess the last 10 years have made me a little over twice as concise.
#!/usr/local/bin/perl -T
use strict;
use warnings;
#########
# read everything in
#
my $in = [];
while(my $line = <>) {
chomp $line;
if(!$line) {
next;
}
#########
# build regex pattern
#
$line =~ s{[X]}{.}smxg;
#########
# store
#
push @{$in}, uc $line;
}
my $consensii = {};
#########
# iterate over inputs
#
SEQ: for my $seq (sort { srt($a, $b) } @{$in}) {
#########
# iterate over consensus sequences so far
#
for my $con (sort { srt($a, $b) } keys %{$consensii}) {
if($seq =~ /^$con$/smx ||
$con =~ /^$seq$/smx) {
#########
# if input matches consensus, store & jump to next sequence
#
push @{$consensii->{$con}}, $seq;
next SEQ;
}
}
#########
# if no match was found, create a new consensus container
#
$consensii->{$seq} = [$seq];
}
#########
# custom sort routine
# - firstly sort by sequence length
# - secondly sort by number of "."s (looseness)
#
sub srt {
my ($x, $y) = @_;
my $lx = length $x;
my $ly = length $y;
if($lx < $ly) {
return -1;
}
if($ly > $lx) {
return 1;
}
my $nx = $x =~ tr{.}{.};
my $ny = $y =~ tr{.}{.};
return $nx < => $ny;
}
#########
# tally and print everything out
#
while(my ($k, $v) = each %{$consensii}) {
$k =~ s/[.]/X/sxmg;
print $k, " [", (scalar @{$v}) ,"]\n";
for my $m (@{$v}) {
$m =~ s/[.]/X/sxmg;
print " $m\n";
}
}
Another bit of fun, basically the opposite of yesterday’s post, here we’re detecting the number of unique kmers present in a sequence. It’s easy to do this with an iterating substr approach but I like Perl’s regex engine a lot so I wanted to do it using that. Okay, I wanted to do it entirely in one /e regex but it’s slightly trickier and a lot less clear manipulating pos inside a /e substitution function.
#!/usr/local/bin/perl
use strict;
use warnings;
my $str = q[AAACAATAAGAAGCACCATCAGTACTATTAGGACGATGAGGCCCTCCGCTTCTGCGTCGGTTTGTGGG];
my $k = 3;
my $match = q[\s*[ACTG]\s*]x$k;
my $seen = {};
while($str =~ m{($match)}smxgi) {
my $m = $1;
$m =~ s/\s*//smxg;
$seen->{$m}++;
pos $str = (pos $str) - $k + 1;
}
{
local $, = "\n";
print sort keys %{$seen};
}
printf "\n%d unique ${k}mers\n", scalar keys %{$seen};
$k is the size of the kmers we’re looking for. In this case 3, as we were generating yesterday.
$match attempts to take care of matches across newlines, roughly what one might find inside a FASTA. YMMV.
$seen keeps track of uniques we’ve encountered so far in $str.
The while loop iterates through matches found by the regex engine and pos, a function you don’t see too often, resets the start position for the next match, in this case to the current position minus 1 less than the length of the match (pos – k + 1).
The output looks something like this:
elwood:~/dev rmp$ ./kmers
AAA
AAC
AAG
AAT
ACA
ACC
ACG
ACT
AGA
AGC
AGG
AGT
ATA
ATC
ATG
ATT
CAA
CAC
CAG
CAT
CCA
CCC
CCG
CCT
CGA
CGC
CGG
CGT
CTA
CTC
CTG
CTT
GAA
GAC
GAG
GAT
GCA
GCC
GCG
GCT
GGA
GGC
GGG
GGT
GTA
GTC
GTG
GTT
TAA
TAC
TAG
TAT
TCA
TCC
TCG
TCT
TGA
TGC
TGG
TGT
TTA
TTC
TTG
TTT
64 unique 3mers
If I were really keen I’d make use this in a regression test for yesterday’s toy.
This evening, for “fun”, I was tinkering with a couple of methods for generating sequences containing diverse, distinct, kmer subsequences. Here’s a small, unintelligent, brute-force function I came up with.
Its alphabet is set at the top in $bases, as is k, the required length of the distinct subsequences. It keeps going until it’s been able to hit all distinct combinations, tracked in the $seen hash. The final sequence ends up in $str.
#!/usr/local/bin/perl
use strict;
use warnings;
my $bases = [qw(A C T G)];
my $k = 3;
my $seen = {};
my $str = q[];
my $max_perms = (scalar @{$bases})**$k;
my $pos = -1;
POS: while((scalar keys %{$seen}) < $max_perms) {
$pos ++;
report();
for my $base (@{$bases}) {
my $triple = sprintf q[%s%s],
(substr $str, $pos-($k-1), ($k-1)),
$base;
if($pos < ($k-1) ||
!$seen->{$triple}++) {
$str .= $base;
next POS;
}
}
$str .= $bases->[-1];
}
sub report {
print "len=@{[length $str]} seen @{[scalar keys %{$seen}]}/$max_perms kmers\n";
}
report();
print $str, "\n";
Executing for k=3, bases = ACTG the output looks like this:
elwood:~/dev rmp$ ./seqgen
len=0 seen 0/64 kmers
len=1 seen 0/64 kmers
len=2 seen 0/64 kmers
len=3 seen 1/64 kmers
len=4 seen 2/64 kmers
len=5 seen 3/64 kmers
len=6 seen 4/64 kmers
len=7 seen 5/64 kmers
len=8 seen 6/64 kmers
len=9 seen 7/64 kmers
len=10 seen 8/64 kmers
len=11 seen 9/64 kmers
len=12 seen 10/64 kmers
len=13 seen 10/64 kmers
len=14 seen 11/64 kmers
len=15 seen 12/64 kmers
len=16 seen 13/64 kmers
len=17 seen 14/64 kmers
len=18 seen 15/64 kmers
len=19 seen 16/64 kmers
len=20 seen 17/64 kmers
len=21 seen 18/64 kmers
len=22 seen 19/64 kmers
len=23 seen 20/64 kmers
len=24 seen 21/64 kmers
len=25 seen 22/64 kmers
len=26 seen 23/64 kmers
len=27 seen 24/64 kmers
len=28 seen 25/64 kmers
len=29 seen 26/64 kmers
len=30 seen 27/64 kmers
len=31 seen 28/64 kmers
len=32 seen 29/64 kmers
len=33 seen 30/64 kmers
len=34 seen 31/64 kmers
len=35 seen 32/64 kmers
len=36 seen 33/64 kmers
len=37 seen 34/64 kmers
len=38 seen 35/64 kmers
len=39 seen 36/64 kmers
len=40 seen 37/64 kmers
len=41 seen 37/64 kmers
len=42 seen 38/64 kmers
len=43 seen 39/64 kmers
len=44 seen 40/64 kmers
len=45 seen 41/64 kmers
len=46 seen 42/64 kmers
len=47 seen 43/64 kmers
len=48 seen 44/64 kmers
len=49 seen 45/64 kmers
len=50 seen 46/64 kmers
len=51 seen 47/64 kmers
len=52 seen 48/64 kmers
len=53 seen 49/64 kmers
len=54 seen 50/64 kmers
len=55 seen 51/64 kmers
len=56 seen 52/64 kmers
len=57 seen 53/64 kmers
len=58 seen 54/64 kmers
len=59 seen 55/64 kmers
len=60 seen 56/64 kmers
len=61 seen 57/64 kmers
len=62 seen 58/64 kmers
len=63 seen 59/64 kmers
len=64 seen 60/64 kmers
len=65 seen 61/64 kmers
len=66 seen 62/64 kmers
len=67 seen 63/64 kmers
len=68 seen 64/64 kmers
AAACAATAAGAAGCACCATCAGTACTATTAGGACGATGAGGCCCTCCGCTTCTGCGTCGGTTTGTGGG
As you can see, it manages to fit 64 distinct base triples in a string of only 68 characters. It could probably be packed a little more efficiently but I don’t think that’s too bad for a first attempt.