Content Delivery Network (CDN) using Linode VPS

This month one of the neat things I’ve done was to set up a small content delivery network (CDN) for speedy downloading of files across the globe. For one reason and another (mostly the difficulty in doing this purely with DNS and the desire not to use AWS), I opted to do this using my favourite VPS provider, Linode. All in all (and give or take DNS propagation time) I reckon it’s possible to deploy a multi-site CDN in under 30 minutes given a bit of practice. Not too shabby!

For this recipe you will need:

  1. Linode account
  2. A domain name and DNS management

What you’ll end up with:

  1. 3x Ubuntu 12.04 LTS VPS, one each in London, Tokyo and California
  2. 3x NodeBalancers, one each in London, Tokyo and California
  3. 1x user-facing general web address
  4. 3x continent-facing web addresses

I’m going to use “mycdn.com” wherever I refer to my DNS / domain. You should substitute your domain name wherever you see it.

So, firstly log in to Linode.

Create three new Linode 1024 small VPSes (or whatever size you think you’ll need). I set mine up as Ubuntu 12.04 LTS with 512MB swap but otherwise nothing special. Set one each to be in London, Tokyo and Fremont. Set the root password on each. Under “Settings”, give each VPS a label. I called mine vps-<city>-01. Under “Remote Settings”, give each a private IP and note them down together with the VPS/data centre they’re in.

At this point it’s also useful (but not strictly necessary) to give each node a DNS CNAME for its external IP address, just so you can log in to them easily by name later.

Boot all three machines and check you can login to them. I find it useful here to do an

apt-get update ; apt-get dist-upgrade.

You can also now install Apache and mod_geoip on each node:

apt-get install apache2 libapache2-mod-geoip
a2enmod include
a2enmod rewrite

You should now be able to bring up a web browser on each VPS (public IP or CNAME) in turn and see the default Apache “It works!” page.

Ok.. still with me? Next we’ll ask Linode to fire up three NodeBalancers, again one in each of the data centres, for each VPS. I labelled mine cdn-lb-<city>-01. Each one can be configured with a Port, 80 with, for now, the default settings. Add a host to each NodeBalancer with the private IP of each VPS, and the port, e.g. 192.168.128.123:80 . Note that each VPS hasn’t yet been configured to listen on those interfaces so each NodeBalancer won’t recognise its host as being up.

Ok. Let’s fix those private interfaces. SSH into each VPS using the root account and the password you set earlier. Edit /etc/network/interfaces and add:

auto eth0:1
iface eth0:1 inet static
	address <VPS private address here>
	netmask <VPS private netmask here>

Note that your private netmask is very unlikely to be 255.255.255.0 (probably) like your home network and yes, this does make a difference. Once that configuration is in, you can:

ifup eth0:1

Now we can add DNS CNAMEs for each NodeBalancer. Take the public IP for each NodeBalancer over to your DNS manager and add a meaningful CNAME for each one. I used continental regions americas, apac, europe, but you might prefer to be more specific than that (e.g. us-west, eu-west, …). Once the DNS propagates you should be able to see each of your Apache “It works!” pages again in your browser, but this time the traffic is running through the NodeBalancer (you might need to wait a few seconds before the NodeBalancer notices the VPS is now up).

Ok so let’s take stock. We have three VPS, each with a NodeBalancer and each running a web server. We could stop here and just present a homepage to each user telling them to manually select their local mirror – and some sites do that, but we can do a bit better.

Earlier we installed libapache2-mod-geoip. This includes a (free) database from MaxMind which maps IP address blocks to the continents they’re allocated to (via the ISP who’s bought them). The Apache module takes the database and sets a series of environment variables for each and every visitor IP. We can use this to have a good guess at roughly where a visitor is and bounce them out to the nearest of our NodeBalancers – magic!

So, let’s poke the Apache configuration a bit. rm /etc/apache2/sites-enabled/000-default. Create a new file /etc/apache2/sites-available/mirror.mycdn.com and give it the following contents:

<VirtualHost>
	ServerName mirror.mycdn.com
	ServerAlias *.mycdn.com
	ServerAdmin webmaster@mycdn.com

	DocumentRoot /mirror/htdocs

	DirectoryIndex index.shtml index.html

	GeoIPEnable     On
	GeoIPScanProxyHeaders     On

	RewriteEngine     On

	RewriteCond %{HTTP_HOST} !americas.mycdn.com
	RewriteCond %{ENV:GEOIP_CONTINENT_CODE} NA|SA
	RewriteRule (.*) http://americas.mycdn.com$1 [R=permanent,L]

	RewriteCond %{HTTP_HOST} !apac.mycdn.com
	RewriteCond %{ENV:GEOIP_CONTINENT_CODE} AS|OC
	RewriteRule (.*) http://apac.mycdn.com$1 [R=permanent,L]

	RewriteCond %{HTTP_HOST} !europe.mycdn.com
	RewriteCond %{ENV:GEOIP_CONTINENT_CODE} EU|AF
	RewriteRule (.*) http://europe.mycdn.com$1 [R=permanent,L]

	<Directory />
		Order deny,allow
		Deny from all
		Options None
	</Directory>

	<Directory /mirror/htdocs>
		Order allow,deny
		Allow from all
		Options IncludesNoExec
	</Directory>
</VirtualHost>

Now ln -s /etc/apache2/sites-available/mirror.mycdn.com /etc/apache2/sites-enabled/ .

mkdir -p /mirror/htdocs to make your new document root and add a file called index.shtml there. The contents should look something like:

<html>
 <body>
  <h1>MyCDN Test Page</h1>
  <h2><!--#echo var="HTTP_HOST" --></h2>
<!--#set var="mirror_eu"       value="http://europe.mycdn.com/" -->
<!--#set var="mirror_apac"     value="http://apac.mycdn.com/" -->
<!--#set var="mirror_americas" value="http://americas.mycdn.com/" -->

<!--#if expr="${GEOIP_CONTINENT_CODE} == AF"-->
 <!--#set var="continent" value="Africa"-->
 <!--#set var="mirror" value="${mirror_eu}"-->

<!--#elif expr="${GEOIP_CONTINENT_CODE} == AS"-->
 <!--#set var="continent" value="Asia"-->
 <!--#set var="mirror" value="${mirror_apac}"-->

<!--#elif expr="${GEOIP_CONTINENT_CODE} == EU"-->
 <!--#set var="continent" value="Europe"-->
 <!--#set var="mirror" value="${mirror_eu}"-->

<!--#elif expr="${GEOIP_CONTINENT_CODE} == NA"-->
 <!--#set var="continent" value="North America"-->
 <!--#set var="mirror" value="${mirror_americas}"-->

<!--#elif expr="${GEOIP_CONTINENT_CODE} == OC"-->
 <!--#set var="continent" value="Oceania"-->
 <!--#set var="mirror" value="${mirror_apac}"-->

<!--#elif expr="${GEOIP_CONTINENT_CODE} == SA"-->
 <!--#set var="continent" value="South America"-->
 <!--#set var="mirror" value="${mirror_americas}"-->
<!--#endif -->
<!--#if expr="${GEOIP_CONTINENT_CODE}"-->
 <p>
  You appear to be in <!--#echo var="continent"-->.
  Your nearest mirror is <a href="<!--#echo var="mirror" -->"><!--#echo var="mirror" --></a>.
 </p>
 <p>
  Or choose from one of the following:
 </p>
<!--#else -->
 <p>
  Please choose your nearest mirror:
 </p>
<!--#endif -->

<ul>
 <li><a href="<!--#echo var="mirror_eu"       -->"><!--#echo var="mirror_eu"        --></a> Europe (London)</a></li>
 <li><a href="<!--#echo var="mirror_apac"     -->"><!--#echo var="mirror_apac"      --></a> Asia/Pacific (Tokyo)</a></li>
 <li><a href="<!--#echo var="mirror_americas" -->"><!--#echo var="mirror_americas"  --></a> USA (Fremont, CA)</a></li>
</ul>

<pre style="color:#ccc;font-size:smaller">
http-x-forwarded-for=<!--#echo var="HTTP_X_FORWARDED_FOR" -->
GEOIP_CONTINENT_CODE=<!--#echo var="GEOIP_CONTINENT_CODE" -->
</pre>
 </body>
</html>

Then apachectl restart to pick up the new virtualhost and visit each one of your NodeBalancer CNAMEs in turn. The ones which aren’t local to you should redirect you out to your nearest server.

Pretty neat! The last step is to add a user-facing A record, I used mirror.mycdn.com, and set it up to DNS-RR (Round-Robin) the addresses of the three NodeBalancers. Now Set up a cron job to rsync your content to the three target VPSes, or a script to push content on-demand. Job done!

For extra points:

  1. Clone another VPS behind each NodeBalancer so that each continent is fault tolerant, meaning you can reboot one VPS in each pair without losing continental service.
  2. Explore whether it’s safe to add the public IP of one Nodebalancer to the Host configuration of a NodeBalancer on another continent, effectively making a resilient loop.

SVN Server Integration with HTTPS, Active Directory, PAM & Winbind

Subversion on a whiteboard
Image CC by johntrainor
In this post I’d like to explain how it’s possible to integrate SVN (Subversion) source control using WebDAV and HTTPS using Apache and Active Directory to provide authentication and access control.

It’s generally accepted that SVN over WebDAV/HTTPS  provides finer granulation security controls than SVN+SSH. The problem is that SVN+SSH is really easy to set up, requiring knowledge of svnadmin and the filesystem and very little else but WebDAV+HTTPS requires knowledge of Apache and its modules relating to WebDAV, authentication and authorisation which is quite a lot more to ask. Add to that authenticating to AD and you have yourself a lovely string of delicate single point of failure components. Ho-hum, not a huge amount you can do about that but at least the Apache components are pretty robust.

For this article I’m using CentOS but everything should be transferrable to any distribution with a little tweakage.

Repository Creation

Firstly then, pick a disk or volume with plenty of space, we’re using make your repository – same as you would for svn+ssh:

svnadmin create /var/svn/repos

Apache Modules

Install the prerequisite Apache modules:

yum install mod_dav_svn

This should also install mod_authz_svn which we’ll also be making use of. Both should end up in Apache’s module directory, in this case /etc/httpd/modules/

Download and install mod_authnz_external from its Google Code page. This allows Apache basic authentication to hook into an external authentication mechanism. mod_authnz_external.so should end up in Apache’s module directory but in my case it ended up in its default location of /usr/lib/httpd/modules/.

Download and install the companion pwauth utility from its Google Code page. In my case it installs to /usr/local/sbin/pwauth and needs suexec permissions (granted using chmod +s).

Apache Configuration (HTTP)

ServerName svn.example.com
ServerAdmin me@example.com

Listen		*:80
NameVirtualHost *:80

User		nobody
Group		nobody

LoadModule setenvif_module	modules/mod_setenvif.so
LoadModule mime_module		modules/mod_mime.so
LoadModule log_config_module	modules/mod_log_config.so
LoadModule dav_module		modules/mod_dav.so
LoadModule dav_svn_module	modules/mod_dav_svn.so
LoadModule auth_basic_module    modules/mod_auth_basic.so
LoadModule authz_svn_module	modules/mod_authz_svn.so
LoadModule authnz_external_module modules/mod_authnz_external.so

LogFormat	"%v %A:%p %h %l %u %{%Y-%m-%d %H:%M:%S}t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"" clean
CustomLog	/var/log/httpd/access_log	clean

<virtualhost *:80>
	ServerName	svn.example.com

	AddExternalAuth         pwauth  /usr/local/sbin/pwauth
	SetExternalAuthMethod   pwauth  pipe

	<location / >
		DAV			svn
		SVNPath			/var/svn/repos
		AuthType		Basic
		AuthName		"SVN Repository"
		AuthzSVNAccessFile	/etc/httpd/conf/authz_svn.acl
		AuthBasicProvider	external
		AuthExternal		pwauth
		Satisfy			Any

		<limitexcept GET PROPFIND OPTIONS REPORT>
			Require valid-user
		</limitexcept>
	</location>
</virtualhost>

Network Time (NTP)

In order to join a Windows domain, accurate and synchronised time is crucial, so you’ll need to be running NTPd.

yum install ntp
chkconfig ntpd on
ntpdate ntp.ubuntu.com
service ntpd start

Samba Configuration

Here’s where AD comes in and in my experience this is by far the most unreliable service. Install and configure samba:

yum install samba
chkconfig winbind on

Edit your /etc/samba/smb.conf to pull information from AD.

[global]
	workgroup = EXAMPLE
	realm = EXAMPLE.COM
	security = ADS
	allow trusted domains = No
	use kerberos keytab = Yes
	log level = 3
	log file = /var/log/samba/%m
	max log size = 50
	printcap name = cups
	idmap backend = idmap_rid:EXAMPLE=600-20000
	idmap uid = 600-20000
	idmap gid = 600-20000
	template shell = /bin/bash
	winbind enum users = Yes
	winbind enum groups = Yes
	winbind use default domain = Yes
	winbind offline logon = yes

Join the machine to the domain – you’ll need an account with domain admin credentials to do this:

net ads join -U administrator

Check the join is behaving ok:

[root@svn conf]# net ads info
LDAP server: 192.168.100.10
LDAP server name: ad00.example.com
Realm: EXAMPLE.COM
Bind Path: dc=EXAMPLE,dc=COM
LDAP port: 389
Server time: Tue, 15 May 2012 22:44:34 BST
KDC server: 192.168.100.10
Server time offset: 130

(Re)start winbind to pick up the new configuration:

service winbind restart

PAM & nsswitch.conf

PAM needs to know where to pull its information from, so we tell it about the new winbind service in /etc/pam.d/system-auth.

#%PAM-1.0
# This file is auto-generated.
# User changes will be destroyed the next time authconfig is run.
auth        required      pam_env.so
auth        sufficient    pam_unix.so nullok try_first_pass
auth        requisite     pam_succeed_if.so uid >= 500 quiet
auth        sufficient    pam_winbind.so try_first_pass
auth        required      pam_deny.so

account     required      pam_unix.so broken_shadow
account     sufficient    pam_localuser.so
account     sufficient    pam_succeed_if.so uid < 500 quiet
account     [default=bad success=ok user_unknown=ignore] pam_winbind.so
account     required      pam_permit.so

password    requisite     pam_cracklib.so try_first_pass retry=3
password    sufficient    pam_unix.so md5 shadow nullok try_first_pass use_authtok
password    sufficient    pam_winbind.so use_authtok
password    required      pam_deny.so

session     optional      pam_keyinit.so revoke
session     required      pam_limits.so
session     [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
session     required      /lib/security/pam_mkhomedir.so 
session     required      pam_unix.so
session     optional      pam_winbind.so

YMMV with PAM. It can take quite a lot of fiddling around to make it work perfectly. This obviously has an extremely close correlation to how flaky users find the authentication service. If you’re running on 64-bit you may find you need to install 64-bit versions of pam modules, e.g. mkhomedir which aren’t installed by default.

We also modify nsswitch.conf to tell other, non-pam aspects of the system where to pull information from:

passwd:     files winbind
shadow:     files winbind
group:      files winbind

To check the authentication information is coming back correctly you can use wbinfo but I like seeing data by using getent group or getent passwd. The output of these two commands will contain domain accounts if things are working correctly and only local system accounts otherwise.

External Authentication

We’re actually going to use system accounts for authentication. To stop people continuing to use svn+ssh (and thus bypassing the authorisation controls) we edit /etc/ssh/sshd_config and use AllowUsers or AllowGroups and specify all permitted users. Using AllowGroups will also provide AD group control of permitted logins but as the list is small it’s probably overkill. My sshd_config list looks a lot like this:

AllowUsers	root rmp contractor itadmin

To test external authentication run /usr/local/sbin/pwauth as below. “yay” should be displayed if things are working ok. Note the password here is displayed in clear-text:

[root@svn conf]# pwauth && echo 'yay' || echo 'nay'
rmp
mypassword

Access Controls

/etc/httpd/authz_svn.conf is the only part which should require any modifications over time – the access controls specify who is allowed to read and/or write to each svn project, in fact as everything’s a URL now you can arbitrarily restrict subfolders of projects too but that’s a little OTT. It can be arbitrarily extended and can take local and active directory usernames. I’m sure mod_authz_svn has full documentation about what you can and can’t put in here.

#
# Allow anonymous read access to everything by default.
#
[/]
* = r
rmp = rw

[/myproject]
rmp = rw
bob = rw

...

SSL

So far that’s all the basic components. The last piece in the puzzle is enabling SSL for Apache. I use the following /etc/httpd/httpd.conf:

ServerName svn.example.com
ServerAdmin me@example.com

Listen		*:80
NameVirtualHost *:80

User		nobody
Group		nobody

LoadModule setenvif_module	modules/mod_setenvif.so
LoadModule mime_module		modules/mod_mime.so
LoadModule log_config_module	modules/mod_log_config.so
LoadModule proxy_module		modules/mod_proxy.so
LoadModule proxy_http_module	modules/mod_proxy_http.so
LoadModule rewrite_module	modules/mod_rewrite.so
LoadModule dav_module		modules/mod_dav.so
LoadModule dav_svn_module	modules/mod_dav_svn.so
LoadModule auth_basic_module    modules/mod_auth_basic.so
LoadModule authz_svn_module	modules/mod_authz_svn.so
LoadModule ssl_module		modules/mod_ssl.so
LoadModule authnz_external_module modules/mod_authnz_external.so

Include conf.d/ssl.conf

LogFormat	"%v %A:%p %h %l %u %{%Y-%m-%d %H:%M:%S}t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"" clean
CustomLog	/var/log/httpd/access_log	clean

<virtualhost *:80>
	ServerName		svn.example.com

	Rewrite		/	https://svn.example.com/	[R=permanent,L]
</virtualhost>

<virtualhost *:443>
	ServerName	svn.example.com

	AddExternalAuth         pwauth  /usr/local/sbin/pwauth
	SetExternalAuthMethod   pwauth  pipe

	SSLEngine on
	SSLProtocol all -SSLv2

	SSLCipherSuite		ALL:!ADH:!EXPORT:!SSLv2:RC4+RSA:+HIGH:+MEDIUM:+LOW
	SSLCertificateFile	/etc/httpd/conf/svn.crt
	SSLCertificateKeyFile	/etc/httpd/conf/svn.key

	<location />
		DAV			svn
		SVNPath			/var/svn/repos
		AuthType		Basic
		AuthName		"SVN Repository"
		AuthzSVNAccessFile	/etc/httpd/conf/authz_svn.acl
		AuthBasicProvider	external
		AuthExternal		pwauth
		Satisfy			Any

		<limitexcept GET PROPFIND OPTIONS REPORT>
			Require valid-user
		</limitexcept>
	
</virtualhost>

/etc/httpd/conf.d/ssl.conf is pretty much the unmodified distribution ssl.conf and looks like this:

LoadModule ssl_module modules/mod_ssl.so

Listen 443

AddType application/x-x509-ca-cert .crt
AddType application/x-pkcs7-crl    .crl

SSLPassPhraseDialog  builtin

SSLSessionCache         shmcb:/var/cache/mod_ssl/scache(512000)
SSLSessionCacheTimeout  300

SSLMutex default

SSLRandomSeed startup file:/dev/urandom  256
SSLRandomSeed connect builtin

SSLCryptoDevice builtin

SetEnvIf User-Agent ".*MSIE.*" \
         nokeepalive ssl-unclean-shutdown \
         downgrade-1.0 force-response-1.0

You’ll need to build yourself a certificate, self-signed if necessary, but that’s a whole other post. I recommend searching the web for “openssl self signed certificate” and you should find what you need. The above httpd.conf references the key and certificate under /etc/httpd/conf/svn.key and /etc/httpd/conf/svn.crt respectively.

The mod_authnz_external+pwauth combination can be avoided if you can persuade mod_authz_ldap to play nicely. There are a few different ldap modules around on the intertubes and after a lot of trial and even more error I couldn’t make any of them work reliably if at all.

And if all this leaves you feeling pretty nauseous it’s quite natural. To remedy this, go use git instead.

Bookmarks for October 27th through November 15th

These are my links for October 27th through November 15th:

Bookmarks for October 7th through October 13th

These are my links for October 7th through October 13th:

Apache Forward-Proxy REMOTE_ADDR propagation

I had an interesting problem this morning with the Apache forward-proxy supporting the WTSI sequencing farm.

It would be useful for the intranet service for tracking runs to know which (GA2) sequencer is requesting pages but because they’re on a dedicated subnet they have to use a forward-proxy for fetching pages (and then only from intranet services).

Now I’m very familiar using the X-Forwarded-For header and HTTP_X_FORWARDED_FOR environment variable (and their friends) which do something very similar for reverse-proxies but forward-proxies usually want to disguise the fact there’s an arbitrary number of clients behind them, usually with irrelevant RFC1918 private IP addresses too.

So what I want to do is slightly unusual – take the remote_addr of the client and stuff it into a different header. I could use X-Forwarded-For but it doesn’t feel right. Proxy-Via is also not right here as that’s really for the proxy servers themselves. So, I figured mod_headers on the proxy would allow me to add additional headers to the request, even though it’s forwarded on. Also following a tip I saw here using my favourite mod_rewrite and after a bit of fiddling I can up with this:

#########
# copy remote addr to an internal variable
#
RewriteEngine  On
RewriteCond  %{REMOTE_ADDR}  (.*)
RewriteRule   .*  -  [E=SEQ_ADDR:%1]

#########
# set X-Sequencer header from the internal variable
#
RequestHeader  set  X-Sequencer  %{SEQ_ADDR}

These rules sit in the container managing my proxy, after ProxyRequests and ProxyVia and before a small set of ProxyMatch restrictions.

The RewriteCond traps the contents of the REMOTE_ADDR environment variable (it’s not an HTTP header – it comes from the end of the network socket as determined by the server). The RewriteRule unconditionally copies the last RewriteCond match %1 into a new environment variable SEQ_ADDR. After this mod_headers sets the X-Sequencer request header (for the proxied request) to the value of the SEQ_ADDR environment variable.

This works very nicely though I’d have hoped a more elegant solution would be this:

RequestHeader set X-Sequencer %{REMOTE_ADDR}

but this doesn’t seem to work and I’m not sure why. Anyway, by comparing $ENV{HTTP_X_SEQUENCER} to a shared lookup table, the sequencing apps running on the intranet can now track which sequencer is making requests. Yay!