Content Delivery Network (CDN) using Linode VPS

This month one of the neat things I’ve done was to set up a small content delivery network (CDN) for speedy downloading of files across the globe. For one reason and another (mostly the difficulty in doing this purely with DNS and the desire not to use AWS), I opted to do this using my favourite VPS provider, Linode. All in all (and give or take DNS propagation time) I reckon it’s possible to deploy a multi-site CDN in under 30 minutes given a bit of practice. Not too shabby!

For this recipe you will need:

  1. Linode account
  2. A domain name and DNS management

What you’ll end up with:

  1. 3x Ubuntu 12.04 LTS VPS, one each in London, Tokyo and California
  2. 3x NodeBalancers, one each in London, Tokyo and California
  3. 1x user-facing general web address
  4. 3x continent-facing web addresses

I’m going to use “mycdn.com” wherever I refer to my DNS / domain. You should substitute your domain name wherever you see it.

So, firstly log in to Linode.

Create three new Linode 1024 small VPSes (or whatever size you think you’ll need). I set mine up as Ubuntu 12.04 LTS with 512MB swap but otherwise nothing special. Set one each to be in London, Tokyo and Fremont. Set the root password on each. Under “Settings”, give each VPS a label. I called mine vps-<city>-01. Under “Remote Settings”, give each a private IP and note them down together with the VPS/data centre they’re in.

At this point it’s also useful (but not strictly necessary) to give each node a DNS CNAME for its external IP address, just so you can log in to them easily by name later.

Boot all three machines and check you can login to them. I find it useful here to do an

apt-get update ; apt-get dist-upgrade.

You can also now install Apache and mod_geoip on each node:

apt-get install apache2 libapache2-mod-geoip
a2enmod include
a2enmod rewrite

You should now be able to bring up a web browser on each VPS (public IP or CNAME) in turn and see the default Apache “It works!” page.

Ok.. still with me? Next we’ll ask Linode to fire up three NodeBalancers, again one in each of the data centres, for each VPS. I labelled mine cdn-lb-<city>-01. Each one can be configured with a Port, 80 with, for now, the default settings. Add a host to each NodeBalancer with the private IP of each VPS, and the port, e.g. 192.168.128.123:80 . Note that each VPS hasn’t yet been configured to listen on those interfaces so each NodeBalancer won’t recognise its host as being up.

Ok. Let’s fix those private interfaces. SSH into each VPS using the root account and the password you set earlier. Edit /etc/network/interfaces and add:

auto eth0:1
iface eth0:1 inet static
	address <VPS private address here>
	netmask <VPS private netmask here>

Note that your private netmask is very unlikely to be 255.255.255.0 (probably) like your home network and yes, this does make a difference. Once that configuration is in, you can:

ifup eth0:1

Now we can add DNS CNAMEs for each NodeBalancer. Take the public IP for each NodeBalancer over to your DNS manager and add a meaningful CNAME for each one. I used continental regions americasapaceurope, but you might prefer to be more specific than that (e.g. us-westeu-west, …). Once the DNS propagates you should be able to see each of your Apache “It works!” pages again in your browser, but this time the traffic is running through the NodeBalancer (you might need to wait a few seconds before the NodeBalancer notices the VPS is now up).

Ok so let’s take stock. We have three VPS, each with a NodeBalancer and each running a web server. We could stop here and just present a homepage to each user telling them to manually select their local mirror – and some sites do that, but we can do a bit better.

Earlier we installed libapache2-mod-geoip. This includes a (free) database from MaxMind which maps IP address blocks to the continents they’re allocated to (via the ISP who’s bought them). The Apache module takes the database and sets a series of environment variables for each and every visitor IP. We can use this to have a good guess at roughly where a visitor is and bounce them out to the nearest of our NodeBalancers – magic!

So, let’s poke the Apache configuration a bit. rm /etc/apache2/sites-enabled/000-default. Create a new file /etc/apache2/sites-available/mirror.mycdn.com and give it the following contents:

<VirtualHost>
	ServerName mirror.mycdn.com
	ServerAlias *.mycdn.com
	ServerAdmin webmaster@mycdn.com

	DocumentRoot /mirror/htdocs

	DirectoryIndex index.shtml index.html

	GeoIPEnable     On
	GeoIPScanProxyHeaders     On

	RewriteEngine     On

	RewriteCond %{HTTP_HOST} !americas.mycdn.com
	RewriteCond %{ENV:GEOIP_CONTINENT_CODE} NA|SA
	RewriteRule (.*) http://americas.mycdn.com$1 [R=permanent,L]

	RewriteCond %{HTTP_HOST} !apac.mycdn.com
	RewriteCond %{ENV:GEOIP_CONTINENT_CODE} AS|OC
	RewriteRule (.*) http://apac.mycdn.com$1 [R=permanent,L]

	RewriteCond %{HTTP_HOST} !europe.mycdn.com
	RewriteCond %{ENV:GEOIP_CONTINENT_CODE} EU|AF
	RewriteRule (.*) http://europe.mycdn.com$1 [R=permanent,L]

	<Directory />
		Order deny,allow
		Deny from all
		Options None
	</Directory>

	<Directory /mirror/htdocs>
		Order allow,deny
		Allow from all
		Options IncludesNoExec
	</Directory>
</VirtualHost>

Now ln -s /etc/apache2/sites-available/mirror.mycdn.com /etc/apache2/sites-enabled/ .

mkdir -p /mirror/htdocs to make your new document root and add a file called index.shtml there. The contents should look something like:

<html>
 <body>
  <h1>MyCDN Test Page</h1>
  <h2><!--#echo var="HTTP_HOST" --></h2>
<!--#set var="mirror_eu"       value="http://europe.mycdn.com/" -->
<!--#set var="mirror_apac"     value="http://apac.mycdn.com/" -->
<!--#set var="mirror_americas" value="http://americas.mycdn.com/" -->

<!--#if expr="${GEOIP_CONTINENT_CODE} == AF"-->
 <!--#set var="continent" value="Africa"-->
 <!--#set var="mirror" value="${mirror_eu}"-->

<!--#elif expr="${GEOIP_CONTINENT_CODE} == AS"-->
 <!--#set var="continent" value="Asia"-->
 <!--#set var="mirror" value="${mirror_apac}"-->

<!--#elif expr="${GEOIP_CONTINENT_CODE} == EU"-->
 <!--#set var="continent" value="Europe"-->
 <!--#set var="mirror" value="${mirror_eu}"-->

<!--#elif expr="${GEOIP_CONTINENT_CODE} == NA"-->
 <!--#set var="continent" value="North America"-->
 <!--#set var="mirror" value="${mirror_americas}"-->

<!--#elif expr="${GEOIP_CONTINENT_CODE} == OC"-->
 <!--#set var="continent" value="Oceania"-->
 <!--#set var="mirror" value="${mirror_apac}"-->

<!--#elif expr="${GEOIP_CONTINENT_CODE} == SA"-->
 <!--#set var="continent" value="South America"-->
 <!--#set var="mirror" value="${mirror_americas}"-->
<!--#endif -->
<!--#if expr="${GEOIP_CONTINENT_CODE}"-->
 <p>
  You appear to be in <!--#echo var="continent"-->.
  Your nearest mirror is <a href="<!--#echo var="mirror" -->"><!--#echo var="mirror" --></a>.
 </p>
 <p>
  Or choose from one of the following:
 </p>
<!--#else -->
 <p>
  Please choose your nearest mirror:
 </p>
<!--#endif -->

<ul>
 <li><a href="<!--#echo var="mirror_eu"       -->"><!--#echo var="mirror_eu"        --></a> Europe (London)</a></li>
 <li><a href="<!--#echo var="mirror_apac"     -->"><!--#echo var="mirror_apac"      --></a> Asia/Pacific (Tokyo)</a></li>
 <li><a href="<!--#echo var="mirror_americas" -->"><!--#echo var="mirror_americas"  --></a> USA (Fremont, CA)</a></li>
</ul>

<pre style="color:#ccc;font-size:smaller">
http-x-forwarded-for=<!--#echo var="HTTP_X_FORWARDED_FOR" -->
GEOIP_CONTINENT_CODE=<!--#echo var="GEOIP_CONTINENT_CODE" -->
</pre>
 </body>
</html>

Then apachectl restart to pick up the new virtualhost and visit each one of your NodeBalancer CNAMEs in turn. The ones which aren’t local to you should redirect you out to your nearest server.

Pretty neat! The last step is to add a user-facing A record, I used mirror.mycdn.com, and set it up to DNS-RR (Round-Robin) the addresses of the three NodeBalancers. Now Set up a cron job to rsync your content to the three target VPSes, or a script to push content on-demand. Job done!

For extra points:

  1. Clone another VPS behind each NodeBalancer so that each continent is fault tolerant, meaning you can reboot one VPS in each pair without losing continental service.
  2. Explore whether it’s safe to add the public IP of one Nodebalancer to the Host configuration of a NodeBalancer on another continent, effectively making a resilient loop.

Automated basic Xen snapshots

With a little bit of shell-scripting I made this. It runs on a dom0 and scans through virtual machines running on a cluster, requesting snapshots of each one. You may want to use vm-snapshot-with-quiesce if it’s supported by xen-tools on your domU machines.

for i in `xe vm-list params=name-label | grep name | awk '{print $NF}' | xargs echo`; do echo $i; xe vm-snapshot vm=$i new-name-label="$i-`date +'%Y-%m-%dT%H:%M:%S'`"; done

If you cron the above script every day then after a few days you may want to start deleting the oldest snapshots:

for i in `xe vm-list params=name-label | grep name | awk '{print $NF}'`; do snapshot=`xe vm-list name-label=$i params=snapshots | awk '{print $NF}'`; echo "$i earliest snapshot is $snapshot"; if [ "$snapshot" ]; then xe vm-uninstall vm=$snapshot force=true; fi; done

What does Technology Monoculture really cost for SME?

http://www.flickr.com/photos/ndrwfgg/140859675/sizes/m/

Open, open, open. Yes, I sound like a stuck record but every time I hit this one it makes me really angry.

I regularly source equipment and software for small-medium enterprises, SMEs. Usually these are charities and obviously they want to save as much money as they can with their hardware and software costs. Second-hand hardware is usually order of the day. PCs around 3-years old are pretty easy to obtain and will usually run most current software.

But what about that software? On the surface the answer seems simple: To lower costs use free or Open Source software (OSS). The argument for Linux, OpenOffice and other groupware applications is pretty compelling. So what does it really mean on the ground?

Let’s take our example office:
Three PCs called “office1”, “office2” and “finance” connected together using powerline networking. There’s an ADSL broadband router which provides wireless for three laptops and also a small NAS with RAID1 for backups and shared files.

Okay, now the fun starts. The office has grown “organically” over the last 10 years. The current state is that Office1 runs XP 64-bit; Office2 runs Vista Ultimate and the once-per-week-use “finance” runs Windows 2000 for Sage and a Gift Aid returns package. All three use Windows Backup weekly to the NAS. Office1 & Office2 use Microsoft Office 2007. Office1 uses Exchange for mail and calendars, Office2 uses Windows Mail and Palm Desktop. Both RDP and VNC are also used to manage all machines.

So, what happens now is that the Gift Aid package is retired and the upgrade is to use web access but can’t run on MSIE 6. Okay. Upgrade to MSIE 8. Nope – won’t run on Win2k. How about MSIE 7? Nope, can’t download that any more (good!). Right, then an operating system upgrade is in order.

What do I use? Ubuntu of course. Well, is it that easy? I need to support the (probably antique) version of Sage Accounts on there. So how about Windows XP? Hmm – XP is looking a bit long in the tooth now. Vista? You must be joking – train-wreck! So Windows 7 is the only option. Can’t use Home Premium because it doesn’t support RDP without hacking it. So I’m forced to use Win 7 Pro. That’s £105 for the OEM version or £150 for the “full” version. All that and I’ll probably still have to upgrade Sage, AND the finance machine is only used once a week. What the hell?

Back to the drawing-board.

What else provides RDP? Most virtualisation systems do – Xen, virtualbox and the like. I use Virtualbox quite a lot and it comes with a great RDP service built in for whatever virtual machine is running. Cool – so I can virtualise the win2k instance using something like the VMWare P2V converter and upgrade the hardware and it’ll run everything, just faster (assuming the P2V works ok)…

No, wait – that still doesn’t upgrade the browser for the Gift Aid access. Ok, I could create a new WinXP virtual machine – that’s more recent than Win2k and bound to be cheaper – because Virtualbox gives me RDP I don’t need the professional version, “xp home” would do, as much as it makes me cringe. How much does that cost? Hell, about £75 for the OEM version. What??? For an O/S that’ll be retired in a couple of years? You have to be kidding! And I repeat, Vista is not an option, it’s a bad joke.

I’m fed up with this crap!

Okay, options, options, I need options. Virtualise the existing Win2k machine for Sage and leave the Ubuntu Firefox web browser installation for the updated Gift Aid. Reckon that’ll work? It’ll leave the poor techno-weenie guy who does the finances with a faster PC which is technically capable of doing everything he needs but with an unfamiliar interface.

If I were feeling particularly clever I could put Firefox on the Win2k VM, make the VM start on boot using VBoxHeadless; configure Ubuntu to auto-login and add a Win2k-VM-RDP session as a startup item for the auto-login user. Not a bad solution but pretty hacky, even for my standards (plus it would need to shut-down the domain0 host when the VM shuts down).

All this and it’s still only for one of the PCs. You know what I’d like to do? Virtualise everything and stick them all on a central server. Then replace all the desktop machines with thin clients and auto-login-RDP settings. There’s a lot to be said for that – centralised backups, VM snapshotting, simplified (one-off-cost) hardware investment, but again there’s a caveat – I don’t think that I’d want to do that over powerline networking. I’d say a minimum requirement of 100MBps Ethernet, so networking infrastructure required, together with the new server. *sigh*.

I bet you’re thinking what has all this got to do with technology monoculture? Well, imagine the same setup without any Microsoft involved.

All the same existing hardware, Ubuntu on each, OpenOffice, Evolution Mail & Calendar or something like Egroupware perhaps or even Google Apps (docs/calendar/mail etc. – though that’s another rant for another day). No need for much in the way of hardware upgrades. No need for anything special in the way of networking. Virtualise anything which absolutely has to be kept, e.g. Sage, without enforcing a change to the Linux version.

I don’t know what the answer is. What I do know is that I don’t want to spend up to £450 (or whatever it adds up to for upgrade or OEM versions) just to move three PCs to Windows 7. Then again with Windows 8, 9, 10, 2020 FOREVER. It turns out you simply cannot do Microsoft on a shoestring. Once you buy in you’re stuck and people like Microsoft (and they’re not the only ones) have a license to print money, straight out of your pocket into their coffers.

Of course that’s not news to me, and it’s probably not news to you, but if you’re in a SME office like this and willing to embrace a change to OSS you can save hundreds if not thousands of pounds for pointless, unnecessary software. Obviously the bigger your working environment is, the quicker these costs escalate. The sooner you make the change, the sooner you start reducing costs.

Remind me to write about the state of IT in the UK education system some time. It’s like lighting a vast bonfire made of cash, only worse side-effects.

Exa-, Peta-, Tera-scale Informatics: Are *YOU* in the cloud yet?

http://www.flickr.com/photos/pagedooley/2511369048/

One of the aspects of my job over the last few years, both at Sanger and now at Oxford Nanopore Technologies has been the management of tera-, verging on peta- scale data on a daily basis.

Various methods of handling filesystems this large have been around for a while now and I won’t go into them here. Building these filesystems is actually fairly straightforward as most of them are implemented as regular, repeatable units – great for horizontal scale-out.

No, what makes this a difficult problem isn’t the sheer volume of data, it’s the amount of churn. Churn can be defined as the rate at which new files are added and old files are removed.

To illustrate – when I left Sanger, if memory serves, we were generally recording around a terabyte of new data a day. The staging area there was around 0.5 Petabytes (using the Lustre filesystem) but didn’t balance correctly across the many disks. This meant we had to keep the utilised space below around 90% for fear of filling up an individual storage unit (and leading to unexpected errors). Ok, so that’s 450TB. That left 45 days of storage – one and a half months assuming no slack.

Fair enough. Sort of. collect the data onto the staging area, analyse it there and shift it off. Well, that’s easier said than done – you can shift it off onto slower, cheaper storage but that’s generally archival space so ideally you only keep raw data. If the raw data are too big then you keep the primary analysis and ditch the raw. But there’s a problem with that:

  • lots of clever people want to squeeze as much interesting stuff out of the raw data as possible using new algorithms.
  • They also keep finding things wrong with the primary analyses and so want to go back and reanalyse.
  • Added to that there are often problems with the primary analysis pipeline (bleeding-edge software bugs etc.).
  • That’s not mentioning the fact that nobody ever wants to delete anything

As there’s little or no slack in the system, very often people are too busy to look at their own data as soon as it’s analysed so it might sit there broken for a week or four. What happens then is there’s a scrum for compute-resources so they can analyse everything before the remaining 2-weeks of staging storage is up. Then even if there are problems found it can be too late to go back and reanalyse because there’s a shortage of space for new runs and stopping the instruments running because you’re out of space is a definite no-no!

What the heck? Organisationally this isn’t cool at all. Situations like this are only going to worsen! The technologies are improving all the time – run-times are increasing, read-lengths are increasing, base-quality is increasing, analysis is becoming better and more instruments are becoming available to more people who are using them for more things. That’s a many, many-fold increase in storage requirements.

So how to fix it? Well I can think of at least one pretty good way. Don’t invest in on-site long-term staging- or scratch-storage. If you’re worried by all means sort out an awesome backup system but nearline it or offline to a decent tape archive or something and absolutely do not allow user-access. Instead of long-term staging storage buy your company the fattest Internet pipe it can handle. Invest in connectivity, then simply invest in cloud storage. There are enough providers out there now to make this a competitive and interesting marketplace with opportunities for economies of scale.

What does this give you? Well, many benefits – here are a few:

  • virtually unlimited storage
  • only pay for what you use
  • accountable costs – know exactly how much each project needs to invest
  • managed by storage experts
  • flexible computing attached to storage on-demand
  • no additional power overheads
  • no additional space overheads

Most of those I more-or-less take for granted these days. The one I find interesting at the moment is the costing issue. It can be pretty hard to hold one centralised storage area accountable for different groups – they’ll often pitch in for proportion of the whole based on their estimated use compared to everyone else. With accountable storage offered by the cloud each group can manage and pay for their own space. The costs are transparent to them and the responsibility has been delegated away from central management. I think that’s an extremely attractive prospect!

The biggest argument I hear against cloud storage & computing is that your top secret, private data is in someone else’s hands. Aside from my general dislike of secret data, these days I still don’t believe this is a good argument. There are enough methods for handling encryption and private networking that this pretty-much becomes a non-issue. Encrypt the data on-site, store the keys in your own internal database, ship the data to the cloud and when you need to run analysis fetch the appropriate keys over an encrypted link, decode the data on demand, re-encrypt the results and ship them back. Sure the encryption overheads add expense to the operation but I think the costs are far outweighed.