What does Technology Monoculture really cost an SME?

http://www.flickr.com/photos/ndrwfgg/140859675/sizes/m/

Open, open, open. Yes, I sound like a stuck record but every time I hit this one it makes me really angry.

I regularly source equipment and software for small-to-medium enterprises (SMEs). Usually these are charities, and obviously they want to save as much money as they can on their hardware and software. Second-hand hardware is usually the order of the day: PCs around three years old are pretty easy to obtain and will usually run most current software.

But what about that software? On the surface the answer seems simple: to lower costs, use free or open-source software (OSS). The argument for Linux, OpenOffice and other groupware applications is pretty compelling. So what does it really mean on the ground?

Let’s take our example office:
Three PCs called “office1”, “office2” and “finance” connected together using powerline networking. There’s an ADSL broadband router which provides wireless for three laptops and also a small NAS with RAID1 for backups and shared files.

Okay, now the fun starts. The office has grown “organically” over the last 10 years. The current state is that Office1 runs XP 64-bit, Office2 runs Vista Ultimate, and the once-a-week “finance” machine runs Windows 2000 for Sage and a Gift Aid returns package. All three run Windows Backup weekly to the NAS. Office1 and Office2 use Microsoft Office 2007. Office1 uses Exchange for mail and calendars; Office2 uses Windows Mail and Palm Desktop. Both RDP and VNC are used to manage all the machines.

So, what happens now is that the Gift Aid package is retired and its replacement is web-based – but it won’t run on MSIE 6. Okay, upgrade to MSIE 8. Nope – won’t run on Win2k. How about MSIE 7? Nope, can’t download that any more (good!). Right, then an operating-system upgrade is in order.

What do I use? Ubuntu, of course. Well, is it that easy? I need to support the (probably antique) version of Sage Accounts on there. So how about Windows XP? Hmm – XP is looking a bit long in the tooth now. Vista? You must be joking – train-wreck! So Windows 7 is the only option. Can’t use Home Premium because it doesn’t include the RDP server without hacking it, so I’m forced to use Windows 7 Professional. That’s £105 for the OEM version or £150 for the “full” retail version. All that and I’ll probably still have to upgrade Sage – and the finance machine is only used once a week. What the hell?
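For the pedants (myself included), the tally is a one-liner. A quick back-of-the-envelope in Python, using the licence prices quoted above and assuming I eventually bite the bullet on all three machines:

    # Rough Windows 7 licensing tally for the three-PC office.
    # Prices are the ones quoted above; the machine count is the whole office.
    win7_pro_oem = 105     # GBP per OEM licence
    win7_pro_retail = 150  # GBP per "full" retail licence
    pcs = 3

    print("OEM route:    £%d" % (win7_pro_oem * pcs))     # £315
    print("Retail route: £%d" % (win7_pro_retail * pcs))  # £450

Which is where the “up to £450” figure further down comes from – and that’s before any Sage upgrade.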

Back to the drawing-board.

What else provides RDP? Most virtualisation systems do – Xen, VirtualBox and the like. I use VirtualBox quite a lot and it comes with a great RDP service built in, serving whichever virtual machine is running. Cool – so I can virtualise the Win2k instance using something like the VMware P2V converter, upgrade the hardware, and it’ll run everything, just faster (assuming the P2V works okay)…
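And for what it’s worth, switching that built-in RDP service on really is painless. Here’s a rough sketch of the VBoxManage incantations, wrapped in Python; the VM name is made up, and on older VirtualBox releases the option was --vrdp rather than --vrde:

    import subprocess

    VM = "win2k"  # hypothetical name for the virtualised finance machine

    # Enable VirtualBox's built-in RDP (VRDE) server for this VM and pin it to a port.
    subprocess.check_call(["VBoxManage", "modifyvm", VM, "--vrde", "on"])
    subprocess.check_call(["VBoxManage", "modifyvm", VM, "--vrdeport", "3390"])

    # Run the VM without a local GUI; it's then reachable over RDP on port 3390.
    subprocess.check_call(["VBoxHeadless", "--startvm", VM])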

No, wait – that still doesn’t upgrade the browser for the Gift Aid access. Okay, I could create a new WinXP virtual machine – that’s more recent than Win2k and bound to be cheaper. Because VirtualBox gives me RDP I don’t need the Professional version; “XP Home” would do, as much as it makes me cringe. How much does that cost? Hell, about £75 for the OEM version. What??? For an O/S that’ll be retired in a couple of years? You have to be kidding! And I repeat, Vista is not an option – it’s a bad joke.

I’m fed up with this crap!

Okay: options, options, I need options. Virtualise the existing Win2k machine for Sage and leave the updated Gift Aid web access to Firefox on the Ubuntu host. Reckon that’ll work? It’ll leave the poor techno-weenie guy who does the finances with a faster PC which is technically capable of doing everything he needs, but with an unfamiliar interface.

If I were feeling particularly clever I could put Firefox on the Win2k VM, make the VM start on boot using VBoxHeadless, configure Ubuntu to auto-login and add a Win2k-VM RDP session as a startup item for the auto-login user. Not a bad solution, but pretty hacky even by my standards (plus it would need to shut down the host when the VM shuts down).
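Something along these lines would do it – a sketch of that startup item, where the VM name, the port and the choice of rdesktop as the client are all my own assumptions:

    #!/usr/bin/env python
    # Startup item for the auto-login user: bring the Win2k VM up headless,
    # wait for its RDP port to answer, then drop into a full-screen session.
    import socket
    import subprocess
    import time

    VM, RDP_PORT = "win2k", 3390

    subprocess.Popen(["VBoxHeadless", "--startvm", VM])

    # Poll until the VM's RDP (VRDE) server is accepting connections.
    while True:
        try:
            socket.create_connection(("127.0.0.1", RDP_PORT), timeout=1).close()
            break
        except OSError:
            time.sleep(2)

    # Full-screen RDP onto the VM; when the session ends we're back on the desktop.
    subprocess.call(["rdesktop", "-f", "127.0.0.1:%d" % RDP_PORT])

It still doesn’t handle shutting the host down when the VM powers off, which is half of what makes the whole thing feel hacky.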

All this and it’s still only for one of the PCs. You know what I’d like to do? Virtualise everything and stick it all on a central server, then replace all the desktop machines with thin clients and auto-login RDP sessions. There’s a lot to be said for that – centralised backups, VM snapshotting, simplified (one-off-cost) hardware investment – but again there’s a caveat: I don’t think I’d want to do that over powerline networking. I’d say 100Mbps Ethernet is the minimum, so networking infrastructure is required, together with the new server. *sigh*.

I bet you’re wondering what all this has got to do with technology monoculture. Well, imagine the same setup without any Microsoft involved.

All the same existing hardware, Ubuntu on each machine, OpenOffice, and Evolution for mail and calendars – or something like EGroupware, or even Google Apps (docs/calendar/mail etc., though that’s another rant for another day). No need for much in the way of hardware upgrades. No need for anything special in the way of networking. Virtualise anything which absolutely has to be kept, e.g. Sage, rather than forcing a change of package as part of the move to Linux.

I don’t know what the answer is. What I do know is that I don’t want to spend up to £450 (or whatever it adds up to for upgrade or OEM versions) just to move three PCs to Windows 7 – and then again for Windows 8, 9, 10, 2020, forever. It turns out you simply cannot do Microsoft on a shoestring. Once you buy in you’re stuck, and companies like Microsoft (and they’re not the only ones) have a licence to print money, straight out of your pocket and into their coffers.

Of course that’s not news to me, and it’s probably not news to you, but if you’re in an SME office like this and willing to embrace a change to OSS, you can save hundreds if not thousands of pounds otherwise spent on pointless, unnecessary software. Obviously the bigger your working environment, the quicker these costs escalate – and the sooner you make the change, the sooner you start reducing them.

Remind me to write about the state of IT in the UK education system some time. It’s like lighting a vast bonfire made of cash, only with worse side-effects.

Exa-, Peta-, Tera-scale Informatics: Are *YOU* in the cloud yet?

http://www.flickr.com/photos/pagedooley/2511369048/

One of the aspects of my job over the last few years, both at Sanger and now at Oxford Nanopore Technologies, has been the management of tera-, verging on peta-, scale data on a daily basis.

Various methods of handling filesystems this large have been around for a while now and I won’t go into them here. Building these filesystems is actually fairly straightforward as most of them are implemented as regular, repeatable units – great for horizontal scale-out.

No, what makes this a difficult problem isn’t the sheer volume of data, it’s the amount of churn. Churn can be defined as the rate at which new files are added and old files are removed.

To illustrate: when I left Sanger, if memory serves, we were generally recording around a terabyte of new data a day. The staging area there was around 0.5 petabytes (using the Lustre filesystem) but didn’t balance correctly across its many disks, which meant we had to keep utilisation below about 90% for fear of filling up an individual storage unit (and causing unexpected errors). Okay, so that’s 450TB of usable space. That left 45 days of storage – one and a half months, assuming no slack.
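The headroom sums are simple enough to play with yourself – one multiplication and one division. A throwaway sketch; the figures in the example call are purely illustrative placeholders, not the Sanger numbers:

    def days_of_headroom(capacity_tb, usable_fraction, ingest_tb_per_day):
        """Days until a staging area hits its safe fill level."""
        return (capacity_tb * usable_fraction) / ingest_tb_per_day

    # Purely illustrative figures -- plug in your own site's capacity and churn.
    print(days_of_headroom(capacity_tb=1000, usable_fraction=0.9, ingest_tb_per_day=15))  # 60.0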

Fair enough. Sort of. Collect the data onto the staging area, analyse it there and shift it off. Well, that’s easier said than done – you can shift it off onto slower, cheaper storage, but that’s generally archival space, so ideally you only keep raw data there. If the raw data are too big then you keep the primary analysis and ditch the raw. But there are problems with that:

  • Lots of clever people want to squeeze as much interesting stuff out of the raw data as possible using new algorithms.
  • They also keep finding things wrong with the primary analyses and so want to go back and reanalyse.
  • Added to that, there are often problems with the primary analysis pipeline itself (bleeding-edge software bugs etc.).
  • And that’s not mentioning the fact that nobody ever wants to delete anything.

As there’s little or no slack in the system, very often people are too busy to look at their own data as soon as it’s analysed, so it might sit there broken for a week or four. What happens then is a scrum for compute resources so they can analyse everything before the remaining two weeks of staging storage are used up. Even if problems are found it can by then be too late to go back and reanalyse, because there’s a shortage of space for new runs – and stopping the instruments because you’re out of space is a definite no-no!

What the heck? Organisationally this isn’t cool at all. Situations like this are only going to worsen! The technologies are improving all the time – run-times are increasing, read-lengths are increasing, base-quality is increasing, analysis is becoming better and more instruments are becoming available to more people who are using them for more things. That’s a many, many-fold increase in storage requirements.

So how to fix it? Well, I can think of at least one pretty good way: don’t invest in on-site long-term staging or scratch storage. If you’re worried, by all means sort out an awesome backup system – but keep it nearline or offline on a decent tape archive or similar, and absolutely do not allow user access. Instead of long-term staging storage, buy your company the fattest Internet pipe it can handle. Invest in connectivity, then simply invest in cloud storage. There are enough providers out there now to make this a competitive and interesting marketplace, with opportunities for economies of scale.

What does this give you? Well, many benefits – here are a few:

  • virtually unlimited storage
  • only pay for what you use
  • accountable costs – know exactly how much each project needs to invest
  • managed by storage experts
  • flexible computing attached to storage on-demand
  • no additional power overheads
  • no additional space overheads

Most of those I more or less take for granted these days. The one I find interesting at the moment is costing. It can be pretty hard to hold one centralised storage area accountable to different groups – they’ll often pitch in for a proportion of the whole based on their estimated use relative to everyone else. With the accountable storage offered by the cloud, each group can manage and pay for its own space. The costs are transparent to them and the responsibility is delegated away from central management. I think that’s an extremely attractive prospect!
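The charge-back sum itself is trivial once usage is recorded per group. A toy sketch – the group names, usage figures and price per terabyte-month are all invented:

    # Toy per-group charge-back: each group pays for exactly what it stores.
    PRICE_PER_TB_MONTH = 100.0  # GBP; purely illustrative

    usage_tb = {"group-a": 120, "group-b": 45, "group-c": 80}

    for group, tb in sorted(usage_tb.items()):
        print("%-8s %5d TB  ->  £%8.2f per month" % (group, tb, tb * PRICE_PER_TB_MONTH))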

The biggest argument I hear against cloud storage and computing is that your top-secret, private data is in someone else’s hands. Aside from my general dislike of secret data, I still don’t believe this is a good argument these days. There are enough methods for handling encryption and private networking that it pretty much becomes a non-issue: encrypt the data on-site, store the keys in your own internal database, ship the data to the cloud, and when you need to run an analysis fetch the appropriate keys over an encrypted link, decode the data on demand, re-encrypt the results and ship them back. Sure, the encryption overheads add expense to the operation, but I think the costs are far outweighed by the benefits.
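To give a flavour of that encrypt-locally, decrypt-on-demand loop, here’s a minimal sketch using the third-party Python “cryptography” package. The key database and the actual upload/download calls are stand-ins for whatever your own key store and cloud provider’s client look like:

    # Encrypt on-site, keep the key in your own internal store, and ship only
    # ciphertext to the cloud.  Requires the third-party "cryptography" package.
    from cryptography.fernet import Fernet

    def encrypt_for_cloud(raw_bytes):
        key = Fernet.generate_key()            # goes into your internal key database
        ciphertext = Fernet(key).encrypt(raw_bytes)
        return key, ciphertext                 # only the ciphertext leaves the building

    def decrypt_from_cloud(key, ciphertext):
        # Fetch 'key' over an encrypted link when an analysis actually needs the data.
        return Fernet(key).decrypt(ciphertext)

    if __name__ == "__main__":
        key, blob = encrypt_for_cloud(b"some raw instrument data")
        assert decrypt_from_cloud(key, blob) == b"some raw instrument data"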