Generating PDFs from .docs, .ppts and friends

So a few weeks ago I set up a system to fix a problem I’ve had for a looong time. We have an internal web-based repository for tracking versions of documentation. It takes any sort of file, but the majority seem to end up as .docs and .ppts.

That’s fine for the minority of our users on Windows, but by far the majority are on UNIX (primarily Linux and Tru64), and they can’t do much with those formats. OK, some of them have OpenOffice installed, but even for them it’s a pretty bulky application to start up just to view these things quickly.

The answer, it seemed, was to convert them on the fly or on update to a format that is more easily viewed. PDF seemed the obvious choice, but how to generate it? Initially we had a copy of Acrobat Distiller running on a Windows PC, watching a shared folder on a network drive, but that really wasn’t the right solution. The wvPS suite was also considered, but at the time the webservers were running on Alpha/Tru64 and the suite didn’t compile easily.

Last week one of the queries I was handling reminded me that I still didn’t have a solution to this problem. Googling around a bit, I found this thread, which describes a macro that does the right sort of thing. Combine it with OpenOffice’s command-line options -invisible, -headless and -nologo and you’d think it would work without requiring X. Unfortunately OpenOffice still wants a display, presumably to access fonts, though I’m not sure.
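To make the macro-plus-flags idea concrete, here is a minimal sketch of how a wrapper script might build the OpenOffice command line. The binary name `soffice`, the `macro:///Library.Module.Macro(argument)` URL form and the macro name in the usage line are assumptions for illustration (the macro name stands in for whatever the thread’s macro is called in your install); only the three flags come from the text above.

```python
import os


def build_soffice_command(macro, document, binary="soffice"):
    """Return an argv list asking OpenOffice to run a Basic macro on one file.

    Assumptions: the binary name, the macro:/// URL form and the macro name
    are illustrative and depend on your installation.
    """
    # Pass an absolute path so the macro does not depend on the working directory.
    path = os.path.abspath(document)
    macro_url = f"macro:///{macro}({path})"
    # The flags from the text: no window, no UI interaction, no splash screen.
    return [binary, "-invisible", "-headless", "-nologo", macro_url]


if __name__ == "__main__":
    # "Standard.Module1.ConvertToPDF" is a hypothetical macro name.
    print(build_soffice_command("Standard.Module1.ConvertToPDF", "/tmp/report.doc"))
```

Returning a list rather than a single string lets the caller hand it straight to `subprocess` without shell quoting worries.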

Enter Xvfb, the X virtual framebuffer. This is an X server that draws into memory rather than onto a real screen, so it effectively runs headless. You can still view its contents using the xwud tool (also part of the XFree86 distribution). Running Xvfb on a given display and telling OpenOffice to open on that display means the whole system can run headlessly and is scriptable at last. The machines running the webservers are significantly beefier than those on people’s desks, so things don’t seem too slow to start up.
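A rough sketch of the Xvfb arrangement described above: start a virtual X server on a spare display, point the child process at it via the `DISPLAY` environment variable, and shut the server down afterwards. The display number `:1`, the `1024x768x24` screen geometry and the fixed two-second startup wait are assumptions chosen for illustration, not values from the original setup.

```python
import os
import subprocess
import time


def xvfb_command(display=":1", screen="1024x768x24"):
    # "-screen 0 WxHxD" configures screen 0 of the virtual framebuffer.
    return ["Xvfb", display, "-screen", "0", screen]


def env_for_display(display, base=None):
    """Copy of the environment with DISPLAY pointing at the virtual server."""
    env = dict(os.environ if base is None else base)
    env["DISPLAY"] = display
    return env


def run_headless(argv, display=":1", startup_delay=2.0):
    """Start Xvfb, run argv against it, then stop the X server.

    The fixed startup_delay is a crude assumption; a sturdier script might
    poll until the server accepts connections instead.
    """
    xvfb = subprocess.Popen(xvfb_command(display))
    try:
        time.sleep(startup_delay)  # give Xvfb a moment to start listening
        return subprocess.run(argv, env=env_for_display(display), check=True)
    finally:
        xvfb.terminate()
        xvfb.wait()
```

Setting `DISPLAY` only in the child’s environment keeps the wrapper from disturbing anything else running on the webserver.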

Another useful step is to duplicate the OpenOffice macro to support other file types and to select the right macro via a command-line switch based on the extension of the file being converted. It seems we have a workable solution, and a relatively flexible one at that. Conversion quality is generally pretty good too. Here’s looking forward to OOo version 2!
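The per-extension dispatch could look something like this minimal sketch. The macro names are hypothetical stand-ins for the duplicated macros (for example, one using OpenOffice’s `writer_pdf_Export` filter and one using `impress_pdf_Export`), and the particular extension list is an assumption; extend the table to whatever formats you convert.

```python
from pathlib import Path

# Hypothetical macro names: one duplicated macro per family of input formats,
# each assumed to call the matching PDF export filter inside OpenOffice.
MACROS_BY_EXTENSION = {
    ".doc": "Standard.Convert.WriterToPDF",
    ".rtf": "Standard.Convert.WriterToPDF",
    ".sxw": "Standard.Convert.WriterToPDF",
    ".ppt": "Standard.Convert.ImpressToPDF",
    ".sxi": "Standard.Convert.ImpressToPDF",
}


def macro_for(document):
    """Pick the conversion macro from the file extension (case-insensitive)."""
    ext = Path(document).suffix.lower()
    try:
        return MACROS_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"no conversion macro for {ext or 'files without an extension'}") from None
```

Keeping the mapping in one table means supporting a new format is a one-line change plus, if needed, one more copy of the macro.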

… of course, the catch is that now the web systems are Linux-based, wvPS will work properly. Doh. The consolation is that we can handle PPTs, RTFs and, of course, the native StarOffice and OpenOffice formats, plus other weird and wonderful things.