Complex Systems

This build system has a lot of moving parts, and I think the reason is that it is built around the central batch queuing system at the national laboratory where it runs.  In theory, and at the beginning, that was a good idea.  We were sometimes triggering new builds every 15 to 30 minutes, and the entire build process takes an hour or two to run (there’s a lot of code to be compiled and tested).  By using the batch farm, we could have all these builds running at once without piling up on one another, using only a tiny fraction of the thousands of CPU cores available in the farm.

But that came with tradeoffs.  For example, since the various parts of the process could potentially (and usually do) run on different machines, you can’t use local storage and have to use network disks (AFS in our case) to hold all the code and test data.  This doesn’t seem to be an issue for the *nix systems, but for some reason accessing the network disks from Windows is sloooooow.  A process that takes 10 minutes on the *nix boxes can take 30 to 40 minutes (sometimes more) on the Windows boxes, and we don’t know why.  There are other tradeoffs as well, all of them adding complexity.

And then a couple of things happened.  The lab never really supported Mac OS X in the batch farm, so we had to get our own OS X boxes.  We somewhat pioneered Windows usage, so we had to get those boxes ourselves as well.  Then the lab dropped Red Hat EL 4 and 32-bit Red Hat EL 5.  So now the only OS supported in the main farm that we were supposed to use is 64-bit Red Hat EL 5.  Everything else runs on machines purchased by our own project, but we’re still wedded to this complex infrastructure built around the batch farm.

Luckily, we’re starting to move away from that.  But progress is painfully slow, mainly because I seem to spend all my time running around propping up the beast to keep it in production and have little time to work on an alternative.  But at least there is motion in the right direction.  Towards simpler systems.  And away from complexity.
