Archive for February, 2011

Cleaning Out Your Code

February 21, 2011

In my spare time, I’m working on a game. (I wonder if there is any programmer who can’t say that to some extent?)  The details aren’t important (it’s based on the old Star Frontiers RPG) but if you want to check it out you can find it on my gaming site in its own forum topic.

I’ve been working on it off and on for several years now.  It’s primarily been a learning exercise for me but it’s getting to the point now where it is actually playable.  I’ve implemented something like 90% of the rule set.  And therein lies my dilemma.  My efforts on the project have come in spurts: a couple of weeks spending 10 or more hours on it, and then months where I don’t do anything.

During the “on” times, I’m fired up and want to get things implemented and add new features and get them out the door as quickly as possible.  And so I do whatever I can to “just make it work” and get it out there for people to play with.  And so along the way I’ve incurred a lot of technical debt and the code has accumulated a bit of code cruft as well (well, probably much more than a bit).

I was first introduced to the concept of technical debt when reading Jeff Atwood’s post Paying Down Your Technical Debt.  I could definitely relate to the topic as I was going through the same thing both in my code at work and in my game code.

And it seems all I’ve done since then is build up more debt.  I’ve got these last little bits of the game’s rules to implement and it seems that every addition requires reworking things I’ve done before.  But I’m getting better and have started to pay down the debt and clean up the code to make it better and more modular.

As developers, we probably all have to deal with this at some point.  And if we are developing stuff for others to use, they like to see progress and additional functionality with each release.  Unless the bugs were show stoppers, a release that just fixes bugs isn’t very interesting to your users; they want it to do something more than the last one.

But you still need to pay down your technical debt and clean up your code.  And so for the last couple of releases that I’ve done, I’ve adopted a new strategy for dealing with this.  In each release, I’ve added one piece of new functionality, hopefully something that will pique the users’ interest so that they’ll grab the new version and try it out.  But at the same time, I’m working really hard in the background to clean up and refactor the code.  For every visible addition to the game, there have been two or three backend changes that the user doesn’t see (since it doesn’t impact the UI or game play) but which help me get the code under control and pay down my debt.

And it seems to be working, at least to some extent.  In this last go around I fixed several bugs and in the end, the main code was actually smaller than the original and a little easier to understand.  So I must have done something right.

It’s an on-going battle.  In the end, I guess if you are shipping code, you’re winning.  But the faster you can ship, the bigger the win.  And cleaning out your code and paying down your technical debt just makes things easier and makes it possible to have the bigger win.

Complex Systems

February 10, 2011

I hate Windows.  It seems that all my problems at work come from having to deal with Windows.  And Mac OS X: I hate Mac OS X as well, for the same reason.

Actually, I don’t really hate those operating systems, but it got your attention.  I actually think they are both perfectly fine operating systems.  But they do cause all my headaches at work.  I’m a Linux user by default and venturing into the realm of Windows and OS X always seems to give me headaches.

And it is not really the operating systems that cause me the headaches.  The real issue is the complexity of the systems that I have to work with.  As the main part of my job, I maintain and (try to) enhance and extend two fairly complex systems.  One is the public data server for data from the primary instrument on a NASA satellite mission, the other is the software build system for the primary instrument team for that mission.

Both of these systems suffer to some extent from the second-system effect as described by Fred Brooks in The Mythical Man-Month, as both are the follow-on systems to earlier systems that worked quite well.  And both second systems were written by the author of the first system.

In the case of the data server, I only have myself to blame, since I am the original author.  I did all the trade studies, wrote the requirements and design documents, and implemented the system.  In fact, knowing about the second-system effect, I tried really hard to avoid suffering from it.  And for the most part, I think I succeeded.  It’s a relatively small, focused system that does one thing really fast.

But it is still complex.  And it still gives me headaches when things go wrong.  And I wrote it.  I understand intuitively what it is supposed to be doing and how it works.  I can only imagine the headaches it gave the guy who was maintaining it for the year I was off working on a different project.

The other system, on the other hand, was not written by me, and I don’t have the intuitive grasp of the system that the original developer did.  Although I’m getting a better feel for it every day.  And in many ways, this system is much more complex than the data server.  It’s an automated build system.  When a user checks in and tags new code, the build system launches a series of processes that check out the code, build it, run all the associated tests, bundle up user, developer and source distributions and publish all the results (including e-mailing developers about any of their packages that failed to compile or pass their tests).
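To make that sequence of stages concrete, here is a minimal sketch in Python.  It is not the real build system (which is C++ and far more involved); the stage names, the `run_pipeline` function and the stand-in actions are all hypothetical.  It also assumes that once a stage fails the remaining stages are skipped, while a report is always produced so the failure can be published.

```python
from typing import Callable, List, Tuple

# Each stage is a (name, action) pair; the action returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order, skipping everything after the first failure.

    Returns (succeeded, report).  The report lists each attempted stage
    with its outcome and stands in for the "publish results" step.
    """
    report = []
    for name, action in stages:
        ok = action()
        report.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            # A real system would e-mail the package's developers here.
            return False, report
    return True, report


# Hypothetical stand-ins for the real stages; in this run the tests fail.
example_stages = [
    ("checkout", lambda: True),
    ("build", lambda: True),
    ("test", lambda: False),
    ("bundle", lambda: True),
]

succeeded, report = run_pipeline(example_stages)
print(succeeded, report)
```

In this run the bundle stage is never attempted, because the tests failed first.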

It’s a fairly standard build system.  Except that it all has to run on seven different operating systems.  With six different compilers.  And it runs on a batch queuing system and talks to four different databases on two different MySQL servers.  Did I mention it was fairly complex?

Just to enumerate, the operating systems we currently support are 32 and 64 bit Redhat Enterprise Linux 4 & 5, Mac OS X 10.6 (Snow Leopard), Mac OS X 10.4 (Tiger, going away as soon as the Snow Leopard support is fully functional) and Windows XP (with Windows 7 support looming soon).  The compilers we currently support are four versions of gcc (3.4, 4.0, 4.1 and 4.2) and two versions of Visual Studio (2003 and 2008).  It’s not actually as bad as it sounds.  With the exception of two versions of VS running on Win XP, there is only one compiler supported per *nix style OS.  This variety is actually a good thing as it helps keep the codebase clean since it has to work everywhere.

The real trouble comes from the infrastructure supporting the system and the ways it interacts (or doesn’t) with these different operating systems.

The programs that run the build system were written in C++ using the Qt library.  Now I didn’t know anything about Qt when I acquired the responsibility for the project but after sifting through the code, I think I can understand why this was chosen.  One of the main reasons was the use of the timer and process control functionality, both to launch checks at specific intervals and to kill build or, more importantly, test processes that have hung and are taking too long.  Only the latter doesn’t seem to work on Snow Leopard, as we found out when one of our packages was seg faulting in the tests and instead of dying, it was going into an infinite loop.  And since the build system code didn’t properly kill it, the entire system hung up for that OS.  And right now I can’t tell if the problem is Qt, the underlying OS, how we’re applying it, or some combination of the three.  Complexity.
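The general idea of killing a test process that has run too long can be sketched in a few lines.  This is an illustration in Python using the standard `subprocess` module, not the Qt code the build system actually uses; the `run_with_timeout` helper and the timeout values are made up for the example.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd, killing it if it exceeds timeout_seconds.

    Returns the exit code, or None if the process had to be killed.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_seconds)
        return completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before
        # re-raising, so nothing is left running at this point.
        return None


# A child that sleeps for 60 seconds stands in for a stuck test.
hung = [sys.executable, "-c", "import time; time.sleep(60)"]
quick = [sys.executable, "-c", "pass"]

hung_result = run_with_timeout(hung, timeout_seconds=1)
quick_result = run_with_timeout(quick, timeout_seconds=30)
print(hung_result, quick_result)
```

Note that this only kills the direct child.  If a test spawns its own subprocesses, those can survive, which is one of the ways a watchdog like this can appear to work and still leave a system hung.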

This build system has a lot of moving parts.  And I think the reason is that it is built around the central batch queuing system at the national laboratory where it runs.  In theory and at the beginning, that was a good idea.  We were sometimes triggering new builds every 15 to 30 minutes and the entire build process takes about an hour or two to run (there’s a lot of code to be compiled and tested).  By using the batch farm, we could have all these builds running and not piling up on one another by leveraging a tiny fraction of the thousands of CPU cores available in the farm.
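A rough back-of-the-envelope calculation shows why builds would pile up on a single machine.  The sketch below assumes builds are triggered at a steady rate and uses only the figures quoted above; the `concurrent_builds` helper is just for illustration.

```python
import math


def concurrent_builds(duration_minutes, interval_minutes):
    """Steady-state number of builds in flight when a new build starts
    every interval_minutes and each one runs for duration_minutes."""
    return math.ceil(duration_minutes / interval_minutes)


# Best case: one-hour builds triggered every 30 minutes.
best = concurrent_builds(60, 30)
# Worst case: two-hour builds triggered every 15 minutes.
worst = concurrent_builds(120, 15)
print(best, worst)  # 2 8
```

So somewhere between two and eight builds could be in flight at once, before even counting that each build has to run on every supported platform.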

But that came with tradeoffs.  For example, since the various parts of the process could potentially (and usually do) run on different machines, you can’t use local storage and have to use network disks (via AFS in our case) to hold all the code and test data.  This doesn’t seem to be an issue for the *nix systems but for some reason accessing the network disks from Windows is sloooooow.  A process that takes 10 minutes on the *nix boxes can take 30-40 minutes (or sometimes more) on the Windows boxes, reason unknown.  There are other tradeoffs as well, all increasing the complexity.

And then a couple of things happened.  The lab never really supported Mac OS X in the batch farm, so we had to get our own OS X boxes.  And we somewhat pioneered Windows usage so we had to get those boxes ourselves as well.  And then they dropped Redhat EL 4, and 32 bit Redhat EL 5.  So now, the only OS supported in the main farm that we were supposed to use is Redhat EL5 64-bit.  Everything else runs on our own project-purchased machines, but we’re still wedded to this complex infrastructure of using the batch farm.

Luckily, we’re starting to move away from that.  But it’s painfully slow, mainly since I seem to spend all my time running around propping up the beast to keep it in production and have little time to work on an alternative.  But at least there is motion in the right direction.  Towards simpler systems.  And away from complexity.

Hello world!

February 7, 2011

This is a programming blog so ‘Hello world!’ is an appropriate first title.

Welcome to Programming Space, a blog where I hope to talk about my experiences as a programmer, astronomer, and other topics loosely related to those areas.  I think I suffer from what Jeff Atwood (@codinghorror) describes as the Fear of Writing.  It’s not that I don’t and can’t write, but it is usually to a small audience.  And I’m not very quick about it.  I type reasonably well (~60 wpm) but my thoughts go so much faster than my fingers and I’m editing before I even get it on the page.  And I like to write in detail.  And explain things.  And give all the background.  And it just takes a long time for me to compose anything in written form.

And so I’m going to take the plunge and start this blog.

This is actually the third blog I’ve started.  The first, which is still limping along, is a gaming blog called Star Frontiers Lives On! and is dedicated to an old school Science Fiction role-playing game that I love.  As part of my blogging efforts, I plan on putting out more content there as well.  You can expect to see more on the topic of Star Frontiers appearing on these pages since I spend a lot of my free time working on Star Frontiers projects.

My second attempt at blogging was a personal blog that originally had similar goals to this one.  It is dead.  And so we are here.

Welcome once again to Programming Space.  I hope you enjoy the content in the coming weeks and months.
