A long time ago (10 years or so which is an eon in the tech industries) a really smart dude named Martin Fowler wrote a book by the name of Refactoring. The book has since become the de facto manual on the topic. The reason I bring this up is that in his book Fowler identifies common “bad smells” in code and provides formulas (or for real math geeks transformations) for how to clean them up. Unfortunately many programmers haven’t read his book, and many who have read it don’t know how to deal with the more complex stenches that are common in code today.
Which brings me to the three month old Lasagna in the back of my fridge. The other day I found some code with a 1000 line conditional statement comprised of one if statement, one else statement, and about 150 else if blocks.
To the untrained programmer this is just a bit of undesirable or “legacy” code. They don’t understand that the first impression aroma that they are picking up on is a complex medley of Fowler smells which form into a revolting and equally distinct odor.
On the surface you have a smell somewhat like Limburger cheese (an odd mix between dirty socks and BO) which you may mistake for the Hobo in the next desk over. In fact this is the smell Fowler identified as a code bloater, the long method. The second smell hides a bit better. It is more akin to the smell of mildew and unwashed gym clothes that you find in a high school locker room. Fowler identifies this as the smell of switch statements. In our case it is coming from the conditional expressions themselves, but in essence a switch statement is just an if/else if/else statement with cleaner syntax.
In the code I’m talking about here the next bombardment of your olfactory receptors is something that smells on the surface vaguely familiar, like the dead-fish smell in your freezer (this is usually the point where you realize what you thought was lasagna from a few months ago is actually leftover pumpkin pie from Thanksgiving 2006). This scent is probably coming from the fact that those same conditional checks have been there so long they’ve propagated throughout the entire application, so you can find those same checks being repeated over and over again (much like how the pumpkin pie’s scent has staked a significant claim over my fridge). In this code the if statements set enumerations that were later checked, switched on, and otherwise manhandled. This scent is the dead-ringer that makes identifying the solution straight forward.
If you find that your code is riddled with flow control statements that set state and you have to re-check your state variables periodically you have just found the perfect use case for behaviors.
The application this code lives in is a tool distributed with a very popular development environment. The tool reads in a text file and executes a series of operations based on the instructions in the file (basically it is a custom script interpreter). When I first saw this application almost three years ago I felt that the code was unmaintainable, more complicated than it needed to be, and in general a hacky mess, but at the same time it was really powerful, open ended, and could be used for a lot of things other than what it was originally designed for. The script parser supported about 30 different commands and identified them by doing a string compare against the first word on each line in the script file. After identifying the command the tool used an enum to identify each operation and wherever it needed to know which operation it was executing to determine whether or not to do something it would either compare or switch over the enum.
Fast forward a few years, two jobs and a lot of technical development on my part.
Today I’m at a different company using the same middleware vendor and the application has been expanded to about 150 commands and the core architecture remains the same. That means there is a 1000-line chain of conditionals to determine the type of the command, a 400 line switch statement to execute commands and hundreds of conditional checks on an enum’s type.
So how do we fix it?
Up above I mentioned a buzzword “behaviors”. If you boil this problem down essentially you have 150 operations that need to execute functionality at different points in a process. The first thing you need to do is identify where those execution points are. In the application I’m working on this is pretty easy. There is a setup step, an execute step, and a cleanup step. The primary reason for there being three steps is the ability to have the execution step running on a separate thread from the main application thread which will manage setup and cleanup.
Cool! With that established how do I make this work? In my case eliminating the large switch statement and the conditional checks on the type is trivial. The hardest part is understanding that your interpreter doesn’t really care what command is running as long as each command knows what it is supposed to do. That shift of the burden of knowledge from the interpreter to the command is key. All the interpreter really cares about is that the command does the right thing. So you define a base command type with virtual Setup, Execute, and Cleanup methods, and any common functionality that each command needs. That will look something like this:
class Command
{
virtual void Setup(Interpreter&);
virtual void Execute(Interpreter&);
virtual void Cleanup(Interpreter&);
}
With this as a base you can have a derived object for each command and your execution logic becomes three virtual function calls instead of a 150 case switch statement and a bunch of leading and trailing if statements. That means that the execution framework can treat every command exactly the same. That gets rid of the need for the switch statement and a bunch of conditional checks. Cool huh? No not really this is the easy part. Getting rid of the if/else if/else block is where the big win is at.
What makes getting rid of the if/else if/else block tricky is that you are essentially taking input and based on the input creating one of 150-some different objects. It is understandable that someone would think that they might need to identify each possible input and implement a separate handler for each possible input (as we see with this massive if/else if/else block). But what if we could go from a string/int/enum/whatever directly to a constructed object of the type we want?
I’m going to actually provide two solutions for this. One is a generic solution that can be applied to pretty much any programming language, and the other is more specific to the Microsoft .NET languages. The reason I’m giving both solutions is that the tool I’m working with is written in C#, but I’ve solved this problem before in C++ and think it is useful to give that solution too. Also be forewarned: being able to implement the .NET solution requires knowledge of reflection, and being able to implement the more generic solution requires an understanding of function pointers (Yay right?).
Disclaimer: The .NET implementation really isn't anything fancy. I'm pretty sure it is almost exactly how Microsoft implements MSBuild, and I know it is the same way they do a lot of the events in WPF.
In .NET we can make use of reflection to convert a string to an object for us. So let’s say we have a command “Run”. If we pick a strict naming convention such that, for example, Run’s implementation is in a class derived from Command named “RunCommand” we can basically take the command’s name of each line in the script as we execute it and using reflection we can construct RunCommand objects without ever needing to know that the command in the text file is Run.
The code to do this is actually pretty simple, and would look something like this:
class CommandFactory
{
// Dictionary of available commands
private Dictionary map = new Dictionary();
// Use Reflection to populate factory
public CommandFactory()
{
Type[] commandTypes = Assembly.GetAssembly(typeof(Command)).GetTypes();
foreach (Type commandType in commandTypes)
{
if (!typeof(Command).IsAssignableFrom(commandType) || (commandType == typeof(Command)))
{
continue; // not a Command
}
// Do some name mangling to strip off trailing 'Command' suffix
string commandName = commandType.Name;
string commandStr = "command";
commandName = commandName.ToLower();
// strip off 'Command' suffix if it exists
if (commandName.EndsWith(commandStr))
{
commandName = commandName.Substring(0, commandName.Length - commandStr.Length);
}
map.Add(commandName, commandType);
}
}
public Command Create(string name)
{
Command toReturn = null;
try
{
toReturn = (Command)Activator.CreateInstance(map[name.ToLower()]);
}
catch (KeyNotFoundException)
{
throw new Exception(String.Format("Command \"{0}\" not found", name));
}
return toReturn;
}
This solution should work pretty well, and can be adapted to allow you to specify arguments for the constructor too. All in all it is pretty elegant. The C++ solution isn’t quite as elegant, and I won’t go too deep into it because it is a lot of code.
In the C# code the command factory gets all the supported commands by reflecting into the assembly. We can’t do that in C++ (unless we are using Microsoft’s fake C++ which doesn’t count). Because of that we require that all available commands implement a static factory method, and we register the function pointer to that static method with the command name in our factory. The factory then keeps a binary tree or hash table mapping command names to factory method pointers. You will find the worst case binary tree lookup is a hell of a lot better than the worst case run through the conditional block.
You can do a few things to lighten the load on C++ programmers. One thing that comes to mind would be having a dummy class that during object construction registers a command. That class could be defined with a macro which for our run command could be something like “REGISTER_COMMAND( Run )”. The macro would at compile time expand out to a zero-size class which has a default constructor registering the string “Run” with RunCommand::Factory, and a static variable declaration. That way when your application loads during static initialization (which happens before main is entered) the Run command is registered and your programmer doesn’t need to worry about it.
Once this is done your code is a lot more maintainable. Adding a new command basically becomes implementing the command’s class and for any non-.NET language you add registering with the factory. Problem solved. Smells resolved.
Posted in Bad Smells, Code, Process, Software Development
|