Legaciness

July 29, 2008

I’ve been looking at software metrics recently as a way to track how we’re doing on code quality. Of course, not all metrics are equal, and they are often about as helpful and accurate as a 7-day weather report… Neal Ford from ThoughtWorks recently gave a presentation on his take on software metrics, which he likened to “reading tea leaves”. A single metric on its own, out of context, may not tell you anything at all (what does a cyclomatic complexity rating of 25 mean, anyways? Is that good or bad? Hell, this rating doesn’t even have UNITS!). It’s only when you group a metric together with other metrics and compare them across different pieces of code that you start to get some sort of idea.

I consider the CRAP4J crappiness rating to be something of an exception. The people who came up with it had a great idea: what if we take two metrics that offset one another, compare them and boil the result down into a new value? In other words, what if we do some of the tea reading for you? The two metrics they use are McCabe’s cyclomatic complexity rating, which gives you, more or less, a measure of how many different paths there are through a given method (the more paths, the more complex), and a code coverage measure of their own, something like Cobertura’s, which tells you how many of those paths are exercised by unit tests. This makes perfect sense: if you have a lot of paths, that’s bad, but if you’re testing all the paths, that should be perfectly OK. If you were trying to read these leaves on your own, you would have to paw through all the parts of the code with the highest complexity ratings, then try to look up a corresponding report on test coverage…
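
For the curious, here is how the two numbers get boiled down. As I understand the published formula (a minimal sketch in Java, assuming coverage is expressed as a fraction between 0 and 1):

    // A sketch of the C.R.A.P. formula: the score grows with the square of
    // complexity, discounted by the cube of test coverage.
    // comp = cyclomatic complexity of a method, cov = coverage from 0.0 to 1.0
    public static double crapScore(int comp, double cov) {
        return Math.pow(comp, 2) * Math.pow(1.0 - cov, 3) + comp;
    }

So that mysterious method with a complexity rating of 25 scores 650 with no tests at all, but only 25 with full coverage.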

So my hat’s off to the guys that did CRAP4J.

I’ve also been considering coming up with a metric of my own that I call “Legaciness”. I get my inspiration from Michael Feathers’ fantastic book “Working Effectively with Legacy Code”. In his book, he defines legacy code as code which isn’t covered by unit tests. That in itself is a great definition. But the focus of my metric wouldn’t be on the test coverage itself (which we can already measure). The majority of the book focuses on all the tricky issues that can make “legacy” code hard to test in the first place. He lists all sorts of bad practices, like static method calls, doing Bad Things in the constructor, and so on. Then the second half of the book enumerates a series of patterns that can be used in Java, C++ and other languages to safely and slowly refactor your code JUST ENOUGH to slip it into some unit tests, after which you should be able to redesign your code as you wish.

The key thing I’d like to measure here is exactly HOW HARD it would be to get some unit tests around your code. In other words, what would be the amount of effort necessary to refactor a class for unit testing; how many times would the Legacy Patterns have to be invoked in order to get 100% test coverage for that class? This measure is what I would call “Legaciness”.

I’ve gone through a lot of the thinking process as to how this sort of thing would be measured, and a number of complications have cropped up:

  1. Not all static calls are an obstacle to unit testing. For example, a utility method that converts a String to all caps wouldn’t get in the way. So, the metric ideally should take that sort of thing into account.
  2. Explicit calls to “new” are generally a bad thing, since there’s no way to substitute the object being instantiated with a mock object. But, again, if the class being instantiated is a simple Data Transfer Object (DTO) with a bunch of getters and setters, what’s the problem?
  3. Even if a call to “new” is a problem, one of the patterns to remove the obstacle is to simply extract the line of code into a protected method which can then be overridden in a test class (see the sketch just after this list). So the metric would have to somehow take into account when one of the legacy code patterns has been implemented to surmount these obstacles.
  4. The Legaciness score would have to be crafty enough to take into account certain levels of indirection. For example, it might be slightly bad that a method instantiates a class via “new”. But it should be considered MUCH WORSE if the class it instantiates also has a really bad Legaciness rating via, say, direct calls to I/O methods and threading and so on and so forth. So, there seems to be some need for an “inheritance” of Legaciness through these dependencies.
  5. When you get down to it, a lot of these bad dependencies are due to I/O calls, threading, container dependencies and so on. These dependencies are very likely outside your own code base (for example, in the base Java APIs). The only realistic measure I could think of is to come up with a base Legaciness score for all known API methods. This seems like more work than it’s worth (actually, Neal Ford mentioned to me that someone did this very thing with Ruby: assigned a score to each base API method for some nefarious metric of their own). Also, I don’t know whether 3rd-party libraries would need the same treatment, or whether it would be enough to score just the java.* and javax.* libraries and then derive the Legaciness of everything else from its use of those Java APIs.
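
To make item 3 concrete, here is a minimal sketch of the “extract and override” move in Java. The class names (InvoiceProcessor, MailServer, FakeMailServer) are invented for illustration, not taken from Feathers’ book:

    // Before: the "new" is baked in, so there's no way to substitute a mock.
    public class InvoiceProcessor {
        public void process(Invoice invoice) {
            MailServer server = new MailServer(); // hard-wired dependency
            server.send(invoice.toMessage());
        }
    }

    // After: the instantiation is extracted into a protected factory method...
    public class InvoiceProcessor {
        public void process(Invoice invoice) {
            MailServer server = createMailServer();
            server.send(invoice.toMessage());
        }

        protected MailServer createMailServer() {
            return new MailServer();
        }
    }

    // ...which a test subclass can override to return a fake:
    class TestableInvoiceProcessor extends InvoiceProcessor {
        @Override
        protected MailServer createMailServer() {
            return new FakeMailServer(); // records messages instead of sending
        }
    }

A naive Legaciness counter might say the “before” version costs exactly one pattern invocation to fix; the interesting part is weighting that cost against all the complications above.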

It seems to me that Legaciness would be another very useful metric to have, much like CRAP4J. If the latter tells you where your code is complex and untested, the former could tell you just how hard it would be to give it the test coverage it needs. I think this metric would also be useful to help educate developers as to some of the same best practices that TDD is supposed to encourage.

Unfortunately, I don’t have the time to work out the details of such a metric. If anyone out there’s interested, or has any ideas, let me know!


Basic Principles of Module Design

July 28, 2008

The presentation I gave at the Dr. Dobb’s conference was a case study of a project I am currently working on that we are calling “Remodularization”. As you might guess from the name, the project is a purely technical one in which we have gone through the effort of redefining the architecture of our software with regard to its modules.

What is a module?

Now, before I go into the details of how we are doing that, it needs to be stated just what I mean by a “module”. Given the rather casual way the term is tossed about in our industry, you might have a lot of ideas as to what I may be referring to here. The general notion is that it’s somehow a grouping of functionality or code in some cohesive manner.

In more formal terms, a module is “an implementation unit that provides a coherent unit of functionality.” Modules are generally a static grouping of code, as opposed to a “component” (as defined for Component and Connector (C&C) views), which is mainly a runtime element. In this context, “module” can still mean a lot of things, so I’ll cut right to the chase here by saying that, for the sake of the application I’m working on, and for this discussion, a “module” is specifically a Maven 2 project. It can also roughly be considered an Eclipse Java project or an IntelliJ project, since the main point here is that we’re talking about the dependency relationships between several independent groupings of Java code. But the bottom line is that my application is written in Java, and we use Maven 2 to build it and manage its dependencies.
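
In Maven 2 terms, that means a parent project (packaging “pom”) that lists each module as a subdirectory with its own pom.xml. The layout below follows standard Maven 2 convention; the module names are hypothetical:

    my-app/
        pom.xml           <- parent POM (packaging "pom"), lists the modules
        trades.gui/
            pom.xml       <- one module = one project = one artifact
        trades.server/
            pom.xml
        lib/
            pom.xml

Each subdirectory builds to exactly one artifact, and the parent ties the whole dependency graph together.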

What are modules for?

In order to come up with a set of guidelines for module design, of course we have to start with the reasons why we want to use modules in the first place. Modules divide our application into smaller bite-sized chunks, but what do we get from that? What are the downsides to breaking up the application into pieces? The main purposes of a software module are to provide:

  • Decomposition: Using modules helps improve the clarity of the application by breaking it up into smaller parts that are easier for developers and other stakeholders to understand. Of course, cohesion is an important consideration here: if you don’t match your conceptual view of the system to the module view, then clarity may not be improved, or may even be reduced (“Why do we have a module called ‘gui’ and another called ‘client’ when they both just have GUI client code?”). Having too many modules may be as bad as having too few.
  • Encapsulation: As with lower-level design elements, like packages and façades, modules can be used to limit the impact that changes in one part of the system have on the other parts. But it’s important to manage the dependencies between modules properly to really reap these benefits.
  • Information hiding: This is just another angle on the decomposition benefit. Clarity is again improved with a divide-and-conquer approach by limiting the scope of what developers need to know about a system. If modules are well-defined, and provide a clear interface, then developers working on one module do not need to know the implementation details of the others.

But there are some other forces at play here, which I’ll get into below.

Design Heuristics

Although I’ve seen a lot of discussion around the basic forces involved in module design, I’ve never really seen anyone go to the trouble of boiling them down to a few basic principles that could be used in a practical sense. So here is the list of guidelines I used for the Remodularization project. Please feel free to post a comment to add your own!

Deployment Location

Code that is deployed on separate servers or tiers should be kept in different modules. This has traditionally NOT been a requirement for module design. Especially in non-Java applications, it’s common to see large modules which are broken up and packaged into several different deployment artifacts for execution on multiple servers. However, as a side-effect of working with Java and with Maven 2 (whose general philosophy is “one artifact per project”), this makes complete sense. Since module = project = JAR (or WAR, EAR, etc.), mixing code from different tiers (e.g. GUI and server-side code) in a single module would mean deploying unnecessary code to each tier.

For my application, this specifically means that we have a separate set of modules for the GUI code and for the server-side functionality.

Functional Cohesion

As I mentioned above, one of the main reasons for decomposing your application into modules is to provide clarity in your architecture. To some degree, your modules should match business or other concepts important to your system, so that understanding the design can be somewhat intuitive. There is, however, no hard and fast rule for the proper granularity of this decomposition. If your application is pretty small (such that a single developer could understand the whole system), then there may be no need to provide clarity through modularization. The design must still reflect a sense of cohesion, but it may be sufficient to represent this decomposition via packages within a single module.

If you do have a large application, then it may be advantageous to divide its implementation along functional units. For my system, there is already a set of business concepts that are understood by users, business analysts and developers alike, and they are reflected in the application via separate menu options and sets of screens. Once I designed a “basic pattern” for application modules (which involves modules for gui-side, server-side and shared code; see below), I repeated this same basic pattern for each “functional module”. This approach helps keep the modules down to a reasonable size, and is self-documenting for developers (“Where should I put the code for the GUI screens for our reports? Oh, right… the GUI reports module.”). This division also allows us to be more explicit about the expected dependency relationships between functional modules (“Reports should know about Trades, but you shouldn’t have to worry about report configurations while trading.”).

Reusability

Once you’ve broken up your application into your more conceptual modules, it becomes clear that this sort of organization isn’t enough. According to the DRY principle (“Don’t Repeat Yourself”), you shouldn’t have the same piece of code implemented in two different locations. That adds to maintenance overhead, increases the risk of bugs and so on. But if you have a StringUtils class, or a JMS library, or some other piece of infrastructure that you want to use in all your modules, where are you going to put it? The obvious answer is to create a separate module (call it Lib, or Utils, or whatever suits the purpose) that can be shared by all. There are other solutions, but none are as simple and straightforward as this.

Is one reusable module enough? I guess that depends on the purpose of the code that is being shared, the size of the shared module, and the number of modules that are sharing it, among other factors. When I start looking for a place to keep my code, I tend to think of it in terms of its “scope”. Who do I think will need this class, beyond my immediate purpose for it? Is it infrastructure, or is it module-specific business logic? Will it be needed on the GUI, on the server, or both? And so on.

Potentially, you could have as many shared scopes as you have connections between modules, which could get a little hairy. In my opinion, you should look to reduce this to a more practical level, along the lines of what makes sense to you and your developers. In my case, we have business objects and logic which are used on both the GUI and the servers, so we have pulled this logic out into a “domain” module for each functional area. Also, there is shared infrastructure code for all the GUI modules and all the server modules, so there’s a gui.lib and a server.lib module. And, finally, for the occasional StringUtils-type class, we have a plain vanilla “lib” module. The sketch below shows how these scopes fit together.
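
To illustrate, here is a hypothetical layout for a single functional area (the functional module names are invented for this post; the arrows read “depends on”):

    reports.gui    -> reports.domain, gui.lib, lib    (screens, deployed to clients)
    reports.server -> reports.domain, server.lib, lib (services, deployed to servers)
    reports.domain -> lib                             (business objects shared by both sides)
    gui.lib        -> lib                             (shared GUI infrastructure)
    server.lib     -> lib                             (shared server infrastructure)
    lib                                               (StringUtils and friends)

The same gui/server/domain triple is then repeated for each functional area, with all of them leaning on the three lib modules.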

Optionality

As something of a specialization of the “deployment” principle, what should you do about code that sometimes shouldn’t be deployed AT ALL? The Eclipse platform (like just about every other IDE platform) supports a very sophisticated feature set for plug-in functionality. Are the plug-ins deployed together with Eclipse on installation? Well, some of them. But the great majority are developed, packaged and installed separately.

At my company, we also have a lot of customer-specific code, which should not be deployed into every production environment. In fact, we have to be careful about how we manage this code without mingling it with more generic code, and without creating a separate code branch for every customer.

So where do you put code that shouldn’t be deployed with every installation, or that needs to be isolated from your main code base for some reason? By now, I’m sure you’ve guessed the answer: pull it out into another module. Again, this isn’t a mandate. This can also be achieved by special compile-time directives, by using separate branches, or even a smattering of “if” statements. But modules are a very simple and clear mechanism for achieving this end.

Technical Constraints

The specific technology platform you are using to develop your software may also have an impact on the design of your modules. For my application, we are using Maven 2 for the build process, which follows, more or less, a “one artifact per module” approach. This means that I also need to think of modules as deployment artifacts, which may influence my decisions. Other limitations, like a maximum number of modules for an IDE or platform, or questions of build efficiency, may also affect your choices.

So, in my case, the technology we’re using influenced the design in two ways. First of all, the “one JAR per project” approach meant that I had to consider where I want to deploy my JARs. This led to the first heuristic: GUI stuff goes in a GUI module, and server-side code goes elsewhere. One other aspect that I haven’t talked about yet is that we are using EJBs for remote procedure calls. Again, this isn’t a requirement, but the standard Maven 2 way to build and package EJBs is to use a special project type which automatically packages your interface and implementation classes into client and server JARs. This specific facility for compiling and packaging EJBs leads me to the conclusion that I should have a separate set of modules specifically for Enterprise Java Beans. I haven’t actually decided yet whether I should use one module per EJB, one per functional module, or one for all. But the point is that the technology itself is having an influence.

One other way the technology can affect these decisions is with regard to granularity. One benefit of using modules is that developers can worry about only the parts of the system they are currently working on (thanks to information hiding and encapsulation). One concrete realization of this is within the IDE itself. If you have a very large application, and you are dividing your code up amongst various projects, it is not always necessary to have the entire application open at all times. For example, if you are working on a generic library module (one that is at the bottom of the dependency stack), it is possible to open ONLY that project in Eclipse. This makes the development environment much more lightweight and responsive. With this in mind, it might be advantageous to organize your modules around the responsibilities of your developers, with the objective of making their day-to-day work much more manageable.

Work Assignment

With that last comment in mind, it should be pointed out that one of the interested stakeholders of a module view of a system is the project manager. Why? Because modules tend to be a great way to break up the system into chunks for project planning. Modules can be used as project milestones (“First, we’ll implement the user registration unit”), to show important interdependencies (“We can’t release this without the security unit, but we don’t need to hold up the big launch for our plug-ins.”), and to organize teams for parallel development (“We can have the GUI team work on the screens while the server team does the back end.”).

Modules can therefore be used to organize your work around teams, either in terms of parallel implementation, or in terms of specialized skill sets. In my case, we are not designing our modules around teams, but we are doing a bit of the opposite. We have assigned module ownership to different teams according to their specializations (business logic, infrastructure knowledge, GUI skills and so on). These teams don’t have exclusive ownership of those modules, but they do have the last word on the design of those modules and a responsibility to improve their code quality in general.

Software Lifecycle

As Brian Foote and Joe Yoder pointed out in their excellent article “Big Ball of Mud”, one of the original inspirations for modularization was the need for what they call “Shearing Layers”. If there is one part of the code that changes frequently, but is used by a lot of other code, there is a natural tendency to want to protect the dependent code from those changes. The usual response is to create a façade, some kind of stable interface that provides the encapsulation needed to reduce the impact of changes to the more volatile areas of the application. Modules are a natural approach to providing this type of encapsulation.
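
As a minimal sketch of the idea (all names here are invented for illustration), the stable interface lives in a module that everyone depends on, while the volatile code hides in a module behind it:

    // In a stable "pricing-api" module (file PricingService.java),
    // which all client modules depend on:
    import java.math.BigDecimal;

    public interface PricingService {
        BigDecimal priceFor(String instrumentId);
    }

    // In a volatile "pricing-impl" module (file DiscountPricingService.java),
    // which only the wiring code sees; its internals can churn weekly
    // without forcing the clients to recompile or redeploy:
    import java.math.BigDecimal;

    public class DiscountPricingService implements PricingService {
        public BigDecimal priceFor(String instrumentId) {
            // ...the frequently changing pricing rules go here...
            return BigDecimal.ZERO;
        }
    }

The shearing happens at the module boundary: the api module changes rarely, and the impl module as often as it likes.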

Sometimes different areas of code may have totally different lifecycles from one another. I have seen applications where different parts of the system may be released on totally different schedules. This is more common in component-based systems, or “pluggable” platforms. But even applications which are designed to accept production patches to certain parts of the system might benefit from a proper organization of their modules.

Conclusion

In software architecture circles, the “module view” of a system is considered one of the primary aspects that must be considered by the architect. Anyone who is involved in designing or implementing a software application has to make decisions, one way or another, regarding the organization of their code. Yet I have rarely seen concise, practical guidelines on the subject. I’ve tried to codify here some of my thinking, including the principles I used on my own project. I hope you find it useful, and I’d be happy to hear some of your thoughts on the matter as well.


High T: High tech talks with High, T. over high tea

July 25, 2008

Hi all,

After many years of reading tech blogs from the sidelines, I’ve decided it’s time to start my own. You might say my inspiration came from a bit of blog-envy after attending and speaking at the Dr. Dobb’s Architecture & Design World 2008 conference in Chicago this week. I won’t deny it. But really it came from a creeping realization that there were a lot of projects and ideas that I had left untouched and un-nurtured over the course of the last few years.

To remedy the situation, I’ll be presenting some of these thoughts and ideas in the areas of software development (Java, in particular, but not solely), design and architecture in the hopes that some of you out there will find them interesting enough to start a dialog.

[Ed. note: hate the name of this blog, but I had to pick something to get started, and with a cool last name like “High”, what else could I do? So, until the name changes… would you like one lump or two?]