Basic Principles of Module Design

July 28, 2008

The presentation I gave at the Dr. Dobb’s conference was a case study of a project I am currently working on, which we are calling “Remodularization”. As you might guess from the name, the project is a purely technical one in which we have redefined the architecture of our software with regard to its modules.

What is a module?

Now, before I go into the details of how we are doing that, it needs to be stated just what I mean by a “module”. Given the rather casual way the term is tossed about in our industry, you might have a lot of ideas as to what I may be referring to here. The general notion is that it’s somehow a grouping of functionality or code in some cohesive manner.

In more formal terms, it’s “an implementation unit that provides a coherent unit of functionality.” Modules are generally a static grouping of code, as opposed to a “component” (as defined for Component and Connector (C&C) views), which is mainly a runtime element. In this context, “module” can still mean a lot of things, so I’ll cut right to the chase here by saying that, for the sake of the application I’m working on, and for this discussion, a “module” is specifically a Maven 2 project. It can also roughly be considered an Eclipse Java project or an IntelliJ project, since the main point here is that we’re talking about the dependency relationships between several independent groupings of Java code. But the bottom line is that my application is written in Java, and we use Maven 2 to build it and manage its dependencies.

What are modules for?

In order to come up with a set of guidelines for module design, of course we have to start with the reasons why we want to use modules in the first place. Modules divide our application into smaller bite-sized chunks, but what do we get from that? What are the downsides to breaking up the application into pieces? The main purposes of a software module are to provide:

  • Decomposition: Using modules helps improve clarity of the application by breaking it up into smaller parts that are easier to understand by developers and other stakeholders. Of course, cohesion is an important consideration here: if you don’t match your conceptual view of the system to the module view, then clarity may not be improved, or may even be reduced (“Why do we have a module called ‘gui’ and another called ‘client’ when they both just have GUI Client code?”). Having too many modules may be as bad as having too few.
  • Encapsulation: As with lower design elements, like packages and façades, modules can be used to limit the impact that changes in the system may have on other parts of the system. But it’s important to manage the dependencies between modules properly to really reap these benefits.
  • Information hiding: This is just another angle on the decomposition benefit. Clarity is again improved with a divide-and-conquer approach by limiting the scope of what developers need to know about a system. If modules are well-defined, and provide a clear interface, then developers working on one module do not need to know the implementation details of the others.
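To make the encapsulation and information-hiding points a little more concrete, here is a minimal Java sketch. Every name in it (TradeService, InMemoryTradeService, TradeServices) is hypothetical and invented for illustration; it is not code from the application discussed here. The idea is only that other modules compile against a small interface, while the implementation stays hidden behind a factory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical contract of a "trades" module. In a real multi-module build this
// would be a public type, and the only one other modules compile against.
interface TradeService {
    void book(String tradeId);

    List<String> bookedTrades();
}

// Implementation detail: callers never name this class directly,
// so it can change without affecting them.
final class InMemoryTradeService implements TradeService {
    private final List<String> trades = new ArrayList<>();

    @Override
    public void book(String tradeId) {
        trades.add(tradeId);
    }

    @Override
    public List<String> bookedTrades() {
        // Read-only view, so callers cannot modify the module's internal state.
        return Collections.unmodifiableList(trades);
    }
}

// Single entry point through which other modules obtain the service.
final class TradeServices {
    private TradeServices() {
    }

    static TradeService create() {
        return new InMemoryTradeService();
    }
}

final class InformationHidingDemo {
    public static void main(String[] args) {
        TradeService service = TradeServices.create();
        service.book("T-1");
        System.out.println(service.bookedTrades());
    }
}
```

Swapping InMemoryTradeService for another implementation would only touch the factory, which is the kind of limited change impact the list above describes.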

But there are some other forces at play here, which I’ll get into below.

Design Heuristics

Although I’ve seen a lot of discussion around the basic forces involved in module design, I’ve never really seen anyone go to the trouble of boiling them down to a few basic principles that could be used in a practical sense. So here is my list of guidelines that I used for the Remodularization project. Please feel free to post a comment to add your own!

Deployment Location

Code that is deployed on separate servers or tiers should be kept in different modules. This has traditionally NOT been a requirement for module design. Especially in non-Java applications, it’s common to see large modules which are broken up and packaged into several different deployment artifacts for execution on multiple servers. However, as a side-effect of working with Java and with Maven 2 (whose general philosophy is “one artifact per project”), this separation makes complete sense. Since module = project = JAR (or WAR, EAR, etc.), mixing code from different tiers (e.g. GUI and server-side code) in a single module would mean deploying unnecessary code to each tier.

For my application, this specifically means that we have a separate set of modules for the GUI code and for the server-side functionality.

Functional Cohesion

As I mentioned above, one of the main reasons for decomposing your applications into modules is to provide clarity in your architecture. To some degree, your modules should match business or other concepts important to your system so that understanding the design can be somewhat intuitive. There is no hard and fast rule here, however, for what is the proper granularity for this decomposition. If your application is pretty small (such that a single developer could understand the whole system), then there may be no need to provide clarity through modularization. The design must still reflect a sense of cohesion, but it may be sufficient to represent this decomposition via packages within a single module.

If you do have a large application, then it may be advantageous to divide its implementation along functional units. For my system, there is already a set of business concepts that are understood by users, business analysts and developers alike, and they are reflected in the application via separate menu options and sets of screens. Once I designed a “basic pattern” for application modules (which involves modules for gui-side, server-side and shared code (see below)), I repeated this same basic pattern for each “functional module”. This approach helps keep the modules down to a reasonable size, and is self-documenting for developers (“Where should I put the code for the GUI screens for our reports? Oh, right… the GUI reports module.”). This division also allows us to be more explicit about the expected dependency relationships between functional modules (“Reports should know about Trades, but you shouldn’t have to worry about report configurations while trading.”).
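The expected dependency relationships between functional modules can be written down and checked. The following is a rough sketch of that idea in Java, assuming a hypothetical ModuleDependencyRules helper and the module names “reports” and “trades” from the example above. It is not a Maven feature and not the mechanism used on the project; it only shows how a one-directional rule could be expressed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: records which functional modules may depend on which.
final class ModuleDependencyRules {
    // Module name -> names of the modules it is allowed to depend on.
    private final Map<String, Set<String>> allowed = new HashMap<>();

    // Declares that 'from' may depend on each of 'targets'. Returns this for chaining.
    ModuleDependencyRules allow(String from, String... targets) {
        allowed.computeIfAbsent(from, key -> new HashSet<>()).addAll(Arrays.asList(targets));
        return this;
    }

    // A dependency is permitted only if it was explicitly declared.
    boolean isAllowed(String from, String to) {
        return allowed.getOrDefault(from, Collections.emptySet()).contains(to);
    }

    public static void main(String[] args) {
        ModuleDependencyRules rules = new ModuleDependencyRules()
                .allow("reports", "trades");

        System.out.println("reports -> trades allowed: " + rules.isAllowed("reports", "trades"));
        System.out.println("trades -> reports allowed: " + rules.isAllowed("trades", "reports"));
    }
}
```

A check like this could run as part of a build or a unit test, comparing the declared rules against the dependencies each module actually has.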

Reusability

Once you’ve broken up your application into your more conceptual modules, it becomes clear that this sort of organization isn’t enough. According to the DRY principle (“Don’t Repeat Yourself”), you shouldn’t have the same piece of code implemented in two different locations. That adds to maintenance overhead, increases the risk of bugs and so on. But if you have a StringUtils class, or a JMS library, or some other piece of infrastructure that you want to use in all your modules, where are you going to put it? The obvious answer is to create a separate module (call it Lib, or Utils, or whatever suits the purpose) that can be shared by all. There are other solutions, but none are as simple and straightforward as this.

Is one reusable module enough? I guess that depends on the purpose of the code that is being shared, the size of the shared module, and the number of modules that are sharing it, among other factors. When I start looking for a place to keep my code, I tend to think of it in terms of its “scope”. Who do I think will need this class, beyond my immediate purpose for it? Is it infrastructure, or is it module-specific business logic? Will it be needed on the GUI, on the server, or both? And so on.

Potentially, you could have as many shared scopes as you have connections between modules, which could get a little hairy. In my opinion, you should look to reduce this to a more practical level, along the lines of what makes sense to you and the developers. In my case, we have business objects and logic which are used on both the GUI and the servers. So, for each functional module, we have pulled this logic out into a functional “domain” module for each functional area. Also, there is shared infrastructure code for all the GUI modules and all the server modules, so there’s a gui.lib and a server.lib module. And, finally, for the occasional StringUtils-type class, we have a plain vanilla “lib” module.
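The “scope” questions above amount to a small decision procedure. Here is one way it might be sketched in Java. The lib, gui.lib and server.lib names come from the description above; the per-area names such as “trades.domain”, “trades.gui” and “trades.server” are assumptions made up for this example, as is the ModuleScope class itself.

```java
// Hypothetical helper that answers "which module should this class live in?"
final class ModuleScope {
    private ModuleScope() {
    }

    /**
     * @param functionalArea business area the class belongs to, e.g. "trades" (ignored for infrastructure)
     * @param infrastructure true for generic plumbing, false for business logic
     * @param usedByGui      true if GUI code needs the class
     * @param usedByServer   true if server-side code needs the class
     */
    static String targetModule(String functionalArea, boolean infrastructure,
                               boolean usedByGui, boolean usedByServer) {
        if (!usedByGui && !usedByServer) {
            throw new IllegalArgumentException("class must be used by the GUI, the server, or both");
        }
        boolean shared = usedByGui && usedByServer;
        if (infrastructure) {
            if (shared) {
                return "lib"; // StringUtils-type classes needed everywhere
            }
            return usedByGui ? "gui.lib" : "server.lib";
        }
        if (shared) {
            return functionalArea + ".domain"; // business objects used on both tiers
        }
        return functionalArea + (usedByGui ? ".gui" : ".server");
    }

    public static void main(String[] args) {
        System.out.println(targetModule("trades", false, true, true));
        System.out.println(targetModule("reports", false, true, false));
        System.out.println(targetModule("reports", true, false, true));
    }
}
```

In practice this is a judgment call a developer makes in their head rather than a function anyone calls, but writing it out shows how few shared scopes are really needed.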

Optionality

As something of a specialization of the “deployment” principle, what should you do about code that sometimes shouldn’t be deployed AT ALL? The Eclipse platform (as well as just about every other IDE platform) supports a very sophisticated feature set for plug-in functionality. Are plug-ins deployed together with Eclipse on installation? Well, some of them. But the great majority are developed, packaged and installed separately.

At my company, we also have a lot of customer-specific code, which should not be deployed into every production environment. In fact, we have to be careful about how we manage this code without mingling it with more generic code, and without creating a separate code branch for every customer.

So where do you put code that shouldn’t be deployed with every installation, or that needs to be isolated from your main code base for some reason? By now, I’m sure you’ve guessed the answer: pull it out into another module. Again, this isn’t a mandate. This can also be achieved by special compile-time directives, by using separate branches, or even a smattering of “if” statements. But modules are a very simple and clear mechanism for achieving this end.
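For optional modules to work, the generic code needs some way to discover whatever happens to be installed. One standard JDK mechanism for this is java.util.ServiceLoader, sketched below. This is an illustration only, not a description of the application discussed here, and the ReportExtension interface is hypothetical. An optional module would ship an implementation plus a META-INF/services entry in its own JAR; when that JAR is absent, the loop simply finds nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical extension point that lives in the generic code base.
interface ReportExtension {
    String name();
}

final class OptionalModules {
    private OptionalModules() {
    }

    // Collects the names of whichever extensions are present on the classpath.
    // With no optional module deployed, the returned list is empty.
    static List<String> installedExtensions() {
        List<String> names = new ArrayList<>();
        for (ReportExtension extension : ServiceLoader.load(ReportExtension.class)) {
            names.add(extension.name());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Installed extensions: " + installedExtensions());
    }
}
```

The generic module never references customer-specific classes by name, so the same build can be deployed with or without them.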

Technical Constraints

The specific technology platform you are using to develop your software may also have an impact on the design of your modules. For my application, we are using Maven 2 for the build process, which follows more or less a “one artifact per module” approach. This means that I need to also think of modules as deployment artifacts, which may influence my decisions. Other limitations, like maximum number of modules for an IDE or platform, or questions of efficiency, may also affect your choices.

So, in my case, the technology we’re using influenced the design in two ways. First of all, this “one JAR per project” approach meant that I should consider where I want to deploy my JARs. This led to the first heuristic: GUI stuff goes in a GUI module, and server-side code goes elsewhere. One other aspect that I haven’t talked about yet is that we are using EJBs for remote procedure calls. Again, this isn’t a requirement, but the standard Maven 2 way to build and package EJBs is by using a special Maven 2 project type which automatically packages your interface and implementation classes into client and server JARs. This specific facility for compilation and packaging of EJBs leads me to the conclusion that I should have a separate set of modules specifically for Enterprise JavaBeans. I haven’t actually decided yet if I should use one module per EJB, one per functional module, or one for all. But the point is that the technology itself is having an influence.

One other way the technology can affect these decisions is with regard to granularity. One benefit of using modules is that developers can worry about only the parts of the system they are currently working on (thanks to information hiding and encapsulation). One concrete realization of this is within the IDE itself. If you have a very large application, and you are dividing your code up amongst various projects, it is not always necessary to have the entire application open at all times. For example, if you are working on a generic library module (one that is at the bottom of the dependency stack), it is possible to open ONLY that project in Eclipse. This makes the development environment much more lightweight and responsive for the developer. With this in mind, it might be advantageous to organize your modules around the responsibilities of your developers with the objective of making their day-to-day work much more manageable.
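The set of projects a developer needs open is the module being worked on plus everything it depends on, directly or indirectly. The sketch below computes that set for a made-up dependency graph. The ModuleGraph class and the “reports.gui” and “reports.domain” names are assumptions for illustration; in a real build this information would come from the Maven project files.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the dependencies between modules.
final class ModuleGraph {
    // Module name -> modules it depends on directly.
    private final Map<String, List<String>> dependencies = new HashMap<>();

    // Declares that 'module' depends directly on each of 'targets'. Returns this for chaining.
    ModuleGraph dependsOn(String module, String... targets) {
        dependencies.computeIfAbsent(module, key -> new ArrayList<>()).addAll(Arrays.asList(targets));
        return this;
    }

    // Returns the module itself plus all of its direct and indirect dependencies.
    Set<String> projectsToOpen(String module) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(module);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            // add() returns false for modules already visited, which also guards against cycles.
            if (result.add(current)) {
                for (String dependency : dependencies.getOrDefault(current, Collections.emptyList())) {
                    pending.push(dependency);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ModuleGraph graph = new ModuleGraph()
                .dependsOn("reports.gui", "gui.lib", "reports.domain")
                .dependsOn("gui.lib", "lib")
                .dependsOn("reports.domain", "lib");

        System.out.println("Working on lib: " + graph.projectsToOpen("lib"));
        System.out.println("Working on reports.gui: " + graph.projectsToOpen("reports.gui"));
    }
}
```

A module at the bottom of the stack yields a set containing only itself, which is the case described above where a single project is all you need open.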

Work Assignment

With that last comment in mind, it should be pointed out that one of the interested stakeholders of a module view of a system is the project manager. Why? Because modules tend to be a great way to break up the system into chunks for project planning. Modules can be used as project milestones (“First, we’ll implement the user registration unit”), to show important interdependencies (“We can’t release this without the security unit, but we don’t need to hold up the big launch for our plug-ins.”), and to organize teams for parallel development (“We can have the GUI team work on the screens while the server team does the back end.”).

Modules can therefore be used to organize your work around teams, either in terms of parallel implementation, or in terms of specialized skill sets. In my case, we are not designing our modules around teams, but we are doing a bit of the opposite. We have assigned module ownership to different teams according to their specializations (business logic, infrastructure knowledge, GUI skills and so on). These teams don’t have exclusive ownership of those modules, but they do have the last word on the design of those modules and a responsibility to improve their code quality in general.

Software Lifecycle

As Brian Foote and Joseph Yoder pointed out in their excellent article “Big Ball of Mud”, one of the original inspirations for modularization was the need for what they call “Shearing Layers”. If there is one part of the code that changes frequently, but is used by a lot of other code, there is a natural tendency to want to protect the dependent code from these changes. The tendency is to create a façade, some kind of stable interface that provides the encapsulation needed to reduce the impact of changes to the more volatile areas of the application. Modules are a natural approach to providing this type of encapsulation.

Sometimes different areas of code may have a totally different lifecycle from one another. I have seen applications where different parts of the system may be released on totally different schedules. This is more common in component-based systems, or “pluggable” platforms. But even applications which are designed to accept production patches to certain parts of the system might benefit from a proper organization of their modules.

Conclusion

In software architecture circles, the “module view” of a system is considered one of the primary aspects that must be considered by the architect. Anyone who is involved in designing or implementing a software application one way or another has to make decisions regarding the organization of their code. Yet rarely have I seen concise, practical guidelines. I’ve tried to codify here some of my thinking on the subject, including the principles I used on my own project. I hope you find it useful, and I’d be happy to hear some of your thoughts on the matter as well.