Documenting Architecture Decisions

August 25, 2008

Record Your Rationale

I just posted another axiom to the “97 Things” project called “Record your rationale”. The basic idea is that you should keep a record of why architectural decisions were made so that when someone asks, or when you’re thinking of changing your mind, it’s all there to remind you. This is a common problem, especially for architects who inherit someone else’s system: why the heck did they do THAT? Without a record of the rationale, non-obvious solutions can seem like pure lunacy (or stupidity). I have actually caught myself voicing that sort of negative opinion out loud, right in the company of the very person who had done the implementation. Oops.

But I digress. I don’t want to spend this post rehashing what I wrote on the other site (if you’re interested, by all means go read it). I’d like to discuss how we’re recording our rationale at Sakonnet. First, let me post here a part of the original axiom that I cut out to reduce the number of details:

More formal approaches to this type of documentation involve using a standard template that includes the following information:

  • The name (or brief description) of the decision
  • A brief summary of the issue that the solution attempts to resolve
  • A description of the final solution that was selected
  • Factors that influenced the final decision (functional and non-functional requirements, technical, legal and other constraints, political factors (e.g. partnerships with software vendors), or just about anything else that was considered in the decision making process)
  • A prioritized list of quality attributes (is performance, security, cost or maintainability more important in this case?)
  • The final rationale behind the decision
  • A description of alternate solutions that were considered, and why they were rejected

Architecture Decisions

At Sakonnet, we are using a template, but not quite so formal as the one presented above. This is partly because I started out simple just to try out the concept, and partly because I didn’t want to create roadblocks or scare people away from creating and using them.

I created something called “Architecture Decisions” (very original, right?), which consist basically of a summary (a title, often framed as a question: “How can we improve the monitorability of our message-driven processes?”), a more in-depth description of the problem (often accompanied by some alternatives that are being considered), and a “resolution” that gets filled out when the decision has been made and the issue is being closed. There are also fields for explaining “impacts on Quality of Service” as a consequence of the decision made. This is divided into four main categories (Performance, Scalability, Reliability, Maintainability), plus an “Others” field for anything else. These fields are all plain text.
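To make the shape of an AD concrete, here is a minimal sketch in Python of the fields just described. The class, attribute and method names are hypothetical; the real thing is a configurable element in a web-based tracking tool, not code like this.

```python
from dataclasses import dataclass, field

# The four named Quality of Service categories, plus the catch-all field.
QOS_CATEGORIES = ("Performance", "Scalability", "Reliability",
                  "Maintainability", "Others")


@dataclass
class ArchitectureDecision:
    """Hypothetical in-memory model of an AD; all fields are plain text."""
    summary: str          # the title, often framed as a question
    description: str      # the problem, plus alternatives being considered
    resolution: str = ""  # filled out once the decision has been made
    qos_impacts: dict[str, str] = field(default_factory=dict)

    def set_impact(self, category: str, text: str) -> None:
        """Record the impact on one Quality of Service category."""
        if category not in QOS_CATEGORIES:
            raise ValueError(f"unknown QoS category: {category}")
        self.qos_impacts[category] = text

    def is_resolved(self) -> bool:
        """An AD counts as resolved once its resolution text is non-empty."""
        return bool(self.resolution.strip())


ad = ArchitectureDecision(
    summary="How can we improve the monitorability of our "
            "message-driven processes?",
    description="Illustrative placeholder for the in-depth problem statement.",
)
ad.set_impact("Maintainability", "Illustrative placeholder text.")
```

Keeping every field as free text mirrors the low-ceremony intent: nothing stops someone from opening an AD with only a question and a paragraph of context.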

Architecture 2.0

By now you’re probably picturing a stale, templated Word document, two pages long or so, that gets lost in the file system, never gets properly filled out, and that no one ever sees or can find again, right? This is where it gets interesting. Architecture Decisions are a key component of what I’m (unfortunately) calling (in this blog only) “Architecture 2.0”: an approach to the architecture process which leverages web 2.0-style application features and community to improve communication, participation and agility (especially with regard to documentation). Note that this isn’t my idea – I’m just borrowing it, including more or less the name, from a presentation called “Enterprise 2.0” recently given by Josh Street.

I created Architecture Decisions (ADs) as a new type of element in the web-based process tracking software that we’re using at Sakonnet. Think of it as a type of Jira or Bugzilla, only more generic and flexible (and less oriented towards software development…). In this system, I was able to define all the fields for these decisions, and more.

Lifecycle

Thanks to the web-based medium, I was able to define both a workflow and a lifecycle for these decisions. The workflow entails the following steps (statuses):

  1. Not started – the decision was created, but isn’t being actively looked at yet
  2. Being investigated – someone (or some group) is looking into the issues surrounding the decision. There is an “assigned to” field for ADs so that everyone can see who specifically is responsible for seeing it through to the end. There is also a “due date” for tracking and reminders.
  3. Decision made – the final solution has been chosen. At this point, the owner needs to fill out the Resolution and the Impacts on Quality of Service fields. People also get notified (more on this below).
  4. Work scheduled – you’ve made the decision, but you’re not done yet! The solution still needs to get implemented. So the AD only reaches this status when the owner has created any specs or tasks required to actually make the decision happen.
  5. Implemented – if the decision has been carried out and implemented in the system, you are now done. But the decision hasn’t reached the end of its lifecycle…
  • Others:
  • Cancelled – if the decision was created by mistake, or someone decided it wasn’t worth going through the motions
  • Deprecated – NOW the decision has really died. More on this below.
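The statuses above can be read as a small state machine. The following Python sketch is one possible encoding. The transition table is an assumption, since only the statuses themselves are listed here: it allows cancelling only before a decision is made and deprecating only afterwards. Requiring a non-empty resolution before entering “Decision made” is likewise an assumed guard.

```python
from enum import Enum


class Status(Enum):
    NOT_STARTED = "Not started"
    BEING_INVESTIGATED = "Being investigated"
    DECISION_MADE = "Decision made"
    WORK_SCHEDULED = "Work scheduled"
    IMPLEMENTED = "Implemented"
    CANCELLED = "Cancelled"
    DEPRECATED = "Deprecated"


# Assumed transition table (hypothetical): which statuses may follow which.
TRANSITIONS = {
    Status.NOT_STARTED: {Status.BEING_INVESTIGATED, Status.CANCELLED},
    Status.BEING_INVESTIGATED: {Status.DECISION_MADE, Status.CANCELLED},
    Status.DECISION_MADE: {Status.WORK_SCHEDULED, Status.DEPRECATED},
    Status.WORK_SCHEDULED: {Status.IMPLEMENTED, Status.DEPRECATED},
    Status.IMPLEMENTED: {Status.DEPRECATED},
    Status.CANCELLED: set(),
    Status.DEPRECATED: set(),
}


def advance(current: Status, target: Status, resolution: str = "") -> Status:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value!r} to {target.value!r}")
    # Assumed guard: the Resolution field must be filled out at this point.
    if target is Status.DECISION_MADE and not resolution.strip():
        raise ValueError("fill out the Resolution before marking the decision made")
    return target
```

For example, `advance(Status.BEING_INVESTIGATED, Status.DECISION_MADE, resolution="...")` succeeds, while trying to move an implemented decision back to “Not started” raises an error.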

The life of an AD is something like the life of a bill (cue Schoolhouse Rock), but without the politics. Someone has a brilliant idea (or a problem), and they write it up on the web site. Then it stays open until someone finally gets around to making up their mind about it. Work gets scheduled, and if it’s important it gets assigned to someone and completed (otherwise, it may linger forever in the limbo known as the “pipeline”).

However, at any point after the decision is made, it may get replaced by a new decision. This may be due to a change in the context (performance suddenly got priority over maintainability), or because new information came to light (something like reopening a case due to “new evidence”). For example, if a decision was made because the middleware doesn’t provide a feature out-of-the-box, you may have to rethink things if a new version of the middleware is released. When the decision changes, a new AD is opened, and the old one is marked as “Deprecated”, meaning it’s no longer valid. A link should be provided to the new decision, but for now this isn’t automatic.
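As a sketch of what an automatic link between the old and the new decision could look like, the Python below deprecates the old AD and records the ID of its replacement in one step. Everything here is hypothetical (the `Decision` class, the `superseded_by` field, and the assumption that the replacement starts out as “Not started”); it is not how the tracking tool behaves today.

```python
from dataclasses import dataclass
from typing import Optional

# Statuses from which a decision can be replaced ("after the decision is made").
REPLACEABLE = {"Decision made", "Work scheduled", "Implemented"}


@dataclass
class Decision:
    ad_id: int
    summary: str
    status: str = "Not started"
    superseded_by: Optional[int] = None  # hypothetical link to the replacement


def supersede(old: Decision, new_id: int, new_summary: str) -> Decision:
    """Open a replacement AD and mark the old one as Deprecated, with a link."""
    if old.status not in REPLACEABLE:
        raise ValueError(f"AD {old.ad_id} has no decision to replace yet")
    old.status = "Deprecated"
    old.superseded_by = new_id
    return Decision(ad_id=new_id, summary=new_summary)


# Illustrative data only.
old = Decision(1, "Work around the missing middleware feature", status="Implemented")
new = supersede(old, 2, "Use the feature now provided by the middleware")
```

Storing the link on the old record means anyone who lands on a deprecated AD can jump straight to the decision that currently applies.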

I’ve found from practice that there are a lot of times you want to make just a small change that doesn’t affect the spirit of the decision (e.g. in the solution, you mention an XML tag you want to call <foo>, but you later decide it should be called <Foo> – presumably that change isn’t going to make a big difference). We could just go and alter the text, but people might not notice the change, and it might get overlooked. In this case, I want to have a workflow called “Amendment”, where the AD can be altered slightly, and the change recorded in the history. We haven’t implemented that yet, but I’m working on it.
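One way such an “Amendment” could work is sketched below in Python: the text is changed in place, but every change is appended to a history list with the author, the old and new text, and a short note. This is only an illustration under assumed names (`Amendment`, `AmendableDecision`, `amend`), not the workflow that will eventually be built.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Amendment:
    author: str
    old_text: str
    new_text: str
    note: str
    timestamp: datetime


@dataclass
class AmendableDecision:
    resolution: str
    history: list[Amendment] = field(default_factory=list)

    def amend(self, author: str, new_text: str, note: str) -> None:
        """Apply a small change to the resolution and record it in the history."""
        if new_text == self.resolution:
            raise ValueError("amendment does not change the text")
        self.history.append(
            Amendment(author, self.resolution, new_text, note,
                      datetime.now(timezone.utc))
        )
        self.resolution = new_text


decision = AmendableDecision(resolution="The XML tag will be called <foo>.")
decision.amend("architect", "The XML tag will be called <Foo>.",
               "Capitalized the tag name")
```

Because the old text is kept alongside the new, a reader can see exactly what changed without a full replacement AD being opened.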

Collaboration

Documentation of rationale is an important historical artifact. However, for Architecture Decisions, I wanted more than this. Actually, the idea came about for two reasons:

  1. I realized that developers were placing technical discussions in the forum for the specs they were working on
  2. I realized that developers didn’t always know how or when to raise technical issues to the architects

So I created the ADs. They have a number of features which resolve these problems, and more. Like the specs, bugs, and so on, there is a discussion forum for each one. This ends up being a type of “living documentation” of the discussion. The back and forth between people ends up looking something like the Federalist Papers (ok, I exaggerate), where you can actually see the proposals and reasoning behind them.

ADs can be associated with “deliverables” (something like a project, or a group of specs), specs and bugs. In fact, they can be CREATED from within any of these. This means that a developer can be given a spec, have questions about the best way to implement it, and with the click of a button, they can ask the architects. Remember, the “summary” field is often phrased as a question: “Is it ok to use messaging to generate the Foo Report?”

Equally important, the ADs are hooked up to send email notifications at the right times:

  1. Whenever an AD is created, people are notified. This used to be just the architects, but since I want more contribution from the developers (“Empower developers!”), I recently added all developers to the mailing list. So far, no one’s complained about the spam.
  2. Whenever an AD is marked “Decision Made”, all developers receive the notification. This is a way to automatically communicate important decisions regarding the architecture. Since there are 35 developers on the team, it can otherwise be hard to make sure everyone got the message.
  3. Anyone working on the related item will get notified whenever someone posts a comment to the forum. Also, anyone who contributes an opinion of their own gets notified.
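The three notification rules boil down to a mapping from an event to a set of recipients. Here is a small Python sketch of that mapping; the event names and the function signature are hypothetical, and actually sending the email is left out.

```python
def recipients(event, architects, developers, item_workers=(), commenters=()):
    """Return the set of people to email for a given AD event.

    event        -- "created", "decision_made" or "comment_posted" (assumed names)
    item_workers -- people working on the related spec, bug or deliverable
    commenters   -- people who have already posted to the AD's forum
    """
    if event == "created":
        # Rule 1: architects, and now all developers as well.
        return set(architects) | set(developers)
    if event == "decision_made":
        # Rule 2: every developer hears about the decision.
        return set(developers)
    if event == "comment_posted":
        # Rule 3: people on the related item, plus earlier contributors.
        return set(item_workers) | set(commenters)
    return set()


# Illustrative names only.
to_notify = recipients(
    "comment_posted",
    architects={"ana"},
    developers={"bo", "cy"},
    item_workers={"bo"},
    commenters={"ana"},
)
```

Returning a set means someone who is both working on the item and active in the forum only gets one email per event.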

So far, all of this is working out remarkably well!

Finding Decisions

Another concern I mentioned in my “axiom” is that this information should all be searchable. If you have to spend more than 5 minutes looking for a past decision, you probably won’t do it. Fortunately, for ADs, all their text is searchable via the web interface!

I also went to the effort to provide some other methods to facilitate their use. First, I added the following fields:

  • Category – this is just a free-form plain text field where the architect can fill in a quick description of the “component” or piece of the system to which the decision relates. This was also my way to track exactly what kinds of components this would entail. We’ve got about 250 ADs in the system now, and a set of de facto categorizations has naturally evolved.
  • Functional component – it can be important to track exactly which sets of functionality are being affected by a decision (is this about credit notes replication? Trade saving?). If it affects everything (our Exception Handling Framework), this field is left blank. This field is a drop-down box of pre-defined options.
  • Non-functional component – this is another drop-down of options. It was nearly impossible to pick out a set of non-functional components and stick with it. I started with the “Categories” that people had created on their own. I tossed in some formal components, layers and other elements I’d defined in our official architecture documentation, and so on. The fact is that decisions can affect the software at just about any granularity, hierarchical or not. I’ll write up a separate post some time about categorization of ADs, because it’s a toughy.

So, you can search and sort based on any of the above fields. This can be really useful to answer questions like, “what were all the decisions we made regarding our JMS messaging framework?”
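As a rough illustration of combining full-text search with the fields above, the Python sketch below filters a list of ADs held as plain dictionaries. The dictionary keys and the `search_decisions` function are assumptions made for the example; the real search lives in the web interface.

```python
TEXT_FIELDS = ("summary", "description", "resolution")


def search_decisions(decisions, text=None, **field_filters):
    """Return ADs containing `text` (case-insensitive) and matching all filters."""
    results = []
    for ad in decisions:
        haystack = " ".join(str(ad.get(name, "")) for name in TEXT_FIELDS).lower()
        if text and text.lower() not in haystack:
            continue
        if all(ad.get(name) == value for name, value in field_filters.items()):
            results.append(ad)
    return results


# Illustrative records only.
ads = [
    {"summary": "Should the Foo Report use JMS messaging?",
     "category": "Messaging", "functional_component": "Reports"},
    {"summary": "How do we version the XML schema?",
     "category": "Integration", "functional_component": None},
]
jms_hits = search_decisions(ads, text="jms", category="Messaging")
```

Here `jms_hits` contains only the first record, since it is the only one that mentions JMS and is categorized under “Messaging”.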

Our web application also lets you define and store reports for repeated use. So I have created a number of views (including things like “which decisions are still open, but due in the next week?”), but there is one that I find very fascinating: The Reigning Decisions Report. This report will show you all the decisions that are in the status “Decision Made”, “Work Scheduled” or “Implemented”. In other words, all the decisions that a developer needs to know at any time about the current system. What’s really cool about this one is that if a decision ever gets “deprecated”, it drops out of the view. If its replacement decision gets made, it shows up instead. This report is ordered and grouped by the Category field.
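The logic of the Reigning Decisions Report is simple enough to sketch in a few lines of Python: keep only the ADs in one of the three “live” statuses, then order and group them by Category. The dictionary keys and the function name are hypothetical; the actual report is defined inside the web application.

```python
from itertools import groupby

REIGNING_STATUSES = {"Decision Made", "Work Scheduled", "Implemented"}


def reigning_decisions(decisions):
    """Group the summaries of all currently valid ADs by their Category."""
    live = sorted(
        (ad for ad in decisions if ad["status"] in REIGNING_STATUSES),
        key=lambda ad: (ad["category"], ad["summary"]),
    )
    # groupby requires its input to be sorted by the grouping key (done above).
    return {
        category: [ad["summary"] for ad in group]
        for category, group in groupby(live, key=lambda ad: ad["category"])
    }


# Illustrative records only.
report = reigning_decisions([
    {"summary": "Old retry approach", "category": "Messaging", "status": "Deprecated"},
    {"summary": "New retry approach", "category": "Messaging", "status": "Decision Made"},
    {"summary": "Schema versioning", "category": "Integration", "status": "Implemented"},
    {"summary": "Caching strategy", "category": "Persistence", "status": "Being Investigated"},
])
```

Because the filter is purely status-based, a deprecated decision disappears from the result and its replacement appears as soon as it reaches “Decision Made”, with no extra bookkeeping.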

Parting Thoughts

I must admit that I had delusions about everyone being able to look at my reports and see the WHOLE ARCHITECTURE with a glance. It’s one of those things that sounds fantastic in theory, but will never happen in practice. Not every decision made is recorded as an AD, even when it should be. I do my best to keep my eye out for changes that should be recorded this way, and people are generally good about doing so themselves, but sometimes things slip through the cracks. Also, the system was already 8 years old when we started doing this. It would be absurd to think that there is value in going back and documenting every old decision in this system.

Also, there are times when creating an AD for every decision made is too much of a burden to make sense. For big projects, there is usually some up-front design and analysis, during which a whole set of features may be discussed. It would be impractical to separate each and every “decision” into its own AD, but I still want to record the rationale behind the design. For these cases, we instead create a brief “design document” that gets saved to our version control, and create a single AD which summarizes the design, and links to the actual document. In this way, history is preserved, the decisions are somewhat searchable, and we don’t waste a lot of time unnecessarily.

Architecture Decisions were simple to create, and are a fantastically easy way to “record our rationale”. More importantly, they enhance our ability to collaborate and to communicate architectural changes, standards and principles that would otherwise be discussed (and remain) behind closed doors. They turn out to be a great way to get developers (those that are interested) involved in the design process before these decisions are force-fed to them. In my opinion, they are a critical part of a successful “Architecture 2.0” environment, and from now on they’re something I’ll never do without.