Announcing the Arch Decisions plugin for Redmine

February 23, 2010

I’ve been silent for a long time on this blog for two important reasons:

  1. I’ve decided not to post anything unless I really have some value to add
  2. I’ve been spending my spare time working on an open source plugin for the Redmine platform

So, without further ado, I’d like to announce the release (of version 0.0.8!) of the Redmine Arch Decisions plugin! At Sakonnet, my previous gig, they were using Quickbase to track tasks, specs, and just about everything. It was a snap to add in a new feature to track “architecture” (or technical) decisions, configure notifications for collaboration, and hook them up to our issue trackers for reference and follow-up. I wrote about this tool in a previous blog post, and I have been known to say that I couldn’t imagine working on software again without it. Well, when the time came to move on, guess what? No tool for tracking my “arch decisions”.

Fortunately, my current employers at Integritas are open to trying out new ideas, and are using the Rails-based Redmine for their issue tracking. Redmine, as with Rails in general, has a fairly usable plugin framework, and it was a great opportunity for me to get my hands dirty with RoR, so I jumped to it. Now, on the date of the release of the 8th version of my plugin (which we have been using for our projects), I feel I’m ready enough to announce it to anyone who’s looking for a way to record their technical decisions (and discuss them before they get made) without the overhead of stiff formal documents.

The following is a very brief overview of what you get in Redmine Arch Decisions 0.0.8:

Arch Decisions

Listing of Arch Decisions

The plugin includes a listing of the Arch Decisions themselves, which are currently limited to the scope of a single project. The ADs have an ID, a status, a summary, and a “Problem Description” field for more detailed information on the context of the decision. ADs currently follow a very simple workflow that isn’t being enforced, but is still useful:

  1. Not Started
  2. Under Discussion
  3. Decision Made
  4. Work Scheduled (implies that issues and/or tasks have been registered to track the implementation)
  5. Implemented (implies that all said issues and/or tasks have been completed, or at least to the satisfaction of the scope of the decision)
  6. Canceled
  7. Deprecated (implies that there’s another AD out there somewhere to replace it)

Arch Decisions also have a text field called “Resolution” that should be filled out when the status is changed to “Decision Made”. The resolution should explain what the final decision was, summarize why that decision was made, and provide any additional guidance to any developers who will be making sure the AD gets implemented.
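To make the workflow concrete, here is a minimal sketch in Python. It is not the plugin's actual Ruby code; the class and method names are made up for illustration, and treating "Work Scheduled" and "Implemented" as also needing a Resolution is an assumption. Like the plugin, it only advises and enforces nothing.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Status(Enum):
    NOT_STARTED = "Not Started"
    UNDER_DISCUSSION = "Under Discussion"
    DECISION_MADE = "Decision Made"
    WORK_SCHEDULED = "Work Scheduled"
    IMPLEMENTED = "Implemented"
    CANCELED = "Canceled"
    DEPRECATED = "Deprecated"


# Assumption: once a decision has been made, every later "active" status
# should still carry a Resolution.
NEEDS_RESOLUTION = {Status.DECISION_MADE, Status.WORK_SCHEDULED, Status.IMPLEMENTED}


@dataclass
class ArchDecision:
    id: int
    summary: str
    problem_description: str = ""
    status: Status = Status.NOT_STARTED
    resolution: str = ""

    def warnings(self) -> List[str]:
        """Return advisory messages only; the workflow itself is not enforced."""
        if self.status in NEEDS_RESOLUTION and not self.resolution.strip():
            return [f"AD {self.id} is '{self.status.value}' but has no Resolution"]
        return []
```

An AD that moves to "Decision Made" with an empty Resolution produces one warning, and the warning disappears once the Resolution is filled in.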

Basic information for an Arch Decision

In addition to those basic text fields, there are also important supplemental elements embedded within the decisions that play an important role in the documentation and decision-making process (note that these are a new feature that I didn’t have in the old Quickbase version):

Factors

Factors associated with an AD

One of the most important benefits of tracking technical decisions in this way is the possibility of making all decision points and trade-offs explicit. There are so many reasons why this is important:

  • You can see in one place all the reasons for which a decision was made
  • You can weigh them against one another so that no one gets fixated on a single reason
  • You can truly validate your assumptions by making them visible and discussing them individually
  • If any of these reasons change in the future, you can go back and check to see if your decision is still valid

Taking a cue from Craig Larman and others, I call these reasons “Factors”. A factor can be just about anything – a requirement, a hunch, a feature, a factoid – that can be used as a justification for a particular decision. In my personal experience, I have seen these factors tossed about with reckless and wanton abandon, littering the sacred grounds of a design discussion. The RAD plugin attempts to put a little order to this chaos by giving you one place to record this information. In general, it can be detrimental to the flow of a discussion to continuously stop to record these factors, but it can be extremely productive to let the fur fly in the heat of the moment, and then carefully pick out the key factors afterwards when you’re ready to clean house.

Factors have a status, which is important in showing which ones have been “challenged” (by marking them as “Validated” once the discussion has completed), including ones that were later shown to be incorrect assumptions (“Refuted”). There is even a text field called “Evidence” wherein the user can record exactly how they came to the conclusion regarding the validity (via external URLs, quotes from a discussion, or even a lame but honest “because Tim said so”).

Also importantly, factors can be reordered on the AD view page by simply dragging a row and placing it in the order desired. This allows you to explicitly declare which factors have a greater weight or priority, which comes in useful when a trade-off must be made.

One interesting thing to note about factors is that they may have varying scopes. Some may be very specific to the Arch Decision at hand (e.g. “We will get a big bonus if we pick Strategy A!” or “The coin said ‘heads’”). Some may relate to more than one AD (e.g. “The company has mandated that we use open source tools for this project”). Still others may be “global truths” that can even be applied across multiple projects (e.g. “Amazon EC2 does not support multicast between instances” (can this one be refuted yet?)). Factors can be created on their own (via the separate Factors tab), or right in the AD itself. In the latter case, they are automatically given a scope of “Arch Decision”. But this can be changed to something a little more broad. When this happens, the Factor can then be added to multiple ADs as appropriate.
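The scoping rule can be sketched roughly as follows. This is an illustration in Python rather than the plugin's own code: only the “Arch Decision” scope and the “Validated” and “Refuted” statuses are named above, so `PROJECT`, `GLOBAL`, `UNVALIDATED`, and the `attach_to` method are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Scope(Enum):
    ARCH_DECISION = 1  # specific to a single AD (the default when created inside one)
    PROJECT = 2        # hypothetical name: may relate to several ADs
    GLOBAL = 3         # hypothetical name: a "global truth" across projects


class FactorStatus(Enum):
    UNVALIDATED = "Unvalidated"  # hypothetical name for the initial state
    VALIDATED = "Validated"
    REFUTED = "Refuted"


@dataclass
class Factor:
    summary: str
    scope: Scope = Scope.ARCH_DECISION
    status: FactorStatus = FactorStatus.UNVALIDATED
    evidence: str = ""
    decision_ids: Set[int] = field(default_factory=set)

    def attach_to(self, decision_id: int) -> None:
        """Link this factor to an AD; a narrowly scoped factor may belong to only one."""
        if (self.scope is Scope.ARCH_DECISION
                and self.decision_ids
                and decision_id not in self.decision_ids):
            raise ValueError("broaden the factor's scope before sharing it between ADs")
        self.decision_ids.add(decision_id)
```

A factor created inside one AD refuses to be attached to a second AD until its scope has been broadened.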

Strategies

Strategies for an AD

What’s a decision without options to choose from? As with factors, my experience has been that people are good at tossing out ideas, but less good at remembering what they were later on. Or understanding anyone’s ideas but their own. So the RAD plugin also separates out a section just to track what those alternatives were that everyone proposed. Each one has a “short name”, which can be useful as reference (a little better than “wait, are you talking about the one where command comes in as a message which is then republished, or the one where you stick the command in the database and then you have a periodic task to look them up?”), plus a slightly longer summary. Then there is a detailed description for what that strategy would really entail.

Importantly, strategies can then be officially “rejected”, with an explanation as to why (in the future, it might be interesting to point to the key Factors). When this happens, they show up at the bottom of the list, with a big red “X” so that no one is confused as to whether or not that possibility is still being discussed (nor why it was rejected).

In some cases, you have a “there can only be one” situation, where a decision could only be considered to have been made when all the other competing strategies have been rejected. In this case, the Resolution will really just be a rewrite of the surviving strategy and its implications. In other cases, you might have multiple winners, each of which composes a part of the final resolution. I find this is especially the case when you are making decisions regarding standards – some will be rejected, while others will be accepted and adopted.

Tracking

An Issue with two related ADs

With this release, ADs can finally be associated with Redmine Issues. This is very important for tracking and governance (making sure the decision gets carried out, and that it is still followed in later implementations). It’s also true that during the course of making a decision, work has to be done on the side. Thus, the association between ADs and issues includes the “type” of relationship that an Issue bears to the AD:

  • Task – the work is a task related to making the decision (e.g. for research)
  • Proof of Concept – partial implementation projects that are required to prove whether or not a particular strategy is viable
  • Implementation – software development work intended to implement a decision (e.g. the creation of a framework according to the design specifications stipulated by the resolution)
  • Governed – implementation of the issue is expected to follow the guidelines laid out by a (possibly previously-existing) decision

Since I often work with issue trackers other than Redmine (and have been too lazy to implement a real integration), it’s also possible to define an Issue by an external URL rather than via a Redmine ID. Although the external tracker won’t have a back reference to the AD, and the AD won’t be able to report on the status of the issue, it’s certainly better than having no link at all.
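As a rough picture of that association, here is a sketch in Python. It is not the plugin's schema; the field names and the example URL are invented. The link carries one of the four relationship types and identifies the issue either by a Redmine ID or by an external URL.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Relation(Enum):
    TASK = "Task"
    PROOF_OF_CONCEPT = "Proof of Concept"
    IMPLEMENTATION = "Implementation"
    GOVERNED = "Governed"


@dataclass(frozen=True)
class IssueLink:
    decision_id: int
    relation: Relation
    redmine_issue_id: Optional[int] = None  # issue tracked inside Redmine
    external_url: Optional[str] = None      # issue tracked somewhere else

    def __post_init__(self) -> None:
        # Exactly one way of identifying the issue must be supplied.
        if (self.redmine_issue_id is None) == (self.external_url is None):
            raise ValueError("give exactly one of: Redmine issue ID, external URL")

    @property
    def status_is_trackable(self) -> bool:
        """Only an issue that lives in Redmine can report its status back to the AD."""
        return self.redmine_issue_id is not None
```

For example, `IssueLink(1, Relation.TASK, external_url="https://tracker.example.com/browse/ABC-7")` is a valid link, but its status cannot be reported on the AD.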

Collaboration

The heart of the original idea for Arch Decisions was the ability to provide a voice to everyone involved in a decision. Ivory tower type architects would do well to take heed and use this tool. Developers don’t always like to have their instructions handed to them on a silver platter (especially when they think a bowl would be better for the soup they’re expected to eat). The RAD plugin gives developers the chance to speak up by posting comments in the Discussion sections (in fact, there’s one for each Factor and Strategy as well as the main AD itself, for those times when you need to focus on a specific subject). It also gives other project members a chance to respond, since there is a “watch” feature, and change notifications can go out via email.

In the previous incarnation of Arch Decisions, there was also a button on each issue so that a developer could raise a red flag whenever there was an implementation detail that needed to be discussed. Thus, the discussion could go both ways, so that architects are not always kept in the dark about what the developers are doing, and what they need to know. This worked very well at my last place of work. Unfortunately, I haven’t implemented this feature yet, but I’m sure it won’t be long before I do.

Final Details

Installing the plugin is very straightforward: just download Redmine and follow its basic instructions, then download the plugin, stick it in the vendor/plugins folder, and run “rake db:migrate_plugins” to set up the database. I’ll provide a more extensive guide in another post, but hopefully that’s enough to get you started. Unfortunately, the plugin only works with version 0.8.4 of Redmine. I’d like to get it working for 0.9.x soon, so if that’s important to you, give me a holler to get off my butt.

I’ve got more tips and details to discuss about the plugin, so I’ll try to get around to that as soon as possible. Until then, let me know if you have any feedback, and I really wish you the best in your future decisions!


97 Things Every Software Architect Should Know – Released!

March 6, 2009

I guess it’s my turn to write about this. I’ve mentioned before that I participated in an online community project called “97 Things Every Software Architect Should Know”. It’s an ongoing (although currently quiet) collaboration to share the wisdom and experiences of software architects, a group that has only recently been gaining recognition as anything more than a job title.

Well, I am proud to say that I contributed 4 of those 97 things, or “axioms” as they’re called, and now am happy to announce that the book was finally released! You can buy a copy yourself via O’Reilly, Amazon.com, or elsewhere. Or, if you prefer burning fossil fuels over killing trees, you can read the axioms for free (minus electric and internet fees) on the official web site. You’ll also find other nuggets of wisdom in the section entitled “Other Things Software Architects Should Know”, and can even contribute your own expertise via the “Community Axioms” page (I have two more out there, including one that I think I expressed better in a recent blog post).

As a side note, I’d just like to say that there was no pretense with this book to define that these principles are THE 97 things you need to know as a software architect, nor that there are only 97 (the additional contributions on the web site are a testament to that). In fact, the reason for the number 97 is a bit bizarre – I leave it as homework for you to find the explanation hidden in the forums of the 97 Things web site. If this book is successful, O’Reilly may decide to print a second edition with more axioms, maybe even written by you! They are definitely going ahead with a whole new series of books in the “97 Things” style, on such subjects as programming, data architecture, and others. Keep an eye out for more opportunities to contribute to the community knowledge.

Lastly, if you really like the project or the book, and want to network with other contributors, I just created a group on LinkedIn.com for this purpose – feel free to join!


Causes of Decay: Mutating Design

February 23, 2009

AKA “Partial Refactor”

AKA “Good Ideas”

I have discussed in the past a phenomenon I call “Architecture by Accident”, in which the clarity of the design of a system may be ruined by rampant inconsistencies caused by a lack of attention to standards and reuse as the system evolves. But you don’t have to rely on chance to get there – we can achieve the same results absolutely intentionally.

Let’s say you have a system with a catalog of products, and that each of these products has a listing of parts. It’s probably a common pattern in the system to do something with the product itself, then go through the parts one by one and do a related activity. For example, the web page for the product probably lists the product’s name, code, and a description, and then shows each of the parts one by one in a similar fashion. The printed-out invoice may do the same. And let’s say the order fulfillment workflow does all sorts of funky calculations based on summing up the individual parts for things like calculating shipping weight, checking inventories, provisioning, whatever.

So the system designer goes ahead and says, “Hey everybody! Let’s create an iterator for products and their parts. From now on, whenever you need to do something to products, use a loop with the iterator.” Great. So, the team goes ahead and implements the web page and the invoice sheet using the really fancy iterator, with just a slight change to the contents of the “while” statement. So far, so good.

After a while, this “slight change to the contents” starts giving off a distinct copy-paste smell to the designer. So, one bright day, while browsing through their dog-eared copy of the GoF, they come across the Visitor pattern. “Aha! THIS is what we need!” exclaims our designer. The team has just been asked to implement that product-is-the-sum-of-its-parts weight algorithm I mentioned, and the designer decides it’s a good time to try out the pattern. What do you know?! It’s a fantastic improvement to the way they do things. “From now on, team, we use the Visitor pattern!” And it was so.

Time passes, and after a lot of summing up product parts in all sorts of incredibly meaningful ways, the designer starts to realize that their code base is lousy with one-hit-wonder Visitor classes that are created for some special purpose and are never used again. Fortunately, they are reading a book on the wonders of closures in Groovy. “Aha! THIS is what we need! We can just pass the code to be executed, without having to create a whole new class every time!” The team is all for it (all except one member, who’s forced to quit due to some unfortunate flashbacks to the ’60s inspired by the new language – especially tragic to happen to a young man of only 25), and goes about messing around with their products in Groovy.

Eventually, the team is able to hire a Groovy-compatible replacement for their fallen comrade. On the newbie’s first day on the job, she turns to one of her new coworkers and says, “Hey! I thought you said there was a full-time architect on this system.” Confused, he responds, “There is! Why?” “Well, then, why is this system such a mess? You said I’m supposed to be coding this product stuff in Groovy, but there’s a ton of these Visitor classes, brute-force loops, and all this other copy-pasted code. What up?”

From an outsider’s perspective, there’s little difference between “Architecture by Accident” (a lack of standards) and “Mutating Design” (too many standards). The result is pretty much the same: a patchwork quilt of approaches to solving the same problem in myriad ways. An architect or designer (or team) should strive for clarity in their designs. A system should speak for itself, but not if it’s going to say something different every time it opens its mouth.
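To see what the newcomer is looking at, here is a condensed sketch of the three generations living side by side. It uses Python as a stand-in for the Java and Groovy of the story, and every class and function name is invented for the example. All three compute the same shipping weight, which is exactly the problem.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Part:
    name: str
    weight_kg: float


@dataclass
class Product:
    name: str
    parts: List[Part] = field(default_factory=list)

    # Generation 2 hook: hand the product and each part to a visitor object.
    def accept(self, visitor: "ProductVisitor") -> None:
        visitor.visit_product(self)
        for part in self.parts:
            visitor.visit_part(part)

    # Generation 3 hook: the caller passes the behaviour in directly.
    def each_part(self, action: Callable[[Part], None]) -> None:
        for part in self.parts:
            action(part)


# Generation 1: every caller writes its own loop over the parts.
def weight_with_loop(product: Product) -> float:
    total = 0.0
    for part in product.parts:
        total += part.weight_kg
    return total


# Generation 2: one single-purpose Visitor class per calculation.
class ProductVisitor:
    def visit_product(self, product: Product) -> None: ...
    def visit_part(self, part: Part) -> None: ...


class WeightVisitor(ProductVisitor):
    def __init__(self) -> None:
        self.total = 0.0

    def visit_part(self, part: Part) -> None:
        self.total += part.weight_kg


def weight_with_visitor(product: Product) -> float:
    visitor = WeightVisitor()
    product.accept(visitor)
    return visitor.total


# Generation 3: a closure replaces the one-off class.
def weight_with_closure(product: Product) -> float:
    total = 0.0

    def add(part: Part) -> None:
        nonlocal total
        total += part.weight_kg

    product.each_part(add)
    return total
```

Each version is defensible on its own; the confusion comes from finding all three in the same code base with no hint as to which one is current.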

So how does one avoid creating a system with a Mutating Design? There are only a few things you can do:

  1. Never change your design. Once you make a decision, write it in stone. This way, it will be easy for everyone to know how things are meant to be done. If anyone strays from the beaten path, it should be easy to identify and put things back on track. Unfortunately, this puts quite a burden on you to get things right from the beginning. This is basically synonymous with “waterfall methodology”, and has about the same chances of succeeding. However, it is worth noting that there may be times where the gain to be had by improving a design is outweighed by the damage the change would do to the clarity of the system.
  2. Refactor everything. The devil in a Mutating Design lies in inconsistency. You can exorcise it by going through a rigorous ritual of refactoring everything that had previously been implemented so that the whole system reflects the new design. This could mean a whole lot of work (and risk of introducing new bugs into previously working code) in the name of clarity.
  3. Isolate the changes. Again, the problem is with clarity, which can be occluded by inconsistency. So is there a way to provide clarity even when the design is inconsistent? There is… if you’re clear about scope, and you provide a roadmap.

This last point is not obvious, but worth trying to understand and put into practice. The question you should ask yourself is: if the design keeps changing, how can developers know which pattern to use, and where? Ideally, the system should “speak for itself”, which means developers should be able to infer the design from existing implementations. Therefore, if you wish to change the design, do it in a way that can be consistent within the scope in which developers tend to work. If development teams are divided up by ownership of subsystems, for example, you can experiment with a new design in one of the subsystems – but then change the design for that whole subsystem. It may be inconsistent across the whole system, but in general, developers won’t feel the pain. Even if developers work on the whole system, it may be possible to choose a scope that makes sense to them. If the system is divided by modules, you can choose to change the design for one (entire) module. But then you must make it clear to developers that they should use whichever pattern is appropriate for the particular module they are working on.

This last approach can go really wrong if you don’t provide clear signals to developers as to where they are in the design. Because of this, I am working on a series of techniques (and blog posts) that I call “Visible Architecture”. The idea here is that the development team should be able to see the architecture relative to their code at any time. So, for example, if they are working on a module in which the Visitor pattern must be implemented to work with products, a document on this technique should “present itself” to the developers from within their IDE. If they then switch to a module using the new Groovy approach, the document will switch as well.

There aren’t very many tools that provide this type of functionality. I’m working with one called Structure101 which lets you do just that for layer diagrams. You can define dependency rules for a project, and they will actually show up as diagrams (with enforcement via compilation errors) in either an Eclipse or an IntelliJ IDE. You can publish a different set of diagrams for each Eclipse or IntelliJ project, which means if you wish to change these rules, it’s easy to do it for one project, and leave the old rules in effect everywhere else. I have also written a plug-in of my own for these two IDEs called “Doclinks” which doesn’t enforce any rule, but allows you to link URLs to source code based on a wide variety of rules. This, together with a wiki-based architectural documentation, is another way to provide a context-specific roadmap to developers, reducing the confusion that can be caused by a Mutating Design.
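The idea of a context-specific roadmap can be sketched in a few lines. This is only an illustration of rule-based linking, not how Doclinks or Structure101 actually work; the module paths and wiki URLs are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

# (glob pattern over source paths, documentation URL).
# Paths and URLs are hypothetical; the first matching rule wins.
RULES: List[Tuple[str, str]] = [
    ("catalog/*", "https://wiki.example.com/arch/product-closures"),
    ("fulfillment/*", "https://wiki.example.com/arch/product-visitor"),
    ("*", "https://wiki.example.com/arch/overview"),
]


def doc_for(source_path: str, rules: List[Tuple[str, str]] = RULES) -> Optional[str]:
    """Return the design document that applies to the given source file."""
    for pattern, url in rules:
        if fnmatchcase(source_path, pattern):
            return url
    return None
```

A developer editing a file under `fulfillment/` would be pointed at the Visitor write-up, while someone under `catalog/` would see the closure-based approach, so each module stays consistent with its own documented design.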

I’ve previously shown you how a system can lose its clarity due to a lack of architecture. Now I’ve presented how the same thing can happen when it has too much architecture. As an architect or designer, you need to recognize the importance of standardization, but you also shouldn’t freeze your design in time. What’s important is to recognize that the evolution of the system is best done in stages, rather than through kaleidoscoping changes with no regard to what came before. Before you know it, your code may look like it’s from a B-Movie: The Attack of the Mutating Design!


What can quality attributes tell you about a system?

October 10, 2008

Are there quality attributes that can be attributed to specific patterns of system or software architecture? And can you learn about systems based on their attributes?

I’m standing at the cashier of a local furniture store, trying to pay for a new dresser with my credit card, and after swiping, the guy looks at the monitor and says, “your account must be blocked.” I’m thinking, “what? I use this card every day, and I pay it on time…”, so I tell him to try again. He swipes it 3 more times (each time having to go through an annoying loop of cancellations and submission screens), each time with the same result. The message is very clear on the screen, albeit without those helpful little details that we like: “Account invalid.”

So the cashier tells me to go over to the phone to call the credit card company and see if they can clear up the confusion. I follow his instructions because there’s not much else I can do about it, although in the back of my mind, I’m thinking “that message popped up WAY too quick to have come from my credit card provider.”

So, there you have it: I’ve made a judgement call on a problem based on a quality attribute. I happen to know that credit card authentication systems follow a basic architectural pattern of client-server, usually over the internet or some form of extranet. One thing we know about client-server architectures is that the latency of communication between the client and the server (especially when first establishing communication) is orders of magnitude higher than that of in-process or intra-machine exchanges.

Based on this, there’s that voice in the back of my head telling me, “get off the phone, you idiot! That was a client-side validation error!” That’s a pretty common design pattern for client-server architectures: validate as much of your data locally before you go and waste your time with a long-running remote request that is doomed to fail.
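A minimal sketch of that fail-fast pattern, in Python. The Luhn checksum is a real and widely used local check for card numbers, but whether the store's terminal used it (or something else entirely) is pure assumption; `authorize` and `remote_call` are made-up names.

```python
from typing import Callable


def luhn_valid(card_number: str) -> bool:
    """Luhn checksum: a purely local test that catches most mistyped or misread numbers."""
    digits = [int(ch) for ch in card_number if ch.isdigit()]
    if len(digits) < 12:
        return False
    total = 0
    # Double every second digit, counting from the right.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def authorize(card_number: str, remote_call: Callable[[str], str]) -> str:
    """Fail fast locally; only a plausible number is worth a slow round trip."""
    if not luhn_valid(card_number):
        return "Account invalid"     # instant: the request never leaves the terminal
    return remote_call(card_number)  # slow: crosses the network to the provider
```

The tell-tale sign from the outside is timing: the local rejection comes back immediately, while anything that reaches `remote_call` takes as long as the network does.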

I can’t discern from this if they have a layered structure on the client which first validates data, then passes the information to some business delegate, if they are using some sort of strong domain object called “CreditCard” that does its own validation, or if they just slipped in a little “if” statement in some 1000-line monstrosity of a procedure that will do everything whether the developers intended it or not. Is there any way to tell this from the outside just by the system behavior? Not directly (that I can think of at least), but what are the quality impacts these different approaches have? We know that the “monster procedure” approach is initially fast to develop, but increasingly hard to maintain: a higher rate of bugs and longer time-to-resolution. I suppose I could try to fish around for bugs, but not with my credit card! A form-based validation scheme tends to decentralize validation and error messages, so I could try different (invalid) credit cards from different points of entry to see if I get different responses.

But I don’t have time for any of this – remember, I’m still on the phone. Yes, although I’ve dialed in to talk to a person, they have (well, their voice recording has) asked me to first enter my credit card number, the 4-digit security code (the example they gave me was for the 4-digit expiration date – wtf?), and the year I became a “member”. Hmm… I guess this is another way of up-front validation. This is actually a business process, not a system process: they are trying to save operator time and reduce errors by performing the validation ahead of time only with the customer. 9 out of 10 times, by the time I get to an operator, they ask me for all the same details all over again. But not this time – the lady on the phone already knows who I am! What does this tell me about their enterprise architecture? I guess it means that at the very least, it’s a distributed environment with multiple systems interconnected. This could be a SOA-type web services architecture, some sort of peer-to-peer interaction, a shared data store (via mainframe?) or something else.

Later on in the (totally futile) conversation, the operator transfers me to someone else. Wouldn’t you know it, the other person automatically has all my information without having to ask for it yet again – WOW! Well, this probably isn’t peer-to-peer. Although it would work in this situation, there’s generally auditing and tracking of customer support cases, and that sort of supportability requirement requires a central server, or at least a shared data repository. It’s possible that a peer-to-peer broadcast request-response pattern was used for locating an available support person in the right sector and for handing off the case number, however. It’s less traceable, but more scalable when a lot of data needs to be exchanged and peers come and go with frequency. But if I had to make a guess, I’d say this type of system usually calls for more of a centralized hub (mediator? workflow queue?) in order to keep a tight lid on things.

I forgot to mention – after entering my credit card data for validation, there was a 5-second delay before there was any response on the line. This seems to indicate a synchronous validation process… again, client-server. Then they tossed me into the musical wonderland of the on-hold queue. That’s right, I said queue. They didn’t regale me with fascinating tales of how many people were ahead of me, or how long I should expect to wait, but they very well could have. Queues tend to follow a “competing consumer” pattern in which each “consumer” (customer support person) asks the central queue provider to give them the next item (message, job, impatient caller) in a FIFO manner as soon as they’re available. We know that this is a highly-scalable low-latency way to distribute work among a variable number of “fungible” worker resources. The process was so efficient in this case that it cut off my all-instrumental ecstasy prematurely, halfway through the crescendo in “Feelings”. Fortunately, they quickly confirmed for me that my account was fine and that I was wasting my and their time with this multi-system information exchange – it was a client-side error, after all.
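The competing-consumer pattern is easy to sketch with the standard library. This is a toy model in Python, with threads standing in for support staff; nothing here is known about the real call center's system, and `run_call_center` is an invented name.

```python
import queue
import threading
from typing import Dict, List


def run_call_center(callers: List[str], operators: int) -> Dict[str, List[str]]:
    """Distribute callers among operators; each caller is handled exactly once."""
    waiting: "queue.Queue[str]" = queue.Queue()  # the FIFO on-hold queue
    for caller in callers:
        waiting.put(caller)

    handled: Dict[str, List[str]] = {f"operator-{n}": [] for n in range(operators)}

    def operator(name: str) -> None:
        while True:
            try:
                caller = waiting.get_nowait()  # ask for the next item as soon as free
            except queue.Empty:
                return                         # nobody left on hold
            handled[name].append(caller)

    threads = [threading.Thread(target=operator, args=(name,)) for name in handled]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled
```

Which operator gets which caller varies from run to run, but every caller is served once, and adding operators requires no change to the queue. That interchangeability is what makes the workers “fungible”.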

Finally, back to the cashier. He’s trying to cancel my order because he’s given up on waiting for me, and people are getting upset. Apparently, maintaining multiple sessions in memory was not a usability concern, although there are patterns that would support it. I get back right about the time he hits “cancel”, so he has to start my process from the beginning. Having confirmed that the problem is client-side, he calls the manager, who applies a contingency mechanism known as “Scotch tape” to the back of my card, and voilà! After a several-second-long client-server validation, my purchase is approved.

Really, without a comprehensive investigation, the quality behaviors can only give you an inkling of the architecture backing a black-box system that you are using. But rarely are we asked to do that. The point here is that the common architectural patterns that we use DO have an impact on those quality attributes. In fact, that’s the whole point: we use architectural patterns to get the job done, but we choose between different options based on which one will get it done with the right balance of trade-offs for our system.

That’s the basis of the Architecture Tradeoff Analysis Method (ATAM). Architectures can be analyzed before they are even built based on the architectural patterns they use, and the prioritized quality goals that are meant to be supported by the system. Given a known set of patterns and the impact they can have (both positive and negative) on these attributes, it is possible to know ahead of time if you have chosen a suitable architecture.
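The underlying arithmetic of "patterns have impacts, goals have priorities" can be shown with a toy calculation. To be clear, ATAM itself is a qualitative, workshop-driven method and does not reduce to a score; the sketch below is only an illustration, and the impact values (including the hub being neutral on scalability) are assumptions loosely based on the call-center guesses above.

```python
from typing import Dict

# Illustrative impact of a pattern on each quality attribute:
# +1 helps, -1 hurts, 0 neutral. All values are assumptions for this example.
IMPACTS: Dict[str, Dict[str, int]] = {
    "central hub": {"traceability": 1, "scalability": 0},
    "peer-to-peer": {"traceability": -1, "scalability": 1},
}


def fit(pattern: str, priorities: Dict[str, int]) -> int:
    """Weigh a pattern's impacts by how much each quality goal matters."""
    impacts = IMPACTS[pattern]
    return sum(weight * impacts.get(attribute, 0)
               for attribute, weight in priorities.items())
```

With traceability weighted 3 and scalability 1, the hub comes out ahead; swap the weights and peer-to-peer wins. The patterns did not change, only the priorities did, which is the trade-off the analysis is meant to surface.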

Practitioners of ATAM are supposed to maintain their own templates of architectural patterns and their trade-offs. There are also some public architecture pattern repositories, and IBM has their “IBM Patterns for e-Business” site (an IBM consultant once showed me a monstrous internal database that was much more complete than this). But none that I know of are very usable in the context of the ATAM, or just picking between trade-offs. Personal experience and intuition always end up being the guiding forces. Do you know of any good pattern catalogs that can tell you the ups and downs of using a pipe-and-filter approach? Of avoiding downtime via a Watchdog pattern, or how to choose between a relational and a hierarchical database? If so, I’d like to know!


Confessions of an overdesigner

September 2, 2008

I just read an interesting blog post from Guilherme Chapiewski on keeping software design simple (for those of you who don’t speak Portuguese, this might be a good time to try out that translation feature on Ubiquity. For those of you that are native Portuguese speakers, let this be my chance to set the record straight before bad practices become permanent: it’s pronounced “you-BIH-quih-tee”). This is obviously not the first time the need for design simplicity has been discussed in our field, and books are finally starting to come out regarding HOW to do this.

I personally am a confessed overdesigner, and I think it’s worthwhile discussing why this happens so often. No one goes out of their way to create more work for themselves or for others, at least explicitly (well, there are exceptions to this, as you will see). Below are the top reasons I’ve seen that cause a software developer or designer to go overboard on their design:

  1. Inexperience: developers who are just getting their feet wet with design patterns naturally want to sample the new patterns they have learned. There is a bit of healthy curiosity at work here, and a desire to get some real practice in implementing the pattern. But when I look at code like this, I can almost hear it screaming, “Look, ma! New pattern!!” This is a tough problem to deal with, because developers really need to have practical experience to fully grok a pattern, but using it where it’s not needed isn’t necessarily what developers should learn, either. So, consider responding to this code with a firm, but loving, “That’s very nice, dear. Now go clean your room.”
  2. Too much experience: developers who have been around the block a few times, like myself, like to think we can “see” where the software is going. Using design patterns becomes automatic. They become so natural that the cost of implementing the pattern is almost the same as using a simpler implementation. But the cost to others may be much more significant in terms of understanding and maintaining the code, especially if there is no corresponding requirement for the flexibility (etc.) that the pattern affords. One team that I’ve been working with has adopted what they call the “red card” for me, whenever they see that I’m getting carried away with a design. Although it can be pretty embarrassing, it’s a good opportunity to be humble, and to learn from your own mistakes. “That’s very nice, dear. Now go to your room.”
  3. To impress others: there can be a certain amount of prestige in saying that you used design patterns in your implementation. The same goes for coming up with “clever” (aka counterintuitive) solutions. This can be between developers, but it can crop up in other situations as well. I’ve personally never had to do this, but I have heard anecdotes of people being required to justify their designs to management by listing the patterns they are using. I have personally answered RFPs which ask for this. To me, this is ridiculous. Pretty much every large application I’ve worked on uses almost every design pattern in the GoF book in one way or another, and any design pattern without context is meaningless. Don’t let pride be a deciding factor in your design, nor should you use patterns gratuitously to fill up your forms.
  4. Because someone said to do it: when someone else does the design for you, or just provides some “helpful suggestions”, there is a risk that they will over-specify (cf. the next two entries). This is partly because the person doing the design doesn’t have to sully their hands with unpleasant details like implementing and maintaining their idea. A person who works only with abstractions will also naturally tend towards creating more abstractions, and always look to generalize concepts that currently apply to just a specific situation. It’s just what they DO. This is a fundamental problem with separating the act of design from the act of development. If your team finds this separation of duties useful, just make sure the designers (or architects, or whoever) are there to collaborate and see it through to the end.
  5. Fear of paying the cost later on: I often see myself leaning towards more complex designs to implement extra flexibility in the application when I am afraid that if we don’t do it now, there may not be enough time to do it later. This is silly, of course, since we will have to pay for it now, at a time when it’s not even needed. This fear is less ridiculous when the simple approach differs dramatically from the more complex implementation (e.g. when the two approaches require different architectures). A judgement call must be made in these cases, but very often the simple approach is enough if you can build in a single point of change where the more complex solution can be swapped in later. This is, after all, one of the main purposes of encapsulation and abstraction, and is the essence of the Open-Closed Principle.
  6. Fear that if you don’t do it, someone else will: this is one underlying cause of overdesign that I only recently discovered after many years of therapy and self-reflection. Somewhere in my subconscious was the little nagging question, “if not me, then WHO? If not now, then WHEN?” More than just an affirmation of my willingness to take on the challenge, there is also a hint of mistrust that others WON’T be so willing, or so capable. Better to overdesign now, than to underdesign later, right? This is a tough one to recognize, but in any case the “why” is less important than just recognizing that a solution can be simplified. If you find in yourself that this is one of the reasons why you are tending towards a big up-front design, remember the commandment: Empower developers. If you don’t trust your own team, you should figure out why and fix it, not work around it.
  7. Because it’s fun: OK, let’s just admit it. Programmers, architects, designers and the lot are all in it for the joy of puzzle-solving. A simple solution, like a video game that ends too soon, is just a let down. If you often find yourself caught up in the thrill of the sheer possibilities, go ahead and let your mind wander – that’s what brainstorming is all about. But make sure to timebox your thinking outside the box. When you’ve had your fun, root your feet solidly on the ground and pick a solution that actually fits your problem.
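To make item 5 a little more concrete, here is a minimal Python sketch of a "single point of change". Everything in it is hypothetical (the names `save_to_temp_file`, `store_payload` and `handle_upload` are invented for this illustration): the simple implementation is used today, and callers only ever go through one name that could later be pointed at something more elaborate.

```python
import tempfile
from typing import Callable


def save_to_temp_file(data: bytes) -> str:
    """The simple approach: write the bytes to a temp file and return its path."""
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(data)
        return handle.name


# The single point of change (hypothetical name). If a more complex strategy
# is ever needed, only this one assignment has to change.
store_payload: Callable[[bytes], str] = save_to_temp_file


def handle_upload(data: bytes) -> str:
    """Caller code depends on store_payload, not on how the bytes are stored."""
    return store_payload(data)
```

The callers stay closed to modification, while the storage strategy stays open to replacement, and nobody had to build the complex version before it was needed.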

One quote in Guilherme’s post is certainly appropriate for an overdesigner like myself:

Beware though, keeping a design simple is hard work.

I am a natural when it comes to seeing abstractions, and generalizing problems. Whenever I’m approached with a new problem we haven’t solved before, I immediately try to name it. “Hmm… you say you need to temporarily store a file to keep it out of memory? Looks like we need a TemporaryMemoryManagementRepository component.” When I do this, I’m looking to shine some light on the problem itself that we’re trying to solve, rather than look immediately at the solution that was pulled out of a hat. This helps the “thinking outside the box” part of the exercise (“is this the ONLY way we can do this?”). I’m also looking for a way to encourage reuse, or at least to make sure that we only solve this problem once (check out “Causes of Decay: Copy-paste Architecture”, when I finally publish that article). But by doing so, I may have already created some confusion (“Hey, where’s that class that saves and reads temp files for us? It’s called WHAT?”).

To me, overdesigning is the easy part. What really takes work is whittling that down to something not only reusable and extensible, but also easy to maintain. But the first step is admitting you have a problem…


97 Things #66: Get a second opinion

August 27, 2008

The other day, I wrote up my fourth axiom to get accepted into the “97 Things” book: “If there is only one solution, get a second opinion”. In my opinion, it’s not the best of the four, but it might be worth a quick read (as are all the others on the site!).


Enterprise 2.0

August 27, 2008

It looks like I’ve been a little behind the curve on this “Enterprise 2.0” thing. I thought the term was just something Josh Street had made up as a whimsical way to tie his presentation together. It turns out it’s a term that is not only tossed about in tech discussions – there’s already a Wikipedia entry, a “movement”, and people grumbling about how everyone’s missing the whole point.

Well, I promise from now on to be more on top of things (“Be at the forefront”, after all). I finally got my “unread RSS entries” count down from 2000+ to a mere 50 or so.

I am glad to see that I wasn’t “missing the point” on this, since we have been using a lot of these technologies for some time. My “Architecture 2.0” series will continue (as my adaptation of Enterprise 2.0 to the practice of software architecture), and so will my efforts to find new ways to improve our collaboration and efficiency via tools. Stay tuned!


Documenting Architecture Decisions

August 25, 2008

Record Your Rationale

I just posted another axiom to the “97 Things” project called “Record your rationale”. The basic idea is that you should keep a record of why architectural decisions were made so that when someone asks, or when you’re thinking of changing your mind, it’s all there to remind you. This is a common problem, especially for architects who inherit someone else’s system: why the heck did they do THAT? Without a record of the rationale, non-obvious solutions can seem like pure lunacy (or stupidity). I have actually caught myself voicing that sort of negative opinion out loud, right in the company of the very person that had done the implementation. Oops.

But I digress. I don’t want to spend this post rehashing what I wrote on the other site (if you’re interested, by all means go read it). I’d like to discuss how we’re recording our rationale at Sakonnet. First, let me post here a part of the original axiom that I cut out to reduce the number of details:

More formal approaches to this type of documentation involve using a standard template that includes the following information:

  • The name (or brief description) of the decision
  • A brief summary of the issue that the solution attempts to resolve
  • A description of the final solution that was selected
  • Factors that influenced the final decision (functional and non-functional requirements, technical, legal and other constraints, political factors (e.g. partnerships with software vendors), or just about anything else that was considered in the decision making process)
  • A prioritized list of quality attributes (is performance, security, cost or maintainability more important in this case?)
  • The final rationale behind the decision
  • A description of alternate solutions that were considered, and why they were rejected
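To make the template concrete, here is a minimal Python sketch of how such a record could be represented. The class and field names (`DecisionRecord`, `RejectedAlternative`, and so on) are made up for illustration and do not come from any particular tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RejectedAlternative:
    """An alternate solution that was considered, and why it was rejected."""
    description: str
    reason_rejected: str


@dataclass
class DecisionRecord:
    """One entry following the formal template (all names are hypothetical)."""
    name: str
    issue_summary: str
    solution: str
    rationale: str
    influencing_factors: List[str] = field(default_factory=list)
    # Quality attributes, ordered from most to least important.
    quality_priorities: List[str] = field(default_factory=list)
    alternatives: List[RejectedAlternative] = field(default_factory=list)

    def is_complete(self) -> bool:
        """True when every required free-text part has been filled in."""
        required = (self.name, self.issue_summary, self.solution, self.rationale)
        return all(text.strip() for text in required)
```

A check like `is_complete` is one cheap way to notice a record that was opened but never properly filled out.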

Architecture Decisions

At Sakonnet, we are using a template, but not quite so formal as the one presented above. This is partly because I started out simple just to try out the concept, and partly because I didn’t want to create roadblocks or intimidate people from creating and using them.

I created something called “Architecture Decisions” (very original, right?), which consist basically of a summary (a title, often framed as a question: “How can we improve the monitorability of our message-driven processes?”), a more in-depth description of the problem (often accompanied by some alternatives that are being considered), and a “resolution” that gets filled out when the decision has been made and the issue is being closed. There are also fields for explaining “impacts on Quality of Service” as a consequence of the decision made. This is divided into four main categories (Performance, Scalability, Reliability, Maintainability), plus an “Others” field for anything else. These fields are all plain text.

Architecture 2.0

By now you’re probably picturing a stale templated Word document two pages long or so that gets lost in the file system, never gets properly filled out, and no one ever sees it or can ever find it again, right? This is where it gets interesting. Architecture Decisions are a key component of what I’m (unfortunately) calling (in this blog only) “Architecture 2.0”: an approach to the architecture process which leverages web 2.0-style application features and community to improve communication, participation and agility (especially with regard to documentation). Note that this isn’t my idea – I’m just borrowing it, including more or less the name, from a presentation called “Enterprise 2.0” recently given by Josh Street.

I created Architecture Decisions (ADs) as a new type of element in the web-based process tracking software that we’re using at Sakonnet. Think of it as a type of Jira or Bugzilla, only more generic and flexible (and less oriented towards software development…). In this system, I was able to define all the fields for these decisions, and more.

Lifecycle

Thanks to the web-based medium, I was able to define both a workflow and a lifecycle for these decisions. The workflow entails the following steps (statuses):

  1. Not started – the decision was created, but isn’t being actively looked at yet
  2. Being investigated - someone (or some group) is looking into the issues surrounding the decision. There is an “assigned to” field for ADs so that everyone can see who specifically is responsible for seeing it through to the end. There is also a “due date” for tracking and reminders.
  3. Decision made - the final solution has been chosen. At this point, the owner needs to fill out the Resolution and the Impacts on Quality of Service fields. People also get notified (more on this below).
  4. Work scheduled - you’ve made the decision, but you’re not done yet! The solution still needs to get implemented. So the AD only reaches this status when the owner has created any specs or tasks required to actually make the decision happen.
  5. Implemented – if the decision has been carried out and implemented in the system, you are now done. But the decision hasn’t reached the end of its lifecycle…
  • Others:
      • Cancelled - if the decision was created by mistake, or someone decided it wasn’t worth going through the motions
      • Deprecated - NOW the decision has really died. More on this below.
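The statuses above can be sketched as a small state machine. This is only an illustration in Python: the status names come from the list, but the transition table is an assumption about which moves make sense (for instance, that only decisions that have not yet been made can be cancelled), not a description of how the tracking tool actually behaves.

```python
from enum import Enum


class Status(Enum):
    NOT_STARTED = "Not started"
    BEING_INVESTIGATED = "Being investigated"
    DECISION_MADE = "Decision made"
    WORK_SCHEDULED = "Work scheduled"
    IMPLEMENTED = "Implemented"
    CANCELLED = "Cancelled"
    DEPRECATED = "Deprecated"


# Assumed transition table: each status maps to the statuses it may move to.
ALLOWED = {
    Status.NOT_STARTED: {Status.BEING_INVESTIGATED, Status.CANCELLED},
    Status.BEING_INVESTIGATED: {Status.DECISION_MADE, Status.CANCELLED},
    Status.DECISION_MADE: {Status.WORK_SCHEDULED, Status.DEPRECATED},
    Status.WORK_SCHEDULED: {Status.IMPLEMENTED, Status.DEPRECATED},
    Status.IMPLEMENTED: {Status.DEPRECATED},
    Status.CANCELLED: set(),   # terminal
    Status.DEPRECATED: set(),  # terminal
}


def can_move(current: Status, target: Status) -> bool:
    """Return True if the assumed workflow allows current -> target."""
    return target in ALLOWED[current]
```

Keeping the table as plain data makes it easy to loosen or tighten the workflow without touching any logic.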

The life of an AD is something like the life of a bill (cue Schoolhouse Rock), but without the politics. Someone has a brilliant idea (or a problem), and they write it up on the web site. Then it stays open until someone finally gets around to making up their mind about it. Work gets scheduled, and if it’s important it gets assigned to someone and completed (otherwise, it may linger forever in the limbo known as the “pipeline”).

However, at any point after the decision is made, it may get replaced by a new decision. This may be due to a change in the context (performance suddenly got priority over maintainability), or because new information came to light (something like reopening a case due to “new evidence”). For example, if a decision was made because the middleware doesn’t provide a feature out-of-the-box, you may have to rethink things if a new version of the middleware is released. When the decision changes, a new AD is opened, and the old one is marked as “Deprecated”, meaning it’s no longer valid. A link should be provided to the new decision, but for now this isn’t automatic.
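As a sketch of what an automatic link between the old and the new decision might look like, here is a small Python example. The `Decision` class, the `superseded_by` field and the `supersede` function are all hypothetical names, and the statuses are kept as plain strings to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional

# Statuses from which a decision can be replaced (it must already be made).
REPLACEABLE = ("Decision made", "Work scheduled", "Implemented")


@dataclass
class Decision:
    id: int
    summary: str
    status: str = "Not started"
    superseded_by: Optional[int] = None  # id of the replacement decision, if any


def supersede(old: Decision, new_id: int, new_summary: str) -> Decision:
    """Open a replacement decision and deprecate the old one, linking the two."""
    if old.status not in REPLACEABLE:
        raise ValueError("only a decision that has been made can be replaced")
    new = Decision(id=new_id, summary=new_summary)
    old.status = "Deprecated"
    old.superseded_by = new.id
    return new
```

Doing both steps in one operation means the old AD can never be deprecated without pointing at its replacement.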

I’ve found from practice that there are a lot of times you want to make just a small change that doesn’t affect the spirit of the decision (e.g. in the solution, you mention an XML tag you want to call <foo>, but you later decide it should be called <Foo> – presumably that change isn’t going to make a big difference). We could just go and alter the text, but people might not notice the change, and it might get overlooked. In this case, I want to have a workflow called “Amendment”, where the AD can be altered slightly, and the change recorded in the history. We haven’t implemented that yet, but I’m working on it.

Collaboration

Documentation of rationale is an important historical artifact. However, for Architecture Decisions, I wanted more than this. Actually, the idea came about for two reasons:

  1. I realized that developers were placing technical discussions in the forum for the specs they were working on
  2. I realized that developers didn’t always know how or when to raise technical issues to the architects

So I created the ADs. They have a number of features which resolve these problems, and more. Like the specs, bugs, and so on, there is a discussion forum for each one. This ends up being a type of “living documentation” of the discussion. The back and forth between people ends up looking something like the Federalist Papers (ok, I exaggerate), where you can actually see the proposals and reasoning behind them.

ADs can be associated with “deliverables” (something like a project, or a group of specs), specs and bugs. In fact, they can be CREATED from within any of these. This means that a developer can be given a spec, have questions about the best way to implement it, and with the hit of a button, they can ask the architects. Remember, the “summary” field is often phrased as a question: “Is it ok to use messaging to generate the Foo Report?”

Equally important, the ADs are hooked up to send email notifications at the right times:

  1. Whenever an AD is created, people are notified. This used to be just the architects, but since I want more contribution from the developers (“Empower developers!”), I recently added all developers to the mailing list. So far, no one’s complained about the spam.
  2. Whenever an AD is marked “Decision Made”, all developers receive the notification. This is a way to automatically communicate important decisions regarding the architecture. Since there are 35 developers on the team, it can be hard to make sure everyone got the message.
  3. Anyone working on the related item will get notified whenever someone posts a comment to the forum. Also, anyone that contributes an opinion of their own gets notified.
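The three rules above boil down to a mapping from an event to a set of recipients. Here is a rough Python sketch of that mapping; the event names and the `recipients` function are invented for this example, and the real tool is configured through its own interface rather than code like this.

```python
from typing import Iterable, Set


def recipients(event: str,
               architects: Iterable[str],
               developers: Iterable[str],
               related_item_workers: Iterable[str] = (),
               commenters: Iterable[str] = ()) -> Set[str]:
    """Return who should be emailed for a given AD event (hypothetical names)."""
    if event == "created":
        # Rule 1: architects, plus all developers since the list was widened.
        return set(architects) | set(developers)
    if event == "decision_made":
        # Rule 2: every developer hears about the final decision.
        return set(developers)
    if event == "comment_posted":
        # Rule 3: people on the related item, plus anyone who has weighed in.
        return set(related_item_workers) | set(commenters)
    raise ValueError(f"unknown event: {event}")
```

Returning a set means nobody gets the same email twice, even if they are both working on the related item and commenting on the AD.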

So far, all of this is working out remarkably well!

Finding Decisions

Another concern I mentioned in my “axiom” is that this information should all be searchable. If you have to spend more than 5 minutes looking for a past decision, you probably won’t do it. Fortunately, for ADs, all their text is searchable via the web interface!

I also went to the effort to provide some other methods to facilitate their use. First, I added the following fields:

  • Category - this is just a free-form plain text field where the architect can fill in a quick description of the “component” or piece of the system to which the decision relates. This was also my way to track exactly what kinds of components this would entail. We’ve got about 250 ADs in the system now, and a set of de facto categorizations has naturally evolved.
  • Functional component – it can be important to track exactly which sets of functionality are being affected by a decision (is this about credit notes replication? Trade saving?). If it affects everything (our Exception Handling Framework), this field is left blank. This field is a drop-down box of pre-defined options.
  • Non-functional component – this is another drop-down of options. It was nearly impossible to pick out a set of non-functional components and stick with it. I started with the “Categories” that people had created on their own. I tossed in some formal components, layers and other elements I’d defined in our official architecture documentation, and so on. The fact is that decisions can affect the software at just about any granularity, hierarchical or not. I’ll write up a separate post some time about categorization of ADs, because it’s a toughy.

So, you can search and sort based on any of the above fields. This can be really useful to answer questions like, “what were all the decisions we made regarding our JMS messaging framework?”

Our web application also lets you define and store reports for repeated use. So I have created a number of views (including things like “which decisions are still open, but due in the next week?”), but there is one that I find very fascinating: The Reigning Decisions Report. This report will show you all the decisions that are in the status “Decision Made”, “Work Scheduled” or “Implemented”. In other words, all the decisions that a developer needs to know at any time about the current system. What’s really cool about this one is that if a decision ever gets “deprecated”, it drops out of the view. If its replacement decision gets made, it shows up instead. This report is ordered and grouped by the Category field.
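The logic of the Reigning Decisions Report is simple enough to sketch in a few lines of Python. The dictionary keys (`status`, `category`, `summary`) and the function name are assumptions made for this example; the actual report is defined in the web application, not in code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# The statuses that count as "reigning".
REIGNING = {"Decision Made", "Work Scheduled", "Implemented"}


def reigning_decisions(decisions: Iterable[dict]) -> Dict[str, List[str]]:
    """Group the summaries of all reigning decisions by category, sorted by category."""
    grouped = defaultdict(list)
    for decision in decisions:
        if decision["status"] in REIGNING:
            grouped[decision["category"]].append(decision["summary"])
    return {category: grouped[category] for category in sorted(grouped)}
```

Because the filter looks only at the status, a deprecated decision drops out on its own, and its replacement shows up as soon as it reaches “Decision Made”.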

Parting Thoughts

I must admit that I had delusions about everyone being able to look at my reports and see the WHOLE ARCHITECTURE with a glance. It’s one of those things that sounds fantastic in theory, but will never happen in practice. Not every decision made is recorded as an AD, even when they should be. I do my best to keep my eye out for changes that should be recorded this way, and people are generally good about doing so themselves, but sometimes things slip through the cracks. Also, the system was already 8 years old when we started doing this. It would be absurd to think that there is value in going back and documenting every old decision in this system.

Also, there are times when creating an AD for every decision made is too much of a burden to make sense. For big projects, there is usually some up-front design and analysis, during which a whole set of features may be discussed. It would be impractical to separate each and every “decision” into its own AD, but I still want to record the rationale behind the design. For these cases, we instead create a brief “design document” that gets saved to our version control, and create a single AD which summarizes the design, and links to the actual document. In this way, history is preserved, the decisions are somewhat searchable, and we don’t waste a lot of time unnecessarily.

Architecture Decisions were simple to create, and are a fantastically easy way to “record our rationale”. More importantly, they enhance our ability to collaborate and to communicate architectural changes, standards and principles that would otherwise be discussed (and remain) behind closed doors. They turn out to be a great way to get developers (those that are interested) involved in the design process before these decisions are force-fed to them. In my opinion, they are a critical part of a successful “Architecture 2.0” environment, and from now on they’re something I’ll never do without.


97 Things Every Software Architect Should Know

August 22, 2008

It looks like the Architect Commandments have shown their face again – and this time, they’re in good company! I caught wind today of a really interesting project that is the brainchild of Richard Monson-Haefel called “97 Things Every Software Architect Should Know”. He’s gathering a list of the top 97 “axioms” that software architects should take to heart. So far, he’s only picked out about half that, and he’s asking for more. I had a look at the axioms, and realized immediately that the Architect Commandments would be right at home on that list. So far, I’ve registered two of my favorites (“Challenge assumptions” and “Empower developers”), and they were accepted for his book!

Now’s your chance to contribute, too. Click on the link above, and get involved, or just check out what people have contributed so far.


Dave Packard’s take on the Architecture Commandments

August 18, 2008

I just ran across this classic set of rules for management by Dave Packard, of HP fame: Dave Packard’s 11 Simple Rules. Succinct and well-written, I think they relate well to two of my Architect Commandments: “Empower developers” and “Show the way”. I guess the underlying message is that, no matter who you are, you are part of a team. Negativity hurts you as much as it hurts the other person. To grow tomatoes, you need to give them water and sunlight. Or something like that.

