Unit tests for the database

A while back (actually, over 2 years ago now!), Scott Ambler was here in Rio for the Rio Java Summit that we were hosting. In his usual controversial way, he made the point that people on the database side of things are way behind the curve on the latest advances in software development, especially agile practices and techniques. Of course, he was promoting his then-recently-released book “Refactoring Databases”. But he was certainly spot-on that people who work directly with databases, including those who program stored procedures and triggers, generally don’t follow the same practices as other software developers, especially those in the object-oriented paradigm. Perhaps this is because database schemas are riskier and slower to change than software (“Refactoring Databases” shows how hard this is: even “agile” schemas are refactored on a time scale of weeks, months or even years, rather than minutes for software). Or, more likely, it’s because there’s something of an organizational and historical wall between the two camps.

Whatever the cause, one thing is clear. There are very few tools available to database developers for things that software developers have come to take for granted: code metrics and analysis, IDEs with support for refactoring, and frameworks for unit testing. What is available is generally either commercial and highly proprietary (Oracle and Quest (the Toad guys) have some fantastic tools for profiling, analysis and even unit testing), or so limited as to be almost trivial. Even when the tools exist, almost no one uses them. Ok, I admit I have no data to support that statement, but I’ve never met a pure DBA, Data Architect or database developer who has written a unit test harness the way software developers know them (have you?). The point is that unit testing and related practices just haven’t made their way into the database community, and as a result there are almost no tools to support them.

FIT4DBs

As part of Scott Ambler’s presentation, he sent out a challenge to anyone interested to start making these tools on their own. As it so happens, my own team had just recently started a project to adapt the FIT testing framework for use with SQL and stored procedures. We’d looked at DbUnit and some others, but the XML-based test and data declarations seemed somewhat unwieldy to us. The result was a framework I called FIT4DBs. We had plans to open source it, but unfortunately the time just hasn’t materialized. I’ll let you know if we ever get around to it, but for now I’d like to just mention a few aspects of the project.

FIT4DBs, as I already mentioned, is based on the FIT framework for testing. FIT comes with its own philosophy about testing, in which the business analysts themselves can define requirements in simple HTML or Excel (or some other table-based format). The developers then write “Fixtures”, which essentially “glue” these requirements to the system code. The result is a set of requirements that are actually testable. Sounds fantastic, and when it works, it really is.

Figure: Simple FIT test for a web site shopping cart

For FIT4DBs, however, it wasn’t this philosophy that interested us – it was the table-driven format for declaring tests. It occurred to me that this format is perfect for when you have an algorithm or procedure that is highly data-driven. If you’ve written a lot of unit tests, I’m sure you’ve dealt with these cases before: some methods only require a few tests to test all the variations, but there are some methods that after 10 or 15 or 20 variations, you keep coming up with more (I find I do this a lot when testing regular expressions, for example).


    public void testCalculateLineItemPriceQty1() {
    	ShoppingCart cart = new ShoppingCart();
    	Item item = new Item(101, "The Best of Wonderella", 10.99F);
    	cart.add(item, 1);
    	
    	assertEquals("Price of one item should equal the unit price", 
    			10.99F, cart.calculateLineItemPrice(item), 0.001F);
    }


    public void testCalculateLineItemPriceQtyMany() {
    	ShoppingCart cart = new ShoppingCart();
    	Item item = new Item(693, "1001 Yo Mama Jokes", 43.00F);
    	cart.add(item, 5);
    	
    	assertEquals("Price of multiple items should follow simple multiplication", 
    			215.00F, cart.calculateLineItemPrice(item), 0.001F);
    }

...and so on...


The problem here is that you have a lot of excess code for what is essentially repetitive work: all you’re doing is trying out different inputs and expecting different outputs. If you come up with a new combination of inputs, you have to copy and paste all these lines of code just to modify one or two parameters. After a while, it becomes nearly impossible to see exactly which tests you have already created (not to mention all the ridiculous method names). FIT provides a very convenient alternative to this pattern. All you have to do is list the variations of inputs and expected outputs in a table, and voilà! If you want to add a new variation, you just add a new row to the table.

Figure: Same test with some new exception cases
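Even without FIT, the table-driven idea can be approximated in plain Java. The following is only a minimal sketch of the pattern, not FIT itself: `lineItemPrice` is a hypothetical stand-in for `ShoppingCart.calculateLineItemPrice`, and the third row is a made-up variation added for illustration.

```java
public class LineItemPriceTable {

    // Hypothetical stand-in for the code under test.
    static double lineItemPrice(double unitPrice, int quantity) {
        if (quantity < 0) {
            throw new IllegalArgumentException("quantity must not be negative");
        }
        return unitPrice * quantity;
    }

    // One row per variation: { unit price, quantity, expected line price }.
    // Adding a test case means adding a row, not a new method.
    static final double[][] TABLE = {
        { 10.99, 1, 10.99 },
        { 43.00, 5, 215.00 },
        { 10.99, 0, 0.00 },   // illustrative extra row, not from the original tests
    };

    // Runs every row and returns how many rows did not match.
    static int countFailures() {
        int failures = 0;
        for (double[] row : TABLE) {
            double actual = lineItemPrice(row[0], (int) row[1]);
            if (Math.abs(actual - row[2]) > 0.001) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failing rows: " + countFailures());
    }
}
```

What FIT adds on top of this is that the table lives in HTML or a spreadsheet rather than in source code, so the variations stay readable and non-programmers can edit them.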

So, the first realization was that database procedures tend to be, well, data-driven. But not in the way that we were talking about here. Usually, what matters is the data that already exists in the database… in the form of tables. And really, that’s the hardest part about testing code that depends on the database: setting up the data the way you want it, and making sure it’s there in a pure and unadulterated form (otherwise, you may get incorrect results). What better way to declare your test data than to put it in tables, just the way it should look in the database? So, we made special set up fixtures that let you declare the name of a table and the columns into which you want to insert your test data. Each row in the HTML or spreadsheet declares a row that will be inserted into the database before the test begins. When the test has completed, either the tables that were set up are truncated (the tests assume they were empty prior to initialization) or the transaction is rolled back. In 9 out of 10 cases, this is sufficient for removing any side effects of the test and keeping each test independent of the others.

Figure: A set up table for Items in the database
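To give a feel for what such a set up fixture has to do, here is a rough sketch using only standard JDBC. It is not the FIT4DBs code: the class and method names are invented for illustration, and it assumes the table and column names come from trusted test files, since they are concatenated straight into the SQL.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

public class SetUpTableSketch {

    // Builds a parameterized INSERT from the declared table and columns,
    // with one "?" placeholder per column.
    static String insertSql(String table, String[] columns) {
        String cols = String.join(", ", columns);
        String marks = String.join(", ", Collections.nCopies(columns.length, "?"));
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")";
    }

    // Inserts each declared row before the test runs. The caller is assumed to
    // have turned auto-commit off so that the rows can be rolled back afterwards.
    static void insertRows(Connection conn, String table, String[] columns,
                           List<Object[]> rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(insertSql(table, columns))) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameters are 1-based
                }
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```

With auto-commit off, a `conn.rollback()` after the test corresponds to the rollback clean-up option described above; truncating the tables would be the alternative.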

Next, we needed some Fixtures for testing stored procedures and free-form SQL. This was relatively simple. The SQL statement or procedure call is written as plain text in the test, and the fixture executes it against the database through JDBC.

Figure: Commands to execute directly in the database
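The text-to-JDBC step might look roughly like the sketch below. This is an assumption about the general shape, not the actual FIT4DBs fixture: the names are hypothetical, and it assumes procedure calls are written in the JDBC escape form `{call ...}` while everything else is treated as free-form SQL.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class CommandSketch {

    // Recognizes the JDBC escape syntax for procedures and functions,
    // for example "{call restock_item(101, 5)}" or "{? = call f()}".
    static boolean isProcedureCall(String command) {
        String trimmed = command.trim();
        return trimmed.startsWith("{") && trimmed.toLowerCase().contains("call");
    }

    // Executes one line of test text against the database.
    static void run(Connection conn, String command) throws SQLException {
        if (isProcedureCall(command)) {
            try (CallableStatement call = conn.prepareCall(command)) {
                call.execute();
            }
        } else {
            try (Statement stmt = conn.createStatement()) {
                stmt.execute(command);
            }
        }
    }
}
```

The procedure name `restock_item` in the comment is made up; any statement the driver accepts could be passed to `run`.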

The hard part is then comparing the results. While simple JUnit tests can usually limit the scope of their actions to the simple input and output of a single method (some tests require mock objects to test side-effects), stored procedures can affect any number of tables in any number of ways (they can even affect the schema itself). Not only that, but triggers may affect even tables not explicitly included in the scope of the procedure itself (and we DO want to test triggers, right?). This is not an easy thing to test, but FIT4DBs makes a heroic effort to make it simple for you.

What it comes down to in the simplest of terms is looking for “diffs” on the data contained in the tables. This is done by first “registering” the tables and the starting data via the set up tables. That’s right, the data goes not only into the database, but into in-memory hash maps that keep track of what the “before” data looked like. Once the test has been executed, the tables can be re-queried to see if their data has changed in any way. A “SELECT *” is executed on each table, and the results are sucked into “after” hash maps. There are then Fixtures that can be used to declare tables for diff comparisons. Each row that is declared in the test is an expected difference. The differences can be of type “inserted”, “deleted” or “updated”. Comparisons are based on the primary keys, which are declared as part of the table syntax, so if a primary key changes as part of the test, it will look like the row has been deleted and a new one has been inserted.

Figure: Testing expected changes in the items table
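The before/after comparison can be sketched in a few lines of Java. This is a simplified illustration rather than the FIT4DBs implementation: it assumes a single integer primary key and represents each row as a list of its remaining column values; the sample data in `demo` is invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TableDiffSketch {

    // Compares two snapshots of one table, each keyed by primary key.
    // Reports "updated", "deleted" and "inserted" rows by key.
    static List<String> diff(Map<Integer, List<Object>> before,
                             Map<Integer, List<Object>> after) {
        List<String> changes = new ArrayList<>();
        // TreeMap copies make the report order deterministic (sorted by key).
        for (Map.Entry<Integer, List<Object>> e : new TreeMap<>(before).entrySet()) {
            if (!after.containsKey(e.getKey())) {
                changes.add("deleted " + e.getKey());
            } else if (!e.getValue().equals(after.get(e.getKey()))) {
                changes.add("updated " + e.getKey());
            }
        }
        for (Integer key : new TreeMap<>(after).keySet()) {
            if (!before.containsKey(key)) {
                changes.add("inserted " + key);
            }
        }
        return changes;
    }

    // Sample run: row 101 changes price, and row 693 gets a new key (702).
    static List<String> demo() {
        Map<Integer, List<Object>> before = new HashMap<>();
        before.put(101, Arrays.<Object>asList("The Best of Wonderella", 10.99));
        before.put(693, Arrays.<Object>asList("1001 Yo Mama Jokes", 43.00));

        Map<Integer, List<Object>> after = new HashMap<>();
        after.put(101, Arrays.<Object>asList("The Best of Wonderella", 9.99));
        after.put(702, Arrays.<Object>asList("1001 Yo Mama Jokes", 43.00));

        return diff(before, after);
    }
}
```

In the sample, changing the key from 693 to 702 is reported as one deletion plus one insertion, which is the primary-key behaviour described above. A test then passes when the reported differences match the rows declared in the expected-changes table.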

And that’s pretty much it! There are also facilities for things like declaring your connections, and managing suites of tests by, for example, reusing set up data via a custom “include” facility. DbUnit and similar frameworks work in some of the same ways, but they usually declare their set up data via XML. That just never felt right to me: it’s too hard to edit and to read what you have. Additional benefits of FIT itself include the ability to throw in comments wherever you want (anything that’s not in a table is basically considered a “comment” and ignored, which lets you add formatting and even images to your tests if that helps readability), and the fact that the results show up as exactly your input HTML, with green, red and yellow colors to show passes, fails and errors.

Test Coverage

One other piece that is typically missing from the database development suite is tooling for reporting how much of the procedural code is covered by the unit test suite. This more or less goes without saying, since unit testing in general is rather neglected. Apparently Quest Software offers a rare tool that can provide this sort of information if you happen to have written your tests with their suite, but most of our tests are defined with the FIT4DBs framework. We also have other automated tests, not specific to FIT4DBs, that exercise our PL/SQL code, so checking coverage only with that tool would be insufficient.

Eduardo Morelli, our Data Architect, has come to the rescue with an excellent, thorough and flexible solution to this problem. Unfortunately, it’s Oracle-specific, so it will only help you if you work with Oracle. I’ll save the details for another post – stay tuned!

So, Mr. Ambler, I hope this all comes as some encouragement to you. It’s been a long road to rise to your challenge, and it’s still ongoing, but Sakonnet at least is one shop that prefers to use agile development practices on both the software and database sides. I hope that some time soon we can get the resources to open source our FIT4DBs framework for others to use. In the meantime, you can give DbUnit and other tools a shot, or just test your code in an integration environment through JUnit, FIT and other automated harnesses. If you use Oracle, you can even use the techniques I’ll describe in my next post to report on your test coverage. It’s all about the data, so get out there and test the code that’s mucking around with it!
