You’re given a simple task: get some XML data from a URL or web service, convert it to something else, and send it off down stream to some other system. Easy enough, right? Somewhere in the middle of your “80 percent done” report, you realize that the original XML is missing one silly little field – the customer’s middle name, the timezone on the date, the preferred nickname for their online avatar, whatever. Unfortunately, this little detail is critical for completing your task at hand.
If there’s no way to get this information, there’s very little you can do, besides ask for an enhancement and wait. But very often, the data’s there for the taking; you just have to go out and get it from somewhere else. Just as common, however, that “somewhere else” is not as accessible as you’d like. If you’re lucky, you can just pull another object or rowset out of the same database. If you’re not, good luck with that “80 percent” thing…
Here are some of the complications you may come across in a data scavenger hunt:
- The data’s there, but you have to torture it out of data structures not meant to provide it (e.g. it’s buried in a string of text with no standard format, or in a numerical field with no clear designation for varying units)
- The data’s on a remote server: performance may be impacted, you have to deal with the problem of making the call (if at all possible), handling errors, and so on
- You don’t have the privilege to access the data
- The data is not guaranteed to be transactionally consistent (you may be getting some stale data, or the new data reflects changes that aren’t seen by the rest of your data set)
- The data is in a log file, system configuration file, admin-only database tables, or some other unholy “do not touch this!” artifact
Each one of these little beauties is a mini-rat’s nest of complexity creeps of their own. What do you do if your application doesn’t have privileges to get the data? Implement a “run-as” feature just to get the one field? Hard-code the root user password? Convince the guys on the other side of the fence to give you a user and wait?
In some cases, you may actually be able to modify the code itself to fit your needs. But that requires a new release of the software, which, if you’re accessing it from the outside, may not be an option (I’ll be posting another Complexity Creep article on this one some time soon). Also, this can lead to problems of its own: the “universal interface”, or the one-size-fits-all façade. I’ve worked with interfaces like this, where every new feature begets a new variation on the same old methods, just to avoid the need for scavenger hunting. It happens with database views, and in XML, too: to avoid making multiple remote requests, you keep adding new fields to your XML entities, until your Company XML document includes a list of all employees barfed up somewhere in the nested tags, complete with their “Address2″ field and their in-flight meal preferences. This solution is the evil twin of the Data Scavenger: the Data Tar Baby.
Data scavenging is a common problem in integration projects, which is one of the reasons they can be so tough. But it can happen when building monitoring utilities, reporting features, or just about anywhere information is required. Unfortunately, when working in the “information technology” field, the odds are pretty high you’ll come across this more often than you’d like. And yet, when you do, it always seems to be that one last insignificant detail that turns your routine query into a scavenger hunt.