
Spatial Metadata Wrecks - Part 1

Wednesday, February 27th, 2008

The nzopengis group recently discussed the possible use of the University of Tasmania’s Bluenet Metadata Entry and Search Tool by the New Zealand Geospatial Office. Bluenet is a version of the GeoNetwork tool that supports the ANZLIC metadata profile. It got me thinking about what it takes to make a successful metadata system. I knew I’d bitten off more than I intended when my search for a foundation under the metadata system problem ran me straight into Aristotle.

The IT landscape is littered with metadata wrecks. The fundamental business problem with most metadata systems is that the cost and value centers are too far apart. Those who pay the cost of collecting and maintaining metadata aren’t the ones who benefit from it. Those who benefit are usually far removed from those who pay. They can be removed by intent (how they classify and use the data), by location (or, more accurately, relationship) and, worst of all, by time. For a sobering read on why metadata systems don’t work, see Cory Doctorow’s Metacrap piece.

The secret to making a geospatial metadata system work is bringing cost and value closer together. That’s obviously not an easy problem to solve (or else we wouldn’t have so many metadata wrecks). Lowering the cost of collecting and managing metadata (which is where Bluenet GeoNetwork comes in) is a start, but it doesn’t move cost and value any closer together. Solve that problem, and you’ve got yourself a fighting chance at a working metadata system. The alternative is to add 1 to the count of existing metadata wrecks.

Before we can solve that problem, we need to understand the nature of it. If we look at the cost and value propositions of metadata, we see that metadata often comes last in the chain of data creation events. That’s because it’s not central to the immediate needs of those collecting the data. They know the constraints they’re operating under when they collect the data. They know the accuracies of their data collection methods. They understand the attributes of the data. They’ve got their own informal metadata system happening: “Hey Bob, how good are the boundaries we digitized for suburb X?” “Aw, about 10m shifted NW with a 3 degree rotation at control pt 1, but fine for the power retic schedules you’re doing.” Yes, that’s very informal, but it’s metadata and it’s valuable. Recording those attributes formally for others comes way down the list of priorities. Typically it adds cost without adding much value for those collecting (and supposedly documenting) the data. No matter how enthusiastically they start out, they end up doing a poor job of the formal documentation bit. That’s because there’s insufficient value in it for them.
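To make the point concrete, here’s a minimal sketch of what writing Bob’s answer down might look like. The `QualityNote` class and all its field names are made up for illustration; they don’t come from ISO 19115, the ANZLIC profile or GeoNetwork. The only point is that the informal remark already contains everything a formal record would need.

```python
from dataclasses import dataclass, asdict


@dataclass
class QualityNote:
    """One informal remark about a dataset, captured in structured form.

    Hypothetical structure for illustration only; not a standard schema.
    """
    dataset: str           # what the remark is about
    offset_m: float        # approximate positional shift, in metres
    offset_direction: str  # compass direction of the shift
    rotation_deg: float    # approximate rotation, in degrees
    rotation_about: str    # reference point for the rotation
    fit_for: str           # the use the data is still good enough for
    source: str            # who said it


# Bob's answer from the conversation, written down once.
note = QualityNote(
    dataset="suburb X boundaries",
    offset_m=10.0,
    offset_direction="NW",
    rotation_deg=3.0,
    rotation_about="control pt 1",
    fit_for="power retic schedules",
    source="Bob",
)

# A plain dictionary is enough to hand on to whatever system stores it.
record = asdict(note)
print(record)
```

Even something this small costs Bob time he has no immediate reason to spend, which is the whole problem.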

When we look from the point of view of removed consumers of the data, it’s a different story. They’re screaming out for metadata. We’ve invented all sorts of fancy schemas and what-not for managing it. Problem is, once the grand schemas are written and the systems developed, data consumers have no say in the metadata story. They have a vested interest in the quality and quantity of the metadata, but no (realistic) means to amend or add to it. They’re not the custodians. There goes their chance to spend resources (time and effort mostly) on recording or validating the metadata that’s important to them. Right there is a serious misalignment of the value and cost propositions in the metadata story.

So what is this metadata that creates so much supposed value? The trite answer is that metadata is data about data. At that point we usually move on, content that we have a self-understood term. Instead, I’d like to use an alternative definition from David Weinberger, noted internet information systems expert and author of the recently published book “Everything Is Miscellaneous” (and here for the Google video). David gives us “metadata is what we know, data is what we want to know”. Using that definition, we get a whole new slant on how we approach the metadata value problem. No longer do we have a simple divide between data (shapefiles, imagery, map sheets, OGC web services etc) and our ISO 19115 ANZLIC metadata profile. Sometimes the data will be the metadata and the metadata will be the data. Unraveling that requires that we understand the principles on which we classify information. At that point, we come face to face with Aristotle, and his model of information classification that we have used ever since. That’s a big topic in its own right, so I’ll save it for part 2, along with how the nature of the separation between metadata producers and consumers has an effect on our solution.