Open data: not an open, nor shut case

2012-01-10

Who wouldn't want data to be open? In IT circles, the usual opposite of 'open' is 'closed', but it could also be 'shut', 'exclusive', 'inaccessible' or 'locked-in'. All of these can be associated with feelings of frustration, of missing out, of being prevented from sharing in whatever benefits lie on the other side.

The reasons why data can be rendered inaccessible are not always simple. For a start, data has an inherent cost to collect, collate and otherwise control. Sensors need to be placed and connected, spreadsheets and databases filled and formats converted, servers and storage devices spun up and kept running, people paid and problems solved. That cost needs to be amortised somehow. Whatever 'open data' may mean, it can never be considered 'free'.

Once data has been gathered, however, it can be given away - either because an individual or organisation wants it so, or because a government (through actions or laws) deems it so. Governmental information sources are obvious candidates for the 'open' tag, in the UK's case with data.gov.uk collating and enabling access to as much non-personal data as is feasible. This is, of course, to be applauded - it serves a very useful purpose, in that government does not, by itself, have the resources to do all that is possible with the data.

Examples such as the Ordnance Survey, roughly 50% of which is funded by the taxpayer, show, however, that 'giving' should not be a simple, blanket stance. Data has a cost, and a value that people are prepared to pay for, either in its raw or derived form (such as maps). As such, it is worth considering the commercial value of data, and balancing this with the reality of how that data was funded, e.g. by the taxpayer.

Which brings us to commercial data sources. Some companies have built substantial businesses on the back of data gathering, interpretation and analysis. Indeed, in the IT industry alone, the global IT analyst market is said to be worth $2 billion a year.

Some sectors, such as pharmaceuticals, are highly dependent on data: they spend a lot of money on it, and their products are largely based on it. From a business perspective, profit boils down to the revenue that can be generated from a new product, minus what it cost to create. In the case of pharma, then, anything that reduces the cost of creating data has to be a good thing.

This position becomes more tenuous when organisations use tactics to make data more expensive, or even prohibitively so, for the competition, for example through patent law. The dubious practice of companies collecting samples from rainforests and attempting to patent anything with potential healing properties is one such example.

Data recipients can also misuse information that they have bought, or which has been freely given. A clear challenge is the nature of aggregated data analysis: for example, linking internet search information with current events and depersonalised references can be enough to derive quite invasive information about specific people.

We've heard, for example, how Google can follow an outbreak of an illness by tracing where searches for medical advice are taking place. While this is a positive example (in terms of giving local clinics an anonymised picture), it is quite another thing to then link this to commercially available databases and direct-market the antidote.

Things can get even hazier where the data source, the collecting organisation and the data customer have different interests. Who owns your data, for example your blood group or your credit history? Who owns the data about the shape of your garden, or the drainage properties of your fields? Who owns the information on the chip in your dog? The recent example of NHS data being sold, albeit anonymised, to pharmaceutical organisations shows what a can of worms this can be.

The aggregation issue raises stark questions about privacy. Computer company bosses from Scott McNealy to Eric Schmidt have declared privacy to be dead (with McNealy famously saying, "Deal with it."), but privacy might not be as dead as they claim, at least in the minds of ordinary people. Nonetheless, the nature of privacy is changing, in that people have to decide what they are prepared to give away on the basis of a fragmented and sometimes deliberately obfuscated picture (as with the terms and conditions of social networking sites).

The question of privacy looms at a national level as well. Julian Assange may have become the darling of the 'open' movement for his role in Wikileaks, as the organisation shone a torch into some of the murkier corners of our political systems. But openness is a sword that cuts both ways, and it needs to be handled with care: without citing specific examples, there are some things it would be insane to publish, particularly if they put lives at risk.

Opening data, then, like opening doors, requires forethought. As Pandora discovered when she took the lid off her mythical jar, open is not always better than shut; it depends on what lies inside, and how it relates to everything else. The protections currently available are neither universally good nor universally bad. We need privacy laws that respond to today's realities, as well as patents, IP and copyright, but such things can be misused, as can any tool.

So let's not just bandy the 'open' tag around as though it will always be a good thing. Rather, let's see 'open' as an opportunity to decide, for each source, how the benefits of enabling access weigh against the risks. The one thing we should keep open above all is our minds.

[Originally published on publictechnology.net]