Open data – experience needed
This is something of a follow-up to my previous blog post, “Confessions of a recovering statistician”, with some thoughts on the ongoing move towards open data.
The open data movement is proceeding apace, with more and more development data being made publicly available and better tools emerging to manipulate and visualize it. The World Bank now makes all its development data available for free and allows datasets to be easily accessed through its API. More and more development agencies are joining IATI (the International Aid Transparency Initiative) and publishing their aid spending, and soon their project documents, in a standard format. The Center for Global Development now publishes the datasets and methods used for all the papers it produces so that the results can be independently verified. Lots of NGOs, social enterprises and other groups are crowdsourcing data directly from communities and individuals and making it publicly available, whether through Ushahidi deployments or community mapping. Soon anyone will be able to use and analyze data in volumes and of types never seen before.
This is a very good thing.
But one interesting aspect of this is that, while you might be tempted to conclude that data is now a more valuable resource than individual knowledge, I think the reverse is actually true.
Although anyone can download a dataset, manipulate it and create visualizations from it, not everyone has the skills to do so properly. Analyzing data, making sense of it and knowing how to use it to inform decision-making is a specialized skill, and not one that everyone masters. As I mentioned in my previous post, some kinds of analysis require modelling and other techniques which, although they can be automated, need to be properly understood to be used correctly. Similarly, knowledge of the data sources, their reliability and the context is important for interpreting the data correctly.
As more data becomes available, this specialized skill will be in increasing demand, and the work of those individuals and organizations who can do it will be at a premium. At present there are just not enough people with these skills around, and it will take time for people to be trained in them.
Similarly, as data becomes more readily available, it also puts a premium on the kinds of knowledge that cannot easily be boiled down into data points, such as experience, social networks and interpersonal skills (or, in the words often attributed to Einstein, “Everything that can be counted does not necessarily count; everything that counts cannot necessarily be counted.”). Even interpreting data and turning it into politically feasible policy recommendations requires not only technical knowledge but also experience and judgement.
In a way, the benefit of open data is that it frees up the time and effort spent just trying to collect or get access to data, allowing us to spend more time analyzing, interpreting, thinking and ultimately doing, and the people and organizations best equipped for these tasks will be the ones that prosper.
One potential negative side effect of opening up data to all is that there will be a boom in poorly done, misleading secondary analyses and attractive but inaccurate data visualizations, and conclusions will be drawn and decisions taken on the basis of faulty analysis. On the other hand, these analyses can be reproduced, checked, and corrected or countered by others. In the shorter term, instead of getting no data or informed analysis on a topic, we might get multiple analyses with different conclusions competing with each other. But thanks to this debate and these self-correcting mechanisms, in the long run those who analyze data will be more accountable for what they do, and analysts will build reputations that help the better analyses rise above the poorer ones.
And it’s important to remember that even experts make mistakes, but these can now be corrected by “the crowd”. In this case, though, the crowd isn’t the general public but rather people who have the required experience and technical skills and who don’t sit in the organization that collected or produced the data in the first place. This way, more expert eyes on a dataset can both produce new analyses and validate those that have already been produced.