KM on a dollar a day

Musing on knowledge management, aid and development with limited resources

Confessions of a recovering statistician


Since I’ve just started my new job, I’m not ready yet to blog about the KM aspects of it, so instead I’m writing about what I’ve learned from my past experience.

I have a confession to make. I am, by academic training at least, a statistician, although it’s a very long time since I worked as one. Even though I no longer work in statistics, I gained a few useful perspectives from my academic training and my early career in the UK government statistical service that I wanted to share. Here are a few thoughts:

  1. Data, and the ability to analyze and understand it, are very powerful. They make it possible to gain amazing insight into the world – and sometimes into the world’s problems, how they work, and occasionally how to address them.
  2. Despite this, numeracy is often underrated in many educational systems (it certainly was in the UK when I was growing up), even among the political elite. Too many opinion leaders and policy makers shamelessly admit that they don’t understand official statistics or graphs and charts – yet the same people would be much less willing to admit that they had never really mastered reading or writing.
  3. Basic statistical literacy is not actually that hard. It’s not difficult to learn how to understand fractions and ratios, how to read data tables, graphs and charts, or how to use them effectively (and honestly) in communicating statistical data – if more effort were placed on teaching these skills and they were more valued. Just this basic understanding could help avoid many incorrect interpretations of data, and the faulty decisions that flow from them.
  4. BUT – some aspects of statistics ARE highly specialized and require experts. Examples include designing sampling schemes for surveys, developing experimental designs that allow you to test hypotheses, and econometric modelling. Even then, it’s not uncommon to see errors and disagreements in design or in the interpretation of results – so it’s good to use experts for expert work and to have some mechanism to peer-review it (even better if you can publish both your methods and datasets so that anyone with the right skill set can check them). See this notable illustration of the need to check your methods, on the cost effectiveness of deworming.
  5. Users of statistics, such as politicians and journalists, often forget that statistics are usually an estimation of the situation in the real world – not the literal truth – whether because the statistic is based on a sample or is measured indirectly or incompletely. An estimate is our best guess at the real situation, not reality itself, and as such it is subject to error and only as good as the approach taken and the quality of the data used. Ideally any estimate should be accompanied by a standard error that gives you an idea of how accurate it really is – but this is rarely given or used (see the short worked sketch after this list).
  6. Some things are even worse – they are based on models. Maternal mortality figures are an example. Bill Easterly recently commented on the use of “inception statistics” – a model within a model within a model – when looking at stillbirths. Often this might be the only way to estimate something, but we need to be wary in interpreting and explaining the results, and aware of the implications of the (sometimes heroic) assumptions made and the sensitivity of the indicators to them.
  7. You treasure what you measure – we often seek to identify measurable, quantifiable indicators to help monitor progress, whether for development goals or for process indicators in our projects. But it’s important to remember that when we put these numbers into our frameworks and set up means of collecting them, we risk focusing on improving the numbers themselves rather than on the underlying issues we are seeking to address. This becomes all the more the case when rewards, personal or institutional, are based on hitting the numbers.
  8. Understanding statistical data requires not only statistical expertise but also contextual knowledge to interpret it. It’s often tempting to start going beyond what the numbers themselves say to suggest explanations for what they mean – but unless you have a good understanding of the specific context (culture, politics, biology etc.) then “common sense” assumptions and explanations might well be inaccurate or just plain wrong.
  9. Interpretation is not free of conscious or unconscious biases. People often look to data to find confirmation of their existing beliefs, rather than impartially considering all possible explanations – the famous “confirmation bias”.
  10. Data can be great for persuading people – but you have to be good at writing and talking about it too, both to explain it accurately in plain English and to use it persuasively. Good data with poor explanation – especially for audiences who are not data literate – is a poor persuader.
  11. Sometimes you have to take a position if you want to take action even when you don’t have all the facts: Those who understand statistics and produce them are often rightfully cautious about how they are explained and interpreted for some of the reasons above. But taken at face value this can lead to paralysis. To use data to take action you need to strike a balance between seeking the most complete and reliable information, and taking timely and politically pragmatic action with incomplete data.
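To make point 5 a little more concrete, here is a minimal sketch – my own illustration rather than anything from the list above, using made-up numbers – of how a point estimate and its standard error might be computed from a simple random sample:

```python
# Minimal illustration (hypothetical data): a point estimate and its standard error.
import math
import statistics

# Hypothetical sample: daily household spending (in dollars) from a survey of 25 households
sample = [310, 450, 275, 390, 505, 620, 330, 415, 480, 295,
          360, 540, 410, 385, 455, 300, 575, 340, 495, 430,
          370, 520, 405, 350, 465]

n = len(sample)
mean = statistics.mean(sample)   # point estimate of the population mean
sd = statistics.stdev(sample)    # sample standard deviation
se = sd / math.sqrt(n)           # standard error of the mean: sd / sqrt(n)

# A rough 95% interval around the estimate (normal approximation)
low, high = mean - 1.96 * se, mean + 1.96 * se

print(f"Estimated mean: {mean:.1f} (standard error {se:.1f})")
print(f"Approximate 95% interval: {low:.1f} to {high:.1f}")
```

Reporting the interval alongside the estimate signals how far the figure could plausibly move simply because it is based on a sample – which is exactly the caveat that rarely survives into the headline number.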
You will notice that some of these, especially the last three, might be seen to be in contradiction with each other. Statisticians and policy makers often find it difficult to see eye to eye on the appropriate use of data in decision-making. There is a trade-off between knowing enough and taking action – and also a reality that data is not the only thing that can and should factor into decision-making, both because of the limitations of what data can currently do and because of the need to factor in other, less tangible knowledge such as experience, culture and politics. But if policy makers can become more attuned to data – understanding and respecting it more, and knowing how to use it more effectively – and if statisticians can better explain what they do, explain what their research means, and acknowledge and work with the needs of policy makers, then there is a lot of potential for decision-making to be not data driven, nor data ignorant, but data informed.

Written by Ian Thorpe

October 3, 2011 at 2:04 pm


4 Responses


  1. Hi Ian, I hope you settle well into the new role, and enjoy the fresh challenges.

    I was discussing the Open Data movement recently with some colleagues, and its impact on the interpretation of information. This is a major feature of the public sector landscape in the UK, and I believe the US and other countries, where government departments and certain other types of business are being encouraged (or coerced depending on your viewpoint) to share information about how they run, and in certain circumstances about the services and products they offer.

    This new era of transparency allows the common folk like us, usually mere shareholders or voters, to inspect information about the way that organisations are run, which should on the whole empower us to make decisions based upon solid facts, rather than mere reported speculation. I was fortunate enough to spend a day on an Open Data Master Class (run by the Ordnance Survey, the UK’s mapping agency), learning how to obtain, blend and interpret information that just a few short years ago I could never have dreamt of getting hold of without some kind of James Bond shenanigans. OK, Cubby Broccoli would never have sold films talking about district boundaries, school attendance figures, antisocial behaviour orders or civil servants’ expenses, but the point is that you and I can now lay our hands on a wealth of information about the services that governments deliver to us, with just a few clicks of the mouse.

    And as much as this is a boon to democracy, it does present an issue in the ways that the information is analysed and interpreted. Minions in GCHQ may be trained in the ways of mathematics, statistics, logical extrapolation and reasoning, whereas I, members of local parents’ associations, political pressure groups (and even some professional journalists) have been trained more as mere consumers of pre-digested information. All of a sudden hordes of amateurs are being granted access to vast quantities of raw data, and allowed to form and disseminate opinions without necessarily understanding the subtleties of the art.

    Moving forward we would hope that statistical prowess would come to show itself, that people who really understand how to make sense of this information deluge will shine a beacon – showing not only the truest facts at the heart of the matters, but also encouraging others to learn the best ways to make their own interpretations as valid as possible. However, I think that we need to be prepared for a barrage of “facts” that are based upon real data, but where perhaps the reasoning and interpretation leaves something to be desired.

    As much as it is laudable to share ‘public’ information so Openly, it could prove to be a double-edged sword.

    Arthur M. Gallagher

    October 3, 2011 at 9:10 pm

    • Arthur – thanks for your comment. This issue is actually part of what I was planning to write about in my next blog post 🙂

      Ian Thorpe

      October 4, 2011 at 4:52 am

  2. While Google Earth and other tools have opened up data dissemination to the masses, there are others who are actively fighting against this in the interests of cost recovery, privacy, homeland security or sometimes general data paranoia.

    Here’s a recent decision that sparked a lot of controversy:
    http://egis3.lacounty.gov/eGIS/index.php/2011/06/02/orange-county-parcel-database-exempt-from-public-records-act/

    Fiona

    October 4, 2011 at 7:57 pm

  3. […] is kind of a follow up to my previous blog post “Confessions of a recovering statistician” with some thoughts about the ongoing  move towards open […]

