Steve Mac Feely*
Department of Statistics and Information, University College Cork, Ireland
*Corresponding author: Steve Mac Feely, Head of Statistics and Information, United Nations Conference on Trade and Development, Switzerland; Adjunct Professor, Centre for Policy Studies, University College Cork, Cork, Ireland
Submission: May 08, 2018; Published: June 15, 2018
Over recent years the potential of big data for government, for business and for society has excited much comment, debate and even evangelism. Described as the ‘new science’ with all the answers, or as a paradigm-destroying phenomenon of enormous potential, big data are all the rage. Official statisticians, who already have a long history of using non-survey data, often very large in volume, must decide whether big data is really something new and useful or just hype. On the one hand, some argue that big data needs to be seen as an entirely new ecosystem comprising new data, new tools and methods. Others argue, to the contrary, that big data is just hype and that big data are just data. In deciding whether big data can be useful for official statistics, National Statistics Offices (NSOs) must keep the protection of confidential data at the top of their decision-making tree.
For official statistics, safeguarding the confidentiality of individual data is sacrosanct and is enshrined in Principle 6 of the United Nations Fundamental Principles of Official Statistics, which states ‘Individual data collected by statistical agencies for statistical compilation, whether they refer to natural or legal persons, are to be strictly confidential and used exclusively for statistical purposes.’ The UN Handbook of Statistical Organization, too, ‘underscores repeatedly the requirement that the information that statistical agencies collect should remain confidential and inviolate.’ The Scheveningen Memorandum, prepared by the Directors General of NSOs in the European Union, identified the need to adapt statistical legislation in order to use big data, both to secure access and to protect privacy. The failure to treat individual information as a trust would prevent the statistical agency from functioning effectively. For an NSO to function, the confidentiality of the persons and entities for which it holds individual data must be protected, i.e. it must guarantee to protect the identities and information supplied by all persons, enterprises or other entities, and guarantee that their data are used for statistical purposes only. In short, everyone who supplies data for statistical purposes does so with the reasonable presumption that their confidentiality will be respected and protected. In most countries, safeguarding confidentiality is enshrined in national statistical legislation. But with the increased volumes of big data being generated, and the potential to match those data, greater attention must be paid to data suppression techniques to ensure confidentiality can be safeguarded.
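As an illustration of what a data suppression technique can look like in practice, the following is a minimal sketch of a minimum-frequency (threshold) rule applied to a frequency table. It is not drawn from any particular NSO's practice: the function name, the field names, the sample records and the threshold of 3 are all assumptions chosen for the example.

```python
from collections import Counter

def suppress_small_cells(records, keys, threshold=3):
    """Tabulate records by the given keys and suppress small cells.

    A cell whose count falls below `threshold` is replaced by None,
    so that it cannot be published. The threshold of 3 is an assumed
    value for illustration, not an official standard.
    """
    counts = Counter(tuple(record[k] for k in keys) for record in records)
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}

# Hypothetical microdata: one row per enterprise.
records = [
    {"region": "North", "sector": "Retail"},
    {"region": "North", "sector": "Retail"},
    {"region": "North", "sector": "Retail"},
    {"region": "South", "sector": "Retail"},
    {"region": "South", "sector": "Retail"},
    {"region": "South", "sector": "Mining"},
]

table = suppress_small_cells(records, ["region", "sector"], threshold=3)
for cell, count in sorted(table.items()):
    print(cell, "suppressed" if count is None else count)
```

Primary suppression of this kind is only a first step. If row or column totals are also published, a suppressed cell can often be recovered by subtraction, so secondary suppression or other protections are usually needed as well, and the ability to match big data sources makes that assessment harder.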
The emergence of big data is forcing many challenging questions to be asked, not least with regard to privacy and confidentiality. Mark Zuckerberg, the founder of Facebook, famously claimed that the age of privacy is over. Scott McNealy, CEO of Sun Microsystems, also famously asserted that concerns over privacy are a ‘red herring’ as we ‘have zero privacy’. Many disagree and have voiced concerns over the loss of privacy [10,11]. Fry has likened developments with regard to big data and the loss of privacy to the opening of Pandora’s Box, or what he terms Pandora 5.0. The introduction in Europe of the new General Data Protection Regulation, which comes into effect in 2018 and reinforces citizens’ data-protection rights, including among other things the right ‘to be forgotten’, suggests that privacy is still a real concern, at least in some regions of the world. By contrast, in the United States, users who provide information under the ‘third-party doctrine’, i.e. to utilities, banks, social networks etc., should have ‘no reasonable expectation of privacy.’
This introduces two new challenges for official statisticians: one technical and one of perception. The technical challenge arises from the availability of large, linkable datasets, which reopen a problem thought to have been solved in traditional statistics: anonymisation. With big data, combined with the enormous computing power available today, it is clear that simply removing personal identifiers and aggregating individual data is not a sufficient safeguard. A paper by Ohm outlining the consequences of failing to adequately anonymise data graphically illustrates why there is no room for complacency. Thus a problem that had been solved in the context of traditional official statistics must now be re-solved in the context of a richer and more varied data ecosystem. The changing nature of perception is arguably a trickier problem.
What if Zuckerberg and McNealy are correct and future generations are less concerned about privacy? There appears to be some evidence to suggest that they may be correct. It seems there are clear inter-generational differences in opinion vis-a-vis privacy and confidentiality, where those ‘born digital’ are less concerned about disclosing personal information than older generations. Taplin ponders this, musing ‘It very well may be that privacy is a hopelessly outdated notion and that Mark Zuckerberg’s belief that privacy is no longer a social norm has won the day.’ If this is so, what are the implications for official statistics and anonymisation? If other statistical providers, not governed by the UN Fundamental Principles, take a looser approach to confidentiality and privacy, it may leave official statistics in a relatively anachronistic and disadvantaged position vis-a-vis other data providers. But moving away from or discarding Principle 6 of the UN Fundamental Principles of Official Statistics would seem to be a very risky move, given the importance of public trust for NSOs.
Taplin argues that we trade our privacy with corporations in return for innovation or benefits, ‘but it is one thing to forfeit our privacy as individuals to a company that we believe is delivering a needed service and another to open our personal lives to the federal government.’ Mac Feely has warned that if the benefits of privacy are insufficiently clear to the public or policy makers, then it leaves official statistics vulnerable, and possibly facing a precarious and bleak future. Rudder highlights this challenge too, noting that ‘the fundamental question in any discussion of privacy is the trade-off - what you get for losing it.’ Like Taplin, Rudder also argues that the trade-off benefit with the private sector is clear: better targeted ads! He argues that ‘what we get in return for the government’s intrusion is less straightforward.’
McNealy too, who seems unconcerned about the lack of privacy in the private sector, takes a very different attitude when it comes to government, saying ‘It scares me to death when the NSA or the IRS know things about my personal life and how I vote. Every American ought to be very afraid of big government’. Curiously, while there is a real fear of a government Big Brother, there appear to be few concerns regarding the emergence of a corporate Big Brother. A challenge for official statistics is how to put clear blue water between the NSO and the other institutions of government from the perspective of data sharing, while still highlighting the common benefits of official statistics as a public good. To some extent there is ideology at play here, where a neo-liberal agenda is pushing to minimize the role of the public sector, but it also illustrates the challenge facing national governments and their agencies where their contribution to the wellbeing of economies and societies is poorly understood.
Big data, if they can be harnessed properly, would appear to offer some tantalizing opportunities - not least improved timeliness and the chance to better align public and official statistics with policy needs. The possibilities of matching different digital data sets may allow us to dramatically improve our understanding of complex, cross-cutting issues, such as the impacts of lifestyle on health. Advances such as the Internet of Things and biometrics will surely present opportunities to compile new and useful statistics. As yet, the implications of this ‘big data bang’ for statistics are not immediately clear, but one can envisage a whole host of new ways to measure and understand the human condition. In relative terms, big data are still new. At the turn of the century, Scott Cook, the CEO of Intuit, mused that ‘we’re still in the first minutes of the first day of the Internet revolution’. Almost two decades later we are probably only in the first hours. Many norms and standards are yet to evolve. But it does not take a huge leap of imagination to foresee that in the not too distant future, the misuse of big data will be at the heart of a serious human rights abuse scandal. Official statistics must take the ethical dimension seriously. Just because something can be measured doesn’t mean it should be. Norms and cultural values regarding privacy may be changing, but in assessing whether and how to use big data, NSOs and international organizations must carefully consider the human rights of citizens in this digital age.
© 2018 Mohammad Sarwar Mir. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and build upon your work non-commercially.