As SIIA noted in a series of reports, big data analytics is a source of significant innovations in health care, education, energy and the delivery of government benefits.
Analytics has been with us for some time, but big data analytics really is something new. When data sets are very large in volume, diverse in the variety of data types they contain, and changing with dramatic velocity, standard techniques of data analysis can be supplemented with new computational techniques that take full advantage of the wealth of new and different input data. These techniques enable novel insights to emerge from data in the form of correlations that could not be anticipated from previous theories or empirical research. These unexpected correlations can then form essential elements in increasingly accurate predictive models.
These predictive models have to pass all the normal tests of statistical and empirical significance before they can be used successfully for scientific research or business purposes. This is one reason that some of the recent critiques of big data miss the mark. Moreover, the track record in developing increasingly accurate, validated predictive models across a wide range of endeavors is well established. With the increase in input data coming from sensors embedded in everyday things and linked to communications networks, a phenomenon known as the Internet of Things, it is highly likely that the new data analytical techniques will become increasingly accurate and will spread to new domains of activity.
The potential benefits of this development for the common good are so large that a major focus of the Administration’s technology policy should be the promotion and advancement of big data analytics.
It is likely that the upcoming Administration report on privacy and big data will highlight these extraordinary benefits of increasingly accurate analytical predictions. In his earlier comments to the workshop on big data and privacy in Berkeley on April 1, White House Counselor John Podesta, whom President Obama has tasked with leading the review effort, referred to the experience of a hospital showing how big data can literally save lives. This experience is also described in a recent SIIA blog. The hospital contracted with an outside firm to analyze millions of health data points about newborn infants and discovered that a pattern of invariance in a range of vital-sign indicators predicted the onset of an extremely high and dangerous fever twenty-four hours later. This advance warning system enabled hospital personnel to start treatment ahead of time.
It is also likely that the report will focus some attention on big data and discrimination. This concern was highlighted several weeks ago, when a coalition of civil rights groups and privacy advocates issued civil rights principles for the era of big data. One of the principles was to ensure “fairness in automated decisionmaking.” The group also warned that new big data analytical techniques “…can easily reach decisions that reinforce existing inequities.” These concerns are legitimate, and the groups are right to draw attention to them.
In his earlier April 1 comments, Podesta raised these same issues:
“Big data analysis of information voluntarily shared on social networks has showed how easy it can be to infer information about race, ethnicity, religion, gender, age, and sexual orientation, among other personal details. We have a strong legal framework in this country forbidding discrimination based on these criteria in a variety of contexts. But it’s easy to imagine how big data technology, if used to cross legal lines we have been careful to set, could end up reinforcing existing inequities in housing, credit, employment, health and education.”