As SIIA noted in a series of reports, big data analytics is a source of significant innovations in health care, education, energy and the delivery of government benefits.
Analytics has been with us for some time, but big data analytics really is something new. When data sets are very large in volume, diverse in the variety of data types they contain, and changing with dramatic velocity, standard techniques of data analysis can be supplemented with new computational techniques that take full advantage of the wealth of new and different input data. These techniques enable novel insights to emerge from data in the form of correlations that could not be anticipated from previous theories or empirical research. These unexpected correlations can then form essential elements in increasingly accurate predictive models.
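To make this concrete, the sketch below (entirely synthetic data, not any specific firm's methodology) illustrates the two-step pattern described above: scan many candidate variables for unexpectedly strong correlations with an outcome, then confirm the pattern on held-out data before treating it as a predictive signal.

```python
# A minimal illustrative sketch, on synthetic data, of correlation
# screening followed by out-of-sample validation. Only "feature_3"
# genuinely carries signal; the rest are noise.
import random

random.seed(42)
n = 1000
outcome = [random.gauss(0, 1) for _ in range(n)]
features = {f"feature_{i}": [random.gauss(0, 1) for _ in range(n)]
            for i in range(10)}
# Inject a real relationship into one feature.
features["feature_3"] = [y + random.gauss(0, 0.5) for y in outcome]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

split = n // 2  # first half for discovery, second half for validation

# Step 1: screen for strong correlations on the discovery half.
strong = {name for name, vals in features.items()
          if abs(pearson(vals[:split], outcome[:split])) > 0.5}

# Step 2: require the correlation to hold on held-out data, so that
# spurious discovery-phase correlations are discarded.
validated = {name for name in strong
             if abs(pearson(features[name][split:], outcome[split:])) > 0.5}
print(sorted(validated))  # ['feature_3']
```

The held-out check in step 2 is the sketch's stand-in for the "normal tests of statistical and empirical significance" that validated predictive models must pass.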
These predictive models must still pass the normal tests of statistical and empirical significance before they can be used for scientific research or business purposes. This is one reason that some of the recent critiques of big data miss the mark. Moreover, the track record of developing increasingly accurate, validated predictive models in a wide range of endeavors is well established. With the increase in input data coming from sensors embedded in everyday objects and linked to communications networks, a phenomenon known as the Internet of Things, it is highly likely that these new analytical techniques will become more accurate still and will spread to new domains of activity.
The potential common good benefits of this development are so large that a major focus of the Administration’s technology policy should be the promotion and advancement of big data analytics.
It is likely that the upcoming Administration report on privacy and big data will highlight these extraordinary benefits of increasingly accurate analytical predictions. In his earlier comments to the workshop on big data and privacy in Berkeley on April 1, White House Counselor John Podesta, who has been tasked by President Obama to lead the review effort, referred to the experience of a hospital showing how big data can literally save lives. This experience is also described in a recent SIIA blog. The hospital contracted with an outside firm to analyze millions of health data points about newborn infants and discovered that a pattern of invariance across a range of vital-sign indicators predicted the onset of an extremely high and dangerous fever twenty-four hours later. This advance warning enabled hospital personnel to begin treatment ahead of time.
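The hospital's actual model is not public, but the toy sketch below illustrates the kind of "pattern of invariance" signal described above: an alert fires when every monitored vital sign shows unusually low variability over a trailing window of readings. All names, thresholds, and data here are illustrative assumptions.

```python
# An illustrative toy (not the hospital's actual model): flag a patient
# when all vital signs show unusually low variability over a trailing
# window of readings.
import statistics

def invariance_alert(readings, window=6, threshold=0.1):
    """readings: dict mapping vital-sign name -> list of measurements,
    assumed pre-normalized so a single threshold applies to all signs.
    Returns True only if every sign's standard deviation over the last
    `window` readings falls below `threshold`."""
    for values in readings.values():
        recent = values[-window:]
        if len(recent) < window:
            return False  # not enough history to judge
        if statistics.stdev(recent) >= threshold:
            return False  # this sign still varies normally
    return True

# Healthy variability across signs: no alert.
normal = {"heart_rate": [0.2, -0.3, 0.5, -0.1, 0.4, -0.4],
          "resp_rate":  [0.3, -0.2, 0.1, -0.5, 0.2, 0.6]}
# Near-constant readings on every sign: the alert fires.
flat = {"heart_rate": [0.01, 0.02, 0.01, 0.00, 0.02, 0.01],
        "resp_rate":  [0.00, 0.01, 0.01, 0.02, 0.00, 0.01]}
print(invariance_alert(normal), invariance_alert(flat))  # False True
```

In a real deployment the threshold and window would be fitted and validated against historical outcomes, as discussed earlier, rather than fixed by hand.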
It is also likely that the report will focus some attention on big data and discrimination. This concern was highlighted several weeks ago, when a coalition of civil rights groups and privacy advocates issued civil rights principles for the era of big data. One of the principles was to ensure “fairness in automated decisionmaking.” The group also warned that new big data analytical techniques “…can easily reach decisions that reinforce existing inequities.” These concerns are legitimate and these groups are right to draw attention to these possibilities.
In his earlier April 1 comments, Podesta raised these same issues:
“Big data analysis of information voluntarily shared on social networks has showed how easy it can be to infer information about race, ethnicity, religion, gender, age, and sexual orientation, among other personal details. We have a strong legal framework in this country forbidding discrimination based on these criteria in a variety of contexts. But it’s easy to imagine how big data technology, if used to cross legal lines we have been careful to set, could end up reinforcing existing inequities in housing, credit, employment, health and education.”
He cites Kate Crawford’s well-known example of the possible biases that can be hidden in big data. When the city of Boston released an app called Street Bump that used smartphone sensors to detect potholes and report them to the department of public works, it initially missed the inequity in relying solely on that app to allocate resources to fix potholes. Since poor people were less likely to carry smartphones and use the app, resources to fix potholes went to wealthier neighborhoods. The city quickly recognized this bias and compensated for it to ensure that the app provided fair and efficient access to these city services.
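Boston's actual correction is not detailed here, but one standard way to compensate for this kind of reporting bias is to reweight each neighborhood's report counts inversely to its estimated smartphone-adoption rate. The figures below are hypothetical, purely to show the mechanics.

```python
# A minimal sketch (not Boston's actual fix) of inverse-propensity
# reweighting: neighborhoods with low smartphone adoption under-report
# potholes, so their raw counts are scaled up accordingly.

# Hypothetical raw app reports and assumed adoption rates.
reports = {"wealthy_a": 120, "wealthy_b": 90,
           "lower_income_a": 30, "lower_income_b": 20}
adoption = {"wealthy_a": 0.8, "wealthy_b": 0.75,
            "lower_income_a": 0.3, "lower_income_b": 0.25}

def adjusted_estimates(reports, adoption):
    """Estimate what each neighborhood would report at full adoption
    by dividing raw counts by the adoption rate."""
    return {n: reports[n] / adoption[n] for n in reports}

est = adjusted_estimates(reports, adoption)
# Raw counts suggest a 6x gap between the highest- and lowest-reporting
# neighborhoods; after adjustment the gap narrows to under 2x.
```

The general point stands regardless of the exact numbers: the bias lives in how the data was collected, and it can be measured and corrected once recognized.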
The lesson from these warnings of possible unfairness is not that we need a new framework of laws and regulations designed specifically to guard against the possibility of big data discrimination. As Podesta pointed out, we already have a strong legal framework prohibiting discrimination in housing, employment, insurance and health care. This framework includes Title VII of the Civil Rights Act of 1964, the Equal Credit Opportunity Act, the Fair Housing Act and the Genetic Information Nondiscrimination Act of 2008.
In addition, a strong Federal Trade Commission has substantial authority to go after unfair practices in the marketplace and to enforce the Fair Credit Reporting Act’s consumer protection rules for the use of data for eligibility decisions in employment, insurance and credit. As recently as two weeks ago, the FTC took strong action against companies that failed to live up to their responsibilities under the Act. The FTC is holding a series of workshops highlighting big data and various possible consumer abuses, including an upcoming hearing in September on the effects of big data on poor and underserved consumers.
We think an Administration effort to examine the question of possible discrimination in the use of big data analytics is extremely important. The focus of this effort, however, should not be on the data itself or on the analytic tools as such. It should be on the use of the data and the analysis. There is no such thing as bad data or evil analysis.
Existing safeguards do not become inoperative when dealing with big data. What is needed is for the Administration to sound the alarm that the intentional use of data, big or small, to engage in unfair and discriminatory practices is already prohibited under existing laws.
The Administration can also remind the public of something that responsible producers and users of analytics already know: that some uses of data analytics can inadvertently reinforce existing patterns of invidious discrimination, and that sound business practice for users and producers is to review the uses of these techniques and take steps to eliminate any harmful biases. These steps could include additional internal review and accountability methods such as consumer review boards.
SIIA welcomes a continued conversation with Administration officials on these important and timely issues.