IBM’s Watson Graduates from Winning Jeopardy to Changing Healthcare

Two years ago, IBM’s Watson competed on Jeopardy and won against two of the show’s most successful contestants. Watson achieved this feat by using natural language processing and big data to comprehend the questions and then come up with the correct answers. Since that historic achievement, IBM has been working on putting Watson to work in the real world. Now Watson is working with doctors at the Memorial Sloan-Kettering Cancer Center (MSKCC) to help spread medical information and improve health care efficiency and quality.

Last week I attended a briefing where IBM showcased how Watson is spreading medical information and improving health care efficiency and quality. Throughout the briefing I kept thinking that this was a perfect real-life case study of using big data, and of how well it fits with SIIA’s recent white paper, Data-Driven Innovation: A Guide for Policymakers: Understanding and Enabling the Economic and Social Value of Data.

The briefing was led by Dr. Martin Kohn, the Chief Medical Scientist of IBM, and Dr. Mark Kris, the Chief of the Thoracic Oncology Service at MSKCC. They showed us how Watson uses a patient’s record and other relevant data to come up with a list of potential treatment plans and their odds of success. If important information is missing, Watson tells the doctors what it needs in order to make a decision. Over time, as the patient develops new symptoms, gets back the results of tests or treatments, or expresses preferences about treatment, Watson takes all of this into consideration when coming up with new treatments and their probabilities of success. Based on the information it has received, Watson can also diagnose a patient or change an existing diagnosis.

Dr. Kris believes that Watson is successful at diagnosing and offering treatments because it looks at everything, not just what people believe is important. The other reason he believes Watson is successful is that it goes about things the way a doctor would: it gives a list of possibilities, along with the likelihood of each treatment being successful, rather than one definite solution. Watson has the added ability to look at information collected by doctors in the field around the world and draw on their cumulative knowledge, instead of relying only on what a few specific doctors at one hospital know. Like a person, Watson is able to learn and remember, so the more patients it works with, the better it should perform in the future.

While the initial results of transforming Watson from a game show winner into a doctor’s assistant have been promising, there are still many problems to fix before using Watson at the hospital becomes a common occurrence. The first of the two biggest is that coming up with diagnoses and treatments requires Watson to analyze and store massive amounts of data, which is currently very costly. The second is that its developers still need to figure out how to get the most out of Watson, since for now it can work only in a narrow field such as cancer rather than across healthcare as a whole. Both Dr. Kohn and Dr. Kris stressed that Watson is, for now, a tool that can support a doctor’s decision or provide a second opinion, but it is not a substitute for an actual doctor.

At the moment Watson is a useful tool at MSKCC, but there is still a lot of work to be done before it can potentially revolutionize the healthcare industry. The most important thing to remember is that the use of big data to drive innovation and create real-world benefits is still in its early stages. The best thing we can do is to avoid putting restrictions or limitations on how or why data is collected and used, so that we don’t accidentally prevent monumental changes in how we do things.


Ken WaschDenys Emmert is the Public Policy intern at SIIA. He has a degree in marketing and political science from Florida State University.

Data’s Big Impact on the World

In January, SIIA announced its commitment to Data-Driven Innovation as a top policy priority in 2013. As part of this initiative, its white paper, Data-Driven Innovation: A Guide for Policymakers: Understanding and Enabling the Economic and Social Value of Data, will be released soon.

Now an article that reinforces the points made in the white paper has appeared in the May/June issue of Foreign Affairs: “The Rise of Big Data: How It’s Changing the Way We Think About the World,” by Kenneth Cukier and Viktor Mayer-Schoenberger. The most important point of both the SIIA white paper and the Foreign Affairs article is this:

New data collection and analytical techniques allow the use of massive amounts of data to help businesses and governments make everyday things work better.

The Foreign Affairs article describes three main changes in how we approach data. First, we can collect and use large amounts of data. Second, we need to accept data that is messy or unorganized instead of using only data that is clean and organized. Third, we need to place less emphasis on causation and look instead at correlation. In short, we should ask “what?” and not “why?”

Some of the challenges to data-driven innovation are due to people applying the same mindset that worked in the past when we didn’t have the ability to utilize large amounts of data.  One example from the article highlights that for a long time, people tried unsuccessfully to make computers “learn how to do something.”  With increased amounts of data and analytical capabilities, we are instead giving computers a massive amount of data and empowering them to use it to come up with probabilities of something happening.  In the past, analysts were usually limited to smaller amounts of data, and therefore the data inputs had to be precise and accurate.  But now with the increased amount of data, it does not have to be as precise or accurate because the sheer volume of data can fill in these gaps. 

The case study in the article that perhaps best shows what big data can be used for is about Google and how it used search records to track outbreaks of the flu. In 2009 Google “took the 50 million most commonly searched terms between 2003 and 2008 and compared them against historical influenza data from the Centers for Disease Control and Prevention.” By running all of this data through algorithms, Google was able to come up with a list of 45 terms that had a strong correlation with the CDC’s data on the flu. The biggest difference between the two approaches is that Google was concerned only with how those terms were related to the flu data and what that meant, not with why people were getting sick, which was the question the CDC was asking. By approaching the data in this way, Google was able to come up with an answer in close to real time instead of several weeks, which in the case of a pandemic is crucial to saving lives.
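To make the "correlation, not causation" idea concrete, here is a minimal sketch of ranking search terms by how closely their weekly counts track flu case counts. It is not Google's actual method, which the article does not describe in detail. The function names, the three terms, and every number below are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no signal
    return cov / sqrt(var_x * var_y)


def rank_terms(term_counts, flu_cases, top_k):
    """Return the top_k (term, correlation) pairs, strongest first."""
    scored = [(term, pearson(counts, flu_cases))
              for term, counts in term_counts.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical weekly flu case counts (illustrative, not CDC data).
flu_cases = [10, 20, 40, 80, 40, 20]

# Hypothetical weekly search counts per term (illustrative only).
term_counts = {
    "flu symptoms": [100, 200, 400, 800, 400, 200],
    "cough medicine": [50, 60, 90, 150, 100, 70],
    "basketball scores": [300, 310, 290, 305, 295, 300],
}

top_terms = rank_terms(term_counts, flu_cases, top_k=2)
for term, score in top_terms:
    print(f"{term}: {score:.2f}")
```

Nothing in this sketch asks why a term rises and falls with the flu. It only measures whether it does, which is the shift in mindset the article describes.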

The article also highlights that one of the biggest potential concerns associated with big data is its ability to create “Big Brother.” To be sure, there are some risks associated with data-driven innovation, but the article appropriately concludes that there is no such thing as bad data. Rather, the potential of massive amounts of data to improve the way we live far outweighs the potential concerns.


SIIA Joins Broad Call for Email/Cloud Privacy

This Thursday the Senate Judiciary Committee will take up legislation to reform the outdated Electronic Communications Privacy Act (ECPA) and correct the current law’s double standard, which inappropriately provides a lower level of privacy for communications stored remotely, or “in the cloud.” S.607, the Electronic Communications Privacy Act Amendments Act of 2013, is also referred to as the “warrant requirement” because it would level the playing field for law enforcement access to electronic content, setting a warrant as the consistent standard regardless of how or where the content is stored. SIIA joined a wide-ranging group of organizations and companies in a letter urging Committee members to support the proposal. The letter should alleviate any lingering doubt about the breadth of support for ECPA reform: it brings together interests as diverse as the ACLU, Americans for Tax Reform, the American Library Association, and every segment of the technology industry.



Using Public Record Data to Fight Income Tax Refund Fraud

At the federal, state, and local levels, governments are currently seeking to constrain budget deficits while still providing their constituents with the same services. This morning in the Cannon building I learned how the creative use of public record data can save state governments money. I had the opportunity to listen to a discussion hosted by Representative Tom Price (GA-6) and Lexis Nexis about how the state of Georgia has started to use commercial data analytics services to prevent tax refund fraud and to use the savings to help fix its budget problems.

Andy Bucholz of Lexis Nexis Risk Solutions and Doug MacGinnitie, Commissioner of the Georgia Department of Revenue, led the discussion, entitled Combating Income Tax Refund Fraud Using Preventative and Investigative Tools: Lessons Learned From Georgia. Doug started off by noting that he became more concerned about the issue a few years ago, when his wife’s identity was used fraudulently to claim a tax refund.

Identity theft aimed at government payments is one of the largest and fastest-growing types of crime in the country. In 2006 the state of Georgia blocked 32,987 fraudulent tax returns worth some $26,900,000. By last year those numbers had risen to 158,462 fraudulent returns and $98,700,000 in payouts by the state. This increase is related mainly to two things:

  1. The online migration of government services such as filing tax returns and applying for unemployment. This allows people to apply multiple times and in multiple states from one place, all with little actual risk of being caught.
  2. Sending tax refunds out on prepaid debit/EBT cards or as electronic payments. Once funds are withdrawn from these cards, they are hard to trace.

In any given year the state of Georgia processes around 4 million tax returns, 3 million of which receive some type of refund. All of this is done in less than 4 months, which makes it hard to spend much time assessing any individual return’s risk of being fraudulent. How could the state of Georgia reduce the tax fraud rate in these circumstances? On its own, it tried to create and use fraud detection rules based on multiple filings that used the same name, address, and Social Security Number. This helped to prevent some fraudulent tax refunds, but not enough.
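A rule of the kind described above can be sketched in a few lines. This is only an illustration, not Georgia's actual rule set: the field names, the sample filings, and the choice to flag on any single shared field are all assumptions.

```python
from collections import defaultdict


def flag_duplicate_filings(returns, keys=("name", "address", "ssn")):
    """Flag the IDs of returns that share a value with another return.

    Assumption: a return is flagged when it shares ANY one of the
    fields in `keys` with a different return. A real rule set would
    likely combine fields and use thresholds.
    """
    flagged = set()
    for key in keys:
        groups = defaultdict(list)
        for tax_return in returns:
            groups[tax_return[key]].append(tax_return["return_id"])
        for ids in groups.values():
            if len(ids) > 1:
                flagged.update(ids)
    return flagged


# Hypothetical filings (all values are made up).
filings = [
    {"return_id": 1, "name": "A. Smith", "address": "1 Oak St", "ssn": "000-00-0001"},
    {"return_id": 2, "name": "B. Jones", "address": "9 Elm St", "ssn": "000-00-0001"},
    {"return_id": 3, "name": "C. Brown", "address": "5 Pine St", "ssn": "000-00-0003"},
]

suspicious = flag_duplicate_filings(filings)
print(sorted(suspicious))
```

A rule like this only catches fraud that repeats a value within the state's own filings, which is consistent with the post's observation that it helped but not enough.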

Starting in 2011 the state of Georgia began using a data analytics service from Lexis Nexis to help flag potentially fraudulent tax refunds. Lexis Nexis used public record data from across the states to identify potentially fraudulent returns, flagging those that showed a sudden change in address or in the number of dependents. Other information, such as incarcerations in other states, was used as a red flag indicator as well.
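As a rough sketch of how red flags like these might be checked, assume each return can be compared against a prior record for the same filer. The record fields, the flag names, and the threshold for a "sudden" change in dependents are hypothetical. The post does not describe how Lexis Nexis actually scores returns.

```python
def red_flags(current, previous, max_dependent_jump=2):
    """Return a list of red-flag labels for one return.

    `current` is this year's return and `previous` is the filer's
    prior record. The threshold `max_dependent_jump` is an assumed
    value chosen only for illustration.
    """
    flags = []
    if current["address"] != previous["address"]:
        flags.append("address_changed")
    if abs(current["dependents"] - previous["dependents"]) >= max_dependent_jump:
        flags.append("dependents_changed")
    if current.get("incarcerated_elsewhere", False):
        flags.append("incarcerated_in_other_state")
    return flags


# Hypothetical records (all values are made up).
last_year = {"address": "1 Oak St", "dependents": 0}
this_year = {"address": "77 Far Rd", "dependents": 4}

flags = red_flags(this_year, last_year)
print(flags)
```

A flagged return is only a candidate for the identity check described next, not proof of fraud, since people do legitimately move and have children.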

Applicants whose refund claims were flagged as potentially fraudulent were automatically mailed a letter asking them to confirm their identity online. They were asked a series of questions that only the real taxpayer should be able to answer, such as “Which of these addresses did you live at 8 years ago?” or “What type of car do you currently own?” Lexis Nexis is able to check all of these answers against public records data. If applicants were unable to pass this identity check, that was a good sign that they were not who they claimed to be.

Last year Georgia spent just over $3,000,000 on this data analytics service while saving over $23,000,000 in fraudulent tax refunds. Georgia believes it is ahead of most states in preventing tax refund fraud and hopes that being ahead of the curve will deter future fraud. The fraudsters will go to other states because it is easier to commit the fraud there!
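The arithmetic behind those figures is worth spelling out. The sketch below treats the rounded numbers quoted in the post as exact, which is an assumption: the post says only "just over" $3,000,000 and "over" $23,000,000.

```python
# Figures quoted in the post, treated as exact for illustration.
service_cost = 3_000_000      # "just over" this amount was spent
fraud_prevented = 23_000_000  # "over" this amount was saved

net_savings = fraud_prevented - service_cost
savings_per_dollar = fraud_prevented / service_cost

print(f"Net savings: ${net_savings:,}")
print(f"Fraud blocked per dollar spent: ${savings_per_dollar:.2f}")
```

On these rounded figures the state came out roughly $20,000,000 ahead, blocking between seven and eight dollars of fraudulent refunds for every dollar spent on the service.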

One of Georgia’s initial concerns was that cracking down on tax refund fraud would drastically slow down the processing of legitimate returns. Legitimate returns accounted for 96% of the returns filed in the state, and a processing delay could hurt citizens who depend on their refunds to cover day-to-day living expenses. But this fear did not materialize. On average Lexis Nexis was able to flag or confirm a tax return in only five extra minutes, and without requiring the state to add employees to do it.

Georgia has gotten good at blocking tax refund fraud, but it is still hard to actually prosecute people for it. As a result, the fraudsters really have nothing to lose, and the number of fraudulent claims will continue to increase. Georgia will try to stay in front of the problem so that it can protect the state, individuals within the state, and the state’s taxpayers, and use the savings to continue to provide the government services its citizens expect. By continuing to use good data analytics services, the state is also reducing the incidence of identity theft, which takes 12 to 18 months to completely repair. As David LeDuc pointed out in a recent blog post, this is a major and growing problem.

The states of South Carolina, Louisiana, and Connecticut are also using Lexis Nexis to prevent tax refund fraud from happening in their states so that they can continue to provide for their constituents. The use of data analytics for this worthwhile purpose is only one example of how data can be used for the public good.

