Webinar: Seven Ways Text Analytics Tames Big Data

Your customers have big data problems, and they’re looking to software vendors to help solve them. Whether your software is designed for archiving, collaboration, content management, CRM/social media or compliance, enterprises are finding it harder and harder to sort through all their unstructured data.

Listen to the pre-recorded webcast for examples from SIIA member Content Analyst of how content-aware advanced analytics is being applied within software products to address big data problems. You will also learn the advanced analytics secrets that dozens of software companies in the legal, eDiscovery and US intelligence markets have relied on for years. This webinar addresses these questions and more:

  • How and why are software companies embedding concept-aware text analytics engines into their products to bring order to the chaos of big data in a highly automated, consistent and defensible way?
  • Can these engines really synthetically transfer human knowledge to large unstructured data collections to precisely and accurately classify big data for reduction, organization, analysis and findability?
  • Can a machine be taught to conceptually understand specific categories and apply those categories to each piece of unstructured content with a much higher level of precision than a human, and in any language?

Presenters
John Falehi
Chief Strategy Officer, Content Analyst

Steven Toole
Vice President of Marketing, Content Analyst

Click here to view a copy of Content Analyst’s slides.

There’s No Bad Data, Only Bad Uses of Data

Steven Lohr explored the roots of the debate over personal data and privacy in a timely article in the New York Times this Sunday. An important theme of his article is best summed up by Craig Mundie of Microsoft, who says, “There’s no bad data, only bad uses of data.” At SIIA, we agree: if we want privacy protections to be truly meaningful, we should move away from restricting data collection and instead work to prevent harmful uses of data.

Lohr’s article first describes a scenario in which a person is harmed because data from his or her online click stream is collected. But although this example is meant to illustrate the danger of data collection, it ends up confirming that the real harm comes not from collecting data but from misusing it. It might harm an Internet user if predictions and inferences about his or her web travels made their way to a health insurer or potential employer. But that harm stems from misuse of the data, not its collection!

The online advertising industry already collects click stream data. It wants to use this data to improve the effectiveness and value of its online advertising. And the industry has already pledged to wall off online data from harmful use by isolating it from eligibility decisions regarding employment, health care, credit and insurance.

It’s crucial to allow industries to continue collecting data so it can be used to benefit society. For instance, the contributions of data-driven innovation in education have been well documented. Two recent reports by the Center for Technology Innovation at the Brookings Institution, called Educational Success Stories and Big Data for Education, show how data analytic techniques can help schools better understand students’ learning approaches and challenges. Instead of relying on static, uniform tests, “instructors can analyze what students know and what techniques are most effective for each pupil. By focusing on data analytics, teachers can study learning in far more nuanced ways.”

There are many uses of data that benefit society, and public policy should not obstruct them by erecting arbitrary barriers to data collection. The best way to respect individual privacy in the age of big data is to protect people from harmful uses of data. Industries like online advertising are already moving in this direction by developing best practices and self-regulation. Blanket prohibitions on data collection would do more harm than good.


Mark MacCarthy, Vice President, Public Policy at SIIA, directs SIIA’s public policy initiatives in the areas of intellectual property enforcement, information privacy, cybersecurity, cloud computing and the promotion of educational technology. Follow the SIIA Public Policy team on Twitter at @SIIAPolicy

IIS Breakthrough Recap: Now That Is Big Data

Written by Deborah Richman, Consultant, Zions Bank


“We just had our 10-petabyte party,” Brewster Kahle told Information Industry Summit attendees this week. Universal access to all knowledge may sound like a pipe dream, yet Kahle and his Internet Archive team have been doggedly pursuing this goal, using up petabytes to collect, digitize and share content.

The Internet Archive is best known for creating the de facto historical repository of the web. Since 1996, Kahle’s team has visited “every page on every web site, every two weeks.” There are more than 240 billion URLs in the archive today. For better or worse, anyone may access them through the Wayback Machine.

Fortunately, web tools and sources have improved, and Kahle also relies on others for help. At this point, the archive includes some 1,700 curated collections from 200 places. “Personal digital archives are next,” says Kahle. “But our stuff is all over the place. And things are gone.”

It’s more than websites

The archive.org team, made up of 150 staffers, has been making books, audio, video and TV news available at a dizzying rate. Kahle reported on archival progress for SIIA members:

  • Books: 3 million e-books, 500,000 for the blind and 300,000 modern e-books.
  • Audio: 1 million items in 100 collections, including 100,000 concerts from 5,000 bands.
  • Video: 2,000 to 3,000 movies, plus industrial, educational and other specialty films.
  • TV news: 20 channels collected since 2000, and TV news for three years.

The Internet Archive sidesteps copyright issues by behaving like a library consortium. Libraries and individuals are free to make their multimedia collections available online. Patrons (that is, site visitors) can then view unrestricted materials or check out others from the holdings.

No modern-day industry titan in the mold of Andrew Carnegie has come forward to make this digital access dream come true. Instead, a nonprofit organization filled with dreamers and technologists has been knocking down access barriers to digitized content for two decades. It’s pretty sweet.

____________________________________________________________________________

Debby Richman spent her formative years at D&B, leading the reference business from print to online and web offerings. She has since held digital leadership roles at Overstock, About.com, Looksmart, Starz, Collarity and Zions Bank.

Breakthrough Talk Recap: Using Big Data to Build Prognostication Capabilities

Marie Giangrande, Public Notions

In a backstage interview, Factual Inc. CEO Gil Elbaz and Cortera Inc. CEO Jim Swift discuss the drivers of, and requirements for, building a Big Data capability plan.


Tracked Behaviors Fuel Big Data
“It’s all about tracking events and behaviors in order to improve the accuracy of your decision making,” asserts Cortera CEO Jim Swift. Cortera produces creditworthiness rankings by tracking the purchases and payments that a company makes. “When evaluating companies, it’s good to hear what they are saying, but most important is to see their actual behaviors; tracking actual payments and purchases, for example, will give a more accurate prediction of a company’s creditworthiness,” continues Swift. Factual’s CEO, Gil Elbaz, agrees: “People expect and need the context to correctly interpret data… stitching together the facts and illustrating the backdrop is what Big Data is all about.” Factual offers a path for companies to source external data, enrich their own data and incorporate a variety of new data sets.

Prognostication Capabilities: The Next Competitive Battle
Behavioral targeting has been used extensively by online companies to target advertisements. Now the concept is gaining broader appeal as a long-term competitive advantage, enabling information companies to match their content to their clients’ workflow and distribution preferences. Companies can use behavior tracking to build predictions about client preferences, to identify partnerships and to develop value-added services.

The pursuit of prognostication capabilities has a tremendous consequence: it redefines the importance of data in an organization and puts an emphasis on data management capabilities. Can I handle streaming data from Twitter feeds or social media outlets? Is a ‘just-in-time’ approach to data collection needed? Can I enrich my data in order to monetize it? Is the data accurate and extensible?

The Strategic Information Spine

“A lot of companies are living with inefficient collection and maintenance of data. They ignore missing data and inaccurate data because they do not associate the underlying data to their ability to compete,” comments Factual’s CEO, Gil Elbaz.

However, as soon as executives link their ability to compete with the value and accuracy of their data, the ROI becomes clear. “It’s like an Information Spine,” reflects Cortera’s CEO, Swift. “For information companies, the core stream of data is the spine, and off this hang all their products and services.” The implication is that a company’s ability to compete will depend on the design, sourcing, accuracy and strategic health of that spine.

An Asset or Liability?

As firms embrace big data for competitive advantage, all the questions and answers change. Companies are led to develop a more strategic view, identifying both data assets and the data liabilities that arise from maintenance and accuracy problems.

“Many companies have not yet thought through which data sets provide a competitive advantage and which ones won’t,” comments Factual’s Elbaz. “If you can buy the data, it is most likely not an asset,” continues Elbaz. Data that is missing, inaccurate or difficult to maintain may be more than an opportunity cost; it could actually be a liability, especially if it is not kept fresh.

For non-proprietary data, companies are developing data acquisition plans that give them new agility along with a managed cost structure. “Just as IT managers embraced open source code, now business managers are embracing open sourced data,” Factual CEO Elbaz concludes. Elbaz points to dozens of internal databases that should be deleted and replaced with data licensed from readily available external sources. He asks, “Why allocate internal resources to manage data you don’t have to?”

Finding the Skills
But the skills needed to manage the internal build, the external licensing and the data architecture are hard to find. “The biggest problem is the lack of people and talent needed for companies to bootstrap their efforts,” claims Cortera’s Swift. Data architects and data developers are very different from software developers and IT managers, and both CEOs agree this is not necessarily the role of a CTO or CIO.

Who inside your company could nurture and expand your newly found data assets? The first step, it seems, is for us to prognosticate on that.

This Week in Public Sector Innovation

OMB to push Strategic Sourcing: This week OMB issued a memorandum expanding the use of strategic sourcing to include commodity IT purchases. In addition, the memo establishes Strategic Sourcing Accountable Officers within the CFO Act agencies, to be appointed by January 15, 2013. It also establishes a Strategic Sourcing Leadership Council (SSLC), chaired by OFPP, with representatives from DoD, Energy, HHS, DHS, VA, GSA and NASA, and requires the SSLC to submit to OMB a set of recommendations for management strategies for goods and services to ensure the government receives the most favorable offer. Lastly, it requires the SSLC to identify at least five products or services for which new government-wide acquisition vehicles or management approaches are needed, and requires GSA to implement five new government-wide strategic sourcing solutions in each of FY13 and FY14 and to increase transparency of prices paid for common goods. Read the memo here.

GSA pulls the plug on Apps.gov: The federal government pulled the plug on Apps.gov this week. The cloud application storefront, the brainchild of former Federal CIO Vivek Kundra, was intended to provide a one-stop shop for cloud apps for the federal government and make it easier for federal IT personnel to acquire cloud services. The initiative never took off as intended. GSA didn’t give a reason for decommissioning the storefront, but noted that everything available through Apps.gov would still be available through Schedule 70. InformationWeek has a story.

NextGov Prime highlights procurement reform, big data: NextGov held its first-ever Prime Conference at the Ronald Reagan Building this week. The event included a keynote panel featuring Rep. Darrell Issa (R-CA), chairman of the House Oversight and Government Reform Committee, and Rep. Gerry Connolly (D-VA), ranking member of the panel’s Technology Subcommittee, two leaders pushing an update to the 1996 Clinger-Cohen Act. The intent of the legislation, which SIIA has been tracking closely and which is expected to be introduced early in the next Congress, is to improve the speed and efficiency of federal IT purchasing. FCW has the wrap-up. The event also had a heavy focus on big data and how data analytics can make the government more effective. FCW covers that angle as well.


Michael Hettinger is VP for the Public Sector Innovation Group (PSIG) at SIIA. Follow his PSIG tweets at @SIIAPSIG.

5 Key Data Points from IBM’s Big Data Policy Event

IBM briefed policymakers today on how they can leverage big data to save money and address societal challenges, and the timing couldn’t have been better. Congress and federal agencies are trying to do more with less, and many are looking to the Big Data Research and Development Initiative the Obama administration announced in March, which commits more than $200 million to big data research and development projects.

Speakers including Sen. Dianne Feinstein (D-CA), Rep. Steve Womack (R-AR) and IBM Research VP Dr. David McQueeney took the stage to explain the power of big data. Here are five of the most compelling data points from the event:

  1. We create 2.5 quintillion bytes of data every day, and this number will continue to grow at an exponential rate. About 90 percent of the world’s data was created in the last two years alone.
  2. The amount of data generated per hospital will increase from 167 terabytes to 665 terabytes by 2015, due to the incredible growth of medical images and electronic medical records. Big data will help doctors make better predictions by leveraging huge amounts of clinical information.
  3. The GSA stands to save an estimated $15 million a year by reducing power usage at 50 of the agency’s highest energy-consuming buildings, with help from IBM software and sensors.
  4. Over 70% of members of the National Center for Manufacturing Sciences (NCMS) believe increased adoption of advanced computing would lead to competitive advantages. Yet only 6% of small to medium manufacturers in the US are taking full advantage of high performance computing.
  5. The retail industry misses out on $165 billion in sales each year because stores don’t have the right products in stock. Big data could help them analyze sales trends and better predict their needs.

So what should policymakers do with this knowledge? They should push for public-private partnerships and research that help industries optimize their operations. IBM announced one such partnership at today’s event: working with Lawrence Livermore National Laboratory, IBM will use high performance computing to help solve problems like improving our electric grid, advancing manufacturing and discovering new materials. The data behind it is clear: big data can help the US compete.


Laura Greenback is Communications Director at SIIA. Follow the SIIA Public Policy team at @SIIAPolicy.

SIIA Op-Ed: Data-Driven Innovation is an Economic Driver

In a Roll Call op-ed today, SIIA President Ken Wasch explains how data is empowering innovation and warns policymakers that a fixed regulatory approach could stunt economic growth.

The IT ecosystem is evolving at unprecedented speed, and data is becoming a driver of economic and social growth. Cloud computing, the ubiquity of smartphones, and improved bandwidth are fueling a new era of data-driven innovation, Wasch says.

“A range of previously unimaginable applications of data-driven innovation are already being produced — or will be in the near future. These innovations are making people’s lives better and safer and more prosperous, while also increasing energy efficiency and saving money.”

Wasch’s sentiment echoes a forum hosted earlier this month by the National Institute of Standards and Technology and the University of Maryland, where participants including Google, the National Institutes of Health and Lockheed Martin discussed the ways data can help address a range of national priorities. The opportunities are vast.

“Right now, hospitals are providing better care by analyzing data about the triage process and using that information to eliminate wasteful steps that prevent patients from getting to the doctor quickly. Traffic-management centers are processing millions of cellphone and GPS signals, combining them with a wide range of other data about car speeds, weather conditions and more to assess road conditions in real time and avoid traffic jams. And financial services companies can collect and integrate customer transaction information in real time to quickly identify questionable patterns and proactively enact new processing rules to reduce fraud.”

But if this technological and economic evolution is to truly take hold, it needs support from policymakers who can ensure that the conversation stays focused on how to best benefit customers and the economy at large. A fixed regulatory approach would only stifle innovation and hurt consumers. If industry and policymakers can work together, we can safeguard consumers and unleash data’s enormous potential for transformative growth.


Laura Greenback is Communications Director at SIIA. Follow the SIIA Public Policy Team at @SIIAPolicy