IIS Breakthrough Recap: Now That Is Big Data

Written by Deborah Richman, Consultant, Zions Bank


“We just had our 10 petabyte party,” declared Brewster Kahle to Information Industry Summit attendees this week. Universal access to all knowledge may sound like a pipe dream, yet Kahle and his Internet Archive team have been doggedly pursuing this goal, filling petabytes as they collect, digitize and share content.

The Internet Archive is best known for creating the de facto historical repository of the web. Since 1996, Kahle’s team has visited “every page on every web site, every two weeks.” There are more than 240 billion URLs in the archive today. For better or worse, anyone may access them at the Wayback Machine.

Fortunately web tools and sources have improved, and Kahle also relies on others to help. At this point, there are some 1,700 curated collections from 200 places included in the archives. “Personal digital archives are next,” says Kahle. “But our stuff is all over the place. And things are gone.”

It’s more than websites

The archive.org team of 150 staffers has been making books, audio, video and TV news available at a dizzying rate. Kahle reported on archival progress for SIIA members:

  • Books: 3 million e-books, including 500,000 for the blind and 300,000 modern e-books.
  • Audio: 1 million items in 100 collections, including 100,000 concerts from 5,000 bands.
  • Video: 2,000–3,000 movies, plus industrial, educational and other specialty films.
  • TV news: 20 channels collected since 2000, including TV news for the past three years.

The Internet Archive sidesteps copyright issues by behaving like a library consortium. Libraries and individuals are free to make their multimedia collections available online. Then patrons, aka site visitors, are able to view unrestricted materials or check out others from the holdings.

No modern-day industry titan, in the mold of Andrew Carnegie, has come forth to make this digital access dream come true. Instead, a non-profit organization filled with dreamers and technologists has been knocking down barriers to digitized content for two decades. It’s pretty sweet.

____________________________________________________________________________

Debby Richman spent her formative years at D&B, leading the reference business from print to online and web offerings. She has since held digital leadership roles at Overstock, About.com, Looksmart, Starz, Collarity and Zions Bank.

Breakthrough Talk Recap: Using Big Data to Build Prognostication Capabilities

Marie Giangrande, Public Notions

In a backstage interview, Factual Inc. CEO Gil Elbaz and Cortera Inc. CEO Jim Swift discussed the drivers and requirements for building a Big Data capability plan.


Tracking Behaviors Fuels Big Data
“It’s all about tracking events and behaviors in order to improve the accuracy of your decision making,” asserts Cortera CEO Jim Swift. Cortera produces creditworthiness rankings by tracking the purchases and payments a company conducts. “When evaluating companies, it’s good to hear what they are saying, but most important is to see their actual behaviors; tracking actual payments and purchases, for example, will give a more accurate prediction of a company’s creditworthiness,” continues Swift. Factual CEO Gil Elbaz agrees: “People expect and need the context to correctly interpret data… stitching together the facts and illustrating the backdrop is what Big Data is all about.” Factual offers a path for companies to source external data, enrich their own data and incorporate a variety of new data sets.

Prognostication Capabilities: The Next Competitive Battle
Behavioral targeting has been used extensively by online companies to target advertisements. Now this concept is gaining broader appeal as a long-term competitive advantage, enabling information companies to match their content to their clients’ workflow and distribution preferences. Companies can use behavior tracking to build predictions about client preferences, to identify partnerships and to develop value-added services.

The pursuit of prognostication capabilities has a tremendous consequence: it redefines the importance of data in an organization and puts an emphasis on data management capabilities. Can I handle streaming data from Twitter feeds or other social media outlets? Is a “just-in-time” approach to data collection needed? Can I enrich my data in order to monetize it? Is the data accurate and extensible?

The Strategic Information Spine

“A lot of companies are living with inefficient collection and maintenance of data. They ignore missing data and inaccurate data because they do not connect the underlying data to their ability to compete,” comments Factual CEO Gil Elbaz.

However, as soon as executives link their ability to compete to the value and accuracy of their data, the ROI presents itself. “It’s like an information spine,” reflects Cortera’s Swift. “For information companies, the core stream of data is the spine, and off this hangs all their products and services.” The implication is that a company’s ability to compete will come back to the design, sourcing, accuracy and strategic health of the spine.

An Asset or Liability?

As firms embrace the use of big data for a competitive advantage, it changes all the questions and answers. It leads companies to develop a more strategic view: they identify data assets and data liabilities around maintenance and accuracy.

“Many companies have not yet thought through which data sets provide a competitive advantage and which ones won’t,” comments Factual’s Elbaz. “If you can buy the data, it is most likely not an asset,” he continues. Data that is missing, inaccurate or difficult to maintain may not only be an opportunity cost; it could actually be a liability, especially if not kept fresh.

For non-proprietary data, companies are developing data acquisition plans to give them new agility along with a managed cost structure. “Just as IT managers embraced open-source code, now business managers are embracing open-sourced data,” Factual CEO Elbaz concludes. Elbaz points to dozens of internal databases that should be deleted and instead licensed from readily available external sources. He asks, “Why allocate internal resources to manage data you don’t have to?”

Finding the Skills
But the skills to manage the internal build, the external licensing and the data architecture are hard to find. “The biggest problem is the lack of people and talent needed for companies to bootstrap their efforts,” claims Cortera’s Swift. Data architects and data developers are very different from software developers and IT managers, and both CEOs agree this is not necessarily the role of a CTO or CIO.

Who, inside your company, could nurture and expand your newly found Data assets? The first step, it seems, is for us to prognosticate on that.

This Week in Public Sector Innovation

OMB to push Strategic Sourcing: This week OMB issued a memorandum expanding the use of strategic sourcing to include commodity IT purchases. In addition, the memo establishes Strategic Sourcing Accountable Officers within the CFO Act agencies, to be appointed by January 15, 2013. It also establishes a Strategic Sourcing Leadership Council (SSLC), chaired by OFPP, with representatives from DoD, Energy, HHS, DHS, VA, GSA and NASA, and requires the SSLC to submit to OMB a set of recommended management strategies for goods and services to ensure the government receives the most favorable offer. Lastly, it requires the SSLC to identify at least five products or services for which new government-wide acquisition vehicles or management approaches are needed, requires GSA to implement five new government-wide strategic sourcing solutions in each of FY13 and FY14, and calls for increased transparency of prices paid for common goods. Read the memo here.

GSA pulls the plug on Apps.gov: The federal government pulled the plug on Apps.gov this week. The cloud application storefront, the brainchild of former Federal CIO Vivek Kundra, was intended to provide a one-stop shop for cloud apps for the federal government and make it easier for federal IT personnel to acquire cloud services. The initiative never took off as intended. GSA didn’t give a reason for decommissioning the initiative, but noted that everything available through Apps.gov would still be available through Schedule 70. Information Week has a story.

NextGov Prime highlights procurement reform, big data: NextGov held its first-ever Prime Conference at the Ronald Reagan Building this week. The event included a keynote panel featuring Rep. Darrell Issa (R-CA), chairman of the House Oversight and Government Reform Committee, and Rep. Gerry Connolly (D-VA), ranking member of the panel’s Technology Subcommittee, two leaders pushing an update to the 1996 Clinger-Cohen Act. The intent of the legislation, which SIIA has been tracking closely and which is expected to be introduced early in the next Congress, is to improve the speed and efficiency of federal IT purchasing. FCW has the wrap-up. The event also had a heavy focus on big data and how data analytics can make the government more effective. FCW covers that angle as well.


Michael Hettinger is VP for the Public Sector Innovation Group (PSIG) at SIIA. Follow his PSIG tweets at @SIIAPSIG.

5 Key Data Points from IBM’s Big Data Policy Event

IBM briefed policymakers today on how they can leverage big data to save money and address societal challenges. And the timing couldn’t have been better. Congress and federal agencies are trying to do more with less, and many are looking to a directive Obama announced in March, which commits more than $200 million to big data research and development projects.

Among other speakers, Sen. Dianne Feinstein (D-CA), Rep. Steve Womack (R-AR), and IBM Research VP Dr. David McQueeney took the stage to explain the power of big data. Here are five of the most compelling data points from the event:

  1. We create 2.5 quintillion bytes of data every day, and this number will continue to grow at an exponential rate. About 90 percent of the world’s data was created in the last two years alone.
  2. The amount of data generated per hospital will increase from 167 terabytes to 665 terabytes by 2015, due to the incredible growth of medical images and electronic medical records. Big data will help doctors make better predictions by leveraging huge amounts of clinical information.
  3. The GSA stands to save an estimated $15 million a year by reducing power usage at 50 of the agency’s highest energy-consuming buildings, with help from IBM software and sensors.
  4. Over 70% of members of the National Center for Manufacturing Sciences (NCMS) believe increased adoption of advanced computing would lead to competitive advantages. Yet only 6% of small to medium manufacturers in the US are taking full advantage of high performance computing.
  5. The retail industry misses out on $165 billion in sales each year because stores don’t have the right products in stock. Big data could help them analyze sales trends and better predict their needs.

So what should policymakers do with this knowledge? They should push for public-private partnerships and research to better optimize industries. IBM announced one such partnership at today’s event. With the Lawrence Livermore National Lab, IBM will use high performance computing to help solve problems like improving our electric grid, advancing manufacturing, and discovering new materials. The data behind it is clear: big data can help the US compete.


Laura Greenback is Communications Director at SIIA. Follow the SIIA Public Policy team at @SIIAPolicy.

SIIA Op-Ed: Data-Driven Innovation is an Economic Driver

In a Roll Call op-ed today, SIIA President Ken Wasch explains how data is empowering innovation, and warns policymakers that a fixed regulatory approach could stunt economic growth.

The IT ecosystem is evolving at unprecedented speed, and data is becoming a driver of economic and social growth. Cloud computing, the ubiquity of smartphones, and improved bandwidth are fueling a new era of data-driven innovation, Wasch says.

“A range of previously unimaginable applications of data-driven innovation are already being produced — or will be in the near future. These innovations are making people’s lives better and safer and more prosperous, while also increasing energy efficiency and saving money.”

Wasch’s sentiment echoes a forum hosted earlier this month by the National Institute of Standards and Technology and the University of Maryland. Attendees like Google, the National Institutes of Health, and Lockheed Martin came together to discuss the ways data can help address a range of national priorities. The opportunities are vast.

“Right now, hospitals are providing better care by analyzing data about the triage process and using that information to eliminate wasteful steps that prevent patients from getting to the doctor quickly. Traffic-management centers are processing millions of cellphone and GPS signals, combining them with a wide range of other data about car speeds, weather conditions and more to assess road conditions in real time and avoid traffic jams. And financial services companies can collect and integrate customer transaction information in real time to quickly identify questionable patterns and proactively enact new processing rules to reduce fraud.”

But if this technological and economic evolution is to truly take hold, it needs support from policymakers who can ensure that the conversation stays focused on how to best benefit customers and the economy at large. A fixed regulatory approach would only stifle innovation and hurt consumers. If industry and policymakers can work together, we can safeguard consumers and unleash data’s enormous potential for transformative growth.


Laura Greenback is Communications Director at SIIA. Follow the SIIA Public Policy Team at @SIIAPolicy

Big Data: A Long Way from Plug-and-Play

We are excited about our partnership with the InfoCommerce Group to produce DataContent 2012, coming up October 9-11 in Philadelphia. The conference will focus on discovering the next big thing in publishing: the intersection of data, community and markets. As we lead up to the conference, we will be highlighting posts from the InfoCommerce Blog that focus on the issues and topics we will be discussing at DataContent 2012. Enjoy!

Big Data: A Long Way from Plug-and-Play by Nancy Ciliberti

One of the key markets for all the new big data analytics providers is marketers themselves, a group that should be a natural for turning deep customer insight into increased revenue. But are they ready?

Well, according to a study by Columbia Business School and the New York American Marketing Association, although nearly all marketers (91 percent) value and want to make data-driven decisions, 29 percent report that their marketing departments have “too little or no customer/consumer data.” Thirty-nine percent of the marketers surveyed said their data is collected too infrequently and is “not real-time enough.” Two in five marketers admit that they cannot turn their data into actionable insight, and about an equal number (36 percent) report that they have “lots of customer data” but “don’t know what to do with it.”

Read more…

__________________________________________________

This blog post is brought to you by the InfoCommerce Group (ICG). ICG and SIIA are teaming up to bring you DataContent 2012, scheduled October 9-11 in Philadelphia. Please visit the website for details on the conference schedule, speakers and registration.

All About the Cloud – Why the Buzz?

Cloud computing is changing the way every one of us consumes IT services and delivers applications. From SaaS, IaaS and PaaS to Big Data, the industry’s landscape is evolving. So what’s the next great buzz? Check out this video and find out what everyone is saying:

Register for or sponsor All About the Cloud, the industry’s leading ISV conference, representing the entire cloud ecosystem.

 


Katie Carlson is Program Manager for the SIIA Software Division.