As big data grows beyond buzzword status, leading organizations are looking to innovators who understand the possibilities and can help them do something useful with all that information.
Yesterday at The Economist's Ideas Economy: Information conference being held in Santa Clara, California, attendees heard from some of the industry’s most knowledgeable experts, including SAS CEO Jim Goodnight.
Goodnight appeared on a panel with:
- EMC's President and Chief Operating Officer, Pat Gelsinger.
- Cisco's Senior Vice President of Engineering and Chief Technology Officer, Padmasree Warrior.
Matthew Bishop, US Business Editor for The Economist, moderated the discussion. The topic: how companies can not only survive today's data deluge but capitalize on it.
What, exactly, do you mean by big data?
The discussion opened at a very fundamental level with Bishop's request for each of the panelists to explain what they think the term big data means. What is different about it, and what is transformational?
Warrior said Cisco is looking at big data in terms of the opportunities inherent in a value shift from simply collecting large amounts of information to clearly understanding what that information can do for you. "It's not just about how do you collect data and store data, but how do you share that data and create visualization for it? How do you do analytics with it? How do you search for it? … The kind of information and the volume of information are changing; therefore, the value becomes what do you do with that information."
With EMC in the business of data storage, Gelsinger noted the crossover to petabytes of data and the increasingly diverse nature of data, including unstructured data sources such as video and text. “It’s being able to analyze that in increasingly real-time, collaborative manners,” he said. “Not just to ‘cloudify’ data sets – i.e., transform IT – but truly to transform business.”
Goodnight said SAS faces big data challenges in looking for new and better ways to do predictive modeling for customers who have enormous data sets and short reporting windows. He cited a banking solution that has reduced risk computation times from 18 hours to 15 minutes and a retail solution that examines two years’ worth of point-of-sale data in an hour and a half to optimize pricing. “The data is big, but it gets confounded by the fact that you want to analyze it and do forecasting on it,” he said. “Using a large array of grid computers, we can harness the power of all those [data points] in parallel. It lets you do things you never thought about before because the data was just way too big. We’re having a lot of fun right now working on these kinds of problems.”
Where do you see the greatest opportunities in big data over the next five years?
“The idea of doing a risk computation before you make a portfolio move, before you make a sale or a trade … most risk managers have never even thought of doing such a thing. It’s not possible, but what if it were? That would change a lot of people’s behavior,” Goodnight said.
Bishop argued that some would say it wasn’t necessary to look at the full data set, that a subset would suffice.
The audience erupted in laughter at Goodnight’s reply. “Well, that’s what you said when you couldn’t look at the whole data set.” The very best fit, he said, is always going to be found looking at all the data.
Bishop conceded that disasters such as the recent financial meltdown and mortgage crisis could possibly have been prevented with risk computation on historical data of this sort.
“What [banks and mortgage lenders] were doing was essentially taking bundles of thousands of loans and looking at them as one,” said Goodnight. “Now we have the computing power to break them down into individual loans and compute the probability of risk.”
Gelsinger added that essentially every sector can benefit from this type of analysis. “Everyone is now saying, ‘Ok, I have this. Now how do I monetize it?’” He sees great opportunity for the health care industry with its enormous data in patient records, imaging, clinical data and more. “I see exciting things going on as people are modifying these extraordinary data sets from information that is available today,” he said.
Warrior sees opportunity in two distinct areas she defined as the Internet of Things and the Internet of People. The first will be driven by applications such as smart grids and machine-to-machine communication. The Internet of People will be driven more by applications such as video communication and telepresence. She, too, referred to opportunities in the health care industry. “For doctor-patient communication or even eldercare in the home, these applications will be transforming,” she said.
Later in the conversation, Goodnight discussed some of the work SAS is doing, along with the CEO Roundtable on Cancer, to provide a federated site for data from oncology clinical trials.
What are the important questions facing organizations in predicting their future?
“Every marketer in the world wants to predict human behavior,” said Goodnight. “A marked part of our business centers around marketing automation and what we call customer intelligence, where we try to help select the very best target audience for a particular offer that is being made.” He also noted the importance of discovering who the influencers are within social networks impacted by or created by an organization’s offerings.
Reinforcing the idea of big data, Goodnight shared the example of Catalina Marketing, which looks at 600 billion records over 195 different loyalty cards to make sure the coupons shoppers receive at the point of sale are on target for each buyer.
Bishop countered with the thought that many experts believe that five or six data points can predict the future.
“I think the idea that five or six data points will tell the future is not correct at all,” Goodnight said. “With the massive amounts of data that are available through transactions we do, we’re going to see more and more people come up with information based on all of that data.”
“It’s about mining and finding those needles in haystacks,” said Gelsinger. “It’s about full synthesis of data.”
Are we seeing value yet?
Bishop noted his personal dissatisfaction with recommended books on a well-known online retailer site. “How close are we to these things being more useful than annoying?” he asked.
“One of the guys that used to work in the analytics department there is now on my staff,” said Gelsinger. The audience chuckled, but Gelsinger had more to say about Bishop’s experience. “We now take a whole set of data that used to take them about twice a day to run analytics on and do it twice a minute. Many of these things are fairly simplistic today when they give you those recommendations. In the future it’s going to be a real-time analytics analysis that is done on a very broad set of data that is highly personalized. So, I think we’re actually very close to being able to fix your book problem,” he said.
Warrior discussed the importance of user accountability in creating real value. “It depends on how we as users exercise our right to the data.” She recommended setting preferences, updating profiles and liking preferred content. “Nothing is static. Things change. We often underestimate the user input of the data and the value that we [as users] can bring.”
Big data will change the structure of industry. What are the biggest business challenges that will make or break your organization?
"Big data has been around for a long time, we just haven't known what to do with it," said Goodnight. "We are now beginning to find ways to do analysis on it, relying on companies like EMC to provide machines like Greenplum with a shared nothing database."
“You’re now facing a really serious competitive challenge from IBM,” said Bishop.
“We’re in the analytics business,” said Goodnight. “They sell hardware, don’t they? Just because you buy a small stat company doesn’t make you an analytics company overnight.”
“I can see you’re not worried,” Bishop said.
Gelsinger described different challenges for his business. “We built a set of businesses and infrastructure that largely was built around enterprise-structured data. We have to embrace this scaled-out framework, these unstructured-file-oriented databases,” he said. “Like any of these major trends, there are going to be companies that are buried by the trend, there are going to be those companies that ride the trend. There are going to be dramatic convulsions in the industry. If we don’t grab it and run with it, we’re going to be one of those companies left behind. We’re making a bold move to grab it and run with it.”