I recently caught up with Dr. Tom Davenport, analytics thought-leader and author of Big Data @ Work, in Dublin, where we talked about big data, the Internet of Things and Hadoop. I'll be sharing the conversation here with you in two parts. You'll find part one below, and you can check back next week for part two.
John Farrelly: I'd like to start by discussing your new book, Big Data @ Work, and how it dispels the myths and outlines the opportunities concerning big data. What have you come across since starting to write the book last year?
Tom Davenport: A year ago, it became apparent that big companies were starting to experiment with big data. They were telling me, "We know that analytics is important, we get that we need big data; but how do we seamlessly integrate big data and our existing small data analytics?" Also, "How do we increase the speed and scale at which we're using this, and how do we move towards incorporating machine learning in addition to the traditional hypothesis-driven approach?" A lot of organizations have been doing this for some time, but I also spoke to companies such as Allied Irish Bank, Icon, UPS and USAA about their pilot projects.
So, the book is a compilation of how large organizations, who had some small data analytics in place, were achieving this. Jill Dyché from SAS drafted the technology chapter, but I also wanted to look at this from a cultural perspective.
John: Did you come across any particularly good examples?
Tom: Well, GE are betting a couple of billion dollars on the industrial Internet, saying, "We used to be a big iron company, but now we're a big iron and big data company." They told me that their devices all had sensors in them and now they needed to start extracting the data and making sense of it. The problem that they've faced is that there are almost as many different data formats as there are sensors; there's no common standard. I recall discussing the development of the RFID standard with Sanjay Sarma of MIT. He worked in the lab that coined the term "Internet of Things". Sanjay told me that it took around 15 years to agree on a common standard for RFID data, but he was hopeful that new standards would be agreed upon much more quickly.
Another good example is Monsanto. They came up with an idea called precision planting, where they gave farmers insights into how to optimise crop production. They paid almost a billion dollars for a company which had climate data, and they bought a soil data company; they already had a lot of information about plant hybrids and the like. Their aim was to be able to recommend to farmers what to plant, when to plant it, how many seeds to sow per cubic centimeter, how much water to use, when best to add herbicides and pesticides, and when to harvest the crop.
John: So were farmers prepared to pay for this advice?
John: Interesting, did you find any more examples?
Tom: The whole area of wearables, technologies which can monitor your heart rate, blood pressure and so on, is going to be very significant for healthcare companies and organizations. You can foresee a time when patients self-monitor, and you can appreciate the potential impact on clinical trials.
Also, I found that Singapore has a really interesting traffic monitoring system which uses sensor data from taxis and roadsides and so forth, but it's still quite BI focussed rather than analytical. It was a huge leap just to pull all of the data together and visualize it in one place.
John: I imagine that with all this proliferation of data, Hadoop is becoming really significant?
Tom: Oh, yes. I spoke to a company based in Los Angeles called TrueCar. They collect car price data and told me that they already have around two and a half petabytes' worth. They told me that they had moved from storing this in a data warehouse to using Hadoop. They found out that, all things considered, a data warehouse was costing them $19 per gigabyte per month, whereas Hadoop cost just 23 cents!
John: That's amazing. It strikes me that Hadoop is allowing organizations to satisfy their curiosity, since using Hadoop makes it practical to search for patterns which in the past would have been difficult to justify spending time and money on.
Tom: Yes, that's true. For example, Wells Fargo told me how they are now experimenting with Hadoop in a big way. It's a great place for experiments, as it'll take data in many different formats. This is giving CIOs a much wider range of choices. Also, the mix of structured and unstructured data makes Hadoop an ideal repository for the sorts of open-data initiatives being pioneered by national governments.
John: Well, that's been really interesting so far. I'd like to come back and talk to you about model management, data science and the skills gap later.
Tom: Look forward to it.
This is the first part of the enjoyable and enriching interview I had with Tom. The second part is coming next week. Thanks to my colleague Philip Male for supporting me with this article. You can follow him on Twitter @PhilMale.