Hadoop: Breaking out of the Silicon Valley bubble

In the first installment of this series on Hadoop, I shared a little of Hadoop's genesis, framing it within four phases of connectivity that we are moving through. I also stated my belief that Hadoop has already arrived in the mainstream, and that we are currently moving from phase three, connecting people, to phase four, connecting devices and things.

In this post I want to cover my views on the developments that have allowed Hadoop to break free from the Silicon Valley bubble and appeal to organizations of all sizes. From my perspective, there are four drivers that have propelled Hadoop into the mainstream and contributed to Hadoop market growth, aside from the oft-touted cost savings:

Digital first: The need to support increasingly digital forms of "human" interaction. Humans now operate in a world that is almost exclusively digital. This is especially true of the younger generation who are the next generation of consumers/customers/partners. Essentially, organizations want to ensure they can capture the big data provided through these interactions and deliver a suitably personalized and rapid service in return (which results in more data!).
De-risking Hadoop: The arrival of commercial Hadoop vendors who could support organizations wishing to deploy Hadoop, along with a massive push (which is still underway) to make Hadoop ready for the enterprise.
Hadoop accessibility: The arrival of supporting commercial technologies from software vendors that make using Hadoop more accessible.
Device and sensor volume: The arrival, and future growth, of phase four – the Internet of Things – where we are connecting devices/things that are generating ever larger volumes of data. This new wave of data and interaction offers the potential to transform the way organizations operate, the services they offer and the expectations of consumers. Cisco, for example, predicts that 50 billion “things” will be connected by 2020 and every company on the planet is trying to work out how they can tap into that.

Let’s address each of these in more detail.

1. Digital first

With the PC slowly being usurped by tablets and other mobile devices, digital interaction is no longer the exception; it is increasingly the norm. From online shopping, social networks, digital banking and electronic payments through to location-based mapping and navigation services, digital is everywhere, along with the data it generates and the quality of service people expect. Every commercial company in the world is going to have to become a SMART digital company, otherwise it will simply not survive the generational shift among the customers who will be its next revenue stream. The bank of the future is not the bank of today!

The bottom line? More data is coming, in more varied forms, at much faster speeds, and companies will be expected to use that data to deliver better, differentiated services quickly. Companies are unable to keep spending at the same high levels they do today on the IT backbone, and Hadoop is offering an alternative.

2. De-risking Hadoop

Organizations are generally risk averse. If information has become the new oil, then you do not put that oil into an unsecured and unsupported facility. The exceptions were those organizations that had the will and the need to build and manage a facility themselves, often because there was no alternative and their entire business depended on it. With the arrival of companies like Cloudera, Hortonworks, Pivotal and MapR (amongst many others), there are now commercial distributions of Hadoop that are slowly making the environment more robust and enterprise ready.

The risk of using Hadoop is decreasing just as the cost of the hardware to run it is falling and the amount of data is going up. High-performance computing, which is required to make Hadoop viable, was once dominated by mega projects such as the Large Hadron Collider and the Square Kilometre Array, and by Silicon Valley juggernauts. Now high-performance computing is ready for the commercial mainstream, with CPUs, networking, storage and memory all getting faster or bigger at the same time that they are getting cheaper, a trend that looks set to continue.

It is a perfect storm coming at a time when many companies were starting to worry about the costs of storing, utilizing and delivering data in support of services given the volume, variety and velocity changes that are occurring as the world is increasingly digitizing. Hadoop does not replace everything in an organization today. Rather, it should be considered as an alternative approach based on the needs of the organization and the process it might support.

3. Hadoop accessibility

To get Hadoop into the mainstream, organizations need the skills to interact with Hadoop without the expense of hiring hundreds of new programmers to build custom solutions (and let's not forget the maintenance headache if you try to accomplish this with third parties as they will not be there forever).

SAS, along with other software vendors, is rapidly developing tools that open up Hadoop and unlock its enormous potential for any company, not just those with hordes of coders! There are now tools to help manage data, to handle data quality, to visualize and report on data and to run analytics, all via intuitive user interfaces. Working with Hadoop is becoming just as easy as working with other systems, and it handles any volume of data very quickly and in a cost-efficient manner.

4. Device and sensor volume: the coming tsunami

Lights, toothbrushes, refrigerators, thermometers, wearables and more are the next generation of connected devices. In 1990 most people, if lucky, had one internet-connected device, whereas today in my home I probably have over 40 (thanks to my Philips Hue lights).

This new consumer world of connected devices is matched by the sensors in cars that collect telematics data to be streamed back to companies, sensors on oil rigs that monitor performance, air quality sensors in public spaces, and let's not forget things like the iWatch that gather lots of data for new, innovative services. The list is endless, and if something can have an IP address in future and send back data, you can bet someone is going to make it happen. These devices will generate data at volumes not previously seen, and forward-looking organizations should already be thinking about adapting to get the most out of whatever data they can capture before their competitors do.

Every company is a digital company in the future! That means every company needs to think like the Silicon Valley trailblazers.

Hadoop will soon be a teenager but it has already left the nest!

Hadoop is not even 10 years old today, but it is rapidly maturing into the young adult it will soon become. It deserves its place in the IT architecture of almost any company.

In the early days, Hadoop was quickly adopted by Yahoo, and by 2009 Yahoo made all the code they had developed available to the open source community. Shortly after that, when Facebook was rapidly growing, their messaging application went online and they also adopted Hadoop, announcing in 2010 that their Hadoop deployment held 21 petabytes of data and topping that in 2013 by announcing it had reached 100 petabytes. Just when you thought we might be done, Yahoo recently announced that they have 455 petabytes of data stored in Hadoop.

These companies are no doubt the early movers who recognized that a digital approach could be disruptive in various markets. At the same time, they understood that, for certain things, traditional IT approaches would struggle to capture and process the growing data volumes in a cost-effective manner, and this is why they all adopted Hadoop.

Each of these companies, just like most organizations, continues to capture more and more data and they are all working hard to get smarter and smarter at surfacing useful information and turning that data into commercial gain.

It could be argued these early adopters are the current masters at monetizing the data they are collecting and in providing digital services. For them, Hadoop is not just a cheaper data collection medium but from the VERY START it has been a platform that allows them to use that data to deliver a unique new experience, to connect people to things they did not even know they wanted to be connected to themselves, and create new services. This is the magic of Hadoop!

As other companies look at their current situation and future needs, Hadoop is the new kid on the block that offers great potential to reduce costs for some things today, to speed up processing and, perhaps more importantly, to provide a proven platform that scales to data volumes most companies think they will never reach! Having seen many non-Silicon Valley companies utilizing Hadoop, I can tell you that "Hadoop has spread from the Valley" and is now flying quickly throughout the world.

In the next installments of this series, we will talk about some common early use cases for Hadoop alongside traditional IT architectures, take up the debate around the emerging topic of data lakes, and finally talk about the big data lab.

To learn more, register for a free, 30-minute Hadoop workshop.

photo by darwin Bell // license by cc

1 Comment

Adam on March 13, 2015 8:55 am

Thanks for sharing your post! In the Silicon Valley ecosystem, Hadoop has been the rage for several years, but the larger business world is just starting to come to grips with how this technology will move from proof-of-concept into production. Some interesting points to note: Barclays just released a study (see “CIOs Uncertain About Hadoop’s Value: Barclays”) that indicates the adoption of Hadoop may happen gradually. CIOs are just not confident they can plug Hadoop into their data warehouse ecosystems and move ahead. Also, one of the findings from Gartner analyst Merv Adrian’s research is that as tech executives gained more experience with Hadoop, they became less likely to think that it would eventually replace the data warehouse.
