Stop #2 in the Big Data Archipelago journey: the Processing Isle


“I have travelled the length and breadth of this country and talked with the best people, and
I can assure you that data processing is a fad that won’t last out the year.”
(Editor in charge of business books for Prentice Hall, 1957)

Whereas the Analytics Isle tends to be a popular destination for marketers on the big data journey, you really won’t find them flocking to the nearby Processing Isle. This highly active island has much to offer, like special territories for batch, real-time, and streaming data, but marketers typically aren’t as interested in how data is processed as they are in what marketing data can be processed and how fast. The folks on the Processing Isle keep them happy with timely, reliable, and relevant data. How the data gets there is something many marketers don’t care about, or need to care about.

Figure 1 - The Processing Isle in the Big Data Archipelago


Regardless, marketers who have been in the industry for a while have witnessed the remarkable speed at which data warehousing technologies have advanced over the years. Nowadays, not only do we have options for how to process our data (such as grid computing, in-database, in-memory, and appliances), we also have much greater control over the activity in our data warehouse and analytical ecosystems. With these advancements, we’ve been able to increasingly optimize the data warehouse around mixed workloads, and marketers are undeniably reaping the benefits.

A Big Data Best Practice for Processing Data

Even with the significant technological advancements in traditional systems, big data technologies have changed the playing field for processing data of all shapes and sizes. In fact, the need to process high-volume, high-velocity, and high-variety data (otherwise known as the 3Vs of big data) was a key driver in the development of these big data technologies.

As a result, “take advantage of the processing power of big data technologies” is the new battle cry for big data. With technology options like Hadoop (an open source project designed to address the storage and processing requirements for big and traditional data), we can easily process semi-structured and unstructured data that we can’t or don’t want to store in our traditional systems. Or we can pre-process traditional (or big) data in Hadoop before storing it in a data warehouse.

Contrary to popular belief, you don’t need “big” data to take advantage of the power of Hadoop. For example, you can use Hadoop to offload some of the processing work you’re currently asking your traditional data warehouse to do. Let’s take a look at Facebook.

A Quick Example: Facebook

When you go to Facebook, all the information you see is coming from multiple data sources. For example, your profile data is coming from a transactional database, your mutual friends list is coming from a data warehouse, and the news feed is coming from Hadoop. Your mutual friends list needs to be updated periodically to reflect your current state of connections. As you can imagine, this would be a data-heavy, resource-intensive job for your data warehouse – given that Facebook needs to figure out your mutual friends for each and every connection, and then do that for everyone, on a regular basis.

This is a job that could easily be handled by Hadoop in a fraction of the time and cost – and that’s what Facebook does. It sends your data from the data warehouse to Hadoop to process and update your mutual friends list, and then Hadoop sends the updated list back to the data warehouse. Could the data warehouse do this work? Of course, but at what cost (in time and money)?
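For readers curious about what this kind of job looks like, here is a minimal sketch of the mutual-friends calculation written in the map-and-reduce style that Hadoop is built around. It is an illustration only, not Facebook's actual code: the sample data and the names `friends`, `map_phase`, `reduce_phase`, and `mutual_friends` are all made up for this example, and it runs on a single machine rather than a cluster.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data: each person maps to the set of their friends.
friends = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"ann", "cat"},
}


def map_phase(person, friend_set):
    """Emit one (pair, person) record for every pair of this person's friends.

    If `person` is friends with both members of a pair, then `person`
    is a mutual friend of that pair.
    """
    for a, b in combinations(sorted(friend_set), 2):
        yield (a, b), person


def reduce_phase(records):
    """Group the emitted records by pair and collect the mutual friends."""
    mutual = defaultdict(set)
    for pair, person in records:
        mutual[pair].add(person)
    return mutual


def mutual_friends(friends):
    """Run the map step over every person, then reduce the combined output."""
    records = (
        record
        for person, friend_set in friends.items()
        for record in map_phase(person, friend_set)
    )
    return reduce_phase(records)


result = mutual_friends(friends)
for pair in sorted(result):
    print(pair, sorted(result[pair]))
```

In this sketch the map step looks at one person at a time, so it can be split across many machines, and the reduce step only has to gather the records for each pair. That independence is the reason a job like this suits a system such as Hadoop. Note that the sketch lists mutual friends for every pair that shares at least one friend, whether or not the pair is already connected.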

This is just a simple example of how taking advantage of big data technologies, such as Hadoop, can save both time and money. Apply the same thinking to how your customers interact (or may want to interact) with your organization, and consider how big data technologies might save your own organization both time and money.

Key Takeaways for Marketers Visiting the Processing Isle

  • Data processing did not die in 1957.
  • The terms “big data” and “Hadoop” are not synonymous. Hadoop is just one of many big data solutions.
  • Hadoop can process all your data—unstructured, semi-structured, and/or structured.
  • Feed the mermaid in the South Bay of the island. She’s nice and likes the attention.
  • Many traditional and big data software vendors, including SAS, have integrated Hadoop into their big data solutions.

This is the 2nd post in a 10-post series, “A marketer’s journey through the Big Data Archipelago.” This series explores 10 key best practices for big data and why marketers should care. Our next stop is the Integration Isle, where we’ll talk about using the best tools for the job.


About Author

Tamara Dull

Director of Emerging Technologies

I’m the Director of Emerging Technologies on the SAS Best Practices team, a thought leadership organization at SAS. While hot topics like smart homes and self-driving cars keep me giddy, my current focus is on the Internet of Things, blockchain, big data and privacy – the hype, the reality and the journey. I jumped on the technology fast track 30 years ago, starting with Digital Equipment Corporation. Yes, this was before the internet was born and the sci-fi of yesterday became the reality of today.
