Stop #4 in the Big Data Archipelago journey: the Open Source Adoption Isle

“One does not discover new lands without consenting to lose sight of the shore for a very long time.” - André Gide

Ever heard of OpenOffice, Hadoop, Android, Firefox or MySQL? If so, can you identify the common denominator between these software tools and applications? If you answered, “They’re all open source,” you’re right!

While open source software has been around a long time, many organizations have been somewhat slow on the draw to integrate open source into their enterprise infrastructure. A lot of companies have considered open source solutions for initiatives such as BI/DW and have compared them with proprietary solutions on functionality and cost. Yet the hard truth is: We’ve known about these open source solutions, and still we’ve been able to get by without them on a large scale.

Until now.

The Open Source Adoption Isle in the Big Data Archipelago

A Big Data Best Practice for Open Source Adoption

With the rapid growth of big data solutions these last few years, open source has taken a significant step forward into the enterprise space. In turn, more and more enterprise-level organizations have begun to participate in and contribute code to the open source community. The time has come to take open source seriously for big data platforms.

A key reason is that many – if not most – of your current software vendors have integrated a variety of open source projects into their own proprietary big data solutions. Not only has Hadoop been integrated, but so have many Hadoop-related projects (see list below). Vendors are also partnering with key big data service providers, such as Cloudera, Hortonworks, MapR, and other niche shops, to support their customers’ emerging big data needs.

It’s important to note that while open source software is free and (typically) easy to install, commercial open source vendors have to make their money somehow. They can do this in a variety of ways:

  • Develop custom/proprietary code to enhance the free software;
  • Provide custom design and development services;
  • Provide a development sandbox;
  • Host software installations; and
  • Offer technical support and training.

Like proprietary software, open source software requires ongoing support and maintenance. The software may be free, but it won’t always be cheap.

Popular Open Source Projects for Big Data

In the world of open source software, what we typically call a product or application is called a project. Just like products and applications, some open source projects are highly robust and complex, while others are quite simple and straightforward. In addition, many projects are built to play well with others. Such is the case with Hadoop.

The Apache Hadoop Project. This project is managed by the Apache Software Foundation and has two primary functions: to store and process data. Compared with our traditional relational databases, Hadoop is able to store and process all types of data (not just structured data), often at a fraction of the time and cost. No wonder it’s so popular.

Apache Hadoop includes four components:

  1. Hadoop Common – contains libraries and utilities needed by other Hadoop modules
  2. Hadoop Distributed File System (HDFS) – stores data on commodity hardware that scales easily and cheaply
  3. Hadoop MapReduce – a programming model for large-scale data processing
  4. Hadoop YARN (Yet Another Resource Negotiator) – a resource management platform

Of these four components, HDFS (the storage component) and MapReduce (the processing component) are what we hear about the most.
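To make the MapReduce idea a little more concrete, here is a minimal sketch of the classic word-count pattern in plain Python. It is not Hadoop's actual API. The function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical names chosen for illustration, and everything runs in a single process, whereas a real Hadoop job would spread these steps across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse each word's list of 1s into a total count."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(documents):
    """Run map, shuffle, and reduce over a list of text documents."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    # Two tiny illustrative "documents"; real inputs would be files in HDFS.
    print(word_count(["big data is big", "open source is here"]))
```

The point of the pattern is that the map and reduce steps each work on independent pieces of data, which is what lets a framework like Hadoop run them in parallel on commodity hardware.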

Hadoop-Related Projects. In addition to Apache Hadoop, there are dozens (if not hundreds) of open source projects that have been built to expand and extend its functionality. Listed below are some of the more popular Hadoop-related projects:

  • Apache Flume – collects, aggregates and moves large amounts of streaming event data
  • Apache HBase – a distributed, scalable, non-relational database (modeled after Google’s BigTable)
  • Apache Hive – provides a data warehouse-like structure and SQL-like access (called HiveQL) to data stored in Hadoop HDFS
  • Apache Mahout – a machine-learning and data mining library
  • Apache Pig – a high-level platform that allows users to create MapReduce programs
  • Apache Spark – a newer, faster, data-processing engine (a MapReduce alternative)
  • Shark – uses Spark to run Hive queries up to 100x faster in memory (or 10x on disk)
  • Apache Sqoop – transfers bulk data between Hadoop and relational databases
  • Apache ZooKeeper – a centralized service for configuration management and synchronization

Database Technologies. A question often asked about Hadoop is, “Is Hadoop a database – like Oracle or Teradata?” And the simple answer is “No.” Hadoop is not a database technology; it’s a framework and a file system. However, there are several database technologies out there that do support big data initiatives, such as:

  • Apache Cassandra – provides rapid access to structured or semi-structured data
  • Apache Solr – a search engine that provides full text indexing of documents
  • MongoDB – stores large collections of documents
  • Neo4j – stores graph-type data, such as social networks
  • Redis – provides rapid access to unstructured data

As marketers, what’s important to note is that when you hear the term Hadoop, it’s highly likely that the discussion is about the Hadoop ecosystem (which includes any and all projects listed above, plus the projects, technologies and services not listed)—and not just about the Apache Hadoop project (i.e., HDFS and MapReduce). As you can see, to address your big data requirements with open source software, you will need more than just the standalone Apache Hadoop project. You will most likely need an ecosystem of big data-related projects.

Make no mistake: We’re in geek heaven right now. Open source solutions are here to stay in the big data world.

Key Takeaways for Marketers

  • If you’re talking about big data technology, you’re probably talking about open source software.
  • Hadoop was built by developers for developers, not marketers. Don’t tackle it alone.
  • Open source software is free, but that’s where “free” stops. It costs to implement.
  • Visit the graveyard near the southern tip of the island. It has open gravesites that you can explore.
  • Find out what your company’s position is on open source software and big data. One will most likely inform the other.
  • To take this all one step further for marketing, check out this paper, Six Tips for Turning Big Data into Huge Insights, featuring viewpoints from Catalina Marketing's ex-CIO Eric Williams.

This is the 4th post in a 10-post series, “A marketer’s journey through the Big Data Archipelago.” This series explores 10 key best practices for big data and why marketers should care. Our next stop is the Location Isle, where we’ll talk about allowing data to reside where it will provide value.

Facebook vs. Orkut – three lessons for marketers

Me:          Hey! Orkut is going away.
You:         Oh, bummer! I didn't realize you had one of your cousins from Hungary in town.
Me:          *sigh*

A Facebook exchange among Social Media experts about Orkut.

I got permission to use this screen-shot of the exchange.

One of my Facebook friends recently mused in a post about getting a farewell email from Orkut. And she couldn’t remember what Orkut was. To me – that tells the whole story of Orkut. And in her case, she is a well-recognized authority on social media - and not remembering Orkut doesn’t undermine her expertise in the least. The problem is clearly about Orkut. And then I got to thinking:

How in the world
could a company like Google
develop something like Orkut
and have it be a flop?

Long story short – Orkut is a social messaging platform that Google developed and launched in January, 2004. It is substantially similar to Facebook, which launched in February, 2004. For more details about Orkut, visit the Wikipedia page. As for Facebook, I doubt you need a Wikipedia page to know about it.

I am not doing a feature-by-feature comparison of Orkut vs. Facebook because it doesn’t matter – Facebook won the battle. But I do think there are at least three marketing strategy lessons to be learned here.

Names matter

No, not your name – the name of your product / company. It helps if the name vaguely describes what it does, and it’s even better if it describes how it benefits the user. It’s not always possible to assign a name that way, but if it is – do it. And find something easy to pronounce in multiple languages. Read More »

Stop #3 in the Big Data Archipelago journey: the Integration Isle

“If you build it, he will come.” – From the movie “Field of Dreams”

“Build it and they will come” is a popular quote often attributed to the movie Field of Dreams. But guess what? This quote is not from the movie; it’s actually a misquote. [See the actual quote above.] It’s fascinating how much mileage this misquote has gained over the years—in the media, at conferences, in our business meetings, and even in our social circles.

Truth be told, this quote—right or wrong—has fueled our organizations: Build the data warehouse and they will come. Build the customer data mart and they will come. Build the analytics solution, the self-service BI app, the data visualizations – and they will come. Even though we go to great lengths to expand our platforms, build the applications, and integrate the data and business processes, the reality is that they don’t always come. Cindi Howson, founder of BI Scorecard, has done the research and tells us the same story every year: BI Adoption Flat.

But now that we have big data and can build it into the mix, will they finally come?

Figure 1. The Integration Isle in the Big Data Archipelago

A Big Data Best Practice for Integration

The best practice we like to share here is: Build it on demand using the best tools for the job. With big data and its technologies, we now have more options on what, where, and how we’re going to build our integrated infrastructure. Read More »

Set your compass to positivity for best results

Do your winds point to the "P" for "Positivity" on the weather vane?

I like to be reminded from time to time that a positive attitude leads to far better outcomes than the opposite approach. Some people try to make the case that focusing on positive outcomes actually drives positive outcomes, which sounds idealistic, but I think there's some truth to that idea.

One such positively-inclined person made a big impact on me seven years ago on my first day on the job at SAS - Newt Gingrich.

No, seriously - he spoke as Keynote at the SAS Health Analytics Executive Conference, which I attended that day. And while he's not best remembered for all-positive actions, he has established himself as an authority on the health care ecosystem as Co-founder of the Center for Healthcare Transformation.

Mr. Gingrich believes that in a contentious situation (like debating health care reform), if all sides simply stopped looking for reasons not to do something and instead focused on what needs to be addressed to make it possible, then the conversation would move toward a positive resolution. Simply put, he believes we should:

Stop saying "No, because..." and instead say, "Yes, if..."

That prescription may or may not be how health care reform actually happened, but he does make a valid point. Just think about it.

More recently, I've been inspired to share thoughts about the power of a positive attitude from an internal blog post written by Fritz Lehman, SVP of Customer Engagement and Support at SAS. His post is titled, "Relentlessly Positive," and I like what he has to say:

---------- Read More »

Stop #2 in the Big Data Archipelago journey: the Processing Isle

“I have travelled the length and breadth of this country and talked with the best people, and
I can assure you that data processing is a fad that won’t last out the year.”
(Editor in charge of business books for Prentice Hall, 1957)

Whereas the Analytics Isle tends to be a popular destination for marketers on the big data journey, you really won’t find them flocking to the nearby Processing Isle. This highly active island has much to offer (like special territories for batch, real-time, and streaming data), but marketers aren’t typically as interested in how data is processed as they are in what marketing data can be processed and how fast. The happy folks on the Processing Isle keep them happy with timely, reliable, and relevant data. How the data gets there, many don’t care or need to care.

Figure 1 - The Processing Isle in the Big Data Archipelago

Regardless, marketers who have been in the industry awhile have witnessed the remarkable speed at which data warehousing technologies have advanced over the years. Nowadays, not only do we have options on how to process our data—such as grid computing, in-database, in-memory, and appliances—we also have much greater control over the activity in our data warehouse and analytical ecosystems. With these advancements, we’ve been able to increasingly optimize the data warehouse around mixed workloads, and marketers are undeniably reaping the benefits.

A Big Data Best Practice for Processing Data

Even with the significant technological advancements in traditional systems, big data technologies have changed the playing field for processing data of all shapes and sizes. Read More »

Stop #1 in the Big Data Archipelago journey: the Analytics Isle

“There are known knowns. These are things we know that we know.
There are known unknowns.  That is to say, there are things that we know we don't know.
But there are also unknown unknowns. There are things we don't know we don't know.”
- Donald Rumsfeld

Welcome to the Analytics Isle, the #1 hot spot destination for marketers in the Big Data Archipelago! It’s not hard to understand why this island is so popular given its unlimited data opportunities for exploration, reporting, advanced analytics and data visualization. With all the data that’s available these days, from traditional data (CRM, contact center, sales) to big data (email, social, mobile), some marketers are having a field day carving out new paths and adventures to pursue, while others are simply stuck in the moat.

Figure 1. The Analytics Isle in the Big Data Archipelago

For years, marketers have used analytics with their traditional data to gain valuable insight into their company and customers. In other words, with analytics, they have come to the data with their business questions in hand to answer what we will call the known unknowns. They know what they don’t know or want to know, and they use analytical data to fill in the blanks. It’s like discovering gold in a hidden treasure chest of data.

Discovering the known unknowns with analytics has kept companies very busy for years and will continue to do so in the years to come. Now let’s shift our focus from what is happening with traditional data and analytics and explore what could be happening with big data. Read More »

The health care customer emerges amid industry upheaval

One interesting outcome of regulatory reform in health care is seeing the use of the word "customer" filter into the dialogue in the industry.

This letter is not from a health plan or provider, but clearly shows me the love as a customer.

Someday letters from health plans and providers will look more like this.

The context for that development in the United States’ health care industry is upheaval not seen in any sector of the economy since the government-mandated breakup of the monopoly Bell Telephone System in the mid-1980s.

At the same time, technology is radically transforming health care, and the precedent in communications is how the Bell System breakup coincided with the commercialization of innovations such as voice mail, mobile telephones and the internet, to name a few. So will this upheaval in health care also usher in great innovations? I think it’s quite likely – and one big question is, “Who will do the innovating?”

In times of change, the organizations that thrive are the ones that adapt, often ending up operating alongside new players that might do things that were previously unimaginable.  I imagine the same will hold true in health care as it evolves.

We’ve already seen adaptation taking place among health insurance plans in fundamentally strategic ways, and two standouts are Blue Cross Blue Shield of North Carolina and Florida’s GuideWell Connect (GWC) - an affiliate of FloridaBlue. The most striking changes have involved the ways they engage with their customers, essentially orienting their organizational direction toward individual members’ needs and desires rather than treating it as a consequence of regulatory requirements.

As a health insurance customer, I could not be more excited. As a marketer, I could not be more intrigued. Read More »

Tell a financial story when measuring customer experience

In our fast-changing, increasingly digital world, building a strong customer relationship is the linchpin of building a great business. Digitally savvy, hyper-connected customers are now harder to define, understand and please than ever before. Today’s enterprise must focus on the customer experience as never before, or risk being replaced or ignored.

A new report by Harvard Business Review Analytics Services, "Lessons from the Leading Edge of Customer Experience Management" spotlights how leaders in customer experience management are developing strategies, capabilities, processes, and metrics to gain competitive advantage and remain relevant.

One of the striking aspects--and there are many--of this report pertains to measuring customer experience efforts. Maximizing the customer experience ROI (52%) was cited as the top issue. Nearly half also reported that it's extremely challenging to tie customer experience investments to business outcomes. Leading-edge firms aren't immune from this issue, with a third in the same predicament.

That difference suggests that a higher incidence of tying customer experience to business outcomes sets the leading-edge firms apart. Still, a majority of leading-edge companies admit to having at least some difficulty tying their customer experience investments to business outcomes.

The traditional measure of customer experience success, customer satisfaction scores, is widely used in all companies. Yet a variety of other metrics deemed highly important by those that use them, such as customer effort and digital engagement scores, aren't as prevalent.

The report showed that customer experience leaders use an array of metrics, often more effectively, to track customer experience management progress, including measures such as customer lifetime value, indirect traffic, social media sentiment, and upsell rates. Read More »

A marketer's journey through the Big Data Archipelago

Come along with me on a journey through the Big Data Archipelago. It involves "visiting" a series of islands - an archipelago if you will - that each present a different opportunity to find value in big data.

Big data is arguably one of the most overhyped buzzwords in business today, yet we can't call it a mere "buzzword" because it's quite real. And it presents itself simultaneously as a challenge and an opportunity, so doing nothing about it is not an option for today's marketers.

We all are awash in data, big and small, structured and unstructured, and our ability to process, analyze, and manage it rests in knowing the rich value of the data. To that end, I propose a structured approach to big data as a journey through an archipelago so that each marketer can tailor what they do with big data according to what makes sense for their organization.

And while ubiquitous, big data means different things to different people. So let's consider how Paul Kent, VP of Big Data at SAS, talks about big data:

“That amount of data or complexity which puts you out of your comfort zone.”

Going Beyond the 3Vs

How many articles, blog posts, webcasts, or presentations on big data have you read or listened to these past few years that have referenced Gartner’s 3Vs – volume, variety, velocity – of big data? I suspect “a lot.”

Moreover, vendors, analysts, and consultants alike have taken the liberty to expand this list, adding such V’s as value, veracity, variability, and viability, just to name a few. Framing the big data discussion around 3, 4, 7, or even 15 V’s can be problematic for marketers, however, because it doesn’t get them any closer to finding the hidden value in their data – only understanding why it’s so hard to find. This is where the big data archipelago can help.

The Big Data Archipelago has 10 islands.

Read More »

Could marketing optimization improve your golf game?

As the best of the best golfers converge in Pinehurst, NC at the US Open Golf Tournament, it seems only natural to relate golfing to marketing optimization. I assure you this is not a stretch - please read on.

You see, everything in life is an optimization problem that must be solved. You trade off the time spent on an activity against the quality of the output from that activity - whether it’s at work, at home, or even on a hobby. For me, my golf game is one big optimization problem. You might ask – well, what do you have to optimize against in this situation?

My handicap.

For those of you not familiar with golf, you are assigned a handicap based on your skill level. With this handicap the following formula is applied:

Gross Score – Handicap = Net Score

This system is used so players of differing skill levels can compete against each other in tournaments, etc. A very good player will have a low handicap (3), whereas a less skilled player has a higher handicap (20). Here is an example of how handicaps work: a good player shoots a 76, for a net score of 76 – 3 = 73, while a less skilled player shoots a 92, for a net score of 92 – 20 = 72. The less skilled player would win the match, because their net score is lower.
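For readers who like to see the arithmetic spelled out, here is a small sketch of the net score formula in Python. The function names (net_score, winner) and the player labels are my own, made up for illustration. The numbers are the ones from the example above.

```python
def net_score(gross_score, handicap):
    """Apply the formula: Gross Score - Handicap = Net Score."""
    return gross_score - handicap


def winner(players):
    """Return the name of the player with the lowest net score.

    `players` maps a player name to a (gross_score, handicap) pair.
    """
    return min(players, key=lambda name: net_score(*players[name]))


if __name__ == "__main__":
    # The two players from the example: (gross score, handicap).
    players = {"good player": (76, 3), "less skilled player": (92, 20)}
    for name, (gross, handicap) in players.items():
        print(name, "net score:", net_score(gross, handicap))
    print("winner:", winner(players))
```

Ties are not handled here; with equal net scores, this sketch simply returns whichever player was listed first.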

The optimization side of this problem is – how much do I want to invest in lowering my handicap, and what is the payoff for that investment? One thing I must consider is the constraints – what am I limited by when attempting to lower my handicap? Wow – time, money, innate athletic skill, the goodwill of my family and coworkers for spending endless hours on the golf course, and the list goes on and on. Unfortunately, the constraints are keeping me from getting my handicap to where I would ideally like it – but that leaves room for future improvement, right?

Just like my golf game, marketing for many organizations is one big optimization problem. Questions marketers may ask that are indicative of the need for optimization may include: Read More »
