The other day, I was looking at an enterprise architecture diagram, and it actually showed a connection between the marketing database, the Hadoop server, and the data warehouse. My response can be summed up in two ways: first, I was amazed; second, I was very interested in how this customer uses Hadoop.
Here is what I found:
- The Hadoop server was purchased for a specific purpose: analyzing the marketing campaigns created for the corporation's customers and prospects.
- The Hadoop server receives files from the marketing database on a weekly basis. Data management activities such as identifying formats, file types, and relationships are sometimes overlooked, and getting the data "ready" is time consuming and, at times, very complex. (A rough sketch of this weekly hand-off follows the list.)
- There are two full-time people working on the Hadoop solution.
- The data warehouse receives the analysis results, in a structured format, for reporting and further analysis.
- Customized marketing campaigns are created in the marketing database, based on the analysis received from Hadoop and the reporting in the data warehouse. This creates a full cycle of information for the enterprise.
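To make that weekly hand-off a bit more concrete, here is a minimal sketch of what the first step of the flow might look like. The file names, HDFS directories, and the use of the `hdfs dfs` command line are my assumptions for illustration only; the customer's actual tooling may differ.

```python
# Minimal sketch of the weekly hand-off: a marketing extract lands in a
# Hadoop (HDFS) landing zone where the analysis jobs can pick it up.
# All paths and file names below are hypothetical.
import subprocess
from datetime import date

def load_weekly_extract(local_extract: str, hdfs_landing_dir: str) -> None:
    """Push this week's marketing extract into the Hadoop landing zone."""
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_landing_dir], check=True)
    subprocess.run(["hdfs", "dfs", "-put", "-f", local_extract, hdfs_landing_dir], check=True)

if __name__ == "__main__":
    week = date.today().isoformat()
    load_weekly_extract(
        local_extract=f"/exports/marketing_campaigns_{week}.csv",
        hdfs_landing_dir=f"/data/marketing/landing/{week}",
    )
```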
I was impressed that this company had purchased the technology for a reason, implemented it for success, and is now gaining market share based on the analysis from Hadoop and the data warehouse. The only feedback after my review was this:
- They may want to consider sampling data from the sources feeding Hadoop (especially if they are going to include external data sources). They can then structure the data and use a profiling/data quality tool to understand any relationships prior to load and ascertain its readiness for Hadoop (see the sketch after this list).
- This will also give them a view of key relationships between the data sources, as well as a quick view of data quality. External data sources often have very poor data quality.
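Here is a minimal sketch of that sampling and profiling step, assuming CSV extracts and pandas. The file names and the customer_id key column are hypothetical, and a dedicated profiling/data quality tool would go well beyond this, but even a quick pass like this surfaces null rates, distinct counts, and key overlap before anything is loaded into Hadoop.

```python
# Minimal profiling sketch over sampled rows; file and column names are hypothetical.
import pandas as pd

def profile_sample(path: str, sample_rows: int = 10_000) -> pd.DataFrame:
    """Load a sample and report null rate, distinct count, and dtype per column."""
    df = pd.read_csv(path, nrows=sample_rows)
    return pd.DataFrame({
        "null_rate": df.isna().mean(),
        "distinct_values": df.nunique(),
        "dtype": df.dtypes.astype(str),
    })

def key_overlap(left: pd.DataFrame, right: pd.DataFrame, key: str) -> float:
    """Share of sampled keys in `left` that also appear in `right`."""
    return left[key].isin(right[key]).mean()

if __name__ == "__main__":
    marketing = pd.read_csv("marketing_extract.csv", nrows=10_000)
    external = pd.read_csv("external_prospects.csv", nrows=10_000)
    print(profile_sample("external_prospects.csv"))
    print("customer_id overlap:", key_overlap(external, marketing, "customer_id"))
```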
2 Comments
Joyce,
We have a similar situation with our implementation of SAS Visual Analytics and Hadoop. First, our data management and ETL processes are handled within SAS Data Management; the data is offloaded into an Oracle database, and then a nightly job sweeps through Oracle to pick up any new data. Depending on the file/table size, the data is then loaded directly into memory on the SAS LASR server or into Hadoop.
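A rough sketch of that size-based routing, with a hypothetical size threshold and placeholder loaders rather than the actual SAS/Oracle jobs:

```python
# Sketch of the nightly routing decision: each newly swept table is sent
# either to the in-memory tier or to Hadoop based on its size.
# The threshold, table names, and loader bodies are placeholders.
SIZE_THRESHOLD_MB = 500  # hypothetical cutoff between in-memory and Hadoop

def load_to_memory(table: str) -> None:
    print(f"loading {table} into the in-memory analytics server")  # placeholder

def load_to_hadoop(table: str) -> None:
    print(f"loading {table} into Hadoop")  # placeholder

def nightly_sweep(new_tables: dict[str, float]) -> None:
    """Route each new table (name -> size in MB) to the appropriate target."""
    for table, size_mb in new_tables.items():
        if size_mb <= SIZE_THRESHOLD_MB:
            load_to_memory(table)
        else:
            load_to_hadoop(table)

if __name__ == "__main__":
    nightly_sweep({"campaign_clicks": 120.0, "web_logs": 4200.0})
```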
Our marketing department uses SAS Visual Analytics to analyze the effectiveness of their marketing campaigns from when the email was sent all the way until a sales opportunity closes.
I would be happy to discuss this further if more information is requested.
-shawn
Shawn - thank you for your input.