Hadoop and big data management: How does it fit in the enterprise?

The other day, I was looking at an enterprise architecture diagram, and it actually showed a connection between the marketing database, the Hadoop server and the data warehouse. My response can be summed up in two ways. First, I was amazed! Second, I was very interested in how this customer uses Hadoop.

Here is what I found:

  1. The Hadoop server was purchased for a specific function: analyzing the marketing campaigns created for this corporation's customers and prospects.
  2. The Hadoop server receives files from the marketing database on a weekly basis. Data management activities such as identifying formats, file types, and relationships are sometimes overlooked. Getting the data "ready" is very time-consuming and, at times, very complex.
  3. There are two full-time people working on the Hadoop solution.
  4. The data warehouse receives the analysis results in a structured data format for reporting and further analysis.
  5. Customized marketing campaigns are created in the marketing database, based on the analysis received from Hadoop and the reporting in the data warehouse. This creates a full cycle of information for the enterprise.

I was impressed that this company had purchased the technology for a reason, implemented it for success, and is now gaining market share based on the analysis from Hadoop and the data warehouse. My only feedback after the review was this:

  • They may want to consider sampling data from the sources for Hadoop (especially if they are going to include external data sources). They can then structure the sampled data and use a profiling/data quality tool to understand any relationships prior to load and to ascertain the data's readiness for Hadoop.
  • This will also allow them to understand any key relationships between the data sources, as well as get a quick view of the data quality. External data sources often have very low data quality.
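
As a rough illustration of the sampling and profiling idea above, here is a minimal Python sketch using only the standard library. It is not the customer's process or any particular profiling tool: the file contents, column names (`customer_id`, `cust_id`, `email`, `segment`) and the helper functions `sample_rows`, `profile` and `key_overlap` are all hypothetical, made up for this example. It samples rows from a source, reports a null rate and distinct count per column, and checks how many keys in one source also appear in another.

```python
import csv
import io
import random


def sample_rows(rows, k, seed=0):
    """Reservoir-sample up to k rows from an iterable without loading it all."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            # Replace an existing element with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample


def profile(rows):
    """Per-column null rate and distinct count for a list of dict rows."""
    stats = {}
    if not rows:
        return stats
    for col in rows[0].keys():
        values = [r[col] for r in rows]
        non_empty = [v for v in values if v not in (None, "")]
        stats[col] = {
            "null_rate": 1 - len(non_empty) / len(values),
            "distinct": len(set(non_empty)),
        }
    return stats


def key_overlap(left_rows, left_col, right_rows, right_col):
    """Fraction of non-empty left keys that also appear in the right source."""
    left = {r[left_col] for r in left_rows if r[left_col]}
    right = {r[right_col] for r in right_rows if r[right_col]}
    return len(left & right) / len(left) if left else 0.0


# Hypothetical extracts standing in for the marketing database
# and an external data source; real files would be read from disk.
CAMPAIGNS_CSV = """customer_id,campaign,email
1,spring,a@example.com
2,spring,
3,summer,c@example.com
4,summer,d@example.com
"""
EXTERNAL_CSV = """cust_id,segment
1,gold
3,silver
9,bronze
"""

campaigns = list(csv.DictReader(io.StringIO(CAMPAIGNS_CSV)))
external = list(csv.DictReader(io.StringIO(EXTERNAL_CSV)))

campaign_sample = sample_rows(campaigns, k=100)
campaign_profile = profile(campaign_sample)
overlap = key_overlap(campaign_sample, "customer_id", external, "cust_id")

print(campaign_profile)
print(f"key overlap: {overlap:.0%}")
```

On this toy data, a quarter of the `email` values are empty and only half of the customer keys have a match in the external file, which is the kind of quick signal worth having before a weekly load. A dedicated profiling/data quality tool would go much further (formats, patterns, value distributions), so treat this only as a sketch of the idea.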

About Author

Joyce Norris-Montanari

President of DBTech Solutions, Inc

Joyce Norris-Montanari, CBIP-CDMP, is president of DBTech Solutions, Inc. Joyce advises clients on all aspects of architectural integration, business intelligence and data management. Joyce advises clients about technology, including tools like ETL, profiling, database, quality and metadata. Joyce speaks frequently at data warehouse conferences and is a contributor to several trade publications. She co-authored Data Warehousing and E-Business (Wiley & Sons) with William H. Inmon and others. Joyce has managed and implemented data integrations, data warehouses and operational data stores in industries like education, pharmaceutical, restaurants, telecommunications, government, health care, financial, oil and gas, insurance, research and development and retail. She can be reached at jmontanari@earthlink.net.

2 Comments

  1. Shawn Skillman on

    Joyce,
    We have a similar situation with our implementation of SAS Visual Analytics and Hadoop. First, our data management and ETL processes are handled within SAS Data Management. The data is offloaded into an Oracle database, and then a nightly job sweeps through Oracle to pick up any new data. Depending on the file/table size, data is then loaded either directly into memory in the SAS LASR server or into Hadoop.

    Our marketing department uses SAS Visual Analytics to analyze the effectiveness of their marketing campaigns from when the email was sent all the way until a sales opportunity closes.

    I would be happy to discuss this further if more information is requested.

    -shawn
