Data preparation and data wrangling, Part 1 (yippee, bring your lasso)

I'm a very fortunate woman. I have the privilege of working with some of the brightest people in the industry. But when it comes to data, everyone takes sides.

Do you “govern” the use of all data, or do you let analysts do what they want with the data to arrive at conclusions that could change the business? These are hard decisions that require many conversations.

Let’s examine this dilemma by starting with some definitions.

  • Data preparation is the act of cleansing, formatting, integrating and loading data into a data store for consumption by other applications, reporting or analytics.
  • Data wrangling could also be called “composting.” We land the data first, then apply cleansing, formatting and integration in the application layer (ELT), not in the database. This requires a data scientist or data analyst to merge and manage the data in whatever way they need it for a specific analysis.
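
To make the two definitions concrete, here is a minimal Python sketch, not a definitive implementation. Everything in it is an assumption made for illustration: the sample records, the field names, and the helper functions `cleanse` and `wrangle_for_spend_analysis` are hypothetical, and plain Python lists stand in for a governed data store and a data lake landing zone.

```python
# Hypothetical raw records; field names and values are made up for illustration.
RAW = [
    {"id": "1", "name": "  Alice ", "spend": "120.50"},
    {"id": "2", "name": "BOB", "spend": ""},
    {"id": "2", "name": "BOB", "spend": ""},  # duplicate id
]


def cleanse(rows):
    """Trim and title-case names, cast types, keep the first row per id."""
    seen = set()
    cleaned = []
    for row in rows:
        key = int(row["id"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({
            "id": key,
            "name": row["name"].strip().title(),
            "spend": float(row["spend"]) if row["spend"] else None,
        })
    return cleaned


# Data preparation: cleanse first, then load into the governed store.
governed_store = cleanse(RAW)

# Data wrangling: land the data as-is and shape it later, per analysis.
landing_zone = list(RAW)


def wrangle_for_spend_analysis(rows):
    """One analyst's one-off shaping: keep only rows that have a spend value."""
    return [row for row in cleanse(rows) if row["spend"] is not None]


analysis_input = wrangle_for_spend_analysis(landing_zone)
```

In this sketch the governed store holds one agreed-upon version of the data for everyone, while the landing zone keeps the raw records untouched and each analysis decides for itself what "clean enough" means.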

Most organizations have some sort of an operational data store as well as a data warehouse. Most of these data stores have been managed and governed with specific patterns for read, insert, update and delete functions. That said, with the introduction of the data lake, how do we manage and govern the data? Or does it even matter?

One of my first thoughts was: “Heck, I have all this data – why not just send it to the data lake from the data warehouse and/or operational data store?” Sounds logical to me. But what if the requirements for this analysis do not call for good-quality data? Interesting, right? That is not necessarily the way we have gathered requirements in the past.

So, what do we need to do to make sure data is consumed the way it needs to be consumed, and from the "correct" data store?

Watch for Part 2 of this series where we'll continue this discussion.


About Author

Joyce Norris-Montanari

President of DBTech Solutions, Inc

Joyce Norris-Montanari, CBIP-CDMP, is president of DBTech Solutions, Inc. Joyce advises clients on all aspects of architectural integration, business intelligence and data management. Joyce advises clients about technology, including tools like ETL, profiling, database, quality and metadata. Joyce speaks frequently at data warehouse conferences and is a contributor to several trade publications. She co-authored Data Warehousing and E-Business (Wiley & Sons) with William H. Inmon and others. Joyce has managed and implemented data integrations, data warehouses and operational data stores in industries like education, pharmaceutical, restaurants, telecommunications, government, health care, financial, oil and gas, insurance, research and development and retail. She can be reached at jmontanari@earthlink.net.
