Big data quality - don't tell me, another buzzword!


Marketing is a big part of my job so I should be supportive of efforts to capitalize on the trend of the day. But given my background in R&D, I am dubious of marketing efforts that are not backed up by real product or solution capabilities. So, I’m a bit of a skeptic about vendors that have inserted big data in all of their marketing messages. I see a lot of talk about what organizations should be doing, what is now possible given the opportunity of big data, etc. But I see very little in terms of guidance – how things should be done, or information relating to best practices that can guide the average organization.

So, I’ll step up to the challenge and provide some initial thoughts relating to data governance and big data. My initial posts will address data quality aspects of big data, and while I’m not going to provide a complete set of best practices, I’ll provide a list of considerations that should be factored into your big data plans.

I’ll start with my main takeaways, and then provide a set of recommendations related to data quality and big data. After considerable thought and discussion with product experts, implementation consultants and analytics practitioners, my key takeaways include:

  • Big data is all the more reason to leverage a comprehensive information management approach, one that is not just focused on data quality and data integration but spans data, analytics and decision management.
  • When it comes to big data, it’s not just about volume. As is evident with data quality, many of the considerations are specific to the types of data being processed, or depend on the source of the data or the business use case. You don’t need massive volumes of data to benefit from these data quality considerations.
  • As with other information initiatives, data quality should be considered as part of your overall data strategy including data governance and MDM, but I’ll take these topics one at a time, starting with quality.

I wish I could say it is as simple as extending your existing data quality approach to big data. Certainly, if you have solid data quality and data governance processes and technologies in place, you are at a great starting point. And simply extending what you are doing to include big data will provide some benefits. But to be truly successful, you need to consider aspects of big data that may require a different perspective. On Monday and Tuesday, I'll cover these data quality considerations, summarized by the following statements:


About Author

Mark Troester

IT / CIO Thought Leader & Strategist

Mark Troester is the IT / CIO Thought Leader & Strategist for SAS. He oversees the company’s market strategy efforts for information management and for the overall CIO and IT vision. He began his career in IT and has worked in product management and product marketing for a number of Silicon Valley start-ups and established software companies. Twitter @mtroester


    • Mark Troester

      Hello Fazal - Thanks for the comment and I agree that it is helpful to have a solid understanding of the data. That relates to data governance since you can use reference data management and business glossary capabilities to help determine and communicate a common understanding of the data. Visualization and exploration tools can also help you understand the data that you have. Mark.

  1. Pingback: Big data quality: Think outside the box - Information Architect

  2. Pingback: Data quality for Hadoop - Information Architect

  3. Elismar Moraes on

    Mark, I've been working on my MSc dissertation on big data and data quality. My aim is to build a tool that analyzes a data source (a NoSQL database) along quality dimensions. Do you have any paper, article or something similar about it? Best regards!!

  4. I view big data as one more element of the enterprise information framework, with the capability of handling information that until now was outside the purview of the organisation, since systems have been working on limited, controlled data sets. Even that work is not completely done.
    Now, with the advent of big data, extending data governance, architecture and management is a real challenge because of the scalability, malleability and ductility of existing data quality, MDM, metadata management and data integration tools, and because big data technology can do part of that work.
    The boundary, or clear demarcation, is yet to evolve, with vendors busy pushing this thought to gain a bigger share of the market.
    Users are the ones who will come out with the best of breed, as they have already begun ingesting this concept.
