Begin at the beginning


I’ll admit I am particularly fond of a saying, “Begin at the beginning.” All too often we get ahead of ourselves when trying to tackle a problem. And without a clear understanding of the full scope of a problem, there’s always the risk of making it worse.

Something like this is happening in the area of business analytics. In the search for a single version of the truth, it is necessary to corral data from any number of data sources throughout an organization. Typically the process also involves effectively cleansing and prepping this data for relevant analysis – all with the overall goal of preparing useful reports that provide business insight that aids in fact-based decision making.

It’s never a problem getting people excited about the super-cool, fact-based decision making part. But often those same people are less concerned about how you get to that end result. Suffice it to say the “plumbing” of data integration and data quality (to borrow the fixture analogy of my esteemed colleague, Ken Hausman) is perceived as less sexy.

And yet, the data integration/data quality part is undeniably the beginning of a business analytics project.

In February 2009, Computerworld released a SAS-sponsored report entitled, Defining Business Analytics and Its Impact on Organizational Decision-Making. It is an insightful report asking responders to define the term “business analytics” and the technologies associated with that term. Notably, 73 percent of responders view business analytics as a function of both IT and business departments – the evolving relationship between these two groups is a favorite subject I’ve written about previously.

The results also indicate a consistent trend I’ve seen in a number of research reports from the past few years, including:

  • The SAS report, Business Intelligence Maturity and the Quest for Better Performance found that “close to 80 percent of organizations have not fully implemented practices to ensure data quality, integrate their data across the business or create standard data definitions – fundamental elements of maximizing business information.”
  • In another Computerworld project, Information Management Initiatives at Midsize and Large Organizations, a combined 55 percent of responders listed either “integrating disparate systems, standardizing data management processes, data quality or data access” as the key barrier to their information management efforts.
  • In the current Computerworld report, 59 percent of responders named “data integration with multiple source systems” and 56 percent named “data quality” as the key technology or business challenge to business analytics implementations.

See a pattern?

This trend is fleshed out even more when I talk to folks about their various business analytics implementations. From the outset, they don’t necessarily have a strong sense of where the most important data lives within their organizations or how many versions of that data exist. They typically underestimate how much time and ingenuity it may take to access that data. And almost without fail, data quality issues threaten to consume the project. And we all know the adage: garbage in, garbage out.

Data integration is the oversight. Yet it continues to surface as the proverbial thorn in the collective side. Data quality issues haunt the professionals responsible for building the systems and the executives who want so desperately to rely on the reports those systems generate.

So how does something so basic, so fundamental, get overlooked? Planning how data is extracted from multiple sources and how that data is deduped, standardized, profiled, etc. – it would seem that is the beginning, isn’t it? But we know that it is all too commonly skipped or rushed… and sooner or later, revisited.

Do you have a story about data integration/data quality gone awry? Thoughts on why some are so averse to investing the time up front to make sure the data is as solid as possible before analysis and reporting? I’d love to hear your opinions.


About Author

Ericka Wilcher

Sr. Marketing Specialist

Ericka Wilcher is a Senior Marketing Specialist at SAS, specializing in developing and executing innovative marketing campaigns for brand awareness and lead generation. Ericka specializes in digital strategies for advertising, content creation and syndication, and interactive promotional technologies. Follow @ErickaWilcher on Twitter for more!


  1. Great points, Ericka. As organizations adopt a business analytics approach, the data is always the fundamental step to really having a successful and ongoing strategy.
    More than a project or an initiative, business analytics takes into account that you are building a framework to support solutions across the organization - as such the identification of best practices and agreement on data governance, quality and management issues that align with the business needs is crucial.
    Developing and identifying data sources internal to the organization, from third parties, in structured, semistructured and unstructured form - is included in that process.
    I had someone comment on a post recently that folks are looking to do the basics - that discussions of strategy can be chasing windmills. I disagree.
    We have to begin at the beginning - which is what your post is about - but we also have to begin with the end in mind as well.
    What, when, where and who - the downstream of the data has to align with value to the business because beginning with the end in mind means we are valuable parts of the decision eco cycle.
    Great report from Computerworld! I'll be looking at it and possibly posting some thoughts later this week.

  2. Spot on. In this view of data integration as a component of the framework, there are really two aspects.
    The first is the show stopper: providing the data required while at the same time avoiding/removing data quality issues. These data quality issues would not only render the current data incorrect and untrustworthy, but would also pollute all downstream processes that use this data, effectively creating mistrust and avoidance, which are the death knell for IT projects, regardless of their cost.
    The second aspect of data integration is value add - bringing in data beyond the basic requirements. This could include data from additional sources, with more latency options and having differing levels of structure. It could also include added intelligence, as a result of integrated analytics. The value added by this level of data integration is the difference between meeting requirements and exceeding expectations... between being a "me-too" player and an industry leader... or ultimately, between getting cake with icing and getting just an empty cake box.
