Four DI modernization mistakes

This month's theme concerns how organizations can modernize their data integration (DI) efforts. As many of the posts demonstrate, the era of big data portends massive opportunity – with the following caveat: An organization is much more likely to succeed if it takes a good, hard look at its current DI efforts. Against that backdrop, here are four DI modernization mistakes to avoid.

Preserving a bureaucratic, top-down method

Self-service has never been more critical in the data world, but don't take my word for it, or that of the other bloggers on the Data Roundtable. A few years ago, Gartner formalized its importance by adding self-service business intelligence (SSBI) to its IT glossary. The company defines it as “end users designing and deploying their own reports and analyses within an approved and supported architecture and tools portfolio.” Yeah, the Gartner definition is somewhat clunky, but the benefits of self-service are hard to overstate.

As Philip Russom writes in the TDWI Checklist Report, self-service gives a wide range of users direct access to new and big data. Put differently, self-service functions:

...enable users to work with data with spontaneity, speed, and agility. [U]sers aren’t waiting for IT or a data management team to create a unique data set, report, or analysis for them. IT and other teams, in turn, are off-loaded when self-service data is set up so users can create their own data sets and the reports and analyses based on them. According to a recent TDWI report, the four tasks BI users want to do most via self-service are (in priority order) data discovery, visualization, dashboard authoring and data prep. This is more than wishful thinking; the same report reveals that half of users are already practicing data-driven self-service successfully.

No, data anarchy should not prevail. Still, refusing to embrace the very idea of self-service will hamper an organization's successful DI and modernization efforts.

Failing to embrace new data preparation practices and tools

As I know all too well, data sets sometimes require extensive cleansing before they are released into the wild. Data preparation is often critical, yet it is overlooked in many circles. That is, data must be collected, cleansed and consolidated – a process that typically entails:

Fixing errors.
Removing outliers.
Filling null values.
Merging records.
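
To make those four steps concrete, here is a minimal Python sketch that uses only the standard library. Everything in it is an assumption made for illustration: the sample records, the field names (email, monthly_spend), the "ten times the median" outlier rule and the choice to fill gaps with the median. Real data-prep tools offer far more than this, and the right rules depend on the data set.

```python
from statistics import median

# Hypothetical customer records; all field names and values are illustrative.
raw_records = [
    {"id": 1, "email": " Ann@Example.com ", "monthly_spend": 120.0},
    {"id": 2, "email": "bob@example.com", "monthly_spend": None},
    {"id": 3, "email": "cy@example.com", "monthly_spend": 95.0},
    {"id": 4, "email": "dee@example.com", "monthly_spend": 110.0},
    {"id": 5, "email": "eve@example.com", "monthly_spend": 105.0},
    {"id": 6, "email": "fay@example.com", "monthly_spend": 99000.0},
    {"id": 7, "email": "ann@example.com", "monthly_spend": 130.0},
]


def fix_errors(records):
    """Fix obvious formatting errors (here: stray whitespace and case in emails)."""
    return [{**r, "email": r["email"].strip().lower()} for r in records]


def remove_outliers(records, field, max_ratio=10.0):
    """Drop records whose value exceeds max_ratio times the median of known values."""
    known = [r[field] for r in records if r[field] is not None]
    limit = max_ratio * median(known)
    return [r for r in records if r[field] is None or r[field] <= limit]


def fill_nulls(records, field):
    """Replace missing values with the median of the known values."""
    fill = median(r[field] for r in records if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


def merge_records(records, key):
    """Merge records that share a key; the earliest non-null value of each field wins."""
    merged = {}
    for r in records:
        target = merged.setdefault(r[key], dict(r))
        for field, value in r.items():
            if target[field] is None:
                target[field] = value
    return list(merged.values())


def prepare(records):
    """Run the four steps in the order listed above."""
    cleaned = fix_errors(records)
    cleaned = remove_outliers(cleaned, "monthly_spend")
    cleaned = fill_nulls(cleaned, "monthly_spend")
    return merge_records(cleaned, "email")


if __name__ == "__main__":
    for record in prepare(raw_records):
        print(record)
```

The order matters in this sketch: the emails are normalized first so that the two "Ann" records can be recognized as the same person at the merge step, and the outlier is removed before nulls are filled so that it cannot skew the fill value.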

I won't get into the specifics of any single vendor's wares. What's more, many old data-prep features still get the job done. Still, it's safe to say that there may be better ways to prep data today than there were 20 years ago. Why not explore them?

Ignoring new data types and sources

Many mature data integration efforts focused primarily on neat, structured data sets from internal data sources such as relational databases. Sure, these are still important, but the unstructured stuff from external sources matters as well. I just don't see how DI modernization can maintain an exclusively internal focus.

Not allowing for sufficient capacity and scale

No one can predict precisely how much data any one organization will store, retrieve and/or analyze next year, never mind far beyond that. In all likelihood, though, it will be more than today. Organizations engaged in modernization projects would do well to keep this in mind.

To this end, it's far better to err on the side of excess capacity, at least within reason. This is doubly true because storage costs have been plummeting for decades. Organizations should leave themselves a healthy margin for error here.
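
For a rough sense of what a healthy margin might look like, the back-of-the-envelope arithmetic can be sketched in a few lines of Python. The function name and every input below (100 TB today, 40 percent annual growth, a three-year horizon, 25 percent headroom) are hypothetical placeholders, not benchmarks; plug in your own estimates.

```python
def projected_capacity_tb(current_tb, annual_growth, years, headroom):
    """Compound current usage forward, then add a safety margin on top.

    annual_growth and headroom are fractions (0.40 means 40 percent).
    """
    if current_tb < 0 or years < 0 or headroom < 0:
        raise ValueError("current_tb, years and headroom must be non-negative")
    projected = current_tb * (1 + annual_growth) ** years
    return projected * (1 + headroom)


# Hypothetical inputs, chosen only to illustrate the calculation.
needed = projected_capacity_tb(current_tb=100, annual_growth=0.40, years=3, headroom=0.25)
print(f"Plan for roughly {needed:.0f} TB")
```

Because growth compounds, small changes in the assumed growth rate move the result a lot, which is one more argument for leaving headroom rather than planning to the exact forecast.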