Data management: the next generation


I have been a science fiction fan from an early age, ever since Star Wars (now referred to as Episode IV: A New Hope) became the first movie that I ever saw in a theater. Being born in the 1970s meant that I didn’t see the original Star Trek television series until the reruns that followed the debut of its first movie. Although I found the Star Wars movies more enjoyable than either the Star Trek movies or television episodes, when Star Trek: The Next Generation debuted in 1987, I gradually became as big a fan of Star Trek as I was of Star Wars.

Both franchises have an important place in the canon of great science fiction, and the force of their influence will continue to engage science fiction for a long time to come.

Among Trekkies, there is often a deep divide between fans of TOS (Star Trek: The Original Series) and TNG (Star Trek: The Next Generation), which is somewhat similar to the deep divide between fans of the two sets of Star Wars trilogies. The figureheads of this Trekkie polarity are its signature commanding officers, Captain James T. Kirk and Captain Jean-Luc Picard.

The data management industry seems to be experiencing a similar divide between the original series and next generation of data management. The figureheads of this techie polarity are its signature commanding officers, Captain Data B. Relational and Captain Not-Only SQL.

The relational model has dominated the data management industry since the 1980s, fostering the long-held belief that data has to be structured before it can be used, and that data should be managed following ACID (atomicity, consistency, isolation, durability) principles, structured primarily as tables and accessed using structured query language (SQL).

NoSQL is a broad class of data management systems identified by their non-adherence to the relational model. These systems manage data following BASE (basically available, soft state, eventual consistency) principles, do not structure data primarily as tables, and are generally not (or not only) accessed using SQL.
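For readers who prefer to see the contrast rather than read about it, here is a minimal sketch in Python. It assumes the standard library's sqlite3 module as a stand-in for a relational database and a plain dictionary of JSON strings as a stand-in for a document-style NoSQL store. The table, the function names, and the crew records are all hypothetical, and a single-process dictionary cannot show the eventual consistency of a real distributed system. It only shows structure-first versus structure-later.

```python
import json
import sqlite3


def relational_demo() -> int:
    """Structure first: declare a table, then write inside a transaction."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE crew (name TEXT NOT NULL, rank TEXT NOT NULL)")
    try:
        # Using the connection as a context manager commits on success
        # and rolls back if an exception is raised (atomicity).
        with conn:
            conn.execute("INSERT INTO crew VALUES (?, ?)", ("Kirk", "Captain"))
            # This row violates NOT NULL, so the whole transaction is undone.
            conn.execute("INSERT INTO crew VALUES (?, ?)", ("Spock", None))
    except sqlite3.IntegrityError:
        pass
    count = conn.execute("SELECT COUNT(*) FROM crew").fetchone()[0]
    conn.close()
    return count


def document_demo() -> list:
    """Structure later: store differently shaped records, interpret on read."""
    store = {}  # hypothetical stand-in for a document store
    store["picard"] = json.dumps({"name": "Picard", "rank": "Captain"})
    store["data"] = json.dumps({"name": "Data", "positronic": True})  # no rank field
    records = [json.loads(raw) for raw in store.values()]
    # The reader decides how to handle records that lack a "rank" field.
    return [r["name"] for r in records if r.get("rank") == "Captain"]


if __name__ == "__main__":
    print(relational_demo())
    print(document_demo())
```

In the relational half, the schema rejects the incomplete row and the transaction takes the valid row down with it. In the document half, the incomplete record is stored anyway and the question of what to do with it is deferred to whoever reads it.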

Staunch relational defenders normally do not have their phasers constrained to stun when engaging in verbal combat with those espousing the virtues of NoSQL. In fact, just as many fans of TOS initially refused to watch any episodes of TNG, many data management professionals initially refuse to discuss NoSQL. Or to mix my science fiction metaphors, in the Big Data Wars, it seems like the data management industry is pitting the Jedi Knights of Relational against the Sith Lords of NoSQL.

However, just as Trekkies need to acknowledge the merits of both TOS and TNG, and Star Wars fans need to acknowledge the merits of both trilogies, the data management industry must acknowledge that some next-generation data use cases differ considerably from the original series of data use cases.

Of course, data modeling is still important and data quality still matters, but how much data modeling and data quality is needed before big data can be effectively used for business purposes will vary.

In order to move the big data discussion forward, we have to stop fiercely defending our traditional perspectives about structure and quality, as well as our traditional perspectives about analytics, since sometimes depth and detailed analysis may not be necessary to provide business insight.

Big data is not the final frontier, but for us to boldly go wherever the continuing business mission of our enterprises needs to go, we should apply the practices best suited to each specific application of data, whether that means we rely on data management techniques from a long time ago, or strange new data management techniques that seem to be from a galaxy far, far away.

Both relational and NoSQL have an important place in the canon of data management best practices, and the force of their influence will continue to engage data management for a long time to come.


About Author

Jim Harris

Blogger-in-Chief at Obsessive-Compulsive Data Quality (OCDQ)

Jim Harris is a recognized data quality thought leader with 25 years of enterprise data management industry experience. Jim is an independent consultant, speaker, and freelance writer. Jim is the Blogger-in-Chief at Obsessive-Compulsive Data Quality, an independent blog offering a vendor-neutral perspective on data quality and its related disciplines, including data governance, master data management, and business intelligence.
