Streaming analytics blurs the lines between data management and analytics


I live in the Midwestern United States (central Iowa), where almost every town still has a functioning water tower – a large water tank perched atop a high tower. Its primary purpose is to provide clean water that's safe for drinking and food preparation during a power outage (gravity provides the hydrostatic pressure to push its stored contents into water distribution systems). Due to Iowa’s flat landscape and minimal number of tall buildings, water towers are one of the most easily visible structures in the region.

While disc golfing the other day, I was searching for a wayward tee shot in a wooded area when through and above the trees I saw a nearby water tower. This made me think of traditional approaches to data management and analytics. The data an organization stores and maintains internally, often in an enterprise data warehouse, is analogous to a water tower. Most of this data has been internally generated, then structured, transformed, cleansed, integrated and stored to support specific business purposes. For example, it's used to conduct daily operations, process financial transactions and support the post-storage analysis that produces the factory fodder feeding most business intelligence reports.

As my disc golfing continued, one tee provided a nice view of Saylorville Lake, a man-made lake constructed by the US Army Corps of Engineers as a reservoir and flood control system for the Des Moines River. This made me think of the rising popularity of data lakes, a storage repository holding a vast amount of raw data in its native format – structured, semistructured and unstructured data. A data lake is essentially a reservoir and flood control system for big data. The lake stores it all before data structures and business requirements have been defined for its use, including its many potential applications for advanced analytics.

When I reached the 18th hole of the disc golf course, I was confronted by its greatest water hazard: a fast-moving stream, a tributary of the Des Moines River that flows across the fairway and has carried many an errant tee shot away to a watery grave. This made me think of the greatest data management hazard – and greatest analytical opportunity – represented by the data flowing around and through the enterprise without ever having been stored. That includes data flowing from external sources like sensors, RFID tags, smart meters, live social media, mobile devices and other internet-connected objects.

Dealing with this data makes me ponder how streaming analytics is blurring the lines between data management and analytics. With traditional analysis, the data is stored then analyzed. But in streaming analytics, it’s the models and algorithms that are stored, and incoming data is analyzed as it passes through them. All this happens as the data is being generated or transmitted in real time. So before streaming data is stored in a data lake, or extensively processed for storage in an enterprise equivalent of a water tower, it’s rapidly analyzed. The analysis is an attempt to determine the data's meaning and value, pinpoint event relevance and generate instant alerts when there’s an urgency to take action.
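
To make that inversion concrete, here is a minimal sketch in Python of the idea that the model is what gets stored while the data flows past it. Everything in it is an illustrative assumption rather than any particular product's API: the Reading and Decision types, the threshold_model helper, the meter names and the limits of 90 and 50 are all hypothetical, and a real streaming model would be far richer than a fixed threshold.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Reading:
    """One event from a hypothetical smart meter."""
    sensor_id: str
    value: float


@dataclass
class Decision:
    """What the stored models concluded about a single reading."""
    reading: Reading
    alert: bool  # urgent enough to act on right now
    store: bool  # relevant enough to keep for later management and analysis


def threshold_model(limit: float) -> Callable[[Reading], bool]:
    """Hypothetical stand-in for a stored model: flags values above a fixed limit."""
    return lambda reading: reading.value > limit


def analyze_stream(
    readings: Iterable[Reading],
    is_urgent: Callable[[Reading], bool],
    is_relevant: Callable[[Reading], bool],
) -> Iterator[Decision]:
    """Pass each reading through the stored models as it arrives.

    Nothing is stored here; each reading is judged once and then let go.
    """
    for reading in readings:
        alert = is_urgent(reading)
        store = alert or is_relevant(reading)
        yield Decision(reading, alert, store)


# A tiny simulated stream; in practice this would be an unbounded source.
stream = iter([
    Reading("meter-1", 20.0),
    Reading("meter-1", 95.0),
    Reading("meter-2", 55.0),
])

decisions = list(analyze_stream(stream, threshold_model(90.0), threshold_model(50.0)))

# Only the readings judged worth keeping flow on to the (hypothetical) data lake.
data_lake = [d.reading for d in decisions if d.store]
```

In this sketch the first reading is analyzed and discarded, the second triggers an alert and is kept, and the third is kept without an alert. The point is the ordering: the analysis happens first, and storage is a consequence of it rather than a prerequisite for it.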

Streaming analytics is never going to obviate the need for traditional data management and analytics. But it is becoming an increasingly necessary complement to them, because streaming analytics enables enterprises to decide what streaming data should be stored and therefore become subject to management and governance. Further, the insights gleaned from streaming data are valuable for supplementing traditional business applications.


About Author

Jim Harris

Blogger-in-Chief at Obsessive-Compulsive Data Quality (OCDQ)

Jim Harris is a recognized data quality thought leader with 20 years of enterprise data management industry experience. Jim is an independent consultant, speaker, and freelance writer. Jim is the Blogger-in-Chief at Obsessive-Compulsive Data Quality, an independent blog offering a vendor-neutral perspective on data quality. Jim is the host of the popular podcast OCDQ Radio, and is very active on Twitter, where you can follow him @ocdqblog.

