As I've previously written, data analytics historically analyzed data after it stopped moving and was stored, often in a data warehouse. But in the era of big data, data needs to be continuously analyzed while it’s still in motion – that is, while it’s streaming. This allows for capturing the real-time value of data before it’s lost in the time lag between creation and storage – and before it’s lost in the time lag between analysis and action.
“It’s a streaming world,” David Loshin recently wrote. “In the past, much of the data that was collected and consumed for analytical purposes originated within the organization and was stored in static data repositories. Today, there is an explosion of streaming data. We have human-generated content such as data streamed from social media channels, blogs, emails, etc. We have machine-generated data from myriad sensors, devices, meters and other internet-connected machines. We have automatically generated streaming content such as web event logs. All of these sources stream massive amounts of data and are prime fodder for analysis.”
Big data streaming – a faster way to analytical insights
Big data not only needs to be processed quickly; any analytical insights it holds must also be gleaned as close to real time as possible. It is frequently impractical to store all big data, and a significant percentage of it is often irrelevant to business processes and analytical applications. Consider, for example, all the social media data generated by an organization’s customers. That data typically contains far more noise than signal.
This is why big data and streaming go hand in hand. Streaming can apply simple transformations and rules to determine whether big data is relevant enough to warrant immediate action, further downstream processing and/or eventual storage. Perhaps the best example is fighting fraud and cybercrime. As Todd Wright recently blogged, “It’s often just a matter of seconds that determines whether a malicious intrusion is successful or a transaction is fraudulent. Those are critical seconds that traditional approaches to managing and analyzing data won’t always catch.”
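To make the idea of "simple transformations and rules" concrete, here is a minimal Python sketch of a stream router. It is an illustration only, not any particular product's API: the event fields (`type`, `attempts`), the rule list and the route names (`act_now`, `process_downstream`, `store`, `discard`) are all hypothetical and chosen for this example.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Event = Dict[str, object]
Rule = Tuple[str, Callable[[Event], bool]]

# Hypothetical rules, checked in order; the first match decides the route.
RULES: List[Rule] = [
    ("act_now", lambda e: e["type"] == "login_failed" and e.get("attempts", 0) >= 5),
    ("process_downstream", lambda e: e["type"] == "purchase"),
    ("store", lambda e: e["type"] == "page_view"),
]


def normalize(event: Event) -> Event:
    """Simple transformation: lowercase the event type so rules match reliably."""
    return {**event, "type": str(event.get("type", "")).lower()}


def route(events: Iterable[Event]) -> Iterator[Tuple[str, Event]]:
    """Yield (route, event) pairs one at a time, without storing the stream."""
    for raw in events:
        event = normalize(raw)
        for action, matches in RULES:
            if matches(event):
                yield action, event
                break
        else:
            # No rule matched: treat the event as noise and drop it.
            yield "discard", event


if __name__ == "__main__":
    sample = [
        {"type": "LOGIN_FAILED", "attempts": 6},
        {"type": "Purchase", "amount": 20},
        {"type": "heartbeat"},
    ]
    for action, event in route(sample):
        print(action, event)
```

Because `route` is a generator, each event is examined and routed as it arrives, and anything that matches no rule is dropped without ever being stored.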
Use cases for fraud and cybercrime are numerous and span industries. But banks and other financial institutions that issue credit cards stand out since the cost of credit card fraud is estimated at $200 billion annually. Streaming enables more data to be quickly analyzed. That includes historical transaction data previously processed and stored, and real-time big data representing customers’ recent online behavior. Such data may not need to be processed further, or even stored, to immediately identify and intercept attempts at fraudulent transactions.
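As a rough illustration of combining stored history with data in motion, the Python sketch below checks each incoming transaction against a summary of the cardholder's past behavior. Everything in it is an assumption made for the example: the `Profile` and `Transaction` fields, the `amount_factor` threshold and the rule itself are hypothetical, and real fraud detection relies on far richer models.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class Profile:
    """Hypothetical summary of a cardholder's stored transaction history."""
    average_amount: float
    home_country: str


@dataclass
class Transaction:
    """Hypothetical real-time transaction event."""
    card_id: str
    amount: float
    country: str


def flag_suspicious(
    transactions: Iterable[Transaction],
    profiles: Dict[str, Profile],
    amount_factor: float = 5.0,  # assumed threshold, not an industry standard
) -> Iterator[Transaction]:
    """Yield transactions that look unusual compared with stored history."""
    for tx in transactions:
        profile = profiles.get(tx.card_id)
        if profile is None:
            # Assumption: a card with no stored history is flagged for review.
            yield tx
            continue
        too_large = tx.amount > amount_factor * profile.average_amount
        abroad = tx.country != profile.home_country
        if too_large and abroad:
            yield tx


if __name__ == "__main__":
    history = {"card-1": Profile(average_amount=40.0, home_country="US")}
    stream = [
        Transaction("card-1", 35.0, "US"),
        Transaction("card-1", 900.0, "RO"),
    ]
    for tx in flag_suspicious(stream, history):
        print("intercept:", tx)
```

Only the compact profile comes from storage. Each transaction is evaluated once as it streams past, and nothing in this sketch requires the transaction to be stored before a decision is made.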
Big data hits its full potential with streaming. In other words, data streaming is the most effective way to handle the volume, variety and velocity of big data. The major point of big data analytics is to gain business insights from data beyond what traditional store-it-first-analyze-it-later approaches have been capable of delivering. Hand in hand with streaming, big data can deliver those business insights as they happen.