In the world of science fiction, crossing the streams is bad. How bad? Try to imagine all life as you know it stopping instantaneously, and every molecule in your body exploding at the speed of light. The only time crossing the streams is recommended is during a desperate attempt to stop Gozer the Gozerian, who has taken the frightful form of the Stay Puft Marshmallow Man, from using a cross-dimensional portal built in a high-rise apartment building in upper Manhattan to cross on through to the other side and conquer the world.
In the world of data management, crossing the streams also used to be considered bad.
Historically, when dealing with data streaming from heterogeneous sources, the preferred method was to use data integration techniques to extract data from each source, transform it into a homogeneous format while validating its data quality, and then load it into an enterprise data warehouse that could serve as the single data source for business intelligence applications.
This data integration approach gave users a faucet of information: a single, clean, consistent data stream from which they could quench their data analytical thirst.
This method of taming the crisscrossing data stream madness was mostly batch-oriented, and therefore worked best when you had the time to wait for Data the Datarian to take the far less frightful form of the Stay Static Data Warehouse Man, against which cross-dimensional data queries could be safely executed in a high-rise office building by upper management in an attempt to cross on over to the profit side of the general ledger and conquer the business world.
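The batch extract-transform-load pattern described above can be sketched in a few lines. This is a toy illustration, not any particular tool's API: two hypothetical sources (a dict-shaped CRM export and tuple-shaped point-of-sale records) are extracted, transformed into one common schema with basic data-quality validation, and loaded into a single "warehouse" table. All field names and data are made up for the example.

```python
# Extract: two heterogeneous sources with different shapes (hypothetical data).
crm_rows = [{"cust": "Ada", "spend": "120.50"}, {"cust": "", "spend": "10"}]
pos_rows = [("Grace", 75.0), ("Alan", -5.0)]

def transform(rows):
    """Normalize each source row into {'customer': str, 'amount': float},
    dropping rows that fail simple data-quality checks."""
    for row in rows:
        if isinstance(row, dict):                      # CRM shape
            customer, amount = row["cust"], float(row["spend"])
        else:                                          # POS shape
            customer, amount = row[0], float(row[1])
        if customer and amount >= 0:                   # validation rules
            yield {"customer": customer, "amount": amount}

# Load: one homogeneous table that BI queries can treat as the single source.
warehouse = list(transform(crm_rows)) + list(transform(pos_rows))
print(warehouse)
```

The empty-name and negative-amount rows are silently rejected here; a real pipeline would quarantine them for data-quality review rather than discard them.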
Nowadays, the business world changes so fast that real-time data analytics is a necessity and the data that must be analyzed is streaming out of a complex waterworks of crisscrossed fire hoses, which are also becoming increasingly hose-less (i.e., the wireless water world of the Internet, mobile devices and social networks where systems, sensors and humans are all constantly streaming data).
Therefore, the quest to quench your data analytical thirst has been transformed from a leisurely paced walk to the faucet of information into a terrifying run for your life through an ectoplasmic gauntlet of information overload that leaves you feeling like you just got hit by the data equivalent of Slimer.
In the world of real-time analytics, crossing the streams has become not only good, but essential. So, who you gonna call? No, not the Data Busters.
Instead of proton packs and ecto-containment units, you need some form of event stream processing (ESP) or complex event processing (CEP) that can enable real-time decision making by continuously analyzing large volumes of data as they arrive. Real-time analytics is challenging enough when dealing with a large volume of fast-moving data originating from a single source, but with the additional complexity of multiple sources in multiple formats, crossing these streams is like imagining all data analytics as you know it stopping instantaneously and every beautiful bit and byte of your business intelligence best practices exploding at the speed of light.
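To make "crossing the streams" concrete, here is a minimal sketch of the CEP idea, assuming nothing beyond the Python standard library: two time-ordered event streams are merged, and a sliding time window raises an alert whenever one card produces three or more events within sixty seconds, a toy stand-in for the credit card fraud detection use case mentioned below. The stream names, thresholds, and data are all hypothetical.

```python
import heapq
from collections import defaultdict, deque

# Each event is (timestamp_seconds, card_id, amount); both streams are
# already ordered by timestamp (hypothetical data).
atm_stream = [(0, "card-1", 200.0), (30, "card-1", 200.0), (95, "card-2", 40.0)]
web_stream = [(45, "card-1", 999.0), (50, "card-2", 15.0)]

WINDOW = 60      # sliding window length in seconds
THRESHOLD = 3    # events per card within the window that trigger an alert

def detect_bursts(*streams):
    """Merge time-ordered streams and yield (timestamp, card_id) whenever
    a card exceeds THRESHOLD events inside WINDOW seconds."""
    recent = defaultdict(deque)              # card_id -> timestamps in window
    for ts, card, _amount in heapq.merge(*streams):
        window = recent[card]
        window.append(ts)
        while window and ts - window[0] > WINDOW:
            window.popleft()                 # evict events older than the window
        if len(window) >= THRESHOLD:
            yield (ts, card)

alerts = list(detect_bursts(atm_stream, web_stream))
print(alerts)  # card-1 accumulates three events (t=0, 30, 45) within 60s
```

Production CEP engines add out-of-order handling, persistence, and declarative pattern languages on top of this core idea, but the essence is the same: merge the crossed streams into one time-ordered flow and evaluate conditions continuously as events arrive.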
Even though, as Phil Simon recently blogged, the data fire hoses of the financial industry have been the most commonly cited use cases (e.g., algorithmic stock trading and credit card fraud detection), you don’t need the other kind of ESP to predict that CEP has lots of other potential applications, so crossing the streams is a data-busting best practice coming soon to a business theater near you.