Hadoop may have been the buzzword for the last few years, but streaming seems to be what everyone is talking about these days. Hadoop deals primarily with big data at rest and with batch-based analytics. Modern streaming technologies are aimed at the opposite end of the spectrum: dealing with data in motion and providing analytical insights in flight.
Streaming technologies have been around for a number of years. But recently, the number and types of use cases that could take advantage of these technologies have exploded. Today, the question is not really about whether or not to stream. It’s about how to marry new streaming capabilities and approaches with emerging use cases.
Streaming use cases and applications
Streaming applications enable continuous processing on data that’s being produced constantly. This equates to a huge number of streaming events that need to be filtered and analysed for insights in a short period of time. The degree of real time that’s needed – and the latency demanded by an application – define which use cases can benefit the most from real-time streaming analytics.
Cybersecurity and IoT, for example, can benefit from streaming technology. Network-based cybersecurity solutions deal with streaming data from routers and network equipment, while IoT applications deal with sensor or device data. In both cases, the streaming solution has to deal with huge volumes of data at very high rates (e.g., millions of events per second) with extremely low latency (in milliseconds). Typically, the signal-to-noise ratio is low, so a great deal of noise has to be filtered out to deliver valuable insights. This is where the streaming platform becomes most valuable.
From a traditional business perspective, there’s also a growing need to monitor customer behaviours and transactions in real time and then act on them using data and analytics. Streaming solutions now play an important role in powering these highly targeted, push-oriented services and offerings for different industries.
Open source, or commercial off-the-shelf?
The technology behind streaming platforms has matured significantly to tackle traditional and emerging use cases. From a tool selection perspective, customers often ask me how commercial streaming software from SAS compares with software from the open source community. At a high level, the familiar choice between open source and commercial software extends into streaming platform discussions.
Commercial solutions like SAS Event Stream Processing offer a more robust development and deployment framework that saves time and cost. Customers often look to SAS for a proven, scalable platform (frequently the main consideration). SAS Event Stream Processing can also accelerate development and ease deployment efforts through native, optimised integration with legacy systems like Teradata and IBM WebSphere MQ Series, and with next-generation big data platforms like Hadoop and Apache NiFi.
A key benefit of open source solutions, of course, is their openness or extensibility. Fortunately, SAS Event Stream Processing was developed with that in mind. It can be extended by using third-party code and APIs. This means developers can potentially use C, C++, Python or Java to build new extensions and adaptors and design the embedded streaming logic.
In the end, you shouldn’t choose a streaming engine based solely on whether it’s open source technology or commercial software. Instead, you should base your decision on functionality.
Not all streaming engines are created equal. Your choice of streaming platform will certainly affect what you can do and the insights you can get. One of the unique aspects of the SAS Event Stream Processing engine is that it was built with analytics at its core from day one. It excels in traditional requirements like data aggregation and filtering within stateful or stateless windows. But it’s the ability to deploy analytical models within the engine that makes it an incredibly powerful platform for driving real-time analytical insight.
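To make the idea of filtering and aggregating within windows more concrete, here is a minimal, generic sketch in plain Python. It is not the SAS Event Stream Processing API. The event fields ("ts", "value"), the class name SlidingWindowAverage and the helper names are all hypothetical, chosen only for illustration. The filter is stateless (it looks at one event at a time), while the window is stateful (it retains recent events to compute a running average).

```python
from collections import deque


class SlidingWindowAverage:
    """Stateful window: retains events from the last `span` time units."""

    def __init__(self, span):
        self.span = span
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0       # running sum of the values currently retained

    def update(self, timestamp, value):
        """Add one event and return the average over the current window."""
        self.events.append((timestamp, value))
        self.total += value
        # Evict events that have aged out of the window.
        while self.events[0][0] <= timestamp - self.span:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)


def is_relevant(event):
    """Stateless filter: the decision depends only on the current event."""
    return event["value"] >= 0


def process(stream, span=10):
    """Filter each incoming event, then emit (timestamp, windowed average)."""
    window = SlidingWindowAverage(span)
    for event in stream:
        if is_relevant(event):
            yield event["ts"], window.update(event["ts"], event["value"])
```

Feeding this sketch four events with timestamps 0, 1, 5 and 12 and values 10, -5, 20 and 30 drops the negative reading and yields running averages of 10.0, 15.0 and 25.0, because the first event has left the 10-unit window by the time the last one arrives.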
SAS Event Stream Processing supports dynamic clustering-based machine learning using k-means clustering to identify homogeneous segments in live event stream data. Built-in text mining capabilities let you analyse and score unstructured text on the fly. This opens the possibility of doing sophisticated sentiment and contextual analysis against streaming social media content and other unstructured content.
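For readers curious how k-means can run against a live stream at all, the sketch below shows one well-known approach: sequential (online) k-means, where each arriving event nudges its nearest centroid towards itself. This is an illustrative assumption about the general technique, not a description of how SAS Event Stream Processing implements clustering. The class name OnlineKMeans and its methods are hypothetical.

```python
class OnlineKMeans:
    """Sequential k-means: centroids are updated one event at a time."""

    def __init__(self, initial_centroids):
        # Seed centroids; each is replaced by the first event assigned to it.
        self.centroids = [list(c) for c in initial_centroids]
        self.counts = [0] * len(self.centroids)

    def _nearest(self, point):
        """Return the index of the centroid closest to `point`."""
        def squared_distance(centroid):
            return sum((p - c) ** 2 for p, c in zip(point, centroid))

        return min(
            range(len(self.centroids)),
            key=lambda i: squared_distance(self.centroids[i]),
        )

    def update(self, point):
        """Assign `point` to a segment and move that centroid towards it."""
        i = self._nearest(point)
        self.counts[i] += 1
        # A step size of 1/n keeps each centroid equal to the running mean
        # of all events assigned to it so far.
        rate = 1.0 / self.counts[i]
        self.centroids[i] = [
            c + rate * (p - c) for c, p in zip(self.centroids[i], point)
        ]
        return i
```

Each call to update returns the segment index for the incoming event, so scoring and model refinement happen in the same pass and no batch of historical data needs to be held in memory.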
At SAS, we believe that streaming analytics should be a foundational pillar for every analytical deployment architecture and that machine learning algorithms will be core to the success of future streaming solutions.