In my last two posts, I introduced some opportunities that arise from integrating event stream processing (ESP) within the nodes of a distributed network. We considered one type of deployment that includes the emergent Internet of Things (IoT) model in which there are numerous end nodes that monitor a set of sensors,
In my last post, we examined the growing importance of event stream processing to predictive and prescriptive analytics. In the example we discussed, we looked at how all the event streams from point-of-sale systems from multiple retail locations are absorbed at a centralized point for analysis. Yet the beneficiaries of those
Over the past year and a half, there has been a subtle shift in media attention from big data analytics to what is referred to as the Internet of Things, or IoT for short. The shift in focus is not intended to diminish the value of big data platforms and
Once you have assessed the types of reporting and analytics projects and activities that are to be done by the community of data analysts and consumers, and have assessed their business needs and requirements for performance, you can then evaluate – with confidence – how different platforms and tools can be combined to satisfy
In the last few days, I have heard the term “data lake” bandied about in various client conversations. As with all buzz-term simplifications, the concept of a “data lake” seems appealing, particularly when it is implied to mean “a framework enabling general data accessibility for enterprise information assets.” And of
As part of two of our client engagements, we have been tasked with providing guidance on an analytics environment platform strategy. More concretely, the goal is to assess the systems that currently compose the “data warehouse environment” and determine the considerations for selecting the optimal platforms to support
In my last two posts, we concluded two things. First, because of the need for broadcasting data across the internal network to enable the complete execution of a JOIN query in Hadoop, there is a potential for performance degradation for JOINs on top of files distributed using HDFS. Second, there are
In my last post, I pointed out that an uninformed approach to running queries on top of data stored in Hadoop HDFS may lead to unexpected performance degradation for reporting and analysis. The key issue had to do with JOINs in which all the records in one data set needed
Hadoop is increasingly being adopted as the go-to platform for large-scale data analytics. However, it is still not necessarily clear that Hadoop is always the optimal choice for traditional data warehousing for reporting and analysis, especially in its “out of the box” configuration. That is because Hadoop itself is not
Over my last two posts, I suggested that our expectations for data quality morph over the duration of business processes, and it is only at the point when the process has completed that we can demand that all statically applied data quality rules be observed. However, over the duration of the