Why big data and streaming go hand in hand

As I've previously written, data analytics historically analyzed data after it stopped moving and was stored, often in a data warehouse. But in the era of big data, data needs to be continuously analyzed while it’s still in motion – that is, while it’s streaming. This allows for capturing the real-time value of data […]

Finding the balance between short- and long-term big data storage

Back before storage became so affordable, cost was the primary factor in determining what data an IT department would store. As George Dyson (author and historian of technology) says, “Big data is what happened when the cost of storing information became less than the cost of making the decision to […]

Beyond the boundaries of structured data: Part two

How many times have you gone onto a website, put a few things in a shopping cart, and then exited the Internet? I do it all the time. Sometimes when I log on to that site during my next visit, those same items are still in my cart – ready for purchase. I find […]

Data integration modernization: Key recommendations

Modernization. It’s a hot topic for organizations in all types of industries that are looking for ways to streamline hardware and software footprints while gaining control and insights from the data deluge. In the data integration space, this means we have to look beyond a traditional ETL approach to one […]

Big data modeling: An iterative approach - Part 1

In my prior two posts, I explored some of the issues associated with data integration for big data and, in particular, the conceptual data lake in which source data sets are accumulated and stored, awaiting access from interested data consumers. One of the distinctive features of this approach is the transition […]

ESP can determine if big data is eventful

Many recent posts on this blog have discussed various aspects of event stream processing (ESP), where data is continuously analyzed while it’s still in motion, within what are referred to as event streams. This differs from traditional data analytics, where data is not analyzed until after it has stopped moving and has […]

Event stream processing – Tip 1: Don’t be overwhelmed

I believe most people become overwhelmed when considering the data that can be created during event processing. Number one, it is A LOT of data – and number two, the data needs real-time analysis. For the past few years, most of us have been analyzing data after we collected it, […]

Let us be smarter with the Internet of Things

As we enter the era of “everything connected,” we cannot forget that gathering data is not enough. We need to process that data to gain new knowledge and build our competitive advantage. The Internet of Things is not just a consumer thing – it also makes our businesses more intelligent. Whenever […]

Data management for analysis – Feeding the analytical monster more than once

(Otherwise known as Truncate – Load – Analyze – Repeat!) After you’ve prepared data for analysis and then analyzed it, how do you complete this process again? And again? And again? Most analytical applications are created to truncate the prior data, load new data for analysis, analyze it and repeat […]

Using Hadoop: Query optimization

In my last post, I pointed out that an uninformed approach to running queries on top of data stored in Hadoop HDFS may lead to unexpected performance degradation for reporting and analysis. The key issue had to do with JOINs in which all the records in one data set needed […]
