Will streaming data kill big data as we know it?


The world has already embarked on the so-called Fourth Industrial Revolution. This is likely to be hugely disruptive and to have a major impact on society, because it will change the way humans behave, are organized, and work, as well as the skills and competencies they need. It will also, however, have positive impacts on sustainability. At the heart of this transformation are automation, robotization and digitalization, all of which depend on data and are already sending rivers of data flowing into big data oceans (lakes are too small nowadays).

Big data is commonly seen in terms of three Vs: volume, velocity, variety. However, people are adding another two, Veracity and Value, and these seem to me to be the more important. As the objective is to get value out of the data, the volume, velocity, and variety do not matter quite as much, because value depends on veracity. The critical challenge is to extract valuable and actionable information, at the right moment.

What and when is the right moment?

Whatever the technology applied to big data repositories, all analyses are based on past data. They rely on events that have already occurred and that we can therefore do nothing about. All we can do is mine and explore these oceans of data☺, applying advanced analytics algorithms to understand more about the past and using that understanding to try to predict what things will look like.

I think, however, that this will change, and is already starting to do so with Artificial Intelligence, which encompasses Machine Learning, Deep Learning, …

These algorithms are designed to mimic the human brain. As we learn more about how the brain works, we understand that it is not an exact machine. The brain receives multiple signals from a wide range of sensors within our body, processing all that data rapidly but in context. If nothing goes wrong with the process of “search, evaluate likelihood, judgement, decision”, then for every event, our brain returns the most likely answer.

What do we memorize and how do we learn? Well, not being a brain scientist, my simplistic view of how it works is that there is an input, we search for the best possible answer while judging it (applying, for example, social standards and constraints), we predict the likely outcome, and then we take action and evaluate its outcome. All of this, of course, happens in milliseconds. The learning comes from the difference between our prediction and the actual result. Again simplistically, we memorize the input, the context, the action, the outcome and the difference from the expected one. It does sound simple, but it has enabled us to evolve to become the most advanced species on the globe. As nobody has yet come up with a way to implement emotions within AI, I’m leaving that out of this equation.

Fundamentally, however, we can consider ourselves as processing multiple data streams (some even in combination) all the time. Effectively, we’re applying predictive models over streaming data.
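To make the idea of "applying predictive models over streaming data" a little more concrete, here is a minimal sketch in Python. It is my own illustration, not a reference implementation: the `OnlineLinearModel` class, the `sensor_stream` generator and the relationship it simulates are all hypothetical, and the update rule is a plain least-mean-squares step chosen for simplicity. The model sees each event once, predicts, compares the prediction with the actual outcome, and learns only from that difference.

```python
import random
from typing import Iterator, List, Tuple


class OnlineLinearModel:
    """Hypothetical linear predictor updated one event at a time (least-mean-squares rule)."""

    def __init__(self, n_features: int, learning_rate: float = 0.05) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: List[float]) -> float:
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def learn(self, x: List[float], actual: float) -> float:
        """Predict, compare with the actual outcome, adjust, and return the error."""
        error = actual - self.predict(x)
        # The learning signal is only the gap between prediction and reality.
        self.weights = [
            w + self.learning_rate * error * xi for w, xi in zip(self.weights, x)
        ]
        self.bias += self.learning_rate * error
        return error


def sensor_stream(n_events: int, seed: int = 0) -> Iterator[Tuple[List[float], float]]:
    """Made-up stream of events: the outcome is 2*a - 3*b + 1 plus a little noise."""
    rng = random.Random(seed)
    for _ in range(n_events):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        yield [a, b], 2 * a - 3 * b + 1 + rng.gauss(0, 0.01)


def run(n_events: int = 5000) -> Tuple[OnlineLinearModel, float, float]:
    """Feed the stream through the model; return it with early and late mean absolute errors."""
    model = OnlineLinearModel(n_features=2)
    # Each event is used once and then discarded; only the errors are kept, for reporting.
    errors = [abs(model.learn(x, y)) for x, y in sensor_stream(n_events)]
    early = sum(errors[:100]) / 100
    late = sum(errors[-100:]) / 100
    return model, early, late


if __name__ == "__main__":
    model, early, late = run()
    print("weights:", model.weights, "bias:", model.bias)
    print("mean absolute error, first 100 events:", early)
    print("mean absolute error, last 100 events:", late)
```

In this toy setting the average error over the last events should be much smaller than over the first ones, even though the model never looks back at the events it has already processed. Real streams are of course noisier and drift over time, so treat this as an illustration of the loop rather than a recipe.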

Big data is reaction, streaming data is action

As AI evolves towards actually replicating how the human brain works, I believe that self-learning algorithms will use less and less past data, and more of their own results. Past data, by which I mean big data from a wide range of sources, is required in the design, testing and tuning of self-learning algorithms. But once they can self-tune, the need for past data will decrease at the same rate as the difference between prediction and reality. As these self-learning/self-tuning algorithms develop, we will need less and less big data as we know it. In the future, we are likely to have algorithms that teach themselves from streaming data, evolve naturally, create new algorithms and dynamically work in cooperation with others depending on the context, just as humans already do.


About Author

Joao Oliveira

Information management has been part of João’s professional DNA for more than twelve years. He has driven and supported many EMEA-wide data management initiatives across multiple industries. João's professional experience includes functional knowledge of end-to-end solution architectures. His experience and knowledge have been leveraged on multiple engagements supporting information management, from data capture to data archiving. He also helps organizations cope with legal requirements like GDPR, improve customer experiences, drive digital transformation and get the most business value from data at rest and data in motion. João has a degree in Applied Mathematics and is a spearhead in matters related to Artificial Intelligence, Machine Learning and Advanced Analytics.
