When machine learning and streaming data meet


I recently started using a task automation app on my smartphone to automate many of the settings I had been changing manually. Most of these settings revolved around my current location. When at home, I connect to my wireless network to avoid monthly data overages on my smartphone plan. I also connect to my Bluetooth earpiece for hands-free talking, turning the volume up for calls and notifications. When I leave the house, I disconnect and disable both Wi-Fi and Bluetooth for security purposes. I also switch all volume settings to silent with vibrate.

The automation app uses GPS sensor data to periodically check where I am, and it adjusts these settings as my location changes. As useful as this automation is, it still relies on user input. And this means I have to configure the automation algorithm myself, teaching it to understand and implement my preferences.
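To make the rule-based approach concrete, here is a minimal sketch of how a location-driven rule of this kind might look. It is not the app's actual implementation: the `Profile` fields, the home coordinates, and the 100-meter radius are all hypothetical values chosen for illustration. The point is that every rule and threshold has to be written by hand.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


@dataclass(frozen=True)
class Profile:
    """Hypothetical bundle of phone settings (names are illustrative)."""
    wifi: bool
    bluetooth: bool
    volume: str  # "loud" or "vibrate"


HOME_PROFILE = Profile(wifi=True, bluetooth=True, volume="loud")
AWAY_PROFILE = Profile(wifi=False, bluetooth=False, volume="vibrate")


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def choose_profile(lat, lon, home=(40.0, -75.0), radius_m=100.0):
    """Return the home profile inside the hand-configured radius, else away.

    The home coordinates and radius are made-up defaults; a real app would
    take them from the user's configuration.
    """
    if haversine_m(lat, lon, home[0], home[1]) <= radius_m:
        return HOME_PROFILE
    return AWAY_PROFILE
```

Calling `choose_profile` on each periodic GPS reading would switch the settings as the location changes, but only according to the rules the user typed in.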

This pales in comparison to the potential of machine learning, where computer algorithms learn without being explicitly programmed (beyond perhaps a training model with basic rules to frame the learning process). Such algorithms learn autonomously as they’re iteratively exposed to more data.

Machine learning itself is not a new concept. Machines learned to play (and consistently beat human champions at) checkers in the 1950s, chess in the 1990s, Jeopardy! earlier this decade, and the complex ancient Chinese game Go last year. Recent history has also super-powered machine learning with rapid advancements in compute power, data storage, parallel processing, network bandwidth and high-speed connectivity. And over the years, machines have also learned to do a lot more than just play games with humans. In fact, machine learning has progressed to the point where it’s challenging the role humans play.

In traditional analysis, a human selects a model they believe best fits a collection of data. I believe humans will remain an essential element of analytics. But while a human analyst might create one or two good models a week, machine learning is capable of creating thousands of good models in the same time. This is why machine learning is one of the leading drivers of the evolution of analytics, especially when it’s paired with streaming data, which shortens learning cycles and increases the number of iterations that can be rapidly processed.

With streaming data, machines can learn without being taught or preprogrammed with domain knowledge. For example, Google Search learned to correct misspelled words without access to a dictionary by processing billions of searches and learning which websites the majority of users selected (i.e., the ones they chose when the search terms were spelled correctly).

Speech recognition is another example. Apple’s Siri was perhaps the pioneer in this field, but as an Android fan I have spent more time with Google Assistant. Previous attempts at this technology relied on end users spending countless hours teaching the algorithm how to recognize their speech patterns. The streaming data of online audio (e.g., podcasts and videos) provided the rapid learning environment that essentially perfected these algorithms. My current smartphone recognized my voice automatically without needing me to train it.

An enterprise example is machine learning that detects credit card fraud. It works by analyzing millions of financial transactions without requiring a human expert to explicitly explain how to detect the fraud.
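As a toy illustration of learning from a stream without hand-written rules, the sketch below keeps a running mean and standard deviation of transaction amounts (Welford's online algorithm) and flags amounts that sit far from what it has seen so far. This is a deliberately simple statistical stand-in, not how production fraud detection works; the function names, the z-score threshold of 3, and the warm-up length are assumptions made for the example.

```python
import math


class RunningStats:
    """Running mean and sample variance, updated one value at a time (Welford)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Sample standard deviation; 0.0 until at least two values are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def flag_unusual(amounts, threshold=3.0, warmup=5):
    """Return the amounts whose z-score exceeds the threshold when they arrive.

    threshold and warmup are illustrative defaults, not tuned values.
    Each amount is scored against the statistics of the amounts before it,
    then folded into those statistics (flagged amounts included).
    """
    stats = RunningStats()
    flagged = []
    for amount in amounts:
        if stats.n >= warmup and stats.std > 0:
            z = abs(amount - stats.mean) / stats.std
            if z > threshold:
                flagged.append(amount)
        stats.update(amount)
    return flagged
```

Nobody tells the code what a "normal" amount is; the baseline comes entirely from the data as it arrives, and it keeps shifting with every new transaction.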

When machine learning and streaming data meet, amazing things can happen. But there are also ethical concerns. Mistakes that machine learning makes while learning to play games, spell, recognize speech or detect fraud are relatively easy to accept. Mistakes made by a machine that's learning to drive a car, however, can result in the loss of human life.

There’s also the loss of human privacy to consider. The future of smartphone automation may be completely driven by machine learning. Not only will my smartphone be able to learn my preferred Wi-Fi, Bluetooth, volume and other settings as I travel, it will likely also detect and automate other preferences. It could direct my self-driving car to stop at the grocery store on the way home from work because my smart refrigerator told it I needed milk. It could also automatically pay for the milk with my credit card so that I can just grab it and go (perhaps without even getting out of the car). And as I enter my driveway, it could tell my smart house to turn on the lights, adjust the temperature, disable the home security system, and queue up the latest episode of my favorite TV show.

The amount of streaming personal data I would have to give machine learning access to gives me pause. But there’s no doubt that machine learning and streaming data are pressing the fast-forward button on the future of analytics.

About Author

Jim Harris

Blogger-in-Chief at Obsessive-Compulsive Data Quality (OCDQ)

Jim Harris is a recognized data quality thought leader with 25 years of enterprise data management industry experience. Jim is an independent consultant, speaker, and freelance writer. Jim is the Blogger-in-Chief at Obsessive-Compulsive Data Quality, an independent blog offering a vendor-neutral perspective on data quality and its related disciplines, including data governance, master data management, and business intelligence.

1 Comment

  1. Jim, this is a great overview and I certainly agree with your points. I have just recently begun delving into machine learning as the company I work for built a toolkit to use machine learning along with our big data platform. This is an exciting field where mathematics meets science meets technology. I have been writing some blog posts recently on the topic of machine learning, but I am just learning this space. Thanks.
