Analytically speaking, West Perth is paradise. We know that thanks to the Paradise Found project and my colleague Andreas Becks’ blog posts on the topic. One thing is clear: without machine learning and analytics, we never would have found this Australian boomtown neighbourhood among 5 million data points collected on 148,233 locales around the world from 1,124 different sources. To me, though, data management is every bit as important as analytics: to get the data to tell us anything, data management and analytics need to work together optimally.
The real big data challenge: V as in variety
The challenge in analytics projects such as Paradise Found often lies not in the volume of data but in the variety of source systems and access pathways, and in data structures that are diverse or missing altogether. The project confirmed once again how important it is to have an open analytics platform that can transparently access nearly every data source and acquire the data without problems.
Diverse data sources and heterogeneous data structures demand the whole repertoire of modern data quality capabilities. Standardising and consolidating city names from a dizzying variety of formats around the world – in terms of language and the alphabets used – were among the easy tasks in Paradise Found. Apart from standard data quality methods like profiling, parsing and cleansing, analytical data enrichment is absolutely critical to the success of such a project. Instead of excluding missing or incorrect data from the analysis, processes like machine learning make it possible to improve the usefulness of the data.
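To make these two ideas a little more concrete, here is a minimal Python sketch. It is not the method used in Paradise Found, only an illustration under stated assumptions: the function names (normalise_city, consolidate, impute_knn) are hypothetical, the name matching only handles case, accents and stray whitespace (it does not transliterate between alphabets), and a simple nearest-neighbour average stands in for whatever machine learning technique a real project would choose to fill gaps.

```python
import math
import unicodedata
from collections import defaultdict


def normalise_city(name: str) -> str:
    """Build a comparison key: accents stripped, case folded, whitespace collapsed."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_marks.casefold().split())


def consolidate(names):
    """Group raw spellings that share the same normalised key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalise_city(name)].append(name)
    return dict(groups)


def impute_knn(rows, target, k=2):
    """Fill missing (None) values of `target` instead of dropping the row.

    Assumes every row is a dict with the same keys, that all other fields
    are numeric and present, and that at least one row has a known target.
    The estimate is the mean target of the k closest complete rows.
    """
    features = [key for key in rows[0] if key != target]
    complete = [row for row in rows if row[target] is not None]

    def distance(a, b):
        # Euclidean distance over the non-target fields.
        return math.dist([a[f] for f in features], [b[f] for f in features])

    filled = []
    for row in rows:
        if row[target] is not None:
            filled.append(dict(row))
            continue
        nearest = sorted(complete, key=lambda c: distance(row, c))[:k]
        estimate = sum(c[target] for c in nearest) / len(nearest)
        filled.append({**row, target: estimate})
    return filled
```

Calling consolidate on a list of raw city spellings returns one group per normalised key, and impute_knn returns a copy of the rows with the gaps in the chosen field filled in. In practice the distance measure, the value of k and the choice of model would all need to be validated against the data at hand.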
Success factors: Speed and simplicity
Besides the great importance of having the right data management tools, this project has demonstrated once again how important it is to closely integrate data management and analytics. Only an iterative, integrated process makes it possible to make rapid progress and to enrich the analyses with additional data and derive insights. The traditional division of labour between data scientist and data architect or between analyst in the department and IT is now a thing of the past. These processes must be merged into an iterative process to generate innovation. Only an integrated platform like the SAS® Platform, which covers these iterative steps in a complete process, makes it possible to implement a project like this in just a few weeks’ time.
The key aspects here are the consistent use of analytics and machine learning algorithms throughout the process – even at the earliest stage of data preparation – and constant transparency of the existing data, data quality, and any information already generated from the data in the form of models. In combination with an intuitive front end, this can enable a broad range of users to get the data to speak to them very quickly, in a “self-service” process.
Thus, big data management is more than a simple finger exercise, but it also doesn’t have to be an onerous chore. It is the only way to obtain a clear, undistorted picture of the data and derive models – and the success or failure of every analysis turns on that. So you won’t find paradise without good data management – at least not a paradise proven by analytics. In the case of Paradise Found, valid and telling results are certainly nice to have. But in business, machine learning will generate entirely new realms of potential.