“Begin with the end in mind” - Habit #2 from Stephen Covey’s ‘The 7 Habits of Highly Effective People’.
The Edge Foundation is based on this premise: “To arrive at the edge of the world's knowledge, seek out the most complex and sophisticated minds, put them in a room together, and have them ask each other the questions they are asking themselves.” Each year it poses its annual “Edge Question” to its illustrious contributors, after which John Brockman, editor and publisher of Edge, gathers the responses and publishes them in book form. The question for 2014 was “What scientific idea is ready for retirement?”, with the replies recently published as “This Idea Must Die”.
The nomination from Gary Marcus, cognitive scientist at NYU, for an idea ready for retirement was “big data” (Already? We hardly got to know you, big data). Marcus’ argument wasn’t that big data has become unnecessary, but that it has quickly become a case of putting the cart before the horse. Data has its place, but that place is AFTER you have formulated a hypothesis or theory about the problem you are trying to address. With a theory in place, you then devise an experiment to test that hypothesis. The most important property of the data at this stage is that it be relevant to the problem / experiment at hand, and if so, then the more the merrier. But if not, well, as Marcus puts it, “Big data should not be the first port of call; it should be where we go once we know what we’re looking for”.
This ‘starting with the end in mind’ approach serves to highlight the key factor that will drive the effectiveness of your data scientists and business analysts – the right data management tools. You want your data scientists and analysts spending the bulk of their time and effort on developing and working out the details in those hypotheses, theories and models, not in data collection and preparation. Analytics uses data in a format quite different from that found in the EDW, where storage efficiency is paramount. The right data management tools can cut the proportion of time spent on data prep in half or more from the 80% typically seen, freeing your valuable analytic resources to focus on solving the important business problems.
I have my own anecdotal evidence on this subject, drawn from my experience as a conference chair: a story I call “The Big Honkin’ Data Cube”.
As a conference chair, one of your primary concerns is keeping to the schedule. If each speaker has 30 minutes, and you’ve allotted the last five minutes for Q&A, you would expect them to be getting around to their ‘summary / key takeaways’ slide by around the 22 minute mark, which means the crucial “So What” moment should occur by about 16-18 minutes into the presentation.
There is, however, one category of presentation that puts me into panic mode at least once a conference: the presentation where the user or vendor talks about how they implemented their data warehouse. I keep waiting, and waiting, and waiting for the punchline, until, with just four minutes left for Q&A, they announce that they are finished and ready for questions.
As I look out over the audience, I see two different sets of facial expressions. Half of the audience, the IT segment, has that look of satisfaction - they got their roadmap and tips & tricks and lessons learned and it was time well spent. The other set of faces, the business users, has the look of, “What – did something just happen? What did we miss?”
It took me the longest time to figure this phenomenon out. The conclusion I came to was that the primary focus of the IT-oriented presenter was on the construction and implementation of the data cube itself, and for them the job was complete at that point - any related business case for the EDW was taken for granted and left unstated. The business users, however, were left at the altar, waiting in vain for the presenter to connect the dots to the business issue(s) this data monster was built to address. Their expectation for the “So What” moment was more along the lines of: What is this data warehouse being used for, and who are the intended users? Marketing, HR, quality, R&D, forecasting and planning?
The two audience segments approach the business arena from opposite ends, with one working from the data towards the business problem, and the other from the business problem backwards to the data. I find myself, along with Stephen Covey, in that latter group, who are typically the ones to initiate the Q&A with a question about the primary business case(s) attached to the EDW implementation.
I don’t want to leave the conversation here, though, as this conclusion entirely contradicts my contention last week in “Big Variety” that the value in big data lies in the connections, correlations, networking, explorations and insights that can be gleaned from both its variety and its bigness. Last week’s assertion was that big data / Big Variety / the EDW is most definitely the place to start to discover your Unknown-Unknowns.
In the end I think there is value in approaching that Big Honkin’ Data Cube from both directions: in getting the right data to tell the story / address the business problem, but also in interrogating that data for the many interesting stories hidden within. When it comes to big data, good habits can start at either end.