If a picture is worth a thousand words, then a visualization of Hadoop data must be worth a billion. Over the last few years, organizations have rushed to leverage the low-cost distributed computing and storage power of Hadoop clusters.
As Hadoop environments mature and move beyond their initial focus on batch processing and search, visualization and analysis of all this new data is moving into the limelight.
Over 5,000 data architects, data scientists, modelers, business managers, IT executives and big data thought leaders will gather in San Jose for the Strata Hadoop World conference March 29-31. As I mentioned in my previous posts, this blog series will take a look at some of the key technologies Strata attendees can learn about at the SAS booth (#1022). We've already answered questions about streaming analytics and how SAS supports Spark. In this post, we'll explore what's involved in visualizing Hadoop data.
I sat down with Keith Renison to find out what he'll be doing at Strata next week. Keith is a senior solutions architect at SAS, and when he's not helping solve customers' challenges, you'll probably find him snowboarding, wakeboarding or riding his Ducati Monster 796 (that is, if he didn't buy the Panigale he was test driving last weekend).
What technology will you be focusing on at the show?
Renison: I’m super excited to show SAS Data Loader for Hadoop and SAS Visual Analytics, two technologies designed to prepare and visualize massive amounts of data inside the Hadoop platform.
Has the introduction of Hadoop made data visualization easier or more difficult?
Renison: I'm often told that 80 percent of analytic work is spent preparing data for visualization.
Think about it: how many times have you carried your data all the way to the visualization phase, only to go back to the beginning to rework aggregations, add new variables, or format and clean up crummy data?
"Rinse and repeat" is the mantra of the modern data scientist. So in many ways, data visualization has become more difficult with the introduction of Hadoop, particularly when it comes to data preparation at massive scale.
That's why adding tools like SAS Data Loader for Hadoop to the data scientist's arsenal is critical to visualization. SAS Visual Analytics is also designed to analyze data at the detail level rather than pre-aggregating it (cubes are dead! Dead, I tell you!). These tools are explicitly designed to shorten time-to-insight by simplifying data profiling, data quality and data visualization without the time cost of moving data around.
How is this different from what SAS has done in the past?
Renison: SAS has a tremendously smart and capable user base spanning almost 40 years! It's no accident that SAS analytics have become entrenched in organizations around the world. Our users, paired with world-class technology, have proven the value of putting analytics into production.
This hasn't just benefited us; it has raised the bar for commercial and open source technologies alike. We like open source, and in fact we have all the right integration points. If I could say one thing about how our software is different, I would say that, as the leader, SAS has worked extremely hard not just to keep up with these trends but to stay ahead of them.
We're continuing our tradition of innovation and leadership. These technologies are built upon progressive data architectures and are designed to capitalize on our experience in analytics at scale and production, including Hadoop. Watch for SAS to continue to pave the way for analytics at scale, analytics in the stream, and modeling at ridiculous scale (automation of models at each segment anyone?).
What are you most excited to see at this conference?
Renison: Our customers, and those who will be our customers! Everyone brings an amazing story and new ways to approach problems. It's an inside joke that SAS can do everything. I love listening to the ways problems are being solved, and to the new data challenges that people bring us at these conferences. Stop by the SAS booth (#1022). We'll surprise you, you'll learn something, and you'll make some new friends.
If you're looking to learn more about data visualization, read the post "Why data visualization matters," or jump right in with a free trial of the software.
Stop by the SAS booth (#1022) to chat with Keith, pick up a "Data Dude" or "Data Diva" t-shirt and meet the rest of the SAS team. You also won't want to miss Paul Kent and Patrick Hall's presentation, "A survival guide to machine learning: Top 10 tips from a battle-tested solution," on March 30 at 4:20.
It's not too late to register for Strata Hadoop World. Use the discount code SAS20 to receive 20% off your registration.