The epicenter of big data moves to New York City on September 29 at Strata + Hadoop World. It’s a great chance to mix and mingle with people who live and breathe analytics, including a core SAS team of thought leaders, developers and executives.
We’d love to be a part of your Strata + Hadoop World agenda, so we thought introducing our team might spark your interest! This is the first in a series of blog posts over the next three weeks featuring Q&As with the folks you can meet at SAS booth #543.
First up is Patrick Hall, whose official SAS title is “Senior Machine Learning Scientist,” but I like to refer to him simply as “The Machine.” This guy is impressive, from his patent application for an algorithm that estimates the number of clusters in a data set to his engaging and informative presence on social media.
What is your background?
I earned a BA in math at the University of North Carolina and then entered a PhD program in physical chemistry at the University of Illinois, but found out it wasn’t really for me. So I took the programming skills I’d picked up and began working at small start-ups back in the North Carolina Research Triangle area. After a few years, I went back to school for the master’s program in analytics at NC State, and now I’m at SAS.
As a data scientist, what skills help you most?
To me (and I do think “data scientist” is a hard term to define), being able to manipulate large amounts of data is what separates data scientists from people in more traditional roles, like analysts or statisticians. Pulling diverse sources of data together usually requires creating your own software tools, but once you get the data into the appropriate format, it’s much easier to analyze it, visualize it and tell a persuasive story with it.
When did you figure out you wanted to be a data scientist? What motivated you to become one?
Data science wasn’t a thing when I was in high school and college, but I realized I liked analyzing and visualizing data in grad school. I gravitated toward advanced data analysis methods, and I just really liked making the visualizations that we used to communicate our experimental results.
What’s your biggest accomplishment thus far?
Submitting a patent application, along with several colleagues, for an algorithm that can estimate the number of clusters in a data set. Determining the number of clusters in a data set is a fundamental problem in data mining and customer relationship management (CRM).
What are you most looking forward to at Strata + Hadoop World NYC?
I’m really looking forward to Spark Camp. It’s important for me to stay in touch with big data technologies outside of the SAS ecosystem. Also, Spark is just cool. There are too many good sessions to name, but I’ll try to spend my time in the ones that focus on bridging the gap between training machine learning models and using them in real life. Hopefully, I’ll also get to catch up with some friends, like Nate Neff at Cloudera, who’s running the “Designing the Big Data Applications” tutorial.
What software/techniques are you excited to show attendees?
I’m always excited to show people the distributed, in-memory machine learning capabilities in SAS Enterprise Miner. I also enjoy explaining the work we’ve done recently that allows SAS to interface with other technologies, such as Java, Python and R.
Need a quick lesson on machine learning before meeting Patrick? Take a look at his recent blog post, “An Introduction to Machine Learning.”
Please stop by the SAS booth (#543) to talk to Patrick, grab a cup of coffee and meet the rest of the team. And don’t miss Paul Kent’s session, “Patterns from the Future,” on Thursday morning, Oct. 1.