To succeed in any data-focused hackathon, you need a robust set of tools and skills – as well as a can-do attitude. Here's what you can expect from any hackathon:
- Messy data. It might come from a variety of sources, and won't necessarily be organized for analytics or reporting. That's your job.
- Nebulous problem set. Usually the goal of a hackathon is to generate insights, improve a situation, or optimize a process. But you don't know going into it which insights you need, which process is ripe for optimization, or which situations can be improved by using data. Hackathons are as much about discovering opportunities as they are about solving problems.
- Team members with different viewpoints. This is a big strength of hackathons, and it can also present the biggest challenge. Team members bring different skills and ideas. To be successful, you need to be open to those ideas and let team members contribute in the way that best uses their skills. Think of yourselves as the Ocean's Eleven of data analytics.
In my experience, hackathons are often a great melting pot of different tools and technologies. Whatever tech biases you might have in your day job (Windows versus Linux, SAS versus Python, JSON versus CSV) – these melt away when your teammates show up ready to contribute to a common goal using the tools that they each know best.
My favorite hackathon tools
At the Analytics Experience 2018 Hackathon, attendees have the entire suite of SAS tools available: Base SAS, SAS Enterprise Guide, SAS Studio, SAS Enterprise Miner, and the entire SAS Viya framework -- including SAS Visual Analytics, SAS Visual Text Analytics, and SAS Data Mining and Machine Learning. As we say here in San Diego, it's the whole enchilada. As the facilitators presented their whirlwind tour of all of these goodies, I could see the attendees salivating. Or maybe that was just me.
When it comes to getting my hands dirty with unknown data, my favorite path begins with SAS Enterprise Guide. If you know me, this won't surprise you. Here's why I like it.
Import Data task: Import any data
Hackathon data almost always comes as CSV or Excel spreadsheets. The Import Data task can ingest CSV, fixed-width text, and Excel spreadsheets of any version. Of course most "hackers" worth their salt can write code to read these file types, but the Import Data task helps you to discover what's in the file almost instantly. You can review all of the field names and types, tweak them as you like, and click Finish to produce a data set. There's no faster method of turning raw data into a SAS data set that feeds the next step.
See "Tricks for importing text files" and "Importing Excel files using SAS Enterprise Guide" for more details about the ins and outs of this task. If you want to ultimately turn this step into repeatable code (a great idea for hackathons), then it's important to know how this task works.
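For teammates who would rather script this discovery step, here is a minimal Python sketch of the same idea: read a CSV, list the field names, and guess a type for each one. It is not how the Import Data task works internally. The function names (`infer_type`, `describe_csv`) and the sample data are made up for illustration, and the type guessing is deliberately simple.

```python
import csv
import io

def infer_type(values):
    """Guess a column type from its non-empty string values."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "empty"
    # Try the strictest type first, then fall back.
    for caster, name in ((int, "integer"), (float, "number")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "text"

def describe_csv(handle):
    """Return {column name: guessed type} for a CSV file-like object."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    columns = reader.fieldnames or []
    return {col: infer_type([row[col] or "" for row in rows]) for col in columns}

# Hypothetical sample standing in for a hackathon file.
sample = io.StringIO("id,city,temp_f\n1,San Diego,71.5\n2,Cary,\n3,Austin,88\n")
field_types = describe_csv(sample)
print(field_types)
```

To use it on a real file, pass an open file handle instead of the `io.StringIO` sample, for example `open("data.csv", newline="")` (the file name here is a placeholder).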
Note: if your data is coming from a web service or API, then it's probably in JSON format. There's no point-and-click task to read that, but a couple of SAS program lines will do the trick.
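The SAS lines are not reproduced here. As a rough illustration of what reading JSON for analysis involves, this Python sketch flattens nested JSON records into flat rows that could then be saved as CSV and imported. The payload and the `flatten` helper are hypothetical, and real API responses will usually need more handling (arrays inside records, for instance, are left untouched).

```python
import json

def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with underscores."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat

# Hypothetical API response; real payloads will differ.
payload = (
    '[{"id": 1, "user": {"name": "Ana", "city": "Cary"}},'
    ' {"id": 2, "user": {"name": "Bo", "city": "Austin"}}]'
)
rows = [flatten(item) for item in json.loads(payload)]
print(rows)
```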
Query Builder: Filter, compute, summarize, and join
The Query Builder in SAS Enterprise Guide is a one-stop shop for data management. Use this for quick filtering, data cleansing, simple recoding, and summarizing across groups. Later, when you have multiple data sources, the Query Builder provides simple methods to join these – merge on the fly.
Before heading into your next hackathon, it's worth exploring and practicing your skills with the Query Builder. It can do so much -- but some of the functions are a bit hidden. Limber up before you hack!
See this paper by Jennifer First-Kluge for an in-depth tour of the tool.
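If it helps to see the three core operations (filter, summarize by group, join) spelled out, here is a small Python sketch over made-up tables. The table and column names are assumptions for illustration only, and the Query Builder itself generates SQL rather than anything like this.

```python
from collections import defaultdict

# Hypothetical tables: orders, and a lookup of store cities.
orders = [
    {"order_id": 1, "store_id": "A", "amount": 20.0},
    {"order_id": 2, "store_id": "B", "amount": 5.0},
    {"order_id": 3, "store_id": "A", "amount": 35.0},
    {"order_id": 4, "store_id": "C", "amount": 12.0},
    {"order_id": 5, "store_id": "B", "amount": 0.0},
]
stores = {"A": "San Diego", "B": "Cary"}

# Filter: drop zero-amount orders.
valid = [o for o in orders if o["amount"] > 0]

# Summarize: total amount per store.
totals = defaultdict(float)
for o in valid:
    totals[o["store_id"]] += o["amount"]

# Join: attach the city; stores missing from the lookup get None (a left join).
summary = [
    {"store_id": sid, "city": stores.get(sid), "total": total}
    for sid, total in sorted(totals.items())
]
print(summary)
```

Store "C" has no entry in the lookup, so it shows up with a missing city. That kind of gap is worth checking for after any join.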
Characterize Data: Quick data characteristics, with ability to dive deeper
If you've never seen your data before, you'll appreciate this one-click method to report on variable types, frequencies, distinct values, and distributions. The Describe->Characterize Data task provides a good start.
Using SAS Studio? There's a Characterize Data task in there as well. See Marje Fecht's paper: Easing into Data Exploration, Reporting, and Analytics Using SAS Enterprise Guide for more about this and other tasks.
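As a rough sketch of the kind of profile such a task reports, the Python below counts missing values, distinct values, and the most frequent values for each column. The `characterize` function and the sample rows are hypothetical, and the real task reports more than this (distributions, for example).

```python
from collections import Counter

def characterize(rows):
    """Per column: count of missing values, distinct values, and top values."""
    profile = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and empty strings as missing.
        present = [v for v in values if v not in (None, "")]
        counts = Counter(present)
        profile[col] = {
            "missing": len(values) - len(present),
            "distinct": len(counts),
            "top": counts.most_common(3),
        }
    return profile

# Hypothetical sample rows.
rows = [
    {"region": "West", "units": 10},
    {"region": "East", "units": 7},
    {"region": "West", "units": None},
    {"region": "", "units": 10},
]
profile = characterize(rows)
print(profile)
```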
Data tasks: Advanced data reworking: long to wide, wide to long
"Long" data is typically best for reporting, while "wide" data is more suited for analytics and modeling The process of restructuring data from long to wide (or wide to long) is called Transpose. SAS Enterprise Guide has special tasks called "Split Data" (for making wide tables) and "Stack Data" (for making long data). Each method has some special requirements for a successful transformation, so it's worth your time to practice with these tasks before you need them.
Program Editor: Flexible coding environment
The program editor in SAS Enterprise Guide is my favorite place to write and modify SAS code. Here are my favorite tricks for staying productive in this environment, including code formatting.
Have another favorite editor? You can use SAS Enterprise Guide to open your code in your default Windows editor too. That's a great option when you need to do super-fancy text manipulation. (We won't go into the "best programming editor" debate here, but I've got my defaults set up for Notepad++.)
Export and share with others
The hackathon "units of sharing" are code (of course) and data. SAS Enterprise Guide provides several simple methods to share data in a way that just about any other tool can consume:
- Export data as CSV (CSV is the lingua franca of data sharing)
- Export data as Excel (if that's what your teammates are using)
- Send to Excel -- actually my favorite way to generate ad-hoc Excel data, as it automates Microsoft Excel and pipes the data you're looking at directly into a new sheet.
- Copy / paste with headers -- low-tech, but this gets you exactly the columns and fields that you want to share with another team member.
When it comes to sharing code, you can use File->Export All Code to capture all SAS code from your project or process flow. However, I prefer to assemble my own "standalone" code piecemeal, so that I can make sure it's going to run the same for someone else as it does for me. To accomplish this, I create a new SAS program node and copy the code for each step that I want to share into it...one after another. Then I test by running that code in a new SAS session. Validating your code in this way helps to reduce friction when you're sharing your work with others.
Hacking your own personal growth
The obvious benefit of hackathons is that at the end of a short, intense period of work, you have new insights and solutions that you didn't have before – and might never have arrived at on your own. But the personal benefit comes in the people you meet and the techniques that you learn. I find that I'm able to approach my day job with fresh perspective and ideas – the creativity keeps flowing, and I'm energized to apply what I've learned in my business.