SAS on GitHub


Right now I’m crossing the Pacific toward Australia and New Zealand for the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (a.k.a. KDD), a Data Science Melbourne MeetUp, and the SAS Users of New Zealand conference. New Zealand is the birthplace of open source R, so this trip has me thinking a lot about the relationship between open source and proprietary analytics tools. I’m expecting some tough questions from the local data science community, but I’m also really excited to share several new, SAS-sponsored, open source GitHub repositories (repos) with them, and with you. Below you’ll find out why I’m excited about SAS’ participation in the open source analytics community, why I think GitHub is a great way to facilitate that participation, and more info about the repos themselves.

Whether you work alone, for a startup, or for an established company, participation in an open source project is about way more than just sharing code: among other things, it fosters intellectual ownership of the project and boosts your visibility. Proprietary analytics software is seldom mentioned on social media by people who aren’t on the payroll of its vendor. Yet my Twitter feed is literally choking on what amount to unsolicited love poems to open source analytics software. (That’s right, I have no life whatsoever.) Users of open source analytics tools feel a sense of pride and intellectual ownership toward their software, and this is incredibly important. Once users feel ownership of a tool, of course they will blog about it, tweet about it, and defend it to the death in those “spirited” SAS vs. R vs. Python debates.

Why does representation on social media matter? Because perception can become reality. Analyst firms, practitioners, journalists, and even decision makers at large corporations use social media to keep up with what is current. If social media discussions are dominated by proponents of open source tools, then people might begin to mentally relegate SAS to the legacy software dustbin. Social media matters even more for younger practitioners, who have to polish their online reputations just as older professionals polished their paper resumes. SAS provides excellent documentation and examples online, along with social channels for communicating about our tools, but the new generation rightly clamors for a two-way channel of communication in which useful technical contributions can be shared easily and can boost an individual’s professional reputation. SAS is putting forward these new GitHub repos in the hope that user contributions will foster intellectual ownership among users and earn public recognition for valuable contributions.

GitHub

Just in case some of you don’t know about them yet, Git and GitHub combine into a sophisticated solution for version control, bug tracking, and collaboration between developers, plus some social goodies to help you follow your favorite developers and projects. Oh yeah, and they’re free. For longtime SAS users, GitHub has code snippets and snarky repartee like SAS-L, plus really nice development tools. GitHub even has syntax highlighting for SAS. The only thing missing from the party is you!

So without any further ado, the repos are:

github.com/sassoftware/enlighten-integration - Integration strategies between Java, PMML, Python, R, and SAS.

github.com/sassoftware/enlighten-apply - Applied machine learning projects.

github.com/sassoftware/enlighten-deep - Code for deep learning (as it becomes available).

Why the name enlighten for the SAS repos? For two reasons. First, it is a move of enlightened self-interest for SAS to put macro code on GitHub: SAS will benefit from user knowledge and contributions. Second, SAS wants to shed light on some of our own more advanced machine learning capabilities. SAS has been working in machine learning since the early 1980s and has a lot of production-quality machine learning functionality to offer to our customers.

These repositories are a work in progress, and we really hope they continue to evolve with input from inside and outside of SAS. They have also been a team effort by me and several of my fellow SAS Enterprise Miner developers: Funda Gunes, Tim Haley, Radhikha Myneni, and Ruiwen Zhang. It’s been a really fun project for us, and we look forward to your pull requests. (Pull requests are one way to contribute to a GitHub repo. Just google it.)

Image credit: photo by othree // attribution by creative commons


About Author

Patrick Hall

Senior Machine Learning Scientist

Patrick Hall was the 11th person worldwide to become a Cloudera certified data scientist. He designs new data mining and machine learning technologies for SAS. Talk to him about data on Twitter, LinkedIn, or Quora.


3 Comments

  1. Thank you for your comment.

    You are correct that most of the code currently available in these repos consists of examples.

    However, it is quite easy to construct a library of SAS macros (called a catalog). We or others could create such a library and commit it to these repos.

    In SAS, the primary data mining and machine learning library is currently supplied by Enterprise Miner. (It is a collection of procedures, macros, and a graphical user interface.) I personally have not yet found it necessary to create an additional library of data mining and machine learning functionality to augment Enterprise Miner.

    I acknowledge that purchasing a proprietary library is quite different from a group of users working together to create an open source library.
