For many organisations, open source supports the rapid and agile development of projects and models. It allows analytics projects to be deployed quickly and also, importantly, supports a fail-fast strategy without incurring significant infrastructure costs.
Deploying open source at the enterprise level often requires manual integration to execute score code in production environments. IT departments like to control the activities of data scientists and manage the analytical life cycle. This need for control is amplified when there is no central repository or platform where everything analytical is brought together. The need for control often extends to processes that monitor and retrain analytical models.
Today, many companies are extending their open source investments with an analytics platform that is used to operationalise models, govern analytics across packages and deploy analytics projects at scale.
Introducing the idea of a hybrid approach to deployment
As is so often the case, a hybrid approach combining open source with proprietary software can pay dividends.
Software such as SAS® can be used to address some of these challenges. It can, for example, be used to carry models across into production and scale them for enterprise-level use. It can also support good governance of models and data, which is essential both for complying with regulatory requirements and for ensuring stability into the future.
There are a number of different ways in which SAS® Viya® can interact with open source software:
- SAS Cloud Analytic Services (CAS) actions are the tools used to interact with data on the CAS server. They can be used to leverage the analytical capability of SAS Viya without leaving the open source programming application. CAS actions can be used for both data manipulation and modelling.
- Model importing in SAS Model Manager, which can translate models developed in open source into the SAS DS2 language, provided they have first been exported as PMML. The generated code can be executed in database, in memory, in streaming or through a REST API call. This capability is extremely useful if data scientists are using packages and algorithms that do not support parallel processing. Once the model is imported into SAS Model Manager, it becomes part of the SAS central repository, and the IT department can easily retrieve all the metadata, such as what KPIs were used, what the target variable is, and what language was used to develop the original model. The same model can be monitored automatically to check whether its performance satisfies company benchmarks.
- SAS Visual Data Mining and Machine Learning allows models in different languages to be compared using one interface. It speeds up the modelling phase and makes it possible to find the right analytical solution by testing a wide range of algorithms.
- It is also possible to execute Python from SAS using PYMAS (Python Micro Analytical Services). In this case, SAS makes a call out to Python, and the process is executed in the Python environment.
All these options have advantages and disadvantages. Some require manual coding or eventual recoding into SAS, and some require model comparisons to be performed in the open source environment. They all, however, offer ways to scale up smaller models far more simply than would otherwise be possible. They therefore provide an interface for combining agility with reliability at scale.
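To make the PMML route described above more concrete, here is a minimal sketch in Python, using only the standard library. It assumes a simple linear regression whose intercept and coefficients are invented for illustration. The function names `linear_model_to_pmml` and `score_from_pmml` are hypothetical, and the second function only imitates the idea of regenerating score code from a PMML file. It is not how SAS Model Manager works internally, and a real project would normally rely on a dedicated PMML exporter rather than hand-built XML.

```python
import xml.etree.ElementTree as ET

# Namespace used by PMML 4.4 documents.
PMML_NS = "http://www.dmg.org/PMML-4_4"


def linear_model_to_pmml(intercept, coefficients, target="y"):
    """Serialise a linear regression as a minimal PMML string (illustrative helper)."""
    root = ET.Element("PMML", {"version": "4.4", "xmlns": PMML_NS})
    ET.SubElement(root, "Header", {"description": "Illustrative linear model"})

    # Declare every input field plus the target variable.
    fields = list(coefficients) + [target]
    dictionary = ET.SubElement(
        root, "DataDictionary", {"numberOfFields": str(len(fields))}
    )
    for name in fields:
        ET.SubElement(
            dictionary,
            "DataField",
            {"name": name, "optype": "continuous", "dataType": "double"},
        )

    model = ET.SubElement(root, "RegressionModel", {"functionName": "regression"})
    schema = ET.SubElement(model, "MiningSchema")
    for name in coefficients:
        ET.SubElement(schema, "MiningField", {"name": name})
    ET.SubElement(schema, "MiningField", {"name": target, "usageType": "target"})

    # The regression table carries the actual model parameters.
    table = ET.SubElement(model, "RegressionTable", {"intercept": repr(intercept)})
    for name, coef in coefficients.items():
        ET.SubElement(
            table, "NumericPredictor", {"name": name, "coefficient": repr(coef)}
        )
    return ET.tostring(root, encoding="unicode")


def score_from_pmml(pmml_text, row):
    """Score one record using only the PMML text, with no access to the original model."""
    ns = {"p": PMML_NS}
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", ns)
    total = float(table.get("intercept"))
    for predictor in table.findall("p:NumericPredictor", ns):
        total += float(predictor.get("coefficient")) * row[predictor.get("name")]
    return total


# Invented coefficients, purely for illustration.
pmml_text = linear_model_to_pmml(1.0, {"age": 0.5, "income": 0.25})
prediction = score_from_pmml(pmml_text, {"age": 40, "income": 8})
print(prediction)  # 23.0
```

The point of the sketch is that the PMML text alone is enough to score new records, which is what lets a model built in one environment be carried into another and governed there.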
Integrate and embrace
Open source and proprietary software should never be seen as an either/or choice. The two have very different strengths, and very different advantages and disadvantages. A hybrid approach, integrating and embracing the two, is perhaps the strongest and most effective way to combine rapid development, deployment at scale, and reliability in model management. Hybrid vigour is a concept that works well for natural ecosystems. It turns out that it also works for analytics.