Celebrate World Creativity & Innovation Week with Analytics R&D - Part 1

Note from Udo Sglavo: When curiosity meets capability, the world moves forward. As we celebrate World Creativity & Innovation Week, it seems appropriate to document some of the innovations Analytics R&D is working on. But with such a broad portfolio covering topics like statistics, machine learning, natural language processing, forecasting, optimization, econometrics, image analytics, and others, how do you keep track of all the innovative work happening? Let's kick off this series with my conversations with Amy Shi, Sr. Research Statistician Developer; Maggie Du, Sr. Machine Learning Developer; and Phil Helmkamp, Sr. Software Developer.

Amy Shi and team provide Bayesian network meta-analysis using SAS/STAT

Udo: What inspired you to pursue this particular project?

Amy: Network meta-analysis (NMA) is a popular and powerful technique for comparing multiple treatments across a collection of randomized trials. The primary drawback, from a practitioner’s perspective, is the difficulty of implementing NMA methods. The commonly used alternatives (R, BUGS, Stata, Python) all require a significant coding and modeling background. We wanted to enhance our current Bayesian procedures, PROC MCMC and PROC BGLIMM, to provide a convenient and computationally efficient means of conducting various forms of NMA in SAS.

Udo: Did you encounter any challenges?

Amy: There were several along the way. First, Bayesian NMA is computationally intensive. We incorporated fast sampling techniques (Hamiltonian Monte Carlo and Gamerman Metropolis sampling) and employed modern parallel sampling methods with multithreaded computing capability. Second, the model is complex and hierarchical, while the data are very limited. To counter that, we employed a data augmentation strategy and a missing-data mechanism. Lastly, one of the NMA models fixes the scale parameter of the normal distribution at a constant, so we added that feature to PROC BGLIMM to support this model. As a result, SAS/STAT can offer a comprehensive solution for Bayesian NMA.
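For readers new to NMA, the standard contrast-based random-effects formulation from the NMA literature is sketched below. It is an illustration, not necessarily the exact parameterization used in PROC MCMC or PROC BGLIMM, and it ignores the correlation between contrasts within multi-arm trials. Here i indexes trials, k indexes treatments, b_i is the baseline treatment of trial i, μ_i is a trial-specific baseline effect, d_{1k} is the effect of treatment k relative to a reference treatment 1, and τ² is the between-trial heterogeneity. When each arm reports a summary estimate y_{ik} with a known standard error s_{ik}, the normal likelihood has its scale fixed at that known value, which is one common reason an NMA model needs a fixed scale parameter.

```latex
\begin{aligned}
y_{ik} &\sim \mathcal{N}\!\left(\theta_{ik},\, s_{ik}^{2}\right), \qquad s_{ik}\ \text{known (fixed scale)} \\
\theta_{ik} &= \mu_i + \delta_{i,b_i k}, \qquad \delta_{i,b_i b_i} = 0 \\
\delta_{i,b_i k} &\sim \mathcal{N}\!\left(d_{1k} - d_{1b_i},\, \tau^{2}\right), \qquad d_{11} = 0 \\
d_{bk} &= d_{1k} - d_{1b} \quad \text{(consistency)}
\end{aligned}
```

The consistency relation in the last line is what lets trials comparing A with C and B with C inform the A versus B comparison even when no trial tested A and B head to head.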

Udo: Why do you consider your project cutting-edge?

Amy: Our Bayesian procedures provide convenient access to NMA with simple syntax, high performance, and fast convergence. Our SAS tool is one of a kind! Users can use our procedures to fit NMA models with various types of responses (continuous, binary, count…). In addition, they can estimate model parameters, infer both absolute and relative risk effects, account for different types of heterogeneity, and predict missing data. Collaborating with several professors, we submitted a paper that showcases our Bayesian procedures to a peer-reviewed journal. The paper is expected to be published soon. We’ve even been invited to present this work at the Duke Industry Statistics Symposium this April.

Flexible pipelines for time-series forecasting? Phil Helmkamp and team say yes

Udo: What was your motivation for this innovation?

Phil: After many years of supporting our Forecast Server product, our team came up with the idea to break up the forecasting process into customizable “model strategies” that can be assembled into a pipeline. Forecast Server is still excellent software that provides a ton of value to this day, but it's quite rigid: the product essentially has one strategy for producing forecasts. With the new approach, modelers can add any combination of pre-built and user-defined strategies. Our pipeline also supports segmenting the input data, which allows different strategies to be applied to different buckets of time series.
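A minimal sketch of the segmentation idea follows, assuming a hypothetical rule that routes series with mostly zero values to an intermittent-demand strategy and everything else to a default strategy. The segment names, the 50% threshold, and the strategy names are illustrative assumptions, not SAS Visual Forecasting settings.

```python
from collections import defaultdict

# Hypothetical mapping from segment name to the strategy that should model it.
STRATEGY_BY_SEGMENT = {
    "intermittent": "intermittent_demand_strategy",
    "regular": "default_strategy",
}


def segment_of(values, zero_share_threshold=0.5):
    """Assign a (non-empty) series to a segment by its share of zero observations."""
    zero_share = sum(1 for v in values if v == 0) / len(values)
    return "intermittent" if zero_share > zero_share_threshold else "regular"


def plan_pipeline(series_by_id):
    """Group series IDs by the strategy that will forecast them."""
    plan = defaultdict(list)
    for series_id, values in series_by_id.items():
        plan[STRATEGY_BY_SEGMENT[segment_of(values)]].append(series_id)
    return dict(plan)


series = {
    "store1_sku7": [0, 0, 3, 0, 0, 0, 2, 0],
    "store2_sku1": [12, 15, 14, 18, 20, 17, 19, 22],
}
plan = plan_pipeline(series)
print(plan)
```

Each bucket can then be handed to its own strategy, which is what lets a sparse-demand method and a conventional method coexist in one pipeline.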

Udo: Did you run into any obstacles?

Phil: In order to support arbitrary strategies, we had to come up with a common contract to which they all must adhere. This includes the arguments that are passed to the code and the set of tables to be read and produced.

For the arguments, a single JSON object is populated with global settings along with node-specific ones. This JSON object is then passed to the strategy where it is processed by utilities that we provide to parse the JSON into variables that are suitable for that strategy’s language. For example, SAS macro variables are created for strategies that are written in SAS code. These generated variables make up the contract that is documented and surfaced to our users.
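The argument contract could look roughly like the following sketch. It assumes a hypothetical JSON layout with `global` and `node` sections, merges them so node-specific settings override global ones, and renders the result as SAS `%let` statements, one way to produce macro variables for a strategy written in SAS code. The key names and the override rule are assumptions for illustration.

```python
import json


def merge_settings(payload):
    """Combine global and node-specific settings; node values win on conflict."""
    settings = dict(payload.get("global", {}))
    settings.update(payload.get("node", {}))
    return settings


def to_sas_macro_vars(settings):
    """Render each setting as a SAS %let statement.

    Assumes values contain no semicolons or macro triggers, since they are not quoted.
    """
    lines = []
    for name, value in sorted(settings.items()):
        if isinstance(value, list):
            value = " ".join(str(v) for v in value)  # lists become space-separated
        lines.append(f"%let {name}={value};")
    return "\n".join(lines)


payload = json.loads("""
{
  "global": {"lead": 12, "interval": "month"},
  "node": {"lead": 6, "models": ["esm", "arima"]}
}
""")
sas_code = to_sas_macro_vars(merge_settings(payload))
print(sas_code)
```

Keeping one JSON object as the single source of truth means a parser for another language (for example, one that sets Python variables) can be added without changing what the pipeline passes in.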

For the data, we use a naming convention as a sort of namespace to indicate which tables belong to which strategy. The input and output namespaces are passed to the code as arguments so the strategy knows where to read and write data. This allows us to chain nodes together while avoiding unnecessary copying of the data. We also validate the output namespace after the node runs to ensure that it meets the contract of downstream nodes.
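The table-namespace convention and the post-run check could be sketched as follows. Tables live in a plain dictionary keyed by `namespace.table`, and the required output table names (`forecasts`, `errors`) are hypothetical stand-ins for whatever a downstream node's contract demands. The strategy reads its input by reference, so chaining nodes only means passing namespace strings.

```python
REQUIRED_OUTPUTS = {"forecasts", "errors"}  # hypothetical downstream contract


def tables_in(store, namespace):
    """Return the short names of all tables that belong to a namespace."""
    prefix = namespace + "."
    return {key[len(prefix):] for key in store if key.startswith(prefix)}


def run_node(store, strategy, input_ns, output_ns):
    """Run one strategy, then validate its output namespace against the contract."""
    strategy(store, input_ns, output_ns)
    missing = REQUIRED_OUTPUTS - tables_in(store, output_ns)
    if missing:
        raise ValueError(f"{output_ns} is missing tables: {sorted(missing)}")


def naive_strategy(store, input_ns, output_ns):
    """Toy strategy: forecast the last value; error is the mean one-step change."""
    history = store[f"{input_ns}.series"]  # read by reference, no copy
    store[f"{output_ns}.forecasts"] = {sid: v[-1] for sid, v in history.items()}
    store[f"{output_ns}.errors"] = {
        sid: sum(abs(b - a) for a, b in zip(v, v[1:])) / (len(v) - 1)  # needs >= 2 points
        for sid, v in history.items()
    }


store = {"input.series": {"a": [1, 2, 4], "b": [5, 5, 5]}}
run_node(store, naive_strategy, input_ns="input", output_ns="naive")
print(sorted(store))
```

A strategy that forgot to write `errors` would fail at the validation step rather than breaking a downstream node later.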

Udo: Why is your project innovative?

Phil: Products like SAS Forecast Server have long supported generating models and comparing them on a per-series basis. But we are not aware of any systems that let you define multiple strategies for automated forecasting and then compare them based on the overall accuracy of all the models.

To accomplish this, the strategies are run in parallel, each producing forecasts for all of the time series. The error distributions of the strategies are compared, and only the forecasts produced by the strategy that is most accurate overall are carried forward. This enables modelers to try out various methods, including more modern ones like neural networks, while maintaining confidence that downstream systems are being fed the most accurate results. This capability is now available in SAS Visual Forecasting.
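The selection step can be illustrated with a small sketch: each strategy forecasts a holdout window for every series, the strategies run concurrently in a thread pool, each strategy's absolute errors are pooled across all series, and only the strategy with the lowest mean absolute error is carried forward. The two toy strategies and the use of mean absolute error to summarize each error distribution are assumptions; the product may compare the distributions differently.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def naive(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def average(history, horizon):
    """Repeat the historical mean."""
    return [mean(history)] * horizon


def pooled_errors(strategy, train, holdout):
    """Absolute errors of one strategy across all series, pooled into one list."""
    errors = []
    for sid, history in train.items():
        actual = holdout[sid]
        forecast = strategy(history, len(actual))
        errors.extend(abs(f - a) for f, a in zip(forecast, actual))
    return errors


def select_strategy(strategies, train, holdout):
    """Run strategies concurrently; return the name with the lowest overall MAE."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(pooled_errors, fn, train, holdout)
                   for name, fn in strategies.items()}
        overall_mae = {name: mean(f.result()) for name, f in futures.items()}
    return min(overall_mae, key=overall_mae.get), overall_mae


train = {"a": [10, 12, 14, 16], "b": [3, 9, 3, 9]}
holdout = {"a": [18, 20], "b": [3, 9]}
best, scores = select_strategy({"naive": naive, "average": average}, train, holdout)
print(best, scores)
```

Note that the winner is chosen on the pooled errors of all series, not series by series, which mirrors the overall-accuracy comparison described above.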

Real-time image analytics for self-driving with Maggie Du

Udo: What sparked your interest in this enterprise?

Maggie: Autonomous driving is a big trend nowadays, and computer vision is an essential part of it. In particular, image segmentation plays an important role in identifying what objects are in front of the vehicle and where they are. A self-driving car must segment these objects so it knows how to follow the road and avoid pedestrians and other vehicles. I believe it’s a good idea to construct this pipeline with SAS deep learning.
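To make "what objects are in front of the vehicle and where they are" concrete: a semantic segmentation model outputs a score for every class at every pixel, and the predicted mask is the per-pixel argmax. The class list and the tiny 2×3 score grid below are made-up illustrations, not output from the SAS model.

```python
CLASSES = ["road", "car", "pedestrian"]  # hypothetical label set


def scores_to_mask(scores):
    """Pick the highest-scoring class index at each pixel (per-pixel argmax)."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


def pixel_counts(mask):
    """Count how many pixels were assigned to each class."""
    counts = {name: 0 for name in CLASSES}
    for row in mask:
        for label in row:
            counts[CLASSES[label]] += 1
    return counts


# scores[row][col] holds one score per class for that pixel.
scores = [
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.1, 0.1, 0.8]],
    [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.3, 0.6, 0.1]],
]
mask = scores_to_mask(scores)
print(mask, pixel_counts(mask))
```

The mask answers both questions at once: the label says what is at a pixel, and the pixel's row and column say where it is.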

Udo: Did you encounter any road bumps?

Maggie: This application is meant for vehicles with limited computing capabilities, yet it requires real-time image analysis so the car can react immediately. To make it faster, I chose a lightweight semantic segmentation model designed specifically for low-latency tasks and deployed it to edge devices. However, the model’s original output is in data table format, and converting those tables to images became the bottleneck. I added a new feature that supports output in base64 format, which greatly expedited the image conversion.
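The benefit of base64 output can be sketched with the Python standard library. Instead of emitting one table row per pixel and reassembling an image from those rows, the whole mask is packed into bytes and stored as a single base64 string that decodes straight back into image data. The one-byte-per-pixel layout and the helper names are assumptions for illustration; the actual SAS feature may encode a standard image format instead.

```python
import base64


def mask_to_rows(mask):
    """Table-style output: one (row, col, label) record per pixel."""
    return [(r, c, label) for r, row in enumerate(mask) for c, label in enumerate(row)]


def mask_to_base64(mask):
    """Pack the mask as one byte per pixel and encode it as one base64 string."""
    flat = bytes(label for row in mask for label in row)  # labels must be 0-255
    return base64.b64encode(flat).decode("ascii")


def base64_to_mask(encoded, width):
    """Decode the string and split the bytes back into rows of the given width."""
    flat = base64.b64decode(encoded)
    return [list(flat[i:i + width]) for i in range(0, len(flat), width)]


mask = [[0, 1, 2], [0, 0, 1]]
rows = mask_to_rows(mask)        # 6 records, one per pixel
encoded = mask_to_base64(mask)   # a single string for the whole image
restored = base64_to_mask(encoded, width=3)
print(len(rows), encoded, restored)
```

Reconstructing the image becomes one decode and reshape instead of a pass over width × height table rows, which is plausibly where the speedup comes from.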

Udo: How is your project breaking new ground?

Maggie: This project shows how to achieve real-time road scene segmentation, a vital part of self-driving capabilities. It uses a lightweight model that is trained in the cloud and then deployed to edge devices. The edge device can then capture images (with an attached camera), analyze them, and take action according to road conditions, all in real time! In addition, this example can easily be extended to other fields, such as precision agriculture and video segmentation. This capability is now available in SAS Visual Data Mining and Machine Learning.

Celebrate World Creativity & Innovation Week: to be continued

More innovators will be featured in our next post to celebrate World Creativity & Innovation Week. Udo will talk to some developers working on computer vision, machine learning, and language identification.
