Note from Udo Sglavo: As we continue with our celebration of World Creativity & Innovation Week, we want to continue sharing some of our recent innovations from Analytics R&D at SAS. Previously we looked at Bayesian network meta-analysis, flexible pipelines for time-series forecasting, and real-time image analytics for self-driving. I'm thrilled to continue my discussions with Chris Barefoot, Manager Analytics R&D, Matthew Galati, Distinguished Operations Research Specialist, Courtney Ambrozic, Sr Associate Staff Scientist, and Davood Hajinezhad, Machine Learning Developer.
Udo: What sparked your interest to develop this innovative product?
Chris: The inspiration for SAS Conversation Designer came about very organically. One of our initial charters (as the Cognitive Computing group at SAS) was to make natural language interaction a priority for SAS products. SAS has always empowered our users to find answers from within their data. Therefore, it was really the next logical progression to give our users the ability to connect natural language questions to answers via chatbots. Now even those who aren't technical can ask simple questions as if they were talking to another human.
Udo: Were there any major challenges to deal with?
Chris: A challenge we constantly face is striking the balance between making a product easy to use for non-technical users and allowing our expert users the power and customization we know they’ll want. To try and solve this, we’ve stuck to the principle of least astonishment. Our users should never be surprised by how something behaves. However, for our power users, many of the product elements have a deeper layer that users can interact with if they take the time to understand them.
Udo: Why would this be considered the leading edge of this technology?
Chris: One of our primary goals when creating SAS Conversation Designer was to allow someone with no knowledge of Linguistics or Natural Language Processing to build a chatbot. On top of that, we wanted the product to work well from very small data sets and easily connect back to the SAS ecosystem. It’s the combination of these elements that positions us in a unique spot compared to some of our competitors and, we feel, makes for a compelling package.
Udo: What inspired you to pursue this particular project?
Matthew: In today’s world, connected data is everywhere. Being able to efficiently query connected data is an increasingly important issue as data collections grow.
Despite their name, relational database systems are not designed to model relationships between data entities. This is typically accomplished using joins. Joining tables on keys is slow and compute-intensive. Recursive joins are even worse and quickly become unmanageable.
Graph data modeling and graph search represent the relationships between data entities directly, which allows for blazingly fast queries.
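To illustrate the contrast with recursive joins, here is a minimal sketch in Python. It assumes a hypothetical toy data set of linked accounts (the names and the `within_hops` helper are invented for illustration and are not part of any SAS product). In a relational table, each additional hop would typically need another self-join; with the relationships stored as an adjacency structure, a single breadth-first traversal answers the same question.

```python
from collections import deque

# Hypothetical toy data: each account maps to the accounts it links to directly.
LINKS = {
    "acct_A": ["acct_B", "acct_C"],
    "acct_B": ["acct_D"],
    "acct_C": ["acct_D"],
    "acct_D": ["acct_E"],
    "acct_E": [],
}


def within_hops(graph, start, max_hops):
    """Return {node: hop distance} for nodes reachable from start in at most max_hops steps."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in distance:  # first visit gives the shortest hop count
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


result = within_hops(LINKS, "acct_A", 2)
print(result)
```

Each edge is examined at most once, so the cost grows with the part of the graph actually reached rather than with the size of the full tables being joined.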
This project started out supporting SAS consultants working in Fraud. My research led me to study the quickly evolving market of graph databases. We immediately realized that graph search had applications in a broad range of industries. This includes genomics, cybersecurity, public health, and criminal investigation.
Along the way, we learned a great deal about the mathematical theory that underpinned the proper topological search of a graph and made this our focus as a differentiator.
Udo: What were some challenges you ran into and how did you overcome them?
Matthew: Throughout the general analytics space, there is an aversion to approaching theoretically difficult problems (those in the complexity class NP-hard). These problems are considered impossible to solve exactly and efficiently. Graph search falls under this category, as it is a generalization of subgraph isomorphism – an important but notoriously hard problem to solve. The challenge of tackling these types of problems is to find the proper balance between generality and specificity. For certain data characteristics, some of these problems can be solved quite efficiently. The key is to build a robust algorithm that can take advantage of these specific cases in an automated way, while still providing a generic framework to solve practically any graph search problem.
Coming from a background in mathematical optimization (specifically, integer programming), I deal with the impossible on a regular basis. I enjoy the challenge!
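The subgraph matching problem mentioned above can be sketched with a naive backtracking search. This is an illustrative textbook-style approach, not the algorithm used in SAS Viya; the function name `find_matches` and the toy graphs are assumptions. The sketch finds every assignment of pattern nodes to distinct data nodes such that each pattern edge is also present in the data graph (extra edges in the data graph are allowed).

```python
def find_matches(pattern, data):
    """Yield mappings {pattern node: data node} that preserve every pattern edge.

    Both graphs are dicts mapping a node to the set of its direct successors.
    """
    pattern_nodes = list(pattern)

    def consistent(mapping, p_node, d_node):
        # A self-loop in the pattern needs a self-loop in the data graph.
        if p_node in pattern[p_node] and d_node not in data[d_node]:
            return False
        # Check edges between p_node and every pattern node mapped so far.
        for other_p, other_d in mapping.items():
            if other_p in pattern[p_node] and other_d not in data[d_node]:
                return False
            if p_node in pattern[other_p] and d_node not in data[other_d]:
                return False
        return True

    def extend(mapping):
        if len(mapping) == len(pattern_nodes):
            yield dict(mapping)  # every pattern node is assigned
            return
        p_node = pattern_nodes[len(mapping)]
        used = set(mapping.values())
        for d_node in data:
            if d_node not in used and consistent(mapping, p_node, d_node):
                mapping[p_node] = d_node
                yield from extend(mapping)
                del mapping[p_node]  # backtrack

    yield from extend({})


# Hypothetical query: a directed triangle x -> y -> z -> x.
triangle = {"x": {"y"}, "y": {"z"}, "z": {"x"}}
network = {"a": {"b"}, "b": {"c"}, "c": {"a", "d"}, "d": {"a"}}
matches = list(find_matches(triangle, network))
print(len(matches), matches)
```

In the worst case this search tries a number of assignments that grows exponentially with the pattern size, which is why exploiting the structure of specific queries and data matters so much in practice.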
Udo: Why do you consider your project the cutting-edge in the industry?
Matthew: Most of the research in graph search focuses on schemas that improve path-based searches, and relies heavily on indexing schemas for improving performance. Unfortunately, this has two deficiencies. First, it performs poorly for queries that are not modeled as simple paths through the graph. Second, it often requires user expertise to tune the indexing schema for a specific use case. In SAS Viya’s network analytics toolkit we took a different view of graph search, focusing more on cutting-edge approaches to topological search rather than simple path-based search. That is, we optimize the query matching algorithms based on the structure of the relationships. This yields a framework that is both flexible and performant. You will find this capability in Network Analytics in SAS Visual Data Mining and Machine Learning.
Udo: What inspired you to focus on this technology?
Courtney: Artificial intelligence and machine learning are exciting fields that are constantly evolving. My innovation applies these front-line advancements to clinical data to gain insights and to provide personalized treatment for patients suffering from colorectal liver metastases. This combination alone makes for an extremely motivating and rewarding project to work on.
The main product that I use to guide my work is SAS Visual Data Mining and Machine Learning. This powerful toolset allows me to build end-to-end pipelines to perform automatic segmentation on medical images using deep learning models. In particular, I used the bioMedImage action set to load and preprocess 3-D biomedical image data. The deepLearn action set helped me build models, train the models, and score test sets for image segmentation. After the images were scored, morphological insights could be gathered about the segmented regions. These important biomarkers can assist in developing treatment strategies for cancer. It’s great to know that the work I do at my desk can have a resounding impact on the care of cancer patients everywhere.
Udo: Did you have to overcome any challenges?
Courtney: Most of my challenges on this project stem from the fact that I don’t have a background in medicine. Without this experience, I faced barriers understanding how to work with the data and derive insights that would be the most useful to medical professionals. After spending time working closely with our partners and researching the field, I gained an understanding of how to process the image data and guide the solution to derive clinically relevant biomarkers. As this project continues to grow, I’m sure it will challenge me to keep learning.
Udo: How is your project breaking new ground?
Courtney: 3D medical image morphometry is the intersection of artificial intelligence and clinical science. While this combination itself isn’t necessarily a rarity, bringing computer vision-assisted decisions directly to the clinic is a novelty in the field. We’re working to break down the barriers to the true adoption of AI within clinical and operational workflows. This project is on its way to helping millions of patients across the globe.
Udo: What prompted you to delve into this particular area?
Davood: When I was a grad student, my focus was optimization algorithms for modern big data problems in supervised and unsupervised learning. In fact, most of my work was on the theoretical side of the proposed methods, such as convergence guarantees and rates of convergence. But when I joined Duke University for a postdoc, I decided to switch gears. I became interested in a new branch of machine learning called Reinforcement Learning (RL).
Unlike in supervised learning, the learning agent does not have a training data set as a supervisor. Instead, it must learn on the fly, based on its own experience and through trial and error. During training, the learning agent takes actions across a state space and improves its decision process through a reward structure. This idea was very interesting to me since it is similar to how humans learn, which makes it a natural fit as an Artificial Intelligence technique. At that time, I found that this area was really in its infancy, and there was a lot of room to investigate. Because of this, I decided to use my previous expertise and experience to explore the RL area.
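The trial-and-error loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: it uses tabular Q-learning, one classical RL method chosen here for brevity, on a hypothetical five-state corridor where the agent earns a reward of 1 for reaching the rightmost state. The environment, hyperparameters, and function names are all invented for the example.

```python
import random

N_STATES = 5      # states 0..4; state 4 is the goal (terminal)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate


def step(state, action):
    """Toy environment: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q_values, state, rng):
    """Epsilon-greedy choice with random tie-breaking."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)  # explore
    best = max(q_values[state])
    return rng.choice([a for a in ACTIONS if q_values[state][a] == best])


def train(episodes=500, max_steps=200, seed=0):
    rng = random.Random(seed)
    q_values = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = choose_action(q_values, state, rng)
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward plus discounted best future value.
            target = reward if done else reward + GAMMA * max(q_values[next_state])
            q_values[state][action] += ALPHA * (target - q_values[state][action])
            state = next_state
            if done:
                break
    return q_values


q_table = train()
# Greedy action per non-terminal state after training.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

No labeled training set appears anywhere: the agent only sees the rewards that its own actions produce, and its value estimates improve as it gathers experience.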
Udo: What kind of obstacles did you have to overcome?
Davood: As I said, RL was in its infancy at that time, and in fact it still is. So there were various challenges along the way, from both theory and application perspectives. To be a bit more specific, let's talk about our recent product at SAS, the Deep Q-Network (DQN) algorithm.
This algorithm actually combines Deep Learning with the classical Q-learning method. One of the main components of online RL algorithms such as DQN is the environment. Typically, this is the set of states that are influenced by the actions taken by the agent. One of the first challenges we faced here at SAS was how to connect the RL agent to the environment, since the environment is usually provided by the customer. Notice that the RL agent needs a lot of interaction with the environment for learning, so this communication needs to be as quick as possible in order to have an efficient algorithm.
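For readers who want the formulas behind this combination, the standard textbook forms are given below. They are not confirmed details of the SAS implementation. The symbols are the usual ones: learning rate \alpha, discount factor \gamma, reward r, next state s', network parameters \theta, and periodically updated target-network parameters \theta^{-} as in the originally published DQN algorithm.

```latex
% Classical (tabular) Q-learning update after observing (s, a, r, s'):
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]

% DQN replaces the table with a neural network Q(s, a; \theta) trained to minimize:
L(\theta) = \mathbb{E}_{(s, a, r, s')} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right]
```

Every transition (s, a, r, s') in these expressions comes from an interaction with the environment, which is why the speed of that interaction matters.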
After a thorough search, the team found gRPC, which I think stands for Google Remote Procedure Calls. This communication protocol is an open-source, general-purpose framework, so we used it to communicate with a remote environment provided by the customer. In addition, to accelerate DQN, we implemented a parallel version of the algorithm. The parallel implementation, which runs in a multi-worker environment, relies heavily on the Cloud Analytic Services (CAS) architecture. This high-performance distributed computing environment lies at the heart of our analytics and machine learning, allowing customers to use our Reinforcement Learning methods in seamless integration with the rest of SAS Visual Data Mining and Machine Learning.
Udo: Why would this be considered innovative in your area?
Davood: As the industry grows, the problem sizes in this area grow too. This means we need a generalizable model that is able to handle unseen scenarios. This is actually what RL does. Also, the real world is dynamic, which means that current decisions made by your model have an impact on the world around it. So we need to design an adaptive system, which is another property of RL, because RL models do not need much previous data or knowledge to make efficient decisions.
Did you miss part one of our celebration of World Creativity & Innovation Week?
We kicked off our celebration of World Creativity & Innovation Week with Udo Sglavo talking to Amy Shi, Maggie Du, and Phil Helmkamp about their innovations. If you missed it, check out part one of Celebrate World Creativity & Innovation Week with Analytics R&D.