A note from Udo Sglavo: World Creativity & Innovation Week 2022 is coming to an end. Let's continue our series on recent innovations and collaborations from Analytics R&D. Let me draw your attention to two customer-facing engagements and two continuous-improvement projects. I am pleased to discuss these innovations with Hardi Desai, Sr Associate Machine Learning Developer; Steven Harenberg, Sr Machine Learning Developer; Hamza Ghadyali, Machine Learning Developer; and Billy Dickerson, Principal Software Development Engineer in Test. And by the way, if you would like to join our division and work with our outstanding team, we are hiring!
Data for good: using analytics to build safe workplaces
Udo: Can you share some background for your team's project?
Hardi: For the safety of their workers, a leading manufacturing customer wanted a resilient and scalable solution for detecting intruders in prohibited spaces within their plant. Our SAS team developed a computer vision solution using pre-trained ONNX CV models running on SAS Event Stream Processing. The solution sends automated violation alerts when it detects people in prohibited spaces in the live camera feeds from the plant. It is built on a Kubernetes cluster with a Kafka bus set up in the customer's facility.
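To make the data flow concrete, here is a minimal sketch of the scoring path in Python: frames arrive on a Kafka topic and are scored with a pre-trained ONNX detector. The topic name, model file, output layout, and zone check are assumptions for illustration; in the actual deployment this logic runs inside SAS Event Stream Processing.

```python
# Hedged sketch: consume camera frames from Kafka and score them with a
# pre-trained ONNX person detector. Topic, model file, output layout, and
# the prohibited-zone check are hypothetical.
import json

import cv2
import numpy as np
import onnxruntime as ort
from confluent_kafka import Consumer

session = ort.InferenceSession("person_detector.onnx")  # hypothetical model file
input_name = session.get_inputs()[0].name

def person_in_prohibited_zone(dets, zone=(100, 100, 500, 400)):
    # Hypothetical post-processing: rows assumed to be
    # [x1, y1, x2, y2, score, class], with class 0 = person; flag any
    # confident person box whose center falls inside the prohibited zone.
    for x1, y1, x2, y2, score, cls in dets:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        if cls == 0 and score > 0.5 and zone[0] <= cx <= zone[2] and zone[1] <= cy <= zone[3]:
            return True
    return False

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",   # hypothetical broker address
    "group.id": "intruder-detection",
    "auto.offset.reset": "latest",
})
consumer.subscribe(["camera-frames"])    # hypothetical topic of JPEG frames

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    # Decode the JPEG payload and shape it into the model's NCHW input.
    frame = cv2.imdecode(np.frombuffer(msg.value(), np.uint8), cv2.IMREAD_COLOR)
    blob = cv2.resize(frame, (640, 640)).transpose(2, 0, 1)[None].astype(np.float32) / 255.0
    detections = session.run(None, {input_name: blob})[0][0]
    if person_in_prohibited_zone(detections):
        cam = msg.key().decode() if msg.key() else "unknown"
        print(json.dumps({"camera": cam, "event": "violation"}))
```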
Udo: What is the advantage of building this solution using Kubernetes?
Hardi: We used Kubernetes because of its support for deploying, scaling, and managing containers. In this solution, each piece of functionality runs in a separate Kubernetes pod. Kubernetes balances resources across containers and nodes and manages traffic across multiple services. It also supports self-healing of individual containers: it restarts containers that fail and terminates and replaces containers that don't respond to your user-defined health check.
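As a hedged illustration of that user-defined health check, the sketch below expresses a liveness probe with the official Kubernetes Python client; the image name, port, and /healthz endpoint are assumptions.

```python
# Sketch of the self-healing behavior: a liveness probe tells Kubernetes to
# restart the container whenever the health check fails. Image name, port,
# and /healthz path are hypothetical.
from kubernetes import client

scoring_container = client.V1Container(
    name="esp-scoring",
    image="registry.local/esp-scoring:latest",   # hypothetical image
    ports=[client.V1ContainerPort(container_port=8080)],
    liveness_probe=client.V1Probe(
        http_get=client.V1HTTPGetAction(path="/healthz", port=8080),
        initial_delay_seconds=15,  # give the model time to load before probing
        period_seconds=10,         # probe every 10 seconds
        failure_threshold=3,       # restart after 3 consecutive failures
    ),
)

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="esp-scoring", labels={"app": "esp-scoring"}),
    spec=client.V1PodSpec(containers=[scoring_container], restart_policy="Always"),
)

# With a loaded kubeconfig, the pod could be created like this:
# client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```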
Udo: Did you run into any issues?
Hardi: As part of the implementation, we ran into technical and non-technical issues. We developed the entire solution at the customer's location, so we had to depend on their IT team to debug infrastructure issues. And because we were working with cross-division developers in different time zones, we had only a limited time frame in which everyone could meet to discuss progress, debug issues, and run integration tests with the customer.
Udo: How is this a resilient and scalable solution?
Hardi: It is a highly scalable solution: we were able to develop, integrate, and test a five-camera deployment in two weeks. And because each piece of functionality is independent and runs in its own Kubernetes pod, it is easy to add new features. Combined with Kubernetes and Kafka, SAS Event Stream Processing yields a highly resilient solution because we can retain the images streaming from the cameras for up to three days. The implementation also accounts for individual camera failure: one camera feed going down does not affect the rest of the solution running in production.
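The three-day buffer corresponds to topic-level retention in Kafka. Here is one way that might be configured with the confluent-kafka admin client; the broker address, topic name, and partition-per-camera layout are assumptions.

```python
# Hedged sketch of the buffering that makes the pipeline resilient: a Kafka
# topic retaining frames for three days, so a downstream outage loses no data.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "kafka:9092"})  # hypothetical broker

three_days_ms = 3 * 24 * 60 * 60 * 1000  # 259,200,000 ms
topic = NewTopic(
    "camera-frames",                 # hypothetical topic name
    num_partitions=5,                # e.g., one partition per camera feed
    replication_factor=3,            # survive the loss of a broker
    config={"retention.ms": str(three_days_ms)},
)

for name, future in admin.create_topics([topic]).items():
    future.result()  # raises if creation failed
    print(f"created topic {name}")
```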
Udo: Would you define this as an innovative solution?
Hardi: Absolutely it is. We built a resilient, scalable, and efficient computer vision deployment architecture that supports the complete analytical life cycle. Before this solution, the customer struggled to process multiple camera feeds with a single GPU. With this new implementation, they were able to score multiple camera feeds (in our case, five) on a single GPU. With SAS Event Stream Processing, we use the GPU for deep learning model inference, and SAS Event Stream Manager manages the deployment of models on the edge. Fault tolerance in this architecture is achieved by combining Kafka, SAS Event Stream Processing, and Kubernetes.
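As a rough illustration of how one GPU can serve several feeds, the sketch below batches frames from five cameras into a single ONNX Runtime call pinned to the CUDA execution provider. The model file and input shape are assumptions, and the actual solution does this inside SAS Event Stream Processing.

```python
# Hedged sketch of multi-camera scoring on one GPU: frames from several feeds
# are stacked into one batch and scored by a single ONNX Runtime session.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "person_detector.onnx",  # hypothetical model file
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],  # prefer GPU
)
input_name = session.get_inputs()[0].name

def score_cameras(frames_by_camera):
    """frames_by_camera: dict mapping camera id -> preprocessed HxWxC frame."""
    cams = list(frames_by_camera)
    # Stack all feeds into one NCHW batch so a single GPU call scores them all.
    batch = np.stack(
        [frames_by_camera[c].transpose(2, 0, 1) for c in cams]
    ).astype(np.float32)
    outputs = session.run(None, {input_name: batch})[0]
    return dict(zip(cams, outputs))

# Example: five cameras, each producing a 640x640 RGB frame.
frames = {f"cam-{i}": np.random.rand(640, 640, 3) for i in range(5)}
results = score_cameras(frames)
```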
Taming the data tsunami: using analytics for better data preparation
Udo: Tell me what Entity Resolution does…
Steven: Entity resolution is an essential data management task: determining which records refer to the same entity. An entity could be a person, an address, an organization, and so on. Entity resolution is helpful in use cases such as customer intelligence, fraud and compliance, and criminal investigation. In these scenarios, the multiple large-scale data sources often have imperfections, such as (possibly intentional) spelling variations, missing data, and abbreviations. Entity resolution provides a consolidated, cleansed view of the data, enabling you to confidently analyze data sets that would otherwise be ambiguous and redundant.
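As a toy illustration of the matching problem (not the SAS implementation), the snippet below fuzzy-matches records on name and groups likely duplicates into one entity:

```python
# Toy entity resolution: fuzzy-match records on name and greedily group
# likely duplicates. Data and threshold are illustrative.
from difflib import SequenceMatcher

records = [
    {"id": 1, "name": "Jon Smith",  "city": "Raleigh"},
    {"id": 2, "name": "John Smith", "city": "Raleigh"},  # spelling variation
    {"id": 3, "name": "J. Smith",   "city": "Raleigh"},  # abbreviation
    {"id": 4, "name": "Jane Doe",   "city": "Durham"},
]

def similar(a, b, threshold=0.7):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

# Greedy clustering: assign each record to the first entity it matches.
entities = []
for rec in records:
    for entity in entities:
        if any(similar(rec["name"], other["name"]) for other in entity):
            entity.append(rec)
            break
    else:
        entities.append([rec])

for i, entity in enumerate(entities):
    print(f"entity {i}: {[r['id'] for r in entity]}")
# entity 0: [1, 2, 3]   entity 1: [4]
```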
Udo: How did the collaboration work within your Network team?
Steven: Being part of the network team is a great privilege for me, as we have many experienced and talented members. One problem we tackled as a team was how to split an entity that has pulled in conflicting data we know should belong to separate entities (for example, conflicting biometric data). We worked together to devise a pipeline that relies on network optimization techniques: it cuts an entity into multiple disjoint partitions, based on the record linkage within the entity, such that the conflicting data ends up in separate partitions. This pipeline has become a valuable feature of the rteng action set and has been continually improved to give you more ways to tune your entity resolution results.
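A simplified analogue of that splitting step (the rteng internals are SAS's own network optimization, not shown here) is a minimum s-t cut: records are nodes, linkage strengths are edge capacities, and the cut separates two records known to conflict.

```python
# Simplified analogue of entity splitting via a minimum s-t cut. Records and
# linkage strengths are illustrative; this is not the rteng implementation.
import networkx as nx

G = nx.DiGraph()

def link(u, v, strength):
    # Undirected linkage modeled as two directed arcs of equal capacity.
    G.add_edge(u, v, capacity=strength)
    G.add_edge(v, u, capacity=strength)

link("r1", "r2", 0.9)
link("r2", "r3", 0.2)   # weak link bridging the conflict
link("r3", "r4", 0.8)
link("r1", "r3", 0.1)

# Suppose r1 and r4 carry conflicting biometrics and must not share an entity.
cut_value, (part_a, part_b) = nx.minimum_cut(G, "r1", "r4")

print(f"severed linkage weight: {cut_value:.1f}")  # 0.3
print(f"entity A: {sorted(part_a)}")               # ['r1', 'r2']
print(f"entity B: {sorted(part_b)}")               # ['r3', 'r4']
```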
Udo: Why is this important?
Steven: Performance is always an essential consideration for us because the data sets in these domains can be large, consisting of billions of rows. We developed the core algorithm of the rteng action set to efficiently distribute computation across multiple workers on an MPP CAS cluster, and we are continually working to improve its performance.
Key differentiators of the rteng action set include (a hedged usage sketch follows the list):
- processing data directly in CAS tables without first loading it into memory
- multi-user add/upsert/delete/query actions
- rollback of any changes when an unexpected error occurs
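As a hedged sketch of what driving this from Python could look like via SWAT (the CAS client): loadActionSet is a standard builtins action, but the rteng action and parameter names below are placeholders, not the documented API.

```python
# Hedged sketch only: host, port, file, and the commented action call are
# hypothetical; consult the rteng documentation for the real actions.
import swat

conn = swat.CAS("cas-controller.example.com", 5570)  # hypothetical MPP CAS host
conn.loadactionset("rteng")

# Records land in a CAS table; the action set processes them in place on the
# cluster rather than pulling them into client memory.
conn.upload_file("records.csv", casout={"name": "records", "caslib": "casuser"})

# Illustrative placeholder call -- not a documented rteng action name:
# result = conn.rteng.resolveEntities(table={"name": "records"})

conn.close()
```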
Finding the needle in a haystack: using analytics for the development of new drugs
Udo: What made the drug repurposing project stand out from other projects?
Hamza: You can look at this project as solving a "needle in a haystack" problem. We model complex biological relationships between drugs, diseases, genes, proteins, and other biomedical entities. This lets us take the universe of 10,000+ drugs and reduce it to a more manageable list of 20 drug candidates for treating a particular disease. Subject matter experts then review the candidate drugs, which is how we keep humans involved in this composite AI solution. Our solution makes no automated decisions.
So, our project with a federal organization is an excellent example of subject matter experts and analytics experts at SAS working together to develop a new solution. It was a creative effort in how we leveraged multiple public and private data sets and multiple analytic techniques to develop a composite AI solution in SAS that keeps humans in the loop. We provided a custom dashboard, built with SAS Visual Analytics, for interactively exploring the therapeutic candidates proposed by our solution.
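The actual solution combines multiple data sets and techniques; as a toy analogue of the haystack reduction, the sketch below ranks drugs by personalized PageRank from a disease node in a small, made-up biomedical knowledge graph:

```python
# Toy analogue (not the SAS pipeline): rank drug candidates by proximity to a
# disease in a knowledge graph. All nodes and edges are illustrative.
import networkx as nx

kg = nx.Graph()
# drug -- targets --> gene -- associated with --> disease
kg.add_edges_from([
    ("drugA", "geneX"), ("drugB", "geneX"), ("drugB", "geneY"),
    ("drugC", "geneZ"), ("geneX", "diseaseD"), ("geneY", "diseaseD"),
    ("geneZ", "diseaseE"),
])

# Random walks restarted at the disease concentrate score on nearby drugs.
scores = nx.pagerank(kg, personalization={"diseaseD": 1.0})

candidates = sorted(
    (n for n in kg if n.startswith("drug")),
    key=lambda n: scores[n],
    reverse=True,
)
for drug in candidates:
    print(drug, round(scores[drug], 3))
# drugB ranks highest: it reaches diseaseD through two genes.
```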
Udo: Why is drug repurposing important, and what are the benefits?
Hamza: Drug repurposing refers to taking a drug that is already approved by the FDA for one medical need and using it to fill a currently unmet medical need. One study estimated that bringing a single drug from lab to market costs over $2 billion and takes more than a decade. Such prohibitively high costs make it infeasible for pharma companies to develop cures for rare diseases, and it is impractical to wait ten years to treat new illnesses, such as those emerging in a pandemic caused by a novel virus.
Hence drug repurposing is indispensable: many of the steps involved in designing a drug from scratch are bypassed. The drug has already gone through extensive safety reviews, and a safe manufacturing process has likely already been established. Repurposed drugs still need to go through clinical trials for their new proposed use, but you save time by skipping the basic science research and preclinical testing, since that work has already been done. This allows drug repurposing to bring treatments for rare and emerging diseases to market more rapidly.
So, in a nutshell, the benefits of drug repurposing are shorter time to market, lower cost and risk, and the discovery of therapeutics for rare or new diseases.
Udo: Did you face any challenges?
Hamza: One of the most challenging parts was the translation needed between the subject matter experts (virologists, biologists, and so on) and SAS to match the right technology and modeling technique to each use case. We had to work side by side on all aspects of the project.
Drinking our own champagne: CI/CD process improvements using analytics
Udo: What is Project "Continuous Champagne," and why is it essential to SAS?
Billy: As SAS continues its DevOps journey, we continuously adapt our pipeline processes to deliver and verify our code changes faster. The earlier we know there's a problem with our software, the faster we can remediate the issue. This means quicker delivery to our customers and to market while maintaining high customer satisfaction. We must have an efficient feedback loop to achieve Quality at Speed.
Since data and analytics are core to the DNA of SAS, we have started a project code-named Project Continuous Champagne, which focuses on the Measure slice of the CI/CD feedback loop.
The project's overall goal is to continuously collect, measure, and analyze our CI/CD processes using our own SAS product stack, moving us toward continuous improvement.
Another key takeaway is that by using our software as our customers would, we can open feature requests with our development and test teams for possible enhancements and bug fixes. Ultimately, we want to use our software to find solutions to our own problems.
Udo: What is the motivation for this project?
Billy: This project is motivating and impactful for several reasons. First, the metrics collected can help drive better decisions about delivering our software to market faster. Second, the project involves automation: you can develop code to collect and analyze the metrics. Finally, it involves data modeling so that metrics from multiple sources can be joined together.
Udo: What are the challenges?
Billy: The challenges of this project are managing the quantity of data and deciding which metrics to collect to help shorten the CI/CD feedback loop. The initial charter of the project focuses on the following standard DevOps metrics (a sketch of computing two of them follows the list):
- Change Volume and Velocity
- Quality and Health
- Performance and Stability
- Cost
- Auditing
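As a hedged sketch of the Measure slice, here is how two of these metrics (change volume/velocity and quality/health) might be computed from pipeline records; the schema is hypothetical, and real data would be joined from CI/CD systems across the organization.

```python
# Hedged sketch: compute change velocity, change volume, and change failure
# rate from deployment records. The schema and values are hypothetical.
import pandas as pd

changes = pd.DataFrame({
    "change_id":   ["c1", "c2", "c3", "c4"],
    "merged_at":   pd.to_datetime(["2022-04-01", "2022-04-02", "2022-04-04", "2022-04-05"]),
    "deployed_at": pd.to_datetime(["2022-04-02", "2022-04-04", "2022-04-05", "2022-04-08"]),
    "failed":      [False, True, False, False],
})

# Velocity: median lead time from merge to deployment.
lead_time = (changes["deployed_at"] - changes["merged_at"]).median()

# Volume: changes deployed per week.
volume = changes.set_index("deployed_at").resample("W")["change_id"].count()

# Quality/health: fraction of deployed changes that failed.
failure_rate = changes["failed"].mean()

print(f"median lead time: {lead_time}")
print(f"change failure rate: {failure_rate:.0%}")
print(volume)
```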
However, some of these metrics can be challenging to collect without collaboration from other involved parties across the organization. Even though this is challenging, it is also a positive, because teamwork is at the core of SAS DevOps and the continuous ecosystem.