SAS Tech Support recently developed an AI-driven email classification system using SAS Viya's textClassifier, paving the way for a more efficient future in customer communication. Rigorous testing achieved roughly 96% validation accuracy in distinguishing between legitimate customer queries, spam and misdirected emails. Key achievements during the development phase include efficient processing, near-perfect identification of legitimate customer emails (<0.2% misclassified as misdirected mail), remarkably fast model training using GPU acceleration and successful validation on data from the ServiceNow/CSM platform. This model is poised to significantly improve email handling efficiency upon deployment.
Introduction
At SAS Tech Support, efficient handling of customer communication is paramount. However, we face an overwhelming influx of emails—many spam or misdirected to Scandinavian Airlines System (SAS)—which diverts our agents from addressing genuine customer concerns. To address this challenge, we developed an AI-driven email classification system using SAS Viya's advanced textClassifier. A key objective of this work was to build a model that accurately categorizes incoming emails into three groups: legitimate customer inquiries, spam and misdirected emails intended for Scandinavian Airlines System. This enables us to flag the misdirected 'SAS airlines' emails and the spam emails so that agents can take the appropriate action. This case study details the development of this transformer-based spam detection model.
Data Privacy and Security
Given the sensitivity of customer data and the increasing importance of responsible AI (as highlighted in Exploring generative AI's impact on ethics, data privacy and collaboration), data security was a primary concern. SAS Viya, deployed in a secure Azure cloud environment, provided the necessary protection and the scalability to process our extensive dataset (104,000+ emails) while complying with regulations like GDPR.
Data Collection and Preparation
The dataset comprised 18 months of Sirius v2 customer tracks, spanning from 2022 to July 2023. Content was extracted exclusively from the initial incoming email of each track, omitting values in the From and To fields. Email topics were divided into three categories: 'SAS airlines,' 'other' and 'spam.' With over 104,000 documents, training a large language model required careful planning. To balance costs and effectiveness, two samples were created—one with 20% and another with 30% of the original data—allowing for a comparison of model performance. Stratified sampling ensured the training data remained representative, particularly for the rare 'SAS airlines' cases.
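The stratified sampling step described above can be sketched in a few lines of Python. This is a minimal, standard-library-only illustration with toy data, not the project's actual sampling code; the category labels and the 30% fraction mirror those in the text.

```python
import random
from collections import defaultdict

def stratified_sample(rows, label_key, fraction, seed=42):
    """Draw a per-category random sample so each label keeps its original share."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    rng = random.Random(seed)
    sample = []
    for label, group in by_label.items():
        k = max(1, round(len(group) * fraction))  # keep at least one row per rare class
        sample.extend(rng.sample(group, k))
    return sample

# Toy data with the three categories used in this project
emails = (
    [{"topic": "other", "text": "..."}] * 70
    + [{"topic": "spam", "text": "..."}] * 25
    + [{"topic": "SAS airlines", "text": "..."}] * 5
)
subset_30 = stratified_sample(emails, "topic", 0.30)
```

Sampling each label group separately is what preserves the class proportions; a plain random sample of this size could easily miss the rare 'SAS airlines' class entirely.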

Methodology
Several text classification models were considered: a BERT-based approach using SAS Viya's textClassifier action, a SAS BOOLRULE classifier, and a topic modeling approach combined with machine learning. While BOOLRULE offered interpretability, its rule-based nature lacked the contextual understanding needed for the nuances of our email data. Topic modeling, though powerful, proved less efficient and scalable for our large dataset (104,000+ emails) due to the iterative nature of topic discovery and computationally intensive text parsing.
The BERT-based textClassifier model (Devlin et al., 2019) was ultimately selected for its superior contextual understanding, adaptability to large datasets, and efficiency. Its transformer-based architecture provided high-quality classification with minimal preprocessing and manual effort, making it the most suitable choice for this project.
Model Training
Leveraging the computational power of an NVIDIA A100 GPU proved crucial for efficient model training. The model, trained on a 30% subset of the data, achieved remarkable speed, completing in approximately 42 minutes—a significant improvement over initial estimations. Expecting a lengthy training process after work, I decided to take a walk, only to return and find that the model had already finished training! While this unexpected speed is positive, it underscores the need for proactive model saving, a critical lesson learned during a system shutdown that resulted in data loss and required retraining. Importantly, the model performed well without requiring traditional text preprocessing steps (such as stemming or stop word removal), highlighting the robustness of the transformer architecture. The iteration history and a detailed visualization of the training progress are provided in Appendix A. For a deeper dive into training considerations and memory management for the trainTextClassifier action, refer to the SAS Viya documentation. Training costs were also closely tracked throughout the process, allowing us to maintain cost-effectiveness while achieving the desired performance.
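The "save the model proactively" lesson above generalizes to any long-running training job: persist the artifact to disk the moment training completes, rather than leaving it in memory. A minimal Python sketch of that pattern, where the trainer callable and file names are hypothetical placeholders and not the actual SAS Viya workflow:

```python
import json
import time
from pathlib import Path

def train_and_persist(train_fn, out_path):
    """Run training and write the resulting artifact to disk immediately,
    so an unexpected shutdown cannot discard hours of GPU time."""
    start = time.time()
    model = train_fn()                      # stand-in for the real training call
    elapsed = time.time() - start
    artifact = {"model": model, "train_seconds": elapsed}
    tmp = Path(str(out_path) + ".tmp")
    tmp.write_text(json.dumps(artifact))    # write to a temp file first...
    tmp.rename(out_path)                    # ...then atomically move into place
    return artifact

artifact = train_and_persist(lambda: {"weights": [0.1, 0.2]}, Path("model.json"))
```

Writing to a temporary file and renaming it means a crash mid-write leaves any previous good copy intact, which is exactly the failure mode the shutdown incident exposed.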
Model Evaluation and Results
The model's performance exceeded expectations, particularly considering the single-shot training without hyperparameter tuning. While the models achieved overall misclassification rates of approximately 3.38% and 3.43% for the 30% and 20% data subsets respectively, the key performance indicator (KPI) was minimizing the misclassification of legitimate customer emails ('other') as either 'SAS airlines' or 'spam,' a crucial factor in maintaining efficient customer service. The misclassification rates for the 'other' category were exceptionally low: less than 0.2% for 'other' misclassified as 'SAS airlines' and around 1.5% for 'other' misclassified as 'spam,' for both models. (See Appendix B for the detailed model evaluation metrics.)
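The KPI described above, the share of 'other' emails assigned to each wrong class, falls directly out of a confusion matrix. A short illustrative sketch follows; the counts are made up to roughly match the reported rates and are not the project's actual figures.

```python
def misclassification_rates(confusion, true_label):
    """Fraction of `true_label` rows assigned to each of the other labels."""
    row = confusion[true_label]
    total = sum(row.values())
    return {pred: n / total for pred, n in row.items() if pred != true_label}

# Hypothetical counts: outer keys are true labels, inner keys predicted labels
confusion = {
    "other":        {"other": 6885, "SAS airlines": 10,  "spam": 105},
    "spam":         {"other": 40,   "SAS airlines": 5,   "spam": 2400},
    "SAS airlines": {"other": 8,    "SAS airlines": 520, "spam": 2},
}
rates = misclassification_rates(confusion, "other")
# rates["SAS airlines"] and rates["spam"] are the two KPI values
```

Reading the rates off a per-row normalization like this keeps the KPI independent of class imbalance, which matters here because 'other' dominates the data.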
Addressing Data Quality Issues and Model Robustness
While our model achieved low misclassification rates on the holdout data, we identified instances of mislabeling during our detailed analysis. For example, as shown in the accompanying image, some emails from the holdout data initially labeled as 'other' but containing references to flights or booking changes were correctly classified by the model as 'SAS airlines'. This raises the possibility that the original training data may have contained similar labeling errors and that with perfect training labels, the misclassification rates could have been lower. This highlights both the model's ability to overcome these data quality limitations and the potential for improving model performance through better data labeling.

Further assessment using data from a new source (the ServiceNow/CSM platform) reinforced these findings, confirming the model's high accuracy and ability to identify misclassified data, even with CPU processing.
Conclusion and Future Work
This project marked an important first step in developing a robust and scalable spam detection system for SAS Tech Support. By leveraging SAS Viya's BERT-based textClassifier action, we were able to efficiently process a large dataset while maintaining a high level of accuracy. Crucially, this system was developed while prioritizing data privacy and security, using SAS Viya deployed in a secure Azure environment. The model's ability to handle large datasets efficiently while achieving exceptionally low misclassification rates for legitimate customer emails ("other") has created clear potential for improving support operations.
Future efforts will focus on continuously improving model performance through regular data updates and user feedback. We are also exploring the potential of saving the trained model as an Astore, which would streamline its deployment to environments such as SAS Micro Analytic Service (MAS) or SAS Container Runtime (SCR), after registration in SAS Model Manager.
We are committed to sharing future updates as we move closer to full deployment and integration of this powerful classification solution.
Learn More
READ MORE | Read about another transformer-based classification use case
READ MORE | Learn how a similar approach can perform sentiment analysis
LEARN MORE | Explore various kinds of NLP with SAS
Appendix A: Model Training Log and Performance
This appendix summarizes the key metrics obtained during the training of the BERT-based text classification model. The training was conducted using an NVIDIA A100 GPU, completing in approximately 42 minutes in real time on a 30% subset of the data. The chart below visualizes the training progress:
- Rapid Convergence: The model achieved high validation accuracy (96%) within the first two epochs, demonstrating efficient learning, with optimal performance observed at epoch 2.
- Minimal Preprocessing: High accuracy was obtained without traditional text preprocessing steps, showcasing the robustness of the transformer architecture.
- Early Stopping Potential: The marginal improvement in validation accuracy beyond epoch 2, coupled with the fact that validation loss reached a minimum at epoch 1 and increased after that, indicates that training beyond the second epoch did not yield any significant performance benefits. It is important to note that the trainTextClassifier action, as of this writing, does not support the direct use of early stopping or step-based training. Should trainTextClassifier implement step-based training in the future, it would be wise to train for a few epochs at a time, weighing the performance benefits against the computational costs of additional epochs.
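Until the action supports early stopping natively, the epoch-selection logic described above can be approximated outside the training loop: record validation loss per epoch and keep the checkpoint with the minimum. A generic Python sketch, where the loss values are illustrative and not taken from the project's training log:

```python
def best_epoch(val_losses):
    """1-based epoch at which validation loss is minimal."""
    return 1 + min(range(len(val_losses)), key=val_losses.__getitem__)

def early_stop_point(val_losses, patience=1):
    """First 1-based epoch at which a patience-based rule would halt training:
    `patience` consecutive epochs without a new minimum validation loss."""
    best, since = float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, since = loss, 0
        else:
            since += 1
            if since >= patience:
                return epoch
    return None  # rule never triggered within the given history

# Illustrative validation-loss curve: minimum at epoch 1, rising afterwards
losses = [0.12, 0.14, 0.17, 0.21]
keep = best_epoch(losses)
halt = early_stop_point(losses)
```

With a curve shaped like the one observed here (loss minimal at epoch 1, accuracy plateauing by epoch 2), a patience of one epoch would have halted training after epoch 2, saving roughly half the GPU time.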
Appendix B: Key Model Evaluation Metrics
This appendix summarizes key evaluation metrics for the models trained on 20% and 30% of the data. The primary focus is on the misclassification rates of legitimate customer emails ('other') as either 'SAS airlines' or 'spam'.