In today’s digital world, we are constantly conversing through instant messages, emails and social media. This generates an enormous volume of unstructured data in the form of text, images, audio and video. Government organisations – whether in health care, law enforcement, transportation or defence – can all derive significant benefit from a deeper understanding of this data to improve policies, provide efficient services, prevent fraud and save lives.
Abundance and value of unstructured data
When it comes to analysing unstructured data, text analytics significantly improves our ability to scale the human acts of reading, organising and quantifying freeform text in meaningful ways. Text analytics is a form of artificial intelligence that relies on a variety of capabilities, including natural language processing (NLP), machine learning (ML) and human-generated linguistic rules.
Hybrid approach to text analytics
It’s a best practice to analyse unstructured text data with a combination of natural language processing, machine learning and human input. NLP draws from a variety of disciplines to bridge the gap between how humans communicate and how machines interpret that communication. Machine learning helps that analysis scale with ease, while human expertise provides guidance for accurate analysis.
Using NLP in a text analytics project typically involves:
- Splitting unstructured text into smaller, comprehensible units known as tokens.
- Parsing and applying linguistic rules to extract features such as root words and their variants, sentence boundaries, parts of speech and more.
- Identifying concepts and topics based on out-of-the-box rules. It is common practice to extend these rules and develop a business- or industry-specific taxonomy.
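The steps above can be sketched in a few lines of Python. This is an illustrative toy using only the standard library – the suffix rules standing in for root-word extraction are invented for the example, and a real project would rely on a proper NLP toolkit:

```python
import re

def tokenize(text):
    """Split unstructured text into tokens (words and sentence-ending marks)."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|[.!?]", text)

def sentences(text):
    """Apply a simple linguistic rule to find sentence boundaries."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

# Hand-written suffix rules standing in for real root-word extraction.
SUFFIXES = ("ing", "ised", "ises", "es", "s")

def root(word):
    """Strip a known suffix to approximate the root form of a word."""
    for suf in SUFFIXES:
        if word.lower().endswith(suf) and len(word) - len(suf) >= 3:
            return word.lower()[: -len(suf)]
    return word.lower()

text = "The agency analyses documents. Staff are reviewing findings."
print(tokenize(text))
print(sentences(text))
print([root(t) for t in tokenize(text) if t.isalpha()])
```

Even this toy shows why linguistic rules matter: naive suffix stripping turns “analyses” into “analys”, which is exactly the sort of case where extended, language-aware rules earn their keep.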
The next phase of such projects generally involves embedding the model in business processes to achieve a high degree of automation. This could mean applying content categorisation to classify new documents automatically and accurately, enabling efficient search. Or it could mean continuously monitoring a social media feed for sentiment and routing relevant messages to a specific department for appropriate follow-up.
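A minimal sketch of that sentiment-and-routing idea might look like this; the keyword lists and department names are assumptions made for illustration, not any product’s actual rules:

```python
import re

# Crude lexicon-based sentiment scoring; a real deployment would use a
# trained model rather than these illustrative keyword sets.
NEGATIVE = {"delayed", "broken", "complaint", "unacceptable"}
POSITIVE = {"thanks", "great", "helpful"}

def sentiment(message):
    """Classify a message as positive, negative or neutral."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def route(message):
    """Send negative feedback to a follow-up team; archive everything else."""
    return "customer-care" if sentiment(message) == "negative" else "archive"

print(route("My appointment was delayed again, unacceptable"))  # customer-care
```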
Brain of text analytics with a human heart
There are times when organisations require not just automation but also the ability to combine the power of analytics with human expertise. This is where regular business users can use the automated output as a complementary aid to their own business judgments.
The relevance of this aspect became evident while I was working with a defence organisation and then a law enforcement agency recently. While responding to Subject Access Requests (SARs) under the General Data Protection Regulation (GDPR), these customers frequently had to sift through large volumes of unstructured data to identify personal data tokens such as name, date and place of birth, address and National Insurance number (NINO). Applying text analytics in this scenario to search thousands of documents automatically for context-specific terms appeared, in the first instance, to be the perfect answer.
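As a rough illustration of that first-instance approach, an automated scan for context-specific personal data tokens could look like the sketch below. The patterns are deliberately simplified – real NINO validation excludes certain prefix letters, and names and dates need far richer, context-aware rules:

```python
import re

# Simplified patterns for two kinds of personal data token. These are
# illustrative only: genuine NINO rules disallow certain prefix letters,
# and date or name detection needs context-aware linguistic rules.
PATTERNS = {
    "nino": re.compile(r"\b[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}

def find_personal_data(text):
    """Return (label, matched token, start offset) for every candidate."""
    hits = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((label, m.group(), m.start()))
    return sorted(hits, key=lambda h: h[2])

doc = "Applicant born 01/02/1980, NI number QQ 12 34 56 C."
for label, token, offset in find_personal_data(doc):
    print(label, token, offset)
```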
Text analytics plus human judgment
But with a bit more probing, it became clear that text analytics on its own was not the complete answer customers were looking for. They were, in fact, looking for a solution that supported their end-to-end business processes. They needed an intuitive, flexible user interface where the automated text analytics output (in the form of personal data tokens) is presented to business users. If staff identify any other business-sensitive information in processed documents, they can suppress or mask it, with text analytics making automated recommendations as the redacted output is created. A fully automated process could expose the risk of sensitive information getting into the wrong hands – as evidenced in this news article.
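That human-in-the-loop redaction flow can be sketched as follows, assuming a simplified NINO pattern for the automated suggestions and letting a reviewer add spans of their own:

```python
import re

# Automated suggestions come from a simplified NINO pattern (illustrative
# only); the reviewer can then add further spans before redaction is applied.
NINO = re.compile(r"\b[A-Z]{2}\d{6}[A-D]\b")

def suggest_redactions(text):
    """Automated recommendations: (start, end) spans to mask."""
    return [m.span() for m in NINO.finditer(text)]

def redact(text, spans):
    """Apply both suggested and reviewer-added spans, masking with 'X'."""
    out = list(text)
    for start, end in spans:
        out[start:end] = "X" * (end - start)
    return "".join(out)

doc = "Claimant AB123456C lives at 10 High Street."
spans = suggest_redactions(doc)                            # machine suggestion
spans.append((doc.index("10 High Street"), len(doc) - 1))  # reviewer adds the address
print(redact(doc, spans))
```

The key design point is that the machine only ever proposes spans; the final set of redactions, and any additions the automated pass missed, remain under the reviewer’s control.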
They also wanted case management capabilities to be a natural part of the same solution, so that staff could manage and track the progress of SARs within the same user interface. Furthermore, there were implicit expectations around governed data access, performance, scalability and cloud deployment. All of this would give the business the confidence it needed in the robustness of the business process.
There are many such cases where text analytics is a critical part of the solution but not the whole solution on its own. Presenting it as a black box and restricting users from applying their own business judgment can lead to adoption issues. You need to combine the brain of text analytics with a human heart. This is the best way to achieve healthy outcomes. To learn more about how SAS is working with the UK government, visit sas.com/uk/gov.