Imagine trying to dig a useful bit of information out of 50,000 lines of a chat log. Now imagine that this needle in the haystack was the difference between a criminal being arrested and remaining at large. Thousands of lines of confusing and unreadable chat text are more and more frequently
Tag: Text Analytics
Using such features and Natural Language Processing capabilities like text parsing and information extraction in SAS Visual Text Analytics (VTA) helps us uncover emerging trends and unlock the value of unstructured text data.
To find exact duplicates, comparing every pair of strings is the simplest approach, but it is neither efficient nor sufficient. Hashing each document with the MD5 or SHA-1 algorithm produces a correct result much faster, because identical texts always yield identical digests, yet near-duplicates would still go undetected. Text similarity is useful for finding documents that look alike. There are various approaches to this, and each defines differently which documents count as duplicates. Furthermore, the definition of duplicate documents has implications for the type of processing and the results produced. Below are some of the options. Using SAS Visual Text Analytics, you can customize and accomplish this task during your corpus analysis journey with either the Python SWAT package or PROC SQL in SAS.
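The difference between hash-based exact matching and similarity-based near-duplicate detection can be sketched in plain Python. This is not the SAS Visual Text Analytics or SWAT implementation. The function names are hypothetical, and the near-duplicate part assumes one particular definition of "duplicate" (Jaccard similarity over 3-word shingles with an arbitrary 0.5 threshold), which is only one of the possible options.

```python
import hashlib
from collections import defaultdict


def exact_duplicate_groups(docs, algorithm="md5"):
    """Group document IDs whose text hashes to the same digest.

    Identical strings always produce the same digest, so each group with more
    than one member is a set of exact duplicates (barring a hash collision).
    """
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        digest = hashlib.new(algorithm, text.encode("utf-8")).hexdigest()
        groups[digest].append(doc_id)
    return [ids for ids in groups.values() if len(ids) > 1]


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word n-grams)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_pairs(docs, k=3, threshold=0.5):
    """Return (id1, id2, score) for every pair whose shingle overlap is high enough.

    Assumed definition of a near-duplicate: Jaccard similarity >= threshold.
    This compares all pairs, so it is only meant for small collections.
    """
    ids = sorted(docs)
    sh = {doc_id: shingles(docs[doc_id], k) for doc_id in ids}
    pairs = []
    for x in range(len(ids)):
        for y in range(x + 1, len(ids)):
            score = jaccard(sh[ids[x]], sh[ids[y]])
            if score >= threshold:
                pairs.append((ids[x], ids[y], score))
    return pairs


# Hypothetical toy corpus.
docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumps over the lazy dog",
    "c": "the quick brown fox jumps over the sleepy dog",
    "d": "completely unrelated sentence about text analytics",
}
print(exact_duplicate_groups(docs))        # only a and b are exact copies
print(near_duplicate_pairs(docs))          # c is also flagged as close to a and b
```

Hashing finds "a" and "b" but misses "c", which differs by one word; the shingle comparison catches it. Swapping `"md5"` for `"sha1"` changes only the digest function, not the grouping logic.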
Corpus analysis is a technique widely used by data scientists because it provides an understanding of a document collection and insights into the text.
It is increasingly possible to use text analytics to explore different types of data. When a news story this summer caught my eye, I decided to see if I could use SAS Visual Text Analytics (VTA) and SAS Visual Analytics (VA) on customer complaints to provide information that might be
Government institutions, whether in defense, transportation, public services, security, or healthcare, face both a challenge and an opportunity: making sense of enormous and ever-growing volumes of unstructured text. More than 80% of
The Text Investigation Framework uses several technologies built on SAS Viya, including SAS Visual Text Analytics, SAS Visual Data Mining and Machine Learning, and SAS Visual Investigator. SAS Visual Investigator acts as the orchestrator that surfaces the results. With its broad set of capabilities, SAS Visual Investigator supports scenario authoring, alert generation and disposition, and comprehensive workflows for gathering vital outcomes and feedback.
I think that this pandemic has put digital transformation at the top of every executive agenda.
Critics of sports analytics (and there are some entertaining ones) love to point out that analytics isn’t capable of capturing the things that don’t show up on a box score. A player who dives on the floor to save a loose ball, a quarterback strategically misleading a defender to free
At the end of March, the German government sponsored a hackathon called #WirVsVirus. The aim was to bring Germany’s collective coding expertise to bear on some of the many problems surrounding COVID-19. In total, more than 27,000 coders joined the challenge, working from home, and programming for 48 hours from
A major UK insurance company used text analytics to categorise complaints.
Analyzing tweets is challenging because of their brevity (a maximum of 280 characters). However, the powerful features of SAS Visual Text Analytics (VTA), which include embedded machine learning algorithms, make the task easier.
Generating a word cloud (also known as a tag cloud) is a good way to mine internet text. Word (or tag) clouds visually represent the occurrence of keywords found in internet data such as Twitter feeds.
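The counting step behind a word cloud can be sketched in a few lines of Python. This is an illustrative assumption, not how SAS Visual Analytics or any particular cloud tool works internally. The stop-word list is a small hypothetical one, the posts are made up, and the linear mapping from count to font size is just one plausible scaling choice. Rendering the image itself is left out.

```python
import re
from collections import Counter

# Small, hypothetical stop-word list; real pipelines use a much larger one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "on", "for", "rt"}


def keyword_counts(posts, top_n=10):
    """Count keyword occurrences across posts, ignoring case, URLs and stop words."""
    counts = Counter()
    for post in posts:
        text = re.sub(r"https?://\S+", " ", post.lower())  # drop links
        tokens = re.findall(r"[#@]?[a-z0-9']+", text)        # keep hashtags and mentions
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)


def font_sizes(freqs, min_size=10, max_size=48):
    """Map each keyword's count linearly onto a font-size range (assumed scaling)."""
    if not freqs:
        return {}
    lo = min(c for _, c in freqs)
    hi = max(c for _, c in freqs)
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {w: round(min_size + (c - lo) * (max_size - min_size) / span) for w, c in freqs}


# Hypothetical posts standing in for a Twitter feed.
posts = [
    "Text analytics is fun #nlp https://example.com",
    "Loving #nlp and text mining",
    "RT text analytics on the rise",
]
freqs = keyword_counts(posts, top_n=3)
print(freqs)
print(font_sizes(freqs))
```

The most frequent keyword gets the largest font, which is exactly the "occurrence of keywords" that the cloud makes visible.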
If you consume NBA content through social media, then you know just how active that online community is. Basketball arguments and ‘hot takes’ on the Internet are about as commonplace as Michael Jordan playing golf instead of running a functional NBA front office. I wondered if NBA fans happened to
What if, beyond the new organization of the means of production, the fourth industrial revolution also brought about a significant change in how the knowledge intrinsic to each domain is managed? What if new digital technologies allowed operational staff to access this knowledge easily, knowledge that is most often the fruit of methods
Natural language understanding (NLU) is a subfield of natural language processing (NLP) that enables machine reading comprehension. While both deal with human language, NLU goes beyond the structural understanding of language to interpret intent, resolve context and word ambiguity, and even generate human language on its own. NLU is designed for
Recently, the North Carolina Human Trafficking Commission hosted a regional symposium to help strengthen North Carolina’s multidisciplinary response to human trafficking. One of the speakers shared an anecdote from a busy young woman with kids. She had returned home from work and was preparing for dinner; her young son wanted
Structuring a highly unstructured data source
Human language is astoundingly complex and diverse. We express ourselves in infinite ways. It can be very difficult to model and extract meaning from both written and spoken language. Usually the most meaningful analysis uses a number of techniques. While supervised and unsupervised learning,
The Special Olympics is part of the inclusion movement for people with intellectual disabilities. The organisation provides year-round sports training and competitions for adults and children with intellectual disabilities. In March 2019 the Special Olympics World Games will be held in Abu Dhabi, United Arab Emirates. SAS is an official
There is tremendous value buried in text sources such as call center and chat dialogues, survey comments, product reviews, technical notes, and legal contracts. How can we extract the signal we want amid all the noise?
Amidst the growing popularity of modern machine learning and deep learning techniques, one of the biggest challenges is the ability to obtain large amounts of training data suitable for your use case. This post discusses how the analytical approach for Named Entity Recognition (NER) can help.
Word Mover's Distance (WMD) is a distance metric used to measure the dissimilarity between two documents, and its application in text analytics was introduced by a research group from Washington University in St. Louis in 2015. The group's paper, From Word Embeddings To Document Distances, was presented at the 32nd International Conference on Machine
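In general, WMD is the minimum total cost of "moving" the word weights of one document onto the words of the other, where the cost of moving between two words is the Euclidean distance between their embeddings. Solving that transport problem normally needs a linear-programming or optimal-transport solver. The Python sketch below covers only an assumed special case: both documents have the same number n of distinct words, each weighted 1/n. In that case the optimal flow is a one-to-one matching, so trying every permutation gives the exact value. The 2-D embeddings and the function name are hypothetical stand-ins for real pretrained word vectors.

```python
import math
from itertools import permutations

# Hypothetical 2-D "embeddings" for illustration only; real WMD uses
# pretrained vectors (for example word2vec) with hundreds of dimensions.
EMBEDDINGS = {
    "obama": (1.0, 1.0), "president": (1.1, 0.9),
    "speaks": (3.0, 1.0), "greets": (3.2, 1.3),
    "media": (5.0, 2.0), "press": (5.1, 2.2),
    "illinois": (7.0, 0.0), "chicago": (7.3, 0.4),
    "band": (2.0, 8.0), "gave": (4.0, 6.0),
    "concert": (6.0, 7.0), "japan": (9.0, 9.0),
}


def wmd_uniform(doc_a, doc_b):
    """Word Mover's Distance for two documents with the same number n of
    distinct words, each word weighted 1/n (an assumed special case).

    The optimal flow is then a permutation: each word moves entirely to one
    word of the other document. Brute-forcing all n! matchings gives the
    exact minimum, so this is only practical for very small n.
    """
    a, b = sorted(set(doc_a)), sorted(set(doc_b))
    if len(a) != len(b):
        raise ValueError("this sketch needs equal numbers of distinct words")
    n = len(a)
    if n == 0:
        return 0.0
    best = min(
        sum(math.dist(EMBEDDINGS[a[i]], EMBEDDINGS[b[p[i]]]) for i in range(n))
        for p in permutations(range(n))
    )
    return best / n  # each matched pair carries weight 1/n


# Stop words are assumed to be removed already.
d1 = ["obama", "speaks", "media", "illinois"]
d2 = ["president", "greets", "press", "chicago"]
d3 = ["band", "gave", "concert", "japan"]
print(wmd_uniform(d1, d2))  # small: every word has a close counterpart
print(wmd_uniform(d1, d3))  # larger: the words are far apart in embedding space
```

Although `d1` and `d2` share no words, their WMD is small because each word sits near a semantically similar word. That is the property that makes WMD more useful than pure word-overlap measures.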
My local middle school publishes a weekly paper. Very recently, I noticed an article in that paper, an exposé on human trafficking overseas titled "World Slavery: The Terrors Our World Tries to Forget." The eloquent article in part highlighted how children have been exploited in the fishing industry in Ghana
Maybe you’ve heard of text analytics (or natural language processing) as a way to analyze consumer sentiment. Businesses often use these techniques to analyze customer complaints or comments on social media, to identify when a response is needed. But text analytics has far more to offer than examining posts on
As a former intelligence analyst, I can't help but breathe a huge sigh of frustration. The special AI "task forces" and their massive budgets are great, but it's time to get honest about the rest of the military. Ask any everyday soldier, sailor, airman or Marine their opinion of