In Part I of this blog post, I provided an overview of the approach my team and I took to tackling the problem of classifying diverse, messy documents at scale. I shared the details of how we chose to preprocess the data and how we created features from documents of interest
Unstructured text data is ubiquitous in both business and government, and extracting value from it at scale is a common challenge. Organizations that have been around for a while often have vast paper archives. Digitizing these archives does not necessarily make them usable for search and analysis, since documents are
The Text Investigation Framework utilizes several technologies built on SAS Viya, including SAS Visual Text Analytics, SAS Visual Data Mining and Machine Learning, and SAS Visual Investigator. SAS Visual Investigator acts as the orchestrator to surface the results. With its broad set of capabilities, SAS Visual Investigator can perform scenario authoring, alert generation and disposition, and comprehensive workflow to gather vital outcomes and feedback.
SAS was named a Leader in the "IDC MarketScape: Worldwide General-Purpose Artificial Intelligence Software Platforms 2019–2020 Vendor Assessment" report that IDC published this January 🙌🙌 This was the first time IDC had evaluated artificial intelligence (AI) platform vendors. For revenue and market presence, as well as each vendor's AI strategy and capabilities, the IDC MarketScape report
This article is co-authored by Biljana Belamaric Wilsey and Teresa Jade, both of whom are linguists in SAS' Text Analytics R&D. When I learned to program in Python, I was reminded that you have to tell the computer everything explicitly; it does not understand the human world of nuance
Double negatives seem to be everywhere; I have noticed them a lot in music recently, from Pink Floyd's "We don't need no education" to Rihanna's "I wasn’t looking for nobody when you looked my way". My own favourite double negative in a song is "I can't get no sleep" by Faithless. This
Today’s natural language processing (NLP) systems can do some amazing things, including enabling the transformation of unstructured data into structured numerical and/or categorical data. Why is this important? Because once the key information has been identified or a key pattern modeled, the newly created, structured data can be used in
My Mum could have been a doctor – most people can’t read her handwriting. I can, but only because I’ve been trained to read it. The analysis of unstructured data is similar. Text analysts can quickly be overwhelmed on learning that they have to manually develop a training corpus. Reading a