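Big data and machine learning, both hot topics of late, began as ways to improve the performance of predictive models. SAS VDMML (Visual Data Mining and Machine Learning) is a text analysis tool that improves model performance by using text data during predictive model development, and it can be used by business users, data scientists, and predictive model developers alike. In text analysis, the natural language processing step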
Tag: text mining
SAS' Leonid Batkhan shows you how to delete a substring from a character string - one of the common character data manipulation tasks.
SAS' Leonid Batkhan demonstrates a common character data manipulation task of inserting a substring into a character string.
Have you heard that SAS offers a collection of new, high-performance CAS procedures that are compatible with a multi-threaded approach? The free e-book Exploring SAS® Viya®: Data Mining and Machine Learning is a great resource to learn more about these procedures and the features of SAS® Visual Data Mining and
Maybe you’ve heard of text analytics (or natural language processing) as a way to analyze consumer sentiment. Businesses often use these techniques to analyze customer complaints or comments on social media, to identify when a response is needed. But text analytics has far more to offer than examining posts on
The role of analytics in combating terrorism. Earlier this spring, I found myself walking through a quiet and peaceful grove of spruce trees south of the small hamlet of Foy outside of Bastogne, Belgium. On travel in Europe, I happened to have some extra time before heading to London. I
Be honest: if I asked you what the candidates in this year's US election campaign debated when they faced each other, which core topics (scandals and affairs aside) would you name off the top of your head? And could you match those core topics to the individual candidates? When I asked myself this question, the answer was
Anyone know what's the number two form of economic crime, in terms of losses? Believe it or not, it's procurement fraud. I grew up in a small town south of “Big D” and in my neck of the woods having two first names is, well, normal. So, when Will Ferrell’s character,
This is the third part of the seven-part blog post series “A practical guide to tackle auto insurance fraud”. In the first two articles of the series we drilled down into Data Management and Data Quality as the basis for insurance fraud detection analytics and also into the Business
A huge proportion of big data is unstructured text (such as client interactions, product reviews, call center logs, emails, blogs and tweets). Organizations starting to invest in advanced analytics often overlook the value text analytics could add to the process. But when data scientists or analysts get to work exploring
When a person feels sufficiently wronged to lodge a complaint with the Consumer Financial Protection Bureau (CFPB), there’s likely to be some negative sentiment involved. But is there a connection between the language they use and the likelihood they will be compensated by the offending company? At the upcoming Sentiment
How many of you have read The Cuckoo’s Calling by the previously unknown author Robert Galbraith? The answer is not many, until it came out that Robert Galbraith was none other than blockbuster best-selling author JK Rowling. Sales then skyrocketed. Rowling recently published some of the rejection letters she received as
Analyzing text is like a treasure hunt. It is hard to tell what you will end up with before you start digging and the things you find out can be quite unique, invaluable and in many cases full of surprises. It requires a good blend of instruments like business knowledge,
In my last post, I talked about why SAS utilizes a rotated Singular Value Decomposition (SVD) approach for topic generation, rather than using Latent Dirichlet Allocation (LDA). I noted that LDA has undergone a variety of improvements in the last seven years since SAS opted to use the SVD method. So, the
The benefits of big data often depend on taming unstructured data. However, in international contexts, customer comments, employee notes, external websites, and the social media labyrinth are not exclusively written in English, or any single language for that matter. The Tower of Babel lives and it is in your unstructured
When I ask people what they know about Denmark they often mention Hans Christian Andersen. He was born in Denmark in 1805 and is one of the most adored children’s authors of all time. Many of his fairy tales are known worldwide as they have been translated into more than
My Mum could have been a doctor – most can’t read her handwriting. It’s only because I’ve been trained to read it that I can. The analysis of unstructured data is similar. Text analysts can be quickly overwhelmed to learn that you have to manually develop a training corpus. Reading a
It is always important to continue to sell the value of analytics within your organization, especially to your leaders. Usually, these types of results are delivered via reports, dashboards, or emails. However, did you know that analytics: Detects when expensive machinery like electrical submersible pumps (ESP) or oil platforms need maintenance before
According to research, less than half of an organization's data is structured data; nearly 80 percent is unstructured data that may come from social media, customer letters, web pages, invoices and freeform survey answers. Getting the information you need from that data can be a quick and automated experience or
It is becoming more and more apparent that social media is a gold mine of unstructured data that is just waiting to be analysed so that the nuggets can be extracted. At SAS Global Forum, I was particularly impressed with the diversified use of sentiment analysis and the exploration that
An event is fast approaching that is the highlight of the year for many members of the SAS community. I am, of course, referring to SAS Global Forum 2012, which this year will be hosted in the Walt Disney World Swan and Dolphin Resort in Orlando, Florida. I am particularly