Using such features and Natural Language Processing capabilities like text parsing and information extraction in SAS Visual Text Analytics (VTA) helps us uncover emerging trends and unlock the value of unstructured text data.
Tag: data scientist
To find exact duplicates, comparing every pair of strings is the simplest approach, but it is neither efficient nor sufficient. Hashing each document with MD5 or SHA-1 gives the correct result faster, yet near-duplicates still go undetected. Text similarity is useful for finding files that look alike. There are various approaches, and each defines duplicate documents in its own way. Furthermore, the definition of duplicate documents has implications for the type of processing and the results produced. Below are some of the options. Using SAS Visual Text Analytics, you can customize and accomplish this task during your corpus analysis journey with either the Python SWAT package or PROC SQL in SAS.
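The difference between hash-based exact matching and similarity-based near-duplicate detection can be sketched in a few lines of plain Python. This is only an illustration using the standard library, not the SAS Visual Text Analytics, SWAT, or PROC SQL workflow the post describes. The function names, the word-shingle size `k`, and the similarity threshold are all assumptions chosen for the example, and Jaccard similarity over word shingles is just one of several possible definitions of "near-duplicate".

```python
import hashlib
import re


def content_hash(text):
    """Return the SHA-1 hex digest of the text's UTF-8 bytes."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def exact_duplicate_groups(docs):
    """Group document ids whose text is character-for-character identical."""
    buckets = {}
    for doc_id, text in docs.items():
        buckets.setdefault(content_hash(text), []).append(doc_id)
    # Only hashes shared by more than one document indicate duplicates.
    return [ids for ids in buckets.values() if len(ids) > 1]


def shingles(text, k=2):
    """Lowercase the text and return its set of k-word shingles."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_pairs(docs, threshold=0.5, k=2):
    """Return (id1, id2, similarity) for every pair at or above the threshold.

    This still compares all pairs, so it is only suitable for small corpora.
    """
    ids = sorted(docs)
    sets = {doc_id: shingles(docs[doc_id], k) for doc_id in ids}
    pairs = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            score = jaccard(sets[first], sets[second])
            if score >= threshold:
                pairs.append((first, second, score))
    return pairs


# Hypothetical sample corpus for illustration.
docs = {
    "a": "The quick brown fox jumps over the lazy dog.",
    "b": "The quick brown fox jumps over the lazy dog.",
    "c": "The quick brown fox jumped over the lazy dog!",
    "d": "Corpus analysis reveals trends in text data.",
}

print(exact_duplicate_groups(docs))
print(near_duplicate_pairs(docs, threshold=0.5, k=2))
```

In this sample, hashing groups only the two identical documents, while the shingle comparison also pairs the lightly edited document `c` with them. Raising the threshold or the shingle size makes the near-duplicate definition stricter.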
The question to ask is no longer, “Do you want to be a data scientist?” But rather, “What kind of data scientist do you want to be?”
Robert Blanchard's role as a data scientist at SAS has afforded him the flexibility to live where he wants, in his case, on a beach in San Diego.
PROC PYTHON, the Python code editor, and the Python code step facilitate low-code analytics by calling Python and SAS from a common interface. Data scientists also appreciate the connection to Python and R through the Model Studio Open Source Code node. Older methods of interaction include the swat and sas_kernel packages running on Python clients.
SAS' Ricky Tharrington and Jagruti Kanjia explain two ways bias shows up in model predictions.
With modern advancements in artificial intelligence, we can teach computers to achieve super-human performance in retro videogames.
Corpus analysis is a technique widely used by data scientists because it provides an understanding of a document collection and insights into the text.
Generative adversarial networks (GANs) are one of the newer machine learning algorithms that data scientists are tapping into. When I first heard the term, I wondered: how can networks be adversarial? I envisioned networks with swords drawn going at it. Close… but I can assure you that no networks were harmed in the making of this article.
Decision trees are one of the top machine learning algorithms used by data scientists. Decision trees use supervised learning to solve classification problems. Even if you are not a data scientist, chances are you can interpret the visual output from a decision tree.
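To give a feel for why tree output is easy to read, here is a minimal sketch of a one-level decision tree (a "stump") in plain Python. It is an illustration only, not the method used in the post or in any SAS product: the function names, the choice of Gini impurity as the split criterion, and the sample data are all assumptions made for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold on one numeric feature with the lowest weighted Gini."""
    distinct = sorted(set(values))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct feature values to split")
    best_threshold, best_score = None, float("inf")
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


def fit_stump(values, labels):
    """Learn a single split and the majority label on each side of it."""
    threshold, _ = best_split(values, labels)
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    return (
        threshold,
        Counter(left).most_common(1)[0][0],
        Counter(right).most_common(1)[0][0],
    )


def predict(stump, value):
    """Classify one value by checking which side of the threshold it falls on."""
    threshold, left_label, right_label = stump
    return left_label if value <= threshold else right_label


# Hypothetical labeled data: customer age and whether they bought a product.
ages = [22, 25, 30, 35, 40, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]

stump = fit_stump(ages, bought)
print(stump)
print(predict(stump, 28), predict(stump, 45))
```

The learned rule reads as a single question ("is the age at or below the threshold?"), which is the same kind of if/else logic a full decision tree diagram displays at each node.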