The SAS Data Science Blog
Advanced analytics from SAS data scientists

![Find duplicates and near-duplicates in a corpus with Natural Language Processing](https://blogs.sas.com/content/subconsciousmusings/files/2022/09/637278988-702x336.jpg)
Matching every pair of strings is the simplest way to find exact duplicates, but it is neither an efficient nor a sufficient technique. Hashing each document with an algorithm such as MD5 or SHA-1 finds exact duplicates correctly and faster, yet near-duplicates still go undetected. Text similarity is useful for finding documents that look alike. There are various approaches, and each one defines differently which documents count as duplicates; that definition, in turn, determines the type of processing required and the results produced. The post walks through several of these options. With SAS Visual Text Analytics, you can customize and accomplish this task during your corpus analysis, either with the Python SWAT package or with PROC SQL in SAS.
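To illustrate the difference between exact and near-duplicate detection described above, here is a minimal pure-Python sketch. It is not the SAS Visual Text Analytics or SWAT workflow from the post: exact duplicates are grouped by MD5 digest using the standard `hashlib` module, and near-duplicates are flagged by Jaccard similarity over word shingles, one common text-similarity measure swapped in here for illustration. The function names, the shingle size `k=2`, and the `threshold=0.5` cutoff are hypothetical, arbitrary choices.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def exact_duplicates(docs):
    """Group document ids whose text has the same MD5 digest (hypothetical helper)."""
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        groups[digest].append(doc_id)
    # Keep only digests shared by more than one document
    return [ids for ids in groups.values() if len(ids) > 1]


def shingles(text, k=2):
    """Return the set of k-word shingles (overlapping word n-grams)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs, k=2, threshold=0.5):
    """Compare every pair of documents and return (id1, id2, score) above threshold.

    k and threshold are arbitrary assumptions for this sketch.
    """
    sets = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    pairs = []
    for x, y in combinations(sorted(sets), 2):
        score = jaccard(sets[x], sets[y])
        if score >= threshold:
            pairs.append((x, y, round(score, 3)))
    return pairs


if __name__ == "__main__":
    docs = {
        "a": "the quick brown fox jumps over the lazy dog",
        "b": "the quick brown fox jumps over the lazy dog",
        "c": "the quick brown fox jumped over the lazy dog",
        "d": "completely unrelated text about data science blogs",
    }
    print(exact_duplicates(docs))  # [['a', 'b']]
    print(near_duplicates(docs))   # [('a', 'b', 1.0), ('a', 'c', 0.6), ('b', 'c', 0.6)]
```

The hash check catches only `a` and `b`, while the shingle comparison also pairs `c` with them even though one word differs. The pairwise loop is quadratic in the number of documents, so for large corpora techniques such as MinHash with locality-sensitive hashing are commonly used to avoid comparing every pair.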
![How advancements in automated computer vision are keeping us safe](https://blogs.sas.com/content/subconsciousmusings/files/2022/09/logo-detection-object-score-702x336.png)
Using SAS Viya in combination with open-source capabilities, we were able to develop an automated solution for logo detection that does not require any manual data labeling.
![3 data scientist jobs and how to land them](https://blogs.sas.com/content/subconsciousmusings/files/2022/09/Three-data-scientist-jobs-and-how-to-land-them-702x336.jpg)
The question to ask is no longer “Do you want to be a data scientist?” but rather “What kind of data scientist do you want to be?”
![Intelligent Decisioning: Ensuring fairness in analytically-driven decision making](https://blogs.sas.com/content/subconsciousmusings/files/2022/09/SAS-Explore-Header-702x336.jpg)
Attend this session during the SAS Explore event on Sept 27-29 or view the recording at your convenience. We will showcase the use of SAS Intelligent Decisioning, SAS Model Manager, and SAS Visual Analytics on the SAS Viya platform for a solution that helps mitigate inequitable credit decisions.
![Using PROC DEEPCAUSAL to optimize revenue through policy evaluation](https://blogs.sas.com/content/subconsciousmusings/files/2022/09/sept-walton-banner-c-702x336.jpg)
SAS' Gunce Walton introduces a new scoring capability, explains how it uses deep neural networks (DNNs), and shares use cases for PROC DEEPCAUSAL.
![Analyzing demographics and patterns-of-life using SAS Visual Analytics](https://blogs.sas.com/content/subconsciousmusings/files/2022/08/fig-c1-q1-2-702x336.png)
The IEEE Visual Analytics Science and Technology (VAST) Challenge provides a great opportunity to validate our software against real-world scenarios using complex data sets. Not only do we learn from these projects, but we also send feedback to our development teams to further improve product capabilities for customers.