SAS Users
Providing technical tips and support information, written for and by SAS users.
Which character variables have the highest frequency count? You can easily determine this with a variety of procedures that calculate frequency counts, such as the FREQ procedure or the MEANS procedure. This blog post illustrates the process with two examples.
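To show what "highest frequency count" means in practice, here is a minimal Python sketch (not SAS code from the post) that tallies the values of one character variable and lists them from most to least frequent. This is similar in spirit to PROC FREQ with ORDER=FREQ. The function name `frequency_table` and the sample `make` values are hypothetical, and breaking ties alphabetically is an assumption of this sketch.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count) pairs sorted by descending count.

    Ties are broken alphabetically (an assumption of this sketch)
    so that the output order is deterministic.
    """
    counts = Counter(values)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical character variable, e.g. a "Make" column
make = ["Ford", "Toyota", "Ford", "BMW", "Toyota", "Ford"]
for value, count in frequency_table(make):
    print(f"{value:<8}{count}")
```

In SAS, the procedures named above do this counting for you. The sketch only illustrates the ranking logic.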
See how to sample unstructured (text) data using SAS Viya and CAS actions. This post includes complete code that clusters the text documents with k-means and then treats the cluster memberships as strata for analysis.
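The core idea, using cluster memberships as strata, can be sketched in plain Python. This is not the CAS action code from the post. The sketch assumes that k-means has already assigned a cluster label to every document and that you want the same sampling fraction from each cluster (proportional allocation). The names `stratified_sample`, `docs`, and `labels` are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(doc_ids, cluster_labels, fraction, seed=0):
    """Sample roughly `fraction` of the documents from every cluster.

    Each cluster label is treated as one stratum. At least one document
    is kept per stratum, and never more than the stratum contains.
    """
    if len(doc_ids) != len(cluster_labels):
        raise ValueError("doc_ids and cluster_labels must have the same length")
    strata = defaultdict(list)
    for doc_id, label in zip(doc_ids, cluster_labels):
        strata[label].append(doc_id)
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    sample = {}
    for label, members in strata.items():
        k = min(len(members), max(1, round(fraction * len(members))))
        sample[label] = rng.sample(members, k)
    return sample

# Hypothetical k-means output: six documents in cluster 0, four in cluster 1
docs = [f"doc{i}" for i in range(10)]
labels = [0] * 6 + [1] * 4
print(stratified_sample(docs, labels, fraction=0.5))
```

Sampling within each cluster keeps small topical groups represented, which a simple random sample over all documents might miss.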
SAS batch jobs can generate many log files that accumulate over time. In this post, we present a SAS program that cleans up old log files on your system.
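The post presents a SAS program for this task. As a rough illustration of the same cleanup logic, here is a Python sketch. It assumes that a log's age is judged by its last-modification time and that logs match a `*.log` pattern. The function `remove_old_logs` and its `dry_run` flag are hypothetical, and nothing is deleted unless you pass `dry_run=False`.

```python
import tempfile
import time
from pathlib import Path

def remove_old_logs(log_dir, max_age_days, pattern="*.log", dry_run=True):
    """Find (and optionally delete) files in `log_dir` older than `max_age_days`.

    Age is based on last-modification time. With dry_run=True (the default)
    nothing is deleted; the matching paths are only returned, sorted.
    """
    cutoff = time.time() - max_age_days * 86400  # seconds per day
    old_files = [
        p for p in Path(log_dir).glob(pattern)
        if p.is_file() and p.stat().st_mtime < cutoff
    ]
    if not dry_run:
        for p in old_files:
            p.unlink()
    return sorted(old_files)

# Demo in a throwaway directory so that no real logs are touched
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "job1.log").write_text("NOTE: demo log")
    print(remove_old_logs(tmp, max_age_days=30))  # file is new, so nothing is listed
```

Running with the default dry run first lets you review the list before anything is removed.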
Word Mover's Distance (WMD) is a distance metric that measures the dissimilarity between two documents. Its application in text analytics was introduced in 2015 by a research group from Washington University in St. Louis. The group's paper, "From Word Embeddings To Document Distances," was published in the proceedings of the 32nd International Conference on Machine Learning (ICML 2015).
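For reference, WMD in that paper is an optimal transport problem. In the summary below (notation chosen here), $c_i$ is the count of vocabulary word $i$ in a document, $n$ is the vocabulary size, $d$ and $d'$ are the normalized bag-of-words vectors of the two documents, $\mathbf{x}_i$ is the embedding of word $i$, and $T_{ij}$ is how much of word $i$ in one document "travels" to word $j$ in the other.

```latex
d_i = \frac{c_i}{\sum_{k=1}^{n} c_k},
\qquad
c(i,j) = \lVert \mathbf{x}_i - \mathbf{x}_j \rVert_2

\mathrm{WMD}(d, d') = \min_{T \ge 0} \sum_{i,j=1}^{n} T_{ij}\, c(i,j)
\quad \text{subject to} \quad
\sum_{j=1}^{n} T_{ij} = d_i \;\; \forall i,
\qquad
\sum_{i=1}^{n} T_{ij} = d'_j \;\; \forall j
```

Two documents that use different but semantically close words still get a small distance, because the travel cost comes from embedding distances rather than from exact word matches.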
Did you know that 80 percent of the time in an analytics life cycle is spent on data preparation? For many SAS users and administrators, data preparation is what you live and breathe day in and day out. Your analysis is only as good as your data, and that's why we wanted
SAS Visual Text Analytics provides dictionary-based, non-domain-specific tokenization for Chinese documents; however, sometimes you still want N-gram tokens. This can be especially helpful when the documents are domain-specific and most of their tokens are not included in the SAS-provided Chinese dictionary. What is an N-gram? An N-gram is a contiguous sequence of N items, such as characters or words, taken from a text.
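To make character N-gram tokens concrete, here is a small Python sketch. It does not use SAS Visual Text Analytics, and the post's own approach may differ. The function name `char_ngrams` is hypothetical. The sketch slides a window of N characters across the text, which suits Chinese because words are not separated by spaces.

```python
def char_ngrams(text, n):
    """Return the contiguous character N-grams of `text`, in order.

    Returns an empty list when the text is shorter than n characters.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(char_ngrams("文本分析", 2))  # ['文本', '本分', '分析']
```

Bigrams such as 本分 above are not real words in this context. This is why N-gram tokens are usually filtered afterward, for example by frequency.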