See how to sample unstructured (text) data using SAS Viya and CAS actions. This post includes complete code to cluster the text documents via k-means, and treats the cluster memberships as strata for analysis.
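The idea of treating cluster memberships as strata can be sketched outside SAS as well. The snippet below is a minimal Python illustration, not the CAS action code from the post: it assumes each document already has a cluster label (for example from k-means), and the function name `stratified_sample`, the `fraction` parameter, and the toy labels are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(doc_ids, cluster_ids, fraction, seed=0):
    """Draw roughly the same fraction of documents from every cluster (stratum)."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable

    # Group document ids by the cluster they were assigned to.
    strata = defaultdict(list)
    for doc_id, cluster_id in zip(doc_ids, cluster_ids):
        strata[cluster_id].append(doc_id)

    sample = {}
    for cluster_id, members in strata.items():
        # Keep at least one document per stratum so small clusters are not lost.
        k = max(1, round(len(members) * fraction))
        sample[cluster_id] = rng.sample(members, k)
    return sample


# Toy data: ten documents with made-up cluster labels.
doc_ids = list(range(10))
cluster_ids = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
picked = stratified_sample(doc_ids, cluster_ids, fraction=0.5)
```

Sampling within each cluster, rather than from the whole collection at once, keeps small clusters represented in the sample.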
Word Mover's Distance (WMD) is a distance metric that measures the dissimilarity between two documents. Its application in text analytics was introduced in 2015 by a research group from Washington University in St. Louis. The group's paper, "From Word Embeddings To Document Distances", was published at the 32nd International Conference on Machine
SAS Visual Text Analytics provides dictionary-based, non-domain-specific tokenization for Chinese documents, but sometimes you still want N-gram tokens. This is especially helpful when the documents are domain-specific and most of their tokens are not included in the SAS-provided Chinese dictionary. What is an N-gram? An
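As a rough illustration of N-gram tokens for Chinese text, here is a minimal Python sketch. It assumes character-level N-grams (every run of n consecutive characters), which is one common way to tokenize Chinese without a dictionary; it is not the SAS code from the post, and the helper name `char_ngrams` is hypothetical.

```python
def char_ngrams(text, n):
    """Return every run of n consecutive non-whitespace characters in text."""
    # Drop whitespace so that N-grams never span a space or line break.
    chars = [c for c in text if not c.isspace()]
    # Slide a window of width n across the characters.
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


# Bigrams (n = 2) of a four-character phrase meaning "text analytics".
bigrams = char_ngrams("文本分析", 2)
print(bigrams)  # ['文本', '本分', '分析']
```

Because the window slides one character at a time, domain-specific terms that are missing from a dictionary still appear among the candidate tokens.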