
Tag: OCR

Programming Tips

Daria Rostovtseva | August 16, 2021

Classifying messy documents: A common-sense approach (Part II)

In Part I of this blog post, I provided an overview of the approach my team and I took to tackling the problem of classifying diverse, messy documents at scale. I shared the details of how we chose to preprocess the data and how we created features from documents of interest...

English

Advanced Analytics | Analytics | Machine Learning

Daria Rostovtseva | August 4, 2021

Classifying messy documents: A common-sense approach (Part I)

Unstructured text data is ubiquitous in both business and government, and extracting value from it at scale is a common challenge. Organizations that have been around for a while often have vast paper archives. Digitizing these archives does not necessarily make them usable for search and analysis, since documents are...

English