Data Mining and Text Analytics: Segmenting Textual Data for Automobile Insurance Claims


I feel the experience of participating in SAS Global Forum year after year is like relishing a warm brownie! It’s the same treat but a unique experience every time. Subtle changes in the ingredients can lead to a unique and enhanced flavor. The team behind the scenes works hard to ensure that SAS users experience a new flavor each time they attend the conference. My goal through this series is to give you a taste of the most delicious offerings, in the form of paper winners from the previous year, along with a sampling of what’s coming this year through our new sections and their descriptions.

If this is the first time you are reading my series, welcome! You can catch up on the previous posts and read the paper winners’ stories on Snapshot of the Best Papers of 2010. The spotlight this week shines on David Dobson with his winning presentation Segmenting Textual Data for Automobile Insurance Claims in the Data Mining and Predictive Modeling section. In the spirit of adjusting the ingredients and enhancing the flavor, the section name has been modified to Data Mining and Text Analytics. Tyler Smith (DoD Center for Deployment Health Research) heads this section this year. In keeping with the changing times, the focus this year is on combining statistical, economic and forecasting techniques and using a multitude of SAS tools to solve problems in today's economic environment. Special consideration will be given to those presentations using real-world data, both structured and unstructured.

I reached out to David Dobson to discover the recipe for his success. Dobson is the President and CEO of Dobson Analytics. Here’s what he had to say:

VI: Why did you pick this topic: Segmenting Textual Data for Automobile Insurance Claims? Was there a particular business problem you were trying to solve?

DD: Prior to coming to NC State University as a student in the MS in Analytics (MSA) program, I worked with a large auto insurance company in Canada as a statistical research advisor. In this role, I had an opportunity to analyze large volumes of structured data, including customer segmentation and profiling. Later, as an MSA student, I learned to analyze unstructured data. Given my prior experience in the auto insurance business, I became interested in analyzing the textual data of insurance claims. This is how I picked my topic.

My primary objective was to perform data reduction and profiling of the text data. There is an online auto insurance claims forum (a free question-and-answer service) where customers seek expert advice regarding their claims after an auto accident. I wanted to analyze and classify posted questions based on the nature of the claims.

VI: Is there anything you’d like to share that’s not in your paper?

DD: The paper explains clustering and profiling clusters using structured data, but there is also a possibility of doing predictive modeling using logistic regression.

VI: How did you prepare for the presentation? Do you have any tips and advice for future presenters?

DD: For my presentation, I talked mostly about the data collection experience and the analysis of results. I personally prefer not to have too many slides, or slides filled primarily with SAS code.

VI: What was your experience presenting at SAS Global Forum 2010?

DD: This was the first time I presented a paper at SAS Global Forum, and I thoroughly enjoyed it. The audience and session chair were very encouraging, which is always helpful. The audience's active participation and interest in my topic made me feel both comfortable and welcome.

VI: What kinds of feedback and comments did you receive after your paper presentation? Did you submit a paper this year?

DD: After the presentation, people came by and said that my paper was interesting and that they enjoyed my presentation. I received several words of encouragement. My experience at SAS Global Forum 2010 was a very positive one. I will be presenting once again at SAS Global Forum 2011 (Understanding Drivers of Achieving Academic Success in University Students) in Las Vegas. Hope to see you there!

I wish Dobson good luck presenting again this year! Apart from the opportunity to meet him in person in Las Vegas, you can also read other papers written by Dobson and check out the other paper winners of 2010. If you are the organized kind, you can subscribe to keep receiving updates about this series: click on the orange Snapshot of the Best Papers of 2010 XML button on the right side of the page.

So are any of you presenting under the Data Mining and Text Analytics section this year? What has been your preparation strategy? Share your tips and advice with others on this post.


About Author

Viji Iyer


  1. The SAS Global Forum 2011 conference will be my 3rd year attending. 2008 was my first SAS Global Forum, which took place in San Antonio, Texas. I found each conference provided a unique experience. The conference chair and committee do an amazing job of making each conference interesting and meaningful. I look forward to participating at this year's conference at Caesars Palace, which is a great venue and has wonderful entertainment. Regarding my paper presentation on April 6, 2011, I am dividing my talk into three segments: research methods, statistical techniques and SAS applications used in my research. I will try to keep it interesting, so hopefully no one falls asleep!

  2. Thanks for your kind words, David! Glad that you've been enjoying reading this series. Your paper this year regarding the drivers of academic success in university students sounds fascinating, and I'm sure it will draw a crowd. Good luck presenting again this year!
    So is this your 2nd year attending SAS Global Forum? Having a sense of what to expect at the conference, are you preparing any differently for your presentation this year? - Viji

  3. David Dobson

    Viji, you have been doing a great job writing the blog for SAS Global Forum. Keep up the good work! I read your blog to learn about the various award-winning SAS presenters and their papers. I am excited to present my paper once again at SAS Global Forum, in the Data Mining and Text Analytics section. I recently conducted a study to understand the drivers of academic success in university students. In my presentation, I will share the various statistical techniques I used in data mining, as well as the text mining application I used to analyze the open-ended questions in this study. Attendees will learn the factors that are positively and negatively associated with student academic performance. These include factors that students have control over and factors they do not have direct control over. I hope it makes for an interesting presentation! I look forward to meeting you and fellow SAS attendees at SAS Global Forum.
