With contributions from David Ogden & Barry deVille
Adding text analytics to your predictive modeling has been shown to produce better predictive models, including those for customer churn. Once text data (e.g., call center comments, website feedback forms, blogs, emails, Twitter) is converted to structured variables and included in your data mining data set, you can expect greater model lift and richer insight into the underlying reasons a customer might leave. If there is more than one comment from a customer (say, multiple Twitter comments), customer id is no longer unique, because there are repeated rows of input in which only the text field differs.
Some pre-processing of the data is needed to address multiple text entries for a single person. Three options are described below:
Option 1:
- Create two data sets. The first has a single row for each customer with all variables except the text field. The second has one row for each comment, containing the comment text, the target variable (churn), and the customer id.
- Apply the following SAS Text Miner nodes to the second data set: Input Data Source -> Text Parsing -> Text Filter -> Text Topic (adjusting the default settings as you wish), and run. The Text Topic node creates topics from the text data using rotated SVDs (singular value decompositions).
- Use a SAS code node after the Text Topic node to apply the SAS Summary procedure to the results. This creates a single observation for each customer id containing summary information for each raw topic variable (using the max, or another quantile such as the median or 75th percentile, of the structured text data).
- Then simply merge the results with the first data set by customer id.
Option 2:
- As in Option 1, create the two data sets.
- Apply the same flow, but instead of the Text Topic node (in which a text may belong to more than one topic), use the Text Cluster node (in which each text belongs exclusively to a single cluster).
- In the Text Filter node, it is suggested that you use the Mutual Information term weight property, so that the term weights are based on correlations with the target (churn) variable.
- Merge your target variable onto the second data set (the one with multiple records per individual), generate the SVDs based on the Mutual Information weighted terms, and then summarize the SVD values by individual (perhaps using the max).
Option 3:
- You could also "solve" this issue by concatenating the text, a simple operation in SAS. Putting all the comments from a single individual into one long string yields a single observation representing all of that individual's Twitter comments.
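Outside of SAS, the two aggregation strategies above can be sketched in a few lines of Python. This is an illustration of the logic, not SAS Text Miner's implementation; the column names and values are hypothetical, with "topic1"/"topic2" standing in for the topic (or SVD) scores produced by the Text Topic node.

```python
from collections import defaultdict

# Hypothetical per-comment records: one row per comment, as in the
# second data set of Options 1 and 2.
rows = [
    {"customer_id": "A", "comment": "great service", "topic1": 0.10, "topic2": 0.70},
    {"customer_id": "A", "comment": "thinking of leaving", "topic1": 0.80, "topic2": 0.20},
    {"customer_id": "B", "comment": "billing was confusing", "topic1": 0.30, "topic2": 0.40},
]

# Option 1/2 style: collapse to one observation per customer id by
# taking the max of each topic score (what PROC SUMMARY with MAX does).
scores = defaultdict(lambda: defaultdict(list))
for r in rows:
    for col in ("topic1", "topic2"):
        scores[r["customer_id"]][col].append(r[col])
summarized = {
    cid: {col: max(vals) for col, vals in cols.items()}
    for cid, cols in scores.items()
}

# Option 3 style: concatenate all comments per customer into one string.
texts = defaultdict(list)
for r in rows:
    texts[r["customer_id"]].append(r["comment"])
concatenated = {cid: " ".join(parts) for cid, parts in texts.items()}

print(summarized["A"])    # {'topic1': 0.8, 'topic2': 0.7}
print(concatenated["A"])  # great service thinking of leaving
```

Either way, the result has one row per customer id and can be merged back onto the main modeling data set.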
These options have different benefits under different data conditions. Option 3 is the simplest solution. However, it has the drawback that you may lose the sense of how comments over time affect the likelihood of future churn, and when you often have multiple comments per user, it will not represent the documents as faithfully as the first two methods. So if the cardinality is small, i.e. when your average customer has only one text field and you only occasionally see two or three per person, this may be the best solution.
However, when the average number of text fields per customer is greater, you retain more information by using the SVD, so either Option 1 or Option 2 could work best. With Option 2, however, interpreting the results of your modeling is more difficult, since the SVD dimensions do not have a clear interpretation. With Option 1, if the churn target is defined before you run the Text Topic node, Mutual Information weighting is applied automatically and a rotated SVD is calculated, making interpretation easier.
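At its core, the Mutual Information weighting mentioned above measures how strongly a term's presence co-varies with the target. A minimal sketch of the term-level calculation, assuming a toy set of labeled documents (this illustrates the general formula, not SAS Text Miner's exact implementation):

```python
import math

# Toy documents labeled with the churn target (1 = churned).
# Both the documents and the terms are illustrative only.
docs = [
    ("cancel my account now", 1),
    ("want to cancel service", 1),
    ("love the new features", 0),
    ("great support team", 0),
]

def mutual_information(term, docs):
    """MI between a term's presence/absence and the binary target."""
    n = len(docs)
    mi = 0.0
    for present in (True, False):
        for label in (0, 1):
            joint = sum(
                1 for text, y in docs
                if (term in text.split()) == present and y == label
            ) / n
            p_t = sum(1 for text, _ in docs if (term in text.split()) == present) / n
            p_y = sum(1 for _, y in docs if y == label) / n
            if joint > 0:
                mi += joint * math.log2(joint / (p_t * p_y))
    return mi

# "cancel" appears only in churn documents, so it carries more
# information about the target than a neutral term like "the".
print(mutual_information("cancel", docs) > mutual_information("the", docs))  # True
```

Terms that score high on this measure get heavier weights, which is why the resulting SVD dimensions line up better with the churn outcome.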
What about Twitter-specific data? Keep in mind that with the 140-character limit, there is likely a trade-off between power, precision, and level of aggregation (at least in many languages). It is very common to roll up tweets over a given time period or author to create an "author-aggregate" record. You may want to try various levels of aggregation: message, author, day, week, and so on.
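An author-by-week rollup of the kind described above can be sketched as follows; the tweets, dates, and handles are hypothetical, and ISO week numbers are used to define the weekly buckets:

```python
from collections import defaultdict
from datetime import date

# Hypothetical tweets: (author, date, text).
tweets = [
    ("@sam", date(2013, 3, 4), "Great service!!"),
    ("@sam", date(2013, 3, 6), "NOT!!!"),
    ("@sam", date(2013, 3, 14), "switching providers"),
    ("@kim", date(2013, 3, 5), "love this plan"),
]

# Roll tweets up to one record per (author, ISO year, ISO week).
by_author_week = defaultdict(list)
for author, day, text in tweets:
    iso = day.isocalendar()  # (year, week, weekday)
    by_author_week[(author, iso[0], iso[1])].append(text)

aggregated = {key: " ".join(parts) for key, parts in by_author_week.items()}
```

Swapping the grouping key changes the level of aggregation: key on author alone for author-aggregate records, or on (author, date) for daily records.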
And there are many, many cues in Twitter that have to be extracted from emoticons and "net-speak". Irony and sarcasm are likely to be good triggers for a churn event, as in "Great service!! NOT!!!" As you can see, even upper/lower case and punctuation are important. In addition to the text component of tweets, we also have the social network component: @-mentions, RT, and so on. This enables you to construct a social network, which in turn can be used as a predictor of churn.
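A minimal sketch of extracting the @-mention network from tweet text and computing a simple network metric (here, out-degree) that could be added to a churn data set; the tweets, the regex, and the choice of metric are all illustrative:

```python
import re
from collections import defaultdict

# Hypothetical tweets keyed by author handle.
tweets = [
    ("@sam", "RT @kim: great tips on saving data"),
    ("@kim", "@sam @lee thanks for the shout-out"),
    ("@lee", "Great service!! NOT!!! @acme_support"),
]

MENTION = re.compile(r"@\w+")

# Build a directed mention graph: author -> set of mentioned handles.
edges = defaultdict(set)
for author, text in tweets:
    for handle in MENTION.findall(text):
        if handle != author:  # ignore self-mentions
            edges[author].add(handle)

# Out-degree (how many distinct accounts an author mentions) is one of
# the simplest network metrics you might feed into a churn model.
out_degree = {author: len(targets) for author, targets in edges.items()}
print(out_degree["@kim"])  # 2
```

Richer metrics (in-degree, centrality, community membership) follow the same pattern: build the graph from the mention and retweet structure, then attach per-author scores to the modeling data set.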
In the current era, it is essential to use network metrics, if you can, to do state-of-the-art churn analysis.
For more information on Twitter analysis, read the paper, “How You Can Identify Influencers in SAS® Social Media Analysis.” And for other data processing considerations in Social Media, take a look at this whitepaper, “Sifting Through the Noise of Social Media”.
Are these ideas helpful to you? Please comment on your challenges and successes analyzing text comments for churn propensity.