Sharpening the Executive Edge with Text Analysis

~ With contributions from Elena Lytkina Botelho, Stephen Kincaid and Chris Trendler of ghSMART, and Beverly Brown, Pamela Prentice and Dan Zaratsian of SAS ~

 

A fascinating talk at the SAS Global Forum Executive Conference focused on text analytics, one of the newer weapons in the arsenal for analytic understanding.

Dr. Goutam Chakraborty, a marketing professor at the Spears School of Business, Oklahoma State University, described how he’s seen text-based insights expand knowledge beyond the numbers. He spoke of previously unknown facts that made companies more responsive and effective.

A practical step-by-step guide to applying SAS Text Analytics

His work shows that most analytical models are pretty good at predicting future scenarios and describing current conditions.  However, competitive pressures and dynamic market conditions leave little room for substantial improvement in existing algorithms.  Marginal gains are possible by tweaking existing algorithms, fine-tuning parameters, distributions and the like.

But sweeping advancements are likely when we incorporate new types of information into existing analytic paradigms.  Text data, for example.

In case studies from different industries, Dr. Chakraborty shared how quantitative returns jumped substantially when text analytics were applied to operational data.  For example:

  • Automating SMS text message classification and sentiment scores within mobile logistics applications reduced professional drivers’ response times.
  • Debt collection increased when call agents were armed with new intelligence from call center conversations.

Besides operational improvements, he also gave strategic planning examples. In one case, merging text-based insights with numeric data improved predictive accuracy of future conditions so intervention strategies were more effective. In another, fact-based understanding of reputation (by tracking the impact of controversial statements in social media) led to better social media strategy.

Text analytics extends existing analytic methods, answering questions such as:

  • Why is this happening?
  • What should we say?
  • Who needs to take action?

If the Q&A after Dr. Chakraborty’s talk was any indication, the audience agreed that text analytics could help them make better executive decisions.

Executives’ words reveal their fitness to lead

Did you know that text analytics can also help decipher what makes a good executive in the first place?

ghSMART is the elite consulting firm that helps CEOs lead at full power.  Based on their branded method of assessment (called SmartAssessments), along with their expertise in understanding human behavior, they answer the question: Who should run your business?  And now they, too, are seeing how text analytics can be used to advance insights.

In a collaborative project with SAS, some of the early findings of this study are indeed quite intriguing. Based on analysis of anonymized transcripts of candidate interviews and SmartAssessment ratings, we’ve found:

  • Lower-rated candidates used the term ‘mentor’ (and its stem variants: mentoring, mentored, etc.)

Initially, the team believed this to be counter-intuitive.  One of the benefits of text analysis is that you can dig deeper into the rationale of a result – and understand what is driving statistically significant numbers.  It turned out that the distinction was really that lower-rated candidates described their mentors or expressed wanting to be a mentor, whereas higher-rated candidates talked of being a mentor to others in the interview.

Frequency of terms associated with lower-rated candidates (left) and higher-rated candidates (right).

  • Candidates with a consulting background were statistically more likely to successfully transition directly into executive-level positions than those without consulting experience.

Here’s a helpful nugget for aspiring executives: Acquire broad experience from working with all aspects of a business (even if in a smaller company) prior to your CEO interview.

  • Lower-rated candidates frequently described some form of failure, setback, or disappointment throughout the interview.

Telling the truth is important.  Context is too.  What seems to matter in these preliminary results is how much the candidate focused on describing previous failures in relation to the length of the interview (specifically, the number of words in the full dialogue transcript).
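That normalization is straightforward to sketch. The following is a minimal, hypothetical illustration in Python; the failure-term list is invented for the example and is not the study's actual lexicon:

```python
import re

# Hypothetical failure-related terms; the real study's lexicon is not public.
FAILURE_TERMS = {"failure", "failed", "setback", "disappointment", "mistake"}

def failure_rate(transcript: str) -> float:
    """Share of words in a transcript that are failure-related,
    normalizing failure mentions by interview length in words."""
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in FAILURE_TERMS)
    return hits / len(words)

print(failure_rate("We had a setback early on but recovered quickly."))
```

The point of dividing by word count is that a single setback story in a long interview reads very differently from an interview dominated by them.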

Heat map (top) illustrating the frequency of ‘mentor’ is, on average, correlated with higher-rated candidates. The bar chart (below) shows that a higher mention of failures (on average) is less correlated with higher-rated candidates and more correlated with lower-rated candidates.

Actionable intelligence from analyzing text is helping organizations reduce risk, lower operational cost and inform both tactical and strategic decisions.  And some exciting new research suggests that text analytics can also help decide who should be at the helm of the organization.  Approaching the CEO role with a mentor’s mindset, for example, correlates with being a more effective leader, not just a successful one.

SAS and ghSMART continue to sift through interviews to see what sets Grade A executives apart from the rest. We look forward to sharing what we’ve learned later this year.


Predictive Coding - What's so predictive about it?

Recently, SAS announced support for White House efforts in the fight against patent trolls.  As indicated in the announcement, lawsuits filed by patent trolls cost innovators $500 billion in lost wealth from 1990 to 2010[1] - and are growing at an average rate of 22% a year.

Finding the right information in seas of documents is a challenge for many organizations, and patent search and litigation are no exception.  Legal organizations are awash in hard drives filled with reports, emails, communications and the like.

Which brings me to the topic of predictive coding.  In following some of the historic debate over whether this approach can help alleviate the burden of manual review, I’ve asked: Why was this called ‘predictive’ in the first place?  For the legal profession, a field founded on facts, predictive notions might even be downright scary. And ‘predictive’ doesn’t really describe what this text analytics method does to improve legal searches.

According to Wikipedia, a prediction is “a statement about the way things will happen in the future, often but not always based on experience or knowledge”.   Well, that’s not what predictive coding does.  This type of analysis uses computer software to analyze documents, with the goal of finding important or highly related content within existing material. There is nothing futuristic about it.

A patent troll

Predictive inference (in statistics) considers extending the characteristics of a sample to an entire population. Well, that’s not what predictive coding does either.  A document is examined to determine its membership in one or more topics, terms, themes or phrases, and a relevance score is defined that reflects the probability of membership.

In fact, defining the relevance of a document, or describing its membership in a fact, taxonomy or topic, falls within the well-established field of categorization. Categorization of content is a descriptive analysis method: putting text and documents into relevant buckets.  Descriptive analysis is different from predictive analysis; the first explains, while the second forecasts or projects.  And probabilities are different from predictions; asking whether something will happen is different from asking when it might happen. But ‘descriptive coding’, while perhaps more accurate, isn’t very catchy.  Established alternate names for this eDiscovery technique, such as ‘technology-assisted review’ or ‘computer-assisted review’, are more helpful in describing what it actually does.
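To make that distinction concrete, here is a toy sketch of categorization producing relevance scores. The topic seed terms below are invented for illustration; real categorizers use far richer linguistic rules and statistical models:

```python
import re

# Hypothetical topic profiles a reviewer might seed for a patent matter.
TOPICS = {
    "licensing": {"license", "royalty", "agreement", "fee"},
    "prior_art": {"patent", "prior", "art", "publication", "filing"},
}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def relevance(doc, topic):
    """Fraction of a topic's seed terms present in the document:
    a crude stand-in for a categorizer's membership probability."""
    return len(set(tokenize(doc)) & TOPICS[topic]) / len(TOPICS[topic])

doc = "The royalty fee schedule in the license agreement was disputed."
print({t: relevance(doc, t) for t in TOPICS})
```

Note that nothing here forecasts the future; the score merely describes how well an existing document fits an existing bucket.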

I’ve even gone so far as to interview lawyers on this topic. Their conclusion was that, for extremely high-volume cases, and as a method of triage for certain types of documents, computer-assisted review can be quite helpful.  The goal is to filter out materials that are unrelated to the case at hand.  Ideally, the remaining, potentially relevant materials are grouped into different topics, providing context. Then an intensive search exercise occurs to isolate pertinent documents. Still, nothing futuristic.

So one may ask: How do you predict from text data, or any kind of document for that matter?

Prediction from text happens once the text is numerically represented: structured in a way that retains the essence of its meaning, but described as numbers (like the presence or absence of a term).  There are very sophisticated ways to do this, and they are well defined in the field of text mining.  Once documents are numerically structured, they are in the format needed for predictive models, which can then test whether the terms, phrases, facts and themes are meaningful to future events.

For example:

Will customers leave in the future based on a dissatisfying experience that they had?

  • Say they’ve called into the 1-800 line and complained, or written emails. First you’d analyze the text to understand the issues. These ‘issues’ (whether topics, concepts, or even linguistic rules) are translated into structured representations (as new variables or taxonomies). In turn, these new elements are used as input to a churn model, which estimates the probability that the customer will leave at some time in the future.

When might a car no longer be roadworthy, given its history of repairs, age, use, etc?

  • Text mining of service notes for that make/model, warranty claims, reported issues and the like creates structured, numeric variables.  These new insights, along with other numeric information (like mileage), become inputs to a model that predicts when the vehicle is likely to fail.

When will demand for a product increase?

  • Monitor social media and identify the ‘buzz’ by crawling external information sources and extracting pertinent commentary. Use these identified elements, along with sales trend data, in a model to forecast when more demand is expected.

… the list goes on…
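The common first step in all of these examples is turning text into numeric features a predictive model can consume. A minimal sketch of that structuring step, using an invented vocabulary and documents (in practice the vocabulary is derived through text mining, not hand-picked):

```python
import re

def term_presence(docs, vocabulary):
    """Represent each document as 0/1 presence flags over a fixed
    vocabulary: the simplest numeric structuring of text."""
    rows = []
    for doc in docs:
        words = set(re.findall(r"[a-z]+", doc.lower()))
        rows.append([1 if term in words else 0 for term in vocabulary])
    return rows

vocab = ["refund", "cancel", "slow", "great"]  # hypothetical vocabulary
docs = [
    "I want a refund and may cancel my plan",
    "Great service, great speed",
]
X = term_presence(docs, vocab)
print(X)  # each row can now feed a churn model alongside numeric fields
```

Each row of the resulting matrix sits next to mileage, tenure, spend or any other numeric field as model input.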

I enjoyed the recent award-winning Law Technology News article, Predictive Coding Is So Yesterday, by Joel Henry.  I’d go a step further and say that it really never was (predictive, anyway).

Text mining is a well-established discipline, and as many of our customers know, it is a discovery process. Sound familiar? Based on the data, not human judgment, documents are classified with machine learning methods that identify clusters and topics, and can even create taxonomies or profile how a term changes over time.

Text mining is, however, only part of the electronic data discovery technology solution described by Joel Henry. Today, text mining can help remove the burden of manually developing training sets, and it provides a method for active learning, in which machine-generated categories learn from human feedback.

ESI in Joel Henry’s article stands for ‘electronically stored information’.  Having documents in electronic form is a requirement for any type of machine learning exercise.

SAS announced a commitment to converting 38 years of user documentation and technical papers to electronic form for IP.com, which, in turn, works with the US Patent and Trademark Office (USPTO).  With the documentation in electronic form, IP.com will be able to publish, aggregate and analyze technical documentation, helping USPTO efforts to reduce the burden of patent troll litigation.

The future is predicted to be very bright for organizations committed to stemming abusive patent business practices, as well as for those who are making use of advanced analytics to address big data burdens.


[1] Findings from Boston University, School of Law study : http://www.bu.edu/law/news/BessenMeurer_patenttrolls.shtml


Visualizing Superbowl Tweets with Text Analytics: Post-game Analysis

A few Superbowl tweets:

"Everything worked out well with the Super Bowl in NY/NJ except the game. What a shocker tonight #TBS_SuperBowl"

"Best #superbowl #halftimeshow ever with #brunomars and #redhotchillipeppers . What talent from bruno...now thats a performer."

"Someone turn off the lights to make the #SuperBowl interesting!"

In my previous post, Visualizing Superbowl Tweets with Text Analytics, I discussed the initial trends and insights found within Superbowl-related tweets.

Now that Superbowl XLVIII is officially in the books, I'd like to take the analysis a step further by showing additional visualizations. The graph shown below, Graph 1, shows the relationship between trending data-driven topics and associated hashtags. You'll notice a cluster for security, JC Penney, Esurance, "boring game," and others.

Graph 1: Network Graph - Relationship between Data-Driven Topics and Twitter Hashtags

SAS Text Analytics uses a combination of natural language processing and statistics to automatically discover these data-driven topics within the Superbowl XLVIII tweets. Depicted using SAS Visual Analytics, I list the top 15 topics, in no particular order, with some example tweets that correspond to the network graph shown above:

  1. Pre-game coin toss ("awkward!!", "coin toss #fail")
  2. Sympathy for Peyton Manning ("feel your pain Manning, great season")
  3. Boredom ("This game is boring", "I'm going to sleep", etc.)
  4. Superbowl security ("lol superbowl security broadcasting it's wifi name and password on tv http://t.co/aOPMDmUf9x")
  5. Sodastream (commercials with Scarlett Johansson)
  6. Best halftime show ever
  7. Bruno Mars ("so classy", "Bruno Mars wins the Superbowl!")
  8. Budweiser ("Puppy Love" commercial)
  9. Happy Seahawks Fans
  10. Godaddy ("best super bowl commercial was the muscle bound spray tan fanatics. hilarious #GoDaddy")
  11. Esurance (post-game commercial with $1.5M give-away)
  12. J.C. Penney (clever social media marketing)
  13. Disappointed Broncos fans ("Broncos??? Hello?? U there?? Wake up and actually score")
  14. Terrible commercials ("#disappointing commercials", "Superbowl and it's commercials - both terrible")
  15. Sodastream Fizz Football Challenge Sweepstakes

As many of us know, the Superbowl offers a once-a-year opportunity for companies to create brand buzz through social media channels (hopefully positive, but sometimes the buzz is negative).

J.C. Penney's clever approach led to a viral event. During the first half of the game, J.C. Penney's corporate Twitter account tweeted two typo-laden tweets,

  • "Toughdown Seadawks!! Is sSeattle doing toa runaway wit h this???"
  • "Who kkmew theis was ghiong tob e a baweball ghamle. #lowsscorinh 5_0"

At first, people responded with:

  • "#sbmktg101 in a cost cutting move, JC Penney hires a toddler to tweet during game" 
  • "JC Penny's tweeter must be a Broncos fan, drinking his sorrows"

Apparently that was part of J.C. Penney's marketing tactic, which they foreshadowed earlier in the game with a tweet mentioning mittens. After the controversial tweets, they tweeted, "Oops...Sorry for the typos. We were #TweetingWithMittens. Wasn't it supposed to be colder? Enjoy the game! #GoTeamUSA pic.twitter.com/e8GvnTiEGl." The end result, nicely stated in this tweet, "J.C. Penny paid $0 for two fake drunk tweets and now have more mentions than a $3 million commercial."

One of the pre-game trending topics on Twitter was "Omaha", which is Peyton Manning's code word that he uses just before the snap. The graph below, Graph 2, shows the volume of tweets related to Omaha or any mention of Manning's code word, including misspellings (believe it or not there are many misspellings of the word Omaha in the data, such as omah, omaah, omahaaaa).
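One simple way to catch such misspellings is fuzzy string matching. The sketch below uses Python's standard difflib; the similarity threshold is an illustrative choice, not the one used in this analysis:

```python
import re
from difflib import SequenceMatcher

def is_omaha_variant(word: str, threshold: float = 0.7) -> bool:
    """Flag likely misspellings of 'omaha' by string similarity.
    The 0.7 threshold is an illustrative assumption."""
    # Collapse repeated letters first, so "omahaaaa" becomes "omaha".
    word = re.sub(r"(.)\1+", r"\1", word.lower())
    return SequenceMatcher(None, word, "omaha").ratio() >= threshold

for w in ["Omaha", "omah", "omahaaaa", "Ohio"]:
    print(w, is_omaha_variant(w))
```

Collapsing repeated letters before scoring handles the enthusiastic "omahaaaa" style of misspelling, while the similarity ratio catches dropped letters like "omah".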

Graph 2: "Omaha" Trends on Twitter

There was even an over-under set at 27.5, so you could have bet on the number of times Manning would say the word. The Omaha Chamber of Commerce pledged $1,500 to Manning's charity each time he said "Omaha," and that success story is seen clearly in the second spike.

What else is being said about Omaha and how does it relate to the Broncos and the Seahawks?

Graph 3: Network Graph for "Omaha"

The network graph shown above, Graph 3, visualizes the relationship between Twitter hashtags and each team. You'll notice several mentions of the charity, marketing opportunities for OmahaSteaks, and even hashtags related to last year's blackout.

How can SAS Text Analytics and SAS Visual Analytics help you identify patterns and emerging trends within your data? Are you creating opportunities to promote your brand (like JC Penney)? Are you able to quickly identify critical events (like the leaked Superbowl wifi username and password)? How well are you able to extract and understand emerging topics within your data? Most importantly, how does this information enable you to take action and make smarter business decisions?

See how SAS Sports Analytics is helping teams and leagues optimize pricing models, improve marketing ROI, attract fans - and keep them coming back! Learn more later this month at the 2014 MIT Sloan Sports Analytics Conference.

I’d be interested to know what you think you could discover if you had these analysis capabilities in your organization. Write back and let me know.


Visualizing Superbowl Tweets with Text Analytics

In the days leading up to Superbowl XLVIII, there’s a unique opportunity to capture insightful trends and patterns within social media.

Much of text analytics involves analyzing customer conversations, whether the conversations exist within social media, emails, forums, blogs, survey responses, or call center transcripts.

These conversations, just like the Superbowl tweets, are time sensitive. What is relevant today may not be relevant in a month, a week, or within the next 24 hours (think viral events). Similarly, if you contact a customer one week after they express anger, you miss the window to intervene and incentivize your customer to stay with your organization.

Below are some of the current trends and insights based on Superbowl tweets from the past two weeks.

Graph 1:  Twitter volume over time for Denver (orange) vs Seattle (green). Also, what are the top hashtags and who are the most influential authors?

Graph 1: Overall Trends, Top Authors, and Top Hashtags

Graph 2:  Who is winning the “Twitter Superbowl” based on fan support?

Graph 2: Social Media Volume - Broncos VS Seattle

Graph 3:  Do fans mention the Seahawks or the Broncos within the context of winning? How about within the context of losing?

Graph 3: Social Media Volume - Winners VS Losers

Graph 4:  Where are the Seahawks and Broncos fans located?

Graph 4: Mapping Fans - Denver Broncos VS Seattle Seahawks

What does this have to do with your business? When analyzing text, there are a few key questions you may want to ask yourself:

Why are you analyzing text?

This question is fundamental, but is sometimes overlooked. Organizations know that they have all this textual data and need to be doing something with it, but often fail to define a solid objective that leads to ROI (More on ROI in upcoming blog posts).

  • Do you want to identify data-driven trends? (often seen in marketing and customer intelligence)
  • Are you looking for root cause or a needle in a haystack? (seen in fraud applications)
  • Do you need to extract entities or facts such as IDs, names, demographic information, etc.?
  • Are you using textual data to enhance your predictive models?
  • Do you want to identify key influencers around a given topic or event?

What topics or categories align to your business requirements?

It's important to approach this from two angles:

  1. Use a data-driven approach to identify naturally occurring topics based purely on the data. Text mining, clustering, and natural language processing all help to enhance the statistical discovery of topics.
  2. Provide your domain-knowledge into the model, through business rules, that target the categories and topics you are specifically interested in based on your business requirements.
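These two angles can be sketched side by side. In this toy example (invented tweets and rules), the data-driven pass surfaces frequent terms as candidate topics, while the knowledge-driven pass applies hand-written category rules:

```python
import re
from collections import Counter

tweets = [
    "best halftime show ever bruno mars",
    "this game is so boring",
    "boring game going to sleep",
    "bruno mars wins the superbowl",
]

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

# Angle 1: data-driven. Surface frequently occurring terms as
# candidate topics (real systems use clustering and NLP, not raw counts).
term_counts = Counter(t for tw in tweets for t in tokens(tw))
data_driven = [term for term, n in term_counts.most_common() if n > 1]

# Angle 2: knowledge-driven. Hand-written business rules targeting
# the categories you already care about.
rules = {
    "halftime": {"halftime", "bruno", "mars"},
    "boredom": {"boring", "sleep", "snooze"},
}

def categorize(tweet):
    words = set(tokens(tweet))
    return [cat for cat, terms in rules.items() if words & terms]

print(data_driven)
print([categorize(t) for t in tweets])
```

The data-driven pass can reveal topics you didn't anticipate (a leaked wifi password, say), while the rule-based pass guarantees coverage of the categories your business requirements demand.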

What data sources are you using (and how did you collect the data)?

Poor data collection methods lead to data quality issues and a large dataset with low relevancy. If you are collecting any data from online sources or 3rd parties, it’s important to understand the data collection process, filtering criteria, and queries, all of which could bias the data and introduce noise if not configured correctly.

  • What kind of web crawling techniques/tools are you using?
  • If you are using search terms to target and collect data, how did you choose these terms and are they limiting your results or introducing unnecessary noise?

What kind of action should the analysis elicit?

  • Do you need a dashboard to monitor trends, influencers and viral conversations?
  • Does your model trigger a promotional email, predict customer attrition, or flag a fraudulent event?
  • Can alerts help your social media team or call agents proactively reach out to customers with timely offers?

In the days leading up to the Superbowl, I will continue to update the analysis and give you insight into emerging trends and interesting findings. Please check out the software behind the analysis, SAS Text Analytics and SAS Visual Analytics.

Check out the Post-Game Analysis for more insights.

You can also download our whitepaper from last year's Superbowl or read Ken's recent post on measuring the economic impact of this year's Superbowl.


Behind the scenes in natural language processing: Is machine learning the answer?

When you think of the phrase “express yourself,” you may think of expressing your sense of style through your fashion decisions or home décor, but most of us probably think of expressing our thoughts, feelings, opinions, needs, desires, etc. through language. Whether writing or speaking, language helps us to connect our own inner reality with the external reality that we share with others. There is so much complexity wrapped up in each linguistic expression, that it is amazing we can get computers to do anything with language at all!

During my career, I have focused on language as seen from the perspective of a computer. When a computer looks at text, all it sees are strings of characters. It doesn’t even distinguish between different types of characters like letters, numbers, punctuation, or white space. We humans have to tell the computer how to recognize meaningful patterns and cues that identify constructs like words, sentences, and topics.

What we really want is for a computer to understand when a word is a meaningful concept alone, or whether other words are required for an object, action, or relationship to become clear. For example, the word 'couch' used as a noun has one meaning as a type of object in the real world, related to the action of sitting and part of a class of furniture:

Read More »


Seeing is Knowing – Going Beyond just Listening to the Voice of the Customer


~ with contribution from Cindy Turner

We’ve all heard that a picture paints a thousand words. What if you could see a thousand words from your customers? With that view, you could probably paint a hundred different pictures – reflecting different thoughts customers expressed, different perspectives on the same topics, the same perspective on different topics – get the picture?

With all the communication channels for customers, prospects, lurkers, competitors … it’s become a mainstay activity for modern businesses to listen to what their customers have to say. But when it comes to hearing the customer’s voice, there’s definitely more than meets the ear.

  • First there’s the volume. To paint a realistic picture of what’s being said, you might first need to hear (that is, collect) billions of data points.
  • Then there’s the variety. These days, data comes from direct customer feedback, survey verbatim, social media conversations, blogs, documents, call and maintenance notes, news articles and more. And a lot of it is in the form of unstructured text, which means it isn’t simple to sort out what’s relevant and what’s not, how it all relates and what it all means. Now you’re starting to listen.
  • And there’s velocity. As data is generated faster and faster, you not only have to be able to collect and manage it almost instantly – you have to be able to quickly structure it and evaluate it, both at an individual and a big-picture level. Because that’s the only way you’ll be able to respond with the type of instantaneous “I get it” and “This is what I can do for you” that your customers have come to expect.  Now you are saying “I understand.”

Imagine what kind of picture you could paint if you got a good look at that other 90% of the data, the unstructured text. The result just might be a corporate work of art.

How can you do it? With interactive text analysis visualization, you have a sure way to capture, explore and visualize all that customer data, allowing you to sharpen your focus on the things that matter most.

Think of it like this: text data is similar to an abstract painting.  It’s freeform data, big data, filled with plenty of errors: misspellings, abbreviations, different styles and so on. Text analysis helps reconcile all that variability by providing structure to the unstructured content. When you combine in-memory, high-performance visual analytics with text analytics, you wind up with an intuitive way to dynamically interact with, explore, investigate and see your text data.  It’s simply a better lens.

Now you can visually explore all your customer data to quickly investigate relationships, patterns and outliers. You can zoom in and reveal true customer sentiment right down to the detailed comment that was written, and discover the root causes underlying the customer voice. Now you can do something very different from what you were doing before.

And that says it all. You can use words to create pictures, and with these pictures you can take action based on your customers’ voice.  We’ve just released a white paper on this topic, illustrating how you can paint as many pictures as needed so you are: Seeing the Voice of the Customer. I encourage you to take a look.  So tell us, what have you always wanted to see from your text?


Big data is a big deal in government

Perhaps no other sector has more massive volumes of text data than government.  During the SAS Government Leadership Summit earlier this week, different agencies discussed big data topics, trends and techniques.  And while diverse areas of focus were apparent, shared themes emerged, which Juan Zarate, former Deputy Assistant to the President and Deputy National Security Advisor for Combating Terrorism, poignantly outlined as four baseline functions that summarize many of the conversations.

View from the Newseum terrace, location of the 2013 Government Leadership Summit

Baseline Function #1 – Access: How can we better access, format, and protect big data?

Speakers explored alternate techniques to improve content access with managed oversight.  One ongoing challenge is examining content silos from a comprehensive perspective.  One attendee stated that visibility across thousands of document sources would help reduce the over five billion dollars spent per year on third party contracts in one agency alone.  Knowing the content of the contracts, the associated services, and rates across all providers would give the means to spot duplication of effort and reduce extraneous costs.  Another shared a similar experience, using enterprise content categorization to classify documents across silos, layering the generated metadata on top of existing systems – and importantly, avoiding reinvention of any wheels. All agreed that having consistent commitment was a requirement (the OMB was mentioned) in order to be successful.

People mentioned information protection throughout the day.  Nate Silver, founder of FiveThirtyEight.com and author of ‘The Signal and the Noise’, reminded us that although the volume of data is increasing, ninety percent of working knowledge is not new.  One thing that is different today, he stated, is that the barrier to information sharing has been lowered.  And with these barriers down, privacy and civil liberty issues arise and will continue to be a topic for leadership.

Baseline Function #2 – Validation: How does big data change and shift for different users?

With different levels of confidence needed to support distinctive decisions, focus on the end use of big data is paramount.  Illustrating that decision-makers have specific needs from collected data, Juan Zarate posed: “What level of confidence does a border patrol officer need to take action?”  In legal matters there are well-known rules of evidence; in statistical analysis there are quantified rules of confidence; but with big data there is potential bias, particularly from social media sources.  As a result, situational awareness of how insight will be used was highlighted as critical in defining the level of confidence you need in the source.

Nate Silver indicated that one of the problems is that people are quite sectarian in what information they consume, and if there is bias in the data, then big data simply provides more options to delude yourself. To address this bias, he recommends assessing who is collecting the data, and understanding its purpose, amongst other things.  Denise Bedford, Goodyear Professor of Information Architecture and Knowledge Management, Kent State University, further clarified that we need ways to validate social media data to adequately assess the risks associated with any conclusions made from it.

Baseline Function #3 – Analysis: How can big data analysis help us think differently about what we do?

In analyzing big data, we need to be mindful of how we can think differently about problems and the actions we can take, asking questions like: What actions could we change if we had immediate use of data? How would that lead to greater efficiencies or preventative measures?  Social media, a popular topic at the event, holds value but needs to be grounded in the context of issues, and analysis of social media isn’t simply a function of terms.  Language, as Denise Bedford pointed out, is much richer than ten to twenty keywords.  Dave Wennergren, Assistant Deputy Chief Management Officer, Department of Defense, cautioned the audience not to get “too liquored up by the system”, and to focus consideration on the outcome or action.  In his keynote, Nate Silver offered guidance to those looking to analyze big data: big data may change how you communicate uncertainty and probabilities; there can be a tendency to overfit models; and, above all, “try and err” in your analysis process.  And resounding loud and clear throughout the day was the directive to critically examine which decisions would be better if they were made faster, and then frame analysis questions to improve those decisions.

Baseline Function #4 – Portrayal: How can we best absorb the insight provided by big data?

Recognizing that if you “expose the data, it will be used in ways not previously conceived”, as Dave Wennergren said, different types of data presentation are required at different stages of analysis.  The need for flexibility in the presentation layer used to examine big data was explored in a demonstration of Visual Analytics by Randy Guard, Vice President of Product Management at SAS.  Carter Hewgley, Director of FEMAStat at FEMA (the Federal Emergency Management Agency), described how in pre-event situations high-impact visualization is very important to analysis and evaluation, whereas post-event, reports and statistics are needed to assess effectiveness in addressing the needs of the communities assisted.

Big data perhaps ultimately provides us with the opportunity for reinvention. Juan Zarate described how he and his team redefined the Treasury’s role to provide unique value to Homeland Security investigations.  So too does big data provide the opportunity for every aspect of government to assess its unique contribution to better government. Policy makers and decision makers are looking to big data analytics to help determine the best possible, or at least the right, options for decisions.

More coverage of Nate Silver’s talk, and the Application of Text and Social Media Panel can be found at: AllAnalytics.com

Will you comment here with your baseline functions for big data?

Post a Comment

Get More Value From Promoter Scores

Some may remember that old shampoo commercial for Faberge Organics (with wheat germ and honey - I may add).  If someone likes the shampoo they’ll tell two friends, and so on, and so on (and yes, we all had bad hair in the 80s, in this life - or a previous one).

Rick Newberry, an education catalyst, has commented about the same network influence in education marketing. Telecommunications companies have begun to extend individual churn analysis (the propensity that someone will leave the provider) to include factors that reflect the churn propensity of others in their network, and thus their influence on an individual's churn likelihood.
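To make that network effect concrete, here is a minimal sketch of the idea (an assumed illustrative model, not a specific telecom or SAS method): an individual's churn propensity is blended with the average propensity of the people in their network, so a low-risk subscriber surrounded by likely defectors gets a higher adjusted score. The function name and the `influence` weight are hypothetical.

```python
# Illustrative sketch only: blend an individual's churn probability
# with the average churn probability of their network contacts.
def network_adjusted_churn(own_p, neighbor_ps, influence=0.3):
    """Weighted blend of own churn propensity and the network average.

    own_p       -- individual churn probability (0..1)
    neighbor_ps -- churn probabilities of contacts in the person's network
    influence   -- hypothetical weight given to the network effect
    """
    if not neighbor_ps:
        return own_p
    network_avg = sum(neighbor_ps) / len(neighbor_ps)
    return (1 - influence) * own_p + influence * network_avg

# A subscriber with low individual risk but high-risk contacts:
print(round(network_adjusted_churn(0.10, [0.8, 0.9, 0.7]), 2))  # 0.31
```

Even this toy version shows how an individual's score can move meaningfully once the network is taken into account.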

The mantra is common to any field or industry. And surveys that ask to what extent someone would recommend your product or service have been one vehicle for deciphering this: the net promoter score.

An example promoter score range derived from ranked survey responses

Essentially you conduct a survey asking the respondent if they are likely to recommend your product/service. You’ll learn from the results that some are, some aren’t and yet others are neutral on the topic, as illustrated.  You may do a bit more with it, but essentially that’s where the survey application ends.
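The arithmetic behind the standard net promoter score is simple: on a 0–10 "likelihood to recommend" scale, respondents scoring 9–10 are promoters, 0–6 are detractors, and 7–8 are passives, and the score is the percentage of promoters minus the percentage of detractors. A minimal sketch:

```python
# Standard Net Promoter Score calculation, assuming 0-10
# "How likely are you to recommend us?" survey responses.
def net_promoter_score(responses):
    promoters = sum(1 for r in responses if r >= 9)   # 9-10
    detractors = sum(1 for r in responses if r <= 6)  # 0-6 (7-8 are passives)
    return 100.0 * (promoters - detractors) / len(responses)

scores = [10, 9, 8, 7, 6, 3, 10, 5, 9, 2]
print(net_promoter_score(scores))  # 4 promoters, 4 detractors -> 0.0
```

Note that the passives drop out of the numerator entirely, which is exactly why the raw score alone tells you so little about *why* people feel the way they do.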

I’m pleased to tell you that - hot off the press - is a new white paper that describes an analytic process to get more from promoter scores.  Randy Collica, author of the book ‘Customer Segmentation and Clustering Using SAS Enterprise Miner’, outlines how to mine the unstructured text of the surveys, along with other customer inputs – like call notes, web chats, etc. – to predict promoter inclination for the complete customer population.  In fact, you've likely seen this before in other forms, using text analytics to assess the voice of the customer from all channels, and understand their degree of satisfaction.

But this paper takes it further. He then goes on to explain how econometric models can be used to quantitatively evaluate the impact of migrating customers from one level of promoter score to another.  This changes the tide, from assessing who is at risk, to proactively informing investment decisions designed to alter the customer portfolio.

I encourage you to download this white paper, and we'll be hosting Randy in a webinar this spring to talk more about it.

If you’d like to ask him anything about this work, please comment on this blog with what you'd like to know, and we'll add that to the webinar conversation.

Post a Comment

Unstructured data is “Too Big to Ignore”

Phil Simon’s latest book: “Too Big to Ignore: The Business Case for Big Data” takes the reader on a journey traversing our familiar relational information comfort zone to the current big data overload most are just beginning to wade through.

With case studies, frank descriptions and lots of references, this is a particularly helpful volume for those who have struggled to gain the attention of their senior management and get them onboard with analyzing text (and other unstructured) data. This book might even inspire those who treat data as “a four-letter word” (pg. 41). Just anonymously leave a copy on their desk.

In the early part of the book, Simon explains why we can’t ignore unstructured data anymore. Not only is there more of it but, for the most part, it is digital by its very nature so the delays in having it available have evaporated.

Those of us who have been involved in text analytics know full well that structured data only tells part of the story and, according to IDC, that part will only be 10% of the digital universe over the next decade[1].  By now we also know that existing IT infrastructure applications have been designed around data in order to consistently deliver a known outcome, such as the same calculated field, the same monthly report, etc.  Big data has changed (and will continue to change) this, and software, amongst other aspects of the technology food chain, continues to evolve accordingly.

New ways to examine and identify issues and critical trend monitoring based on text insights are now possible.  Imagine taking your categorized call center notes (that’s right, all of them) along with the sentiment scores associated with organizational performance, and creating an interactive picture that lets you drill into different sentiments, different times, for different customer types and for different topics.  You not only understand the voice of the customer, but you can see it.

You don’t have to simply imagine it, you can just do it.

In the diagram, the tiles at the top summarize the sentiment associated with different functional areas of a business, all gleaned from incoming customer communications, with red indicating negative sentiment. The corresponding timeline of overall sentiment illustrated at the bottom of the figure lets you pinpoint issues at different time periods. As a real-time application you can drill into details highlighted in this graph, filtering by customer type, geography, issue, predicted churn rate – whatever you need to understand – for all the data.

‘Too Big To Ignore’ describes the new infrastructure paradigms and the considerations to house and manage big data.   And as you know, there isn’t any point in keeping it unless you can derive insight from it. Luckily there is an app for that.  Have you tried the SAS® Visual Analytics demo?

More insights from Phil Simon are available on The Knowledge Exchange.

What aspects of the book do you like the most?

Post a Comment

Putting together the Text Puzzle

Are you currently monitoring social media conversations for customer feedback about your products or services?  Maybe you want to predict or uncover fraud based on emails, online forums, or chat sessions.  Or, if you’re like many of our customers, you’re inundated with call center notes and surveys, both of which contain valuable information about your customers and the products/services that you provide.

No matter your goal, each customer touch-point enhances your view, providing you with additional puzzle pieces & insights that may help to drive business decisions.

As an example, consider the following customer review:

(Line #1)  "I placed an order on your website 3 weeks ago."
(Line #2)  "The price was cheap and to my surprise it arrived 2 days ahead of schedule."
(Line #3)  "However, the batteries were missing and the headphones didn’t work."
(Line #4)  "When I called, I was placed on hold for 30 minutes."
(Line #5)  "When I finally did get through, the rep was very rude and wasn’t helpful."
(Line #6)  "I am going to return the mp3 player, and will never shop here again."

This is a fairly standard customer review. It contains both positive and negative feedback, describes multiple issues, identifies the product and/or service, and has an outcome.

There are a few key things to point out:

Each line adds intelligence about the customer and their situation. Re-read the review, leaving out line #3. Without this sentence, the customer has no reason to call and complain. Even more important, you have no factual information to make your product or service better. You only have a dissatisfied customer, seemingly because they were put on hold for an excessive time and had an unfavorable conversation with the call center rep.

Line #2 contains positive feedback for the company. However, basic sentiment analysis solutions may look at the word “cheap” and classify it as a negative, as in “cheap quality.” There's also no real positive keyword to let us know that “2 days ahead of schedule” is something positive. Having the ability to analyze words within context reinforces their true meaning and enables your business to get maximum insight from your data.
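That "cheap" pitfall is easy to demonstrate. Below is an illustrative-only sketch (the tiny lexicon and the context rule are assumptions, not how any particular product works): a naive keyword scorer counts "cheap" as negative, while one simple context rule, treating "cheap" as positive when it follows "price was," recovers the intended meaning.

```python
# Toy sentiment lexicon for illustration; real lexicons are far larger.
LEXICON = {"cheap": -1, "rude": -1, "missing": -1, "great": 1}

def naive_score(text):
    """Sum keyword polarities with no regard for context."""
    return sum(LEXICON.get(w, 0) for w in text.lower().split())

def context_score(text):
    """Same scoring, plus one context rule for 'cheap'."""
    words = text.lower().split()
    score = 0
    for i, w in enumerate(words):
        polarity = LEXICON.get(w, 0)
        # Context rule: "cheap" right after "price was" means a low
        # price, which is positive -- not "cheap quality."
        if w == "cheap" and words[i - 2:i] == ["price", "was"]:
            polarity = 1
        score += polarity
    return score

sentence = "The price was cheap and it arrived 2 days ahead of schedule"
print(naive_score(sentence))    # -1: misclassified as negative
print(context_score(sentence))  # 1: positive, read in context
```

Production text analytics handles this with linguistic rules and statistical models rather than hand-written lookbehinds, but the principle is the same: words only carry their true polarity in context.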

If you were the company receiving this feedback, you’d also like to know that the review discussed these 5 key areas:

-  Distribution (which was perceived as positive because of the quick shipping of the order)
-  Price (perceived as a positive because it is cheap)
-  Quality (perceived as negative because of the missing batteries and malfunctioning headphones)
-  Customer Service (perceived as negative because of time on hold and the rude and unhelpful customer service rep, even though “customer service” was not explicitly mentioned)
-  Reputation (perceived as negative because they “will never shop here again”)
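A crude approximation of that five-way breakdown can be sketched with simple keyword rules. This is a deliberately simplified illustration (the category cues below are assumptions made up for this example, not a product taxonomy), and note that it cannot infer "Customer Service" without an explicit cue the way a richer linguistic model can:

```python
# Hypothetical keyword cues mapping review text to the five areas above.
RULES = {
    "Distribution": ["shipping", "arrived", "ahead of schedule", "delivery"],
    "Price": ["price", "cheap", "expensive", "cost"],
    "Quality": ["missing", "didn't work", "broken", "defective"],
    "Customer Service": ["on hold", "rep", "unhelpful"],
    "Reputation": ["never shop here again", "recommend"],
}

def categorize(text):
    """Return the sorted list of areas whose cues appear in the text."""
    text = text.lower()
    return sorted(area for area, cues in RULES.items()
                  if any(cue in text for cue in cues))

review = ("The price was cheap and it arrived 2 days ahead of schedule. "
          "However, the batteries were missing. I was placed on hold for "
          "30 minutes and the rep was unhelpful. I will never shop here again.")
print(categorize(review))
# ['Customer Service', 'Distribution', 'Price', 'Quality', 'Reputation']
```

Keyword rules like these break down quickly on real data (synonyms, negation, misspellings), which is precisely where trained categorization models earn their keep.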

Why is this important to you? Because your results need to be accurate if you are going to make business decisions! The right software and technology enables intelligent extraction of relevant information. Ultimately, it doesn’t matter how much “big data” you have on your customers. If you cannot extract key information or if it's misclassified, you’re left with misinformed decisions, untapped value, and a puzzle that at best is yet to be finished – and at worst, is giving you the wrong picture. If content is king, then relevance is the kingdom.  Have you uncovered misclassifications using Text Analytics on your data? If so, we’d love to hear about your experiences.

Post a Comment