Get More Value From Promoter Scores

Some may remember that old shampoo commercial for Faberge Organics (with wheat germ and honey, I might add). If someone likes the shampoo, they’ll tell two friends, and so on, and so on (and yes, we all had bad hair in the 80s, in this life or a previous one).

Rick Newberry, an education catalyst, has commented about the same network influence in education marketing. Telecommunications companies have begun to extend individual churn analysis (the propensity that someone will leave the provider) to include factors that reflect the churn propensity of others in their network, and thus their influence on an individual's churn likelihood.

The mantra is common to any field or industry, and surveys that ask how likely someone is to recommend your product or service have been one vehicle for measuring it: the net promoter score.

An example promoter score range derived from ranked survey responses

Essentially you conduct a survey asking the respondent if they are likely to recommend your product/service. You’ll learn from the results that some are, some aren’t and yet others are neutral on the topic, as illustrated.  You may do a bit more with it, but essentially that’s where the survey application ends.
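For readers who want to see the arithmetic behind the score, here is a minimal sketch. It assumes the commonly used 0-10 "how likely are you to recommend" scale, with 9-10 counted as promoters, 7-8 as neutral and 0-6 as detractors; the function name and the sample ratings are made up for illustration and do not come from the white paper.

```python
def net_promoter_score(ratings):
    """Return percent promoters minus percent detractors for 0-10 ratings."""
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)   # assumed cut-off: 9 or 10
    detractors = sum(1 for r in ratings if r <= 6)  # assumed cut-off: 0 through 6
    # Neutral respondents (7-8) only affect the denominator.
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical survey responses
sample = [10, 9, 8, 7, 6, 3, 10, 9, 5, 8]
print(net_promoter_score(sample))
```

The score ranges from -100 (everyone is a detractor) to +100 (everyone is a promoter), which is why it is usually reported as a single number rather than three percentages.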

I’m pleased to tell you that, hot off the press, there is a new white paper that describes an analytic process to get more from promoter scores. Randy Collica, author of the book ‘Customer Segmentation and Clustering Using SAS Enterprise Miner’, outlines how to mine the unstructured text from the surveys, along with other customer inputs like call notes and web chats, to predict promoter inclination for the complete customer population. In fact, you've likely seen this before in other forms, using text analytics to assess the voice of the customer from all channels and understand their degree of satisfaction.

But this paper takes it further. He then goes on to explain how econometric models can be used to quantitatively evaluate the impact of migrating customers from one level of promoter score to another.  This changes the tide, from assessing who is at risk, to proactively informing investment decisions designed to alter the customer portfolio.

I encourage you to download this white paper, and we'll be hosting Randy in a webinar this spring, to talk more about it.

If you’d like to ask him anything about this work, please comment to this blog what you'd like to know, and we'll add that to the webinar conversation.

Unstructured data is “Too Big to Ignore”

Phil Simon’s latest book: “Too Big to Ignore: The Business Case for Big Data” takes the reader on a journey traversing our familiar relational information comfort zone to the current big data overload most are just beginning to wade through.

With case studies, frank descriptions and lots of references, this is a particularly helpful volume for those who have struggled to gain the attention of their senior management and get them onboard with analyzing text (and other unstructured) data. This book might even inspire those who treat data as “a four-letter word” (pg. 41). Just anonymously leave a copy on their desk.

In the early part of the book, Simon explains why we can’t ignore unstructured data anymore. Not only is there more of it but, for the most part, it is digital by its very nature so the delays in having it available have evaporated.

Those of us who have been involved in text analytics know full well that structured data only tells part of the story and, according to IDC, that part will only be 10% of the digital universe over the next decade[1]. By now we also know that existing IT infrastructure applications have been designed around data in order to consistently deliver a known outcome, such as the same calculated field, the same monthly report, etc. Big data has changed this (and will continue to), and software, amongst other aspects of the technology food chain, continues to evolve accordingly.

New ways to examine and identify issues and critical trend monitoring based on text insights are now possible.  Imagine taking your categorized call center notes (that’s right, all of them) along with the sentiment scores associated with organizational performance, and creating an interactive picture that lets you drill into different sentiments, different times, for different customer types and for different topics.  You not only understand the voice of the customer, but you can see it.

You don’t have to simply imagine it; you can just do it.

In the diagram, the tiles at the top summarize the sentiment associated with different functional areas of a business, all gleaned from incoming customer communications, with red indicating negative sentiment. The corresponding timeline of overall sentiment illustrated at the bottom of the figure lets you pinpoint issues at different time periods. As a real-time application, you can drill into details highlighted in this graph, filtering by customer type, geography, issue, predicted churn rate, or whatever you need to understand, for all the data.

‘Too Big To Ignore’ describes the new infrastructure paradigms and the considerations to house and manage big data.   And as you know, there isn’t any point in keeping it unless you can derive insight from it. Luckily there is an app for that.  Have you tried the SAS® Visual Analytics demo?

More insights from Phil Simon are available on The Knowledge Exchange.

What aspects of the book do you like the most?

Putting together the Text Puzzle

Are you currently monitoring social media conversations for customer feedback about your products or services?  Maybe you want to predict or uncover fraud based on emails, online forums, or chat sessions.  Or, if you’re like many of our customers, you’re inundated with call center notes and surveys, both of which contain valuable information about your customers and the products/services that you provide.

No matter your goal, each customer touch-point enhances your view, providing you with additional puzzle pieces & insights that may help to drive business decisions.

As an example, consider the following customer review:

(Line #1)  "I placed an order on your website 3 weeks ago."
(Line #2)  "The price was cheap and to my surprise it arrived 2 days ahead of schedule."
(Line #3)  "However, the batteries were missing and the headphones didn’t work."
(Line #4)  "When I called, I was placed on hold for 30 minutes."
(Line #5)  "When I finally did get through, the rep was very rude and wasn’t helpful."
(Line #6)  "I am going to return the mp3 player, and will never shop here again."

This is a fairly standard customer review. It contains both positive and negative feedback, describes multiple issues, identifies the product and/or service, and has an outcome.

There are a few key things to point out:

Each line adds intelligence about the customer and their situation. Re-read the review, leaving out line #3. Without this sentence, the customer has no reason to call and complain. Even more important, you have no factual information to make your product or service better. You only have a dissatisfied customer, seemingly because they were put on hold for an excessive time and had an unfavorable conversation with the call center rep.

Line #2 contains positive feedback for the company. However, basic sentiment analysis solutions may look at the word “cheap” and classify it as a negative, as in “cheap quality.” There's also no real positive keyword to let us know that “2 days ahead of schedule” is something positive. Having the ability to analyze words within context reinforces their true meaning and enables your business to get maximum insight from your data.

If you were the company receiving this feedback, you’d also like to know that the review discussed these 5 key areas:

-  Distribution (which was perceived as positive because of the quick shipping of the order)
-  Price (perceived as a positive because it is cheap)
-  Quality (perceived as negative because of the missing batteries and malfunctioning headphones)
-  Customer Service (perceived as negative because of time on hold and the rude and unhelpful customer service rep, even though “customer service” was not explicitly mentioned)
-  Reputation (perceived as negative because they “will never shop here again”)
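To make the point about “cheap” concrete, here is a toy sketch of context-dependent polarity. It is not how any particular product works; the lookup table, the window size and the function name are assumptions chosen purely for illustration. The idea is simply that the same word gets a different polarity depending on which aspect word appears near it.

```python
import re

# Hypothetical lookup: polarity of an ambiguous word depends on the aspect nearby.
CONTEXT_POLARITY = {
    ("cheap", "price"): "positive",
    ("cheap", "quality"): "negative",
}


def polarity_in_context(sentence, word, window=3):
    """Return (aspect, polarity) pairs for each use of `word` near a known aspect."""
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok != word:
            continue
        # Look a few tokens to the left and right of the ambiguous word.
        nearby = tokens[max(0, i - window): i + window + 1]
        for (w, aspect), polarity in CONTEXT_POLARITY.items():
            if w == word and aspect in nearby:
                hits.append((aspect, polarity))
    return hits


line_2 = "The price was cheap and to my surprise it arrived 2 days ahead of schedule."
print(polarity_in_context(line_2, "cheap"))
print(polarity_in_context("The headphones felt like cheap quality.", "cheap"))
```

A keyword-only approach would score both sentences the same way; even this crude window check separates them, which is the behavior you want from a real contextual analysis.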

Why is this important to you? Because your results need to be accurate if you are going to make business decisions! The right software and technology enable intelligent extraction of relevant information. Ultimately, it doesn’t matter how much “big data” you have on your customers. If you cannot extract key information or if it's misclassified, you’re left with misinformed decisions, untapped value, and a puzzle that at best is yet to be finished and at worst is giving you the wrong picture. If content is king, then relevance is the kingdom. Have you uncovered misnomers using Text Analytics on your data? If so, we’d love to hear about your experiences.

Analyzing Social Media Data and Saving Babies

~ Contributed by Susan Quinn ~

I live in North Texas.  The summer months are hot, and we typically see a surge in the number of creepy crawlies and flying insects…especially mosquitos.  This year was different.  This year those mosquitos carried West Nile virus.   As of this writing, the virus has killed over 200 people, and infected thousands more.  A few days ago, the first case was reported to have spread all the way to Maine.

The spread of West Nile virus in the United States

Scientists and government agencies are struggling to determine how the virus spreads over long distances and whether it is related to mild weather or global warming. In the meantime, local officials turned to social media, as well as traditional media, to remind the public to eliminate the standing water that is the mosquitos’ breeding ground.

Officials also used social media channels to tell the public when overhead spraying would take place, neighborhood by neighborhood. Social media turned into a better source of this information, as plans changed by the minute because of weather conditions. Individual citizens relied on social media, resulting in hundreds of thousands of tweets about the virus. They communicated what was happening in their neighborhood, and whether a family member had contracted the disease. They shared details of their treatment options and how the disease was progressing. They expressed fear about the long-range impact of the overhead spraying on their children’s health. They also asked for, shared, and received information on what they needed to do to stay safe.

It is now December. The weather is turning cooler here in North Texas, and the insect populations have started to decrease. But we wait and wonder what next year’s warm weather will bring and how we can do a better job of protecting our children and the elderly. The officials tell us that our best hope is for a good freeze this winter.

Now, imagine a world where scientists could use social media data to detect early warnings of local disease outbreaks, predict their spread, and put preventative measures in place to save lives.   Imagine if they could relate what is happening in social media to other data and better predict events.  That is exactly what many government agencies are looking to do by using text analytics to monitor social media discussion. This article titled "Canary in a data mine: How analytics detects early signs of bio threats" is just one example of how SAS is working with government entities to analyze social media data.

Most of us know by now that businesses analyze social media to better understand what consumers are saying about their brand. Social media is a valuable source of data because individuals discuss emerging events, create and share content and opinions, and openly discuss what is happening in their daily lives. This organic spread and growth of information can provide tremendous insights about our world and its people. And when it is carefully analyzed and responsibly used, it can improve the human condition.

National and local governments are realizing that by listening to social media, they can better serve their constituents and better accomplish their missions. In fact, compared to traditional sources of information, social media provides a near real-time source of ground information, which enables these organizations to more quickly get resources in place to better manage emerging events.

National governments can listen to social media to get early warning of disease outbreaks, help fight terrorism, counter drug-trafficking, and monitor the marketplace for fraud.  They can use social data for more accurate situational analysis to detect geographic micro-spots where natural disasters have had the most impact or where refugees have fled due to human violence… and therefore they can more effectively deploy humanitarian aid to vulnerable populations.  At a more macro level, they can use social data to better predict future economic conditions and take proactive actions to minimize population hardships.

Local governments can monitor for signs of riots or crime, and prevent violence. For example, the New York Police Department has found that they can combat teen violence fueled by insults and dares traded on social media. Gang members will openly boast on social media about their intent to encroach on another gang’s turf or “diss” another gang member. And now, the NYPD monitors those tweets and uses the information to deploy resources and curb that violence.

As I think about how governments are using text analytics on social media, I realize how different this world is from the one I come from. You see, I am from the commercial business world. And in that world I have heard business colleagues use the phrase, “Hey, we are not saving babies here”. This is usually said in an attempt to put a stressful business problem in perspective.

Over the past year, my colleagues and I have had the opportunity to work with many government agencies.  And we have seen that the same great solutions that SAS has to solve business problems can also be deployed by those organizations whose missions really do encompass improving the lives of people… and in effect 'save babies'.  I think it is one of the many exciting, interesting, and rewarding things about working at SAS.

What are some other examples where the analysis of social media data has helped non-business entities accomplish their missions of societal benefit?

Analyzing Twitter for churn propensity

With contributions from David Ogden & Barry deVille

Adding Text Analytics to your predictive modeling has proven to result in better predictive models, including those related to customer churn. Once text data (e.g. call center comments, website feedback forms, blogs, emails, Twitter) is converted to structured variable(s) and included in your data mining data set, you can expect greater model lift, and richer insight regarding the underlying causes of why a customer might leave. If there is more than one comment from a customer (let’s say multiple Twitter comments), customer id is no longer unique, because there are repeated rows of input where only the text field is different.

Some pre-processing of the data is needed to address multiple text entries for a single person. Three options are described below:

Option 1:

  • Create two data sets. One would have a single row for each customer with all variables except the text field.  The other would have one row for each comment; with a variable for the comment + the target variable (churn) + the customer id in each row.
  • Apply the following SAS Text Miner nodes to the second data set: Input Data Source -> Text Parsing -> Text Filter -> Text Topic (making any changes you wish to the default settings), and run.   The Text Topic node is creating topics using rotated SVDs (singular value decompositions) from the text data.
  • Use a SAS code node after the Text Topic node, to apply the SAS Summary procedure to the results, which will create a single observation for each customer id that includes the summary information for each raw topic variable (done using the max or some other quantile such as median or 75% quantile, for the structured text data).
  • Then simply merge the results with the first data set by customer id
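The summarize-then-merge step in Option 1 can be sketched outside of SAS as follows. This is only an illustration of the data shaping, not the Text Topic node or the Summary procedure themselves: the topic scores, column names and helper functions below are hypothetical, and `max` is used as the summary statistic because it is the one suggested above.

```python
from collections import defaultdict

# Hypothetical per-comment topic scores: one row per comment.
comment_topics = [
    ("c1", {"topic_billing": 0.8, "topic_coverage": 0.1}),
    ("c1", {"topic_billing": 0.2, "topic_coverage": 0.6}),
    ("c2", {"topic_billing": 0.0, "topic_coverage": 0.3}),
]

# Hypothetical customer-level table: one row per customer, no text field.
customers = {
    "c1": {"tenure_months": 14, "churn": 1},
    "c2": {"tenure_months": 40, "churn": 0},
}


def summarize_by_customer(rows, agg=max):
    """Collapse repeated rows to one summary value per customer and topic."""
    grouped = defaultdict(lambda: defaultdict(list))
    for customer_id, scores in rows:
        for topic, score in scores.items():
            grouped[customer_id][topic].append(score)
    return {
        cid: {topic: agg(values) for topic, values in topics.items()}
        for cid, topics in grouped.items()
    }


def merge_by_customer(customer_rows, summaries):
    """Attach the summarized topic variables to each customer row."""
    return {
        cid: {**row, **summaries.get(cid, {})}
        for cid, row in customer_rows.items()
    }


modeling_table = merge_by_customer(customers, summarize_by_customer(comment_topics))
print(modeling_table["c1"])
```

Passing a different `agg` (for example `statistics.median`) swaps in another summary statistic without changing the rest of the flow, and customer id is unique again in the merged table.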

Option 2:

  • Similar to Option 1 – creating the two data sets
  • Applying the same flow, but instead of using the Text Topic node (wherein the text may belong to more than one topic), you can use the Text Cluster node (wherein the text has exclusive membership to a single cluster).
  • In the Text Filter node step, it is suggested that you use the Mutual Information term weight property, so that the term weights are based on correlations with the target variable, churn.
  • You could merge your target variable onto the second dataset (the one with multiple records per individual), generating the SVDs based on the Mutual Information weighted terms, and then summarizing the SVD values by individual (perhaps using the max).

Option 3:

  • You could also “solve” this issue by concatenating the text, a simple operation in SAS. By putting all the comments associated with a single individual into one long string, a single observation would then represent all the twitter comments from that individual.
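For comparison, the concatenation in Option 3 amounts to very little code. The sketch below uses hypothetical customer ids and comments, and assumes the rows arrive in the order in which the comments should be joined.

```python
from collections import defaultdict


def concatenate_comments(rows, separator=" "):
    """Join all comments for each customer into one long string."""
    by_customer = defaultdict(list)
    for customer_id, comment in rows:
        by_customer[customer_id].append(comment.strip())
    return {cid: separator.join(comments) for cid, comments in by_customer.items()}


# Hypothetical input: repeated customer ids, one comment per row.
tweets = [
    ("c1", "Dropped call again."),
    ("c2", "Love the new plan"),
    ("c1", "Thinking of switching. "),
]
docs = concatenate_comments(tweets)
print(docs["c1"])
```

Note that the result keeps no record of when each comment was made, which is exactly the loss of time information discussed next.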

These options have unique benefits, but under different data conditions. Option 3 is the simplest solution. However, it has the drawback that you may lose the sense for how comments over time impact likelihood of future churn, and it will not represent the documents as well as the first two methods do if you often have multiple comments per user. So, if the cardinality is small, i.e. when your average customer only has one text field, and only occasionally you see two or three per person, then this may be the best solution.

However, when the average number of fields for each customer is greater, you retain more information by using the SVD, so either Option 1 or Option 2 could work best.  With Option 2, however, interpreting the results of your modeling is more difficult since the SVD dimensions would not have a clear interpretation.   And with Option 1, if you have the churn target defined prior to running the Text Topic node, Mutual Information weighting is automatically applied – and a rotated SVD is calculated, making interpretability easier.

Twitter-specific data? Well, you also need to keep in mind that with the associated 140-character limit, there is likely a trade-off between power, precision and level of aggregation (at least in many languages). It is very common to roll up tweets over a given time period or author to create an “author-aggregate” record. You may want to try various aggregations: message, author, day, week, and so on.
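One way to experiment with those aggregation levels is sketched below. The tuple layout, the ISO-formatted timestamps and the `rollup` function are assumptions for illustration; day and week roll-ups here are kept per author, which is one reasonable reading of an “author-aggregate” record over a time period.

```python
from collections import defaultdict
from datetime import datetime


def rollup(tweets, level="author"):
    """Concatenate tweet text at the chosen aggregation level."""
    groups = defaultdict(list)
    for author, timestamp, text in tweets:
        ts = datetime.fromisoformat(timestamp)
        if level == "message":
            key = (author, timestamp)
        elif level == "author":
            key = (author,)
        elif level == "day":
            key = (author, ts.date().isoformat())
        elif level == "week":
            iso = ts.isocalendar()  # (ISO year, ISO week number, weekday)
            key = (author, f"{iso[0]}-W{iso[1]:02d}")
        else:
            raise ValueError(f"unknown level: {level}")
        groups[key].append(text)
    return {key: " ".join(texts) for key, texts in groups.items()}


# Hypothetical tweets: (author, ISO timestamp, text)
sample_tweets = [
    ("ann", "2012-11-05T09:15:00", "Great service!! NOT!!!"),
    ("ann", "2012-11-07T18:02:00", "Still no signal"),
    ("bob", "2012-11-12T08:00:00", "Happy with my plan"),
]
weekly = rollup(sample_tweets, level="week")
print(weekly)
```

Running the same modeling flow on each level and comparing lift is a simple way to see where the trade-off between precision and aggregation falls for your data.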

And there are many, many cues in Twitter that have to be extracted from emoticons and “net-speak”. Irony and sarcasm are likely to be good triggers for a churn event, as in “Great service!! NOT!!!” As you can see, even upper/lower case and punctuation are important. In addition to the text component of tweets, we also have the social network component: @, RT and so on. This enables you to construct a social network and this, in turn, can be used as a predictor of churn.
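As a rough sketch of the network idea, the snippet below turns @-mentions (including those inside retweets) into weighted edges and counts how often each account is mentioned. The regular expression, function names and sample tweets are illustrative assumptions; a production analysis would use richer network metrics than a simple in-degree count.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Count (author, mentioned account) pairs across all tweets."""
    edges = Counter()
    for author, text in tweets:
        for target in MENTION.findall(text):
            if target.lower() != author.lower():  # ignore self-mentions
                edges[(author.lower(), target.lower())] += 1
    return edges


def in_degree(edges):
    """Total number of mentions received by each account."""
    degree = Counter()
    for (_, target), weight in edges.items():
        degree[target] += weight
    return degree


# Hypothetical tweets: (author, text)
network_tweets = [
    ("ann", "RT @carrier_help: outage in Raleigh"),
    ("bob", "@carrier_help @ann still down?"),
    ("ann", "@bob yes, thinking of leaving"),
]
edges = mention_edges(network_tweets)
print(in_degree(edges).most_common())
```

Per-customer values derived from such a graph (for example, how many of a customer's contacts have already churned) can then be added to the modeling table as ordinary structured inputs.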

In the current era, it is essential to use network metrics – if you can – to do state of the art churn analysis.

For more information on Twitter analysis, read the paper, “How You Can Identify Influencers in SAS® Social Media Analysis.” And for other data processing considerations in Social Media, take a look at this whitepaper, “Sifting Through the Noise of Social Media”.

Are these ideas helpful to you?  Please comment on your challenges and successes analyzing text comments for churn propensity.

‘That’ Meatloaf Won’t Do

A few weeks ago my friends and I were at a karaoke night and a married couple put in for Meatloaf's "I Would Do Anything for Love (But I Won't Do That)" simply because it had both male and female parts (and likely because they’d gotten sick of the duet "Time of My Life" from “Dirty Dancing”). My friend and I responded to this song choice by debating what the 'that' is in the song title. So instead of listening to the song, we were hunched over an iPad reviewing the lyrics.

Now figuring out what Meatloaf is singing about may not immediately scream "critical business problem" to most people, but it parallels a problem that comes up more often than not in corporate communication: ambiguous pronouns. Let's take the above paragraph for example. In the last sentence I used the pronoun 'we.' Based upon the story, the average reader can work out that it refers to my friend and me. By grammatical convention, the pronoun matches the most recent personal noun phrase; that would be 'my friend and I' and not 'the married couple.'

To a computer though, it simply appears that someone named 'we' was looking at an iPad. And there are two possible candidates for 'we.' The question becomes how to match the pronoun to the correct object. With the 12.1 release of SAS Text Analytics we've enhanced pronoun resolution, so computers can more easily match the right pronoun to the right concept of interest. There are two new features that help with this. The first involves much greater entity resolution capabilities, especially in SAS Enterprise Content Categorization. Content Categorization can now connect to a website that lists available entities and cross-reference them in documents. The second is a new Boolean operator, UNLESS, which associates pronouns with the closest noun that could match the pronoun.

In insurance and legal matters there are, fortunately, well-written documents that make this easier. A colleague was recently doing some work with OSHA data around determining the facts of an accident. So in the following passage, for example:

"The butcher was passing the baker's rack to the candlestick maker. The rack fell on the candlestick maker. He broke his nose."

To a computer the word "he" is simply a word. As humans, we automatically assign it to the candlestick maker (assuming grammatical correctness). Because of that, we know that the candlestick maker broke his nose, not the butcher. We can assign the broken nose to the candlestick maker instead of "he." Plus, with our new entity extraction tools, we know that "candlestick maker" is the full entity, not just the word "maker." If you're an insurance company, for example, you want to be confident you assign fault to the proper party. Historically, this would require reading the documents because of the frequent need for pronoun resolution, but now you can run it on a laptop and save tremendous amounts of time and human energy.

The above is a fairly simple case. One of the advances we've made at SAS is to help determine when pronouns switch from one actor to the next. Take the following review for example:

Pinkie Pie is the best! She's awesome and pink! Forget about Fluttershy. She's too quiet.

We can figure out that in this informal review the writer really likes Pinkie Pie and isn't the biggest fan of Fluttershy from My Little Pony: Friendship is Magic. We can go further and figure out why exactly the reviewer likes one character over the other. The first "she" we see resolves to Pinkie Pie and, with our entity extraction and pronoun resolution capabilities, we know that Pinkie Pie is some type of "she." Even though Pinkie Pie doesn't fit into any conventional name list, our integration with DBPedia lets us know it's a type of My Little Pony (and a girl). From there we can tell that Pinkie Pie is awesome.

Now we've also got a second "she" thrown into this review. With the new UNLESS operator SAS Sentiment Analysis and Content Categorization determine when a new "she" hits the scene --- in this case, Fluttershy. Because SAS Sentiment Analysis already has the ability to analyze features of products, we could parse out the sentiment around both My Little Ponies and find out why this reviewer finds Fluttershy negative (too quiet).
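To give a feel for the "closest matching noun" idea, here is a deliberately naive sketch. It is not the UNLESS operator or any SAS implementation: it assumes the entities have already been extracted and handed in as a list, ignores gender and grammar entirely, and simply picks whichever entity was mentioned most recently before each pronoun. The function name is made up for illustration.

```python
import re


def resolve_pronouns(text, pronoun, entities):
    """For each occurrence of `pronoun`, return the most recently mentioned entity."""
    lowered = text.lower()
    resolved = []
    pattern = rf"\b{re.escape(pronoun.lower())}\b"
    for match in re.finditer(pattern, lowered):
        best_entity, best_pos = None, -1
        for entity in entities:
            # Last mention of this entity that ends up before the pronoun.
            pos = lowered.rfind(entity.lower(), 0, match.start())
            if pos > best_pos:
                best_entity, best_pos = entity, pos
        resolved.append(best_entity)
    return resolved


accident = (
    "The butcher was passing the baker's rack to the candlestick maker. "
    "The rack fell on the candlestick maker. He broke his nose."
)
review = "Pinkie Pie is the best! She's awesome and pink! Forget about Fluttershy. She's too quiet."

print(resolve_pronouns(accident, "he", ["butcher", "baker", "candlestick maker"]))
print(resolve_pronouns(review, "she", ["Pinkie Pie", "Fluttershy"]))
```

The heuristic happens to get both passages right, but it would fail as soon as an object or a person of the wrong gender was mentioned in between, which is why real pronoun resolution needs the entity typing described above.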

SAS goes beyond statistics to help businesses figure out more precisely what's going on in their data. In this case, SAS has done it to help answer such serious questions as "What is the 'that' which Meatloaf will not do for love?"

Feeling Presidential

I give the patch of red-green crabgrass at my feet a couple of small kicks, and the wind picks up just enough to make the flag pop a few times before it falls to rest again.  Four or five spots ahead of me, a young mother bends over her stroller to make sure the blanket around her child is tucked and keeping him warm.  No one around is terribly loud, so I can hear her joke: “If this line doesn’t speed up, my son might grow up and hate democracy.”

If the brisk chill then were not evidence enough, the pumpkins sitting out front of the library doors or the piles of leaves across the street would convince you.  You might understand that Election Day is finally here, but my mind and body know that this is Fall in North Carolina.  While both statements are factual, only the latter contains that mix of physical experience, memory, and implications that make it a part of who I am.  Election Day is marked on a calendar, but the smells, the sound of leaves blowing across the ground, the sight of pumpkins, and feel of Fall air make up a stronger bond of association.

Yes, I do have a point.  Colleagues of mine have written about using unstructured data to predict elections, but as a hobby, I’ve been paying more attention this year to the mechanics of language throughout this race.  Political scientists understand that votes are often cast based on a similar nuance to the calendar/belief example above - that a particular feeling may determine which bubble we’ll fill in on the ballot - whether it is a feeling about the candidates themselves, the political party that they represent, or an issue that we feel to be important.  If we feel that we identify with a particular party, then we may cast a vote based on party alone.  And it is very common that the set of experiences that are ingrained in us will determine how we view the choice of candidates.  It’s the language of feeling, rather than the language of fact, that resonates the strongest when it comes to voting.

This may help explain why a significant proportion of the social chatter following the Vice Presidential debate was about Vice President Biden.  While he was certainly the more animated figure, his expressions and laughter caused reactions that were largely split along party lines.  If you related to him (or his party or ideas), you may have viewed this as confidence or a pleasant disposition.  If you related more to Representative Ryan, you may have viewed Biden’s behavior as rude instead.  Same fact, different perceptions.  This demonstrates the need to have social listening tools that reflect patterns of both facts and perceptions, and in this context a ‘smirk’ may mean something very different than a ‘smile’.  The ability to capture nuance such as this enables a much finer association analysis.  After all, the words that a politician uses are usually chosen carefully.  If it were not Big Bird but Oscar the Grouch that became a symbol in the election, wouldn’t that take on a different meaning altogether?

In an election that has largely been characterized by strong polarization, much of the sentiment analysis has served the effect of characterizing the base of each party. This does allow us to start forming “issue” vectors and to cluster authors, as well as track the relative positive (green) and negative (red) chatter over time to spot patterns and differences, such as the morning spikes seen for Romney (below) that represent mini-campaigns, first negative and later positive.

Barack Obama - Overall

Mitt Romney - Overall

One major goal of the political campaigns, however, was to target independents or undecided voters, especially in swing states.  In technical terms, the ability to find the clusters from those populations after projecting onto the “issue” vector space identified in the high level analysis, would allow the campaigns to understand the hot issues, what positions more independents or undecided voters tend to side with, and therefore point the campaigns toward what messages to send and along what channels.

Apart from last minute jokes or campaigns, one of the major themes in social media around the election that is being shared today is about election fatigue.  After months of advertising, messaging, calls, polls, and omni-present election coverage in the news, we’re all a little anxious to know the results and hopefully – no matter what the outcome – begin working together for a better America.

How did you feel about the language used in ads and by the candidates?  Did you consider a particular meme (e.g. an empty chair) or message something that might be powerful, whether or not you supported that side?

Active Learning: Model-building via man-machine cooperation

“Forty-two”, said Deep Thought, with infinite majesty and calm. “The Answer to the Great Question, of Life, the Universe and Everything.... I checked it very thoroughly," said the computer, "and that quite definitely is the answer. I think the problem, to be quite honest with you, is that you've never actually known what the question is."  ---- Hitchhiker’s Guide to the Galaxy

Much of the time when I analyze a set of data, I really don’t know for sure what I am looking for.  I am hoping that with the right sort of analysis, I will stumble across some interesting result that will justify the effort expended.   I have my fingers crossed that the analysis will tell me something useful.

At those times I am often reminded of the adage garbage-in, garbage-out. I get an answer, but removed from context it tells me little or nothing (i.e. I don’t know the question!). I have often wished that I could interact with the machine learning algorithm, combining my real-world knowledge with the computer’s speed: looking through thousands of documents in the blink of an eye, but allowing me to pass judgment on the few documents that could use the “human touch”.

The new release of SAS Text Miner includes a Text Rule Builder node that provides just this capability. The Text Rule Builder automatically discovers sets of rules composed of presence or absence of terms for a categorical target. Unlike many machine learning algorithms, these rules are descriptive, so the user can immediately judge whether they pass the smell test.
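To show what a rule "composed of presence or absence of terms" looks like in practice, here is a small sketch. The rule format, labels and terms are invented for illustration and are not the node's actual output; the sketch only applies hand-written rules, whereas the node discovers them from the data.

```python
def rule_matches(rule, tokens):
    """True if every 'present' term occurs and no 'absent' term occurs."""
    return (
        all(term in tokens for term in rule["present"])
        and not any(term in tokens for term in rule["absent"])
    )


def classify(text, rules, default=None):
    """Return the label of the first rule that matches the text."""
    tokens = set(text.lower().split())
    for label, rule in rules:
        if rule_matches(rule, tokens):
            return label
    return default


# Hypothetical rules for call center notes
call_rules = [
    ("billing", {"present": ["charge"], "absent": ["card"]}),
    ("lost_card", {"present": ["card", "lost"], "absent": []}),
]

print(classify("I lost my card yesterday", call_rules))
print(classify("there is a charge i do not recognize", call_rules))
```

Because each rule reads as a plain statement about which words are there and which are not, a business user can inspect it directly and decide whether it makes sense.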

But there is an additional element to this node.  With a single click, you can see a list of the documents that are either unclassified or which the model thinks are incorrectly classified in the data, sorted by the model’s confidence.   You can then change the target values singly or in groups.  Once you are satisfied with your changes, you can regenerate an improved model.   This process of querying the user is known as active learning (cf. http://en.wikipedia.org/wiki/Active_learning_(machine_learning) )
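The review loop described above can be sketched in a few lines. The record layout and function names are hypothetical, and sorting with the most confident predictions first is an assumption about how you might want to prioritize the review, not a description of the node's interface.

```python
def review_queue(documents):
    """Ids of unlabeled or disagreeing documents, most confident predictions first."""
    candidates = [
        doc for doc in documents
        if doc["label"] is None or doc["label"] != doc["predicted"]
    ]
    ranked = sorted(candidates, key=lambda doc: doc["confidence"], reverse=True)
    return [doc["id"] for doc in ranked]


def apply_corrections(documents, corrections):
    """Overwrite labels with the reviewer's decisions (id -> new label)."""
    for doc in documents:
        if doc["id"] in corrections:
            doc["label"] = corrections[doc["id"]]
    return documents


# Hypothetical scored documents
scored_docs = [
    {"id": 1, "label": "billing", "predicted": "billing", "confidence": 0.90},
    {"id": 2, "label": "billing", "predicted": "fraud", "confidence": 0.95},
    {"id": 3, "label": None, "predicted": "fraud", "confidence": 0.60},
    {"id": 4, "label": "fraud", "predicted": "billing", "confidence": 0.70},
]

first_pass = review_queue(scored_docs)
apply_corrections(scored_docs, {2: "fraud"})  # reviewer agrees with the model on doc 2
second_pass = review_queue(scored_docs)
print(first_pass, second_pass)
```

In a full loop you would retrain the model on the corrected labels before building the next queue, so each round of human review improves the next round of predictions.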

But why is this so useful? Well, here are three examples:

1)  Machine learning makes an assumption that assigned categories are a “gold standard”, in other words, error-free.  In most cases that is a simplification, in some cases it is downright absurd.  For example, one of our banking customers has a call center where the operator identifies what the call is about by selecting an issue from a pull-down list.  However, they discovered that some operators were just picking an item near the top of the list, and some were always labeling calls with the very top listed item, no matter what the call was about!   By using the Rule Builder node, such a customer could utilize the active learning capability to correct these mislabels.

2)  In some contexts a very large proportion of data is unlabeled, with very few labeled examples.  One common example of this is fraud.  Suppose that you have a very large number of claims… there are a few that you have previously identified as fraudulent, but you know there are many other undiscovered fraudulent claims.  So a natural approach is to train a rule builder model on the fraudulent cases, and then use the active learning view to identify other fraud candidates out of the many thousands of claims received.  Then a human reviewer can identify which of these candidates need to be investigated.  Once again, as you identify more such cases, you can continue to refine and improve the model.

3)  Sometimes you might have uncategorized data, but you want the system to automatically discover the themes in your documents.  This is normally done via topic modeling; in SAS Text Miner, that means the Text Topic node.  What often happens is that you will get some topics that you expect to see, along with some surprises: themes that maybe you didn’t expect, but that seem reasonable.  In addition, you are likely to get some topics that seem to be “catch-alls”, or at least not clearly about one thing, as well as some topics that don’t seem that much different from each other.  What if you could use your real-world knowledge to mold these topics into something more useful?  Well, with active learning you can.  The Text Rule Builder node can be used to discover a set of rules underlying each topic: you could think of each of these rules as representing a sub-topic.  Then you can use active learning to shift the meaning of a topic by changing the ratings on documents at the margins.

These are three handy ways to use this active learning framework. Maybe you can think of more? Let me know!

At the end of the day, it is important that you have the best tools to help you solve your business problem.  We believe the best solution is to use the unmatched sifting power of machine learning, guided by your business expertise.  After all, you need to understand the question to fully comprehend the answer.

 


Social media speaks to sentiment: SAS Championship is classy and fun

They say golf is a simple game. The objective is straightforward; get a little white ball into the hole while minimizing the number of shots. Yet, anyone who has ever stepped up to the tee understands that there are many other factors at play.

I can’t tell you why, but it seems like there’s a magnetic force between my golf ball and the nearest pond. The greens are either too fast or too rough, and what feels like a great drive sends my ball slicing into the trees. On the surface, golf may be a simple game, but the devil is in the details.

In some ways, analytics can be analogous to golf. The concept is simple; you have all of this data stored on private servers and available to you on the Web.  As described in this new book, Win with Advanced Business Analytics: Creating Business Value from Your Data, you can use SAS software to build models, and these models provide you with results that you can use to make informed decisions.

Results are easy to obtain; accurate and insightful answers should be the goal.

As an Analytical Consultant here at SAS, I have the opportunity to see a wide range of data from customers in virtually every industry, as well as data from online sources. As you may suspect, the data is typically littered with special characters, misspellings, jargon, and abbreviations that are difficult to decode. There’s one key commonality: the goal is to simplify the chaotic data. How can I cut through the noise and focus on relevant information, so that I can deliver valuable insights and surface answers buried within the volumes of textual data?

To demonstrate what I’m talking about, I used SAS Text Analytics to automatically scan through 6,342 Tweets around the PGA Tour and more specifically, this year’s SAS Championship, which is now underway at Prestonwood Country Club.

What topics are being mentioned along with the SAS Championship on Twitter?

Social Network Graph - SAS Championship Network on Twitter (1 week of Tweets)

The network graph, shown above, illustrates the social network of Twitter users mentioning the SAS Championship. Users who are engaged in conversation with each other or who discuss a similar topic are “clustered” together. Each circle represents a unique Twitter user and each line represents a conversation between users. The size of the circle reflects the number of mentions or tweets per user.

  1. Ryder Cup – There are still plenty of conversations following the European victory this past weekend. The posts mention that this year’s SAS Championship will host 24 players from former US and European Ryder Cup teams.
  2. SAS Championship – Twitter is full of posts on the “free weekend passes” given away, the “Green Networking Event” and general excitement about this event. As expected, this is the largest cluster within the network graph.
  3. AGoldFan (Adam Gold, Radio host at ESPN/The Fan) – The second largest network, partly due to his radio host status and nearly 4500 Twitter followers, Adam is clearly promoting this event for SAS and the PGA. It’s also worth noting that Adam is not necessarily tweeting a lot about the SAS Championship but, rather, his content has a strong reach and is being re-tweeted by many of his followers.
  4. Raleigh & Durham Chamber – The Raleigh Chamber and the Durham Chamber show up in the analysis and are promoting the “Green Network Event”.
  5. Raleigh Radio, 999TheFan – 999TheFan is promoting the SAS Championship, giving away spectator tickets, and interacting with golf fans excited to attend the event.
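For readers curious about how a graph like this is put together, here is a rough Python sketch of the bookkeeping behind it: one node per user, one line per pair of users linked by a tweet, and a node size based on how often a user tweets or is mentioned. This is not the SAS implementation. The function name, the `@mention` regular expression, and the sample tweets are assumptions for illustration, and the clustering and layout steps are left out.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")


def build_mention_graph(tweets):
    """tweets: iterable of (author, text). Returns (node_sizes, edges).

    node_sizes: Counter of user -> tweets written plus times mentioned
    edges:      Counter of frozenset({a, b}) -> number of tweets linking a and b
    """
    node_sizes = Counter()
    edges = Counter()
    for author, text in tweets:
        author = author.lower()
        node_sizes[author] += 1
        # De-duplicate mentions within one tweet so each pair counts once per tweet.
        for mentioned in {m.lower() for m in MENTION.findall(text)}:
            if mentioned == author:
                continue
            node_sizes[mentioned] += 1
            edges[frozenset((author, mentioned))] += 1
    return node_sizes, edges


# Made-up tweets for illustration.
TWEETS = [
    ("alice", "Heading to the @SASChampionship this weekend!"),
    ("bob", "RT @alice: Heading to the @SASChampionship this weekend!"),
    ("carol", "Free passes from @SASChampionship"),
]

sizes, edges = build_mention_graph(TWEETS)
print(sizes.most_common())
```

Counting retweets as mentions of the original author is one way to capture the effect noted in item 3: an account can have a large circle because its content is re-tweeted widely, even if it tweets little itself.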

To identify issues and spot trends, I used SAS Text Miner to explore the relationships between words in order to understand how they are associated with each other. The graph, shown below, expands on the “SAS Championship” and interestingly we see the words “classy”, “fun”, and “great” all strongly associated with the SAS Championship!

Concept Links - Terms and Phrases associated with the SAS Championship
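The post does not say how the strength of a concept link is computed, so the following Python sketch uses a simple stand-in measure: of the documents that contain a term, what share also contain the target? The function name, the tokenizer, the stop list, the `min_docs` threshold, and the sample tweets are all assumptions for illustration.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[#@]?\w+")


def linked_terms(docs, target, stopwords=frozenset(), min_docs=2):
    """Rank terms by how strongly they co-occur with `target`.

    Strength is the share of a term's documents that also contain the target,
    a simple conditional frequency chosen only for illustration.
    """
    doc_freq = Counter()
    with_target = Counter()
    for text in docs:
        tokens = set(TOKEN.findall(text.lower())) - stopwords
        doc_freq.update(tokens)
        if target in tokens:
            with_target.update(tokens - {target})
    scores = {
        term: with_target[term] / doc_freq[term]
        for term in with_target
        if doc_freq[term] >= min_docs  # ignore terms seen too rarely to trust
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up tweets for illustration.
SAMPLE = [
    "#saschampionship was classy and fun",
    "Such a fun day at #saschampionship",
    "Classy event #saschampionship",
    "Traffic was not fun today",
]
STOP = frozenset({"was", "and", "a", "at", "such", "not"})

links = linked_terms(SAMPLE, "#saschampionship", STOP)
print(links)
```

In this toy data "classy" links more strongly than "fun" because every tweet that uses "classy" is about the event, while "fun" also turns up elsewhere.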

It’s clear from this example that using text analytics can give organizations the timely insights they need to improve customer experience and competitive position. This free white paper, Text Analytics for Social Media, shows how marketers can get relevant information from social media.  Jim Sterne (@jimsterne), an internationally known writer and speaker on electronic marketing and customer interaction, wrote this paper for marketing professionals who want to understand the practical side of text analytics as a competitive advantage.

Please join me at the SAS Championship this weekend at Prestonwood Country Club here in Cary, North Carolina to experience this fun and classy event! And if you can't join us in person, join in on the conversation @SASChampionship.

Let me know your latest social media puzzle and together we can determine the best means of analysis.


Generation SAS: filling the analytic skills gap

Did you know we have a blog dedicated to the voice of professors and students?  Generation SAS is a blog forum that encourages submissions from all those working in a school setting, so they can share their thoughts and research.

Knowing that universities around the world are teaching text analytics using SAS, I’m thrilled we have a forum for sharing insights, tips and inventions – and I encourage you to submit yours.

I’d like to highlight two that were originally posted on the Generation SAS blog.

The first is from Satish Garla, based on work he did during his Masters in Information Systems at Oklahoma State University.  Satish wrote a SAS macro that can fetch customized tweets and clean them, for example by omitting specific words and purging items such as http tags and URLs that may not be wanted when text mining the data.  His %GetTweet macro paper provides a solid education in tweet geography and compelling visualization of the results.
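To give a feel for the kind of cleaning involved, here is a small Python sketch. It is not the %GetTweet macro, which is written in SAS; the function name, the regular expressions, and the sample tweet are assumptions, and I am reading "http tags" loosely as HTML-style markup.

```python
import re

URL = re.compile(r"https?://\S+|www\.\S+")
HTML_TAG = re.compile(r"<[^>]+>")


def clean_tweet(text, drop_words=frozenset()):
    """Strip HTML-style tags and URLs, then drop unwanted words (case-insensitive)."""
    text = HTML_TAG.sub(" ", text)
    text = URL.sub(" ", text)
    kept = [
        word for word in text.split()
        # Ignore trailing punctuation when checking a word against the drop list.
        if word.lower().strip(".,!?:;") not in drop_words
    ]
    return " ".join(kept)


# Made-up tweet for illustration.
raw = "RT Great round today! http://t.co/abc123 <b>wow</b>"
print(clean_tweet(raw, drop_words={"rt"}))
```

Removing this clutter before text mining keeps link fragments and retweet markers from showing up as if they were meaningful terms.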

In another paper, a fellow Oklahoma State University student, Hari Hara Sudhan, examined the tweets stemming from Wal-Mart’s win in a gender discrimination lawsuit – discovering text clusters that clearly showed variations in the way people tweeted about Wal-Mart before, during and after the verdict was passed.

I guess you could say that Generation SAS analyzes the content created by all predecessors to date: Generations AO (‘Always-On’), Z, Y, X and those even before that.

If you are going to A2012, be sure to stop by the student posters, and talk to presenters about their research with SAS.  And if you are new to text analytics, you’ll appreciate both pre-conference and post-conference workshops for hands-on experience with the technology.

Which insights would you like to share as a Text Frontier guest blogger?
