The Text Frontier is consolidating into SAS Voices

In life, change is the only constant, and with that I'd like to add the final post to The Text Frontier. The Text Frontier is not going away; it is simply being consolidated with other SAS blogs to better serve our community. The Text Frontier will now be a part of SAS Voices.

Consolidating into SAS Voices will allow our community to go to one source for all SAS-related analytics. It offers a larger community with a broader set of topics beyond text mining and text analytics. SAS Voices will be your source for information on text mining, text analytics, unstructured data analysis and much more.

Please update your bookmarks and RSS feeds with these links to make sure you don't miss future posts:

Thank you for your continued support of The Text Frontier, and we look forward to growing the community in SAS Voices!



Sentiment analysis, machine learning open up world of possibilities

The consumer sentiment analysis of this one's pretty easy, but will they be compensated?


When a person feels sufficiently wronged to lodge a complaint with the Consumer Financial Protection Bureau (CFPB), there’s likely to be some negative sentiment involved. But is there a connection between the language they use and the likelihood they will be compensated by the offending company?

At the upcoming Sentiment Analysis Symposium, I will discuss how machine learning and rule-based sentiment analysis can support each other in a complementary analysis, and produce actionable information from large amounts of free form text. In this case, machine learning and sentiment analysis could improve and evolve the CFPB’s ability to assess consumer complaints.

This is accomplished by identifying patterns in the degrees of negative sentiment expressed in free-form consumer complaints. A model generates rules from the free-form text of complaints where the related companies ended up paying out compensation. These machine-generated rules indicate patterns in the free-form text that tend to be present only in the cases of monetary compensation.



Could text analytics be the cutting-edge technology the oil and gas industry was waiting for?

Small causes can have large effects: how a discovery in the Barnett Shale can spike interest in the rest of the world and change the face of the industry.

This article is co-written by Sylvie Jacquet-Faucillon, Senior Analytics Presales Consultant, SAS France; and David Dozoul, Senior Adviser for Oil and Gas, SAS Global Energy Practice

Urgent need for digital transformation in the oil and gas industry

Demand for oil and natural gas is constantly growing, and the success of providers in meeting this demand is driven by both technological expertise and innovation capabilities. Innovation is also at the core of SAS.

With the recent turmoil in the energy industry, oil and gas companies must overcome some key challenges – price declines, strong competition, and resource replacement – while dealing with geopolitical contexts. The industry, therefore, needs to embrace new cutting-edge technologies to remain competitive and profitable.

Exploration and production projects are always large-scale, expensive and complex undertakings with long timelines. Collaboration drives innovation in the oil and gas sector to share exploratory costs and to manage risks. One of the biggest challenges is to define the exploration strategy. Applications extend to finding the best partners, identifying new trends (such as unconventional gas), and benchmarking active companies and their stakes. To stay ahead of the competition and ahead of new ventures, organizations not only need to monitor trends, but also to identify weak signals such as market movers and new exploration trends. It’s time for the oil and gas sector to address the big data challenge and exploit the wealth of unstructured data available: industry and government reports, local knowledge, geological studies, digital media, and so much more.

Currently, most companies manually monitor petroleum announcements from several online news providers. The volume and diversity of data sources to collect, treat, analyze and process require extensive resources (people and time) with traditional methods, leaving little time to enhance and interpret the results for better decision making. But with text analytics, a new approach is now available to automate, optimize and industrialize these time-consuming manual tasks.

From oil refining to data refining: Text analytics is the answer

To leave no opportunity unexplored and to manage investment risk, the answer is a strong competitive strategy based on data-driven analytics. One innovative and differentiating component of this strategy should be SAS® Text Analytics, which automatically processes and analyzes large amounts of text from multiple data sources to create relevant and accurate information for all upstream activities. The two key components used are:

  • SAS Crawler, which automatically crawls all relevant key online news providers, including secured sites.
  • SAS Contextual Analysis, which combines the benefits of automated natural language processing and machine learning, enriched with human subject-matter expertise.

Methodology workflow


SAS Text Analytics methodology

With SAS Contextual Analysis, oil and gas providers can easily customize the default taxonomies, entities and concepts provided with the software to better address the requirements of their specific industry. Some examples are:

  • Extraction of concepts that gather critical business concepts from news feeds (company, basin, block, countries and well names) and references to oil and gas terms like stakes, hydrocarbon types, volume of the discovery, height of the gas column, water depth, etc.
    “Company B has confirmed a gas-cond discovery in XX block, Water Depth 38m. Discovery resources are pegged at 250-350 MMboe in-place, and the well tested 10.6 MMcfg/d and assoc. cond. from a 240m hc column in the L. Cretaceous pre-salt…”
    The transfers of participations or awarded winners in a bid process could be automatically detected by extracting facts and events from text:
    “Company C has agreed to acquire Company D's 13.058% interest in block C in exchange for $15,000 cash payment (US$10,814) …”
  • Categorization of news among several business categories: discoveries, negative drilling results (subeconomic, dry well), changes in permits (joint ventures, mergers and acquisitions, farm-in, farm-out, award). How does this work? We either define linguistic rules to classify documents or leverage the SAS Text Analytics rules generator capability. Our unique rules generator enables “active learning” from already-classified news (some petroleum news providers deliver news with some metadata) and generates the associated linguistic rules.
  • Discover the unexpected: SAS algorithms drive automatic topic discovery, so you could go much further in analysis by uncovering upcoming trends, new topics or weak signals.
  • SAS also provides the ability to integrate oil and gas internal and external repositories (for instance: wells, blocks, companies and basins definitions), remove the duplicates, and clean the information to get consistent data, which is a main prerequisite for reliable results. The data quality step ensures proper data enrichment and eliminates the time-consuming task of matching journalistic information with industry-confirmed information.
  • Exploration, reporting and industrialization: As an integrated solution, SAS enables processing automation from data crawling to text analytics and data visualization. Time and resources are saved by automating the time-consuming tasks of reading news, manually classifying it and building dedicated reports. Subject-matter experts can focus on detailed analysis and extend coverage of investigation to monitor more companies or additional topics (planned wells, seismic information, bid round analysis).
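As a rough illustration of the linguistic-rule idea (plain Python regular expressions standing in for SAS Contextual Analysis rule syntax, with made-up rule sets), a keyword-style categorizer might look like:

```python
import re

# Hypothetical keyword-style categorization rules, loosely in the spirit of
# the business categories described above (not actual SAS rule syntax).
RULES = {
    "discovery": re.compile(r"\b(discover\w*|gas[- ]cond)\b", re.I),
    "dry_well": re.compile(r"\b(dry well|subeconomic)\b", re.I),
    "permit_change": re.compile(r"\b(farm[- ](in|out)|acquire\w*|award\w*)\b", re.I),
}

def categorize(article: str) -> list[str]:
    """Return every category whose rule matches the article text."""
    return [cat for cat, pattern in RULES.items() if pattern.search(article)]

print(categorize("Company C has agreed to acquire a 13% interest in block C"))
```

A production system would need far richer rules (and disambiguation), but the principle of matching linguistic patterns to business categories is the same.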

Did you say ‘challenges’?

The software also addresses common challenges of text analytics projects that are not necessarily specific to the oil and gas industry:

  • High volume of rules: Oil and gas references to blocks, basins, wells and companies (with subcompanies) result in several million classifier rules that are updated weekly in SAS Contextual Analysis.
  • Advanced disambiguation process with accurate concept rules: What happens when the basin name is also a country name? When an oil and gas company name is also a usual business term? How do you map with geospatial data and get an accurate data visualization if your contextual extraction is not quite perfect?
  • Advanced predicate rules extract key and relevant information such as proper stakes changes, awarded companies and bid runners from articles.
  • A strong data management and data quality foundation is needed to match and link entities (wells, companies, blocks, basins) to each other, even though each journalistic source has its own way to write the same information (e.g., a well name differs slightly between two sources: ‘-‘ replaced by ‘_ ‘ or by a space).
  • Strong subject-matter expertise integration through accurate and manageable rules.

Text analytics in oil and gas: The art of the possible

The digital transformation makes it mandatory for the leaders in the oil and gas industry to compete on advanced analytics and data management proficiency. Oil and gas companies can take advantage of SAS software’s advanced text analytics, data management and data visualization capabilities to identify useful insights that can be used to create better outcomes through smarter decisions.

The stakes are huge with current oil prices. The cost of exploration, combined with standard discovery rates, has pushed the industry to reinvent itself and identify more agile and smarter ways to select projects in which to invest. Indeed, learning from experience is a golden rule that most of us are accustomed to in the industry. Learning from competitors’ experiences, easily monitoring crucial trends in the market, and cross-checking that information against internal expertise enables organizations to better understand important market movements and identify opportunities ahead of the competition.

Opportunities can come in different forms, such as deciding to take stakes in a promising exploration block, taking over a competitor or exploring new reservoir types. Failure to identify these opportunities in time often leads to shifts in market share, delays to first oil or uncertain reserve replacement.

Text analytics capabilities can also address a wide range of other applications in the oil and gas sector, in addition to the competitive intelligence case studies discussed in this article:

  • Patent analysis.
  • Operational and maintenance optimization.
  • Warranty claims analysis, root cause analysis on logs, call center notes.
  • Consumer sentiment analysis: Learn how customers felt about the products they use (call centers, survey feedback, online forums).
  • Procurement analysis (bids, competition, supply chain contracts).
  • Health, safety and environment report analysis.

Should you have any questions, feel free to contact us.


April Fools' Text-oku

April 1st is known as April Fools’ Day.

We could have chosen to celebrate a number of events that happened on this date. In the U.S. alone, it could have been the creation of the “$” symbol (1778), the marketing of the first dishwashing machine (1889), the first U.S. national women's swimming championships (1916), and many more. But instead, we chose to celebrate – pranks. Pranks have international appeal.

Since I am not a prankster, I will take this opportunity to lighten the tone of our blog with Sudoku. Sudoku has international appeal too, but it is (usually) harmless.


Sentence-based Text Mining and SAS Global Forum

In an upcoming paper for SAS Global Forum, several of us from the SAS Text Analytics team explore shifting the context of our underlying representation from documents to the sentences within those documents. We then look at how this shift can allow us to answer new text mining questions and explore the collection from a different angle. Our approach, which I will explain below, uses SAS Text Miner to analyze sentences as "documents".

Our motivation for sentence-based analysis stems from the challenges that long documents present to fine-grained analysis of unstructured data. Text Miner uses the vector space model where a document is represented as a quantitative vector of the weighted frequencies of the terms in the collection. These vectors often have a size in the hundreds of thousands because there is an entry for every kept term in the collection. The vectors are also very sparse because most documents contain only a small subset of those terms. A diagram of the document vectors of this form is shown below. In the diagram there are documents each prefixed with the letter "d" and terms each prefixed with the letter "t". The filled-in squares indicate where a term existed in the given document.


Diagram of sparse document vectors.
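A minimal sketch of this sparse representation (scikit-learn used here as a stand-in for the vector space model in Text Miner, with toy documents) might look like:

```python
# Build a sparse term-document matrix: one row per document, one column per
# kept term, and mostly zero entries.
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "text mining finds patterns in text",
    "sentences are shorter than documents",
    "mining documents at the sentence level",
]
vec = CountVectorizer()
X = vec.fit_transform(docs)   # scipy sparse matrix

print(X.shape)                # (number of documents, number of distinct terms)
print(X.nnz)                  # count of filled-in (nonzero) squares
```

Even with three tiny documents, most entries are zero; with hundreds of thousands of terms the sparsity is far more extreme.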

Several nodes in SAS Text Miner then make use of the singular value decomposition (SVD) to create new dense document vectors with far fewer latent factors than the number of terms, usually only 50 to 100. Not only that, but the factorization also provides a dense, latent vector representation for each distinct term in the collection.  The results of the factorization are shown below.  The latent factor dimensions are prefixed with an "f" and there are k of them.


The result of the SVD factorization. Document representations are on the left-hand side and term representations are on the right-hand side.


The term representations on the right hand side above become particularly useful because any new document, regardless of whether it contains a single term or thousands of terms, can be mapped into the same space as the training data.
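This factor-then-project workflow can be sketched as follows (scikit-learn's TruncatedSVD standing in for the SVD step in Text Miner, with toy data and k=2 instead of the usual 50 to 100):

```python
# Reduce sparse document vectors to k dense latent factors, then reuse the
# fitted factorization to map a brand-new document into the same space.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "oil and gas exploration in deep water",
    "gas discovery in the shale basin",
    "customer sentiment in survey text",
    "text mining of survey responses",
]
vec = TfidfVectorizer()
X = vec.fit_transform(docs)

k = 2                                   # far fewer factors than terms
svd = TruncatedSVD(n_components=k, random_state=0)
doc_factors = svd.fit_transform(X)      # dense k-dim vector per document

# A new document, however short, maps into the same latent space:
new_doc = vec.transform(["shale gas exploration"])
print(svd.transform(new_doc).shape)     # (1, k)
```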

The approach works well for detecting the main themes and topics in a collection, but subtler aspects of the text can be lost. This is because the original representation in the first figure throws away information contained in the text: the number of times a term occurs is maintained, but not the order of occurrence. The longer the document, the more significant this loss of information can be.

The approach in our paper simply replaces each document with its set of sentences, creates a vector for each sentence, and then does the text mining on the sentences. Notice that the number of rows in the diagram below is now much larger. The first index on the letter "d" is the document ID and the second index indicates the sentence within the document. I assume each document contains exactly 3 sentences to ease the notation.


Diagram of sparse sentence vectors.
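The document-to-sentence replacement can be sketched as follows (a naive regex splitter stands in for proper sentence tokenization; the row IDs mirror the d<doc>,<sentence> indexing above):

```python
import re
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "Text mining is useful. It finds patterns. Patterns drive decisions.",
    "Sentences are short. Short units keep context local. Local context helps.",
]

# Split each document into sentences; each sentence becomes its own row
rows = []
for doc_id, doc in enumerate(docs):
    for sent_id, sent in enumerate(re.split(r"(?<=[.!?])\s+", doc.strip())):
        if sent:
            rows.append((f"d{doc_id},{sent_id}", sent))

ids, sentences = zip(*rows)
X = CountVectorizer().fit_transform(sentences)
print(len(ids), X.shape)   # one (sparse) row per sentence, not per document
```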

Then the SVD can form a reduced representation for each sentence and each term as shown below.  Note that the term representation on the right-hand side has the same dimensions as before. Only the values in the matrix have changed.


The result of the SVD factorization of sentences. Sentence representations are on the left-hand side and term representations are on the right-hand side.

So, while the representation above still neglects the order in which the terms occur, because rows are sentence-based, the values in the matrices reflect the influence of local terms within a sentence rather than across a document. Typically, a model based on this sentence-based representation will be more refined than one based on a document-level representation. In short, it can make your analysis more effective, particularly with documents that are longer than a sentence or two.

If you are interested in trying out a sentence analysis for your own text mining, you can contact me and I can send you SAS code to convert your document data set to sentences. Also, as soon as our paper is made available, I will add a link to it here. In the meantime, SAS Global Forum is less than a month away!  I hope to see you there.


Voice of customer analysis (Part 2)

This is the second article about voice of customer analysis; you can find the first here. In the first article we discussed that a simple sentiment polarity score is rather a narrow view. This time we will examine a more insightful approach, using voice of customer analysis to monitor customers’ opinions and understand the issues they are raising. It helps if you are clear on your brand’s priorities from a customer experience perspective. For example, if we compare priorities for two online retail brands:

  • For Brand A, it might be customer service, followed by product quality and then price.
  • For Brand B, it might be price, followed by product range and the ease of ordering/returns.

Many of our customers who have embraced an NPS (Net Promoter Score) program have already identified the relative importance of the different drivers of customer satisfaction. For voice of customer analysis these can be calculated from NPS survey results, using SAS/IML® (interactive matrix programming language) to implement an approach like a Shapley value regression, to understand the relative importance to customers of different product and service features. Our customers can then use a weighted scorecard, based upon these relative importance feature dimensions, to assess feature mentions in customer communications.
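To make the idea concrete, here is a hedged Python sketch (simulated survey data and ordinary least squares standing in for SAS/IML) of a Shapley-style decomposition: each driver's importance is its average marginal contribution to R² over all orderings of the drivers:

```python
# Shapley-value-style driver importance on simulated data: average each
# feature's marginal R^2 contribution over every ordering of the features.
import math
from itertools import permutations
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))      # e.g. service, quality, price ratings
y = 2 * X[:, 0] + 1 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=n)

def r2(cols):
    """R^2 of a linear fit using only the given feature columns."""
    if not cols:
        return 0.0
    model = LinearRegression().fit(X[:, cols], y)
    return model.score(X[:, cols], y)

features = [0, 1, 2]
shapley = np.zeros(3)
for order in permutations(features):
    used = []
    for f in order:
        shapley[f] += r2(used + [f]) - r2(used)
        used.append(f)
shapley /= math.factorial(len(features))
print(shapley)   # larger share = more important satisfaction driver
```

The per-feature shares sum to the full model's R², which is what makes this decomposition attractive for weighting a scorecard.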

SAS® Text Analytics is then used to consistently track these opinion/feature mentions. The customers’ opinions can be thought of as having five dimensions.

The first three can be found in the text:

  1. Product/brand.
  2. Feature/attribute.
  3. Sentiment: positive, negative or neutral.

The other two elements are typically available as structured information relating to the source document:

  4. Opinion holder ID (ideally a known customer, but if not, a social media source/ID or unique survey ID).
  5. Date and time of the opinion (for example, the date the survey was completed, or for a complaint, the date of the complaint and the incident itself).


To achieve the best accuracy with voice of customer analysis, the brand/feature/sentiment mentions need to be assessed in context, rather than just looking at isolated words or phrases. Consider some of the more sophisticated ways your customers use language when expressing their sentiment:

  • Negated sentiment, for example, “the movie wasn’t good .”
  • Submodifiers – this is where adverbs are used in front of adjectives (or other adverbs). Sometimes these increase the polarity of a positive/negative (e.g., “the dress was very pretty” or “the food was really salty”). At other times, the adjective may be neutral, but the submodifier makes it negative, for example “the dress was too purple” or “the food was incredibly spicy.”
  • Whether words are positive or negative will sometimes be specific to the product feature. “Fast” might be considered a positive word in the context of a “fast hotel check-in,” but negative if the comment is “it was a very fast spa session.”
  • With web chat or call center interactions where there is a conversation with the customer, you might consider customers’ opinions at the end of the conversation. They might start with several negative utterances, but if at the end they are positive, you might wish to also consider this in the weighting. This decision depends on your objective. The negative expressions about service features might be more useful from a root cause and resolution perspective.
  • Consider how to handle irrelevant positive and negative opinions. For example:

I love my Apple iPhone; however, I'm angry that ABC Bank doesn't offer Apple Pay yet, and then Airline XYZ lost my phone when I left it at the gate in the airport.

This is positive if you are Apple and negative if you are ABC Bank, but what if you are Airline XYZ, which isn't actually responsible for the gate at the airport, but the customer thinks it is?

  • Sarcasm can be handled. However, there is a trade-off between accuracy and the effort required to ensure sarcasm is correctly handled. It’s often best to focus on the most common examples, such as positives relating to negative concepts (e.g., “I was delighted to spend my lunch break in the queue in your bank”) or wordplay on brand names.
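To illustrate (a toy Python scorer with a tiny, invented lexicon, not SAS sentiment rule syntax), negation and submodifier handling might look like:

```python
# Toy context-aware sentiment scoring: negators within a short window flip
# polarity, and submodifiers scale it. Lexicons here are tiny and invented.
POLARITY = {"good": 1.0, "pretty": 1.0, "salty": -1.0, "spicy": -1.0}
NEGATORS = {"not", "wasn't", "never"}
BOOSTERS = {"very": 1.5, "really": 1.5, "incredibly": 2.0}

def score(sentence: str) -> float:
    tokens = sentence.lower().replace(".", "").split()
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in POLARITY:
            continue
        value = POLARITY[tok]
        if i > 0 and tokens[i - 1] in BOOSTERS:      # submodifier scaling
            value *= BOOSTERS[tokens[i - 1]]
        if any(w in NEGATORS for w in tokens[max(0, i - 3):i]):
            value = -value                           # negation flips polarity
        total += value
    return total

print(score("the movie wasn't good"))        # negated positive
print(score("the dress was very pretty"))    # boosted positive
```

Real feature-level sentiment needs far more linguistic context than a token window, but the example shows why isolated word lookups are not enough.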

SAS Text Analytics allows you to consider the language in context. It is worth highlighting, however, that analyzing text to assess sentiment is often subjective, even for a human with considerable knowledge of the subject area. Automated voice of customer analysis using SAS Text Analytics can achieve high levels of accuracy – in excess of 90 percent – but there is a trade-off to be considered between accuracy, the time to conduct the analysis and the value of the improved accuracy of the score.

The customer experience opinions weighted by our brand’s priorities are now available as an output from SAS Text Analytics and can be benchmarked, visualized and explored within SAS Visual Analytics. Christina Engelhardt showed a number of good examples of this within her recent “Come chat with us!” post.


Ultimately, the consistent measurement of the voice of customer to understand customer opinions can have significant benefits. Monitoring customer satisfaction can improve both business performance (through a weighted sentiment scorecard) and the understanding of customer opinions at a product and feature level. This can be used to:

  • Improve handling of customer interactions (e.g., web chat responses), to train staff on customer preferences and improve resolution/call times.
  • Continually improve through understanding problems and the root causes of service and product quality issues.
  • Provide feedback to product and service design.

We would love to hear from you about your experiences with voice of customer applications in your organization.


SAS® Text Analytics and Text Mining in Action: Experiences From a ‘Self-Trial’ With SAS® Contextual Analysis

Don’t get me wrong. I have no doubt about the capabilities of our SAS products and SAS solutions! But I wanted to get a firsthand experience of our new solution for text analytics, SAS Contextual Analysis 14.1. And the result is very convincing!

But let’s start from the beginning.

Functions and capabilities of SAS® Contextual Analysis

If you take a look at the product description of SAS Contextual Analysis, you learn that you can use it to analyze large collections of text documents, identify sentiments, and create robust models to categorize and extract content. This allows you to automatically identify topics in your document collections and define categories and rules in natural language to assign documents to these categories.

The self-trial: Text analytics with my two SAS® Press books

To better understand the processes and the outcome of text analytics with SAS Contextual Analysis, I used a document collection that is close to my heart and that I know in great detail: the 59 chapters of my two SAS Press books, Data Preparation for Analytics Using SAS and Data Quality for Analytics Using SAS.

Sure, the small number of 59 documents is not really a “big data problem,” and the SAS In-Memory Analytics engine can also deal with millions of documents. However, I was interested to see whether SAS Contextual Analysis could identify topics in my book chapters and determine which chapters should be combined into the same cluster. No a priori knowledge from me as the author was used for the categorization.

Text analytics processing with SAS® Contextual Analysis

Illustration of underlying topics in the documents

From a data mining point of view, we are dealing here with a typical unsupervised analysis. Just the data are presented to the analytic tool, and no additional information about segment assignment is available. SAS Contextual Analysis imports the data, one file per chapter, from a folder on my hard disk and runs through the entire process of text analytics:

  • Document parsing and assigning the words to different entities (noun, verb, etc.).
  • Synonym detection and the application of stop lists to remove common, low-information words like “the,” “and,” “of,” “with,” “we,” etc.
  • The weighting of the terms and the identification of those terms that are important to define groups of documents.
  • Automatic detection of underlying topics in the documents.

It works! Eight clearly separated document clusters as a result

For better illustration, I used the weights of the automatically detected topics for each of the 59 documents to cluster them with SAS® Enterprise Miner™. Eight clusters were automatically detected, which are presented in the table below.
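Since the book data isn't public, here is a hedged sketch of this clustering step with simulated topic weights (k-means standing in for the SAS Enterprise Miner clustering node):

```python
# Cluster "chapters" by their topic weights. 59 rows x 10 topic weights are
# simulated around 8 prototypes, mimicking the setup described above.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
prototypes = rng.random((8, 10))
weights = np.vstack(
    [p + 0.05 * rng.normal(size=(8, 10)) for p in prototypes]
)[:59]                                   # 59 chapter rows

km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(weights)
print(np.bincount(km.labels_))           # cluster sizes need not be equal
```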

For better visualization, the chapters of the “Data Quality Book” are shown in green and the chapters of the “Data Preparation Book” are shown in yellow.

You can easily see how the chapters grouped to clusters based on content. Some clusters only contain chapters from one book:

  • Cluster 1 contains those chapters from the Data Quality Book that deal with the topic of missing values.
  • Cluster 7 contains the simulation studies that are described in chapters 15-23 of the Data Quality Book.
Clustering with SAS Text Analytics

SAS Text Analytics automatically detected 8 clusters in the 2 books
Some clusters contain chapters from both books:

  • Cluster 8 contains chapters from the Data Preparation Book that deal with analytics data mart structures. And Appendix E in the Data Quality Book is a summary of the content of these chapters. This is an impressive example of documents only grouped based on their content. And chapter content that is considered to be “close” or “similar” is truly detected as such.

The differing numbers of documents per cluster also show that no fixed clustering scheme is used here; the document content defines how the groups are set up and how they are populated.

  • Cluster 4 only contains a single chapter. This chapter is an introduction to a collection of case studies and obviously does not compare with other chapters in the books.

Moving on to new business cases

These results convinced me even more that SAS Contextual Analysis allows you to gain insight into your document collections. You learn what your customers think and write about your company or organization. You see the topics that are contained in your documents and how you can automatically group them without having to read every single document.

I also presented a paper (in German) on this case study at the Austrian SAS User Conference “SAS Club” in November 2015.


Voice of the customer analysis (Part 1)

This is the first of two articles looking at how to listen to what your customers are saying and act upon it – that is, how to understand the voice of the customer. Over the last few years, one of the big uses for SAS® Text Analytics has been to identify consumers’ perceptions and attitudes from the language they use. This is commonly called sentiment analysis, opinion mining or voice of the customer analysis.

Voice of the customer analysis can have significant value for organizations looking to listen to and understand the customer’s “voice” (e.g., from surveys, social media, complaints or web chat) to improve operations and help direct strategy. This approach can, ultimately, help improve customer satisfaction, net promoter score (NPS) and loyalty while reducing churn and dormancy, thus increasing revenues.

There are, however, a number of challenges to doing voice of the customer analysis well, especially as the focus often seems to be the “sentiment score.” This binary polarity of the positive/negative score is limiting and has a number of challenges:

  • Customers have different personalities and emotions and communicate in very different ways. Should “amazing” mean a higher sentiment score than “good”? Should someone who swears or uses sarcasm get a more negative sentiment score than someone who says “I was rather disappointed you let me down”? The language a person uses is as much about personality as it is sentiment.
  • The polarity of the sentiment score is often too simplistic; it does not consider your business objectives and the brand experience you are aiming to deliver. Rather than producing a score, it’s often more useful to assess what people think of your brand’s products and services and their features. For example, for an airline, features might be check-in, on-board service, price, quality of food or timeliness. These opinions can then be weighted by your brand’s priorities if you want an overall score.
  • Ultimately, thought needs to be given to how the sentiment score will be used. Is it for reporting, or could it be used for alerts when it rises or falls? The top-level score on its own is not particularly useful. If a benchmark is the objective, a net promoter score (NPS) may be a more useful metric.
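For reference, the NPS benchmark mentioned above is simple to compute from 0-10 survey scores: the percentage of promoters (scores 9-10) minus the percentage of detractors (scores 0-6). A quick sketch:

```python
def nps(scores: list[int]) -> float:
    """Net promoter score from 0-10 survey responses."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)

print(nps([10, 9, 8, 7, 6, 3, 10, 9]))   # 4 promoters, 2 detractors of 8
```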

With the voice of the customer analysis in hand, we can then focus on gaining insight into the root causes of satisfaction and dissatisfaction, for example, improved feedback on product design or refined handling of customer interactions.


Frequency of positive and negative feature mentions, color-coded by the average NPS

A simple example of this is shown in the screenshot, which shows the frequency of positive and negative feature mentions, color-coded by the average NPS. A positive check-in experience is associated with higher NPS, and although customers are negative about food, this doesn't seem to affect customer NPS scores as much as delays or lost baggage.

Further analysis of feature sentiment can identify the causes of the low satisfaction. For example, poor check-in experience may be caused by perceived poor service quality; queuing time; design of automated check-in machines; problems with baggage allowances; or time for first-class passengers to get to the lounge.

For voice of the customer analysis, your source documents could include surveys (e.g., NPS), complaints, web chat or social media. It’s often worth starting with the internal sources, like surveys and complaints, as these can be directly attributed to a known customer. This will mean that you can also use structured data about these customers and their behavior in your text analysis; the combination will be more powerful than just the text alone. Even in an anonymous NPS survey, the structured questions will help with assessing the causes of dissatisfaction.

Whatever the document source, as part of the definition stage it’s worth reading a small sample of documents. This will give you an initial impression of how the documents fit with your analysis objective. At this stage you should just be aiming to assess how language is used. For example, how long are the documents? Are the documents written by customers or employees? Is the language formal or informal? Do they contain much sarcasm? How concise (or verbose) have the authors been? What’s the quality of spelling like? How technical is the language? Are many abbreviations used? If speech-to-text technology (or web chat) has been used, does the text differentiate between speakers?

So we have started to define our problem. Next time we’ll explore an approach to voice of the customer analysis that moves beyond the rather narrow view of sentiment polarity and focuses on listening to the voice of the customer so you can make decisions that will improve the customer experience.


Cognitive Computing - Part 1

Is cognitive computing an application of text mining?

If you have asked this question, you are not alone. In fact, lately I have heard it quite often. So what is cognitive computing, really? A cognitive computing system, as stated by Dr. John E. Kelly III, is one that has the ability to be trained to “… learn at scale, reason with purpose, and interact with humans naturally.”

What does that mean exactly? Perhaps the best known example is IBM’s Watson winning Jeopardy!. Watson was able to understand the questions asked by a human, with all of the intonations, puns and expressions inherent in the asking of the questions and the questions themselves; search for the appropriate answer; give its answer a confidence that would determine whether Watson would “buzz” in; and then provide the answer. Was it perfect? Of course not; just as there are no human experts who truly know everything in their fields, no cognitive computing system is perfect either. Every person, even the most knowledgeable expert, sometimes has to say, “I don’t know,” or “I think xyz, but I’m not completely sure.” The degree of certainty that a human has with knowledge is hard to quantify, and is based mostly on a gut feeling, but the degree of certainty that a cognitive computing system has for any answer is a confidence level. This can be thought of as a score that corresponds to the quality of the decision after the system has evaluated all of its options and information. Based on that number, the machine can decide whether to answer a question, or what the best answer is for a given question.
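The confidence-gated answering described above, where the system only "buzzes in" when its best answer clears some bar, can be sketched in a few lines. The candidate answers, scores, and the 0.7 threshold here are all hypothetical:

```python
# Sketch of confidence-gated answering: pick the highest-scoring
# candidate, but abstain if confidence is below a threshold.
def best_answer(candidates, threshold=0.7):
    """candidates: dict mapping answer -> confidence in [0, 1]."""
    answer, confidence = max(candidates.items(), key=lambda kv: kv[1])
    if confidence < threshold:
        return None  # "I don't know" -- don't buzz in
    return answer

# Hypothetical candidate answers with confidence scores.
candidates = {"Toronto": 0.32, "Chicago": 0.81, "Winnipeg": 0.05}
print(best_answer(candidates))  # Chicago clears the threshold
```

The key design point is the abstention branch: like the human expert who says "I'm not completely sure," the system treats a low top confidence as a reason not to answer at all.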


The second part of cognitive computing systems is that these systems are able to learn. This promises great things for the future, because not only can everyone interact with the system without having to learn a specific language or a specific interface, but also the machine learns how to interact better with the user over time. How does such a system learn? Typically, a cognitive computing system is first fed information, usually in large amounts, in the form of input and outcome sets. The inputs could include textual data, images, videos, speech, numbers, IoT data and so on. Then, as a human interacts with the system, grading its responses for accuracy and providing feedback, the system refines its internal training model. To me, it’s analogous to a human studying for a test in a new subject, using tools such as textbooks and practice exams (Q&A, essentially) as study aids, and perhaps also working with a tutor.
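As an illustration only, this feedback loop can be sketched as a simple perceptron-style weight update. Real cognitive systems use far more sophisticated training methods, and all feature names and examples below are hypothetical:

```python
# Sketch: refine a model from human-graded examples.
# A perceptron-style update stands in for real training methods.
def train(model, examples, lr=0.1):
    """model: dict of feature weights; examples: (features, label) pairs,
    where label is the human's grade (1 = good response, 0 = bad)."""
    for features, label in examples:
        score = sum(model.get(f, 0.0) for f in features)
        prediction = 1 if score > 0 else 0
        error = label - prediction  # feedback: was the response right?
        for f in features:
            model[f] = model.get(f, 0.0) + lr * error
    return model

# Hypothetical graded interactions.
examples = [
    (["refund", "delay"], 1),  # human graded this response as relevant
    (["greeting"], 0),         # human graded this one as irrelevant
]
model = {}
for _ in range(10):  # repeated study sessions, like exam practice
    train(model, examples)
print(model)
```

After a few passes the weights for "refund" and "delay" turn positive while "greeting" stays at zero: the system has adjusted its internal model to match the grades it received, which is the essence of the feedback loop described above.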

The third exciting component is that cognitive systems take advantage of what computers do really well. They can access and process enormous amounts of data very fast, and the data does not have to be specifically prepared for the processing. Cognitive systems are built to handle unstructured data; this extends the sources of information for these systems far beyond the realm of traditional databases. The majority of data in the world is unstructured, which makes sense because unstructured data is the natural way humans communicate with each other. After all, structured data is an artificial construct, created so that analytical concepts could be applied to better understand the world. Given that unstructured data makes up the majority of the information in the world and will continue to expand in volume at high rates, and that a cognitive computing system can “read” and digest this huge corpus of information, the result can be powerful. Such a system can even “see” images in addition to “reading” text and “hearing” spoken language. Amazing! This addresses many of the challenges in text mining today by taking advantage of information that is often largely ignored.

So is Siri® a cognitive computing system?

Not quite, but it’s a start. What makes cognitive computing the next era of computing is that it is not a programmed computing system, the way personal computers, tablets, smartphones, and other gadgets are. It’s hard not to love Siri (after all, it can be fun to ask Siri such questions as “what is zero divided by zero?” or “what is the meaning of life?” – if you haven’t already asked one of these questions, take a break right now and try it), but Siri has been highly programmed, and provides scripted answers (I’m sure Apple employees had a lot of fun with this part of the development of Siri). It can learn a little from you, and most likely it will continue to evolve to learn even more from each individual user, but it is not truly a cognitive computing system. Some say it can be categorized as a “cognitive embedded capability,” which is important for cognitive computing. Any development efforts in the world of cognitive computing need to be able to be embedded in the technology and applications that users know and love, such as Siri being a part of the iPhone®. So while Siri has some capabilities that are on the road to a cognitive computing system, its emphasis is on its programmed functionality rather than on what it learns from a user. Instead of assessing information from multiple sources (including what it has been taught), offering hypotheses, and allotting confidence to potential answers, Siri is mostly programmed with canned responses or matches from its available information. After all, Watson beat former Jeopardy! champions, while Siri often cannot even find the nearby location to which I request directions.

Given all of this, what exactly can cognitive computing systems do today?

Stay tuned: This will be discussed in Part 2. Thanks for reading, and let us know your thoughts on cognitive computing. Will it change the world?


To data scientists and beyond! One of many applications of text analytics

Hi, there! First of all, let me introduce myself, as this is my first blog post here. I am Simran Bagga, and three weeks ago I became the Product Manager for Text Analytics at SAS. This role might be new to me, but text analytics is not. For the past 12 years, I have helped customers in government, health care, and small business understand the value and application of text analytics to enhance their existing business processes. From simple questions to complex application requirements, I have heard it all. And I have seen the field evolve over the years from many different perspectives.

Organizations in every industry have realized the potential of tapping into unstructured text and are embracing the power of this capability at a rapid rate. They want to leverage both internal and external information to solve a variety of different problems. The applications of text analytics are many, from enhancing the customer experience to gaining efficiencies in solving criminal investigations.

One of the misconceptions I often see is the expectation that it takes a data scientist, or at least an advanced degree in analytics, to work with text analytics products. That is not the case. If you can type a search into a Google toolbar, you can get value from text analytics.

An investigative agency recently asked me, “Our focus is closing cases quickly by connecting the dots and finding linkages between incidents. Can SAS help our crime analysts be more effective so they can drill into the incident narratives like a traditional business intelligence application and find that needle in the haystack?”

I love it when customers ask such easy questions. The answer is, “Absolutely.”

I am really excited about a SAS cloud-based offering that many people might not be aware of: the SAS® Text Exploration Framework. This provides an easy, search-based interface for all relevant information, presented in a compelling and visual way, to virtually any question you want to ask of the data.

SAS Text Exploration Framework - Search and Explore


The crime analysts wanted free-text search – a search that gave them smart results. Rather than sifting through hundreds of documents containing the term “firearm theft,” for example, they wanted to be prompted with the different crime areas and incident types where those terms were found. The SAS Text Exploration Framework allows them to do exactly that and more. They can focus on incidents that occurred within certain time frames, locations, etc. – and ones that are associated with specific crime categories or subcategories – so they can identify linkages across incidents. Terms of interest are highlighted so analysts can see where they are referenced and explore the data graphically.

The excitement in the room was palpable when the users saw this application of text analytics. A question came up: Can we actually visualize links between entities (criminals, gangs, weapons, etc.) and incidents with the ability to drill down to identify social networks, gang-related crimes, and so forth? The social network capability within the framework supports this type of visualization nicely:

SAS Text Exploration Framework - Social Network Analysis


Various user personas – analysts (data scientists), business analysts, computer scientists, decision makers – want to use analytics to make data-driven decisions, but in different ways. The challenge is how to appeal to all these personas and meet their expectations with individualized text analytics applications, from data-driven algorithms and business rules to applications solving specific problems and cognitive computing.

SAS certainly has the technology and expertise to meet each of these needs. But I am here to understand your needs, what matters to you, and to represent the voice of the customer: your voice – an aspect of my new role that I truly enjoy and am excited about. So don’t hesitate to reach out.
