Friday, June 26. 2009
Travels to Paris and Copenhagen this week!
SAS is sending data and text mining experts (including Teragram employees) across the ocean to Europe for two different events this week. We'll have a booth in the exhibit hall at KDD09 Sunday through Wednesday. If you are one of the lucky ones attending KDD, mark your program for the panel discussion, where Dr Wayne Thompson from SAS will talk about Emerging Trends in Open Standards and Cloud Computing for Data Mining.

Even if you don't make it to the KDD conference to personally pick up the new book authored by the conference chair, John Elder, you can experience our Software on Demand version of data mining by buying his book on Amazon here.

The second event where you can find us is A2009, the SAS conference devoted to analytics, in Denmark on July 1 and 2. The program is online, and there you can read the abstract about the success story of a Swedish insurance firm that studied handwritten notes collected by police officers and security guards during 2004-2007.

At both shows, you'll be able to see live demos of our software and pick up a hard copy of the most recent fact sheet, which highlights the enhancements in Text Miner 4.1, made available to customers 5 weeks ago. Those of you reading this blog who haven't yet seen our SAS 9.2 release of Text Miner can download the fact sheet here.

What does your summer hold for you? Do you have travel plans to shows or conferences with text analytics tracks or sessions? Please add a comment to this blog and do share!

Friday, June 19. 2009
IDG asks 131 executives about their IT spend priorities for 2009
A recent survey by IDG Research Services highlights Business Process Automation (BPA) as an IT priority. Some of the findings include:

• More than 2/3 of respondents are automating most of their core business processes
• Another 21% are moving towards this goal
• 87% consider BPA to be a critical or important IT priority
• 87% see a connection between unified communications and process automation
• More than one third envision communication technology being incorporated into BPA in the future

Even though I have not spoken with Joe Staples and Brad Herrington from Interactive Intelligence, I share their observation that many in today's economic environment are trying to streamline operations and do more with less. As organizations seek ways to be more efficient, both in the front office and back office, we might position our technology as a tool for automating business processes, leading to improved business results. Have any of you motivated your IT department to spend $$ on textual analytics software, or recruited support for your research program, with this approach?

It's rare for BPA companies to include automating the manual processes surrounding words or unstructured content via text technologies. After I watch the webcast on June 25 and get the white paper, I'll let you know if any mention is made of natural language processing, content categorization or sentiment analysis.

Meanwhile, it's up to all of us to continue to promote awareness and implement text analytics in real-world situations. We aren't talking about a dream of some vague, emerging, futuristic possibility; the time is now to include text communication alongside the traditional data sources of computer processing applications. When we combine text analytics with mathematical optimization and predictive analytics, we can go well beyond merely automating business processes: we can improve them and discover entirely new processes that lead to a sustainable future. Thanks for reading.

Wednesday, June 17. 2009
Text Speak
I just posted a tweet to my @ManyaMayes Twitter account. In order to get my message across in 140 characters or less, I had to shorten my text. This is a very common practise for mobile phone users, who send text messages that look a lot like a foreign language. My Mum writes messages that are so clipped that I have trouble deciphering them! As a BlackBerry user, I send email messages but I rarely send SMS messages. I've spent many years making sure I write messages that are easy for audiences to understand. It's going to take me a while to get used to writing clipped text (writing in text speak) as part of my job. It goes against much of my professional training to write like this:

u no wot u no & u don't no wot u don't

How does text mining handle this? One approach would be to specify synonyms for these clipped terms:

u = you
no = know
wot = what

But "no" and "know" are both valid dictionary entries, so this immediately causes a follow-on problem: surely not all occurrences of "no" should be replaced with "know". Deciding which occurrences to replace is aided by using the additional context of the document. Boolean and linguistic rules can help with this.

Data quality problems like this can be difficult to solve, and solutions are typically specific to both the data and the application. For example, the way you would replace R&R would depend on whether the data came from a forum for military personnel talking about upcoming "rest and relaxation", a warranty report describing "repair and replace" for a defective part, or some other source...

Thursday, June 11. 2009
Sentiment Analysis Overview
I saw the following comment on Twitter yesterday about the limitations of sentiment analysis and decided it would make a good topic for a blog update:

@concannon: Can anybody explain to me why automated sentiment analysis is anything more than flaky, snake-oil BS? The technology just isn't ready yet.

I'm going to make a bold statement here: automated sentiment analysis, using the right methodology, is actually superior to human sentiment analysis. Bear with me and read through.

The available approaches to analyzing sentiment/satisfaction vary based on the data provided. I would categorize the approaches based on the availability of three types of data:

1. Customer feedback (free-form text) with customer ranked satisfaction (discrete value), like Amazon product reviews.
2. Customer feedback (free-form text) with manually ranked satisfaction (discrete value), where human readers subjectively score the content.
3. Customer feedback only, no ranked satisfaction, as with blog posts and comments.

For the first data type, machine learning algorithms do a good job of measuring overall sentiment (say, +ve/neutral/-ve). Examples of data suitable for this approach are survey data and product review forums. The problem is that not a lot of text is gathered this way (with a purpose in mind). Even when it is, the machine learning algorithms struggle to distinguish positive elements from negative ones. It's one thing to know that a customer is dissatisfied; it is another to know about what!

For the second data type, where there is no customer ranked satisfaction, it is possible to build a statistical model using a sample of manually ranked documents, then automatically score the remaining unranked documents. Not many companies are willing to do this. It also doesn't truly represent the customer's opinion, just the reader's interpretation of what the customer thinks.

For the third option, customer opinion with no ranking, you can derive sentiment from the context of the text using natural language processing (NLP).
This data is the most common, and hence so are the approaches to analyzing it. It's not easy, but it's the sweet spot for gaining value from the massive volumes of consumer generated text. One widely available, cheap technology assigns positive or negative values to individual words, then sums them to get an overall sentiment rating. This approach fails in situations like the following:

"It's not bad" (two negatives that actually suggest a positive)
"I'm not going to say this sucks" (sarcasm or humor)
"The keyboard is impossibly small but the display is the best I've seen." (combination)

The most recent advances in sentiment analysis technology use a combination of techniques: (1) statistics, (2) rule-based definitions and (3) human intervention, e.g. a final review of the machine scoring. The results are less expensive than human-only sentiment analysis, and more consistent. Why? Because the automation adds consistency, while the human verifies the result. Put in the right workflow, this combination clearly increases scalability by a substantial factor.

Teragram, a division of SAS, announced the Teragram Sentiment Analysis Manager at the Text Analytics Summit in early June. More to come on that!

The Phenomenon that is Twitter
I mentioned the buzz around Social Media Analysis (SMA) at the Text Analytics Summit. If we took all the speakers' content and produced a tag cloud, Twitter would have the biggest 'floor space'. I don't think there was a single presentation that did NOT mention Twitter.

While doing some background research for SMA, I ran across an article entitled State of the Twittersphere that HubSpot blogged about just this week (that's @HubSpot for the 55.5% of Twitter users that don't follow anyone). There are a lot of really great Twitter usage statistics in this report. It's amazing how many people sign up with Twitter but are very inactive (I have multiple Twitter accounts and one is definitely contributing to inactivity).

I'm more interested in those users that are very active. It would be good to connect with other users who post materials similar to my own (like a document recommendation system), and text mining can definitely help with this. I'd also like to see something like "users who posted materials like this also connected with these users:", like the recommendations you get from Amazon. Ranking the tweets of users you follow based on content would also be fabulous. Some users post both personal and business related materials. I personally prefer not to read the personal posts (sorry y'all). Having personal tweets, or topics less interesting to me, appear further down the list (if at all) would be another desirable feature...

I have a bunch of other recommendations for Twitter product management, as do many other Twitter users. How about using Text Analytics/Text Mining for managing product requirements...

Wednesday, June 3. 2009
Text Analytics Summit Review
I am back in my office after a thoroughly enjoyable time at the annual Text Analytics Summit in Boston. I have to admit I was in my element rubbing shoulders with thought leaders, end users, analysts and press.

Jim Cox and I arrived Sunday afternoon to attend two preconference presentations: "Text Analytics for Dummies" by Conference Chair Seth Grimes of Alta Plana, and a vendor comparison presentation by Nick Patience of the technology industry analyst company 451group.

The themes dominating the conference were: sentiment analysis, social media analysis, social network analysis, voice of the customer, eDiscovery, Web search, visualization, SaaS and Cloud.

We heard keynote presentations:

"Discover and Drive Brand Activity in Social Networks" by Emmanuel Roche, Teragram and Jim Cox, SAS
"A Tale of Two Search Engines - The Evolution of Search Technology and the Role of Social Networking in Marketing" by Usama Fayyad, Open Insights
"Sentiment Analysis" by Bing Liu, University of Illinois

We also saw end user case studies, analyst and end user panels, a Text Analytics Market Report by IDC, vendor presentations and a group of very active roundtable discussions.

Sentiment analysis: key capabilities focused on product and feature level sentiment extraction. Sentiment is also considered a key component of Social Media Analysis.

While many vendors play in the social media analysis space, not many provide all the necessary capabilities on their own. Tracking social networks, reach, promoters, detractors, key influencers/key opinion leaders (KOL) and key themes/trends was put forth as valuable.

Voice of the customer / customer feedback continued to play a key role as text input to text analytics models that look for key issues being reported by customers.

eDiscovery was probably the top text analytics application area at this year's summit. Several law firms were represented, and the ability to mine legal documents is crucial.
Web search in relation to advertising was shown to be very powerful because the user indicates intent. Advertising based on Web search and user behavior improves click-through ratio (CTR) by an average of 652%! Also mentioned was the mammoth effort required to tag massive volumes of rapidly changing Web content. There are numerous Web sites that employ their user bases to do this for them. The new look of Web search goes far beyond providing lists of documents. Document facets, snippets, images, sentiment and more can be derived from search results.

Sue Feldman of IDC indicated the Text Analytics and Search market is moving in direct opposition to the current economic market. The analysts represented at the summit all agreed that visualization of huge volumes of text should be an area that all vendors pay more attention to. Other sentiments echoed by the analysts included the desirability of Software as a Service (SaaS) applications, and the overwhelming need for Cloud Computing (along with the analysts' amazement that Text Analytics vendors had not provided it yet).

On the whole, conference goers imparted a great amount of valuable information. I will wrap up my commentary with these overheard statements:

"Search doesn't help you discover things you are unaware of."
"TA technology can solve problems we don't even know about yet."
"Text analytics puts humanity into statistics." (Thanks to Chris Bowman for that one!)
"The most common search on Monster is: Find me a job!" (followed by another that Blog Administrator refuses to post)
"Missing a piece of a puzzle is frustrating, can anyone spot the missing piece to my wardrobe?" [shoes]

Additional conference commentary can be found on twitter.com #textsummit. My colleague Anne Milley also summarized Day 1 and Day 2 on our sascom voices blog.

Curt Monash, we missed you this year! SAS and Teragram would like to thank conference goers. It was a pleasure seeing you all!
Posted by Manya Mayes in Manya Mayes at 14:28
Defined tags for this entry: conference, sentiment analysis, teragram, text mining, text mining summit
Tuesday, May 19. 2009
eWeek story on Voice Mining with SAS
I wrote a piece for eWeek about the Voice of the Customer. In it, I talk about how conversational data collected in call centers is growing faster than our ability to deal with it. Those who don't want to miss insights buried in their data can now turn to predictive modeling (data mining and text mining) to help them perform voice analytics. Armed with these emerging technologies, you can decipher key messages from all the noise and really listen to what customers are saying. Those who learn quickly can respond first (before competitors do) and can deliver better service and better products, resulting in happier customers!

Component No. 3: Voice mining your own business

Where have you seen these technologies implemented? Do share!

Saturday, May 9. 2009
An Invitation to make your voice count!
If you are applying these technologies today, or are considering implementing text analytics in your organization in the near future, we invite you to take a few moments to complete a survey here. As Manya and others have stated, interest in this field is indeed growing; however, there remain many unanswered challenges for our R&D groups to pursue. With your input you can help craft the direction of the next enhancements and guide future application direction. This is an opportunity for all of you out there to share your Perceptions & Plans for text analytics.

Seth Grimes' text-analytics survey will close tomorrow, May 10. He'll write up his findings on how organizations are dealing with unstructured sources, and the role text mining/analytics plays, as a free report available in early June. The survey will take you 5-10 minutes. Thanks for responding!

PS - new members are welcome to the YAHOO group on text analytics. Read about and join us here: http://tech.groups.yahoo.com/group/TextAnalytics/

Tuesday, May 5. 2009
Text analytics sales on the up
BI Network columnist Seth Grimes says 2008 global text analytic sales exceeded $350 million and expected 2009 growth is at least 25 percent, with SAS one of the large players in this specific technology segment: “Market Outlook for Text Analytics”
Wednesday, April 22. 2009
The changing face and pace of text mining
For a rather long time I have been talking about the convergence of text-related technologies such as search, text mining, text analytics, machine learning, voice analysis, video mining, enterprise content management (ECM), business intelligence (BI) and business analytics (BA). The industry continues to change, with the merger of three text analytics companies into one this week. To me, this merger serves to validate SAS' direction in the unstructured space, where our strategy is to take unstructured data right across the platform so organizations can have access to the full depth and breadth of SAS capabilities with a complete range of tools, products and solutions.

Some day users will not consider text to be any different from standard structured database fields. Analytic applications will automatically roll up text and other unstructured information. IT departments and business reporting users will no longer need to be restricted to partial views of limited data. In the future, data can be gathered from Tweets, emails and dynamic web 2.0 sources, then integrated with the traditional IT data warehouses before being cleansed and analyzed rigorously, resulting in better decisions and greater impact.

SAS is ready to assist you in this exciting journey, and we applaud those who see the necessity of integration across IT storage, analytics, reporting and line of business users.

Tuesday, March 24. 2009
Invite to DM radio and Boston TM summit
Tune into the March 26 episode of DM Radio at 3 p.m. Eastern Time (U.S.) as editors from Information Management magazine (formerly DM Review) talk to several experts about the power of analytics. The live Internet radio broadcast will include SAS strategist Tammi Kay George, Forrester Analyst Boris Evelson and SPSS vice president of Customer Analytics, Colin Shearer. Together they'll answer the following questions:

Why analytics are bolstering BI across the board
How the Cloud can make analytics very affordable
Why text analytics can be crucial for BI and customer relationship management

Forrester Research predicts that the BI industry will grow to nearly $13 billion by 2014, despite the current economic downturn. This is due in part to the power and popularity of analytical software, including text analytics, data mining and predictive analytics. While that statement comes as no surprise to readers of this blog, I find it exciting that others in the IT and BI world are finally catching on. That's why I'm recommending this short DM Radio session, so you can invite YOUR manager to listen to this broadcast either live (with opportunity for Q&A) or as an archived recording later. Visit the Information Management site to register and find additional information.

Hot news flash - SAS just signed on as a sponsor of the Boston TM summit this June. For those readers who have experience with SAS Text Miner, I invite you to contact me in the next week or two. I may be able to reserve a speaking slot on the program agenda for YOU, with SAS helping pick up your registration and travel expenses. Come see first hand what Teragram and SAS are sharing June 1st and June 2nd.

Saturday, March 21. 2009
SAS Global Forum: ready, set, GO!
We are looking forward to interacting with those of you who make the annual pilgrimage next week to SAS Global Forum 2009, this year in Washington, DC. Personal preparations here on SAS Campus in Cary this week have included completely removing and reinstalling SAS on my laptop. I'm more than excited to have the opportunity to interface with SAS users and highlight the additional capabilities Teragram technologies are giving our SAS analytics offerings.

In addition, there are a number of SAS Text Miner/SAS Content Categorization presentations for your viewing pleasure.

SGFtextanalyticstalks.pdf

The Teragram booth will be situated right beside the SAS Text Miner booth: enter the exhibit area and turn left. We are looking forward to seeing you!

And don't forget, if you Twitter (as I do), follow me at @manyamayes. To follow a much wider conversation during the conference, use the hash tag #SGF09. See you there!
Posted by Manya Mayes in Manya Mayes at 14:18
Defined tags for this entry: teragram; sas global forum; twitter; demo;
Wednesday, March 11. 2009
Social networks eclipse email in popularity!
Based on research published by Nielsen on Monday, "Social Networking's New Global Footprint," social networks are now more popular than email. Plenty of customer feedback can be garnered from the Web, with a lot more ease than from less publicly available email. Yet not all of this information is tremendously useful (see my previous post on the Skittles social media experiment). But, really, who wants to wade through all of that information by hand!?! Discovering and categorizing valuable feedback is something that can be automated using SAS Text Miner and SAS Content Categorization.
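To give a rough feel for what automating this kind of sorting can look like, here is a minimal Python sketch of keyword-rule categorization. It is not SAS Text Miner or SAS Content Categorization, and it does not use their APIs; the category names, the keyword lists and the `categorize` function are all assumptions made up for this illustration.

```python
import re

# Hypothetical categories and trigger words, invented for this sketch.
RULES = {
    "complaint": {"broken", "refund", "terrible", "disappointed"},
    "praise": {"love", "great", "awesome", "recommend"},
    "question": {"how", "why", "where", "when"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def categorize(text):
    """Return the sorted list of categories whose keywords appear in the text."""
    words = set(tokenize(text))
    return sorted(name for name, keywords in RULES.items() if words & keywords)


if __name__ == "__main__":
    posts = [
        "I love this product and would recommend it",
        "Arrived broken, I want a refund",
        "Just had lunch",
    ]
    for post in posts:
        # Posts that match no rule are shown as uncategorized.
        print(categorize(post) or ["uncategorized"], "-", post)
```

Plain keyword matching like this is only a starting point: a post that matches no rule is simply left uncategorized, and real feedback would need the kind of context-aware rules and statistical modeling discussed elsewhere on this blog.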
Friday, March 6. 2009
Skittles Social Media Experiment meets SAS Text Miner
Early last week, Skittles created a lot of buzz around the relaunch of their web site as part of a social media marketing campaign that directs Twitter comments containing the word "skittles" to the Skittles home page. For a little Friday afternoon frivolity, I decided to download some of the Twitter comments to analyze automatically using SAS Text Miner. Given my experience with analyzing web text, and having read the reports about the Skittles campaign, I was sure I would be subjected to colorful language and other less than savory comments - Web 2.0 at its best AND worst - and I was right. Additional related opinion has also been posted by Dave Thomas on his Social Media at SAS blog.

I downloaded 1400 posts about Skittles from the Twitter social networking site. It was not enough to cover all of the campaign and the buzz it created, but it is a start. Some of my initial results show topics about:

-- Vodka Skittles! [I'm hoping they'll make some Margarita ones];
-- Religion, Viagra, Rihanna, and taste the rainbow (although not all together);
-- the campaign itself.

This visual (click on it for a better look) gives you a glimpse of the breadth of information contained in the postings (people really thought their postings were interesting!)... I spray painted over some bad language so as to avoid offending anyone.

I plan on exploring the data a little more with Text Miner, then maybe (given time) adding SAS Content Categorization to the mix, allowing me to create a taxonomy using advanced linguistic techniques.

Wednesday, February 25. 2009
What do text mining vendors see ahead for 2009?
Seth Grimes has compiled an excellent set of quotes from interviews he had with leaders and software developers in this field. You can find his article posted now on the Business Intelligence Network, along with Yves Schabes', Manya Mayes' and Keith Collins' views on 2009 challenges and opportunities for text analytics at SAS and Teragram.

From discussions and Q&A with attendees at the Predictive Analytics World conference, I can personally confirm Manya's first projection, namely that 2009 challenges will call for "A broader set of vertical/horizontal offerings including more automated unstructured (text, voice, image) capabilities delivered for customer/product/competitive intelligence".

Although text mining was not addressed in the title or abstract of any of the 30 talks on the agenda, there were 3 separate discussions that did include ways text analytics fits in. My favorite was when John Elder applied text analytics to "build his haystack" out of the pile of Social Security disability claim forms. By applying text mining, 20% of the cases could be automatically approved quickly, leaving more time for the manual review of the others. John's point was that while text mining didn't specifically find the needle in the haystack, the technology most certainly helped organize and arrange the haystack so he could allocate tasks to the "artificial intelligence" of the computer and help people do their job faster.

Yes, I say 2009 will be the year business and industries open their eyes to see the untapped potential for insight now lying around as unstructured content... for example, see this blog where Jason Burke says:

As pharmaceutical firms have learned, you can create great solutions for sharing data, and the information still not be useful. For example, if data is sitting in an EMR as an unstructured block of free text, your ability to get to better decisions and predictable outcomes will be considerably constrained.
Thankfully, we do have more mature interoperability standards today that can help us along the way. But interoperability should not just be about pushing data across systems -- it should be about facilitating medical insight across stakeholders.

Jason has laid the groundwork... and I wait with bated breath to see his next blog entry, because it's the INSIGHT that excites me about analytics. Text analytics is not promising easy answers; in fact it may spur additional new questions for investigation, resulting in worthwhile innovation. Finding case studies will be a challenge for 2009, just as the yahoo group discussion here has pointed out.
ABOUT THE TEAM
I'm Manya Mayes, SAS Chief Text Mining strategist. On this blog, my colleagues, friends and I discuss unstructured text and understanding the voice of the customer. Plus a few more things.
