SAS Visual Analytics 6.4: Importing a Twitter Stream

Great news.  If you’ve been struggling to import a Twitter stream as a data source, SAS Visual Analytics 6.4 has greatly simplified that task as part of this release’s expanded data import functionality. The first time you import tweets, you are directed to the Twitter website to log on to your account and authorize SAS Visual Analytics. After the initial logon, SAS Visual Analytics uses authorization tokens for accessing Twitter instead of requiring you to log on each time.

The product documentation provides high-level instructions for importing tweets from Twitter, but I found that additional detail makes the process much simpler to follow.  In this post, I’ll walk you through the process from beginning to end, with screenshots and helpful hints along the way.

Capturing the Twitter stream

Start by logging into SAS Visual Analytics. When you select Prepare Data from the main menu, you’ll get the SAS Visual Data Builder dialog.  Select Import Data.  You’ll see the list of possible sources, including Twitter.

When you select Twitter, the next dialog informs you that you’ll need to connect to your Twitter account to authorize SAS Visual Analytics to use it.  You may want to create a Twitter account specifically for reporting and analysis purposes as opposed to using an existing business or personal account.

 Log into Twitter and authorize SAS Visual Analytics to use the account. Once authorized, you are redirected to the browser tab you used for Twitter.

Returning to SAS Visual Analytics, you’ll now see a different dialog requesting search strings. You may include hashtags or Twitter handles. Once you’ve entered information, you can run the query.  In this example, my search term is #MarchMadness and I’ve specified a limit of 2000 tweets to be returned.  Depending on your needs, you can also specify information such as metadata location, LASR server library and proxy server account information.

Understanding the results

With the table successfully created, you can now choose Create Exploration from the main menu to see what’s there.  In this example, my March Madness query brought back fewer than the maximum of 2000 results that I requested.  I’m not sure what the upper limit is for the number of results in SAS Visual Analytics, but I’ve been able to run a query asking for 50,000 rows with little trouble.

In a typical Twitter stream query, there is quite a bit of data to explore, and quite a number of variables associated with each tweet.  Unfortunately, documentation for these variables was difficult to find, and some variables are more intuitive than others. By doing a little exploration, I was able to pull together my interpretation of what some of the fields in this report represent.

Let me know if you find the following lists useful or if you have additional information to share about these.

 Category Variables

  • author - Mandatory and unique Twitter handle of the person who sent the Tweet.
  • authordescription - Optional user description.
  • authorimageurl - URL for user's profile photo.
  • authorlang - The language the user specifies in their Twitter profile.
  • authorlocation - Optional location in user's profile. Not always an actual location.
  • authorname - The displayed nickname for the user's account. Neither unique nor necessarily the same as the author value.
  • authortimezone - Time zone specified in the user's Twitter profile. Value is sometimes empty.
  • authorurl - Optional URL for user's homepage.
  • body - The actual tweet itself, max of 140 characters.
  • deviceinfo - Information about the device or platform the user sent the Tweet from. Not as clean a field as you might expect.
  • listoflinks - Any URLs that appear in the body of the tweet itself, separated by a semicolon if there are multiple values.
  • mentionedusernames - The displayed nickname of any users mentioned in the tweet, separated by a semicolon if there are multiple values.
  • mentionedusers - The "author" value of any users mentioned in the tweet, separated by a semicolon if there are multiple values.
  • publisheddatetimestr - The date and time the tweet was sent. Note: this value isn't imported as a SAS date, it's a character string.
  • referenceauthor - Unclear how this differs from mentionedusers. It consistently appears to be the first value there.
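
Several of the category variables above hold multiple values separated by semicolons. The following is a minimal Python sketch of how such fields could be split once the table has been exported out of SAS Visual Analytics, and how the observation about referenceauthor could be checked. It assumes each tweet is available as a dictionary keyed by the variable names listed above; the helper names and the sample rows are made up for illustration and are not part of the product.

```python
def split_multi(value):
    """Split a semicolon-delimited field into a list of trimmed, non-empty items."""
    if not value:
        return []
    return [item.strip() for item in value.split(";") if item.strip()]


def reference_is_first_mention(row):
    """Return True when referenceauthor equals the first entry of mentionedusers."""
    mentioned = split_multi(row.get("mentionedusers", ""))
    reference = (row.get("referenceauthor") or "").strip()
    return bool(mentioned) and mentioned[0] == reference


# Made-up sample rows (not real tweets); the keys follow the variable list above.
sample_rows = [
    {
        "author": "fan_one",
        "mentionedusers": "team_a;team_b",
        "referenceauthor": "team_a",
        "listoflinks": "http://example.com/a; http://example.com/b",
    },
    {
        "author": "fan_two",
        "mentionedusers": "",
        "referenceauthor": "",
        "listoflinks": "",
    },
]

for row in sample_rows:
    links = split_multi(row["listoflinks"])
    print(row["author"], links, reference_is_first_mention(row))
```

Running a check like this over a full export would show how often referenceauthor really matches the first mentioned user, rather than relying on a visual scan of the table.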

 Measure Variables 

  • authorfavouritecount - The number of times the tweet was favorited.
  • authorfollowercount - How many followers the author has.
  • authorfriendcount - Somewhat misleading label. This is actually the number of accounts the user follows.
  • authorid - Possibly a proprietary Twitter unique User ID.
  • docid - Another proprietary Twitter unique identifier.  Noninteger value.
  • doclatitude - Presumably the latitude value if user has geo-location enabled. Missing for most records.
  • doclongitude -  Presumably the longitude value if user has geo-location enabled. Missing for most records.
  • isretweet - Is this a retweet? 1 if yes, 0 if no.
  • publisheddatetime - Long numeric value.
  • referenceauthorid - Another Twitter identifier.
  • retweetcount - The number of times the tweet was retweeted.
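
As a rough illustration of how the measure variables can be used, here is a small Python sketch that summarizes a set of imported tweets: the share of retweets (from isretweet) and the number of records that carry coordinates (from doclatitude and doclongitude). It assumes the rows have been exported to dictionaries and that missing values arrive as None; both are assumptions about the export, not documented behavior, and the sample values are invented.

```python
def summarize(rows):
    """Summarize retweet share and geotag coverage for a list of tweet rows."""
    total = len(rows)
    retweets = sum(1 for r in rows if r.get("isretweet") == 1)
    geotagged = sum(
        1
        for r in rows
        if r.get("doclatitude") is not None and r.get("doclongitude") is not None
    )
    return {
        "tweets": total,
        "retweet_share": retweets / total if total else 0.0,
        "geotagged": geotagged,
    }


# Invented sample values; missing coordinates are represented as None.
sample_rows = [
    {"isretweet": 1, "retweetcount": 12, "doclatitude": None, "doclongitude": None},
    {"isretweet": 0, "retweetcount": 0, "doclatitude": 35.9, "doclongitude": -78.9},
    {"isretweet": 0, "retweetcount": 3, "doclatitude": None, "doclongitude": None},
    {"isretweet": 0, "retweetcount": 1, "doclatitude": None, "doclongitude": None},
]

summary = summarize(sample_rows)
print(summary)
```

The same figures can of course be produced inside an exploration; the sketch only makes explicit how the 1/0 coding of isretweet and the mostly missing coordinates affect a summary.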

8 most attended SAS administrator papers in 2014

In a comment on last week’s blog asking SAS administrators: please submit your paper idea for SAS Global Forum 2015, Andrew Howell of ANJ Solutions asked if I had any statistics on which were the most popular SAS administrator papers for last year’s conference. He suggested the following nominations although he was “only able to attend about half of these presentations—there was just so much to see!” 

I don’t have any readily available statistics on downloads for these papers, and, like Andrew, I wasn’t able to attend all of the excellent SAS Administrators sessions at SAS Global Forum 2014. But based on my own observations and feedback from others who were able to attend, here’s my very subjective ranking of Andrew’s list!

  1. Effective Usage of SAS Enterprise Guide in a SAS 9.4 Grid Manager Environment, Edoardo Riva, SAS
  2. SAS Grid – What They Didn’t Tell You, Manuel Nitschinger, sIT Solutions and Phillip Manschek, SAS
  3. Best Practices for Implementing High Availability for SAS 9.4, Cheryl Doninger, Zhiyong Li and Brian Wolfe, SAS
  4. SAS Installations: So You Want To Install SAS?, Rafi Sheikh, Analytics International
  5. Top 10 Resources Every SAS Administrator Should Know About, Margaret Crevar and Tony Brown, SAS
  6. SAS Grid Manager I/O: Optimizing SAS Application Data Availability for the Grid, Gregg Rohaly and Harry Seifert, IBM
  7. Test for Success: Automated Testing of SAS Metadata Security Implementations, Paul Homes, Metacoda
  8. Integrating Your Corporate Scheduler with Platform Suite for SAS® or SAS® Grid Manager, Paul Northrop, SAS Australia

Last year’s conference was a great opportunity for “seeing SAS Administrators in their natural habitat”.  Many of these sessions were standing room only.  Please submit your paper idea and let’s plan for another great year for administrators!


Macro quoting made easy

Are there times when you need to pass special characters to a macro variable but cannot find the right technique to accomplish the task?  In this article I’ll discuss the different macro quoting functions and give a simple technique to help you determine which macro quoting function to use.

Why do we need macro quoting?  The SAS macro language is a character-based language. With macro, you can specify special characters as text.  Some of these special characters (for example, a semicolon or percent sign) are part of the SAS language instruction set, so we need a way for the macro processor to interpret a particular special character when it’s being used as text in the macro language.  Macro quoting functions tell the macro processor to treat these special characters as text rather than as part of the macro language.  Without macro quoting functions, you would have no way to mask the real meaning of these special characters or mnemonics.

This post will list some all-purpose functions, explain how to determine when to use each type, and show you how to unmask, or unquote, special characters.


SAS administrators: please submit your paper idea for SAS Global Forum 2015

The SAS Global Forum 2015 Call for Papers opened at the end of last month!  I cannot believe it is already time to start getting ready for the conference, which will be held in Dallas, TX, on April 26-29, 2015.

As part of the group from SAS who goes to the conference each year to help the attendees with their administration questions, I would like to issue a personal invitation to all the readers of this blog to submit an abstract on SAS administration tips that have made your role as the SAS Administrator at your site easier, or things that you wish you had known before you took over the role of SAS Administrator.  These tips can be from setting up your hardware infrastructure to deploying your SAS applications to supporting your SAS users.


MWSUG 2014 – guided learning, a cruise and more

This fall, you’ll be swept off your feet in the Windy City: The 2014 Midwest SAS Users Group (MWSUG) conference takes place in Chicago from October 5-7. With 135 presentations in 11 different sections, you’ll have the chance to expand your SAS know-how and network with fellow SAS professionals.


SAS administrators--what's on your bookshelf?

I spend a lot of time on support.sas.com looking for resources to share.  The SAS Programmer’s Bookshelf is a handy reference that’s been around for a while, so I asked “Why not a SAS Administrator’s Bookshelf?”

What would you include?

It’s complicated, of course, because the SAS administrator’s role covers a myriad of tasks. Throw into that mix the fact that many individual software products have their own administration requirements, and you have quite a long list.

About a year ago, I published a post on SAS administrator connections and resources, and all this research made me realize it might be time to update that post with a few new items (noted with *).


SESUG 2014—learning, networking, problem-solving

Sun, surf and SAS await you at this year’s South East SAS Users Group (SESUG) conference. Held in beautiful Myrtle Beach, South Carolina, from October 19-21, the conference offers two full days of learning, networking and problem-solving.

“Every year I’ve attended, I’ve found something that I could immediately use in my job and brought back contacts I could reach out to if I needed help with a new SAS application,” said Barbara Okerson, SESUG Executive Council Member. “Once you come to the conferences and meet other SAS professionals, you realize that it’s a community of friends that are always willing to help each other.”


Partitioning in Hadoop, sorting in SAS--same results, different methods

SAS In-Memory Statistics for Hadoop is a single interactive programming environment for analytics on Hadoop that integrates analytical data preparation, exploration, modeling and deployment. Its principal components are the IMSTAT procedure (PROC IMSTAT) and the SAS LASR Analytic Engine (or SASIOLA engine for input-output with LASR).

Within the SAS In-Memory Statistics for Hadoop environment, the SAS LASR Analytic Engine provides most of the functionality we associate with Base SAS, whereas PROC IMSTAT covers the full analytical cycle, from data manipulation and management through modeling to deployment. This duality continues SAS's long-standing tradition of getting the same job done in different ways to accommodate users' different styles, constraints and preferences.

This post is one of several I plan to publish soon that discuss code mapping of key analytical data exercises from traditional SAS programming to SAS In-Memory Statistics for Hadoop. Today’s post covers sorting, sorting-related BY variables and ordering in SAS In-Memory Statistics for Hadoop.


Live Google maps in SAS -- multiple markers

In my prior post, Spice up SAS output with live Google maps, I discussed the idea of delivering a live Google map through SAS output and demonstrated the feasibility of doing so in the SAS Information Delivery Portal. The same can be done for any SAS HTML output: you just need to replace the _webout file reference used for the portal with any file reference pointing to an HTML file.

Since the first post was well received by the SAS community and generated numerous questions, I have decided to expand on it and show some additional Google Maps functionality that can be incorporated into SAS output.

In this post, I will demonstrate how to use SAS to generate a Google map with multiple points (or markers, in Google terminology) and incorporate it into SAS HTML output.


SAS Global Forum 2015: Your presentation is needed!

Do you know something about SAS® software that other SAS users would love to learn?

Of course you do!

Whether you’re a student or a member of the Circle of Excellence, every SAS programming project, every analysis or forecasting model is an opportunity to gain new insights into SAS processing or to develop new techniques.

No matter how you measure your SAS experience, what you’ve learned is valuable to other SAS users. So, start typing up your ideas. The SAS Global Forum 2015 call for content opens today!
