Counting how many times a web link has been shared on Twitter or Facebook

16

Update 24Nov2015: The methods in this post no longer work for Twitter, as Twitter has discontinued support for its "share count" API that was used within the Twitter share buttons. But the Facebook method still works, and I've described a method for counting LinkedIn shares on another post.

As of this writing, I've posted 34 blogs posts during 2012 (not counting this one), which have been shared on Twitter a total of 368 times and on Facebook a total of 92 times. Those aren't exactly "viral" numbers, but I try not to get too hung up on it.

How do I know these numbers? If you are reading this blog post on our blogs.sas.com site, then you can see the "social share" counts for each individual post. For example, see the header from this recent post:


As it would be a tedious task to view each post and tally up the counts, I wrote a SAS program to do that work for me.

The little social count widgets that you see on our blog site are powered by "count" APIs that are surfaced by Twitter and Facebook servers. Each of these APIs is a little bit different, but they work basically in the same way. You create a request in the form of a URL, which contains the address of the API service plus the web page URL that you want to know the "count" of. You use HTTP to issue the request, and the response comes back in the form of a JSON string. Unlike some web service APIs, these API requests do not require authentication or application permissions, so they are easy to use with basic HTTP.

For example, the following URL request returns the number of times that this blog post has been tweeted (click on the link to see the response, and tweet this blog post to watch it go up!):

http://urls.api.twitter.com/1/urls/count.json?url=https://blogs.sas.com/content/sasdummy/2012/07/06/social-media-counts-in-sas/

The exact count in the response will change over time, but here is an example of an outrageously optimistic result:

{ "count":4032,
  "url":"http:\/\/blogs.sas.com\/content\/sasdummy\/2012\/07\/06\/social-media-counts-in-sas\/"
}

JSON has become an ubiquitous data exchange format among web service APIs. The SAS language doesn't yet support native parsing of JSON, but with a little bit of programming you can easily parse simple JSON to get the information that you need.

The SAS program that I wrote uses these techniques to gather the Twitter and Facebook counts:

  • Use a DATA step to create a list of URLs (web pages, or links). In my example I included the URLs for the blog topics I've posted thus far in 2012.
  • Use a SAS macro to loop through each URL record and create a data set with a single record.
  • In the macro loop for each URL, use FILENAME URL to construct filerefs that will connect to Twitter and Facebook using well-formed API requests.
  • In the macro loop for each URL, use a DATA step to "input" the file (producing the API response in JSON form) and record it into a data set.
  • Outside of the macro loop, concatenate all responses into a data set (one for Twitter responses and one for Facebook responses).
  • Use Perl regular expression functions to parse the responses into just the numbers I'm looking for.
  • And finally, join all of the social counts together to create a single data set with the URLs and their current tweet/share counts.

These count values are a snapshot of the current number of shares, so I also "stamp" each record with today's date in case I want to build a historical set of these counts and see the trends over time.

This program contains a few inefficiencies. It passes through the "links" source data many times (once for each link) and creates lots of tiny transient data sets along the way. However, the biggest impact on the program performance is the time it takes for the Twitter and Facebook APIs to respond. In my experience, the Twitter API is usually very fast. The Facebook API is sometimes fast, but sometimes a call can be blocked for several seconds (or longer). Because of their scale, both services are sensitive to overuse or abuse, and may "throttle" or deny requests from IP addresses that use them irresponsibly.

When you see the numbers, you might wonder: are they accurate? Can you rely on these results as the absolute counts of how your web content has been shared/liked via social media? Personally, I believe the answer is No. However, I think that the values are useful for relative comparisons. Without focusing on the absolute numbers, I can still learn which of my posts gathered more or less engagement on Twitter, Facebook, or both.

You can download the complete program (with sample links data) from here:
GetSocialStats_example.sas

I've included the program logic (not the sample data) in the remainder of this post. I added a few comments to the code to help convey what's going on.

/* Get count of URLs in source data                */
/* Here it assumes WORK.LINKS                      */
/* and the page link to check is in the URL column */
proc sql noprint;
  select count(URL) into :urlCount from WORK.LINKS;
quit;

/* trim this for later use in constructing data set names*/
%let urlCount = %trim(&urlCount);

%macro gatherSocialStats;
  /* This loop reads each URL link from the data set */
  /* one at a time, and then uses FILENAME URL       */
  /* to invoke the Twitter and Facebook APIs for     */
  /* Tweet counts and Share counts                   */
  %do i = 1 %to &urlCount;

    /* read the next URL */
    proc sql noprint;
      select trim(url) into :url from work.links(firstobs=&i obs=&i);
    quit;

    %put Checking %str(&url.);

    /* format of the Twitter API call */
    filename twurl url "http://urls.api.twitter.com/1/urls/count.json?url=&url." 
      /* supply PROXY= option if behind a corporate firewall/gateway */
    ;

    /* format of the Facebook API call */
    filename fburl url "http://graph.facebook.com/?ids=&url." 
      /* supply PROXY= option if behind a corporate firewall/gateway */
    ;

    data twresult&i;
      /* Running in EG? SYSECHO helps convey progress in status bar */
      SYSECHO "Checking Tweet URL &i. of &urlCount.";
      length line $ 500 url $ 500;
      infile twurl;
      url = "&url.";
      input line;
    run;

    data fbresult&i;
      /* Running in EG? SYSECHO helps convey progress in status bar */
      SYSECHO "Checking FB URL &i. of &urlCount.";
      length line $ 500 url $ 500;
      infile fburl;
      input line;
      url = "&url.";
    run;

    /* Free up the filerefs */
    filename twurl clear;
    filename fburl clear;
  %end;

  /* concatenate all Twitter counts */
  data twresults;
    set twresult1-twresult&urlCount.;
  run;

  /* concatenate all Facebook share counts */
  data fbresults;
    set fbresult1-fbresult&urlCount.;
  run;

  /* clean up all of the tiny data sets */
  proc datasets lib=work nolist;
    delete twresult1-twresult&urlCount.;
    delete fbresult1-fbresult&urlCount.;
  quit;

%mend;

%gatherSocialStats;

/* Post processing to PARSE the results, which */
/* are in JSON format */

/* Example result:
{
  "https://blogs.sas.com/content/sasdummy/2012/01/25/export-to-excel-got-easier/": 
  {
     "id": "https://blogs.sas.com/content/sasdummy/2012/01/25/export-to-excel-got-easier/",
     "shares": 3
  }
}
*/
data work.fbcounts (keep=fbshares url);
  set fbresults;
  length fbshares 8;
  if _n_ = 1 then
    do;
      retain fb_regex;
      /* looking for "shares":<a number> */
      fb_regex = prxparse("/\""shares\""\:([0-9]*)/");
    end;
  position = prxmatch(fb_regex,line);
  if (position ^= 0) then
    do;
      call prxposn(fb_regex, 1, start, length);
      fbshares = substr(line,start,length);
    end;
run;

/* Post processing to PARSE the results, which */
/* are in JSON format */

/* Example result:
  {
  "count":8,
  "url":"http:\/\/blogs.sas.com\/content\/sasdummy\/2012\/01\/27\/achieving-mediocrity\/"
  }
*/
data work.twcounts (keep=tweets url);
  set twresults;
  length tweets 8;
  if _n_ = 1 then
    do;
      retain tw_regex;
      /* looking for "count":<a number> */
      tw_regex = prxparse("/\""count\""\:([0-9]*)/");
    end;
  position = prxmatch(tw_regex,line);
  if (position ^= 0) then
    do;
      call prxposn(tw_regex, 1, start, length);
      tweets = substr(line,start,length);
    end;
run;

/* Join it all together with the URL, tweet counts, */
/* and share counts */
proc sql;
  create table work.socialcounts as 
    select t1.url, 
      ifn(t2.fbshares eq .,0,t2.fbshares) as fbshares, 
      t3.tweets,
      /* date-stamp the value for use in history/comparisons */
      today() as as_of_date format=date9.
  from work.links t1 
    left join work.fbcounts t2 on (t1.url = t2.url) 
    left join work.twcounts t3 on (t1.url = t3.url)
  ;
quit;

Share

About Author

Chris Hemedinger

Senior Manager, SAS Online Communities

+Chris Hemedinger is the manager of SAS Online Communities. Since 1993, Chris has worked for SAS as an author, a software developer, an R&D manager and a consultant. Inexplicably, Chris is still coasting on the limited fame he earned as an author of SAS For Dummies.  He also hosts the SAS Tech Talk webcasts each year from SAS Global Forum, connecting viewers with smart people from SAS R&D and the impressive work that they do.

16 Comments

  1. But "is" there a "right" or exact count that can be extracted? I get nervous when things don't box, I ran this on two different machines and received two different counts - will the firewall issue cause an error or will one not receive all the data - Thanks, so much -

    • Chris Hemedinger
      Chris Hemedinger on

      Grover, I don't know the answer to that. I find these APIs to be adequate for measuring relative "social-ness", but I try to not rely on the absolute numbers they produce.

      • Thanks Chris - I agree, looking at the comment above, how did you get to the number of -comments to all the posts? and can this also be used to count google, pinterest and linkedin shares as well?

        • Chris Hemedinger
          Chris Hemedinger on

          Grover,
          We host our SAS blogs using WordPress. WordPress is backed using a MySQL database. SAS can access MySQL easily, so I can run reports and tell you that we have over 5089 posts on blogs.sas.com (as of right now) and 7,127 comments (oops, your new one plus my response makes 7,129).

          G+ has a more complex API that I haven't tried to tackle. I'm not sure what Pinterest or LinkedIn offer, but I'm sure they have something.

  2. Thanks for the useful code, although I wouldn't personally recommend going overboard with embedding social widgets on your website. Remember that social networks should be for building traffic to your website, not the other way round.

  3. I don't know if this site can answer my question. I came across the site trying to find an answer. I simply would like to know if all my posts and shared posts appear on my timeline. It would be useful sometimes to be able to check what you have said or posted in hindsight. Not all my posts seem to appear on my newsfeed and I am not sure if they have been shared. I would like to know for sure .if they appear on your timeline without fail it would ,be a way to check this.
    Are you able to help me with this? Thanks.

    • Chris Hemedinger
      Chris Hemedinger on

      Denise, I'm afraid I don't know the answer to that. You have control over which posts/activities appear on your timeline, but I don't know what is visible by default. You can review all of that by visiting http://www.facebook.com/your_facebook_id/allactivity.

  4. Pingback: Using SAS to count the number of LinkedIn “shares” for your article - The SAS Dummy

  5. Great article Chris! Would appreciate some help with a problem I am facing. I am interested in extracting client data (for which I have access), sitting behind Google Analytics, Adwords, FB Page insights, Youtube etc. using SAS. I have tried searching for sample code but have not been successful. Would appreciate any help with this. Thanks.

    • Chris Hemedinger
      Chris Hemedinger on

      Parag,

      If you want to do this work from within a SAS program (connect to a service and download data), you have to do some tricks for authentication. These services all offer APIs, but usually the API needs you to register your "app" and connect using OAuth. My colleague Falko Schulz posted an example for Twitter on sasCommunity.org. You'll have to do some research to see if you can adapt it for these other services.

Leave A Reply

Back to Top