Format your data like a social media superstar

3

In my industry of data and computer science, precision is typically regarded as a virtue. The more exact that you can be, the better. Many of my colleagues are passionate about the idea, which isn't surprising for a statistical software company.

But in social media, precision is a stigma -- especially when applied to the number of followers or connections that you have. Do you have so few connections that it can be represented as an exact number? Well, I wouldn't go bragging about that!

For example, let's look at this new LinkedIn joiner (blurred to protect his identity). As of this writing he has only -- and exactly -- 15 connections! (For the record, I consider this guy to be a rising star; I'm sure that his connections will skyrocket very soon.)

LinkedIn newcomer
Compare this to a more seasoned networker, shown below. LinkedIn won't tell us how many connections he has; perhaps they are beyond measure. We can see only that he has 500 or more LinkedIn connections. It might be 501 or it might be 500 million. It doesn't matter, according to LinkedIn. All you need to know is that this guy is a highly sought-after connection.

linkedin_pro
Let's create a SAS format to count connections, LinkedIn-style. It's simple: for cases of 500 or more, the label should be "500+". For fewer than 500, the label will be the embarrassingly precise actual number.

libname library (work); /* "library" is automatically searched for formats */
proc format lib=library;
 value linkedin    
  500 - high ='500+'
  /* all other values fall through to actual number */
 ;
run;
 
/* test data */
data followers;
  length name $ 40 followers 8 displayed 8;
  format displayed linkedin.;
  infile datalines dsd;
  input name followers;
  displayed = followers;
datalines;
Chris Hemedinger, 776
Kevin Bacon, 100543
Norman Newbie, 3
Colleen Connector, 499
;
run;

Here's the result. Notice how the "displayed" value acts as sort of an equalizer among the super-connected. Once you've achieved a certain magnitude of connections, you're placed on par with all other "superstars".
linkedin_data

Social sites like Instagram and Twitter operate on a different scale. While they still sacrifice precision when displaying a large number of "likes" or "followers", they do at least present a ballpark number that shows the magnitude of the activity. For example, instead of showing the exact follower count for Ronda Rousey (public figure, film star, MMA champ, and daughter of an experienced SAS presenter), Instagram simply truncates the number to the thousands and adds a "K".

rrig
To achieve the same in SAS, we can rely on a PICTURE format. This format contains the templates for what a number should look like, plus the multipliers to apply to get to the figure we want to display. Here's an example program, which I repurposed from the BIGMONEY example in SAS documentation.

proc format lib=library;
picture social (round)
 1E03-<1000000='000K' (mult=.001  )
 1E06-<1000000000='000.9M' (mult=.00001)
 1E09-<1000000000000='000.9B' (mult=1E-08)
 1E12-<1000000000000000='000.9T' (mult=1E-11);
run;
 
/* test data */ 
data likes (keep=actual displayed);
format actual comma20.
 displayed social.;
 do i=1 to 12;
  actual=16**i;
  displayed = actual;
  output;
 end;
run;

Here's the result:
socialcount_output

As you can see, the "displayed" values show you the magnitude of big "likes" in a cool-but-not-so-eagerly-precise way.

Here's my complete test program if you want to try it yourself. Happy networking!

Share

About Author

Chris Hemedinger

Director, SAS User Engagement

+Chris Hemedinger is the Director of SAS User Engagement, which includes our SAS Communities and SAS User Groups. Since 1993, Chris has worked for SAS as an author, a software developer, an R&D manager and a consultant. Inexplicably, Chris is still coasting on the limited fame he earned as an author of SAS For Dummies

3 Comments

  1. This may be a silly question, but is there a reason you used written-out numbers for the social format instead of like this?

    1E03-<1E06='000K' (mult=.001 )
    1E06-<1E09='000.9M' (mult=.00001)
    1E09-<1E12='000.9B' (mult=1E-08)
    1E12-<1E15='000.9T' (mult=1E-11);

    • Chris Hemedinger
      Chris Hemedinger on

      Eric,

      As I mentioned, I repurposed an example from the SAS documentation for PROC FORMAT. The example used the two different styles. I kept it that way because it shows that you can use scientific notation in SAS syntax -- something that's easy to forget.

  2. This kind of format makes great style for for graph scale values where numbers can have a broad range of values

Back to Top