Finding patterns in big data with SAS/GRAPH

When working with "big data", you usually have too many points to view in a plot, and you end up subsetting or summarizing the data. But now, in SAS 9.3, you have an alternative!

For example, the following scatter plot of 10,000+ points is just a visual "blob":

[Figure: Very dense scatter plot, which looks like a black blob.]

But using a new SAS 9.3 feature, alpha-transparency, you can make the plot markers semi-transparent, and "see" the hidden patterns. When multiple markers "stack up" in the same location, their transparent colors "add up" and they become darker.
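To make the "add up" idea concrete, here is a minimal Python sketch of how stacked markers darken. It assumes the standard "over" compositing rule (new color = alpha * marker + (1 - alpha) * background), which is the usual way renderers blend transparent colors. It is not SAS/GRAPH code, and the function names `composite_over` and `stack_markers` are hypothetical.

```python
# A minimal sketch of how stacked semi-transparent markers darken.
# Assumption: standard "over" alpha compositing; this is not SAS/GRAPH's renderer.

def composite_over(background, marker, alpha):
    """Draw one marker of color `marker` with opacity `alpha` over `background` (RGB in 0..1)."""
    return tuple(alpha * m + (1 - alpha) * b for m, b in zip(marker, background))


def stack_markers(n, marker=(0.0, 0.0, 0.0), alpha=0.1, background=(1.0, 1.0, 1.0)):
    """Return the displayed RGB color after n identical markers land on the same spot."""
    color = background
    for _ in range(n):
        color = composite_over(color, marker, alpha)
    return color


if __name__ == "__main__":
    # Gray level 1.0 is white, 0.0 is black.
    for n in (0, 1, 5, 10, 50):
        solid = stack_markers(n, alpha=1.0)[0]
        faded = stack_markers(n, alpha=0.1)[0]
        print(f"{n:3d} markers: solid gray level {solid:.3f}, alpha=0.1 gray level {faded:.3f}")
```

With solid markers (alpha=1.0), one marker and fifty markers both come out as gray level 0.000 (pure black), which is why the dense plot turns into a blob. At alpha=0.1 the gray level falls off as 0.9^n, so 5 stacked markers give about 0.59 and 10 give about 0.35, and denser spots stay distinguishable.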

In this case, I had "hidden" a densely-packed 'ring' of points in the data. This pattern is visually obscured when plotted with solid-colored markers (above), but very visible in the graph with transparent markers (below):

[Figure: Same scatter plot, with alpha-transparent colors -- the hidden pattern becomes visible.]
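Here is a rough way to see why the ring pops out, as a Python sketch on hypothetical data: 10,000 uniform "noise" points plus a 2,000-point ring of radius 0.5. These numbers are made up and are not the author's dataset. The sketch bins the points into a grid, treats each cell as roughly one marker's footprint (a simplifying assumption), and converts each cell's count into darkness with 1 - (1 - alpha)^count, again assuming "over" compositing. All function names are hypothetical.

```python
import math
import random


# Hypothetical stand-in data: not the author's dataset, only the same idea.
def make_points(n_noise=10_000, n_ring=2_000, seed=1):
    """Uniform noise over [-1, 1]^2 plus a thin, dense ring of radius 0.5."""
    rng = random.Random(seed)
    points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_noise)]
    for _ in range(n_ring):
        theta = rng.uniform(0, 2 * math.pi)
        r = 0.5 + rng.gauss(0, 0.01)
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


def cell_counts(points, bins=40):
    """Count points per cell of a bins x bins grid; one cell stands in for one marker's footprint."""
    counts = [[0] * bins for _ in range(bins)]
    for x, y in points:
        i = min(int((x + 1) / 2 * bins), bins - 1)
        j = min(int((y + 1) / 2 * bins), bins - 1)
        counts[j][i] += 1
    return counts


def darkness(count, alpha):
    """Opacity after `count` overlapping markers, assuming 'over' compositing."""
    return 1 - (1 - alpha) ** count


def mean_darkness(counts, alpha, on_ring, radius=0.5):
    """Average darkness of cells centered on the ring (on_ring=True) or well away from it."""
    bins = len(counts)
    width = 2 / bins
    values = []
    for j, row in enumerate(counts):
        for i, count in enumerate(row):
            cx = -1 + (i + 0.5) * width
            cy = -1 + (j + 0.5) * width
            gap = abs(math.hypot(cx, cy) - radius)
            if (on_ring and gap < width / 4) or (not on_ring and gap > 2 * width):
                values.append(darkness(count, alpha))
    return sum(values) / len(values)


if __name__ == "__main__":
    counts = cell_counts(make_points())
    for alpha in (1.0, 0.05):
        ring = mean_darkness(counts, alpha, on_ring=True)
        rest = mean_darkness(counts, alpha, on_ring=False)
        print(f"alpha={alpha}: ring cells {ring:.2f}, background cells {rest:.2f}")
```

With alpha=1.0, both kinds of cells should average close to 1.00, since nearly every cell holds at least one point and so everything is black. With alpha=0.05, the ring cells should come out clearly darker than the background cells, which is the same effect the transparent plot shows. Lowering alpha further keeps even denser regions from saturating, a trade-off worth tuning as the point count grows.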

Learn more about this example, and the exact code used to create it.

UPDATE: The idea for this post was inspired by a post on Yihui Xie’s website. I originally linked to the post from my ‘info’ page, but a few readers mentioned that they missed the link in that file, so I wanted to include it here too.

tags: alpha-transparency, big data, gplot, sas/graph, tips and tricks

23 Comments

  1. Posted July 27, 2012 at 12:02 pm

    The great thing is now you don't even need SAS/GRAPH for that. sgplot and sgscatter both do semi-transparent scatters!

    • Robert Allison
      Posted July 27, 2012 at 3:02 pm

      SAS/GRAPH and the 'SG' procedures (sgplot, sgscatter, etc) have some overlap in their feature sets, but they also each have some unique capabilities that are not available in the other. I would recommend you use them both! :-)

  2. Posted July 27, 2012 at 1:00 pm

    The alpha transparency is a useful but relatively minor feature. I've followed Robert Allison's SAS graph site and enjoyed it. However, the title of this brief article, "Finding patterns in big data with SAS/GRAPH", promises way too much. Big data is so much more than 10K records; the title seems to be either hype or, especially with the link in the first paragraph, an attempt at SEO. A better title would be "Alpha transparency mitigates overplotting in SAS 9.3."

    • Robert Allison
      Posted August 7, 2012 at 2:34 pm

      True - your suggested title would be much more 'correct' and provide a better description. As I write more blogs, you'll notice that I tend to "sensationalize" my titles a bit (and sometimes I'll even make them witty plays-on-words!) It's just my writing style - hopefully you'll grow to like it ;-)

      Hopefully this blog in combination with my future blogs will show that although alpha transparency is a subtle feature, it is very powerful, and allows you to visualize your data in ways that are not possible without transparency. The feature has only been added to SAS/GRAPH for a short time, but I've already found that I "cannot live without it"!

      Per the "big data" ... I hope you'll agree that alpha transparency will allow you to visualize "bigger" data than you could without it(?) And, therefore, it provides better capabilities for plotting big data than before (i.e., without alpha transparency).

  3. Quentin McMullen
    Posted July 27, 2012 at 1:34 pm

    Very cool. Even without LOTS of data, I can see how this would be useful as an alternative to jittering values when two data points overlap, or for showing how two different distributions overlap (yellow+blue=green). Thanks.

    • Robert Allison
      Posted July 27, 2012 at 3:04 pm

      Good point Quentin! ... Keep an eye out, and you will likely see several other cool uses of "transparency" (in graphs & maps) in my future blogs!

  4. Jaime
    Posted July 27, 2012 at 4:02 pm

    Bout time they get Robert Allison on one of these blogs.
    His super cool plots are the "cowbell" these blogs need.
    More cowbell !!! ; via sas/graphs.

    • Robert Allison
      Posted August 7, 2012 at 2:36 pm

      More cowbell indeed!

      Hopefully Christopher Walken would even approve! ;-)

    • Charu Shankar
      Posted August 10, 2012 at 1:33 pm

      I agree: Robert Allison's graph posts fill a much-needed gap. I saved the cool graphs for just after lunch today in my SAS Enterprise Guide class. Instead of sleepy minds trying to wake up to code, all you could hear was "oohs & aahs"!! as the class was thinking of ways to show off on their return to work on Monday!!

      • Robert Allison
        Posted August 13, 2012 at 1:10 pm

        :-)

  5. LeRoy Bessler
    Posted July 28, 2012 at 11:02 am

    Yes, always keep an eye out for stuff from Robert Allison.
    He is the King of Cool SAS Graphics.
    See his new book "SAS/GRAPH: Beyond the Basics"
    and his web page http://robslink.com/SAS/Home.htm
    But I don't know where his "future blogs" will be posted.

    • Robert Allison
      Posted August 7, 2012 at 2:38 pm

      Thanks LeRoy!

      Keep an eye on this blog (The SAS Training Post) - I plan to post several blogs here.

  6. Posted August 2, 2012 at 10:42 am

    Yes, but 10k+ points are not "big data". Moreover, the code says that the exercise is based on a previous post about R. Such attribution should be made more apparent to visitors of this page.

  7. Posted August 3, 2012 at 6:56 am
    • Robert Allison
      Posted August 7, 2012 at 2:38 pm

      Thanks Rick!

  8. k
    Posted August 11, 2012 at 8:18 am

    This seems a little strange to me. The SAS code has about 53 lines of actual code, while the R code is only about 13 lines. I see that some of this is creating the data in SAS, but why not download the actual data used and import it? The SAS code also only does about one third of what the R code does, which corresponds to the first few lines of the R code. It would be interesting to see how many lines it would take to match the functionality exactly. Is it possible to make a movie in a similar manner in SAS? You also mention that this is possible due to advancements in version 9.3, but the R version was made about four years ago. I have used SAS and R a lot, but in the past year I have used SAS every day with only slight interactions with R. I would imagine the more recent exposure to SAS would allow the code's meaning to jump out at me, but on the contrary, it seems almost alien to me. It seems that the SAS code has lots of bulky statements and is very wordy in a somewhat illiterate way, while the R code is very literate and succinct. I was wondering how SAS would fare against a more current set of graphics done in R. These are just a few of the interesting graphics I have seen on R-Bloggers over the last week or so. Would the same effects of more code for similar functionality and less readability appear?
    http://www.r-bloggers.com/simulation-the-modellers-laboratory/
    http://www.r-bloggers.com/animated-gif-annual-correlation-of-48-industries-for-50-years/
    http://www.r-bloggers.com/universal-portfolio-part-10/
    http://www.r-bloggers.com/exploring-distributions-of-ensatina-salamander-subspecies-using-rvertnet-by-neil-kelly/
    http://www.r-bloggers.com/48-industries-dendrogram-ordered-over-50-years/
    http://www.r-bloggers.com/genetic-algorithms-a-simple-r-example/

    • Robert Allison
      Posted August 13, 2012 at 1:34 pm

      k - thanks for the comment, and the links to these cool samples! This might give me some ideas for future cool graphs! :)

      Per the comparison of SAS and R - that's one of those topics of much debate, similar to religion and politics! LOL I can't really join in the debate, because I'm not an R expert ... but based on examples I've seen I feel safe in saying that both SAS and R are very capable, and both can produce some really useful graphical output.

      Per this particular example, I hadn't really intended it to be a comparison of SAS and R, nor had I tried to use the minimum lines of code. As you mention, I could have re-used their data, but I thought it would make a better example to show how to do that in SAS also - making it a totally self-contained example (nothing else needed, other than what's in the code!) If I had *really* wanted to keep the code to the minimum, I guess I could have created the plot in "about 3 lines" - a libname to reference the data, a symbol statement to specify the alpha-transparent color, and then call 'proc gplot' (taking the default axes and such) ... but that wouldn't have been nearly as much fun! :-)

    • Sanjay Matange
      Posted January 5, 2013 at 5:59 pm

      I believe many (if not all) of these graphs can be done with very succinct code using the new SGPLOT, SGPANEL, or SGSCATTER procedures, which were released with SAS 9.2. Other, more complex graphs can be done using GTL. For many examples, see Graphically Speaking.

  9. Robert Allison
    Posted January 22, 2013 at 9:16 am

    Here's one "plausibly real life" example (using imaginary data from a contest) where I used this technique to look for patterns of higher density geographical locations to try to track the origin & spread of an epidemic.

    http://robslink.com/SAS/vastopolis/sas-mc1/

    All of the maps except the gif animation use alpha-transparent colors for the markers.

  10. Dingdang
    Posted October 2, 2013 at 5:05 am

    This is really cool. I am using SAS 9.2, and it is such a pity that I cannot see the effect on my PC. This could be very helpful for cluster analysis: it gives you a much better idea of what the data looks like and helps you choose the right distance measure and algorithm.

    • Robert Allison
      Posted October 2, 2013 at 8:19 am

      Perhaps you can use this cool new feature as part of a 'justification' for your company to upgrade to a newer version of SAS!

      SAS 9.2, that you're using, was released in 2008 -- in the computer industry that's "ancient"! ;)

4 Trackbacks

  1. By Top blog comments last week - SAS Voices on August 9, 2012 at 3:25 pm

    [...] have been impressing SAS users and Dashboard professionals for years. His recent post about displaying big data has received a total of 15 comments, including this one from Leroy Bessler: Yes, always keep an eye [...]

  2. By My Homepage on October 18, 2012 at 1:39 pm

    [...] There you will find 23378 more Infos: blogs.sas.com/content/sastraining/2012/07/27/finding-patterns-in-big-data-with-sasgraph/ [...]...

  3. By togel online on November 14, 2012 at 8:57 am

    [...] Find More Informations here: blogs.sas.com/content/sastraining/2012/07/27/finding-patterns-in-big-data-with-sasgraph/ [...]...

  4. By The best of SAS blogs for 2012 - SAS Voices on December 27, 2012 at 5:24 pm

    [...] Finding patterns in big data with SAS/GRAPH [...]
