Finding patterns in big data with SAS/GRAPH

When working with "big data" you usually have too many points to view in a plot, and end up subsetting or summarizing the data. But now, in SAS 9.3, you have an alternative!

For example, the following scatter plot of 10,000+ points is just a visual "blob":

Very dense scatter plot, which looks like a black blob.

But using a new SAS 9.3 feature, alpha-transparency, you can make the plot markers semi-transparent, and "see" the hidden patterns. When multiple markers "stack up" in the same location, their transparent colors "add up" and they become darker.
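To get a feel for how quickly stacked markers "add up", here is a minimal Python sketch (not SAS code, and not taken from the original example). It assumes standard "over" alpha compositing, where each marker covers a fraction `alpha` of whatever still shows through; the post does not state which blending rule SAS/GRAPH actually uses, and the function name `stacked_opacity` is hypothetical.

```python
# Sketch only: assumes standard "over" alpha compositing, which is an
# assumption and not something the post confirms for SAS/GRAPH.

def stacked_opacity(alpha: float, n: int) -> float:
    """Combined opacity of n identical markers, each with opacity alpha.

    Each marker lets (1 - alpha) of the background through, so n stacked
    markers let (1 - alpha) ** n through.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - alpha) ** n


if __name__ == "__main__":
    # Hypothetical marker opacity of 5%; print how dark a stack becomes.
    for n in (1, 2, 5, 10, 50):
        print(f"{n:>3} overlapping markers -> opacity {stacked_opacity(0.05, n):.3f}")
```

Under this assumption a single faint marker is barely visible, while a few dozen overlapping markers are close to fully opaque, which is why dense regions stand out.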

In this case, I had "hidden" a densely-packed 'ring' of points in the data. This pattern is visually obscured when plotted with solid-colored markers (above), but very visible in the graph with transparent markers (below):

Same scatter plot, with alpha-transparent colors -- the hidden pattern becomes visible.
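For readers who want to experiment with similar data, the following Python sketch builds a cloud of random points with a dense ring hidden inside it. It is an illustration under stated assumptions, not the code used for the plots above: the point counts, the ring radius, the normal distribution, and the function name `make_blob_with_ring` are all hypothetical choices.

```python
import math
import random


def make_blob_with_ring(n_noise=10000, n_ring=1000, radius=0.5, seed=0):
    """Return a shuffled list of (x, y) points: a random cloud plus a ring.

    All defaults are illustrative assumptions, not values from the post.
    """
    rng = random.Random(seed)

    # Background cloud: independent standard-normal x and y coordinates.
    noise = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_noise)]

    # Hidden pattern: points placed exactly on a circle of the given radius.
    ring = []
    for _ in range(n_ring):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        ring.append((radius * math.cos(angle), radius * math.sin(angle)))

    points = noise + ring
    rng.shuffle(points)  # mix the ring into the cloud
    return points


if __name__ == "__main__":
    pts = make_blob_with_ring()
    print(len(pts), "points; first three:", pts[:3])
```

Plotted with solid markers, the ring points are buried under the cloud; plotted with semi-transparent markers, the many overlapping points along the circle should appear as a darker band.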

Learn more about this example, and the exact code used to create it.

UPDATE: The idea for this post was inspired by a post on Yihui Xie’s website. I originally linked to the post from my ‘info’ page, but a few readers have mentioned that they missed the link in that file, so I wanted to include it here too.

About Author

Robert Allison

The Graph Guy!

Robert has worked at SAS for over a quarter century, and his specialty is customizing graphs and maps - adding those little extra touches that help them answer your questions at a glance. His educational background is in Computer Science, and he holds a BS, MS, and PhD from NC State University.

28 Comments

  1. Pingback: Creating customized graphs for SAS analytic procedures

  2. This is really cool. I am using SAS 9.2, and it is such a pity that I cannot see the effect on my PC. This could be very helpful for cluster analysis: it gives you a much better idea of what the data looks like and helps you choose the right distance measure and algorithm.

    • Robert Allison on

      Perhaps you can use this cool new feature as part of a 'justification' for your company to upgrade to a newer version of SAS!

      SAS 9.2, which you're using, was released in 2008 -- in the computer industry that's "ancient"! ;)

  3. Robert Allison on

    Here's one "plausibly real life" example (using imaginary data from a contest) where I used this technique to look for patterns of higher density geographical locations to try to track the origin & spread of an epidemic.

    http://robslink.com/SAS/vastopolis/sas-mc1/

    All of the maps except the gif animation use alpha-transparent colors for the markers.

  4. Pingback: The best of SAS blogs for 2012 - SAS Voices

  7. This seems a little strange to me. The SAS code has about 53 lines of actual code, while the R code is only about 13 lines. I see that some of this is creating the data in SAS, but why not download the actual data used and import it? The SAS code also does only about one third of what the R code does, which is what the first few lines of the R code do. It would be interesting to see how many lines it would take to match functionality exactly. Is it possible to make a movie in a similar manner in SAS? You also mention that this is possible due to advancements in version 9.3, but the R version was made about four years ago. I have used SAS and R a lot, but in the past year I have used SAS every day with only slight interactions with R. I would imagine the more recent exposure to SAS would allow the code's meaning to jump out at me, but on the contrary it seems almost alien to me. It seems that the SAS code has lots of bulky statements and is very wordy in a somewhat illiterate way, while the R code is very literate and succinct. I was wondering how SAS would fare against a more current set of graphics done in R. These are just a few of the interesting graphics I have seen on R-Bloggers over the last week or so. Would the same effects of more code for similar functionality and less readability appear?
    http://www.r-bloggers.com/simulation-the-modellers-laboratory/
    http://www.r-bloggers.com/animated-gif-annual-correlation-of-48-industries-for-50-years/
    http://www.r-bloggers.com/universal-portfolio-part-10/
    http://www.r-bloggers.com/exploring-distributions-of-ensatina-salamander-subspecies-using-rvertnet-by-neil-kelly/
    http://www.r-bloggers.com/48-industries-dendrogram-ordered-over-50-years/
    http://www.r-bloggers.com/genetic-algorithms-a-simple-r-example/

    • Robert Allison on

      k - thanks for the comment, and the links to these cool samples! This might give me some ideas for future cool graphs! :)

      Per the comparison of SAS and R - that's one of those topics of much debate, similar to religion and politics! LOL I can't really join in the debate, because I'm not an R expert ... but based on examples I've seen I feel safe in saying that both SAS and R are very capable, and both can produce some really useful graphical output.

      Per this particular example, I hadn't really intended it to be a comparison of SAS and R, nor had I tried to use the minimum lines of code. As you mention, I could have re-used their data, but I thought it would make a better example to show how to do that in SAS also - making it a totally self-contained example (nothing else needed, other than what's in the code!) If I had *really* wanted to keep the code to the minimum, I guess I could have created the plot in "about 3 lines" - a libname to reference the data, a symbol statement to specify the alpha-transparent color, and then call 'proc gplot' (taking the default axes and such) ... but that wouldn't have been nearly as much fun! :-)

    • Sanjay Matange on

      I believe many (if not all) of these graphs can be done with very succinct code using the new SGPLOT, SGPANEL or SGSCATTER procedures released with SAS 9.2. Other, more complex, graphs can be done using GTL. For many examples, see Graphically Speaking.

  8. Pingback: Top blog comments last week - SAS Voices

  9. Yes, but 10k+ points are not "big data". Moreover, the code says that the exercise is based on a previous post about R. That attribution should be made more apparent to visitors of this page.

  10. LeRoy Bessler on

    Yes, always keep an eye out for stuff from Robert Allison
    He is the King of Cool SAS Graphics.
    See his new book "SAS/GRAPH: Beyond the Basics"
    and his web page http://robslink.com/SAS/Home.htm
    But I don't know where his "future blogs" will be posted.

    • Robert Allison on

      Thanks LeRoy!

      Keep an eye on this blog (The SAS Training Post) - I plan to post several blogs here.

  11. Bout time they get Robert Allison on one of these blogs.
    His super cool plots are the "cowbell" these blogs need.
    More cowbell !!! ; via sas/graphs.

    • Charu Shankar on

      I agree--Robert Allison graph posts fill a much wanted need. I kept the cool graphs for just after lunch today in my SAS Enterprise Guide class. Instead of sleepy minds trying to wake up to code, all you could hear was "oohs & aahs"!! as class was thinking of ways to show off on their return back to work on Monday!!

  12. Quentin McMullen on

    Very cool. Even without LOTS of data, can see how this would be useful as an alternative to jittering values when two data points overlap, or showing how two different distributions overlap (yellow+blue=green). Thanks.

    • Robert Allison on

      Good point Quentin! ... Keep an eye out, and you will likely see several other cool uses of "transparency" (in graphs & maps) in my future blogs!

  13. The alpha transparency is a useful but relatively minor feature. I've followed Robert Allison's SAS graph site and enjoyed it. However, the title of this brief article, "Finding patterns in big data with SAS/GRAPH", promises way too much. Big data is so much more than 10K records; it seems the title is either hype or, especially with the link in the first paragraph, an attempt at SEO. A better title would be "Alpha transparency mitigates overplotting in SAS 9.3."

    • Robert Allison on

      True - your suggested title would be much more 'correct' and provide a better description. As I write more blogs, you'll notice that I tend to "sensationalize" my titles a bit (and sometimes I'll even make them witty plays-on-words!) It's just my writing style - hopefully you'll grow to like it ;-)

      Hopefully this blog in combination with my future blogs will show that although alpha transparency is a subtle feature, it is very powerful, and allows you to visualize your data in ways that are not possible without transparency. The feature has only been added to SAS/GRAPH for a short time, but I've already found that I "cannot live without it"!

      Per the "big data" ... I hope you'll agree that alpha transparency will allow you to visualize "bigger" data than without it(?) And, therefore, it provides better capabilities for plotting big data than before (i.e., without alpha transparency).

  14. The great thing is now you don't even need SAS/GRAPH for that. sgplot and sgscatter both do semi-transparent scatters!

    • Robert Allison on

      SAS/GRAPH and the 'SG' procedures (sgplot, sgscatter, etc) have some overlap in their feature sets, but they also each have some unique capabilities that are not available in the other. I would recommend you use them both! :-)
