Need test data? Use SAS to extract it from these surprising sources

Everyone is always looking for test data. Business analysts want it for demos and prototypes. Software developers want it for development and unit testing. Testers want it for system and integration testing.

I’ve written many programs to generate test data over the years, as have many other SAS users. Generated data can be great, but it’s always nice to have real data when you can get it. It demos better. It highlights issues better (Believe me, real people will enter things you never dreamed of including in your generated data). It even helps you be a better software designer since you can actually look at real data and react to what you’re seeing.

So how do you get real data when individuals, businesses and organizations are increasingly protective of it?

Well, actually, in some areas of business that seems to be changing. In an effort to increase transparency and visibility, many organizations make some data available for use by outsiders. For example, Amazon, Google Shopping, and BestBuy all make their product and review data available for integration into other web sites.

This product data is meant for web integration, so PROC HTTP is the key to downloading it, and it could hardly be easier!

Let’s look at getting some BestBuy product review data. Looking at the documentation for the Reviews API, we are told to format our request like this:

Here are the key parameters:

  • sku parameter denotes the particular product being reviewed
  • apiKey parameter denotes your authentication key (you must register to get it)
  • show parameter denotes which columns to return

From there, downloading the data is just a matter of putting the request into PROC HTTP and changing the parameters to meet your needs. Here’s some code I ran to get iPad Air 2 reviews:

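filename out temp;

/* Send the GET request; the response is written to the OUT fileref */
proc http out=out url=',sku,comment,rating'
  method="get";
run;

/* The libref matches the fileref, so the XMLV2 engine reads the response */
libname out xmlv2;

/* Append this page of reviews to the accumulated data set */
proc append base=iPadReviews; run;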

BestBuy places a limit of 100 records per request (the pagesize parameter), so I wrote the code so that it could be used iteratively. To do so, simply create a SAS macro from this example and increment the page parameter via macro substitution to read through all the reviews.
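The iterative approach could be sketched roughly as follows. This is only an illustration under stated assumptions: the macro name get_reviews, its parameters url_prefix, pages and table, and the table name review are hypothetical and do not come from the BestBuy documentation. The sketch also assumes the response is simple enough for the XMLV2 engine to read without an XMLMap, and that the caller already knows how many pages to request.

```sas
/* Sketch only: URL_PREFIX, PAGES and TABLE are hypothetical parameters.   */
/* URL_PREFIX is assumed to be the full request up to and including the    */
/* text "page=", so that the page number can simply be appended to it.     */
%macro get_reviews(url_prefix=, pages=1, table=review);
  %local p;
  %do p = 1 %to &pages;
    filename out temp;

    /* Request one page; the response is written to the OUT fileref */
    proc http out=out url="&url_prefix&p" method="get";
    run;

    /* The libref matches the fileref, so XMLV2 reads the file just downloaded */
    libname out xmlv2;

    /* Append this page to the accumulated reviews (table name is an assumption) */
    proc append base=iPadReviews data=out.&table force;
    run;

    /* Release the libref and fileref before the next iteration */
    libname out clear;
    filename out clear;
  %end;
%mend get_reviews;
```

When calling such a macro, the request URL would likely need to be wrapped in %nrstr() so that the ampersands in the query string are not treated as macro variable references.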

Now, let’s look at our new iPad Air 2 product review test data!



SAS timer - the key to writing efficient SAS code

New Year to me is always a stark reminder of the inexorability of Time. In day-to-day life, time is measured in small denominations: minutes, hours, days… But come New Year, this inescapable creature, Time, makes its decisive leap, and in a single instant we become officially older and wiser by an entire year’s worth.

What better time to reassess ourselves, personally and professionally? What better time to “Resolve to improve your SAS programming skills,” as skillfully proposed by Michael A. Raithel in his recent blog post?

I thought I could write a post showing how to be efficient and kill two birds with one stone. The birds here are two of Raithel’s proposed New Year’s resolutions:

#2 Volunteer to help junior SAS programmers.

#12 Reduce processing time by writing more efficient programs.

To combine the two, I could have titled this post “Helping junior SAS programmers to reduce processing time by writing more efficient programs”. However, I am not going to “teach” you efficient coding techniques, which are a subject deserving of a multi-volume treatise. I will just give you a simple tool that is a must-have for any SAS programmer (not just a junior one) who considers writing efficient SAS code important. This simple tool is the ultimate judge of any code’s efficiency, and it is called a timer.

What is efficient?

Setting aside hardware constraints and limitations (which are increasingly diminishing nowadays), efficient means fast or at least fast enough not to exceed ever-shrinking user tolerance of wait time.

Of course, if you are developing a one-time run code to generate some ad-hoc report or produce results for uniquely custom computations, your efficiency criteria might be different, such as “as long as it ends before the deadline” or at least “does not run forever”.

However, in most cases, SAS code is developed for some applications, in many cases interactive applications, where many users run the code over and over again. It may run behind the scenes of a web application with a user waiting (or rather not wanting to wait) for results. In these cases, SAS code must be really fast, and any improvement in its efficiency is multiplied by the number of times it is run.

What is out there?

SAS provides the following SAS system options to measure the efficiency of SAS code:

STIMER. You may not realize that you use this option every time you run a SAS program. This option is turned on by default (NOSTIMER to turn it off) and controls information written to the SAS Log by each SAS step. Each step of a SAS program by default generates the following sample NOTE in SAS Log:

NOTE: DATA statement used (Total process time):
      real time           1.31 seconds
      cpu time            1.10 seconds

FULLSTIMER. This option (NOFULLSTIMER to turn it off) provides much more information on used resources for each step. A sample Log output of a FULLSTIMER option for a SAS Data Step is listed below:

NOTE: DATA statement used:
real time                   0.06 seconds
user cpu time               0.02 seconds
system cpu time             0.00 seconds
Memory                      88k
Page Faults                  10
Page Reclaims                 0
Page Swaps                    0
Voluntary Context Switches   22
Involuntary Context Switches  0
Block Input Operations       10
Block Output Operations      12

While the FULLSTIMER option provides plenty of information for SAS code optimization, in many cases it is more than you really need. On the other hand, STIMER may provide quite valuable information about each step, thus identifying the most critical steps of your SAS program.
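As a minimal sketch of how these options are switched on and off around a step of interest (the data set name work.timing_demo and the loop are made up purely for illustration):

```sas
/* Turn on detailed resource reporting for the steps that follow */
options fullstimer;

/* Hypothetical step to measure: build a one-million-row data set */
data work.timing_demo;
  do i = 1 to 1e6;
    x = sqrt(i);
    output;
  end;
run;

/* Return to the default STIMER-level notes */
options nofullstimer;
```

The SAS Log for the DATA step between the two OPTIONS statements would then carry the longer FULLSTIMER note, while the steps after it revert to the shorter real time and cpu time note.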

Get your own SAS timer

If your efficiency criterion is how fast your SAS program runs as a whole, then you need an old-fashioned timer, with start and stop events and the time elapsed between them. To achieve this in SAS programs, I use the following technique.

  1. At the very beginning of your SAS program, place the following line of code, which starts the timer by saving the start time:

    /* Start timer */
    %let _timer_start = %sysfunc(datetime());

  2. At the end of your SAS program, place the following code snippet, which captures the end time, calculates the duration and writes it to the SAS Log:

    /* Stop timer */
    data _null_;
      dur = datetime() - &_timer_start;
      put 30*'-' / ' TOTAL DURATION:' dur time13.2 / 30*'-';
    run;

    The resulting output in the SAS log will look like this:

    ------------------------------
     TOTAL DURATION:   0:01:31.02
    ------------------------------

    Despite its utter simplicity, this little timer is a very convenient little tool to improve your SAS code efficiency. You can use it to compare or benchmark your SAS programs in their entirety.

    Warning. In the above timer, I used the datetime() function, and I insist on using it instead of the time() function that I have seen in many online resources. Keep in mind that the time() function resets to 0 at midnight. While time() works just as well when the start and stop times fall on the same date, it produces completely meaningless results when they fall on different dates. You can easily trap yourself by submitting a SAS program right before midnight that ends after midnight, which results in an incorrect, even negative, duration.
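The midnight pitfall can be illustrated with fixed values instead of live clock readings. In this sketch the two timestamps are invented for the example, and timepart() stands in for what time() would have returned at those moments:

```sas
data _null_;
  /* Hypothetical run: starts 30 seconds before midnight, ends 61 seconds after */
  start_dt = '31DEC2014:23:59:30'dt;
  stop_dt  = '01JAN2015:00:01:01'dt;

  /* datetime()-style arithmetic: true elapsed seconds */
  dur_datetime = stop_dt - start_dt;

  /* time()-style arithmetic: seconds since midnight, which reset between readings */
  dur_time = timepart(stop_dt) - timepart(start_dt);

  put dur_datetime= dur_time=;
run;
```

The datetime difference is the expected 91 seconds, while the time-of-day difference comes out as -86309 seconds (61 minus 86370), which is exactly the kind of negative duration described above.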

    I hope using this SAS timer will help you write more efficient SAS programs.


Cloud: 4 deployment models

This is the last of my series of posts on the NIST definition of cloud computing. As you can see from this Wikipedia definition, calling anything a “cloud” is likely to be the fuzziest way of describing it.

In meteorology, a cloud is a visible mass of liquid droplets or frozen crystals made of water or various chemicals suspended in the atmosphere above the surface of a planetary body. These suspended particles are also known as aerosols and are studied in the cloud physics branch of meteorology.

Not that there is anything wrong with the label “cloud”--it’s a shortcut that allows us to quickly convey an idea. But for anything beyond that, when talking about functionality, we would be well advised to define and describe “cloud” in as much detail as possible so that all people involved have the same picture in their mind, and not whatever it is they think of when they think of “cloud”.

The NIST definitions help us narrow down features, functionality and models, but those are still only broad categories that leave certain gaps in which misunderstandings can easily sprout. I encourage you to use these definitions, but also to go further and describe cloud architectures by using terms that are as precise as possible.

In recent posts, I talked about the five characteristics of cloud, as well as the three service models. In this final installment of the series, I will discuss the four cloud deployment models.


SAS Grid Manager, shared-nothing storage and Hadoop: Part 1

When designing a SAS Grid Manager architecture, there is a requirement that has always been a critical component: a clustered file system. Over the years, vendors have released versions of these systems that are more robust and SAS has increased the minimum IO requirements, but the basic design has never changed—until now.

Any guess what the driver of this change could be? Did I hear “a yellow elephant” somewhere? Yes, Hadoop, but not only Hadoop! File systems are now available that support SAS Grid computing in other shared-nothing storage architectures.

Let’s take a step back to understand how new file system options can facilitate your SAS Grid deployment. In this post, I’ll start with a quick review of storage architectures for SAS Grid Manager and what other vendors are doing. In a subsequent blog post, I’ll dive more specifically into the interaction of Hadoop and SAS Grid Manager.


When to use NFS with SAS

With the growing use of SAS on commodity hardware, many organizations are running lots of SAS servers on separate operating system instances in a SAS infrastructure. This configuration is great for optimizing resources, but when these SAS servers have to share data, SAS recommends the use of a clustered file system.

This recommendation presents an issue for some companies. Because clustered file systems are not part of their standard operating system, they are an additional expense. So, to avoid driving up the cost of the hardware infrastructure for SAS, some IT administrators are proposing the use of NFS to share files among the SAS servers running on different instances of an operating system. Let’s look in more detail at the pros and cons of NFS as a shared file system with SAS.


SAS Global Forum: I want to volunteer!

Most of us have volunteered in some capacity -- as a guinea pig in a magic show, as a helping hand with a charity, or perhaps as a coach for a child’s sports team. Why do we do it?

I think the cliché holds true. We get as much, or more, benefit from the experience as the people we are helping. We feel better about ourselves, we make new friends, and we expand our comfort zone or learn something new. Sometimes we are “giving back” to an organization that previously helped us--we “pay it forward”. At other times, we simply want to spend time with other volunteers.

Volunteers advance much of society’s support services. But what about volunteering for professional organizations, societies, associations or communities? They rely on volunteers too.


SAS Backup and Recovery Tool

In researching material for an upcoming project, I investigated the SAS Deployment Backup and Recovery Tool available in SAS 9.4. Here are some of my findings on identifying what directories are included in a Backup and Recovery session and how to add custom directories to a backup configuration.

The SAS Backup and Recovery Tool is designed to provide an integrated method for backing up and recovering your SAS content across multiple tiers and machines. Note that this utility is a system-wide tool. It’s intended to back up an entire environment as opposed to a single machine in a multi-machine deployment.


SAS Global Forum: Give back, become more

Life is a gift, and it offers us the privilege, opportunity, and responsibility to give something back by becoming more.
- Anthony Robbins

Every year at SAS Global Forum, SAS professionals from all over the globe gather, reconnect and learn from each other. And wherever they gather, SAS Global Forum attendees demonstrate their appreciation for the opportunity to become more by giving back to the host community.

December is the season of giving. In that spirit, let's take a look at how SAS Global Forum 2015 will be giving back to the community of Dallas.


Have you created Scalable Vector Graphics with SAS?

If you haven’t tried them for your web applications and other graphics needs, you’ll want to read further!

Scalable Vector Graphics (SVG) output is vector graphics output you can display with most (if not all) modern web browsers. Because SVG output is scalable, you can zoom in on the graphics without losing resolution. Unlike bit-mapped images such as PNG or GIF output, SVG images can be resized or transformed without compromising clarity, eliminating the need to produce multiple versions of the same image! Other advantages of Scalable Vector Graphics include their smaller output file size and their usefulness for producing graphics for a range of display sizes and types.


Exporting events from SAS Environment Manager

As SAS administrators, I know you are as excited as I am by the ability of SAS Environment Manager to monitor, in detail, the performance of your SAS environments. Now, we have a robust tool to monitor, measure and report on the performance of the various SAS components. An added bonus: with each maintenance release of SAS 9.4, more features are added to the SAS Environment Manager tool set.

As nice as SAS Environment Manager is, some of you may already be invested in other system monitoring tools. As a result, SAS Professional Services consultants are often asked how to integrate monitoring information from SAS Environment Manager into existing monitoring systems, especially when it comes to notifying administrators of potential problems. Fortunately, SAS Environment Manager 2.4 now includes an event exporting service that makes it quite simple to integrate with almost any third-party monitoring tool.
