Everyone is always looking for test data. Business analysts want it for demos and prototypes. Software developers want it for development and unit testing. Testers want it for system and integration testing.
I’ve written many programs to generate test data over the years, as have many other SAS users. Generated data can be great, but it’s always nice to have real data when you can get it. It demos better. It highlights issues better (believe me, real people will enter things you never dreamed of including in your generated data). It even helps you be a better software designer, since you can look at real data and react to what you’re seeing.
So how do you get real data when individuals, businesses and organizations are increasingly protective of it?
Well, actually, in some areas of business that seems to be changing. In an effort to increase transparency and visibility, many organizations make some data available for use by outsiders. For example, Amazon, Google Shopping, and BestBuy all make their product and review data available for integration into other web sites.
This product data is meant for web integration, so PROC HTTP is the key to downloading it, and it could hardly be easier!
Let’s look at getting some BestBuy product review data. Looking at the documentation for the Reviews API, we are told to format our request like this:
Here are the key parameters:
- sku parameter denotes the particular product being reviewed
- apiKey parameter denotes your authentication key (you must register to get it)
- show parameter denotes which columns to return
From there, downloading the data is just a matter of putting the request into PROC HTTP and changing the parameters to meet your needs. Here’s some code I ran to get iPad Air 2 reviews:
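filename out temp;

/* Request one page of reviews; the response is written to the temporary fileref OUT. */
proc http
   out=out
   url='http://api.remix.bestbuy.com/v1/reviews(sku=3315023)?apiKey=your-authentication-key&page=1&pageSize=100&show=id,sku,comment,rating'
   method="get";
run;

/* Read the XML response through a libref with the same name as the fileref. */
libname out xmlv2;

/* Append this page of reviews to the accumulating data set. */
proc append base=iPadReviews data=out.review;
run;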
BestBuy places a limit of 100 records per request (the pageSize parameter), so I wrote the code so it could be used iteratively. To do so, simply create a SAS macro from this example and increment the page parameter via macro substitution to read through all the reviews.
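One way that macro might look is sketched below. The macro name get_reviews, its parameters, and the idea of passing the number of pages in by hand are all assumptions for illustration, not part of the original program. The URL is double-quoted so that the macro variables resolve, and %NRSTR masks the ampersands that belong to the query string rather than to SAS.

```sas
/* Hypothetical macro: the name, the parameters, and the caller-supplied
   page count are illustrative assumptions. */
%macro get_reviews(sku=, apikey=, pages=1);
   %local p;
   %do p = 1 %to &pages;

      filename out temp;

      /* %nrstr() keeps &page, &pageSize and &show from being treated
         as macro variable references inside the double-quoted URL. */
      proc http
         out=out
         url="http://api.remix.bestbuy.com/v1/reviews(sku=&sku.)?apiKey=&apikey.%nrstr(&page=)&p.%nrstr(&pageSize=100&show=)id,sku,comment,rating"
         method="get";
      run;

      libname out xmlv2;

      /* Add this page of reviews to the accumulating data set. */
      proc append base=iPadReviews data=out.review;
      run;

      /* Release the libref and fileref so the next page starts clean. */
      libname out clear;
      filename out clear;

   %end;
%mend get_reviews;

/* Example call: read the first three pages of reviews for one SKU. */
%get_reviews(sku=3315023, apikey=your-authentication-key, pages=3);
```

This sketch does no error handling: if a requested page comes back without any review records, the PROC APPEND step for that page will likely fail, so it is worth checking how many pages exist before choosing the pages value.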
Now, let’s look at our new iPad Air 2 product review test data!
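A quick way to inspect the result is sketched here. It assumes the iPadReviews data set from the PROC APPEND step exists and that the XMLV2 engine named the columns after the fields requested in the show parameter (id, sku, comment, rating); check the actual variable names with PROC CONTENTS if they differ.

```sas
/* Assumption: the column names match the fields listed in show=. */
proc print data=iPadReviews(obs=10);
   var id sku rating comment;
run;

/* Distribution of ratings across the downloaded reviews. */
proc freq data=iPadReviews;
   tables rating;
run;
```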