Build your Pokémon library using SAS and the Pokéapi

Definitely NOT a copyrighted Pokémon
Today is #EmbraceYourGeekness day, and you are either reveling in this new crazy town inhabited by Pokémon GO, or you are hiding in your house trying to avoid all of the Pokémon GO zombies wandering around.

But since I'm living in SAS these days -- not just the place (at SAS headquarters), but the software -- I decided to see if I could use my SAS tools to "find" some Pokémon in my work. Thanks to PROC HTTP and a fantastic service called the Pokéapi, I've managed some success.

Calling the Pokéapi REST API with SAS

PROC HTTP is the SAS procedure that you can use to call REST APIs. And the Pokéapi site is a REST API that yields on-demand information about our new favorite creatures. Here's a quick example:

/* utility macro to put file contents to SAS log */
%macro echoResp(fn=);
data _null_;
 infile &fn;
 input;
 put _infile_;
run;
%mend;
 
filename resp temp;
 
/* Call the Pokeapi to list all available Pokemon */
proc http 
  url="http://pokeapi.co/api/v2/pokemon/?limit=1000"
  out=resp
  method="GET";
run;
 
%echoResp(fn=resp);

Here's a snippet of my "Pokémon log":

[Image: pokelog]

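Before parsing the response, it can be worth confirming that the call actually succeeded. The sketch below assumes SAS 9.4 Maintenance 5 or later, where PROC HTTP populates the SYS_PROCHTTP_STATUS_CODE and SYS_PROCHTTP_STATUS_PHRASE macro variables; the macro name checkHttp is hypothetical and is not part of the original program.

```sas
filename resp temp;

proc http
  url="http://pokeapi.co/api/v2/pokemon/?limit=1000"
  out=resp
  method="GET";
run;

/* Hypothetical helper: report the HTTP status of the most recent PROC HTTP call. */
/* Assumes SAS 9.4M5+, which sets the SYS_PROCHTTP_STATUS_* macro variables.      */
%macro checkHttp;
  %if not %symexist(SYS_PROCHTTP_STATUS_CODE) %then
    %put NOTE: HTTP status macro variables are not available in this SAS release.;
  %else %if &SYS_PROCHTTP_STATUS_CODE ne 200 %then
    %put WARNING: Pokeapi returned &SYS_PROCHTTP_STATUS_CODE &SYS_PROCHTTP_STATUS_PHRASE;
  %else
    %put NOTE: Pokeapi call succeeded with HTTP status 200.;
%mend checkHttp;

%checkHttp;
```

On older releases, the HEADEROUT= option of PROC HTTP can capture the response headers in a fileref, and the status line can be read from there instead.
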
I need a DATA step to read and parse some of the API response, which is in JSON. I'm using a simple INFILE with SCANOVER to parse out just a few bits and create a data set of all the character names (811 of them). The API response is basically one huge line of text, so I'm using the trailing @@ line-hold specifier to keep the INPUT statement working on the same "record" across DATA step iterations.

data pokemon;
 infile resp lrecl=65635 scanover truncover;
 length name $ 20;
 input @'"name":' name $quote20. @@;
run;

If you're using the free SAS OnDemand for Academics, this code should work there too!

[Image: pokenames]

I can also use PROC HTTP and the API to gather an incredible amount of detail about each character. I found Jigglypuff at record 39, so here's my code to retrieve and parse some more details. Note that there are hundreds of attributes available for each character, and I'm pulling just a couple of them.

proc http 
  url="http://pokeapi.co/api/v2/pokemon/39"
  out=resp
  method="GET";
run;
 
data jiggly;
 infile resp lrecl=500000 scanover truncover;
 length weight 8 base_experience 8;
 input @'"weight":' weight 2. @@;
 input  @'"base_experience":' base_experience 2. @@;
run;

And the results:

[Image: jiggly]
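
The `@'string'` column pointer searches forward from the current position, so the DATA step approach depends on the order in which the keys appear in the response, and the `2.` informat reads only two characters. As a possible alternative, here is a minimal sketch that uses the JSON LIBNAME engine, assuming SAS 9.4 Maintenance 4 or later. The libref `jig` and the output name `jiggly_json` are made up for illustration, and the expectation that the top-level values land in a table named ROOT is an assumption about how the engine maps this response, so check the PROC DATASETS listing first.

```sas
filename resp temp;

proc http
  url="http://pokeapi.co/api/v2/pokemon/39"
  out=resp
  method="GET";
run;

/* JSON LIBNAME engine: requires SAS 9.4 Maintenance 4 or later */
libname jig json fileref=resp;

/* List the tables that the engine derived from the JSON response */
proc datasets lib=jig;
quit;

/* Assumption: top-level scalar values such as name, weight, and */
/* base_experience are mapped to the ROOT table                  */
data jiggly_json;
  set jig.root(keep=name weight base_experience);
run;

libname jig clear;
```

With this approach the values are typed by the engine, so there is no need to guess an informat width for each attribute.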

Going to "the source" for raw Pokémon data

Parsing JSON using SAS is fun and all, but sometimes you just want access to the raw data. And it turns out that the Pokéapi folks have a project on GitHub with everything we need. We can use PROC HTTP to get to that too! And then use SAS to join and analyze/visualize the results! These calls are to the GitHub site to access the "raw" view of data files in the repository.

Update 15Jul2016: Since I originally published this post, I've heard from Pokémon experts and data visualization experts. They correctly pointed out that my default PROC FREQ plot did not represent the best of SAS graphics nor Pokémon abilities. I've adjusted my code and republished with these changes:

  • Used PROC SGPLOT to show the PROC FREQ output with just the 20 most common abilities, instead of cramming ALL abilities into a single chart.
  • Added GUESSINGROWS=MAX to my PROC IMPORT steps to ensure variables are assigned the proper lengths. Robert Allison pointed out that some names were being truncated.
  • Added a tabular report of the less-common abilities -- those that show up in just one Pokémon character each.
  • Cleaned up the SAS code for readability here, and updated the entire thing on my GitHub Gist.

/* Location of the PokeAPI project on GitHub */
%let githubRoot=https://raw.githubusercontent.com/PokeAPI/pokeapi/master/data/v2/csv;
 
/* temp location for CSV files, works on UNIX or Windows */
%let csvOut = %sysfunc(getoption(WORK));
 
filename pk_csv "&csvOut./pokemon.csv";
 
proc http
 url="&githubRoot./pokemon.csv"
 method="GET"
 out=pk_csv;
run;
 
proc import file=pk_csv out=pokemon dbms=csv replace;
guessingrows=max;
run;
 
filename pk_ab "&csvOut./pokemon_ab.csv";
 
proc http
 url="&githubRoot./pokemon_abilities.csv"
 method="GET"
 out=pk_ab;
run;
 
proc import file=pk_ab out=abilities dbms=csv replace; 
guessingrows=max;
run;
 
filename pk_abn "&csvOut./pokemon_abnames.csv";
 
proc http
 url="&githubRoot./abilities.csv"
 method="GET"
 out=pk_abn;
run;
 
proc import file=pk_abn out=abnames dbms=csv replace;
guessingrows=max;
run;
 
/* Join the 3 data sets */
proc sql;
   create table work.withabilities as 
   select t3.identifier as pokemon, 
          t1.identifier as ability
      from work.abilities t2, work.pokemon t3, work.abnames t1
      where (t2.pokemon_id = t3.id and t2.ability_id = t1.id);
quit;
 
/* Frequency of Abilities among the characters */
proc freq data=work.withabilities noprint
	order=freq
;
	tables ability / nocum  
        scores=table 
        out=ability_freq;
run;
 
/* Create a more readable plot based on the most common abilities */
ods graphics on / height=800 width=800;
title "Most common abilities among Pokemon";
proc sgplot data=ability_freq (obs=20);
  hbar ability / response=count barwidth=.4; 
  yaxis discreteorder=data display=(nolabel);
  xaxis label="# Pokemon who possess it" grid ;
run;
 
/* Join the FREQ output with pokemon names and abilities */
/* for a report on "rare" abilities                      */
proc sql; 
  create table rareabilities
    as select t1.pokemon, t1.ability from
      withabilities t1 inner join ability_freq t2 on 
      (t1.ability=t2.ability AND t2.count=1)
    order by t1.ability;
quit;
 
title "Rare abilities among Pokemon (each possessed by only one character)";
proc print data=rareabilities noobs;
var ability pokemon;
run;
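
The three download-and-import steps in the program above follow the same pattern, so they could be folded into one macro. This is only a sketch: the macro name fetchCsv, its parameters, and the `_csv` fileref are hypothetical, and it assumes the same GitHub layout and WORK location that the original code uses.

```sas
/* Location of the PokeAPI project on GitHub (same value as in the program above) */
%let githubRoot=https://raw.githubusercontent.com/PokeAPI/pokeapi/master/data/v2/csv;

/* Hypothetical helper: download one CSV file and import it as a data set */
%macro fetchCsv(file=, out=);
  /* Save the download in the WORK directory, as the original code does */
  filename _csv "%sysfunc(getoption(WORK))/&file..csv";

  proc http
    url="&githubRoot./&file..csv"
    method="GET"
    out=_csv;
  run;

  /* GUESSINGROWS=MAX scans the whole file so that names are not truncated */
  proc import file=_csv out=&out dbms=csv replace;
    guessingrows=max;
  run;

  filename _csv clear;
%mend fetchCsv;

%fetchCsv(file=pokemon, out=pokemon);
%fetchCsv(file=pokemon_abilities, out=abilities);
%fetchCsv(file=abilities, out=abnames);
```

The resulting POKEMON, ABILITIES, and ABNAMES data sets have the same names as in the original program, so the PROC SQL join should work unchanged.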

Here's what PROC FREQ and SGPLOT show about how common some of the abilities are among the Pokémon. "Levitate" appears to be common (good thing, because I'm not sure that they all have legs).

[Image: pokeabilities]

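Instead of dropping everything past the top 20, another option is to roll the remaining abilities into a single "others" bar. The sketch below assumes that ABILITY_FREQ already exists from the PROC FREQ step and is ordered by descending count (which ORDER=FREQ should provide); the names ability_grouped, ability_group, and other_count are hypothetical.

```sas
/* Keep the 20 most common abilities and total up everything else */
data ability_grouped;
  set ability_freq end=last;
  length ability_group $ 40;
  if _n_ <= 20 then do;
    ability_group = ability;
    output;
  end;
  else other_count + count;   /* sum statement keeps a running total */
  if last and other_count > 0 then do;
    ability_group = "(all others)";
    count = other_count;
    output;
  end;
  keep ability_group count;
run;

title "Most common abilities among Pokemon, with the rest combined";
proc sgplot data=ability_grouped;
  hbar ability_group / response=count barwidth=.4;
  yaxis discreteorder=data display=(nolabel);
  xaxis label="# Pokemon who possess it" grid;
run;
```

The combined bar is likely to be much longer than any single ability, so whether it helps or hurts readability depends on what the chart is meant to show.
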
And the table of less common abilities and who has them? Simple to show with PROC PRINT. I see that "slow start" is uncommon (but that's an ability that I think I can claim for myself...).

[Image: pokerare]

Full code: I placed all code presented here in a public Gist on GitHub. Enjoy!


About Author

Chris Hemedinger

Director, SAS User Engagement

Chris Hemedinger is the Director of SAS User Engagement, which includes our SAS Communities and SAS User Groups. Since 1993, Chris has worked for SAS as an author, a software developer, an R&D manager and a consultant. Inexplicably, Chris is still coasting on the limited fame he earned as an author of SAS For Dummies.

5 Comments

  1. Rick Wicklin

    Your last graph demonstrates a common data visualization problem. You have dozens of discrete categories, but only 600 pixels to display them. By default, PROC SGPLOT will thin the categories. For 30 or 50 categories you can try to see them all by (1) Making the graph taller, (2) Making the font smaller, and (3) Use FITPOLICY=NONE. For more than 50 categories you can thin (as you've done) or create a plot that shows most frequent categories while grouping the less frequent categories into an "Others" category.

    • Chris Hemedinger

      Thanks Rick! I saw the limitations, but was rushing to get something published in this crazy PokemonGO news cycle! I might come back with improvements.

    • Chris Hemedinger

      There - fixed! Well, at least the chart looks better. I'm sure that there are many more interesting visualizations possible, especially if we pulled in more data...

  2. Pingback: Pokémon: Gotta graph 'em all! - SAS Learning Post

  3. Pingback: Is "La Quinta" Spanish for "Next to Denny's"? - The DO Loop
