World Statistics, FTW!

Yesterday, I was in the #raganSAS audience as David Pogue told me What's New and What's Next in the world of technology. David is a great presenter, and he really had the audience engaged as he talked about augmented reality, his world according to Twitter, and an iPhone app that comes pretty close to teaching the world to sing in perfect harmony (plus a cheater app that helps the world to sing like T-Pain).

On the world-harmony-for-profit theme, he shared information about web sites such as Kiva.org that facilitate microfinancing around the world. There are other microfinance sites that help people closer to home (for us in the USA), but as Pogue said, only Kiva.org can give you that "rosy glow" when you know you're helping people in developing countries.

Kiva.org opens financial doors for people who might not have another source of funding, but it also presents a platform rich in data for analysis and reporting. The folks at Kiva.org support web services that allow you to build applications that reference the data that they collect. They also offer "data snapshots": downloadable versions of all of the data they have on the loans, loan recipients, and the lenders who participate.

If you could get this data into SAS, what insights could you glean? What cool stats could you produce? What stories could you tell with charts and plots?

So, now we come to your homework assignment...if you choose to accept it. I've already done the grunt work of writing a SAS program that transforms the raw data (from its XML format) into SAS data sets. I've even written a sample step that produces a simple chart based on the current data.

My plot with SGPANEL

What can you do with this data using SAS? There are two data sets: lenders (over 400,000 records) and loans (over 165,000 records). They contain columns relating to geography (location of lenders and loan recipients), quantity (how many loans, what amounts), categories (loan purpose/industry, gender of recipient), and time (when the loan was granted/funded). You can read about the data on Kiva.org, and then create interesting reports using SAS.
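As a starting point for your own reports, here is a minimal sketch of a paneled bar chart. It is not the sample step from my program. The column names (sector, funded_year, loan_amount) are assumptions for illustration only, so run the PROC CONTENTS step first and substitute the real column names from your loans data set.

```sas
/* List the actual columns in the loans data set before charting. */
proc contents data=work.loans;
run;

/* ASSUMPTION: sector, funded_year, and loan_amount are hypothetical */
/* column names. Replace them with names reported by PROC CONTENTS.  */
proc sgpanel data=work.loans;
  /* One panel per category value, laid out in a 3 x 2 grid */
  panelby sector / columns=3 rows=2;
  /* Sum the loan amounts within each year */
  vbar funded_year / response=loan_amount stat=sum;
  rowaxis label="Total loan amount";
  colaxis label="Year funded";
run;
```

If the panel variable has more values than fit in the grid, SGPANEL continues on additional pages of output.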

Bonus assignment: can you improve my SAS program that pulls the data into SAS? I promise you: there is plenty of room for optimization. (If I had held off on this post until I perfected it, it would be ready for World Statistics Day 2011.) My implementation uses the XML libname engine, DATA step, and PROC SQL. It could be more automated (download the ZIP file with FILENAME URL, then extract and process it) and more efficient (faster appends, perhaps joining and summarizing for easier analysis). The program encounters a few errors when it runs, probably due to character encoding in the XML data. What would you do differently?
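To give you a feel for the general approach, here is a minimal sketch of reading a series of XML files with the XML libname engine and an XMLMap, then stacking them with PROC APPEND. It is not a copy of kivaProgram.sas. The folder layout (loans/1.xml, loans/2.xml, ...), the map file name (loans.map), the table name defined in the map (loans), and the macro and parameter names are all assumptions made for illustration.

```sas
/* ASSUMPTION: kivaRoot, loans.map, and the loans/<n>.xml layout are */
/* hypothetical. Adjust them to match your extracted files.          */
%let kivaRoot = /u/userid/Kiva;

/* Fileref for the XMLMap that describes how to read the loan XML */
filename loanmap "&kivaRoot/loans.map";

%macro appendLoans(root=, n=);
  %local i;
  %do i = 1 %to &n;
    /* Point the XML engine at one file, read through the map */
    libname loanxml xml "&root/loans/&i..xml"
            xmlmap=loanmap xmltype=xmlmap access=readonly;

    /* PROC APPEND creates work.loans on the first pass and adds  */
    /* rows on later passes without rewriting the base data set.  */
    /* FORCE tolerates differences in column lengths across files.*/
    proc append base=work.loans data=loanxml.loans force;
    run;

    libname loanxml clear;
  %end;
%mend appendLoans;

/* Example call: process the first 3 loan files */
%appendLoans(root=&kivaRoot, n=3);
```

PROC APPEND is one way to get the "faster appends" mentioned above, since it avoids re-reading the accumulated data on every pass. Because it does not replace the base table, delete work.loans before re-running the loop.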

Here's how you can get started:

  • Download my SAS program and XML map files from this ZIP file (small, just about 3K).
  • Extract the ZIP file to a new folder that your SAS session can access
    as the Kiva "root" folder (example: "C:\public\Kiva" or "/u/userid/Kiva").
  • Download the data snapshot from Kiva.org (big, about 150MB ZIP file). You need the XML format (not the JSON format).
  • Extract the data snapshot files into your Kiva "root" folder.
  • Modify my kivaProgram.sas file to set the Kiva data root folder, and set the number of
    loan XML files and lender XML files (as described in the comments in the program).

(By the way, I wrote this program entirely using SAS Enterprise Guide 4.3. So I know that you can run it from there, or within whatever SAS 9.2 environment you have access to.)

What better way to celebrate World Statistics Day than to compute some statistics for the world? Post your experiences back here in the comments, or use sasCommunity.org to share more details and post the link.

tags: gptw, kiva.org, ragan communications, SAS business analytics, SAS programming, SGPANEL, World Statistics Day, XML libname

3 Comments

  1. AnnMaria
    Posted October 9, 2010 at 4:28 pm

    This is really awesome. I'm in the middle of a couple of projects right now but I will get back to check out the Kiva data for sure, probably sooner rather than later.

  2. Eric Hill
    Posted October 12, 2010 at 4:09 pm

    Since today marks the launch of JMP 9, I took Chris's SAS code and ran it in JMP to generate the lenders and loans data sets and then brought them into JMP to see if JMP could discover anything about the data. Here are a couple screencasts just to give you an idea how you might get started analyzing this data with JMP 9:

    http://screencast.com/t/dsIu4vjEoT
    http://screencast.com/t/h8O3fgo2

    Thanks,

    Eric

  3. Audi
    Posted October 14, 2010 at 8:13 am

    Very nice Eric, impressive JMP 9!

    P.S. My first time to see Jing. Thanks for sharing.
