Geocoding US Street Addresses and Plotting Them with PROC SGMAP

I met many SAS programmers at the 2019 SAS Global Forum who geocode addresses using ArcGIS. Did you know that you can also find street address locations, and more, with the SAS GEOCODE procedure?

PROC GEOCODE gives you coordinates for street addresses, ZIP codes, ZIP+4 codes and cities. You can even geocode IP addresses and do custom geocoding for things like sales territories.

Starting with SAS 9.4M5, you get PROC GEOCODE with Base SAS or the SAS University Edition. Before SAS 9.4M5, you had to order SAS/GRAPH to get the procedure.

In this blog, you will set up and run PROC GEOCODE to find the latitude and longitude of street addresses in the United States.

Here is the PROC GEOCODE and PROC SGMAP code for this example.

SET UP FOR STREET GEOCODING

PROC GEOCODE uses United States Census TIGER data for street geocoding, but this data must be preprocessed so SAS can use it.

You can download preprocessed TIGER data from SAS. Get the prebuilt U.S. street lookup data from the SAS Maps and Geocoding website.

You will download a ZIP file that includes ten CSV files, a ReadMe file and a SAS program to create the SAS lookup data. The CSV files take about 15 GB of disk space, and the final data sets take about 16 GB.

CREATE YOUR LOOKUP DATA SETS

Unzip the downloaded StreetLookupData.zip file into a folder of your choice. Next, start SAS and edit the ImportCSVfiles.sas program. This program will create your street lookup data for PROC GEOCODE.

Change these macro variable settings in ImportCSVfiles.sas:
/*--- Edit macro variables to specify locations on your system.            */                                                           
%let PATHIN=C:\Geocode;       /* Directory with files from the zip archive */                                                           
%let PATHOUT=C:\Geocode\Data; /* Location to write geocoding data sets     */

Set PATHIN and PATHOUT to directories you want, but make sure that you create the directory specified in PATHOUT before running ImportCSVfiles.sas.
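
If the PATHOUT folder does not exist yet, you can create it from SAS before running the import program. This is only a sketch: it assumes the example paths shown above (C:\Geocode and C:\Geocode\Data) and a system where SAS is allowed to write to that location, so adjust both to your own setup.

```sas
/* Sketch: create the PATHOUT folder if it is missing.            */
/* The paths are the example values above; change them as needed. */
data _null_;
  length newdir $ 260;
  if fileexist("C:\Geocode\Data") then
    put "NOTE: C:\Geocode\Data already exists.";
  else do;
    /* DCREATE(new-folder-name, parent-folder) returns the new path, */
    /* or a blank value if the folder could not be created.          */
    newdir = dcreate("Data", "C:\Geocode");
    if missing(newdir) then put "WARNING: Could not create C:\Geocode\Data.";
    else put "NOTE: Created " newdir;
  end;
run;
```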

Next, use information from the ReadMe.txt file for these values:

/*--- Get metadata from the ReadMe.txt file. */                                                                             
%let source= US Census Bureau TIGER/Line files; /* Original source for the lookup data   */                                                                             
%let release=2018;                              /* Year original data published          */

Change the name of the lookup data sets, or leave them set to USM, USS and USP:

/*--- Set data set names.                                */                                                                             
%let MDS=USM;   /* First geocoding lookup data set name  */                                                                             
%let SDS=USS;   /* Second geocoding lookup data set name */                                                                             
%let PDS=USP;   /* Third geocoding lookup data set name  */

Now, run the SAS program with your settings. This can take several minutes to process the CSV files.

After the program finishes, a library named Lookup is ready to geocode street addresses.
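
In a later SAS session, the library has to be assigned again before you geocode. Here is a short sketch, assuming the lookup data sets were written to the example PATHOUT location and kept the names USM, USS and USP:

```sas
/* Re-assign the lookup library (path is the example PATHOUT value) */
libname lookup "C:\Geocode\Data";

/* List the members of the library; USM, USS and USP should appear */
proc contents data=lookup._all_ nods;
run;
```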

GEOCODING STREET ADDRESSES

You can find some interesting data on the city of Chicago Data Portal. Here is a data set for the 2019 movies in the parks.

Use PROC IMPORT to read the CSV file and create a SAS data set:

proc import datafile="C:\movies\Chicago_Park_District__Movies_in_the_Parks_2019_-_Calendar.csv"                                  
     out=movies                                                                                                                         
     dbms=csv                                                                                                                           
     replace;                                                                                                                           
     datarow=2;       /* start reading the data in the second row       */                                                              
     getnames=yes;    /* generate SAS variable names from first row     */                                                              
     guessingrows=20; /* 20 is the default, it works fine for this data */                                                              
run;

You look at the data and realize that city, state and ZIP code variables are not included. Add city and state using a data step:

data movies; set movies;                                                                                                                
  city="Chicago";                                                                                                                       
  state="IL";                                                                                                                           
run;

With any geocoding, the more information you provide, the better your results. If you have street addresses with the city, state and ZIP code, include them in your input data.

You can now run PROC GEOCODE. If your variable names match the PROC GEOCODE defaults, there is not much to code. The code below outlines all of the options:

proc geocode                                                                                                                            
  method=street              /* street geocoding used here */                                                                                                           
 
  /* lookup data sets */                                                                                                                
  lookupstreet=lookup.usm    /* preprocessed TIGER lookup data set    */                                                                                
  lookupcity=sashelp.zipcode /* set this if you do not have SAS/GRAPH */                                                                
 
  /* input data and variables */                                                                                                        
  data=movies              /* input data set */                                                                                         
  addressvar=park_address  /* set this if "address" is not the variable name */                                                         
  /* addresscityvar=          set this if "city" is not the variable name    */                                                         
  /* addressstatevar=         set this if "state" is not the variable name   */                                                         
  /* addresszipvar=           set this if "zip" is not the variable name     */                                                         
 
  /* include variables from the lookup.uss data in your output */                                                         
  attributevar=(side)      /* include the side of the street in your output  */                                                         
 
  out=geocoded;            /* output data set */                                                                                        
run;

A Geocoding Summary is written to the SAS log:

    _________ Geocoding Summary _____________________________
    Address data:            WORK.MOVIES
    Output data:             WORK.GEOCODED
    STREET lookup data:      LOOKUP.USM
    CITY lookup data:        SASHELP.ZIPCODE
    ZIP lookup data:         SASHELP.ZIPCODE
    Geocoding method:        Street level
    Run date:                09May2019
    Obs processed:           210
    Elapsed time:            00:00:01
    Obs per minute:          7,094
    Street matches:          209
    ZIP matches:             0
    City matches:            1
    Not matched:             0
    _________________________________________________________

That looks pretty good. You only had one non-street match. Examine your output data:

Sample of Geocoded Addresses

In addition to latitude and longitude, PROC GEOCODE provides variables to help you determine if good matches are found: _MATCHED_, _NOTES_ and _SCORE_. You also get your geocoded matches: M_ADDR, M_CITY, M_STATE, M_ZIP.

In the data above, you see that 3035 E. 130st St. did not get a street match. This is because the street name is misspelled in the input data: it should be 130th Street, not 130st Street.
Notice how _SCORE_ is higher when a ZIP code is provided for 6205 N. Sheridan Rd.
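
One way to handle the misspelled 130st address is to correct it in a copy of the input data and geocode again. The following is a sketch, not part of the original example: the data set names MOVIES_FIXED and GEOCODED_FIXED are made up for illustration, and it assumes the string '130st' occurs only in that one misspelled address.

```sas
/* Sketch: correct the misspelled street name in a copy of the input */
data movies_fixed;
  set movies;
  park_address = tranwrd(park_address, '130st', '130th');
run;

/* Geocode the corrected copy with the same lookup data as before */
proc geocode
  method=street
  lookupstreet=lookup.usm
  lookupcity=sashelp.zipcode
  data=movies_fixed
  addressvar=park_address
  out=geocoded_fixed;
run;
```

Compare the Geocoding Summary in the log with the earlier run to see whether the city-level match became a street match.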

Also, notice that 3200 N. Lake Shore Dr. received a lower score and that M_ADDR did not match the input address. The _NOTES_ variable may give you some hints about what occurred. In your output, ENDNM indicates that a street number at the end of the street is found, and NODPM says that the direction prefix did not match. See the GEOCODE Procedure documentation for more information.

You may want to check the geocoded output for any notes that begin with “NO” to make sure they are not bad matches.
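
A quick way to do that check is sketched here. It assumes the individual note tokens in _NOTES_ are separated by single blanks, which you should confirm against your own output. The variable names come from the GEOCODED data set created above.

```sas
/* How many observations matched at each level? */
proc freq data=geocoded;
  tables _matched_;
run;

/* List matches with a note token that starts with "NO"            */
/* (assumes the tokens in _NOTES_ are separated by single blanks)  */
proc print data=geocoded;
  where _notes_ like 'NO%' or _notes_ like '% NO%';
  var park_address m_addr m_city m_zip _matched_ _score_ _notes_;
run;
```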

In your example, you want to take the kids to see an outdoor movie. Limit the data to only include the G-rated films, and use PROC SGMAP to map the locations.

/* limit data to G-rated movies */                                                                                                      
data geocoded2; set geocoded;                                                                                                           
  where rating='G';                                                                                                                     
run;                                                                                                                                    
 
%let url = http://services.arcgisonline.com/arcgis/rest/services;                                                                       
 
title1 'Locations of Outdoor G-rated Movies in Chicago';                                                                                
 
proc sgmap plotdata=geocoded2;                                                                                                          
  esrimap url="&url/World_Topo_Map";                                                                                                    
  scatter x=x y=y /                                                                                                                     
    group=title                                                                                                                         
    markerattrs=(symbol=circlefilled size=15);                                                                                          
run;

PROC SGMAP SCATTER of Movie Locations

Then, use PROC PRINT to list the date, title and address for the movies.
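
A minimal sketch of that listing follows. TITLE and PARK_ADDRESS appear in the code above, but DATE is an assumed variable name and MOVIELIST is a made-up data set name, so check the names that PROC IMPORT generated for your copy of the file.

```sas
/* Sketch: list the G-rated movies in date order.            */
/* DATE is an assumed variable name; TITLE and PARK_ADDRESS  */
/* are used in the code above.                               */
proc sort data=geocoded2 out=movielist;
  by date;
run;

title1 'Outdoor G-rated Movies in Chicago';
proc print data=movielist noobs;
  var date title park_address;
run;
```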

PROC GEOCODE Output of Movie Information

You created a movie guide that includes a map and a list of summer movies using geocoded addresses.

In the next blog, you will download your own TIGER files and preprocess them, so your lookup data will be smaller and more up to date.

For more information, here are some SAS Global Forum white papers:

“PROC GEOCODE: Finding Locations Outside the U.S.” - SAS Global Forum 2013
“PROC GEOCODE: Now with Street-Level Geocoding” - SAS Global Forum 2010
“Distances: Let SAS® Do the Heavy Lifting” - SAS Global Forum 2017

About Author

Kelly Mills

Principal Test Engineer

Kelly Mills is a Principal Test Engineer with over 30 years of manual and automated testing experience in computer, communications and analytics software industries. He's worked at companies such as Alcatel, IBM and SAS Institute.

12 Comments

  1. Peter Lancashire

    Does SAS offer anything outside the USA? Do you have any suggestions how we might geocode addresses globally?

    • Kelly Mills

      Hi Peter,

      You can also do street geocoding for Canada, but not other countries. PROC GEOCODE will do postal code geocoding for other countries. See the SAS Maps Online website for more information: http://support.sas.com/rnd/datavisualization/mapsonline/html/geocode.html#zip

      There are several websites and programs that do street geocoding for other countries. I do not have any suggestions, but the World Geocoder for ArcGIS looks like a good alternative.

      If you have a limited number of addresses, you may want to try some web sites and build your data set. For example, I found the latitude and longitude of a McDonald's in Paris (140 Av. des Champs-Élysées, 75008 Paris, France) using this site: https://gps-coordinates.org/

      Kelly

  2. Pingback: Esri integration with SAS Visual Analytics: Geocoding - SAS Users

  3. Is there any sample code available for subsetting the resulting data sets? E.g., keeping only one state's data? My guess is that it would utilize the MapIDNameAbrv, First & Last fields in the USM file and the N & Start fields from the USS file.

  4. Hi Jonathan,

    The GEOCODE procedure does not do reverse geocoding. That is, if you have the latitude and longitude, address or city, SAS does not have a procedure that gives you the county, state or country.

    Kelly

  5. Do you know how often the Street Lookup data gets refreshed? Is 2018 the latest data? Will a 2020 version be created after the Census is compiled?

    • Currently the 2018 data is the latest we have available. We hope to have updated data available soon.

  6. Vaishali Zilpe

    Great Article. I recently implemented this in one of my code and it was simple to execute and very well explained.
