I met many SAS programmers at the 2019 SAS Global Forum who geocode addresses using ArcGIS. Did you know that you can find street address locations, and more, with the SAS GEOCODE procedure?
PROC GEOCODE gives you coordinates for addresses, ZIP codes, ZIP+4 codes and cities. You can even geocode IP addresses and do custom geocoding for things like sales territories.
Starting with SAS 9.4M5, you get PROC GEOCODE with Base SAS or the SAS University Edition. Before SAS 9.4M5, you had to order SAS/GRAPH to get the procedure.
In this blog, you will set up and run PROC GEOCODE to find the latitude and longitude of street addresses in the United States.
SET UP FOR STREET GEOCODING
PROC GEOCODE uses United States Census TIGER data for street geocoding, but this data must be preprocessed so SAS can use it.
You can download preprocessed TIGER data from SAS. Get the prebuilt U.S. street lookup data from the SAS Maps and Geocoding website.
You will download a ZIP file that includes ten CSV files, a readme file and a SAS program to create the SAS lookup data. The CSV files take about 15 GB of disk space and the final data sets take about 16 GB.
CREATE YOUR LOOKUP DATA SETS
Unzip the downloaded StreetLookupData.zip file to a folder of your choice. Next, bring up SAS and edit the ImportCSVfiles.sas program. This program will create your street lookup data for PROC GEOCODE.
Change these macro variable settings in ImportCSVfiles.sas:
/*--- Edit macro variables to specify locations on your system. */
%let PATHIN=C:\Geocode;        /* Directory with files from the zip archive */
%let PATHOUT=C:\Geocode\Data;  /* Location to write geocoding data sets */
Set PATHIN and PATHOUT to directories you want, but make sure that you create the directory specified in PATHOUT before running ImportCSVfiles.sas.
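If you prefer to create the output directory from within SAS, the DCREATE function can do it. This is only a sketch: it assumes the PATHIN and PATHOUT values shown above (C:\Geocode and C:\Geocode\Data) and that your SAS session is allowed to write to that location. Adjust the names for your system.

```sas
/* Sketch: create the PATHOUT directory before running ImportCSVfiles.sas. */
/* Assumes the parent folder C:\Geocode already exists.                   */
data _null_;
   length newdir $ 256;
   /* DCREATE returns the full path on success, or blank on failure */
   newdir = dcreate('Data', 'C:\Geocode');
   if newdir = '' then
      put 'NOTE: Directory was not created (it may already exist).';
   else
      put 'NOTE: Created directory ' newdir;
run;
```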
Next, use information from the ReadMe.txt file for these values:
/*--- Get metadata from the ReadMe.txt file. */
%let source= US Census Bureau TIGER/Line files;  /* Original source for the lookup data */
%let release=2018;                               /* Year original data published */
Change the name of the lookup data sets, or leave them set to USM, USS and USP:
/*--- Set data set names. */
%let MDS=USM;  /* First geocoding lookup data set name */
%let SDS=USS;  /* Second geocoding lookup data set name */
%let PDS=USP;  /* Third geocoding lookup data set name */
Now, run the SAS program with your settings. Processing the CSV files can take several minutes.
After the program finishes, a library named Lookup is ready to geocode street addresses.
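In a later SAS session you will likely need to assign the library again before geocoding. The sketch below assumes the PATHOUT location shown earlier (C:\Geocode\Data) and the default data set names USM, USS and USP. Change both if you used different settings.

```sas
/* Sketch: reassign the lookup library and confirm the data sets exist. */
/* The path is the PATHOUT value assumed earlier in this post.          */
libname lookup 'C:\Geocode\Data';

/* List the members of the library without printing full variable details */
proc contents data=lookup._all_ nods;
run;
```

You should see the three lookup data sets listed in the output if the import program ran to completion.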
GEOCODING STREET ADDRESSES
The example input is a CSV file of the Chicago Park District's 2019 Movies in the Parks calendar. Use PROC IMPORT to read the CSV file and create a SAS data set:
proc import datafile="C:\movies\Chicago_Park_District__Movies_in_the_Parks_2019_-_Calendar.csv"
            out=movies dbms=csv replace;
   datarow=2;        /* start reading the data in the second row */
   getnames=yes;     /* generate SAS variable names from first row */
   guessingrows=20;  /* 20 is the default, it works fine for this data */
run;
You look at the data and realize that city, state and ZIP code variables are not included. Add city and state using a data step:
data movies;
   set movies;
   city="Chicago";
   state="IL";
run;
With any geocoding, the more information you provide, the better your results. If you have street addresses with the city, state and ZIP code, include them in your input data.
You can now run PROC GEOCODE. If your variable names match the PROC GEOCODE defaults, there is not much to code. The code below outlines all of the options:
proc geocode
   method=street                 /* street geocoding used here */
   /* lookup data sets */
   lookupstreet=lookup.usm       /* preprocessed TIGER lookup data set */
   lookupcity=sashelp.zipcode    /* set this if you do not have SAS/GRAPH */
   /* input data and variables */
   data=movies                   /* input data set */
   addressvar=park_address       /* set this if "address" is not the variable name */
   /* addresscityvar=   set this if "city" is not the variable name */
   /* addressstatevar=  set this if "state" is not the variable name */
   /* addresszipvar=    set this if "zip" is not the variable name */
   /* include variables from the lookup.uss data in your output */
   attributevar=(side)           /* include the side of the street in your output */
   out=geocoded;                 /* output data set */
run;
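For comparison, here is a sketch of how short the call can be when the input variables already use the default names (address, city, state, zip). The data set work.addresses and its rows are hypothetical and only illustrate the layout. The two street addresses are taken from this post, and the ZIP code on the first row is my assumption for that address.

```sas
/* Hypothetical input data that uses the default variable names */
data work.addresses;
   infile datalines dsd truncover;
   length address $ 60 city $ 30 state $ 2 zip 8;
   input address city state zip;
   datalines;
6205 N. Sheridan Rd.,Chicago,IL,60660
3200 N. Lake Shore Dr.,Chicago,IL,
;
run;

/* No ADDRESS*VAR= options are needed because the default names are used */
proc geocode method=street
   data=work.addresses
   lookupstreet=lookup.usm
   lookupcity=sashelp.zipcode    /* set this if you do not have SAS/GRAPH */
   out=work.addresses_geocoded;
run;
```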
A Geocoding Summary is written to the SAS log:
_________ Geocoding Summary _____________________________
Address data:        WORK.MOVIES
Output data:         WORK.GEOCODED
STREET lookup data:  LOOKUP.USM
CITY lookup data:    SASHELP.ZIPCODE
ZIP lookup data:     SASHELP.ZIPCODE
Geocoding method:    Street level
Run date:            09May2019
Obs processed:       210
Elapsed time:        00:00:01
Obs per minute:      7,094
Street matches:      209
ZIP matches:         0
City matches:        1
Not matched:         0
_________________________________________________________
That looks pretty good. You only had one non-street match. Examine your output data:
In addition to latitude and longitude, PROC GEOCODE provides variables to help you determine if good matches are found: _MATCHED_, _NOTES_ and _SCORE_. You also get your geocoded matches: M_ADDR, M_CITY, M_STATE, M_ZIP.
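One way to get a quick overview of match quality is to summarize those variables. The following is a sketch that assumes the output data set is named geocoded, as in the PROC GEOCODE step above.

```sas
/* Count observations by match type (street, ZIP, city, and so on) */
proc freq data=geocoded;
   tables _matched_ / nocum;
run;

/* Summarize the match score within each match type */
proc means data=geocoded n min mean max;
   class _matched_;
   var _score_;
run;
```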
In the output data, you see that 3035 E. 130st St. did not get a street match. This is because the input address has a typo: the street should be 130th Street, not 130st Street.
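If you want a street match for that address, you could correct the typo and geocode again. This sketch assumes the typo appears literally as "130st" in the park_address variable and that no other address contains that text. The data set names movies_fixed and geocoded_fixed are made up for this example.

```sas
/* Correct the street name typo in a copy of the input data */
data movies_fixed;
   set movies;
   /* assumption: the typo is stored exactly as "130st" */
   park_address = tranwrd(park_address, '130st', '130th');
run;

/* Geocode the corrected addresses with the same settings as before */
proc geocode method=street
   data=movies_fixed
   lookupstreet=lookup.usm
   lookupcity=sashelp.zipcode
   addressvar=park_address
   attributevar=(side)
   out=geocoded_fixed;
run;
```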
Notice how _SCORE_ is higher when a ZIP code is provided for 6205 N. Sheridan Rd.
Also, notice that 3200 N. Lake Shore Dr. received a lower score and its M_ADDR did not match the input address. The _NOTES_ variable may give you some hints about what occurred. In your output, ENDNM indicates that a street number at the end of the street is found, and NODPM says that the direction prefix did not match. See the GEOCODE Procedure documentation for more information.
You may want to check the geocoded output for any notes that begin with “NO” to make sure they are not bad matches.
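A sketch of that check is shown below. It assumes the output data set geocoded and the input variable park_address from the example, and it uses a regular expression to find any note token that starts with "NO".

```sas
/* List observations whose _NOTES_ value has a token beginning with "NO" */
proc print data=geocoded noobs;
   /* \b anchors the match to the start of a token such as NODPM */
   where prxmatch('/\bNO/', _notes_) > 0;
   var park_address m_addr m_city m_zip _matched_ _score_ _notes_;
run;
```

Comparing park_address with M_ADDR side by side makes it easier to spot a match that landed on the wrong street.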
In your example, you want to take the kids to see an outdoor movie. Limit the data to only include the G-rated films, and use PROC SGMAP to map the locations.
/* limit data to G-rated movies */
data geocoded2;
   set geocoded;
   where rating='G';
run;

%let url = http://services.arcgisonline.com/arcgis/rest/services;
title1 'Locations of Outdoor G-rated Movies in Chicago';
proc sgmap plotdata=geocoded2;
   esrimap url="&url/World_Topo_Map";
   scatter x=x y=y / group=title
           markerattrs=(symbol=circlefilled size=15);
run;
Then, use PROC PRINT to list the date, title and address for the movies.
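A minimal sketch of that listing follows. The variables title and park_address appear in the earlier code, but the name date is an assumption about how PROC IMPORT named the date column. Check your data set and adjust the VAR statement if needed.

```sas
/* List the G-rated movies with their date, title and address */
title1 'Outdoor G-rated Movies in Chicago';
proc print data=geocoded2 noobs;
   var date title park_address;   /* "date" is an assumed variable name */
run;
title1;   /* clear the title when done */
```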
You created a movie guide that includes a map and a list of summer movies using geocoded addresses.
In the next blog you will download your own TIGER files and preprocess them, so your lookup data will be smaller and more up to date.
For more information, here are some SAS Global Forum white papers:
“PROC GEOCODE: Finding Locations Outside the U.S.” - SAS Global Forum 2013
“PROC GEOCODE: Now with Street-Level Geocoding” - SAS Global Forum 2010
“Distances: Let SAS® Do the Heavy Lifting” - SAS Global Forum 2017