PROC SGMAP using CSV Files, GINSIDE and GEODIST to Plot Places of Interest


As mentioned in other PROC SGMAP blogs, several SAS/GRAPH procedures moved to Base SAS in SAS 9.4M6 so that they can be used with PROC SGMAP. You can use them to create PROC SGMAP output even with the free SAS University Edition.

In this blog you will use the:

  • IMPORT procedure for CSV files
  • GINSIDE procedure
  • GEODIST function

You will also combine datasets and use multiple PROC SGMAP statements to plot data onto background maps.

Here is the complete code for this example.

Importing CSV Data

SAS Institute has a training center located on 7th Avenue in New York, but lunch is not provided. Plot the free Wi-Fi hotspots and some sidewalk cafés near the training center so you can have lunch and browse the internet.

New York City provides many data sets on the NYC Open Data website. Search for Wi-Fi Hotspot to find the page for the NYC public Wi-Fi hotspot locations, where you can export the data in CSV format.

Download the data and notice that the first row of the file contains the column names and the data starts on the second row. Use the PROC IMPORT DATAROW and GETNAMES statements to create a data set with well-named variables.

proc import datafile="u:\hotspots\NYC_Free_Public_WiFi_03292017.csv"
     out=hotspots
     dbms=csv
     replace;
     datarow=2;       /* start reading the data in the second row       */
     getnames=yes;    /* generate SAS variable names from first row     */
     guessingrows=20; /* 20 is the default, it works fine for this data */
run;
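
Before plotting, it can help to confirm what PROC IMPORT produced. The following is a minimal sketch, assuming the import created the LAT, LON and PROVIDER variables used later in this blog; adjust the VAR list if your download names them differently.

```sas
/* List the variables PROC IMPORT created, in file order, with their types */
proc contents data=hotspots varnum;
run;

/* Print the first few rows to spot-check the coordinates */
proc print data=hotspots(obs=5);
  var lat lon provider;
run;
```

If LAT or LON shows up as character rather than numeric, a larger GUESSINGROWS value may be worth trying.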

To see the hotspots, plot the data on an OpenStreetMap:

title1 'New York City Wi-Fi Hotspots';
proc sgmap plotdata=hotspots noautolegend;
  openstreetmap;
  scatter x=lon y=lat;
run;

All of the Wi-Fi Hotspots in New York
That is a lot of Wi-Fi hotspots.

PROC GINSIDE to Limit Data

The training center on 7th Avenue is in New York County (Manhattan). Try limiting the data with PROC GINSIDE using a polygon provided by New York City.

You can use a CSV file for polygon data, but it is much easier to use a shapefile if one is available. Download the shapefile for the Neighborhood Tabulation Areas and use PROC GINSIDE. Note that PROC GINSIDE requires the coordinate variables to be named X and Y, so use a DATA step to rename them.
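
The shapefile import and the rename step are not shown here, so the following is only a sketch of how they might look. The shapefile name is hypothetical, and the sketch assumes the shapefile stores unprojected longitude and latitude in degrees, so that its X and Y values are comparable with LON and LAT in the hotspot data.

```sas
/* Hypothetical path: point this at the shapefile you downloaded */
proc mapimport datafile="u:\hotspots\boros.shp"
     out=boros;
run;

/* PROC GINSIDE looks for coordinates named X and Y */
data hotspots_xy;
  set hotspots;
  x = lon;   /* longitude */
  y = lat;   /* latitude  */
run;
```

Copying the values instead of renaming them is a design choice in this sketch: it keeps LAT and LON available for the SCATTER statements.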

proc ginside 
  data=hotspots_xy      /* input data set */
  map=boros             /* map polygon used to compare to input data */
  out=hotspots_xy       /* output data set */
  insideonly;           /* only points inside the map polygon are kept */
  id boro_code;         /* variable from map data set */ 
run;

SGMAP SCATTER Data Limited by PROC GINSIDE
That helped, but there are still many hotspots displayed. You could get better results with a smaller polygon or by using the GEODIST function.

GEODIST Function to Limit Data

The GEODIST function allows you to set a distance variable in your data using the latitude and longitude of the SAS Training Center like this:

data hotspots (keep=lat lon provider distance);
  set hotspots;
  distance = geodist(lat, lon, 40.7618356, -73.98218050000002, 'DM');
run;

The latitude and longitude variables, lat and lon, from the input data set are compared with the latitude and longitude of the SAS Training Center to create the distance variable. The DM option specifies that the input values are in degrees and the output value is in miles.
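
To see how the option string changes the result, here is a small sketch that measures one distance in both miles and kilometers. The second point is a made-up coordinate used only for illustration; the K option for kilometers is the counterpart of M.

```sas
data _null_;
  /* training center coordinates from the step above */
  sas_lat = 40.7618356;
  sas_lon = -73.9821805;

  /* hypothetical second point, for illustration only */
  pt_lat = 40.7580;
  pt_lon = -73.9855;

  miles = geodist(pt_lat, pt_lon, sas_lat, sas_lon, 'DM'); /* degrees in, miles out      */
  km    = geodist(pt_lat, pt_lon, sas_lat, sas_lon, 'DK'); /* degrees in, kilometers out */

  put miles= km=;  /* writes both values to the SAS log */
run;
```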

Add the SAS location using a data step, and limit the data to a quarter mile using the distance from the SAS Training Center:

data both; set both;
  where distance le 0.25; /* miles */
run;
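
The step above reads the BOTH data set, which must already contain the hotspots and the SAS location. The step that builds it is not shown, so this is a sketch of what it might look like, run before the WHERE step. The data set name SAS_LOCATION is hypothetical; SASLABEL matches the variable used in the TEXT statement later on.

```sas
/* One-row data set for the SAS Training Center (hypothetical name) */
data sas_location;
  length saslabel $ 3;
  lat = 40.7618356;
  lon = -73.9821805;
  saslabel = 'SAS';
  distance = 0;        /* the center is zero miles from itself */
run;

/* Stack the hotspots and the SAS location into one data set */
data both;
  set hotspots sas_location;
run;
```

Setting DISTANCE to 0 keeps the intent explicit. SAS treats a missing value as smaller than any number, so a row with a missing distance would also pass `distance le 0.25`.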

Plot the data using PROC SGMAP:

/* This is the beginning of the Esri URL */
%let url = http://services.arcgisonline.com/arcgis/rest/services;
 
proc sgmap plotdata=both noautolegend;
  esrimap url="&url/World_Topo_Map";
 
  /* this plots the locations of the hotspots */
  scatter x=lon y=lat / 
    group=provider nomissinggroup
    markerattrs=(symbol=triangledown size=10 color=red);
 
  /* this is to indicate the location of SAS */
  text x=lon y=lat text=saslabel /
    textattrs=(color=blue size=20);
 
run;

SGMAP SCATTER Data Limited by the GEODIST Function
Now you know where the free Wi-Fi hotspots are located close to the SAS Training Center.

Next, add some sidewalk café locations so you can use the free Wi-Fi while having lunch. New York City also provides data for sidewalk café licenses.

See the SAS program for importing the café CSV data and add the distance from the SAS Training Center as you did before.
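
As a rough sketch of those steps, and not the program referred to above: the CSV file name below is hypothetical, and the code assumes the café file has numeric LAT and LON columns along with BUSINESS_NAME2. Check the actual column names in your download before running it.

```sas
/* Hypothetical file name: use the cafe CSV you exported */
proc import datafile="u:\hotspots\sidewalk_cafes.csv"
     out=cafes
     dbms=csv
     replace;
     getnames=yes;
run;

/* Add the distance from the SAS Training Center and keep nearby cafes */
data cafes (keep=lat lon business_name2 distance);
  set cafes;
  distance = geodist(lat, lon, 40.7618356, -73.9821805, 'DM');
  if not missing(distance) and distance le 0.25; /* miles */
run;

/* Append the cafes to the hotspots and the SAS location */
data both;
  set both cafes;
run;
```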

When you combine the datasets, different variables are created for each item plotted in PROC SGMAP. In the dataset below, PROVIDER is for the hotspot locations, sas is for the SAS Training Center, and BUSINESS_NAME2 is for the sidewalk cafés.

You can use the missing values to help plot locations. Here is a snapshot of the data:

Data Set Showing Missing Values
Use multiple statements to plot the data in PROC SGMAP, using GROUP with NOMISSINGGROUP in the SCATTER statements to keep missing data from being plotted.

proc sgmap plotdata=both;
  esrimap url="&url/World_Topo_Map";
 
  /* this plots the locations of the hotspots */
  scatter x=lon y=lat / 
    group=provider nomissinggroup
    markerattrs=(symbol=triangledown size=10 color=red);
 
  /* this plots the locations of the cafes */
  scatter x=lon y=lat / name='cafes'
    group=business_name2 nomissinggroup
    markerattrs=(symbol=starfilled size=20);
 
  /* this is to indicate the location of SAS */
  text x=lon y=lat text=saslabel /
    textattrs=(color=blue size=20);
 
  /* only include cafes in the legend */
  keylegend 'cafes' / title='Sidewalk Cafes';
run;

SGMAP SCATTER Data Limited by the GEODIST Function
Now you know where to go eat lunch and get free Wi-Fi, but the map is not centered where the sidewalk cafés are located.

Subset the data by coordinates to zoom in on the sidewalk cafés, then run PROC SGMAP again:

data both; set both;
  if lat ge 40.7618356 and
     lon ge -73.983654
  then output;
run;

SGMAP SCATTER Data Limited by the GEODIST Function
From this example, you can see that the GEODIST function is a powerful tool for limiting what you plot on maps.

A future blog will include an example of PROC GREDUCE to limit data plotted by PROC SGMAP.


About Author

Kelly Mills

Principal Test Engineer

Kelly Mills is a Principal Test Engineer with over 30 years of manual and automated testing experience in computer, communications and analytics software industries. He's worked at companies such as Alcatel, IBM and SAS Institute.
