In a discussion on a Reddit map group, someone claimed "Maine is the US state closest to Africa." Is that true? Can I use my SAS mapping tools to confirm, or bust, this myth? Follow along as I dive in!

## The Map in my Head (Wrong!)

My gut instincts tell me that the United States is in the Northern Hemisphere, and Africa is in the Southern Hemisphere, and therefore the southeastern US states will be closest to Africa - right?!?

OK - I know that's not really true, because much of Africa is actually north of the equator. Yet that's how I think of the two areas in relation to each other, in the (misguided) mental map in my head. Below is what the map in my head looks like (note that I applied huge latitude and longitude shifts to Africa to create this inaccurate map).

## Actual Map

Rather than relying on the (inaccurate) map in my head, let's plot the US and Africa together on an actual map.

I grabbed a copy of the US map (and created a new ID variable with the state names), grabbed a copy of the Africa continent map (and created a new ID variable with the country names), and combined them into one map. I then used Proc SGMAP to plot the combined map polygons with an OpenStreetMap background map. This map shows that northern Africa sits at the same latitudes as much of the US, so it's plausible that Maine really is the closest state to Africa.

    /* plot the combined US + Africa polygons over an OpenStreetMap background */
    proc sgmap maprespdata=both_attr mapdata=both_maps noautolegend;
       openstreetmap;
       choromap id / discrete mapid=id;
    run;

## Actual Distances

Even with a proper map, it's still difficult to determine definitively which US state is closest to Africa. You might be tempted to print the map, draw lines between each state and Africa, and then measure the lines with a ruler. But your results would be inaccurate!

You can't just measure and compare distances on that map, because when you create a flat rectangular map from a sphere/globe, you have to stretch some areas more than others. In technical terms, this 'stretching' is called projecting the map - the OpenStreetMap background above uses a Pseudo-Mercator (Web Mercator) projection, for example, which stretches areas more the farther they are from the equator. Simple measurements on the stretched/projected map won't give you numbers you can compare fairly.
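To put rough numbers on that stretching, here's a minimal sketch (not part of my original code). It assumes a spherical earth, where the local scale factor of a Mercator-style projection is 1/cos(latitude), and the latitudes are approximate, illustrative values rather than exact border points.

```sas
/* Sketch: how much a Mercator-style map stretches distances at a given latitude. */
/* Assumes a spherical earth; the latitudes are rough, illustrative values.       */
data stretch;
   input place $ lat;
   /* local scale factor of the Mercator projection is 1/cos(latitude) */
   stretch_factor = 1 / cos(lat * constant('pi') / 180);
   format stretch_factor 5.2;
   datalines;
Equator 0
Dakar 14.7
Maine 45
;
run;

proc print data=stretch noobs;
run;
```

At 45 degrees the factor is about 1.41, so a line drawn near Maine is stretched roughly 40% more on paper than the same real-world length drawn near the equator.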

But there's an alternative! We can use the SAS geodist() function to calculate the geodetic distance between two latitude/longitude coordinates. Therefore, we can pick a lat/long coordinate along the edge of Maine, and a coordinate along the edge of Africa, and then use the geodist() function to calculate the distance between them. It's difficult to know which coordinates to use, so I performed the calculation for all of them and then picked the shortest (the "brute force" solution is often the easiest when there's plenty of fast & cheap computing power available!).
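As a quick illustration of a single geodist() call, here's a small sketch. The two points are approximate city coordinates I picked for the example (Portland, Maine and Casablanca, Morocco), not the border points used in the real analysis, so the result is only a ballpark.

```sas
/* Sketch: one geodist() call between two illustrative points. */
data _null_;
   /* approximate city coordinates, for illustration only */
   lat_from = 43.66;  long_from = -70.26;   /* Portland, Maine     */
   lat_to   = 33.57;  long_to   = -7.59;    /* Casablanca, Morocco */
   /* 'D' = inputs are in degrees; 'M' = miles, 'K' = kilometers */
   miles = geodist(lat_from, long_from, lat_to, long_to, 'DM');
   km    = geodist(lat_from, long_from, lat_to, long_to, 'DK');
   put miles= comma8.1 km= comma8.1;
run;
```

The distances are written to the SAS log. Note that west longitudes are entered as negative numbers.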

First, I created a dataset pairing every point along each US state's border with every point along each African country's border. This ran in about 5 seconds on my laptop, and created several million pairs of points.

    /* cross join: pair every state border point with every Africa border point */
    proc sql noprint;
       create table distances as
       select unique
          states.id as id_from, states.lat as lat_from, states.long as long_from,
          africa.id as id_to, africa.lat as lat_to, africa.long as long_to
       from states, africa;
    quit;
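A join like this, with no WHERE clause, pairs every row of one table with every row of the other, so the result grows as the product of the two row counts. If you're trying this with more detailed maps, a quick check along these lines (a sketch, not part of my original code) shows the upper bound on the number of pairs before you commit to building them.

```sas
/* Sketch: upper bound on the number of point pairs the cross join can create. */
proc sql;
   select s.n as state_points,
          a.n as africa_points,
          s.n * a.n as max_pairs format=comma20.
   from (select count(*) as n from states) as s,
        (select count(*) as n from africa) as a;
quit;
```

The actual table can be a bit smaller than max_pairs, since select unique drops duplicate rows.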

I then used the geodist() function in a data step to calculate the distance between each of the pairs of coordinates (excluding the points with missing values - those are special data points to denote lakes/holes and such). My laptop plowed through the several million pairs of points, and calculated the distance between every pair, in about 3 seconds.

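    data distances;
       /* skip the missing-value rows that mark polygon breaks (lakes/holes) */
       set distances (where=(lat_from^=. and long_from^=. and lat_to^=. and long_to^=.));
       /* 'DM' = inputs in degrees, result in miles */
       distance_miles=geodist(lat_from, long_from, lat_to, long_to, 'DM');
    run;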
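With a distance on every pair, a per-state summary is one way to see how the contenders stack up against each other. This is a sketch rather than part of my original code; it only assumes the distances dataset and the id_from and distance_miles variables created above.

```sas
/* Sketch: the five states with the smallest minimum distance to Africa. */
proc sql outobs=5;
   select id_from,
          min(distance_miles) as min_miles format=comma10.1
   from distances
   group by id_from
   order by min_miles;
quit;
```

The outobs=5 option just truncates the printed list (SAS writes a warning to the log saying so); drop it to see every state.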

To find the pair with the minimum distance, I sorted the list by distance, and took the first pair (obs=1). And while creating the dataset, I restructured the data a bit, so the pair of points became the endpoints of a line I'll overlay on the map.

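    /* shortest distance first */
    proc sort data=distances out=distances;
       by distance_miles;
    run;

    /* keep the closest pair and emit its two endpoints as separate rows */
    data shortest;
       set distances (obs=1);
       lat=lat_from; long=long_from; output;
       lat=lat_to; long=long_to; output;
    run;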

Here's what that single observation (obs=1), the pair with the shortest distance, looks like:

And here's what it looks like with the data restructured into two observations, so I can draw a line between the lat/long coordinates:

## Plotting Results

Well, technically the text data in the table above tells me what I need to know - Maine is indeed the closest US state to Africa. And we even have the extra information of knowing exactly what lat/long coordinates along the borders are the closest, and what the distance is (in miles). But it sure would be a lot more satisfying to see it plotted on a map, rather than just seeing text and numbers, eh?!?

So I added a tiny bit more code to my previous map example to overlay a line. I specified a plotdata= dataset, and then added a series statement to plot the lat/long coordinates joined with a line.

    proc sgmap maprespdata=both_attr mapdata=both_maps plotdata=shortest noautolegend;
       openstreetmap;
       choromap id / discrete mapid=id lineattrs=(thickness=1 color=gray88);
       /* red line joining the two closest border points */
       series x=long y=lat / lineattrs=(color=red);
    run;

## You Want More?

If you like this kind of geographical trivia topic, you might also enjoy my blog post about 27 US states being farther north than Canada! And for those of you who are SAS programmers, here's the complete SAS code I used to create this example.

The Graph Guy!

Robert has worked at SAS for over a quarter century, and his specialty is customizing graphs and maps - adding those little extra touches that help them answer your questions at a glance. His educational background is in Computer Science, and he holds a BS, MS, and PhD from NC State University.