Santa's route in SAS Visual Analytics

11
Being so close before Christmas I thought it would be a good idea to see what route Santa Claus is planning this year. Not just because I'm living in Australia and Santa usually comes in t-shirts & shorts but also because it's a long way to get down here. So maybe SAS can help here by applying some advanced analytics to optimize his busy tour schedule?

So where to start. A website called Travel by GPS has a web page dedicated to tracking Santa. They also offer to download Santa's route as Google Earth KML file. Perfect! So let's have a look what the current route looks like.

Creating XML map files for Google Earth KML

The simplest way of importing XML documents into SAS is by using the XML Libname Engine. The way the engine maps XML tags into individual columns is by using a XML map file. The following code snippet creates a temporary map file which is referenced by the XML libname statement.

/* Generate the XML map to import graph attribute keys from the XML */
filename keymap temp;
data _null_; 
  infile datalines truncover; 
  file keymap; 
  input line $1000.; 
  put line; 
datalines4; 
<?xml version="1.0" encoding="utf-8"?>
<SXLEMAP name="KMLMAP" version="2.1">
  <TABLE name="route">
    <TABLE-PATH syntax="XPathENR">/kml/Document/Placemark</TABLE-PATH>

    <COLUMN name="name">
      <PATH syntax="XPathENR">/kml/Document/Placemark/name</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>50</LENGTH>
    </COLUMN>
    <COLUMN name="description">
      <PATH syntax="XPathENR">/kml/Document/Placemark/description</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>500</LENGTH>
    </COLUMN>
    <COLUMN name="coordinates">
      <PATH syntax="XPathENR">/kml/Document/Placemark/Point/coordinates</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>50</LENGTH>
    </COLUMN>
  </TABLE>
</SXLEMAP>
;;;;
filename graph "Naughty_and_Nice.kml" encoding="utf-8";
libname graph xmlv2 xmlmap=keymap access=readonly;

Import Google Earth KML data into SAS

Since the libname already gives us access to the XML document as SAS data set - it would be just a matter of referencing the table via a Data Step. However looking at the actual content - it appears columns such as description contain some unnessary HTML tags.

So let's extend a data step here to perform some minor data manipulation using Perl Regular Expressions. We are also going to split the coordinates into two individual numeric columns so we can reference these later within SAS Visual Analytics.

data route;
  length stop 8.;
  set graph.route end=last;
  stop = _n_;
 
  /* remove carriage returns */
  description = prxchange('s/(\n|\r)//', -1, description);
 
  /* create regular expression to catch the text between the tags*/
  retain re;
 
  if _N_ = 1 then
    do;
      re = prxparse('/(?)(.*?)(?=)/');
    end;
 
  /* extract the text */
  if prxmatch(re, description) then
    do;
      description = prxposn(re, 1, description);
    end;
 
  /* split coordinates in lat/lng */
  length lat 8. lng 8.;
  lng = input(scan(coordinates,1,","),8.);
  lat = input(scan(coordinates,2,","),8.);
 
  /* add a link to the next stop */
  length next_stop 8.;
 
  if not last then
    next_stop = stop + 1;
 
  /* throw away things we don't need */
  drop re coordinates;
run;

Calculate distance between two route points

To understand Santa's route it would be beneficial to see the actual distance he needs to travel between two points. SAS provides a great geodist function to do so. And finally we are also going to create a route from one stop to the next by joining the table with itself.

proc sql noprint;
	create table Naughty_and_Nice as
	select 	a.stop label="Stop",
		a.next_stop label="Next Stop",
		a.name label="Client",
		a.description label="Description",
		a.lat label="Latitude",
		a.lng label="Longitude",
		b.lat as next_lat label="Next Latitude",
		b.lng as next_lng label="Next Longitude",
		geodist(a.lat, a.lng, b.lat, b.lng) as distance label="Distance (km)"
	from route a left join route b on a.next_stop eq b.stop
	order by a.stop
;quit;

Explore the data with SAS Visual Analytics

Loading the data into SAS Visual Analytics Explorer and applying coordinates to our route stops reveals the un-optimized route Santa would take if he would simply go from one stop to the next according to the order in our data base. Based on the color legend it appears the maximum distance between route stops can be as long as 18,000 km. Considering the circumference of the earth at the equator is 40,076 km (24,902 mi) it would suggest Santa needs to travel halfway around the world to reach the next stop - clearly we should be able to optimize this route!

Optimizing Santas Route

SAS provides network optimization algorithms as part of SAS/OR. So let's apply the typical Traveling Salesman Problem to our Santa route. The first SQL statement creates a list of all possible pairs of potential stops. The following PROC OPTNET call will take the challenge and create the optimal route for Santa.

/* Create a list of all the possible pairs of potential stops */
proc sql noprint;
	create table Naughty_and_Nice as
	select 	a.stop label="Stop",
		a.name label="Client",
		a.description label="Description",
		a.lat label="Latitude",
		a.lng label="Longitude",
		b.stop as next_stop label="Next Stop",
		b.lat as next_lat label="Next Latitude",
		b.lng as next_lng label="Next Longitude",
		geodist(a.lat, a.lng, b.lat, b.lng) as distance label="Distance (km)"
	from route as a, route as b
	where a.stop ne b.stop
	order by a.stop
;quit;
 
/* Find optimal tour for Santa using OPTNET */
proc optnet
   data_links = data.naughty_and_nice
   out_nodes  = TSPTourNodes;
   data_links_var
      from    = stop
      to      = next_stop
      weight  = distance;
    tsp
      out     = TSPTourLinks;
run;
My slightly ageing computer managed to solve this puzzle in a bit more than an hour.

NOTE: ------------------------------------------------------------------------------------------------
NOTE: Running OPTNET version 13.1.
NOTE: ------------------------------------------------------------------------------------------------
NOTE: Data input used 0.23 (cpu: 0.23) seconds
 
NOTE: The number of nodes in the input graph is 551.
NOTE: The number of links in the input graph is 151525.
NOTE: ------------------------------------------------------------------------------------------------
NOTE: Processing the traveling salesman problem.
NOTE: The initial TSP heuristics found a tour with cost 189274.24153 using 3.77 (cpu: 3.42) seconds.
NOTE: The MILP presolver value NONE is applied.
NOTE: The MILP solver is called.
NOTE: The MILP solvers symmetry detection found 146616 orbits. The largest orbit contains 15
      variables.
NOTE: The MILP solver added 155 cuts with 542405 cut coefficients at the root.
NOTE: Optimal within relative gap.
NOTE: Objective = 179509.55266.
NOTE: Processing the traveling salesman problem used 4116.06 (cpu: 4090.08) seconds.
NOTE: ------------------------------------------------------------------------------------------------
NOTE: Data output used 0.02 (cpu: 0.00) seconds.
NOTE: ------------------------------------------------------------------------------------------------
NOTE: The data set WORK.TSPTOURNODES has 551 observations and 2 variables.
NOTE: The data set WORK.TSPTOURLINKS has 551 observations and 3 variables.
NOTE: PROCEDURE OPTNET used (Total process time):
      real time           1:08:36.57
      cpu time            1:08:31.34
 
NOTE: There were 551 observations read from the data set WORK.TSPTOURLINKS.
NOTE: The data set DATA.TSPTOURLINKS has 551 observations and 3 variables.

Optimal Santa route in SAS Visual Analytics

Reloading our new optimized data set shows the new route Santa should take.

A closer look to North America shows how Santa is travelling across the states.

Combining advanced visual analytics with network optimization algorithms show the great potential a system can deliver. The challenge to find the shortest possible route applies to many industries including planning, logistics, manufacturing, genomics, etc. Although solving the routing problem for Santa Claus has its own special meaning.

Merry Christmas Everyone!

Share

About Author

Falko Schulz

Principal Software Developer, Business Intelligence R&D

Falko Schulz is Principal Software Developer in the SAS Business Intelligence Research and Development division. He works actively on products such as SAS® Visual Analytics further enhancing user experience and analytical capabilities. Falko has a strong background in delivering business analytical applications in a wide range of industries.

Prior to joining the R&D division Falko worked in customer facing roles responsible for critical technical implementations and BI architectural designs. During his 15 years at SAS, Falko worked out of various countries including Germany, Australia and US.

11 Comments

  1. Nice post. Curious about the general workstream of setting up things in VA. Sounds like you had good ol' SAS code which crunched data, then you uploaded the data to the VA (LASR?) server, where it becomes available to users for analytics. Is that right?

    In a real world application (not that santa isn't real), is there typically some sort of ETL job (perhaps even a DI studio job?) in the background which is loading data to the VA server?

    • Falko Schulz

      Hi Quentin, Yes, I just used good old SAS code to perform the data preparation steps as I wanted to show the detail steps involved (and I know how much SAS users love SAS code ;-)). In a typical SAS Visual Analytics environment you would likely use the SAS Visual Analytics Data Builder application to perform most of these steps including loading data into memory (SAS LASR Analytic Server). While you can use any other SAS application (SAS DI Studio, SAS Enterprise Guide or even SAS Base) the very intuitive data builder application will be the easiest one as it is tightly integrated with the rest of the VA suite and helps getting data into VA in a very ETL-fashion way (UI query design, persist queries as job, scheduling, etc.). But yes, to answer your question - after data is loaded into LASR it becomes immediately available for consumption. Hope this helps. Cheers, Falko

  2. Pingback: SAS Tech Talk: Developing the SAS Visual Analytics Explorer - The SAS Dummy

  3. Matthew Galati

    Very nice post Falko! A quick comment on the performance of PROC OPTNET at 13.1. It looks like it took ~1 hour to find the optimal routing. If you are satisfied with 'near optimal' you can use the tsp relobjgap= option. If you set it to relobjgap=0.01, this will stop when the route found is within 1% of provable optimal. For this data, on my machine, PROC OPTNET will find a 'near optimal' solution in about 35 seconds. It spends the rest of the time closing that gap to find the true optimal. For practical problems (like Santa), 'near optimal' is usually "good enough".

  4. Pingback: Santa’s information dashboard - The SAS Training Post

  5. Pingback: SAS/OR - Santa's Gift Assignment Problem

Leave A Reply

Back to Top