Using SAS to convert IP addresses into numerical IP values

11

As citizens of the Internet, we are all familiar with IP addresses -- probably more so than our Internet founding fathers had ever intended. These addresses are typically represented in a 4-piece segmented list of numbers separated by dots. Here is an example: "149.173.5.120".

Each segment is called an octet because it contains 8 (count 'em, eight!) bits. The four-octect IP address is part of the IPv4 standard.

Note: There is a newer IPv6 standard (featuring 16 octets) that many newer networks use and which allows for more addresses. This has become necessary because all new consumer products are required by law to connect to the Internet. (I think that each of my daughter's "Polly Pocket" pieces can connect to WiFi.) But in this article I'm ignoring IPv6.

UPDATE 2020-Sep-04: My colleague Agata Bogacki has written a newer article that includes IPv6 and IPv4 and techniques to map to a geolocation.

The easy-to-read segmented IP address is actually a 32-bit number, and sometimes it is useful to convert the display value into its numeric form. For example, consider the databases that help you to map an IP address to a geographic location in the world. These databases use a numerical range to map an address to a country or city. (For more on range-based geocoding, see this topic in the PROC GEOCODE documentation.)

SAS code for converting IPv4 values to numbers

Here is a SAS program that converts a character-based, 4-segment IP address into its equivalent numeric value. It uses the SCAN function, a DATA step ARRAY, and a little bit of math to do the work:

/* Calculate the numerical IP from "segmented" IP address            */
/* Example: (from Right to Left)                                     */
/* 1.2.3.4 = 4 + (3 * 256) + (2 * 256 * 256) + (1 * 256 * 256 * 256) */
/*   is 4 + 768 + 13,1072 + 16,777,216 = 16,909,060                  */
data ip_numbers (keep=ip_address ip_numeric);
  infile datalines dsd;
  length ip_address $ 20 ip_numeric 8;
  input ip_address;
  array ip_part {4};
  do i = 1 to 4;
    ip_part{i} = scan(ip_address,i,'.');
  end;
  ip_numeric = ip_part{4} +
    (ip_part{3} * 256) +
    (ip_part{2} * 256 * 256) +
    (ip_part{1} * 256 * 256 * 256);
datalines;
115.85.65.148
117.203.114.198
118.96.201.156
119.247.220.11
12.201.116.58
128.2.38.96
128.204.197.27
128.204.207.83
134.102.237.2
141.155.113.98
169.2.124.79
172.16.26.231
172.16.30.229
173.234.211.69
176.63.76.232
178.157.198.132
178.32.145.44
178.32.177.184
178.33.174.213
178.63.199.204
184.82.208.149
188.165.187.71
;
run;

Here's the output:

Mapping IP address to a geo location

With this mapping, I can then combine my collection of IP addresses with one of the IP-to-geolocation databases that are available. (SAS provides a macro to help work with MaxMind, which you can learn about in this topic.) Here's a sample result:

Share

About Author

Chris Hemedinger

Director, SAS User Engagement

+Chris Hemedinger is the Director of SAS User Engagement, which includes our SAS Communities and SAS User Groups. Since 1993, Chris has worked for SAS as an author, a software developer, an R&D manager and a consultant. Inexplicably, Chris is still coasting on the limited fame he earned as an author of SAS For Dummies

11 Comments

  1. Neat piece of code! I didn't know about the MaxMind autocall macro. As usual, thanks for the tip!

    One observation though... The variable name, ip_number in your output dataset screenshot doesn't match the variable name, ip_numeric in your keep= data set option...

    • Chris Hemedinger
      Chris Hemedinger on

      Good catch Michelle! The discrepancy is because I built the DATA step as a standalone example, whereas the second image is from my actual production use of this technique. I create reports on our blogs.sas.com activity, including a graphical map that shows where the comments are coming from.

      Chalk another one up for Queensland, AU!

  2. Chris -- we actually used this as part of a cybersecurity project we did with the Navy back in 2004/2005. We were working with large volumes of network sensor data processed through an event correlation engine. We used DI Studio to administer all of our data management activities and the target data store was SPDS. Our IP addresses came across in IP/Port combinations in the format x.x.x.x:y where x is the IP octet and y is the port. We reported on a number of things including country of origin so the numeric equivalent of the IP was important. Here was the code used for that customer implementation:

    data spdsdata.dest_ip_pair_new;
     set spdsdata.dest_ip_port_pair_summary_old;
     length dip $15; label dip="Destination IP";
     length dport $10; label dport="Destination Port";
     length dipnum 8; label dipnum="Destination IP Number";
     length severity 4.; label severity="Severity";
     dip = scan(dip_port,1,":");
     dport = scan(dip_port,2,":");
     dipnum = 16777216*(input(scan(dip_port,1,".:"),8.)) + 65536* 
      (input(scan(dip_port,2,".:"),8.)) + 256*(input(scan(dip_port,3,".:"),8.)) + 
      (input(scan(dip_port,4,".:"),8.)); 
    run;
    

  3. Here are complimentary PROC FCMP functions I wrote to convert back and forth between numeric and dotted IP addresses. You can remove the INPUT and PUT functions if you don't mind getting the conversion log messages.

    PROC FCMP;
       FUNCTION IPAddressStrConvert( ipaddr $);
          RETURN( BLSHIFT( INPUT( SCAN( ipaddr, 1, '.'), BEST4.), 24) +
                  BLSHIFT( INPUT( SCAN( ipaddr, 2, '.'), BEST4.), 16) +
                  BLSHIFT( INPUT( SCAN(  ipaddr, 3, '.'), BEST4.), 8) +
                  INPUT( SCAN( ipaddr, 4, '.'), BEST4.) );
       ENDSUB;
    
       FUNCTION IPAddressNbrConvert( ipaddr) $ 15;
          RETURN( CATX( '.', PUT( BRSHIFT( ipaddr, 24), 3.0),
                             PUT( BAND( BRSHIFT( ipaddr, 16), 255), 3.0),
                             PUT( BAND( BRSHIFT( ipaddr, 8), 255), 3.0),
                             PUT( BAND( ipaddr, 255), 3.0) ) );
       ENDSUB;
    RUN;
    

  4. The initial datastep can easily be simplified as follows.
    It demonstrates that you can easily re-read the same line both as a character string and as four distinct number simultaneously as follows:

    1. using the @1 you start reading again the same line
    2. by use of DLM='.' you define separator between adjacent columns
    3. by re-writing your formula you only multiply 3 times i.s.o. 6 times.

    data IP_numbers (keep = IP IPnum); infile datalines dlm='.' missover; input IP $char20. @1 p1 p2 p3 p4; IPnum = ((p1 * 256 + p2) * 256 + p3) * 256 + p4; datalines; 115.85.65.148 117.203.114.198 118.96.201.156 ... ...

  5. Chris,
    Thanks for sharing spiffing tip!
    Could you show the detailed SAS code to get your final result in terms of "Mapping IP address to a geo location"?

    • Chris Hemedinger
      Chris Hemedinger on

      Ethan,

      When I did this years ago for IPv4, I used the MaxMind data and the IP ranges mapped to countries. I made a custom format in SAS that would map an IP address based on the range to a country.

      DATA WORK._EG_CFMT;
        SET IP2LOC.CITYBLOCKS /* from MaxMind macro */
        (KEEP=startIpNum endIpNum locId 
        RENAME=(startIpNum=start endIpNum=end locId=label));
          RETAIN fmtname "ip2loc" type "N";
      RUN;
      
      /* --------------------------------------------------------------------
         Creates a new format based on the data values contained within the
         source data set.
         -------------------------------------------------------------------- */
      PROC FORMAT LIBRARY=WORK CNTLIN=WORK._EG_CFMT;
      RUN;
      

      But if I were doing this today, I'd use the technique shared by Agata -- which covers IPv6 as well.

Back to Top