Compute the centroid of a polygon in SAS

Recently I blogged about how to compute a weighted mean and showed that you can use a weighted mean to compute the center of mass for a system of N point masses in the plane. That led me to think about a related problem: computing the center of mass (called the centroid) for a planar polygon.

If you cut a convex polygon out of stiff cardboard, the centroid is the position where the polygon would balance on the tip of a needle. However, for non-convex polygons, the centroid might not be located in the interior of the polygon.

For a general planar figure, the mathematical formula for computing the centroid requires computing an integral. However, for polygons there are some well-known equations that involve only the locations of the vertices (provided that the vertices are enumerated in a consistent clockwise or counter-clockwise manner). The Wikipedia article about the centroid give a useful computational formula. It turns out that the (signed) area of a simple polygon with vertices (x₀, y₀), (x₁, y₁), ..., (x_N-1, y_N-1), is given by

In the formula, the vertex (x_N, y_N) is identified with the first vertex, (x₀, y₀). The centroid of the polygon occurs at the point (C_x, C_y), where

A polygon whose vertices are enumerated in a counter-clockwise manner will have a positive value for A. The formula for A will be negative when vertices are enumerated in a clockwise manner.

In the SAS language, arrays use 1-based indexing by convention. Therefore, when implementing the formula in SAS, it is customary for the vertices to be enumerated from 1 to N. The limits in the summation formulas are modified similarly. The (N+1)st vertex is identified with the first vertex.

Centroids are sometimes used as a position to label a country or state on a map. For SAS customers who have a license for SAS/GRAPH, the official SAS %CENTROID macro computes the centroid of a polygonal region. It is designed for use with map data sets, which have variables named ID, X, and Y. My colleague Sanjay Matange provides a similar macro in his blog post "A Macro for Polygon Area and Center."

A vectorized SAS/IML function that computes a centroid

Implementing the centroid formulas in SAS/IML is a good exercise in vectorization. Recall that function is vectorized if it avoids unnecessary loops and instead uses vector and matrix operations to perform the computations. For the area and centroid formulas, you can form a vector that contains all of the values that are to be summed. You can use the elementwise multiplication operator (#) to compute the elementwise product of two vectors. The SUM function adds up all elements in a vector.

The following function computes the centroid of a single polygon in a vectorized manner. The first few statements build the vectors from the values of the vertices; the last four statements compute the area and centroid by using vectorized operations. There are no loops.

proc iml;
/* _PolyCentroid: Return the centroid of a simple polygon.
   P  is an N x 2 matrix of (x,y) values for consectutive vertices. N > 2. */
start _PolyCentroid(P);
   lagIdx = 2:nrow(P) || 1;              /* vertices 2, 3, ..., N, 1 */
   xi   = P[ ,1];       yi = P[ ,2];     /* x_i and y_i for i=1..N   */
   xip1 = xi[lagIdx]; yip1 = yi[lagIdx]; /* x_{i+1} and y_{i+1}, x_{N+1}=x_1 */
   d = xi#yip1 - xip1#yi;                /* vector of difference of products */
   A = 0.5 * sum(d);                     /* signed area of polygon */
   cx = sum((xi + xip1) # d) / (6*A);    /* X coord of centroid */
   cy = sum((yi + yip1) # d) / (6*A);    /* Y coord of centroid */
   return( cx || cy );
finish;
 
/* test the function by using an L-shaped polygon */
L = {0 0, 6 0, 6 1, 2 1, 2 3, 0 3};
centroid = _PolyCentroid(L);
print centroid;

For this L-shaped polygon, the centroid is located outside the polygon. This example shows that the centroid might not be the best position for a label for a polygon. Examples of this phenomenon are easy to find on real maps. For example, the states of Louisiana and Florida have centroids that are near the boundaries of the states.

Compute the centroid of many polygons

The %CENTROID macro can compute the centroids for multiple polygons, assuming that there is an identifying categorical variable (called the ID variable) whose unique values distinguish one polygon from another.

If you want to compute the centroids of multiple polygons in SAS/IML, you can use the UNIQUEBY function to extract the rows that correspond to each polygon. For each polygon, the previous _PolyCentroid function is called, as follows:

/* If the polygon P has two columns, return the centroid. If it has three 
   columns, assume that the third column is an ID variable that identifies
   distinct polygons. Return the centroids of the multiple polygons. */
start PolyCentroid(P);
   if ncol(P)=2 then  return( _PolyCentroid(P) );
   ID = P[,3];
   u = uniqueby(ID);         /* starting index for each group */
   result = j(nrow(u), 2);   /* allocate vector to hold results */
   u = u // (nrow(ID)+1);    /* append (N+1) to end of indices */
   do i = 1 to nrow(u)-1;    /* for each group... */
      idx = u[i]:(u[i+1]-1); /* get rows in group */
      result[i,] = _PolyCentroid( P[idx, 1:2] ); 
   end;
   return( result );
finish;
 
/* test it on an example */
rect = {0 0,  2 0,  2 1,  0 1};
ID = j(nrow(L), 1, 1) // j(nrow(rect), 1, 2);
P = (L // rect) || ID;
centroids = PolyCentroid(P);
print centroids;

In summary, you can construct an efficient function in SAS/IML to compute the centroid of a simple polygon. The construction shows how to vectorize a computation by putting the vertices in vectors, rather than the less efficient alternative, which is to iterate over the vertices in a loop. Because polygons are often used to display geographical regions, I also included a function that iterates over polygons and computes the centroid of each.

10 Comments

Sanjay Matange on January 13, 2016 9:27 am

It is useful to note that the %CENTROID macro does NOT compute the real centroid of a polygon, but the "best location" for labeling a map polygon. This is clarified in my article referenced in this article.

- Rick Wicklin on January 13, 2016 9:32 am
  
  My mistake. Thanks for the correction.
  
Tom on January 15, 2016 11:11 am

Very interesting. Do you know how much it would cost to add SAS/IML to SAS/STAT package?

- Rick Wicklin on January 15, 2016 11:44 am
  
  No, I don't have that information, but you can go to the "How to Buy" page on the sas.com website, or contact your SAS representative directly.
  
Pingback: Create a package in SAS/IML - The DO Loop
Kishore Kumar on March 22, 2017 9:59 am

Hi ,
I have a small query related to the output of %centroid macro. I need to know the units of the X-Y output from %centroid. Actually I want latitude & longitude of the centroid. For those reasons, I need to convert X-Y coordinates to polar coordinates. Do you have any idea how can I convert output form %centroid macro to polar coordinates?

- Rick Wicklin on March 22, 2017 10:14 am
  
  Although I mentioned the %CENTROID macro, you will notice that I do not use it in this article. Since it is distributed as part of SAS/GRAPH software, I suggest you ask your question at the SAS Support Community for Graphics. My guess is that the macro returns projected coordinates.

Steven Chipps on December 31, 2021 6:40 pm

Hi Rick,

In the first function for computing a centroid, you mention that "..... the last four statements compute the area and centroid by using vectorized operations. There are no loops". After running the function, I get output (coordinates) for the centroid of the L-shaped polygon, but was wondering how to get output for the area (A) of the polygon? I'm sure I am missing something simple here.... Thanks for your help.

Rick Wicklin on January 1, 2022 7:54 am

The area is computed as a local variable inside the function. You could return it as part of the output of _Centroid, but it probably makes more sense to create a new function:

/* Area: Return the area of a simple polygon.
   P  is an N x 2 matrix of (x,y) values for consectutive vertices. N > 2. */
start Area(P);
   lagIdx = 2:nrow(P) || 1;              /* vertices 2, 3, ..., N, 1 */
   xi   = P[ ,1];       yi = P[ ,2];     /* x_i and y_i for i=1..N   */
   xip1 = xi[lagIdx]; yip1 = yi[lagIdx]; /* x_{i+1} and y_{i+1}, x_{N+1}=x_1 */
   d = xi#yip1 - xip1#yi;                /* vector of difference of products */
   A = 0.5 * sum(d);                     /* signed area of polygon */
   return( A );
finish;
 
Area = Area(L);
print Area;

Steven Chipps on February 16, 2022 4:48 pm

Hi Rick,

Forgot to thank you for the solution. It works great -- and I'm off and running (for better or worse).

thanks again for your help.

Blogs

Blogs

Compute the centroid of a polygon in SAS

A vectorized SAS/IML function that computes a centroid

Compute the centroid of many polygons

About Author

10 Comments

Leave A Reply Cancel Reply