Ah! The joys of sets!

It is easy to test whether two vectors are equal in SAS/IML software. It is only slightly more challenging to test whether two sets are equal.

Recall that A and B are equal as sets if they contain the same elements. Order does not matter. For example, the set {1,2,3} is equal to the set {3,1,2}. Furthermore, listing an element more than once does not change the set. For example, {3,2,3,1,1} is also equal to {1,2,3}.

The SAS/IML language supports the following set operations:

1. Union: The UNION function computes the union of sets.
2. Intersection: The XSECT function computes the intersection of sets.
3. Difference: The SETDIF function computes the difference between two sets.
4. Subset: The ELEMENT function returns an indicator variable that specifies which elements of one vector are contained in another.

Furthermore, the UNIQUE function returns the unique ordered elements of a vector or matrix, which is a way of representing a set in a standard form.
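
As a quick illustration, the following sketch applies each of these functions to two small vectors. The vectors A and B are made up for this example, and the comments show the values that each call should produce.

```sas
proc iml;
A = {1 2 3};                 /* example data (not from the original post) */
B = {3 4};
u = union(A, B);             /* {1 2 3 4} */
x = xsect(A, B);             /* {3} */
d = setdif(A, B);            /* {1 2}: elements of A that are not in B */
e = element(A, B);           /* {0 0 1}: only the third element of A is in B */
s = unique({3 2 3 1 1});     /* {1 2 3}: sorted, duplicates removed */
print u, x, d, e, s;
quit;
```

Notice that UNION, XSECT, SETDIF, and UNIQUE return sorted row vectors, whereas ELEMENT returns a 0/1 matrix that has the same shape as its first argument.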

You can use any of these functions to test for the equality of sets. However, you need to call the functions twice because you need to test that A⊆B and B⊆A in order to conclude that A = B. I had several useful conversations with Ian Wakeling about the most efficient way to test sets for equality. (Thanks, Ian!) Initially, I thought that using SETDIF twice was the simplest technique: you test whether SETDIF(A,B) and SETDIF(B,A) are both empty. However, after more thought, here's the technique that I like the best:

```
proc iml;
start SetEq(A,B);
   u1 = unique(A);            /* unique elements in A */
   u2 = unique(B);            /* unique elements in B */
   if ncol(u1) ^= ncol(u2) then
      return(0);              /* number of elements differ */
   return( all(u1=u2) );      /* unique elements of A = unique elements of B */
finish;
```

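The function compares the unique elements in the two sets. Because the UNIQUE function returns the elements in sorted order, two equal sets produce identical vectors. The function checks the number of unique elements first because the elementwise comparison u1=u2 requires vectors of the same size. The function returns 1 if the sets are equal, and 0 otherwise.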

You can use the following examples to test the SetEq function:

```
A = {1 2 3};
B = {3 1 2};
C = {3 2 3 1 1};
D = {4 1 2};
AeqB = SetEq(A,B);
AeqC = SetEq(A,C);
AeqD = SetEq(A,D);
print AeqB AeqC AeqD;
```

The output (AeqB=1, AeqC=1, and AeqD=0) shows that the sets A, B, and C are equal, but the set D is not equal to the set A.
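
For comparison, here is a rough sketch of two of the other approaches mentioned earlier: testing that both set differences are empty, and testing that each set is contained in the other. The module names SetEqDif and SetEqElem are hypothetical names chosen for this illustration. The sketch assumes a SAS/IML release that provides the IsEmpty function; testing whether ncol(setdif(A,B)) equals 0 is an equivalent check.

```sas
proc iml;
/* Hypothetical alternative 1: A = B when both set differences are empty */
start SetEqDif(A, B);
   return( IsEmpty(setdif(A, B)) & IsEmpty(setdif(B, A)) );
finish;

/* Hypothetical alternative 2: every element of A is in B, and vice versa */
start SetEqElem(A, B);
   return( all(element(A, B)) & all(element(B, A)) );
finish;

A = {1 2 3};
C = {3 2 3 1 1};
D = {4 1 2};
dif_AC  = SetEqDif(A, C);     /* 1: same elements */
dif_AD  = SetEqDif(A, D);     /* 0: 3 is not in D, and 4 is not in A */
elem_AC = SetEqElem(A, C);    /* 1 */
elem_AD = SetEqElem(A, D);    /* 0 */
print dif_AC dif_AD elem_AC elem_AD;
quit;
```

Both alternatives call a set function twice, once in each direction, which is the A⊆B and B⊆A test described above. The SetEq module instead puts each set into standard form once and compares the results.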

The UNIQUE function is one of my favorite SAS/IML functions. I've blogged about it many times, including using it to test whether a sequence is increasing. And, of course, the UNIQUE function is half of the UNIQUE-LOC technique for analyzing groups in the SAS/IML language.

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.