Over at the SAS/IML Discussion Forum, someone posted an interesting question about how to create a special matrix that contains all combinations of zeros and ones for a given size.
Specifically, the problem is as follows. Given an integer n ≥ 1, produce a matrix with 2n rows and n columns that contains all combinations of zeros and ones of length n. For example, the following matrix is a solution for n = 3:
I can think of three ways to accomplish this task. The first algorithm builds the matrix by columns; the other algorithms build the matrix by rows and, although they are less efficient, they have the advantage of being suitable for a DATA step implementation.
EDIT: As of SAS/IML 12.3, you can use the EXPANDGRID function to create a binary matrix that contains all binary combinations, For example, x = ExpandGrid(0:1, 0:1, 0:1);.
Building the Matrix by Columns
If you study the example matrix, you will discover a pattern to the columns:
- For the first column, the first half of the entries contains zeros and the second half contains ones.
- For the second column, one-fourth of the entries are zeros, then one-fourth ones, and then the pattern repeats.
- For the third column, one-eighth of the entries are zeros, and then one-eighth ones, and then the pattern repeats until all rows are filled.
This suggests an algorithm: for each column, build up a vector that consists of a number of zeros followed by the same number of ones, and then repeat this pattern until all rows are filled. The length of the pattern depends on the column number. The following module implements this algorithm:
proc iml; start GetAllZeroOneComb(n); rows = 2##n; /** construct one column at a time **/ x = j(rows, n); do j = 1 to n; PatLength = 2##(n-j); /** length of each pattern **/ PatRepl = 2##(j-1); /** number of patterns **/ pattern = j(PatLength,1,0) // j(PatLength,1,1); x[,j] = repeat(pattern, PatRepl); end; return( x ); finish; x = GetAllZeroOneComb(3); |
You can use the GetAllZeroOneComb module to generate matrices of any size. However, because the number of rows grows as 2n, the algorithm will run out of memory for moderate-sized values of n.
Building the Matrix by Rows
If you study the example matrix, you will also discover a pattern to the rows: the ith row is the binary representation of the integer i – 1. Therefore an algorithm is to generate the binary representations of the sequence 0:2##n-1.
I could write a SAS/IML module that takes a number and returns a row vector that contains the binary representation of the number, but instead I'm going to leverage the power of SAS formats. SAS has a BINARYw. format which converts a number into a string that contains the binary representation of the number. (You can use the PUTN function to apply a SAS format to elements of a SAS/IML matrix.)
You can then use the SUBSTR function to pick off each character in the binary sequence and the seldom-used NUM function to convert the character into a numeric zero or one. The following module implements this algorithm:
start GetAllZeroOneCombByRow(n); rows = 2##n; /** construct one row at a time **/ x = j(rows, n); format = "binary" + strip(char(n)) + "."; do i = 1 to rows; s = putn(i-1, format); do j = 1 to n; x[i,j] = num(substr(s,j,1)); end; end; return( x ); finish; |
I don't like this algorithm for SAS/IML programming because there are no vectors in sight. This is an algorithm that plods through the matrix cell-by-cell and fills in the appropriate zero or one. It is better suited for the DATA step than for SAS/IML. In fact, the DATA step implementation is almost identical, except would use an array of variables instead of columns of a matrix.
Building the Matrix by Using All Combinations
If the order of rows in the matrix is not important, then you can implement another row-wise algorithm. I won't implement it here, but you can use the ALLCOMB function to look at all combinations of n elements taken k at a time, and use those combinations as locations for the ones in the matrix. For example, one of the "3 choose 1" combinations is (2), and this combination corresponds to the row {0 1 0} which has a one in the second location. One of the "3 choose 2" combinations is (1, 3), and this combination corresponds to the row {1 0 1}.
Perhaps you can think of another algorithm? If so, describe it in the comments or on the original thread.
14 Comments
When I wrote the section on the BINARYw. format, I kept thinking, "this seems familiar." Turns out that I have used the NUM(SUBSTR(...)) trick before: http://blogs.sas.com/content/iml/2010/12/07/placing-the-digits-of-a-number-into-a-vector/
no algorithms input; just use Cartesian product to get all the 2*2*2 combinations in SQL:
---
/*just create three data sets (all hold 0,1)*/
data a;
input x@@;
datalines;
0 1
;
data b;
input y@@;
datalines;
0 1
;
data c;
input z@@;
datalines;
0 1
;
/*use Cartesian product to get all the 2*2*2 combinations*/
proc sql;
create table all as
select *
from a,b,c
order by x,y,z
;
quit;
proc print nobs;run;
That's an interesting approach, and one that I had not considered.
Your method DOES use an algorithm, but that the algorithm is performed automatically by the SELECT/FROM/ORDER statements in PROC SQL.
I suppose that you could create a macro function that generalizes your method and creates a matrix with n columns and 2n rows, but I would personally find that task difficult to implement because I am not highly skilled at macro programming.
The following is the more generalized version to get 2^n combinations, which heavily takes advantage of SQL. It is just a straightforward approach, and not as elegant as the vector operations.
data base;
input num;
datalines;
0
1
;
%macro comb(n);
%do i=1 %to &n;
data __d&i;
set base;
rename num=__d&i;
run;
%end;
proc sql;
select memname into:d separated by ','
from dictionary.tables
where upcase(libname)="WORK" and upcase(substr(memname,1,3))="__D"
;
quit;
proc sql;
create table D&n as
select *
from &d
order by &d
;
quit;
proc print data=D&n noobs;
run;
/**
proc datasets lib=work nolist;
delete __d:;
quit;
**/
%mend comb;
%comb(3)
%comb(4)
Thank you Jiangtang, this has been very helpful!
What exactly is this part of the code doing?
proc sql;
select memname into:d separated by ','
from dictionary.tables
where upcase(libname)="WORK" and upcase(substr(memname,1,3))="__D"
;
quit;
It creates a macro variable (&d) that contains a comma-separated list of values. See the article, "Create a SAS macro variable that contains a list of values."
I wanted to create a random matrix of zeros and ones. I wrote the following:
Can someone please tell me how to translate this code to Rstudio? I am trying to fill a matrix of 12(columns) x 144(rows). With all possible combinations of ones and zeros. I would really appreciate any guidance.
A 12-digit binary sequence requires 2^12 = 4096 rows. There are dozens of ways to do this in SAS and R. The simplest is probably
Thank you very much for the quick reply, it works very well!
Pingback: Visualize the Cantor function in SAS - The DO Loop
I have a matrix of 5(rows)*32(columns) which is initialized by all ones. With subsequent iterations i want to change few states of matrix from 1 to 0. I want to generate all the possible states of 1s and 0s a 5 by 32 matrix can have. Please if somebody can help me that will be grateful. Thank you.
Post your question to the SAS/IML Support Community.
Pingback: Create patterns of missing data - The DO Loop