Reading Hierarchical Data

This post is the third and final in a series that illustrates three different solutions to "flattening" hierarchical data. Don't forget to catch up with Part 1 and Part 2.

Solution 2, from my previous post, created one observation per header record, with detail data in a wide format, like this:

Detail Approach: One observation per header record
 
Obs    Family     Employee    Spouse    Child1    Child2    Child3
 
1     Jones       Bob        Carol     Sally     Alice
 
2     Sanchez     Mary
 
3     Smith       Nancy      Harold

Today's Solution 3, unlike Solution 2, has no arbitrary limit to the number of detail items, because it stores the detail data in a tall, rather than wide, format, as shown below, with one observation per detail record, rather than one observation per header record.

Detail Approach: One observation per detail record
 
Obs    Family     Person    Type
 
 
1     Jones      Bob       Employee
 
2     Jones      Carol     Spouse
 
3     Jones      Sally     Child
 
4     Jones      Alice     Child
 
5     Sanchez    Mary      Employee
 
6     Smith      Nancy     Employee
 
7     Smith      Harold    Spouse

The solution 3 program below is the simplest yet!

data tall;
   retain Family;
   infile 'c:\temp\families.txt';
   input Type: $1. Person $;
   if type='F' 
      then Family=Person;
      else output;	* Detail records only;
run;

The RETAIN statement holds the value of the FAMILY variable in memory.

The IF/THEN statement overwrites the retained value of the FAMILY variable with the current value of the PERSON variable when TYPE is "F".

When TYPE is not "F", the ELSE statement outputs the detail record.

That was too easy! OK, pop quiz!

How would you take the "tall" dataset above, and transpose it back to "wide" format?

The solution below is similar to the Solution 2 program from my previous post, except simpler, because we're reading a SAS dataset, rather than a text file, so we can exploit the BY statement and FIRST. / LAST. variables, like this:

data wide;
 
	keep   Family Employee Spouse Child1-Child3;
	retain Family Employee Spouse Child1-Child3;
        length Family Employee Spouse Child1-Child3 $ 8;
 
	array  C{3} $ 8 Child1-Child3;
 
	set tall;
	by family notsorted;
 
	if first.family then do;
		Employee=Person;	
		N=0;			* Number of children;
		call missing(of Spouse Child1-Child3);
	end;
 
	else if type='S' then Spouse=Person;
 
	else if type='C' then do; * Child record;
		N+1;		  * Count children;
		C{N}=Person;	  * Assign each child into array;
	end;
 
	if last.family;
 
run;

The NOTSORTED option on the BY statement avoids the requirement for sorted data, as long as the families are grouped.

Last question, then we'll end this series. How would you take the "wide" dataset above, and transpose it back to "tall" format?

The solution below generates at least one output observation from every input observation, and up to four additional output observations, depending on family size:

data tall;
 
	keep Family Person Type;
	array  C{3} $ 8 Child1-Child3;
 
	set wide;
 
	Type='E';
	Person=Employee;
	output;
 
	if Spouse ne ' ' then do;
	    Type='S';
	    Person=Spouse;
	    output;
	end;
 
	do N=1 to 3;
	    Type='C';
	    Person=C{N};
	    if person ne ' ' 
               then output;
               else leave;
	end;
 
run;