This post is the third and final in a series that illustrates three different solutions to "flattening" hierarchical data. Don't forget to catch up with Part 1 and Part 2.
Solution 2, from my previous post, created one observation per header record, with detail data in a wide format, like this:
Detail Approach: One observation per header record Obs Family Employee Spouse Child1 Child2 Child3 1 Jones Bob Carol Sally Alice 2 Sanchez Mary 3 Smith Nancy Harold |
Today's Solution 3, unlike Solution 2, has no arbitrary limit to the number of detail items, because it stores the detail data in a tall, rather than wide, format, as shown below, with one observation per detail record, rather than one observation per header record.
Detail Approach: One observation per detail record Obs Family Person Type 1 Jones Bob Employee 2 Jones Carol Spouse 3 Jones Sally Child 4 Jones Alice Child 5 Sanchez Mary Employee 6 Smith Nancy Employee 7 Smith Harold Spouse |
The solution 3 program below is the simplest yet!
data tall; retain Family; infile 'c:\temp\families.txt'; input Type: $1. Person $; if type='F' then Family=Person; else output; * Detail records only; run; |
The RETAIN statement holds the value of the FAMILY variable in memory.
The IF/THEN statement overwrites the retained value of the FAMILY variable with the current value of the PERSON variable when TYPE is "F".
When TYPE is not "F", the ELSE statement outputs the detail record.
That was too easy! OK, pop quiz!
How would you take the "tall" dataset above, and transpose it back to "wide" format?
The solution below is similar to the Solution 2 program from my previous post, except simpler, because we're reading a SAS dataset, rather than a text file, so we can exploit the BY statement and FIRST. / LAST. variables, like this:
data wide; keep Family Employee Spouse Child1-Child3; retain Family Employee Spouse Child1-Child3; length Family Employee Spouse Child1-Child3 $ 8; array C{3} $ 8 Child1-Child3; set tall; by family notsorted; if first.family then do; Employee=Person; N=0; * Number of children; call missing(of Spouse Child1-Child3); end; else if type='S' then Spouse=Person; else if type='C' then do; * Child record; N+1; * Count children; C{N}=Person; * Assign each child into array; end; if last.family; run; |
The NOTSORTED option on the BY statement avoids the requirement for sorted data, as long as the families are grouped.
Last question, then we'll end this series. How would you take the "wide" dataset above, and transpose it back to "tall" format?
The solution below generates at least one output observation from every input observation, and up to four additional output observations, depending on family size:
data tall; keep Family Person Type; array C{3} $ 8 Child1-Child3; set wide; Type='E'; Person=Employee; output; if Spouse ne ' ' then do; Type='S'; Person=Spouse; output; end; do N=1 to 3; Type='C'; Person=C{N}; if person ne ' ' then output; else leave; end; run; |
The statement below
else leave; |
exits the DO loop if there are fewer than 3 children.