UpSet Plot using GTL

An UpSet plot is used to visualize intersections of sets. In this post, we will illustrate techniques to create this plot using the Graph Template Language (GTL). We assume that you are familiar with GTL.

From a construction standpoint, we leverage the LATTICE layout available in GTL, which lets us create a panel of graphs. We break down the layout into the following pieces:

  1. The main barchart at the top that represents the intersection sizes.
  2. The horizontal barchart at the bottom that represents the univariate sizes.
  3. The bottom matrix that represents the composition of intersection.

About the data: We have used imaginary data for each of the pieces described above and then combined them into a final dataset that can be used for plotting. This includes:

  • Summarized data for both barcharts described in the first and second pieces above.
  • For the third piece, we need to set up the data to plot the matrix layout. To do this, we need data about the composition of each intersection. Each observation in this dataset indicates whether a covariate or category is present in a given set. Assuming the covariates are binary, we can assign the values '1' and '0', which map to their respective levels, for example, 'Present' and 'Absent'. We then process this dataset further to add dummy group variables that control the display of the connect lines.
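The data preparation described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the post: the dataset names (bar1, bar2, composition, matrix, upset), the covariates A, B and C, the intersection labels I1 to I3 and all counts are hypothetical. Only the plotting variable names (xlabel1, count1, xlabel2, count2, xlabel3, ylabel3, value, group) follow the template code shown later. Stacking the pieces with a SET statement is one assumed way of combining them, and the dummy JOIN variable for the connect lines is not derived here.

```sas
/* Hypothetical summarized data for the vertical bars (intersection sizes) */
data bar1;
  length xlabel1 $ 8 group $ 12;
  input xlabel1 $ group $ count1;
  datalines;
I1 Treatment 12
I1 Placebo 9
I2 Treatment 7
I2 Placebo 5
I3 Treatment 3
I3 Placebo 4
;
run;

/* Hypothetical summarized data for the horizontal bars (univariate sizes) */
data bar2;
  length xlabel2 $ 8;
  input xlabel2 $ count2;
  datalines;
A 40
B 19
C 7
;
run;

/* Hypothetical composition data: one row per intersection, one 0/1 flag per covariate */
data composition;
  length xlabel3 $ 8;
  input xlabel3 $ A B C;
  datalines;
I1 1 0 0
I2 1 1 0
I3 1 1 1
;
run;

/* Reshape to one row per (intersection, covariate) cell of the matrix */
data matrix;
  set composition;
  length ylabel3 $ 8;
  array flags{*} A B C;
  do i = 1 to dim(flags);
    ylabel3 = vname(flags{i});       /* covariate name becomes the row label */
    value   = put(flags{i}, 1.);     /* character '1' or '0', matching the attribute map values */
    output;
  end;
  keep xlabel3 ylabel3 value;
run;

/* Assumed way of combining the pieces: stack them so each plot reads its own columns */
data upset;
  set bar1 bar2 matrix;
run;
```

Because each piece uses its own variable names, the stacked dataset carries missing values in the columns that belong to the other pieces, so each plot statement only draws the observations meant for it.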

Let us now look at the template code for the plot.

As a first step, we define a discrete attribute map that controls the visual attributes of the graph. In this map, we specify the marker colors, line thickness and fill colors in the VALUE statements within the DiscreteAttrMap block. The DiscreteAttrVar statements associate the attribute map with the appropriate dataset variables. Once the map is defined, it can be used by one or more plots within the template. Below is the block of code for the discrete attribute map.

DiscreteAttrVar attrvar=MYID_VALUE var=VALUE attrmap="__ATTRMAP__MYID";
DiscreteAttrVar attrvar=MYID_JOIN var=JOIN attrmap="__ATTRMAP__MYID";
DiscreteAttrVar attrvar=MYID_GROUP var=GROUP attrmap="__ATTRMAP__MYID";
 
DiscreteAttrMap name="__ATTRMAP__MYID" /;
  Value "0" / markerattrs=( color=CXD3D3D3) lineattrs=( thickness=0);
  Value "1" / markerattrs=( color=CX000000) lineattrs=( thickness=2);
  Value "Treatment" / fillattrs=( color=BIBG) ;
  Value "Placebo" / fillattrs=( color=BIGB) ;
EndDiscreteAttrMap;

We then define a 2x2 panel of graphs within the LATTICE layout. Keeping the first cell empty, we use the second cell to plot the vertical bar chart. We can use appropriate options in the nested OVERLAY layout statement to control the display of the border, wall and axes of the chart. In addition to the layout and axis options, the GROUP option on the BARCHART statement is used to create stacked bars for the different groups. For the GROUP option, we use the appropriate discrete attribute variable that we defined earlier.

layout lattice / rows=2 columns=2 rowweights=(0.7 0.3) columnweights=(0.2 0.8);
cell;
layout overlay / border=false walldisplay=none xaxisopts=(display=none) yaxisopts=(label="Intersection Size");
  barchart X='xlabel1'n Y='count1'n / display=(fill) barlabel=true
    Group=MYID_GROUP name="BAR1" groupdisplay=stack includemissinggroup=False;
  discretelegend "BAR1" / border=False location=inside autoalign=(topright);
endlayout;
endcell;

The next cell block can be used to plot the horizontal barchart with suitable options.

cell;
layout overlay / border=false walldisplay=none xaxisopts=(reverse=True label="Total") y2axisopts=(display=none);
  barchart X='xlabel2'n Y='count2'n / display=(fill) displaybaseline=off fillattrs=(color=orange)
  name="BAR2" orient=horizontal yaxis=y2;
endlayout;
endcell;

The final cell in the LATTICE layout is used to create the matrix panel. A scatter plot is used to plot the dots representing the covariates. The markers of the scatter plot are color coded by the value of the covariate ('1' or '0'), as specified in the attribute map. The connect lines are created by overlaying a series plot on the scatter plot. The discrete attribute variables created earlier are used in the GROUP option of both plots to get the desired visual attributes.

cell;
layout overlay / border=false walldisplay=none 
    yaxisopts=(display=(tickvalues) discreteopts=(colorbands=odd colorbandsattrs=(color=lightgray transparency=0.6))) 
    xaxisopts=(display=none );
  scatterplot X='xlabel3'n Y='ylabel3'n / subpixel=off primary=true 
    group=MYID_VALUE Markerattrs=( symbol=circlefilled size=12) 
    legendLabel="ylabel3" 
    name="SCATTER";
  seriesplot X='xlabel3'n Y='ylabel3'n / group=MYID_JOIN 
    lineattrs=( Color=CX000000) legendLabel="ylabel3" 
    name="SERIES";
endlayout;
endcell;
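For orientation, the pieces above would typically sit inside a template definition along the lines of the skeleton below. This is only a sketch: the template name upset and the dataset name upset are hypothetical, and the comments mark where the statements from this post would go. The cells are assumed to fill the lattice in its default row-major order, so the empty cell and the vertical barchart occupy the top row, while the horizontal barchart and the matrix occupy the bottom row.

```sas
proc template;
  define statgraph upset;            /* hypothetical template name */
    begingraph;
      /* DiscreteAttrVar and DiscreteAttrMap statements go here */
      layout lattice / rows=2 columns=2
                       rowweights=(0.7 0.3) columnweights=(0.2 0.8);
        /* cell 1: left empty                                    */
        /* cell 2: vertical barchart of intersection sizes       */
        /* cell 3: horizontal barchart of univariate sizes       */
        /* cell 4: scatter plot and series plot for the matrix   */
      endlayout;
    endgraph;
  end;
run;

/* Render the template against the combined dataset (hypothetical name) */
proc sgrender data=upset template=upset;
run;
```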

You can check out the complete code here.

About Author

Debpriya Sarkar

Senior Software Specialist

Debpriya Sarkar has been a SAS user for more than 14 years. He works in the area of ODS Graphics and is interested in data visualization and statistics.

2 Comments

    • Debpriya Sarkar

      Cell blocks make it easier to see the cell boundary in the code. It may add some length to the code but it may also help to keep it organized. Besides, users can also add cellheaders if needed (although I don’t have any in the example).