This article is by guest contributor Lelia McConnell, SAS Tech Support.
Several users have called recently to ask the question, “Can I reorder the legend entries on the bar chart that I created with PROC SPLOT?”
Although there is no option that does this directly in PROC SGPLOT, the answer to this question is “YES, you can define the order of your legend entries.”
In this post, I present an example that illustrates the syntax that you would use to define the order of your legend entries. For this example, we begin by sub setting the data set SASHELP.CARS to include only observations in which TYPE is not equal to HYBRID. To do this, we use the WHERE option in the SET statement, thus creating the data set CARS.
By bringing the CARS data set into PROC SGPLOT, I can create a vertical bar chart of the values of ORIGIN, where the height of the bars is based on the mean values of the variable MPG_CITY and the group variable is TYPE.
data cars; set sashelp.cars(where=(type ne 'Hybrid')); run; proc sgplot data=cars; vbar origin / response=mpg_city group=type groupdisplay=cluster stat=mean ; xaxis display=(nolabel); title 'Mileage by Origin and Type'; run; |
Instead of the default order of SUV, Sedan, Sports, Truck, and Wagon, I want the order of the legend entries to be Wagon, Sports, SUV, Truck, and Sedan. To make this change, I create a numeric variable that contains the values 1-5, based on the order in which I want my vehicle types to be displayed.
I need to create a format in order to display the values of TYPE in the legend instead of 1-5. The most efficient way to do this is to create a control data set that I can use with PROC FORMAT. Since I need only one observation for each value of MYTYPE in this data set, I will sort the data by MYTYPE so that I can use the FIRST logic in the DATA step that follows. In the DATA step, I need to create the columns FMTNAME, START, and LABEL. These are used to define the format name, original value, and format values, respectively. When you create a format, the automatic variable TYPE defines the variable as numeric or character, so we need to include the statement DROP TYPE to remove the variable TYPE from the control data set.
The CNTLIN option in the PROC FORMAT statement specifies the SAS data set from which PROC FORMAT builds the format.
Now I can resubmit my original PROC SGPLOT code along with the FORMAT statement to create the legend in the correct order and the KEYLEGEND statement with the TITLE option in order to keep the original title in my legend.
data newcars; set cars; if type='Wagon' then mytype=1; else if type='Sports' then mytype=2; else if type='SUV' then mytype=3; else if type='Truck' then mytype=4; else if type='Sedan' then mytype=5; run; proc sort data=newcars out=sortcars; by mytype; run; data myfmt; set sortcars; by mytype; if first.mytype then do; fmtname='typefmt'; start=mytype; label=type; drop type; output; end; run; proc format cntlin=myfmt; run; proc sgplot data=newcars; vbar origin / response=mpg_city group=mytype groupdisplay=cluster stat=mean; xaxis display=(nolabel); keylegend /title='Type'; format mytype typefmt.; title 'Mileage by Origin and Type'; run; |
Full SAS 9.3 SGPLOT code: Legend_93
7 Comments
How about using PROC SQL to build the CNTLIN table for PROC FORMAT?
proc sql noprint;
create table myfmt as
select distinct
'typefmt' as fmtname,
case type
when 'Wagon' then 1
when 'Sports' then 2
when 'SUV' then 3
when 'Truck' then 4
when 'Sedan' then 5
end as start,
type as Label
from cars
order by 2,3;
quit;
I understand the value of coding in a general way rather than hard-coding, but this example is based on an arbitrary ordering of five categories, and therefore then method of generation of the format is overkill. I would just make what is going on perfectly clear by using:
proc format;
value typefmt ="Wagon" 2="Sports" 3="SUV" 4="Truck" 5="Sedan";
run;
Howard and Alan have both shown alternative methods for building the format. The example in the code as well as the sample format methods they have shown will all produce the same results. That's one of the cool things about SAS, having more than one way to accomplish the same thing and being able to select the method you prefer.
Could you have used an index, rather than sorting the format dataset?
Yes, you can use an index instead of sorting the data set.
Any chance we (eventually) get an option like PRELOADFMT in proc means/etc. (most CLASS enabled procs) that allows to use the order from a NOTSORTED format, so we don't have to create a separate variable? IE, it would allow the formatted value to be used, but in the order of the format buckets - not sorted afterwards, so still ORDER=DATA.
If the first vbar occurence has a missing group value, legend order seems to be modified and therefore, not reflect the format.
e.g. if you add "if origin = 'Asia' and type = 'Sports' then delete;" in 1st data step to subset data, legend will be displayed as : Wagon > SUV > truck > Sedan > Sports instead of Wagon > Sports > SUV > Truck > Sedan.
Is there any way to avoid this ?