The power of a statistical test measures the test's ability to detect a specific alternative hypothesis. For example, educational researchers might want to compare the mean scores of boys and girls on a standardized test. They plan to use the well-known two-sample t test. The null hypothesis is that the means of the two groups are equal. Will the t test be able to detect a difference in the two group means (if it exists) by rejecting the null hypothesis?
The ability to detect differences in the group means depends on the sample sizes in the study, but it is safe to say that the t test is unlikely to detect small differences in the students' mean scores, and it is more likely to detect larger differences.
In general, the power of a test is the probability that the test will reject the null hypothesis when a specified alternative is true. For the t test, power is the probability that the test detects a mean difference of a specified magnitude. This can be a difficult computation because it requires knowing the sampling distribution of the test statistic under the alternative hypothesis. For simple tests (such as the two-sample t test), the sampling distribution is known, but for more complicated statistical tests the power might be computable only by using simulation methods.
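In symbols, the definition above reads as follows. The notation here is introduced for illustration and is not from the original: β denotes the Type II error probability, that is, the probability of failing to reject the null hypothesis when the alternative is true.

```latex
\operatorname{Power} = \Pr\left(\text{reject } H_0 \,\middle|\, H_1 \text{ is true}\right) = 1 - \beta
```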
This article describes how to use simulation to estimate the power of the t test. For this simple test, we can check the simulation against the exact answer, as provided by the POWER procedure in SAS. However, the simulation illustrates general ideas that you can use to estimate the power of more complicated statistical tests. This article is based on material in Chapter 5 of Simulating Data with SAS.
Formulation of the problem
In this simulation, I assume that each population is normally distributed. For simplicity, assume the population for Group 1 is N(0, 1) and the population for Group 2 is N(δ, 1), where δ > 0 is the difference between the population means.
The null hypothesis for the t test is that δ = 0. Given two samples, the t test will either reject the null hypothesis at the α = 0.05 significance level or it won't. In the simulation approach, you simulate many samples, and estimate the probability of rejecting the null hypothesis by using the empirical proportion of simulated samples that rejected the null hypothesis. (This uses the law of large numbers.)
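Written out, the simulation estimate is a sample proportion. In the sketch below, N is the number of simulated trials and p_k is the p-value of the t test on trial k (both symbols are assumed notation, not from the original). The law of large numbers says the proportion converges to the true power as N grows, and the usual binomial standard error describes its Monte Carlo uncertainty:

```latex
\widehat{\operatorname{Power}} = \frac{1}{N}\sum_{k=1}^{N} \mathbf{1}\{p_k \le \alpha\},
\qquad
\operatorname{SE}\bigl(\widehat{\operatorname{Power}}\bigr) \approx
\sqrt{\frac{\widehat{\operatorname{Power}}\,\bigl(1-\widehat{\operatorname{Power}}\bigr)}{N}}
```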
The following steps estimate the power for the two-sample pooled t test:
- Simulate data from the model for each group's population. These are the samples. The populations are chosen so that the true difference between the population means is δ > 0. (The null hypothesis is false.)
- Run the TTEST procedure on the samples. For each sample, record whether the t test rejects the null hypothesis.
- Compute the proportion of samples for which the t test rejects the null hypothesis. This proportion is an estimate of the power of the test.
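The three steps can be sketched outside SAS as well. The following is a minimal, language-neutral illustration in Python (standard library only), not the article's SAS method. Assumptions: the parameter names are hypothetical counterparts of the SAS macro variables, and the two-sided 5% critical value for 18 degrees of freedom (about 2.1009) is hard-coded instead of computing a p-value, so the sketch is valid only for two groups of size 10. Python's random-number generator differs from SAS's, so the estimate will not match the SAS output exactly.

```python
import math
import random

# Hypothetical parameters mirroring the SAS macro variables n1, n2, NumSamples, Delta.
N1, N2 = 10, 10
NUM_SAMPLES = 5000
DELTA = 1.2
# Two-sided 5% critical value of Student's t with N1 + N2 - 2 = 18 df.
# Hard-coded to avoid a SciPy dependency; valid ONLY for 18 degrees of freedom.
T_CRIT_18DF = 2.100922


def pooled_t(x1, x2):
    """Pooled two-sample t statistic for mean(x1) - mean(x2)."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss = sum((v - m1) ** 2 for v in x1) + sum((v - m2) ** 2 for v in x2)
    sp2 = ss / (n1 + n2 - 2)  # pooled variance estimate
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def estimate_power(delta, num_samples=NUM_SAMPLES, seed=321):
    rng = random.Random(seed)
    rejections = 0
    for _ in range(num_samples):
        # Step 1: simulate one sample from each population.
        x1 = [rng.gauss(0.0, 1.0) for _ in range(N1)]
        x2 = [rng.gauss(delta, 1.0) for _ in range(N2)]
        # Step 2: run the pooled t test and record whether it rejects H0.
        if abs(pooled_t(x1, x2)) > T_CRIT_18DF:
            rejections += 1
    # Step 3: the proportion of rejections estimates the power.
    return rejections / num_samples


print(estimate_power(DELTA))
```

Setting `delta=0.0` makes the null hypothesis true, so the rejection rate should then be close to the significance level 0.05, which is a quick sanity check on the simulation.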
Simulating the data
The following SAS statements define parameters for the simulation and use the DATA step to generate 5,000 simulated trials. All of the data are in a single data set, and the SampleID variable identifies which observations belong to which trial. In this simulation, each group has 10 observations and the true difference between the population means is 1.2, which is a little larger than the standard deviation (1) of each population.
```sas
%let n1 = 10;            /* group sizes */
%let n2 = 10;
%let NumSamples = 5000;  /* number of simulated samples */
%let Delta = 1.2;        /* true size of mean difference in population */

/* 1. Simulate data for all trials */
data PowerSim(drop=i);
call streaminit(321);
do SampleID = 1 to &NumSamples;
   do i = 1 to &n1;
      c = 1;  x = rand("Normal", 0, 1);       /* Group 1: x ~ N(0,1) */
      output;
   end;
   do i = 1 to &n2;
      c = 2;  x = rand("Normal", &Delta, 1);  /* Group 2: x ~ N(Delta, 1) */
      output;
   end;
end;
run;
```
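For readers who want to see the data layout in another language, here is a minimal Python sketch that builds the same "long" structure as the PowerSim data set: one row per observation, with columns SampleID, c (group), and x. The function name `simulate_long` is hypothetical, and Python's Mersenne Twister generator produces different random values than SAS, even with the same seed.

```python
import random


def simulate_long(n1=10, n2=10, num_samples=5000, delta=1.2, seed=321):
    """Return (SampleID, c, x) rows in the same long layout as PowerSim."""
    rng = random.Random(seed)
    rows = []
    for sample_id in range(1, num_samples + 1):
        for _ in range(n1):
            rows.append((sample_id, 1, rng.gauss(0.0, 1.0)))    # Group 1: N(0, 1)
        for _ in range(n2):
            rows.append((sample_id, 2, rng.gauss(delta, 1.0)))  # Group 2: N(delta, 1)
    return rows


rows = simulate_long()
print(len(rows))  # 100000 rows: 5,000 trials x 20 observations
```

Keeping every trial in one table, keyed by SampleID, is what makes the single BY-group analysis in the next step possible.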
Analyzing the simulated data
As I have written previously, use BY-group processing to carry out efficient simulation and analysis in SAS. Also, be sure to suppress the display of tables and graphs during the analysis by using the %ODSOff and %ODSOn macros. The following SAS statements define the %ODSOff and %ODSOn macros and analyze the data for all simulated trials:
```sas
%macro ODSOff();   /* Call prior to BY-group processing */
ods graphics off;
ods exclude all;
ods noresults;
%mend;

%macro ODSOn();    /* Call after BY-group processing */
ods graphics on;
ods exclude none;
ods results;
%mend;

/* 2. Compute (pooled) t test for each sample */
%ODSOff
proc ttest data=PowerSim;
   by SampleID;
   class c;
   var x;
   ods output ttests=TTests(where=(method="Pooled"));
run;
%ODSOn
```
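The BY statement runs one pooled t test per SampleID. As a rough Python analogue (a sketch, not a reimplementation of PROC TTEST), the following groups long-format rows by SampleID and computes the pooled t statistic for mean(group 1) minus mean(group 2) within each group. The function name `pooled_t_by_sample` and the toy data are hypothetical, and the sketch returns t statistics rather than the p-values (Probt) that PROC TTEST reports, because a p-value would require the t distribution function.

```python
import math
from collections import defaultdict


def pooled_t_by_sample(rows):
    """Group (SampleID, c, x) rows by SampleID and return {SampleID: t},
    where t is the pooled t statistic for mean(group 1) - mean(group 2)."""
    groups = defaultdict(lambda: {1: [], 2: []})
    for sample_id, c, x in rows:
        groups[sample_id][c].append(x)
    results = {}
    for sample_id, g in groups.items():
        x1, x2 = g[1], g[2]
        n1, n2 = len(x1), len(x2)
        m1, m2 = sum(x1) / n1, sum(x2) / n2
        ss = sum((v - m1) ** 2 for v in x1) + sum((v - m2) ** 2 for v in x2)
        sp2 = ss / (n1 + n2 - 2)  # pooled variance estimate
        results[sample_id] = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return results


# Toy data: one trial with group means 1 and 3 and unit variance in each group.
toy = [(1, 1, 0.0), (1, 1, 1.0), (1, 1, 2.0), (1, 2, 2.0), (1, 2, 3.0), (1, 2, 4.0)]
print(pooled_t_by_sample(toy))  # t is about -2.449 (that is, -sqrt(6))
```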
The TTEST procedure creates an output data set that contains 5,000 rows, one for each simulated trial. The data set includes a variable named Probt, which contains the two-sided p-value (Pr > |t|) of the pooled t test for each trial.
Count the rejections to estimate power
The last step in the simulation is to estimate the power, which is the probability of rejecting the null hypothesis. The following SAS statements create an indicator variable that has the value 1 if the t test rejected the null hypothesis at the 0.05 significance level, and the value 0 otherwise. (Alternatively, you could define a SAS format.) You can then use PROC FREQ to count the proportion of trials for which the t test rejected the null hypothesis.
```sas
/* Construct indicator var for obs that reject H0 at 0.05 significance */
data Results;
   set TTests;
   RejectH0 = (Probt <= 0.05);
run;

/* 3. Compute proportion: (# that reject H0)/NumSamples and CI */
proc freq data=Results;
   tables RejectH0 / nocum binomial(level='1');
run;
```
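The indicator-and-proportion logic is simple enough to sketch directly. The Python function below (hypothetical name `power_estimate`) builds the RejectH0 indicator from a list of p-values, takes its mean, and forms the asymptotic (Wald) 95% interval p ± 1.96·sqrt(p(1 − p)/N). This assumes the interval of interest is the asymptotic one; PROC FREQ's BINOMIAL option can also report exact limits, which this sketch does not compute. The p-values in the usage example are made up purely to exercise the arithmetic.

```python
import math


def power_estimate(p_values, alpha=0.05, z=1.959964):
    """Proportion of p-values <= alpha and its asymptotic (Wald) CI."""
    reject = [1 if p <= alpha else 0 for p in p_values]  # RejectH0 indicator
    n = len(reject)
    phat = sum(reject) / n
    half = z * math.sqrt(phat * (1 - phat) / n)
    return phat, (phat - half, phat + half)


# Made-up p-values: 3,600 of 5,000 trials reject H0.
pvals = [0.01] * 3600 + [0.50] * 1400
phat, (lo, hi) = power_estimate(pvals)
print(phat, round(lo, 4), round(hi, 4))  # 0.72 0.7076 0.7324
```

With N = 5,000 the half-width is only about 0.012, which is why 5,000 trials are enough to pin down the power to roughly one percentage point.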
The FREQ procedure indicates that the power of the two-sample t test is about 72%. The 95% confidence interval for that estimate is [0.708, 0.733]. This estimate is for the scenario in which both samples have size 10, one drawn from N(0, 1) and the other drawn from N(1.2, 1).
As mentioned earlier, you can use PROC POWER to find the exact power for the two-sample t test, as follows:
```sas
proc power;
   twosamplemeans power = .          /* missing ==> "compute this" */
   meandiff=1.2 stddev=1 ntotal=20;  /* 20 obs in the two samples */
run;
```
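For reference, the exact power of the pooled t test comes from standard theory: under the alternative, the t statistic (oriented as the Group 2 mean minus the Group 1 mean) follows a noncentral t distribution. The formula below is that textbook result, stated here as background rather than taken from the PROC POWER documentation; T with ν degrees of freedom and noncentrality λ is assumed notation. For this scenario, ν = 18 and λ = 1.2 / sqrt(0.2) ≈ 2.683.

```latex
\nu = n_1 + n_2 - 2 = 18, \qquad
\lambda = \frac{\delta}{\sigma\sqrt{1/n_1 + 1/n_2}} = \frac{1.2}{\sqrt{0.2}} \approx 2.683,
\\[4pt]
\operatorname{Power} = \Pr\bigl(T_{\nu,\lambda} > t_{1-\alpha/2,\,\nu}\bigr)
 + \Pr\bigl(T_{\nu,\lambda} < -t_{1-\alpha/2,\,\nu}\bigr),
\qquad t_{0.975,\,18} \approx 2.101
```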
The estimate from the simulation is very close to the true power of the t test.
Next week I will extend this simulation to estimate points along a power curve.