Four essential sampling methods in SAS


Many simulation and resampling tasks use one of four sampling methods. When you draw a random sample from a population, you can sample with or without replacement. Independently, all individuals in the population might have equal probability of being selected, or some individuals might be more likely to be selected than others. Combining these two choices gives four common sampling methods, which are shown in the following 2 x 2 table.

Four sampling methods in SAS: Sampling with and without replacement, with equal and unequal probability

The SURVEYSELECT procedure in SAS/STAT is one way to generate random samples. The previous table lists the four sampling methods, summarizes the SURVEYSELECT syntax for each method, and shows how to use the SAMPLE function in SAS/IML.
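As a rough illustration of the SAS/IML side, the following sketch calls the SAMPLE function for three of the four cases. The population (the faces of a die), the sample sizes, the seed, and the probability vector p are all made-up values for illustration. The sketch assumes the documented argument order SAMPLE(x, n, method, prob) and that "Replace" is the default method.

```sas
proc iml;
call randseed(12345);              /* arbitrary seed, for reproducibility   */
x = 1:6;                           /* population: the six faces of a die    */

/* with replacement, equal probability ("Replace" is assumed to be the default) */
urs = sample(x, 10);

/* without replacement, equal probability */
srs = sample(x, 4, "WOR");

/* with replacement, unequal probability; p is a hypothetical loaded die */
p = {0.1 0.1 0.1 0.1 0.1 0.5};
ppswr = sample(x, 10, "Replace", p);

print urs, srs, ppswr;
quit;
```

Each call returns a row vector with n elements. The sample without replacement cannot ask for more than six elements, whereas the samples with replacement can be any size.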

Sampling without replacement

When you sample without replacement, the size of the sample cannot exceed the number of items in the population.

Simple random sampling (SRS): Survey statisticians use "SRS" for sampling without replacement and with equal probability. Dealing cards from a 52-card deck is an example of SRS. Use the METHOD=SRS option in PROC SURVEYSELECT to request simple random sampling.
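The card-dealing example might be sketched as follows. The data set names (Deck, Hand), the variable names, the hand size of five, and the seed are assumptions chosen for illustration; only METHOD=SRS comes from the text above.

```sas
/* Build a 52-card deck: 13 ranks in each of 4 suits */
data Deck;
   length Suit $8 Rank $2;
   do Suit = "Clubs", "Diamonds", "Hearts", "Spades";
      do Rank = "A", "2", "3", "4", "5", "6", "7",
                "8", "9", "10", "J", "Q", "K";
         output;
      end;
   end;
run;

/* Deal a 5-card hand: equal probability, without replacement */
proc surveyselect data=Deck out=Hand method=srs
                  sampsize=5 seed=12345 noprint;
run;

proc print data=Hand noobs;
run;
```

Because the sampling is without replacement, no card can appear twice in the hand, and a SAMPSIZE= value greater than 52 would not be valid for this deck.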

Probability proportional to size (PPS): Survey statisticians use "PPS" for sampling without replacement and with unequal probability. As an example, suppose that you want to draw samples of colored marbles from an urn that contains colors in different proportions. The proportion of each color in the urn determines the expected count for each color in the sample. In PROC SURVEYSELECT, you can use the METHOD=PPS option in conjunction with the SIZE statement to specify the relative sizes (or the probabilities) for the colors in the urn.
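A minimal sketch of PPS selection, loosely based on the urn example, appears below. The colors, the counts, the sample size, and the seed are hypothetical. Each row of the Urn data set is one sampling unit (a color), so the procedure selects two distinct colors with selection probability proportional to Count. As I understand the procedure, METHOD=PPS requires that no unit's relative size exceed 1/n; here the largest relative size is 10/30, which is below 1/2.

```sas
/* One row per color; Count is the hypothetical number of marbles */
data Urn;
   length Color $8;
   input Color $ Count;
   datalines;
red 10
blue 8
green 6
yellow 4
white 2
;

/* Select 2 distinct colors: unequal probability, without replacement */
proc surveyselect data=Urn out=SamplePPS method=pps
                  sampsize=2 seed=12345 noprint;
   size Count;          /* size measure that determines selection probability */
run;

proc print data=SamplePPS noobs;
run;
```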

Sampling with replacement

When you sample with replacement, the size of the sample can be arbitrarily large; it can even exceed the number of items in the population.

Unrestricted random sampling (URS): Survey statisticians use "URS" for sampling with replacement and with equal probability. Rolling a six-sided die and recording the face that appears is an example of URS. Use the METHOD=URS option in PROC SURVEYSELECT to request unrestricted random sampling.
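The die-rolling example might look like the following sketch. The data set names (Die, Rolls), the number of rolls, and the seed are assumptions for illustration. The OUTHITS option is used here so that the output contains one row per roll rather than one row per distinct face.

```sas
/* Population: the six faces of a die */
data Die;
   do Face = 1 to 6;
      output;
   end;
run;

/* Roll the die 20 times: equal probability, with replacement */
proc surveyselect data=Die out=Rolls method=urs
                  sampsize=20 outhits seed=12345 noprint;
run;

/* Tabulate how often each face appeared */
proc freq data=Rolls;
   tables Face;
run;
```

The sample size (20) exceeds the population size (6), which is allowed only because the sampling is with replacement.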

Probability proportional to size with replacement: Survey statisticians use "PPS with replacement" for sampling with replacement and with unequal probability. As an example, suppose that you want to repeatedly toss two dice and record the sum of the faces. The sums 2 and 12 each occur with probability 1/36. The sums 3 and 11 each occur with probability 2/36, the sums 4 and 10 each occur with probability 3/36, and so forth. In PROC SURVEYSELECT, you can use the SIZE statement to specify the probability for each sum. You can use the METHOD=PPS_WR option (PPS sampling with replacement) to simulate random sums from a pair of dice.
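The dice-sum simulation could be sketched as follows. The data set names (Sums, Tosses), the number of tosses, and the seed are assumptions. The formula (6 - |Sum - 7|) / 36 is one compact way to generate the probabilities listed above: it gives 1/36 for the sums 2 and 12, 2/36 for 3 and 11, and 6/36 for 7.

```sas
/* Possible sums of two dice and the probability of each sum */
data Sums;
   do Sum = 2 to 12;
      Prob = (6 - abs(Sum - 7)) / 36;
      output;
   end;
run;

/* Simulate 1000 tosses: unequal probability, with replacement */
proc surveyselect data=Sums out=Tosses method=pps_wr
                  sampsize=1000 outhits seed=12345 noprint;
   size Prob;           /* selection probability is proportional to Prob */
run;

/* The empirical distribution should peak near Sum=7 */
proc freq data=Tosses;
   tables Sum;
run;
```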

These four sampling methods are useful to the statistical programmer because they are often used in simulation studies. For information about using the SAS DATA step and PROC SURVEYSELECT for basic sampling, see "Selecting Unrestricted and Simple Random with Replacement Samples Using Base SAS and PROC SURVEYSELECT" (Chapman 2012). See the PROC SURVEYSELECT documentation for a detailed explanation of these and many other sampling methods.


About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of PROC IML and SAS/IML Studio. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.
