Many simulation and resampling tasks use one of four sampling methods. When you draw a random sample from a population, you can sample with or without replacement. Independently, all individuals in the population might have equal probability of being selected, or some individuals might be more likely than others. Together, these two choices define four common sampling methods, which are shown in the following 2 x 2 table.
The SURVEYSELECT procedure in SAS/STAT is one way to generate random samples. The previous table lists the four sampling methods, summarizes the SURVEYSELECT syntax for each method, and shows how to use the SAMPLE function in SAS/IML.
Sampling without replacement
When you sample without replacement, the size of the sample cannot exceed the number of items in the population.
Simple random sampling (SRS): Survey statisticians use "SRS" for sampling without replacement and with equal probability. Dealing cards from a 52-card deck is an example of SRS. Use the METHOD=SRS option in PROC SURVEYSELECT to request simple random sampling.
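The card-dealing example can be sketched outside of SAS as well. The following is a minimal Python illustration, not SAS code: it uses the standard library's random.sample function, which draws without replacement and with equal probability. The deck layout, the seed value, and the hand size of five are assumptions chosen only for the illustration.

```python
import random

# Build a 52-card deck: 13 ranks in each of 4 suits.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Clubs", "Diamonds", "Hearts", "Spades"]
deck = [(rank, suit) for suit in suits for rank in ranks]

rng = random.Random(12345)     # arbitrary seed, used only for reproducibility
hand = rng.sample(deck, k=5)   # equal probability, without replacement
print(hand)
```

Because the draw is without replacement, no card can appear twice in the hand, and asking for more than 52 cards raises a ValueError.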
Probability proportional to size (PPS): Survey statisticians use "PPS" for sampling without replacement and with unequal probability. As an example, suppose that you want to draw samples of colored marbles from an urn that contains colors in different proportions. The proportion of each color in the urn determines the expected count for each color in the sample. In PROC SURVEYSELECT, you can use the METHOD=PPS option in conjunction with the SIZE statement to specify the relative sizes (or the probabilities) for the colors in the urn.
- SURVEYSELECT example: Use PROC SURVEYSELECT as in this example, but change the option to METHOD=PPS.
- SAS/IML example: Use the SAMPLE function to sample without replacement and with unequal probability.
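To make the idea of unequal probability without replacement concrete, here is a rough Python sketch. It uses a simple sequential scheme: draw one item with probability proportional to the remaining sizes, remove it, and repeat. This is an assumption made for illustration; it is not necessarily the algorithm that METHOD=PPS or the SAMPLE function uses, and the resulting inclusion probabilities are only approximately proportional to size. The function name, the colors, and the counts are hypothetical.

```python
import random

def weighted_sample_without_replacement(items, sizes, k, rng):
    """Draw k distinct items; each draw is proportional to the remaining sizes."""
    if k > len(items):
        raise ValueError("sample size cannot exceed the number of items")
    pool = list(items)
    weights = list(sizes)
    chosen = []
    for _ in range(k):
        # Pick one index with probability proportional to its remaining weight.
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(idx))
        weights.pop(idx)       # the selected item can no longer be drawn
    return chosen

# Hypothetical size measures for four units (colors).
colors = ["red", "green", "blue", "yellow"]
counts = [50, 30, 15, 5]

rng = random.Random(2024)      # arbitrary seed, used only for reproducibility
sample = weighted_sample_without_replacement(colors, counts, k=2, rng=rng)
print(sample)
```

Larger units (here, "red") tend to be selected more often across repeated samples, but no unit can appear twice in the same sample.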
Sampling with replacement
When you sample with replacement, the size of the sample can be arbitrarily large.
Unrestricted random sampling (URS): Survey statisticians use "URS" for sampling with replacement and with equal probability. Rolling a six-sided die and recording the face that appears is an example of URS. Use the METHOD=URS option in PROC SURVEYSELECT to request unrestricted random sampling.
- SURVEYSELECT example: Use PROC SURVEYSELECT (or the DATA step) for URS.
- SAS/IML example: Use the SAMPLE function to sample with replacement and with equal probability.
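The die-rolling example translates directly to a short Python sketch (again an illustration, not SAS code). The standard library's random.choices function samples with replacement; when no weights are given, every item is equally likely. The seed and the number of rolls are arbitrary choices.

```python
import random
from collections import Counter

faces = [1, 2, 3, 4, 5, 6]

rng = random.Random(1)             # arbitrary seed, used only for reproducibility
rolls = rng.choices(faces, k=600)  # equal probability, with replacement

# Tabulate how often each face appeared.
counts = Counter(rolls)
for face in faces:
    print(face, counts[face])
```

The sample size (600) is far larger than the population size (6), which is possible only because the sampling is with replacement.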
Probability proportional to size with replacement: Survey statisticians use "PPS with replacement" for sampling with replacement and with unequal probability. As an example, suppose that you want to repeatedly toss two dice and record the sum of the faces. The sum will be 2 (or 12) with probability 1/36. The sum will be 3 (or 11) with probability 2/36, will be 4 (or 10) with probability 3/36, and so forth. In PROC SURVEYSELECT, you can use the SIZE statement to specify the probability for each sum. You can use the METHOD=PPS_WR option (PPS sampling with replacement) to simulate random sums from a pair of dice.
- Example: Use PROC SURVEYSELECT or the SAMPLE function for sampling with unequal probability and with replacement.
- Second example: Use PROC SURVEYSELECT (or the DATA step) for generating samples from the multinomial distribution.
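The dice-sum probabilities quoted above follow from counting outcomes: a sum s in the range 2 through 12 can occur in 6 - |s - 7| of the 36 equally likely ways. The Python sketch below (an illustration, not SAS code) builds those weights and passes them to random.choices, which samples with replacement and with probability proportional to the weights. The seed and the number of draws are arbitrary.

```python
import random
from collections import Counter
from fractions import Fraction

sums = list(range(2, 13))
# Number of (die 1, die 2) outcomes for each sum: 1, 2, ..., 6, ..., 2, 1.
ways = [6 - abs(s - 7) for s in sums]
probs = [Fraction(w, 36) for w in ways]

rng = random.Random(36)        # arbitrary seed, used only for reproducibility
n_draws = 1000
draws = rng.choices(sums, weights=ways, k=n_draws)  # unequal probability, with replacement

# Compare the exact probability with the observed proportion for each sum.
counts = Counter(draws)
for s, p in zip(sums, probs):
    print(s, p, counts[s] / n_draws)
```

The weights play the same role as the values on the SIZE statement: only their relative magnitudes matter, so counts out of 36 work as well as the probabilities themselves.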
These four sampling methods are useful to the statistical programmer because they are often used in simulation studies. For information about using the SAS DATA step and PROC SURVEYSELECT for basic sampling, see "Selecting Unrestricted and Simple Random with Replacement Samples Using Base SAS and PROC SURVEYSELECT" (Chapman 2012). See the PROC SURVEYSELECT documentation for a detailed explanation of these and many other sampling methods.