I think everyone can agree that being able to debug programs is an important skill for SAS programmers. That’s why Susan Slaughter and I devoted a whole chapter to it in The Little SAS® Book. I don’t know about you, but I think figuring out what’s wrong with my program is the fun part of programming. If every program I wrote ran correctly the first time, I would be bored. So when Susan and I recently teamed up with Rebecca Ottesen to write Exercises and Projects for The Little SAS® Book, Fifth Edition, you can be sure that we included a chapter on debugging.
Our new book has multiple-choice, short-answer, and programming exercises that correspond to each chapter in The Little SAS® Book. The following programming exercise is similar to the exercises in the book. So go ahead – test your debugging skills and see if you can find all the mistakes in the program. I’ll give you a hint (and, yes, hints are also provided in the book) – there are eight problems with this program.
1. You are provided with the names and nationalities of ten top golfers on the PGA Tour, along with the year they turned professional and their age when they turned pro. In addition, you have the final scores for the same players in the US Open tournament for the years 2013-2015. If a player did not enter or finish the tournament, then the score is missing for that player and year. You have assigned a new employee the task of writing a SAS program that will read in the data and produce a scatter plot of years of experience versus US Open score, using a different colored filled circle for each player. The program should also produce a table showing the name, US Open year, and score for all players with final scores of 280 or below (in golf, the lowest score wins). The following is the program your new employee writes.
DATA golfers;
   INPUT Player Name $ 6-30 Country $ YearTurnedPro $ AgeTurnedPro;
   DATALINES;
1    Rory McIlroy             NIR   2007 18
2    Jordan Spieth            USA   2012 19
3    Bubba Watson             USA   2002 24
4    Dustin Johnson           USA   2007 23
5    Rickie Fowler            USA   2009 21
6    Jim Furyk                USA   1992 22
7    Henrik Stenson           SWE   1999 23
8    Justin Rose              ENG   1998 18
9    Jason Day                AUS   2006 19
10   Sergio Garcia            ESP   1999 19
;
RUN;
DATA USopen;
   INPUT Year Player Score @@;
   DATALINES;
2015 1 280 2015 2 275 2015 3 . 2015 4 276 2015 5 .
2015 6 287 2015 7 285 2015 8 285 2015 9 280 2O15 10 283
2014 1 286 2014 2 284 2014 3 . 2014 4 281 2014 5 279
2014 6 283 2014 7 281 2014 8 283 2014 9 281 2014 10 288
2013 1 294 2013 2 . 2013 3 293 2013 4 297 2013 5 287
2013 6 . 2013 7 291 2013 8 281 2013 9 283 2013 10 295
;
RUN;
PROC SORT DATA = USopen;
   BY Player;
RUN;
DATA USopen2;
   MERGE golfers (KEEP= Player Name) USopen;
   BY Player;
   YearsExperience = Year - YearTurnedPro;
RUN;
PROC SGPLOT DATA = USopen
   SCATTER X = YearsExperience Y = Score /
      GROUP = Player MARKERATTERS = (SYMBOL = CIRCLEFILLED);
   TITLE 'US Open Scores by Years on the PGA Tour';
RUN;
PROC PRINT DATA = USopen2 NONUMS;
   WHERE Score <= 280;
   VAR Name Year Score;
   TITLE "US Open Scores of 280 or Less';
RUN;
a) Identify and correct any problems with the preceding code so that the program will run correctly.
b) Add comments to the revised program for each fix so that your employee can understand her mistakes.
We hope that you found this information helpful. Visit the book page for additional information, reviews, and a free book excerpt.
5 Comments
Data step #1
- no reason to read the year as a string (remove the $ after YearTurnedPro).
Data step #2
- misspelling: 2O15 (letter O) in place of 2015 (zero) => Year will be missing for that observation.
Data step #3
- YearsExperience will be missing for all observations, because YearTurnedPro is not kept when golfers is merged (it is absent from the KEEP= list).
Proc print
- conversion from >= to <= (HTML).
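The invalid year value noted under Data step #2 can be reproduced in isolation. The following is a minimal sketch, not part of the exercise: the data set name check is made up for illustration, and only two observations are read. When it runs, the log should report invalid data for Year, and Year should be missing for the second observation.

```sas
/* Hypothetical two-observation test; the data set name "check" is illustrative */
DATA check;
   INPUT Year Player Score @@;
   DATALINES;
2015 9 280 2O15 10 283
;
RUN;

/* Year is expected to print as missing (.) for the second observation */
PROC PRINT DATA = check;
RUN;
```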
/* Corrected Program update --now includes comments for all changes made*/
DATA golfers;
INPUT Player 1-5 Name $6-30 Country $31-36 YearTurnedPro 37-41 AgeTurnedPro 42-43 ;
/* used column pointers for all variables */
DATALINES;
1    Rory McIlroy             NIR   2007 18
2    Jordan Spieth            USA   2012 19
3    Bubba Watson             USA   2002 24
4    Dustin Johnson           USA   2007 23
5    Rickie Fowler            USA   2009 21
6    Jim Furyk                USA   1992 22
7    Henrik Stenson           SWE   1999 23
8    Justin Rose              ENG   1998 18
9    Jason Day                AUS   2006 19
10   Sergio Garcia            ESP   1999 19
;
RUN;
DATA USopen;
INPUT Year Player Score @@;
DATALINES;
2015 1 280 2015 2 275 2015 3 . 2015 4 276 2015 5 .
2015 6 287 2015 7 285 2015 8 285 2015 9 280 2015 10 283
2014 1 286 2014 2 284 2014 3 . 2014 4 281 2014 5 279
2014 6 283 2014 7 281 2014 8 283 2014 9 281 2014 10 288
2013 1 294 2013 2 . 2013 3 293 2013 4 297 2013 5 287
2013 6 . 2013 7 291 2013 8 281 2013 9 283 2013 10 295
;
/* replaced the letter O with the digit 0 in the 10th observation */
RUN;
PROC SORT DATA = USopen;
BY Player;
RUN;
DATA USopen2;
MERGE golfers (KEEP= Player Name YearTurnedPro) USopen;
/* included YearTurnedPro in KEEP */
BY Player;
YearsExperience = Year - YearTurnedPro;
RUN;
proc print data = usopen2;
/* changed data set name */
run;
PROC SGPLOT DATA = USopen2
(where=(Score <= 280 and Score NE .));
/*added where statement*/
SCATTER X = YearsExperience Y = Score /
GROUP = Name MARKERATTRS = (SYMBOL = CIRCLEFILLED);
/* changed Player to Name and corrected spelling to MARKERATTRS*/
TITLE 'US Open Players with Scores <= 280 by Years Experience on the PGA Tour';
/* changed title to be more descriptive and changed unbalanced quote */
RUN;
/* Stevina code did not work for me*/
/* Corrected Program */
DATA golfers;
INPUT Player 1-5 Name $6-30 Country $31-36 YearTurnedPro 37-41 AgeTurnedPro 42-43 ;
/* used column pointers for all variables */
DATALINES;
1    Rory McIlroy             NIR   2007 18
2    Jordan Spieth            USA   2012 19
3    Bubba Watson             USA   2002 24
4    Dustin Johnson           USA   2007 23
5    Rickie Fowler            USA   2009 21
6    Jim Furyk                USA   1992 22
7    Henrik Stenson           SWE   1999 23
8    Justin Rose              ENG   1998 18
9    Jason Day                AUS   2006 19
10   Sergio Garcia            ESP   1999 19
;
RUN;
DATA USopen;
INPUT Year Player Score @@;
DATALINES;
2015 1 280 2015 2 275 2015 3 . 2015 4 276 2015 5 .
2015 6 287 2015 7 285 2015 8 285 2015 9 280 2015 10 283
2014 1 286 2014 2 284 2014 3 . 2014 4 281 2014 5 279
2014 6 283 2014 7 281 2014 8 283 2014 9 281 2014 10 288
2013 1 294 2013 2 . 2013 3 293 2013 4 297 2013 5 287
2013 6 . 2013 7 291 2013 8 281 2013 9 283 2013 10 295
;
/* replaced the letter O with the digit 0 in the 10th observation */
RUN;
PROC SORT DATA = USopen;
BY Player;
RUN;
DATA USopen2;
MERGE golfers (KEEP= Player Name YearTurnedPro) USopen;
/* included YearTurnedPro in KEEP */
BY Player;
YearsExperience = Year - YearTurnedPro;
RUN;
proc print data = usopen2;
/* changed data set name */
run;
PROC SGPLOT DATA = USopen2
(where=(Score <= 280 and Score NE .));
/*added where statement*/
SCATTER X = YearsExperience Y = Score /
GROUP = Name MARKERATTRS = (SYMBOL = CIRCLEFILLED);
/* corrected spelling to MARKERATTRS */
TITLE 'US Open Players with Scores <= 280 by Years Experience on the PGA Tour';
/* changed title to be more descriptive */
RUN;
In Proc Print the statement
WHERE Score <= 280;
should be
WHERE Score <= 280 and Score NE . ;
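The reason for the extra condition is that SAS treats a missing numeric value as lower than any number, so a missing Score satisfies Score <= 280. A minimal sketch of the difference, using a made-up three-row data set named scores (not the exercise data):

```sas
/* Hypothetical data: player 2 has a missing score */
DATA scores;
   INPUT Player Score;
   DATALINES;
1 280
2 .
3 287
;
RUN;

/* A missing Score compares lower than 280, so players 1 and 2 are printed */
PROC PRINT DATA = scores;
   WHERE Score <= 280;
RUN;

/* Excluding missing values explicitly leaves only player 1 */
PROC PRINT DATA = scores;
   WHERE Score <= 280 AND Score NE .;
RUN;
```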
DATA golfers;
INPUT Player Name $ 6-30 Country $ YearTurnedPro $ AgeTurnedPro;
DATALINES;
1    Rory McIlroy             NIR   2007 18
2    Jordan Spieth            USA   2012 19
3    Bubba Watson             USA   2002 24
4    Dustin Johnson           USA   2007 23
5    Rickie Fowler            USA   2009 21
6    Jim Furyk                USA   1992 22
7    Henrik Stenson           SWE   1999 23
8    Justin Rose              ENG   1998 18
9    Jason Day                AUS   2006 19
10   Sergio Garcia            ESP   1999 19
;
RUN;
DATA USopen;
INPUT Year Player Score @@;
DATALINES;
2015 1 280 2015 2 275 2015 3 . 2015 4 276 2015 5 .
2015 6 287 2015 7 285 2015 8 285 2015 9 280 2015 10 283
2014 1 286 2014 2 284 2014 3 . 2014 4 281 2014 5 279
2014 6 283 2014 7 281 2014 8 283 2014 9 281 2014 10 288
2013 1 294 2013 2 . 2013 3 293 2013 4 297 2013 5 287
2013 6 . 2013 7 291 2013 8 281 2013 9 283 2013 10 295
;
RUN;
/* On the second data line, in the observation "2015 10 283", the year had the letter O in place of a zero */
PROC SORT DATA = USopen;
BY Player;
RUN;
DATA USopen2;
MERGE golfers (KEEP= Player Name YearTurnedPro) USopen;
/* YearTurnedPro must be included in the KEEP= option because it is used later in the program */
BY Player;
YearsExperience = Year - YearTurnedPro;
RUN;
PROC SGPLOT DATA = USopen2;
/* The original PROC SGPLOT statement named the wrong data set (USopen instead of USopen2) and was missing a semicolon */
SCATTER X = YearsExperience Y = Score /
GROUP = Player MARKERATTRS = (SYMBOL = CIRCLEFILLED);
/* MARKERATTRS was misspelled as MARKERATTERS */
TITLE 'US Open Scores by Years on the PGA Tour';
RUN;
PROC PRINT DATA = USopen2 ;
/* NONUMS is not a PROC PRINT option (NONUM is used with PROC FSLIST), so it was removed */
WHERE Score <= 280;
VAR Name Year Score;
TITLE "US Open Scores of 280 or Less";
RUN;
/* The title statement had unbalanced quotation mark*/