This week, I posted the 100th article to The DO Loop. To celebrate, I'm going to analyze the content of my first 100 articles.
In December 2010, I compiled a list of The DO Loop's most-read posts, so I won't repeat that exercise. Instead, I thought it would be interesting to analyze what I've said in my blog by looking at the frequency of words over the first 100 posts.
I concatenated the first 100 posts into a single file, and I used SAS software (specifically, PROC FREQ) to analyze the most frequent terms. Then I created a Wordle word cloud, which uses size to represent the frequency of occurrence of the top terms.
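The word-count step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the exact program behind the word cloud: the file name posts.txt, the data set names Words and WordFreq, and the choice of delimiters are all hypothetical.

```sas
/* Sketch only: posts.txt is a hypothetical file that holds the concatenated posts */
data Words;
   length Word $40;
   infile "posts.txt" lrecl=32767 truncover;
   input Line $char32767.;
   /* split each line into words on blanks and common punctuation (assumed delimiters) */
   do i = 1 to countw(Line, ' .,;:!?()"');
      Word = lowcase(scan(Line, i, ' .,;:!?()"'));
      output;
   end;
   keep Word;
run;

/* count how often each word occurs; the OUT= data set contains COUNT and PERCENT */
proc freq data=Words noprint;
   tables Word / out=WordFreq;
run;

/* list the most frequent terms first */
proc sort data=WordFreq;
   by descending Count;
run;

proc print data=WordFreq(obs=20);
run;
```

In practice you would probably also drop common words such as "the" and "of" before building a cloud, so that the remaining terms reflect the content.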
I like this word cloud because it summarizes my objectives for The DO Loop:
- Data is central to my blog.
- Data is surrounded by terms related to statistical programming: matrix, function, SAS, SAS/IML, program, and compute.
- Surrounding those terms are keywords in the SAS/IML language: USE, READ, DO, IF/THEN, PRINT, and CALL.
- Interspersed throughout the cloud are terms related to matrix computations: row, column, values, and vector.
- Statistical terms are also interspersed: random, variables, probability, distribution, mean, and covariance.
The word cloud interlaces these different objectives, just as I try to write articles that interleave statistics, data analysis, and programming.
In that spirit, I can't think of a better way to celebrate Post #101 than to write a SAS program that uses the SGPLOT procedure to visualize the general topics of my blog posts. (Counts do not add to 100 because some articles belong to multiple categories.)
title "The DO Loop: Top Categories";
proc sgplot data=Categories;
   hbar Category / freq=Freq;
   xaxis grid integer;
   yaxis discreteorder=data;
run;
Thanks to my colleagues, Alison, Anne, and Chris, for their support of this blog. Thanks to all of my readers for your patronage, comments, and encouragement. I look forward to writing the next 100 articles, and I hope you look forward to reading them.