I published 118 blog posts in 2014. This article presents my most popular posts from 2014 and late 2013.
2014 will always be a special year for me because it was the year that the SAS University Edition was launched. The University Edition makes SAS/IML available to students and adult learners around the world. My article "10 tips for learning the SAS/IML language" has been popular as new programmers take advantage of this free opportunity to learn the SAS/IML language for matrix computations and data analysis.
General math and statistics articles
Although I mostly write about statistical programming, several of my general articles on math and statistics were very popular:
- "Does this kurtosis make my tail look fat?" explains kurtosis and why long tails and fat tails matter when modeling data.
- Have you heard of an Ulam spiral? It is a graphical way to visualize the distribution of prime numbers.
- Do you know the fundamental theorem of calculus? How about the Fundamental Theorem of Statistics?
- Speaking of fundamentals, in his 2014 ASA President's Invited Address, historian and statistician Stephen Stigler answers the question "What is statistics?" by proposing seven pillars of statistical wisdom.
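To make the kurtosis discussion concrete, here is a minimal sketch of the simple moment-based estimator of excess kurtosis, g2 = m4 / m2^2 - 3, where m2 and m4 are the second and fourth central moments. This is an illustration under that assumption only. SAS procedures report a bias-adjusted version by default, so their values differ from this one for small samples. The function name `excess_kurtosis` is hypothetical.

```python
def excess_kurtosis(xs):
    """Moment-based sample excess kurtosis: g2 = m4 / m2**2 - 3.

    Not the bias-adjusted estimator that SAS procedures report by default.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0


# Light tails: every value is the same distance from the mean.
print(excess_kurtosis([-1, 1, -1, 1]))
# A few extreme values in otherwise concentrated data produce positive excess kurtosis.
print(excess_kurtosis([0] * 8 + [-10, 10]))
```

A negative value indicates tails lighter than a normal distribution. A positive value indicates heavier tails.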
Statistical graphics and data visualization
Who doesn't like to learn better ways to visualize data in SAS? The following posts generated some interesting discussions.
- You can use box plots to visualize the distributions of 100 variables in a single plot.
- When your data values range over several orders of magnitude, it is often useful to log-transform the data. I wrote about how to create a scatter plot with log-log axes and how to use the log-plus-one transformation to handle 0s in the data.
- In a similar way, you can apply a log-modulus transformation to visualize data that contain both positive and negative values and range over several orders of magnitude.
- A frequent question on discussion forums is how to create a stacked bar chart in which the segments for each category sum to 100%. I wrote a program that shows how to create such a chart in SAS by using PROC FREQ and PROC SGPLOT.
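The transformations and the 100% stacking step above reduce to a few lines of arithmetic. The following minimal sketch assumes base-10 logarithms. `log_plus_one` computes log10(x + 1) for nonnegative data. `log_modulus` computes sign(x) * log10(|x| + 1), which maps 0 to 0 and preserves the sign of negative values. `percent_within_category` rescales the counts in each category so that they sum to 100, which is the quantity a 100% stacked bar displays. All three function names are hypothetical and are not SAS procedures or options.

```python
import math


def log_plus_one(x):
    """log10(x + 1): defined at x = 0, intended for nonnegative data."""
    return math.log10(x + 1)


def log_modulus(x):
    """sign(x) * log10(|x| + 1): handles positive, negative, and zero values."""
    return math.copysign(math.log10(abs(x) + 1), x)


def percent_within_category(counts):
    """counts maps category -> {group: count}.

    Returns the same shape, with each category's counts scaled to sum to 100.
    """
    result = {}
    for category, groups in counts.items():
        total = sum(groups.values())
        result[category] = {g: 100.0 * c / total for g, c in groups.items()}
    return result


print([log_modulus(v) for v in (-999, -9, 0, 9, 999)])
print(percent_within_category({"A": {"x": 1, "y": 3}, "B": {"x": 2, "y": 2}}))
```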
Another important technique is to use a heat map to visualize the values in a rectangular matrix:
- You can use a Graph Template Language (GTL) template to visualize a data matrix that contains a small number of unique values. The SAS/IML language makes it easy to create the heat map by using the built-in HEATMAPDISC subroutine.
- In a similar way, you can use a heat map with a continuous color ramp to visualize a matrix that contains many distinct values. In SAS/IML, you can create the heat map by using the built-in HEATMAPCONT subroutine.
- When creating a heat map or a choropleth map, it can be a challenge to choose a palette of colors. Read about how to choose a color scheme that presents an unbiased view of the data.
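The idea behind a continuous color ramp is to map each matrix value to a color by linear interpolation between a few anchor colors. The following sketch shows that idea in plain Python. It is an assumption-laden illustration, not how HEATMAPCONT assigns colors. The three-color blue-white-red ramp and the function names `ramp_color` and `heat_map_colors` are hypothetical.

```python
def ramp_color(value, vmin, vmax, ramp):
    """Map value in [vmin, vmax] to an RGB tuple by linear interpolation
    between consecutive anchor colors in ramp (a list of RGB tuples)."""
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)          # clamp out-of-range values
    pos = t * (len(ramp) - 1)          # position along the ramp segments
    i = min(int(pos), len(ramp) - 2)   # index of the left anchor color
    frac = pos - i                     # fraction of the way to the next anchor
    c0, c1 = ramp[i], ramp[i + 1]
    return tuple(round(a + frac * (b - a)) for a, b in zip(c0, c1))


def heat_map_colors(matrix, ramp):
    """Assign a color to every cell, scaling by the matrix's min and max."""
    flat = [v for row in matrix for v in row]
    vmin, vmax = min(flat), max(flat)
    return [[ramp_color(v, vmin, vmax, ramp) for v in row] for row in matrix]


BLUE_WHITE_RED = [(0, 0, 255), (255, 255, 255), (255, 0, 0)]
print(heat_map_colors([[0, 5], [10, 2.5]], BLUE_WHITE_RED))
```

A diverging ramp like this one implicitly treats the midpoint of the data range as a neutral value. That is one reason the choice of palette can bias how a reader perceives the data.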
Regression and data smoothers
Everyone loves a good regression tip!
- A common regression task is to evaluate a model on new values of the explanatory variables in order to predict responses. SAS software provides five ways to "score" a regression model.
- Some people like to use cubic splines, which were popular in the 1970s, to smooth their data. Learn how to fit a cubic spline to two-dimensional data and add the spline curve to a scatter plot.
- A drawback of the cubic spline is that you must choose the smoothing parameter manually. Nonparametric statistical techniques also add a smooth curve to a scatter plot, but they provide an automated way to select the smoothing parameter.
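"Scoring" means applying a fitted model's parameter estimates to new values of the explanatory variables. The following minimal sketch shows the idea for a one-variable least squares line. It does not reproduce any of the SAS scoring techniques. The functions `fit_simple_linear` and `score` are hypothetical names used for illustration.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


def score(model, new_xs):
    """Evaluate the fitted model at new explanatory values."""
    intercept, slope = model
    return [intercept + slope * x for x in new_xs]


model = fit_simple_linear([0, 1, 2, 3], [2, 5, 8, 11])  # data lie on y = 2 + 3x
print(model)
print(score(model, [10, -1]))
```

Keeping the fitting step separate from the scoring step mirrors the common workflow: fit once on training data, then score as many new observations as needed.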
Start your new year by (re-)reading one of these 14 popular posts from 2014. Happy New Year to all my readers!