Quantile regression: Better than connecting the sample quantiles of binned data

I often see variations of the following question posted on statistical discussion forums: I want to bin the X variable into a small number of values. For each bin, I want to draw the quartiles of the Y variable for that bin. Then I want to connect the corresponding quartile [...]

The role of statistics in the top public health achievements of the 20th century

In this International Year of Statistics, I’d like to describe the major role of statistics in public health advances. In our modern society, it is sometimes difficult to recall the huge advances in health and medicine in the 20th century. To name a few: penicillin was discovered in 1928, risk [...]

A statistician reads the newspaper: Forecasting rising sea levels

This is a third post on newspaper stories that I recently read. Today’s post deals with science, politics, and rising sea levels. Incidentally, the title is a blatant reference to John Allen Paulos’s brilliant book, A Mathematician Reads the Newspaper. Senate approves law that challenges sea-level science The NC legislature [...]

A statistician reads the newspaper: Academic fraud

This is my second post on some newspaper articles that I recently read. Today’s post deals with academic fraud. Questions linger in academic fraud case Over the past year, the News and Observer has occasionally reported on a scandal at the University of North Carolina at Chapel Hill in which [...]

A statistician reads the newspaper: The Secret Service scandal

This past weekend was Father’s Day, so I took some time to relax and read the newspaper. I found several stories that suggested interesting statistical questions. Unfortunately, the data are not available for analysis. Nevertheless, the stories are worth sharing. Over the next few days, I’ll post my thoughts on [...]

Convergence or divergence? A simple iteration with a random component

A colleague who works with time series sent me the following code snippet. He said that the calculation was overflowing and wanted to know if this was a bug in SAS: data A(drop=m); call streaminit(12345); m = 2; x = 0; do i = 1 to 5000; x = m*x [...]

The curse of dimensionality: How to define outliers in high-dimensional data?

After my post on detecting outliers in multivariate data in SAS by using the MCD method, Peter Flom commented “when there are a bunch of dimensions, every data point is an outlier” and remarked on the curse of dimensionality. What he meant is that most points in a high-dimensional cloud [...]

What is Mahalanobis distance?

I previously described how to use Mahalanobis distance to find outliers in multivariate data. This article takes a closer look at Mahalanobis distance. A subsequent article will describe how you can compute Mahalanobis distance. Distance in standard units In statistics, we sometimes measure “nearness” or “farness” in terms of the [...]

Explaining coincidence

I was on vacation when a family member sidled up to me. “Rick, you’re a statistician…” he began. I knew I was in trouble. He proceeded to tell me the story of Joseph “Newsboy” Moriarty, a New Jersey mobster who rose to prominence and became known as the bookie who [...]

American pre-WW2 attitudes about Germany and Allies

“Yesterday, December 7, 1941, a date which will live in infamy…” – Franklin D. Roosevelt. Today is the 70th anniversary of the Japanese attack on Pearl Harbor. The very next day, America declared war. During a visit to the Smithsonian National Museum of American History, I discovered the results of [...]