It is well known that classical estimates of location and scale (for example, the mean and standard deviation) are influenced by outliers. In the 1960s, '70s, and '80s, researchers such as Tukey, Huber, Hampel, and Rousseeuw advocated analyzing data by using robust statistical estimates such as the median and the

## Tag: **time series**

When data contain outliers, medians estimate the center of the data better than means do. In general, robust estimates of location and sale are preferred over classical moment-based estimates when the data contain outliers or are from a heavy-tailed distribution. Thus, instead of using the mean and standard deviation of

For ordinary least squares (OLS) regression, you can use a basic bootstrap of the residuals (called residual resampling) to perform a bootstrap analysis of the parameter estimates. This is possible because an assumption of OLS regression is that the residuals are independent. Therefore, you can reshuffle the residuals to get

A colleague recently posted an article about how to use SAS Visual Analytics to create a circular graph that displays a year's worth of temperature data. Specifically, the graph shows the air temperature for each day in a year relative to some baseline temperature, such as 65F (18C). Days warmer

A moving average is a statistical technique that is used to smooth a time series. My colleague, Cindy Wang, wrote an article about the Hull moving average (HMA), which is a time series smoother that is sometimes used as a technical indicator by stock market traders. Cindy showed how to

It looks like we've finally recovered from the Great Recession, and there are even claims of record-low unemployment in several U.S. states. Of course claims like that make my data-radar go off, and I wanted to see the numbers for myself. And it's a great excuse for me to create

Dementia describes different brain disorders that trigger a loss of brain function. These conditions are all usually progressive and eventually severe. Alzheimer's disease is the most common type of dementia, affecting 62 percent of those diagnosed. Other types of dementia include; vascular dementia affecting 17 percent of those diagnosed, mixed

For a time series { y1, y2, ..., yN }, the difference operator computes the difference between two observations. The kth-order difference is the series { yk+1 - y1, ..., yN - yN-k }. In SAS, the DIF function in the DATA step computes differences between observations. The DIF function

Time series machine learning techniques show great promise for the analysis of health care wearable data. As our busy lifestyles render continuous monitoring more and more essential, the need to analyze data to find correlations between these data streams becomes even more important, because they can provide important cues to

Last week I showed how to represent a Markov transition matrix in the SAS/IML matrix language. I also showed how to use matrix multiplication to iterate a state vector, thereby producing a discrete-time forecast of the state of the Markov chain system. This article shows that the expected behavior of

Many computations in elementary probability assume that the probability of an event is independent of previous trials. For example, if you toss a coin twice, the probability of observing "heads" on the second toss does not depend on the result of the first toss. However, there are situations in which

I have previously shown how to overlay basic plots on box plots when all plots share a common discrete X axis. It is interesting to note that box plots can also be overlaid on a continuous (interval) axis. You often need to bin the data before you create the plot.

Last week I discussed how to create spaghetti plots in SAS. A spaghetti plot is a type of line plot that contains many lines. Spaghetti plots are used in longitudinal studies to show trends among individual subjects, which can be patients, hospitals, companies, states, or countries. I showed ways to

What is a spaghetti plot? Spaghetti plots are line plots that involve many overlapping lines. Like spaghetti on your plate, they can be hard to unravel, yet for many analysts they are a delicious staple of data visualization. This article presents the good, the bad, and the messy about spaghetti

We recently had a flooding event at Jordan Lake where the water rose almost 20 feet above normal. This blog details that flooding event in both photos and graphs. If you're intrigued by weather, boats, or lakes then this blog's for you! In NC's Research Triangle Park area, there are basically two

Last week I showed how to use PROC EXPAND to compute moving averages and other rolling statistics in SAS. Unfortunately, PROC EXPAND is part of SAS/ETS software and not every SAS site has a license for SAS/ETS. For simple moving averages, you can write a DATA step program, as discussed

A common question on SAS discussion forums is how to compute a moving average in SAS. This article shows how to use PROC EXPAND and contains links to articles that use the DATA step or macros to compute moving averages in SAS. In a previous post, I explained how to

A moving average (also called a rolling average) is a statistical technique that is used to smooth a time series. Moving averages are used in finance, economics, and quality control. You can overlay a moving average curve on a time series to visualize how each value compares to a rolling

In SAS, the aspect ratio of a graph is the physical height of the graph divided by the physical width. Recently I demonstrated how to set the aspect ratio of graphs in SAS by using the ASPECT= option in PROC SGPLOT or by using the OVERLAYEQUATED statement in the Graph

Macroeconometrics is not dead: (and I wish I had paid better attention in my time series course): I wrote this on the way to see one of our manufacturing clients in Austin, Texas, anticipating a discussion how to use vector autoregressive models in process control. It is a typical use

With a major election coming next year, I was wondering if there have been any shifts & changes in the voters in my state. This seems like an interesting opportunity for some data analysis, eh!?! To get you into the spirit of elections, here's an "I Voted" sticker from my friend

Gartner has stated that there are nearly five billion connected devices throughout the world today and predicts that there will be more than 25 billion by 2020, making the potential of this technology unlimited. The connected devices in industrial settings, in personal devices, and in our homes are creating a

You've probably heard of a random walk, but have you heard about the drunkard's walk? I've previously written about how to simulate a one-dimensional random walk in SAS. In the random walk, you imagine a person who takes a series of steps where the step size and direction is a

I saw an interesting graph on dadaviz.com that claimed Italians had gone from drinking twice as much as Americans in 1970, to less than Americans in recent years. The data analyst in me just had to "independently verify" this factoid ... But before I get into the technical part of this

The date of Easter influences our leisure activities Different from many other public holidays, Easter is a so-called movable holiday. This means that the Easter bunny brings more than just eggs for the statistician - he brings special Easter forecasting challenges. In the year 325 CE the Council for Nicea

I recently wrote about how to overlay multiple curves on a single graph by reshaping wide data (with many variables) into long data (with a grouping variable). The implementation used PROC TRANSPOSE, which is a procedure in Base SAS. When you program in the SAS/IML language, you might encounter data

Data. To a statistician, data are the observed values. To a SAS programmer, analyzing data requires knowledge of the values and how the data are arranged in a data set. Sometimes the data are in a "wide form" in which there are many variables. However, to perform a certain analysis

After the legalization of recreational marijuana use in Colorado in 2012, it has been a much more frequent news topic than before - even from a data analysis perspective... I was recently looking for 'interesting' data to analyze with SAS, and I noticed some articles about the increasing potency of marijuana in

You’ve heard about the smart grid, but what is it that makes the grid smart? I’ve been working on a project with Duke Energy and NC State University doing time-series analysis on data from Phasor Measurement Units (PMUs) that illustrates the intelligence in the grid as well as an interesting

This post will violate the “what happens in Vegas stays in Vegas” rule, because last week I had the pleasure of attending and participating in the Analytics 2014 event there and want to share some of what I heard for those who couldn’t attend. I was joined by over 1,000