For a time series { y_1, y_2, ..., y_N }, the difference operator computes the difference between two observations. The lag-k difference is the series { y_{k+1} - y_1, ..., y_N - y_{N-k} }. In SAS, the DIF function in the DATA step computes differences between observations. The DIF function

## Tag: **time series**

Time series machine learning techniques show great promise for the analysis of health care wearable data. As our busy lifestyles render continuous monitoring more and more essential, the need to analyze data to find correlations between these data streams becomes even more important, because they can provide important cues to

Last week I showed how to represent a Markov transition matrix in the SAS/IML matrix language. I also showed how to use matrix multiplication to iterate a state vector, thereby producing a discrete-time forecast of the state of the Markov chain system. This article shows that the expected behavior of

Many computations in elementary probability assume that the probability of an event is independent of previous trials. For example, if you toss a coin twice, the probability of observing "heads" on the second toss does not depend on the result of the first toss. However, there are situations in which

I have previously shown how to overlay basic plots on box plots when all plots share a common discrete X axis. It is interesting to note that box plots can also be overlaid on a continuous (interval) axis. You often need to bin the data before you create the plot.

Last week I discussed how to create spaghetti plots in SAS. A spaghetti plot is a type of line plot that contains many lines. Spaghetti plots are used in longitudinal studies to show trends among individual subjects, which can be patients, hospitals, companies, states, or countries. I showed ways to

What is a spaghetti plot? Spaghetti plots are line plots that involve many overlapping lines. Like spaghetti on your plate, they can be hard to unravel, yet for many analysts they are a delicious staple of data visualization. This article presents the good, the bad, and the messy about spaghetti

We recently had a flooding event at Jordan Lake where the water rose almost 20 feet above normal. This blog details that flooding event in both photos and graphs. If you're intrigued by weather, boats, or lakes then this blog's for you! In NC's Research Triangle Park area, there are basically two

Last week I showed how to use PROC EXPAND to compute moving averages and other rolling statistics in SAS. Unfortunately, PROC EXPAND is part of SAS/ETS software and not every SAS site has a license for SAS/ETS. For simple moving averages, you can write a DATA step program, as discussed

A common question on SAS discussion forums is how to compute a moving average in SAS. This article shows how to use PROC EXPAND and contains links to articles that use the DATA step or macros to compute moving averages in SAS. In a previous post, I explained how to

A moving average (also called a rolling average) is a statistical technique that is used to smooth a time series. Moving averages are used in finance, economics, and quality control. You can overlay a moving average curve on a time series to visualize how each value compares to a rolling

In SAS, the aspect ratio of a graph is the physical height of the graph divided by the physical width. Recently I demonstrated how to set the aspect ratio of graphs in SAS by using the ASPECT= option in PROC SGPLOT or by using the OVERLAYEQUATED statement in the Graph

Macroeconometrics is not dead (and I wish I had paid better attention in my time series course): I wrote this on the way to see one of our manufacturing clients in Austin, Texas, anticipating a discussion of how to use vector autoregressive models in process control. It is a typical use

With a major election coming next year, I was wondering if there have been any shifts & changes in the voters in my state. This seems like an interesting opportunity for some data analysis, eh!?! To get you into the spirit of elections, here's an "I Voted" sticker from my friend

Gartner has stated that there are nearly five billion connected devices throughout the world today and predicts that there will be more than 25 billion by 2020, making the potential of this technology unlimited. The connected devices in industrial settings, in personal devices, and in our homes are creating a

You've probably heard of a random walk, but have you heard about the drunkard's walk? I've previously written about how to simulate a one-dimensional random walk in SAS. In the random walk, you imagine a person who takes a series of steps where the step size and direction are a

I saw an interesting graph on dadaviz.com that claimed Italians had gone from drinking twice as much as Americans in 1970, to less than Americans in recent years. The data analyst in me just had to "independently verify" this factoid ... But before I get into the technical part of this

The date of Easter influences our leisure activities. Unlike many other public holidays, Easter is a so-called movable holiday. This means that the Easter bunny brings more than just eggs for the statistician - he brings special Easter forecasting challenges. In the year 325 CE the Council of Nicaea

I recently wrote about how to overlay multiple curves on a single graph by reshaping wide data (with many variables) into long data (with a grouping variable). The implementation used PROC TRANSPOSE, which is a procedure in Base SAS. When you program in the SAS/IML language, you might encounter data

Data. To a statistician, data are the observed values. To a SAS programmer, analyzing data requires knowledge of the values and how the data are arranged in a data set. Sometimes the data are in a "wide form" in which there are many variables. However, to perform a certain analysis

Since the legalization of recreational marijuana use in Colorado in 2012, marijuana has been a much more frequent news topic than before - even from a data analysis perspective... I was recently looking for 'interesting' data to analyze with SAS, and I noticed some articles about the increasing potency of marijuana in

You’ve heard about the smart grid, but what is it that makes the grid smart? I’ve been working on a project with Duke Energy and NC State University doing time-series analysis on data from Phasor Measurement Units (PMUs) that illustrates the intelligence in the grid as well as an interesting

This post will violate the “what happens in Vegas stays in Vegas” rule, because last week I had the pleasure of attending and participating in the Analytics 2014 event there and want to share some of what I heard for those who couldn’t attend. I was joined by over 1,000

It's easy to plot events that happened at a certain time, but what about events that extended over a range of dates, such as recessions? ... This blog post teaches you a nice trick to use for that! Let's say you have a plot of the labor force participation rate

In light of the recent reports that glaciers in Antarctica are melting, what SAS graphs might be useful in analyzing the data?... When floating sea ice melts (such as at the North Pole), it doesn't raise the sea level - but when ice on land melts (such as glaciers at

Vector languages such as SAS/IML, MATLAB, and R are powerful because they enable you to use high-level matrix operations (matrix multiplication, dot products, etc.) rather than loops that perform scalar operations. In general, vectorized programs are more efficient (and therefore run faster) than programs that contain loops. For an example

Wavelet analysis is an exciting and relatively new field of study that enables one to extract underlying patterns either from spatially varying or temporally varying data. Pixel values representing the relative brightness and color that constitute an image are an example of spatially varying data, and daily variations of financial

Finding the maximum value of a function is an important task in statistics. There are three approaches to finding a maximum: When the function is available as an analytic expression, you can use an optimization algorithm to find the maximum. For example, in the SAS/IML language, you can use any

Don’t worry! This is not an excerpt from a romantic love letter. The title of this blog post is an allusion to my talk on "Missing Values", at the A2013 conference in June in London. There is not much time for emotions: dealing with missing values in analysis is not

To a statistician, the DIF function (which was introduced in SAS/IML 9.22) is useful for time series analysis. To a numerical analyst and a statistical programmer, the function has many other uses, including computing finite differences. The DIF function computes the difference between the original vector and a shifted version