A colleague recently posted an article about how to use SAS Visual Analytics to create a circular graph that displays a year's worth of temperature data. Specifically, the graph shows the air temperature for each day in a year relative to some baseline temperature, such as 65F (18C). Days warmer
Search Results: sgplot (964)
I'm a bit of a boat paddling enthusiast, as you might have guessed from some of my previous paddling blog posts. The amount of exertion in race-paddling is similar to running - the longest race I've paddled so far was 13 miles (half-marathon distance). But in the
The ROC curve is a graphical method that summarizes how well a binary classifier can discriminate between two populations, often called the "negative" population (individuals who do not have a disease or characteristic) and the "positive" population (individuals who do have it). As shown in a previous article, there is
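Users often want to show the characteristics of their data with different kinds of statistical graphs. With the SG procedures in the newer versions of SAS/GRAPH, it is easy to present data in more effective statistical graphs. This article goes a step further and provides the following examples of the SG procedures in SAS/GRAPH.
<<Example 1>> <<Program: create a horizontal box plot>>
The program works as follows:
1. ODS LISTING CLOSE: closes the SAS/GRAPH output window.
2. ODS HTML PATH='C:OUTPUT' BODY='BOX.HTML': PATH names the directory and BODY names the file, so the box plot is saved in HTML format to the file C:OUTPUT BOX.HTML.
3. PROC SGPLOT: uses the SGPLOT procedure to create the horizontal box plot.
4. HBOX MPG_CITY: uses MPG_CITY as the analysis variable for the horizontal box plot.
5. CATEGORY=TYPE: uses TYPE as the grouping variable, so that one box is drawn per group.
6. ODS HTML CLOSE: closes the HTML file that receives the SAS/GRAPH output.
7. ODS LISTING: reopens the SAS/GRAPH output window for later output.
The result is a horizontal box plot of MPG_CITY for each TYPE group.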
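As a rough illustration of what a grouped box plot summarizes, the sketch below computes a five-number summary for each category. It is written in Python rather than SAS and is not the program described above. The records, the function names `box_stats` and `stats_by_category`, and the use of the plain minimum and maximum in place of whisker rules are all assumptions made for this example.

```python
from statistics import quantiles

# Hypothetical (type, city MPG) records, invented purely for illustration.
records = [
    ("Sedan", 20), ("Sedan", 24), ("Sedan", 26), ("Sedan", 28), ("Sedan", 32),
    ("SUV", 13), ("SUV", 15), ("SUV", 16), ("SUV", 18), ("SUV", 21),
]

def box_stats(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    # The "inclusive" method interpolates between the sorted data values.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))

def stats_by_category(rows):
    """Group values by category and summarize each group."""
    groups = {}
    for category, value in rows:
        groups.setdefault(category, []).append(value)
    return {category: box_stats(vals) for category, vals in groups.items()}

summary = stats_by_category(records)
for category, stats in summary.items():
    print(category, stats)
```

Each tuple corresponds to one box: the quartiles give the box and its center line, and the minimum and maximum stand in for the whisker ends.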
A previous article shows how to interpret the collinearity diagnostics that are produced by PROC REG in SAS. The process involves scanning down numbers in a table in order to find extreme values. This can be a tedious and error-prone process. Friendly and Kwan (2009) compare this task to a
A SAS programmer wanted to create a graph that illustrates how Deming regression differs from ordinary least squares regression. The main idea is shown in the panel of graphs below. The first graph shows the geometry of least squares regression when we regress Y onto X. ("Regress Y onto X"
The Johnson system (Johnson, 1949) contains a family of four distributions: the normal distribution, the lognormal distribution, the SB distribution (which models bounded distributions), and the SU distribution (which models unbounded distributions). Note that 'B' stands for 'bounded' and 'U' stands for 'unbounded.' A previous article explains the purpose of
The flu season has started here in the U.S., and according to the Centers for Disease Control and Prevention (CDC) data it has caused 214 deaths in the first week of 2020. Is this number higher, or lower, than usual? When does the flu season start, and how long does
From the early days of probability and statistics, researchers have tried to organize and categorize parametric probability distributions. For example, Pearson (1895, 1901, and 1916) developed a system of seven distributions, which was later called the Pearson system. The main idea behind a "system" of distributions is that for each
Did you add "learn something new" to your list of New Year's resolutions? Last week, I wrote about the most popular articles from The DO Loop in 2019. The most popular articles are about elementary topics in SAS programming or univariate statistics because those topics have broad appeal. Advanced topics
Many SAS procedures can automatically create a graph that overlays multiple prediction curves and their prediction limits. This graph (sometimes called a "fit plot" or a "sliced fit plot") is useful when you want to visualize a model in which a continuous response variable depends on one continuous explanatory variable
If someone proposes a bet to you, then you should be suspicious that they already know they're going to win. And one frequent topic of such bets is the weather... What if I bet you there's a city in Canada with a warmer average January temperature than Raleigh, NC? You
Last year, I wrote more than 100 posts for The DO Loop blog. The most popular articles were about SAS programming tips for data analysis, statistical analysis, and data visualization. Here are the most popular articles from 2019 in each category. SAS programming tips Create training, testing, and validation data
The Rise of Skywalker, the final movie in the third of the three Star Wars trilogies, will finally be released tomorrow (December 20, 2019). That's 9 movies, in about 42 years. And, if the first movies aren't still fresh in your mind (or perhaps you weren't even born when
A 2-D "bin plot" counts the number of observations in each cell in a regular 2-D grid. The 2-D bin plot is essentially a 2-D version of a histogram: it provides an estimate for the density of a 2-D distribution. As I discuss in the article, "The essential guide to
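The counting step behind a 2-D bin plot can be sketched in a few lines. This is a minimal Python illustration, not the SAS code from the article. The function name `bin_counts_2d`, the grid origin, the cell sizes, and the sample points are all assumptions made for the example.

```python
import math
from collections import Counter

def bin_counts_2d(points, x_min, y_min, width, height):
    """Count points per cell of a regular grid.

    Cell (i, j) covers [x_min + i*width, x_min + (i+1)*width) in x
    and [y_min + j*height, y_min + (j+1)*height) in y.
    """
    counts = Counter()
    for x, y in points:
        i = math.floor((x - x_min) / width)   # column index of the cell
        j = math.floor((y - y_min) / height)  # row index of the cell
        counts[(i, j)] += 1
    return counts

# Hypothetical sample points, invented for illustration.
points = [(0.2, 0.1), (0.7, 0.3), (1.5, 0.4), (1.1, 1.9), (0.9, 0.9)]
counts = bin_counts_2d(points, x_min=0.0, y_min=0.0, width=1.0, height=1.0)
for cell, n in sorted(counts.items()):
    print(cell, n)
```

Dividing each count by the number of points times the cell area would turn the counts into a density estimate, in the same way that a histogram can be rescaled.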
I saw an article that claimed Donald Trump recently tweeted 123 times in one day. This got me wondering how many times he typically tweets during a day, and whether this number has changed over the years. This seems like it might be a good topic to analyze with a
Recently I showed how to visualize and analyze longitudinal data in which subjects are measured at multiple time points. A very common situation is that the data are collected at two time points. For example, in medicine it is very common to measure some quantity (blood pressure, cholesterol, white-blood cell
This is a second article about analyzing longitudinal data, which features measurements that are repeatedly taken on subjects at several points in time. The previous article discusses a response-profile analysis, which uses an ANOVA method to determine differences between the means of an experimental group and a placebo group. The
Longitudinal data are used in many health-related studies in which individuals are measured at multiple points in time to monitor changes in a response variable, such as weight, cholesterol, or blood pressure. There are many excellent articles and books that describe the advantages of a mixed model for analyzing longitudinal
With time series data analysis, we can apply moving average methods to predict data points without seasonality. This includes Simple Average (SA), Simple Moving Average (SMA), Weighted Moving Average (WMA), Exponential Moving Average (EMA), etc. For series with a trend but without seasonality, we can use linear, non-linear and autoregressive
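For readers who want to see the smoothing methods named above side by side, here is a minimal Python sketch of the simple, weighted, and exponential moving averages. The function names, the sample series, the weights, and the convention of seeding the EMA with the first observation are assumptions made for illustration. They are not taken from the article.

```python
def simple_moving_average(series, window):
    """Mean of the most recent `window` values at each position."""
    return [sum(series[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(series))]

def weighted_moving_average(series, weights):
    """Weighted mean over a sliding window; the last weight applies
    to the most recent value in the window."""
    window, total = len(weights), sum(weights)
    return [sum(w * v for w, v in zip(weights, series[i - window + 1 : i + 1])) / total
            for i in range(window - 1, len(series))]

def exponential_moving_average(series, alpha):
    """EMA seeded with the first observation (one common convention)."""
    ema = [series[0]]
    for value in series[1:]:
        ema.append(alpha * value + (1 - alpha) * ema[-1])
    return ema

# Hypothetical series, invented for illustration.
series = [2.0, 4.0, 6.0, 8.0]
sma = simple_moving_average(series, window=2)
wma = weighted_moving_average(series, weights=[1, 3])
ema = exponential_moving_average(series, alpha=0.5)
print(sma, wma, ema)
```

The last smoothed value of any of these series can serve as a naive one-step-ahead forecast, which is why these methods suit series without seasonality.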
What is an efficient way to evaluate a multivariate quadratic polynomial in p variables? The answer is to use matrix computations! A multivariate quadratic polynomial can be written as the sum of a purely quadratic term (degree 2), a purely linear term (degree 1), and a constant term (degree 0).
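The decomposition into quadratic, linear, and constant terms can be sketched as follows. This is a plain-Python illustration that assumes the form f(x) = xᵀAx + bᵀx + c. Some authors put a factor of 1/2 on the quadratic term, so the exact convention here is an assumption, as are the function name `quadratic_poly` and the sample coefficients.

```python
def quadratic_poly(A, b, c, x):
    """Evaluate f(x) = x'Ax + b'x + c for a p-vector x.

    A is a p x p matrix (list of rows), b is a p-vector, c is a scalar.
    """
    # Matrix-vector product A*x
    Ax = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
    quadratic = sum(x_i * ax_i for x_i, ax_i in zip(x, Ax))  # x'Ax, degree 2
    linear = sum(b_i * x_i for b_i, x_i in zip(b, x))        # b'x, degree 1
    return quadratic + linear + c                            # c, degree 0

# Hypothetical coefficients for p = 2, invented for illustration:
# f(x1, x2) = 2*x1^2 + 2*x1*x2 + 3*x2^2 - x1 + 4*x2 + 5
A = [[2, 1],
     [1, 3]]
b = [-1, 4]
c = 5
value = quadratic_poly(A, b, c, [1, 2])
print(value)
```

In a matrix language the same evaluation collapses to a single expression, which is what makes the matrix form efficient.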
The English language can be a bit tough to learn. One reason is that sometimes words can have more than one meaning. For example, the word shady can mean "of doubtful honesty or legality," or it can mean "giving shade from sunlight." Which of those meanings am I thinking about,
My colleague, Mike Drutar, recently showed how to create a "strip plot" that shows the distribution of temperatures for each calendar month at a particular location. Mike created the strip plot in SAS Visual Analytics by using a point-and-click interface. This article shows how to create a similar graph by
Biplots are two-dimensional plots that help to visualize relationships in high dimensional data. A previous article discusses how to interpret biplots for continuous variables. The biplot projects observations and variables onto the span of the first two principal components. The observations are plotted as markers; the variables are plotted as
With all this sitting at a desk writing code, I have to do something to keep in shape. And for me, that something is paddling boats ... as fast as I can - and occasionally trying to race them. This past weekend I entered the race at Hunting Island, SC.
Understanding multivariate statistics requires mastery of high-dimensional geometry and concepts in linear algebra such as matrix factorizations, basis vectors, and linear subspaces. Graphs can help to summarize what a multivariate analysis is telling us about the data. This article looks at four graphs that are often part of a principal
Find out about the changes and enhancements to the best-selling book, The Little SAS Book.
After a marathon of a season, 162 games in each team's schedule to be precise, the stakes for Major League Baseball are higher in October, and postseason play is underway. Whether it's the renewal of an old rivalry, redemption for last year's runners-up, or rooting for this season's breakout
Eliud Kipchoge recently ran a marathon in under 2 hours. It was a special marathon where they had set up the best possible conditions to help him achieve this goal (such as swapping in pace-setting runners to block the wind for him), so it won't count as the world record
In response to a recent article about how to compute the cosine similarity of observations, a reader asked whether it is practical (or even possible) to perform these types of computations on data sets that have many thousands of observations. The problem is that the cosine similarity matrix is an