I've got this buddy, Carter Johnson - he's a little bit crazy, but a lot of fun to follow... He holds/held several different long-distance paddling world records, and was one of the coaches for the group that paddled kayaks from Cuba to the US (see my blog post). A few
Search Results: sgplot (958)
You've probably seen a population pyramid, such as this one I showed in a previous blog post. But let's scrutinize population pyramids a bit deeper, with an eye on special features that can make them even more useful! I was inspired to give population trees a second look by this
It is well known that classical estimates of location and scale (for example, the mean and standard deviation) are influenced by outliers. In the 1960s, '70s, and '80s, researchers such as Tukey, Huber, Hampel, and Rousseeuw advocated analyzing data by using robust statistical estimates such as the median and the
When data contain outliers, medians estimate the center of the data better than means do. In general, robust estimates of location and scale are preferred over classical moment-based estimates when the data contain outliers or are from a heavy-tailed distribution. Thus, instead of using the mean and standard deviation of
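To make the contrast concrete, here is a small sketch in Python rather than SAS. The data values are made up for illustration. It compares the mean and standard deviation with the median and the median absolute deviation (MAD) after one gross outlier is added. The factor 1.4826 is the usual constant that makes the MAD estimate the standard deviation for normally distributed data.

```python
import statistics

def mad(data, scale=1.4826):
    """Median absolute deviation, multiplied by `scale` so that it
    estimates the standard deviation for normally distributed data."""
    center = statistics.median(data)
    return scale * statistics.median([abs(x - center) for x in data])

clean = [9.8, 10.1, 10.0, 9.9, 10.2]   # illustrative values
dirty = clean + [100.0]                 # same data plus one gross outlier

for label, data in [("clean", clean), ("with outlier", dirty)]:
    print(f"{label:>12}: mean={statistics.mean(data):7.3f}  "
          f"sd={statistics.stdev(data):7.3f}  "
          f"median={statistics.median(data):7.3f}  MAD={mad(data):6.3f}")
```

The single outlier drags the mean from 10 to 25 and inflates the standard deviation, while the median and the MAD barely move.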
A previous article discusses the definition of the Hoeffding D statistic and how to compute it in SAS. The letter D stands for "dependence." Unlike the Pearson correlation, which measures linear relationships, the Hoeffding D statistic tests whether two random variables are independent. Dependent variables have a Hoeffding D statistic
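For readers who want to see the mechanics, here is a minimal Python sketch of the rank-based formula for Hoeffding's D as it is commonly stated, scaled by 30 so that values lie between -0.5 and 1. It assumes there are no tied values; production implementations adjust the ranks for ties, which this sketch omits. The helper names `simple_ranks` and `hoeffding_d` are hypothetical.

```python
def simple_ranks(values):
    """Ranks 1..n, assuming there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def hoeffding_d(x, y):
    """Hoeffding's D (times 30) for paired data without ties."""
    n = len(x)
    if n != len(y) or n < 5:
        raise ValueError("x and y must have the same length n >= 5")
    r = simple_ranks(x)
    s = simple_ranks(y)
    # Bivariate rank: 1 + number of points that are smaller in BOTH coordinates.
    q = [1 + sum(1 for j in range(n) if x[j] < x[i] and y[j] < y[i])
         for i in range(n)]
    d1 = sum((qi - 1) * (qi - 2) for qi in q)
    d2 = sum((ri - 1) * (ri - 2) * (si - 1) * (si - 2) for ri, si in zip(r, s))
    d3 = sum((ri - 2) * (si - 2) * (qi - 1) for ri, si, qi in zip(r, s, q))
    numerator = (n - 2) * (n - 3) * d1 + d2 - 2 * (n - 2) * d3
    denominator = n * (n - 1) * (n - 2) * (n - 3) * (n - 4)
    return 30 * numerator / denominator

x = list(range(1, 11))
print(hoeffding_d(x, x))  # perfectly dependent data; prints 1.0
```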
Cindy Wang's curiosity about the Mandelbrot set led her to draw one using SAS Visual Analytics.
There are many statistics that measure whether two continuous random variables are independent or whether they are related to each other in some way. The best-known statistic is Pearson's correlation, which is a parametric measure of the linear relationship between two variables. A related measure is Spearman's rank correlation,
Ranking is a fundamental concept in statistics. Ranks of univariate data are used by statisticians to estimate statistics such as percentiles (quantiles) and empirical distributions. A more advanced use is to compute various rank-based measures of correlation or association between pairs of variables. For example, ranks are used to compute
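As a small illustration of ranks in action, the following Python sketch (not SAS) assigns ranks using the common convention that tied values receive the average of their ranks, then computes Spearman's rank correlation as the Pearson correlation of the ranks. The function names are illustrative.

```python
import math

def average_ranks(values):
    """Rank data 1..n; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]           # monotone but not linear
print(pearson(x, y), spearman(x, y))
```

Because the relationship is monotone, Spearman's correlation equals 1, whereas Pearson's correlation is less than 1 because the relationship is not linear.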
Have you ever brought home a piece of furniture-in-a-box, and felt undue stress while trying to make sense of the directions to assemble it? ... Apparently you're not alone! A recent analysis studied ~50,000 tweets about IKEA furniture, and determined whether the people posting the tweets were frustrated. They then
Most introductory statistics courses introduce the bar chart as a way to visualize the frequency (counts) for a categorical variable. A vertical bar chart places the categories along the horizontal (X) axis and shows the counts (or percentages) on the vertical (Y) axis. The vertical bar chart is a precursor
A previous article discusses how to interpret regression diagnostic plots that are produced by SAS regression procedures such as PROC REG. In that article, two of the plots indicate influential observations and outliers. Intuitively, an observation is influential if its presence changes the parameter estimates for the regression by "more
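The phrase "changes the parameter estimates" can be made concrete with a leave-one-out computation. The following Python sketch uses illustrative data and handles only simple linear regression: it refits the line with each observation deleted and reports how much the slope changes. This is an unscaled DFBETA-type quantity; SAS procedures report scaled versions such as DFBETAS, which this sketch does not reproduce.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the least squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def slope_changes(x, y):
    """Full-data slope minus the slope refit without observation i, for each i."""
    _, full_slope = ols_line(x, y)
    changes = []
    for i in range(len(x)):
        _, slope_i = ols_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        changes.append(full_slope - slope_i)
    return changes

x = [1, 2, 3, 4, 5, 10]             # last point has high leverage
y = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0]  # ...and does not follow the trend
for i, delta in enumerate(slope_changes(x, y), start=1):
    print(f"obs {i}: slope change = {delta:+.3f}")
```

In this example, deleting the sixth observation changes the slope far more than deleting any other point, which is what "influential" means.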
This article shows how to use PROC SGPLOT in SAS to create the scatter plot shown to the right. The scatter plot has the following features: The colors of markers are determined by the value of a third variable. The outline of each marker is the same color (such as
Linear programming (LP) and mixed integer linear programming (MILP) solvers are powerful tools. Many real-world business problems, including facility location, production planning, job scheduling, and vehicle routing, naturally lead to linear optimization models. Sometimes a model that is not quite linear can be transformed to an equivalent linear model to reduce
Here is an interesting math question: How many reduced fractions in the interval (0, 1) have a denominator less than 100? The question is difficult because of the word "reduced." If we only care about the total number of fractions in (0, 1) whose denominator is less than 100, we
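One way to see why the word "reduced" matters: for each denominator q, the number of numerators p in 1, ..., q-1 with gcd(p, q) = 1 is Euler's totient φ(q), so the count is the sum of φ(q) for q = 2, ..., 99. The following Python sketch computes that sum with a sieve and checks it against a brute-force gcd count. This is one approach and not necessarily the method used in the post.

```python
from math import gcd

def count_reduced_brute(max_den):
    """Count reduced fractions p/q in (0, 1) with 2 <= q < max_den by brute force."""
    return sum(1 for q in range(2, max_den)
               for p in range(1, q) if gcd(p, q) == 1)

def totients(n):
    """Euler's totient phi(k) for k = 0..n, computed with a sieve."""
    phi = list(range(n + 1))
    for p in range(2, n + 1):
        if phi[p] == p:                  # untouched so far, so p is prime
            for k in range(p, n + 1, p):
                phi[k] -= phi[k] // p    # multiply by (1 - 1/p)
    return phi

def count_reduced(max_den):
    """Sum of phi(q) for q = 2 .. max_den - 1."""
    phi = totients(max_den - 1)
    return sum(phi[2:max_den])

print(count_reduced(100), count_reduced_brute(100))
```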
This is another in my series of blog posts where I take a deep dive into converting customized R graphs into SAS graphs. Today we'll be working on bar charts ... And to give you a hint about what data I'll be using this time, here's a picture from a SAS
A SAS customer wanted to compute the cumulative distribution function (CDF) of the generalized gamma distribution. For any continuous distribution, the CDF is the integral of the probability density function (PDF), which usually has an explicit formula. Accordingly, he wanted to compute the CDF by using the QUAD function in
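The idea "CDF = integral of the PDF" can be sketched in a few lines. The following Python code is an illustration, not the SAS/IML QUAD approach from the post. It assumes Stacy's parameterization of the generalized gamma density (scale a, shape parameters d and p) and integrates the density with composite Simpson's rule. It also assumes d >= 1 so that the density is finite at 0. For d = p = 1 the distribution reduces to an exponential distribution, which provides a closed-form check.

```python
import math

def gengamma_pdf(x, a, d, p):
    """Generalized gamma density in Stacy's parameterization (an assumption):
    f(x) = p * x**(d-1) * exp(-(x/a)**p) / (a**d * Gamma(d/p)) for x >= 0."""
    if x < 0:
        return 0.0
    return p * x ** (d - 1) * math.exp(-(x / a) ** p) / (a ** d * math.gamma(d / p))

def gengamma_cdf(x, a, d, p, n=2000):
    """CDF obtained by integrating the PDF from 0 to x with composite
    Simpson's rule on n (even) subintervals. Assumes d >= 1."""
    if x <= 0:
        return 0.0
    h = x / n
    total = gengamma_pdf(0.0, a, d, p) + gengamma_pdf(x, a, d, p)
    for i in range(1, n):
        weight = 4 if i % 2 == 1 else 2
        total += weight * gengamma_pdf(i * h, a, d, p)
    return total * h / 3

# Check: d = p = 1 gives an exponential distribution with mean a.
print(gengamma_cdf(3.0, a=2.0, d=1.0, p=1.0), 1 - math.exp(-1.5))
```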
This is my Pi Day post for 2021. Every year on March 14th (written 3/14 in the US), geeky mathematicians and their friends celebrate "all things pi-related" because 3.14 is the three-digit approximation to pi. Most years I write about lower-case pi (π), which is the ratio of a circle's
I recently learned about a new feature in PROC QUANTREG that was added in SAS/STAT 15.1 (part of SAS 9.4M6). Recall that PROC QUANTREG enables you to perform quantile regression in SAS. (If you are not familiar with quantile regression, see an earlier article that describes quantile regression and provides
I recently wrote about a simple statistical formula that approximates the wind chill temperature, which is the cumulative effect of air temperature and wind on the human body. The formula uses two independent variables (air temperature and wind speed) to predict the wind chill temperature. This article describes how to
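For reference, the wind chill formula that the US National Weather Service has used since 2001 is shown below as a Python function. This sketch assumes that it is the formula the post refers to. Temperature is in degrees Fahrenheit, wind speed is in miles per hour, and the formula is intended for temperatures at or below 50 °F and wind speeds of at least 3 mph.

```python
def wind_chill_f(temp_f, wind_mph):
    """NWS wind chill temperature in deg F (assumed to be the formula in question).
    temp_f: air temperature in deg F; wind_mph: wind speed in mph."""
    if temp_f > 50 or wind_mph < 3:
        raise ValueError("formula is intended for T <= 50 F and V >= 3 mph")
    v = wind_mph ** 0.16
    return 35.74 + 0.6215 * temp_f - 35.75 * v + 0.4275 * temp_f * v

for wind in (5, 10, 20, 30):
    print(wind, round(wind_chill_f(20, wind), 1))
```

At 20 °F, a 10 mph wind gives a wind chill of about 9 °F, and the value keeps dropping as the wind speed increases.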
A previous article describes how to use the SGPANEL procedure to visualize subgroups of data. It focuses on using headers to display information about each graph. In the example, the data are time series for the price of several stocks, and the headers include information about whether the stock price
Many characteristics of a graph are determined by the underlying data at run time. A familiar example is when you use colors to indicate different groups in the data. If the data have three groups, you see three colors. If the data have four groups, you see four colors. The
This is another in my series of blogs where I take a deep dive into converting a customized R graph into a SAS graph. Today I'm focusing on a diverging bar chart (where one bar segment is above the zero line, and the other is below). What type of data
This is another in my series of blogs where I take a deep dive into converting a customized R graph into a SAS ODS Graphics graph. This time the example is a needle plot (essentially a bar plot with lots of tiny bars, plotted along a continuous x axis).
In a previous article, I showed how to generate random points uniformly inside a d-dimensional sphere. In that article, I stated the following fact: If Y is drawn from the standard (uncorrelated, unit-variance) multivariate normal distribution, then S = Y / ||Y|| has the uniform distribution on the unit sphere. I was
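The fact is easy to use in practice. Here is a short Python sketch (the post itself uses SAS) that draws Y from the standard multivariate normal distribution, that is, d independent N(0, 1) coordinates, and divides by the Euclidean norm. The function name `random_on_sphere` is illustrative.

```python
import math
import random

def random_on_sphere(d, rng=random):
    """One point uniformly distributed on the unit sphere in R^d."""
    y = [rng.gauss(0.0, 1.0) for _ in range(d)]   # Y ~ MVN(0, I_d)
    norm = math.sqrt(sum(t * t for t in y))        # ||Y||
    return [t / norm for t in y]                   # S = Y / ||Y||

rng = random.Random(2021)
for s in (random_on_sphere(3, rng) for _ in range(5)):
    print([round(c, 3) for c in s], round(math.sqrt(sum(c * c for c in s)), 12))
```

To obtain points uniformly inside the unit ball instead, a standard trick is to multiply S by U**(1/d), where U is uniform on (0, 1).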
In the past, Sanjay showed how to create several basic graphs using both R and SAS ODS Graphics code. I'm going to take a bit of a "deeper dive" and focus a series of blog posts on highly customized graphs. Hopefully the code for these customizations will provide you with
The inverse gamma distribution is a continuous probability distribution that is used in Bayesian analysis and in some statistical models. The inverse gamma distribution is closely related to the gamma distribution. For any probability distribution, it is essential to know how to compute four functions: the PDF function, which returns
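A minimal Python sketch of two of those four functions, the PDF and random variate generation, is shown below. It assumes the shape/scale parameterization in which the density is β^α / Γ(α) x^(-α-1) exp(-β/x); parameterizations vary between references, so treat that choice as an assumption. The random variates use the relationship to the gamma distribution: if X has a gamma distribution with shape α and rate β, then 1/X has an inverse gamma distribution.

```python
import math
import random

def invgamma_pdf(x, alpha, beta):
    """Inverse gamma density with shape alpha and scale beta (assumed
    parameterization): beta**alpha / Gamma(alpha) * x**(-alpha-1) * exp(-beta/x)."""
    if x <= 0:
        return 0.0
    log_f = (alpha * math.log(beta) - math.lgamma(alpha)
             - (alpha + 1) * math.log(x) - beta / x)
    return math.exp(log_f)

def invgamma_rand(alpha, beta, rng=random):
    """If X ~ Gamma(shape=alpha, rate=beta), then 1/X ~ InvGamma(alpha, beta).
    random.gammavariate uses a (shape, scale) parameterization, so scale = 1/beta."""
    return 1.0 / rng.gammavariate(alpha, 1.0 / beta)

rng = random.Random(314)
sample = [invgamma_rand(3.0, 2.0, rng) for _ in range(100_000)]
print(sum(sample) / len(sample))   # theoretical mean is beta / (alpha - 1) = 1
```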
Years ago, I wrote about how to compute the incomplete beta function in SAS. Recently, a SAS programmer asked about a similar function, called the incomplete gamma function. The incomplete gamma function is a "special function" that arises in applied math, physics, and statistics. You should not confuse the gamma
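To make the definition concrete: the lower incomplete gamma function is γ(a, x) = ∫₀ˣ t^(a-1) e^(-t) dt, and dividing by Γ(a) gives the regularized version P(a, x), which is also the CDF of a gamma distribution with shape a and unit scale. The Python sketch below evaluates P(a, x) from its standard power series. It is an illustration, not the SAS approach from the post, and the function name `reg_lower_gamma` is hypothetical.

```python
import math

def reg_lower_gamma(a, x, tol=1e-15, max_terms=100_000):
    """Regularized lower incomplete gamma P(a, x) = gamma(a, x) / Gamma(a), via
    P(a, x) = x**a * exp(-x) / Gamma(a+1) * sum_{n>=0} x**n / ((a+1)(a+2)...(a+n))."""
    if a <= 0 or x < 0:
        raise ValueError("requires a > 0 and x >= 0")
    if x == 0:
        return 0.0
    term = 1.0    # n = 0 term of the series
    total = 1.0
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    # Form x**a * exp(-x) / Gamma(a+1) on the log scale to avoid overflow.
    return total * math.exp(a * math.log(x) - x - math.lgamma(a + 1))

# Checks against closed forms: P(1, x) = 1 - e^(-x) and P(2, x) = 1 - (1 + x) e^(-x).
print(reg_lower_gamma(1.0, 2.0), 1 - math.exp(-2.0))
print(reg_lower_gamma(2.0, 3.0), 1 - 4 * math.exp(-3.0))
```

The series converges for every x > 0 but needs many terms when x is much larger than a; numerical libraries typically switch to a continued fraction for the upper function Q(a, x) = 1 - P(a, x) in that regime.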
I recently had a discussion with a friend, and we were wondering about Apple's market share. This led me to look into the actual data ... finding the online charts lacking, and then designing my own charts. Follow along if you're curious about the process of improving the charts, or
Over 57 billion minutes of The Office were streamed in 2020. My family bears some responsibility. Here's our activity visualized -- using SAS.
Have you ever heard of the DOLIST syntax? You might know the syntax even if you are not familiar with the name. The DOLIST syntax is a way to specify a list of numerical values to an option in a SAS procedure. Applications include: Specify the end points for bins