Get the right information, with visual impact, to the people who need it

SAS' Kris Stobbe shows how you can predict survival rates of Titanic passengers with a combination of Python and CAS using SWAT, then see how the models performed.
A previous article shows how to interpret the collinearity diagnostics that are produced by PROC REG in SAS. The process involves scanning down numbers in a table in order to find extreme values. This can be a tedious and error-prone process. Friendly and Kwan (2009) compare this task to a
There are many ways to add more "visual impact" to your maps. Some techniques grab the users' attention, but often don't add anything useful to the message the map is trying to convey (such as 3D tricks, or flashy/gratuitous images and infographics). I encourage you to design maps that have
Starting your data scientist journey? Want to build your own predictive models? SAS' Xavier Bizoux shows you how to use SAS Visual Analytics to identify which model is likely to perform the best.
A SAS programmer wanted to create a graph that illustrates how Deming regression differs from ordinary least squares regression. The main idea is shown in the panel of graphs below. The first graph shows the geometry of least squares regression when we regress Y onto X. ("Regress Y onto X"
Cancer touches nearly everyone. You probably know at least one person who's been diagnosed with cancer -- many of us know many more than one. It's the second leading cause of death worldwide, behind cardiovascular disease. World Cancer Day is observed this year on February 4, and is meant to
The COVID-19 Coronavirus outbreak has been in the news a lot lately, and everyone is probably looking for a quick/easy way to see the data. The best visualization I've seen so far is this dashboard by Johns Hopkins. Here's a screen-capture: But before we dive into the data analysis, let's
Almost everyone enjoys a good glass of wine after a long day, but did you ever stop to wonder how the exact bottle you're looking for makes its way to the grocery store shelf? Analytics has a lot to do with it, as SAS demonstrated to attendees at the National
I've read several articles that mentioned the north magnetic pole has been moving more in the past few decades than in the previous few hundred years. And as a Map Guy, I knew I just had to plot this data on a map, and see it for myself! I provide
The Johnson system (Johnson, 1949) contains a family of four distributions: the normal distribution, the lognormal distribution, the SB distribution (which models bounded distributions), and the SU distribution (which models unbounded distributions). Note that 'B' stands for 'bounded' and 'U' stands for 'unbounded.' A previous article explains the purpose of
The coronavirus has been a big item in the news lately - it's a pneumonia-like illness that has killed several people. It's thought to have started in China, and has spread to several other countries (including at least one case in the U.S.). The World Health Organization says the coronavirus
The flu season has started here in the U.S., and according to the Centers for Disease Control and Prevention (CDC) data it has caused 214 deaths in the first week of 2020. Is this number higher, or lower, than usual? When does the flu season start, and how long does
When we talk about turning data into intelligence, we are not only referring to obtaining knowledge from it to make the best decisions. It is also about having much more effective and impactful forms of communication through advanced visualization capabilities, to ensure that the knowledge is actually used
From the early days of probability and statistics, researchers have tried to organize and categorize parametric probability distributions. For example, Pearson (1895, 1901, and 1916) developed a system of seven distributions, which was later called the Pearson system. The main idea behind a "system" of distributions is that for each
Some business models segment customers by worth into categories, and often give different levels of service to the “higher worth” customers. The metric most often used for that is called Customer Lifetime Value (CLV). CLV is simply a balance sheet look at the total cost spent versus the total revenue earned over a customer’s projected tenure or “life.”
As I get older, a few of my buddies are starting to retire. And this makes me think about my own retirement (not that I'm anywhere near old enough to retire, mind you!) Therefore when I saw a list of the "Best & Worst Cities for Retiring" it caught my
In my book Simulating Data with SAS, I show how to use a graphical tool, called the moment-ratio diagram, to characterize and compare continuous probability distributions based on their skewness and kurtosis (Wicklin, 2013, Chapter 16). The idea behind the moment-ratio diagram is that skewness and kurtosis are essential for
Did you add "learn something new" to your list of New Year's resolutions? Last week, I wrote about the most popular articles from The DO Loop in 2019. The most popular articles are about elementary topics in SAS programming or univariate statistics because those topics have broad appeal. Advanced topics
If someone proposes a bet to you, then you should be suspicious that they already know they're going to win. And one frequent topic of such bets is the weather... What if I bet you there's a city in Canada with a warmer average January temperature than Raleigh, NC? You
Last year, I wrote more than 100 posts for The DO Loop blog. The most popular articles were about SAS programming tips for data analysis, statistical analysis, and data visualization. Here are the most popular articles from 2019 in each category. SAS programming tips Create training, testing, and validation data
Do you find yourself on the road during the holidays, and looking for a place to eat that's still open? Or perhaps you're like me - I don't cook at home, and I'm not really into visiting family for the holidays and eating with them. Well then, you probably know
The Rise of Skywalker, the final movie in the third set of the three Star Wars trilogies, will finally be released tomorrow (December 20, 2019). That's 9 movies, in about 42 years. And, if the first movies aren't still fresh in your mind (or perhaps you weren't even born when
From sports and health data to environmental and policy data, our data visualization experts have used SAS technologies to explore and present analyses on hundreds of topics throughout the year. We've selected some of the best in this end-of-year roundup to showcase their skills and SAS technologies, and to demonstrate
A 2-D "bin plot" counts the number of observations in each cell in a regular 2-D grid. The 2-D bin plot is essentially a 2-D version of a histogram: it provides an estimate for the density of a 2-D distribution. As I discuss in the article, "The essential guide to
I saw an article that claimed Donald Trump recently tweeted 123 times in one day. This got me wondering how many times he typically tweets during a day, and whether this number has changed over the years. This seems like it might be a good topic to analyze with a
I can tell that my area (Wake county, NC) has a growing population, because the traffic keeps getting worse and worse. But it's a little difficult to quantitatively gauge growth by looking at traffic congestion. Therefore let's have a look at a more direct measurement - the actual population data!
As we're getting into December, and the weather is getting colder, I thought it would be cool to plot some Antarctica data. You might remember I did this about 1.5 years ago, using good-old Proc Gmap, a special projection, and lots of tricky annotation. Well, this time let's use the
SAS' Leonid Batkhan shows you how to compare SAS data sets that include common and uncommon columns. You'll learn how to check-mark commonalities and color-code differences in side-by-side columns of data tables, and how to add a comments field to see greater detail.
Recently I showed how to visualize and analyze longitudinal data in which subjects are measured at multiple time points. A very common situation is that the data are collected at two time points. For example, in medicine it is very common to measure some quantity (blood pressure, cholesterol, white-blood cell
This is a second article about analyzing longitudinal data, which features measurements that are repeatedly taken on subjects at several points in time. The previous article discusses a response-profile analysis, which uses an ANOVA method to determine differences between the means of an experimental group and a placebo group. The