Automate the placement of reference lines in PROC SGPLOT

The REFLINE statement in PROC SGPLOT is one of my favorite ways to augment statistical graphics such as scatter plots, series plots, and histograms. The REFLINE statement overlays a vertical or horizontal reference line on a graph. You can specify the location of the reference lines on the REFLINE statement.

Robust statistics for skewness and kurtosis

Intuitively, the skewness of a unimodal distribution indicates whether a distribution is symmetric or not. If the right tail has more mass than the left tail, the distribution is "right skewed." If the left tail has more mass, the distribution is "left skewed." Thus, estimating skewness requires some estimates about

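[Getting started with deep learning in SAS #2] Improving deep learning performance with hyperparameter tuning

The previous installment of this deep learning series covered building deep learning models with SAS Visual Data Mining and Machine Learning, specifically the basic deep neural network (DNN) model architecture and building a DNN model with batch normalization. This installment introduces how to tune hyperparameters to improve deep learning performance. Performance improves over a period of time and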

The expected value of the tail of a distribution

The expected value of a random variable is essentially a weighted mean over all possible values. You can compute it by summing (or integrating) a probability-weighted quantity over all possible values of the random variable. The expected value is a measure of the "center" of a probability distribution. You can
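For a discrete random variable, the "probability-weighted sum" described above can be written in a few lines. This is only a minimal sketch in Python; the function name `expected_value` and the fair-die example are illustrative assumptions, not taken from the article.

```python
def expected_value(values, probs):
    """Return the probability-weighted sum of a discrete random variable."""
    # Guard against weights that do not form a probability distribution.
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probs))


# Hypothetical example: a fair six-sided die, each face with probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
die_mean = expected_value(faces, [1 / 6] * 6)
```

For a continuous variable, the sum would be replaced by an integral of x times the density.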

Tips to simulate binary and categorical variables

When there are two equivalent ways to do something, I advocate choosing the one that is simpler and more efficient. Sometimes, I encounter a SAS program that simulates random numbers in a way that is neither simple nor efficient. This article demonstrates two improvements that you can make to your

Coronavirus: per million, per 100k, or percent?

A user commented on one of my previous maps ... "How can there be 820 cases of Coronavirus per 100,000 people? - There aren't even 100,000 people in my county!" Well, when you want to compare something like the number of COVID-19 cases between two areas that have differing populations,
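The arithmetic behind a "per 100,000" figure is a simple rescaling, sketched below in Python. The function name and the county numbers are hypothetical and chosen only to reproduce a rate of 820 per 100,000 in a county with fewer than 100,000 residents.

```python
def rate_per(cases, population, per=100_000):
    """Scale a raw count to a rate per `per` people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * per / population


# Hypothetical county: 82 cases among 10,000 residents.
county_rate = rate_per(82, 10_000)               # cases per 100,000 people
county_percent = rate_per(82, 10_000, per=100)   # the same rate as a percent
```

The rate does not require the county to have 100,000 residents; it only expresses the same proportion on a common scale so that areas of different sizes can be compared.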

Do low mortgage rates bring you joy(plots)?

When it comes to plotting mortgage rate data, I often look to Len Kiefer for inspiration. He recently posted a retro-looking graph on Twitter that caught my eye ... and of course I had to see if I could create something similar using SAS. For lack of a better term,

Confidence intervals for eigenvalues of a correlation matrix

A fundamental principle of data analysis is that a statistic is an estimate of a parameter for the population. A statistic is calculated from a random sample. This leads to uncertainty in the estimate: a different random sample would have produced a different statistic. To quantify the uncertainty, SAS procedures

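[Getting started with deep learning in SAS #1] Building a basic DNN and a DNN model with batch normalization

Deep learning is an area of machine learning that has become ubiquitous along with artificial intelligence. The complex, brain-like structure of deep learning models is used to find complex patterns in large volumes of data. These models have greatly improved the performance of general supervised learning models, time series, speech recognition, object detection and classification, and sentiment analysis. Instead of organizing data to run through predefined equations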

Generate random points in a polygon

The triangulation theorem for polygons says that every simple polygon can be triangulated. In fact, if the polygon has V vertices, you can decompose it into V-2 non-overlapping triangles. In this article, a "polygon" always means a simple polygon. Also, a "random point" means one that is drawn at random

Early voting in North Carolina (2020 vs 2016)

Here in the United States, we have our general election (where we elect the president) every four years - and 2020 happens to be one of those election years. This time we seem to have a lot more people voting early. I can't tell you the reason they're voting early

Generate random points in a triangle

How can you efficiently generate N random uniform points in a triangular region of the plane? There is a very cool algorithm (which I call the reflection method) that makes the process easy. I no longer remember where I saw this algorithm, but it is different from the "weighted average"
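The excerpt does not spell out the algorithm, so the following Python sketch is an assumption about what a reflection-style method looks like: draw two uniform variates, and if the resulting point falls in the "wrong half" of the parallelogram spanned by two edges, reflect it back into the triangle. The function name and the example vertices are hypothetical.

```python
import random


def random_point_in_triangle(a, b, c, rng=random):
    """Return a uniform random point in the triangle with vertices a, b, c."""
    u1, u2 = rng.random(), rng.random()
    # (u1, u2) is uniform in the unit square. If it lies beyond the diagonal,
    # reflect it through the center so that u1 + u2 <= 1.
    if u1 + u2 > 1.0:
        u1, u2 = 1.0 - u1, 1.0 - u2
    # Map to the triangle: start at a and move along edges (b - a) and (c - a).
    x = a[0] + u1 * (b[0] - a[0]) + u2 * (c[0] - a[0])
    y = a[1] + u1 * (b[1] - a[1]) + u2 * (c[1] - a[1])
    return (x, y)


# Hypothetical usage: 1,000 points in a right triangle, with a fixed seed.
gen = random.Random(12345)
points = [random_point_in_triangle((0, 0), (1, 0), (0, 1), gen) for _ in range(1000)]
```

No points are rejected, so generating N points costs exactly 2N uniform draws.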

SAS for data science? Yes!

The evolution of analytics and data science drives constant updates and transformations in analytics platforms. This article aims to show how SAS has kept pace with that evolution. Integrated environment: a single platform, many tasks. SAS offers capabilities that let you access, explore, transform, analyze, and

The Poisson-binomial distribution for hundreds of parameters

A previous article shows how to use a recursive formula to compute exact probabilities for the Poisson-binomial distribution. The recursive formula is an O(N^2) computation, where N is the number of parameters for the Poisson-binomial (PB) distribution. If you have a distribution that has hundreds (or even thousands) of parameters,
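To see where the quadratic cost comes from, here is a minimal Python sketch of one standard recursion for the PB probabilities: add one Bernoulli trial at a time and update the whole probability vector. It is an assumption that this matches the formula in the earlier article; the function name is hypothetical.

```python
def poisson_binomial_pmf(probs):
    """Return [P(X=0), ..., P(X=N)] for the sum of independent Bernoulli(p_i)."""
    pmf = [1.0]  # with zero trials, X = 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # the new trial fails: count stays at k
            nxt[k + 1] += mass * p       # the new trial succeeds: count becomes k + 1
        pmf = nxt
    return pmf


# Hypothetical parameters: three trials with different success probabilities.
pb = poisson_binomial_pmf([0.1, 0.2, 0.7])
```

The i-th pass touches i entries, so N parameters require roughly N^2/2 updates, which is why the approach becomes slow when N is in the thousands.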

NC's voter registration data, for the 2020 election

When an election is on the horizon, I always feel compelled to plot some data! And this time I'm plotting North Carolina's voter registration data! State Data For this graph, I downloaded several of the data snapshots from the NC State Board of Elections' website, and plotted a line showing

Open Source Model Management through REST APIs: Registration

The model management process, which is part of ModelOps, consists of registration, deployment, monitoring and retraining. This post is part of a series examining the model management process, orchestrated through the Model Manager (MM) APIs. The focus of part one is on model registration, specifically on using the APIs from

How to access data in Microsoft Azure Blob storage from SAS (part 2)

Trap and map: Trapping invalid values

Finite-precision computations can be tricky. You might know, mathematically, that a certain result must be non-negative or must be within a certain interval. However, when you actually compute that result on a computer that uses finite-precision, you might observe that the value is slightly negative or slightly outside of the
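One way to handle the situation is to trap values that are invalid only by a rounding error and map them to the nearest valid value. The Python sketch below is an illustration under assumed names and an assumed tolerance of 1e-12; it is not the article's code.

```python
import math


def safe_sqrt(x, tol=1e-12):
    """Square root that forgives tiny negative inputs caused by rounding."""
    # Trap: a value that should be non-negative but is slightly below zero.
    if -tol <= x < 0.0:
        x = 0.0  # Map: replace it with the nearest valid value.
    # Inputs that are clearly negative still raise ValueError.
    return math.sqrt(x)


def clamp(x, lo, hi):
    """Map a value that drifted slightly outside [lo, hi] back into the interval."""
    return max(lo, min(hi, x))


root = safe_sqrt(-1e-17)               # treated as sqrt(0)
corr = clamp(1.0000000000000002, -1.0, 1.0)   # e.g. a correlation just above 1
```

Keeping the tolerance small means that real bugs, which tend to produce values far outside the valid range, are still reported rather than silently hidden.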

How to refactor SAS code to leverage SAS Viya

If you're a SAS programmer who now uses SAS Viya and CAS, it's worth your time to optimize your existing programs to take advantage of the new environment. This post is a continuation of my SAS Global Forum 2020 paper Best Practices for Converting SAS® Code to Leverage SAS® Cloud

How to access data in Microsoft Azure Blob storage from SAS (part 1)

Visualizing 3 waves of COVID-19 in the US

Now that we are many months into the COVID-19 pandemic, I've started going back and reexamining the data for lessons or trends (you might say hindsight is 20/20). This time, I want to explore how COVID-19 has been spreading around the US. I do this by using a graphical idea

The Poisson-binomial distribution

The Poisson-binomial distribution is a generalization of the binomial distribution. For the binomial distribution, you carry out N independent and identical Bernoulli trials. Each trial has a probability, p, of success. The total number of successes, which can be between 0 and N, is a binomial random variable. The distribution