Strengthen your programming skills with tips and techniques from the experts

Running SAS programs in parallel using SAS/CONNECT®

SAS' Leonid Batkhan demonstrates a popular "divide-and-conquer" efficiency strategy using SAS/CONNECT®.

The moving block bootstrap for time series

As I discussed in a previous article, the simple block bootstrap is a way to perform a bootstrap analysis on a time series. The first step is to decompose the series into additive components: Y = Predicted + Residuals. You then choose a block length (L) that divides the total

Blog posts from 2020 that deserve a second look

On The DO Loop blog, I write about a diverse set of topics, including statistical data analysis, machine learning, statistical programming, data visualization, simulation, numerical analysis, and matrix computations. In a previous article, I presented some of my most popular blog posts from 2020. The most popular articles often deal

List of 'big' movies you might not have seen yet

If you've been stuck at home a lot lately, and think you have run out of movies to watch -- think again! Here is a list of big-budget movies you might not have seen, because they flopped (lost lots of money). Follow along as I show you how I created

How to schedule and manage your SAS hot fixes

This, the third of three posts on our hot-fix process, provides a spreadsheet and tips to track and manage your SAS®9 environment.

The simple block bootstrap for time series in SAS

For ordinary least squares (OLS) regression, you can use a basic bootstrap of the residuals (called residual resampling) to perform a bootstrap analysis of the parameter estimates. This is possible because an assumption of OLS regression is that the residuals are independent. Therefore, you can reshuffle the residuals to get

Top posts from The DO Loop in 2020

Last year, I wrote more than 100 posts for The DO Loop blog. In previous years, the most popular articles were about SAS programming tips, statistical analysis, and data visualization. But not in 2020. In 2020, when the world was ravaged by the coronavirus pandemic, the most-read articles were related

Where will you travel after the pandemic?

Have you been stuck at home, dreaming up the next big trip you'll take after this pandemic is over? How will you pick a really cool location to visit? Perhaps you can ask your friends for suggestions. My co-worker (and lunch buddy) John recommended the Grand Prismatic Spring. Here's a

Create a response variable that has a specified R-square value

When you perform a linear regression, you can examine the R-square value, which is a goodness-of-fit statistic that indicates how well the response variable can be represented as a linear combination of the explanatory variables. But did you know that you can also go the other direction? Given a set

2020 roundup: SAS Users YouTube channel how-to tutorials

Find out which how-to tutorials on the SAS Users YouTube channel were the most popular, and learn a thing or two!

Find a vector that has a specified correlation with another vector

Do you know that you can create a vector that has a specific correlation with another vector? That is, given a vector, x, and a correlation coefficient, ρ, you can find a vector, y, such that corr(x, y) = ρ. The vectors x and y can have an arbitrary number

Historical state centers of population in the US (1900-2010)

If you have plotted data on a map, you have probably tried to estimate the geographical (or visual) 'center' of map areas, to place labels there. But have you ever given any thought to the "center of population"? This is one of the myriad of statistics the US Census Bureau

How to create a Napoleon plot with Graph Template Language (GTL)

Do you need to see how long patients have been treated for? Would you like to know if a patient’s dose has changed, or if the patient experienced any dose interruptions? If so, you can use a Napoleon plot, also known as a swimmer plot, in conjunction with your exposure

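Introduction to SAS/ACCESS and a demo of Snowflake integration

01. Introduction These days, many people use data storage services built on cloud environments. Pause for a moment and look back at the days when you built a local (on-premises) environment to manage data. To store data, you purchased and installed the necessary software and hardware and then worked through a variety of configuration settings. Three months later, the data had grown and the database was running out of capacity. So you once again spent considerable money and time purchasing more software and hardware and repeating the same configuration work. Scenes like this have all but disappeared from today's business world. All you need today is an email address and a credit card, because a wide range of data storage services has emerged. These services run in the cloud. They are subscription-based, available for a periodic fee, and instead of requiring prepayment they follow a "pay as you go" model in which you are charged only for what you use. SAS supports a variety of data storage services, and today I will cover that support in detail. 02. Introducing SAS/ACCESS "SAS/ACCESS" is an interface that connects SAS with other vendors' data storage services. It supports integration with a variety of data storage services through the following features: seamless, transparent data access; flexible query language support; performance tuning options; and optimization features for better performance. See here for more details. Among the many data storage vendors, this time I will use "SAS/ACCESS Interface to Snowflake" to connect to the "Snowflake" service. * Snowflake was set up in advance by following the instructions here. 03. SAS/ACCESS demo 3-1. Connecting with the LIBNAME statement You can easily connect to Snowflake with the SAS LIBNAME statement. Once connected, you can reference Snowflake data from the DATA step and SAS procedures. See the box below for sample LIBNAME statement code. LIBNAME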

Horn's method: A simulation-based method for retaining principal components

One purpose of principal component analysis (PCA) is to reduce the number of important variables in a data analysis. Thus, PCA is known as a dimension-reduction algorithm. I have written about four simple rules for deciding how many principal components (PCs) to keep. There are other methods for deciding how

Understanding the SAS Hot Fix Analysis, Download and Deployment Tool Report

Passionate about helping SAS customers, Sandy Gibbs of Technical Support sheds light on the SASHFADD tool report. This is the second of three posts on our hot-fix process.

Append and Replace Records in a CAS Table

In my previous blog post, I talked about using PROC CAS to accomplish various data preparation tasks. Since then, my colleague Todd Braswell and I worked through some interesting challenges implementing an Extract, Transform, Load (ETL) process that continuously updates data in CAS. (Todd is really the brains behind getting

How to score a logistic regression model that was not fit by PROC LOGISTIC

A SAS customer asked a great question: "I have parameter estimates for a logistic regression model that I computed by using multiple imputations. How do I use these parameter estimates to score new observations and to visualize the model? PROC LOGISTIC can do the computation I want, but how do

Accessing Google Cloud Storage (GCS) with SAS Viya

As the story goes, some 90 percent of the data available today has been created in the last two years! As SAS (and the computing world) moves to the cloud, the question of, "How do I deal with my data (Big and otherwise), which used to be on-prem,

Automate the placement of reference lines in PROC SGPLOT

The REFLINE statement in PROC SGPLOT is one of my favorite ways to augment statistical graphics such as scatter plots, series plots, and histograms. The REFLINE statement overlays a vertical or horizontal reference line on a graph. You can specify the location of the reference lines on the REFLINE statement.

How to organize your SAS projects in Git

As you begin managing your SAS code and projects in Git, here are a few guidelines for how to organize your work and collaborate with others.

Moving from SAS Enterprise Guide to SAS Studio

If you're a SAS Enterprise Guide user who is looking to move to SAS Studio, there is a lot to like about your new coding environment.

Robust statistics for skewness and kurtosis

Intuitively, the skewness of a unimodal distribution indicates whether a distribution is symmetric or not. If the right tail has more mass than the left tail, the distribution is "right skewed." If the left tail has more mass, the distribution is "left skewed." Thus, estimating skewness requires some estimates about

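[Getting started with deep learning in SAS #2] Improving deep learning performance with 'hyperparameter tuning'

The previous installment of this deep learning series covered building deep learning models with SAS Visual Data Mining and Machine Learning, specifically the basic deep neural network (DNN) model architecture and building a DNN model that uses batch normalization. This installment introduces tuning the hyperparameters that can improve deep learning performance. Performance improves over a period of time and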

Removing repeated characters in SAS strings

SAS' Leonid Batkhan explains the data cleansing task of removing unwanted repeated characters in SAS character variables.

The expected value of the tail of a distribution

The expected value of a random variable is essentially a weighted mean over all possible values. You can compute it by summing (or integrating) a probability-weighted quantity over all possible values of the random variable. The expected value is a measure of the "center" of a probability distribution. You can

Tips to simulate binary and categorical variables

When there are two equivalent ways to do something, I advocate choosing the one that is simpler and more efficient. Sometimes, I encounter a SAS program that simulates random numbers in a way that is neither simple nor efficient. This article demonstrates two improvements that you can make to your

Coronavirus: per million, per 100k, or percent?

A user commented on one of my previous maps ... "How can there be 820 cases of Coronavirus per 100,000 people? - There aren't even 100,000 people in my county!" Well, when you want to compare something like the number of COVID-19 cases between two areas that have differing populations,

Do low mortgage rates bring you joy(plots)?

When it comes to plotting mortgage rate data, I often look to Len Kiefer for inspiration. He recently posted a retro-looking graph on Twitter that caught my eye ... and of course I had to see if I could create something similar using SAS. For lack of a better term,

Confidence intervals for eigenvalues of a correlation matrix

A fundamental principle of data analysis is that a statistic is an estimate of a parameter for the population. A statistic is calculated from a random sample. This leads to uncertainty in the estimate: a different random sample would have produced a different statistic. To quantify the uncertainty, SAS procedures