Blogs


Data Visualization

Get the right information, with visual impact, to the people who need it

Data Visualization | Programming Tips

Rick Wicklin, April 1, 2024

Add a second axis to a SAS graph

Recently, I saw a scatter plot that displayed the ticks, values, and labels for a vertical axis on the right side of a graph. In the SGPLOT procedure in SAS, you can use the Y2AXIS option to move an axis to the right side of a graph. Similarly, you can


Data Visualization | Learn SAS

Rick Wicklin, February 28, 2024

Using colors to visualize groups in a bar chart in SAS

I sometimes see analysts overuse colors in statistical graphics. My rule of thumb is that you do not need to use color to represent a variable that is already represented in a graph. For example, it is redundant to use a continuous color ramp to represent the lengths of bars


Analytics | Data Visualization | Programming Tips

Chris Hemedinger, February 27, 2024

Visualized: US Currency in circulation, past and present

This phenomenon has been in the news recently, so I've updated this article that I originally published in 2017. The paper currency in circulation in the US is mostly $100 bills. And not just by a little bit -- these account for 34% of the notes by denomination and nearly


Data Visualization | Programming Tips

Peter Styliadis, January 19, 2024

My Family of Four's Monthly Water Usage (Gallons) Compared to the Town of Cary's Average

Have you ever been curious about your monthly water consumption and how it compares to others in your community? Recently, I had this question and decided to get ahold of my family's water usage data for analysis. Harnessing the power of data visualization, I compared my family of four's monthly


Analytics | Data Visualization | Learn SAS | Programming Tips

Rick Wicklin, January 3, 2024

Top 10 posts from The DO Loop in 2023

In 2023, I wrote 90 articles for The DO Loop blog. My most popular articles were about SAS programming, data visualization, and statistics. In addition, several "general interest" articles were popular, including my article for Pi Day and an article about AI chatbots. If you missed any of these articles,


Advanced Analytics | Analytics | Data Management | Data Visualization | Learn SAS | Students & Educators | Work & Life at SAS

Adriana Rojas, December 19, 2023

"There are more and more courses tied to analytics topics in every sector"

Accurate information is the foundation on which companies are built, especially at a time when preparedness and resilience matter more and more. With the growth in the amount of available data and the need to use it to get better results, we have also


SAS Spain | Spanish

Data Visualization | SAS Administrators

soyongyun, December 13, 2023

Handy SAS Viya 4 features worth knowing – Logging & Monitoring

SAS Viya 4, a cloud-based AI analytics platform, has a number of useful features. This post introduces the Logging & Monitoring capability for SAS Viya 4. 1. What is Logging & Monitoring? Logging and Monitoring mean, quite literally, recording logs for a service and displaying its status visually. Earlier SAS Viya


Advanced Analytics | Analytics | Artificial Intelligence | Data Management | Data Visualization | Machine Learning | SAS Administrators

小林泉, December 8, 2023

Managing the entire data analysis process: what it means to catalog knowledge that grows in a self-organizing way

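Self-organization is the process in nature by which individuals, each behaving autonomously and without any view of the whole, end up producing an ordered whole.

A solution idea that has existed since 2010 is finally feasible

Around 2010, more than a decade ago, a major manufacturer we were supporting was already struggling with uneven data analysis skills among employees, with raising skills across the organization, with improving the productivity of data analysis work, and with standardizing that work so it could withstand staff turnover. The SAS proposal team that was consulted at the time, of which I was a member, saw immediately that the analytics life cycle platform SAS provides could help with those problems. The idea was to electronically record all work across the analytics life cycle (the business problem, the data used, the insights from data exploration, the data preparation process, the predictive modeling process, the model, and the decision process that embeds the model in an application), to model the overall process itself, and to put that record to use, so that knowledge accumulates and is reused in a self-organizing way.

However, the IT environment surrounding SAS at that time was siloed: infrastructure such as PCs and application architecture, the location of data, and security management. ModelOps environments outside SAS also had architectures that differed too much from system to system, and data literacy within the customer's organization still had many issues. Even with SAS at the center, the surrounding development cost would have been far too high, so we abandoned the proposal.

Times have changed. Adoption of cloud technology, and the transformation and standardization of business processes that come with it, are advancing rapidly. In step with that, SAS products have elevated the MLOps framework, which has led the market since those days, into DecisionOps, and have adopted a cloud-native architecture and strengthened their consistency and agility as a platform in order to make full use of cloud technology. At last, in the latest version of SAS Viya, it is possible to electronically record, manage, and use the work of the entire data analysis process, starting from the data, across the whole analytics life cycle.

Governance of data analysis assets that accumulates and uses knowledge in a self-organizing way

Problems with recent data management initiatives

For details, please see this blog post, but in many cases the same mistakes of the past are being repeated. In short, if the goal is to foster a data analysis culture and to spread self-service, then a data catalog or a DWH/DM data model design that aims to be complete as a snapshot at a single point in time does not solve the problem. Inevitably, five years later, another person or project will say, "With this, we can't tell which data to use for analysis. That's a problem. Let's fix it."

So what is the solution? Rather than managing and curating static information, record and govern all information related to data analysis work, which changes daily and keeps being accumulated, evaluated, improved, and evolved. That means achieving the following three points, each of which is described in detail later.

Point 1: Manage every data analysis asset (knowledge)
Point 2: Automated, labor-saving data quality management and governance
Point 3: Self-organizing accumulation of information through the power of internal social interaction

Before explaining what each point means, I would like to show, through the voices of users, what kind of world results when they are achieved. By recording the actions of individual users who analyze data freely, it becomes possible to manage, share, and reuse how data analysis is being done, without someone who oversees the whole having to conduct interviews and surveys to manage the information.

Who used which data, for what purpose, and how, and what was the result?
I need to explain a decision this application produced. Who built this model? What training data was used? What was the modeling process?
Which data is used often? How should that data be used? What are the caveats?
Who is skilled at data analysis? Who is likely to be able to help?
What is the state of data quality across the company? Is the balance between data quality and usage patterns appropriate? Are any users using data incorrectly?

In other words, the internal knowledge that matters for doing data analysis autonomously takes shape in a self-organizing way. That knowledge used to be obtained only with time and effort: holding internal study sessions, tracking down experts to ask for know-how, or digging through specifications that are often inaccurate.

What is the "information asset catalog"? How it differs from a typical "data catalog"

At SAS, we call the capability that makes this world possible the "information asset catalog." It is technology that lets you manage, search, relate, and report on the entire data analysis process. The major differences become visible when you contrast it with what is commonly called a "data catalog," which is also the cause of many failures.

As I also noted in this blog post, existing knowledge must be put to use if data analysts are to practice self-service analysis, or if beginners are to gather information on their own as far as possible and first master standard data analysis tasks. Yet such knowledge traditionally could be learned only by asking a handful of expert analysts, or by asking the IT systems department and waiting so long for an answer that the business opportunity was lost. Existing knowledge is a chain of "ways of thinking and ways of doing": what data was used, with what intent, for what purpose, how, and with what output. It is not finished once an administrator interviews analysts at one point in time and builds a "data catalog." It is information that analysts create autonomously every day.

Point 1: Manage every data analysis asset (knowledge)

In SAS Viya, the objects from every step of the analytics life cycle described above are all recorded and managed centrally. Information about the reports, data preparation processes, and data marts created each day is managed automatically and becomes searchable. Managing every step of the analytics life cycle in this way makes it possible to view, in relation to one another, the data, the reports that use that data, the data preparation flows that use it, their output data, and the predictive modeling processes that use that output as training data, along with the resulting models. So, for example, if you are looking for data to use for some purpose, you can search by the name of a relevant business task or project, reach the related reports and data preparation processes, and from there reach the data that was used and how it was used. Of course, IT systems departments can also use this capability as the long-established impact analysis function, a tool for investigating the effect of changes to data.

Point 2: Automated, labor-saving data quality management and governance

One thing to watch when an organization does data analysis is accuracy. Whether the correct master data and data of adequate quality are being used affects the accuracy of the final actions and decisions, and therefore revenue. To be accountable for results, the quality of the data used for an action also needs to be managed by the organization rather than left to individuals. Managing data quality at the organizational level reduces the quality checks that used to be done at the start of every analysis. And because quality checks that individuals used to do in their own way become standardized, the quality of data analysis work improves across the organization. One customer had errors in which required processing was not being applied in the ETL for data loaded into the DWH, but there were so many data sets and ETL jobs that the errors were hard to find. Comprehensive quality management and quality reports make such errors easy to find.

Point 3: Self-organizing accumulation of information through the power of internal social interaction

With Point 1, the autonomous activity of each analyst is recorded automatically and accumulates, in a self-organizing way, as organization-wide knowledge that can be shared and reused. This happens automatically, without any particular effort from individual analysts. Using the platform deliberately on top of that adds depth to the accumulated knowledge.

For example, when you set out to solve a business problem with data analysis, the starting point is a "question," which sits at the far left of the analytics life cycle described above. At that stage you carry out "data exploration" from various angles to form hypotheses and to test them. Because this early exploration is the basis for the later data preparation and modeling, it is very important both as knowledge and as material for accountability. It does not necessarily use the same data as the final analysis, so it is not automatically linked to other data analysis assets. By saving such exploration work in the same project folder, as in the figure below, it too can be used as a related object.

In addition, proactively adding comments and ratings to the data and reports you have used yourself helps them grow into more valuable knowledge. Some companies and organizations now share know-how, such as how to use office tools, on internal social networks. The idea is to do the same on a company-wide analytics platform, so that users cultivate data analysis knowledge together.

Summary

"Can this data be used for this purpose?" "Oh, that one lacks this information, so it can't. I use this data instead." Analysts have this conversation often. How quickly the question can be answered determines the efficiency and accuracy of data analysis. The "information asset catalog" is precisely the capability for answering it.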


Data Visualization | Learn SAS

Rick Wicklin, December 6, 2023

10 tips for creating effective statistical graphics

These are a few of my favorite things. —Maria in The Sound of Music For my annual Christmas-themed post, I decided to forgo fractal Christmas trees and animated greeting cards and instead present a compilation of some of my favorite data visualization tips for advanced SAS users. Hopefully, this


Artificial Intelligence | Data Visualization | Machine Learning

Tom Sabo, December 4, 2023

Text analytics: A recipe for food safety success

As millions of people party and eat their way through the season of overindulgence, they should feel confident that indigestion and a few extra pounds should be the only downsides to their feasting. Thanks to countless hours of work behind the scenes by food inspectors and public health officials, diners


Analytics | Data Visualization | Programming Tips

Rick Wicklin, November 27, 2023

An example of finite-precision issues in a simple collinearity algorithm

The collinearity problem is to determine whether three points in the plane lie along a straight line. You can solve this problem by using middle-school algebra. An algebraic solution requires three steps. First, name the points: p, q, and r. Second, find the parametric equation for the line that passes

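As a rough illustration of the collinearity question in the teaser above, here is a minimal Python sketch (not the SAS code from the article). It swaps in a cross-product test in place of the parametric-equation approach the teaser mentions, and the function names and the tolerance `tol` are assumptions made for this example. The exact variant uses rational arithmetic to sidestep finite-precision effects.

```python
from fractions import Fraction

def cross(p, q, r):
    """Return (q - p) x (r - p); it is zero exactly when p, q, r are collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def collinear(p, q, r, tol=1e-12):
    """Floating-point test; tol is a hypothetical tolerance, not a recommended value."""
    return abs(cross(p, q, r)) <= tol

def collinear_exact(p, q, r):
    """Exact test: coordinates (ints or decimal strings) become rational numbers."""
    def as_frac(pt):
        return tuple(Fraction(c) for c in pt)
    return cross(as_frac(p), as_frac(q), as_frac(r)) == 0
```

Passing coordinates as decimal strings, for example `collinear_exact(("0.1", "0.1"), ("0.2", "0.2"), ("0.3", "0.3"))`, keeps the arithmetic exact, whereas the floating-point version depends on the chosen tolerance.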

Data Visualization | Programming Tips

Rick Wicklin, November 20, 2023

Data visualization tip: Plot rates, not counts

Plot rates, not counts. This maxim is often stated by data visualization experts, but often ignored by practitioners. You might also hear the related phrases "plot proportions" or "plot percentages," which mean the same thing but express the idea alliteratively. An example in a previous article about avoiding alphabetical ordering


Data Visualization | Learn SAS

Rick Wicklin, November 13, 2023

Tip: Avoid alphabetical order for a categorical axis in a graph

Howard Wainer, who used to write the "Visual Revelations" column in Chance magazine, often reminded his readers that "we are almost never interested in seeing Alabama first" (2005, Graphic Discovery, p. 72). His comment is a reminder that when we plot data for a large number of categories (states, countries,


Data Visualization | Learn SAS | Programming Tips

Rick Wicklin, November 8, 2023

4 ways to display an inset that contains statistics on a SAS graph

Sometimes it is helpful to display a table of statistics directly on a graph. A simple example is displaying the number of observations and the mean or median on a histogram. In SAS, the term inset is used to describe a table that is displayed on a graph. This article


Data Visualization | Programming Tips

Cindy Wang, October 23, 2023

How to draw a radar chart in SAS® Visual Analytics using a custom graph – Part II

In the second of a two-part series, SAS' Cindy Wang reveals how to create a custom graph template in SAS Graph Builder that can be rendered as a radar chart in SAS Visual Analytics.


Data Visualization | Programming Tips

Cindy Wang, October 19, 2023

How to draw a radar chart in SAS® Visual Analytics using a custom graph – Part I

In the first of a two-part series, SAS' Cindy Wang shows you how to create a radar chart in SAS Visual Analytics using custom graph capabilities.


Data Visualization | Learn SAS

Cindy Wang, September 28, 2023

Dynamic calculations for xmr control data in SAS® Visual Analytics

SAS' Cindy Wang, inspired by a SAS Support Community post, reveals how to perform dynamic calculations for an xmr control chart.


Data Visualization | Programming Tips

Rick Wicklin, September 20, 2023

Avoid domain errors by using Taylor series

The other day I was trying to numerically integrate the function f(x) = sin(x)/x on the domain [0,∞). The graph of this function is shown to the right. In SAS, you can use the QUAD subroutine in SAS IML software to perform numerical integration. Some numerical integrators have difficulty computing

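To make the idea in the title concrete, here is a minimal Python sketch (not the SAS IML code from the article) of evaluating f(x) = sin(x)/x so that x = 0 does not trigger a division by zero. It assumes the standard Taylor expansion sin(x)/x = 1 - x^2/6 + x^4/120 - ... near zero; the function name `sinc` and the `cutoff` value are hypothetical choices for this example.

```python
import math

def sinc(x, cutoff=1e-4):
    """Evaluate sin(x)/x, using a truncated Taylor series near x = 0.

    cutoff is an assumed threshold; below it the omitted terms are
    smaller than about cutoff**6 / 5040.
    """
    if abs(x) < cutoff:
        x2 = x * x
        return 1.0 - x2 / 6.0 + x2 * x2 / 120.0
    return math.sin(x) / x
```

A numerical integrator that evaluates the integrand at the left endpoint can then call `sinc(0.0)` safely, because the series branch never divides by x.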

Data Visualization | Learn SAS | Programming Tips

Rick Wicklin, September 18, 2023

Use PROC SGPLOT to embed a graph inside another graph

Did you know that you can embed one graph inside another by using PROC SGPLOT in SAS? A typical example is shown to the right. The large graph shows kernel density estimates for the distribution of the Cholesterol variable among male and female patients in a heart study. The small


Analytics | Data Visualization | Programming Tips

Rick Wicklin, August 28, 2023

Generate random uniform points in an ellipse

I have previously written about how to efficiently generate points uniformly at random inside a sphere (often called a ball by mathematicians). The method uses a mathematical fact from multivariate statistics: If X is drawn from the uncorrelated multivariate normal distribution in dimension d, then S = r*X / ||X|| has

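The normalization S = r*X / ||X|| in the teaser above can be sketched in Python as follows (this is not the SAS code from the article, and it covers only the sphere and ball, not the ellipse). It relies on the standard fact that a normalized vector of independent standard normal variates is uniformly distributed on the unit sphere; scaling the radius by U^(1/d) for uniform U is an additional assumption used here to fill the ball. The function names are hypothetical.

```python
import math
import random

def random_on_sphere(d, r=1.0, rng=random):
    """Return one point on the sphere of radius r in dimension d."""
    while True:
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in x))
        if norm > 0.0:  # retry in the (essentially impossible) all-zero case
            return [r * c / norm for c in x]

def random_in_ball(d, r=1.0, rng=random):
    """Return one point uniformly distributed inside the ball of radius r."""
    # Scaling by u**(1/d) makes the volume, not the radius, uniform.
    radius = r * rng.random() ** (1.0 / d)
    return random_on_sphere(d, radius, rng)
```

Passing a seeded generator, for example `random_in_ball(2, r=3.0, rng=random.Random(1))`, makes the draws reproducible.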

Analytics | Data Visualization

Anna Prioletti, July 26, 2023

Using Word with SAS for Microsoft 365 as a Marketing Maven

SAS integration with Microsoft Word empowers you to streamline your company’s data analysis, enhance reporting capabilities, and unlock valuable marketing insights, all with the power of SAS Viya.


Analytics | Data Visualization

Renato Luppi, July 19, 2023

Using PowerPoint with SAS for Microsoft 365 as a Sales Storyteller

From within PowerPoint, you can use the SAS menu available through SAS for Microsoft 365, to access SAS Visual Analytics reports' graphical visuals.


Analytics | Data Visualization

Briana Ullman, July 12, 2023

Using Outlook with SAS for Microsoft 365 as a Creative Collaborator

By simply logging into SAS Viya from Outlook, data visualizations can give you immediate information about key metrics and business performance through AI-powered explanations.


Analytics | Data Visualization

Sasha Karpinski, July 5, 2023

Using Excel with SAS for Microsoft 365 as a Data Detective

Using SAS for Microsoft 365, you can enhance your Excel spreadsheets with additional insights from SAS Viya via one seamless integrated experience.


Advanced Analytics | Analytics | Artificial Intelligence | Data Visualization

Although it is not the only factor, and often not even the most important one, technology is one of the main vehicles for unleashing innovation across teams and business initiatives. One of the main technologies driving this corporate innovation is cloud analytics.

Ivan Fernando Herrera, June 29, 2023

AI in the cloud: a new business era

With an estimated size of more than 619 billion dollars for 2023, cloud computing is a market that grows in size and complexity every year, but that at the same time expands its possibilities thanks to analytics and Artificial Intelligence solutions that help


Communications | Education

Data Visualization | Learn SAS

Rick Wicklin, May 30, 2023

How to use a log-scale on a histogram

Real-world data often exhibits extreme skewness. It is not unusual to have data span many orders of magnitude. Classic examples are the distributions of incomes (impoverished and billionaires) and population sizes (small countries and populous nations). The readership of books and blog posts show a similar distribution, which is sometimes


Data Visualization | Learn SAS

Rick Wicklin, May 24, 2023

How does PROC SGPLOT position labels for polygons?

Labeling objects in graphs can be difficult. SAS has a long history of providing support for labeling markers in scatter plots and for labeling regions on a map. This article discusses how the SGPLOT procedure decides where to put a label for a polygon. It discusses the advantages and disadvantages


Advanced Analytics | Data Visualization | Programming Tips

Rick Wicklin, May 17, 2023

Compute the silhouette statistic in SAS

A previous article defines the silhouette statistic (Rousseeuw, 1987) and shows how to use it to identify observations in a cluster analysis that are potentially misclassified. The article provides many graphs, including the silhouette plot, which is a bar chart or histogram that displays the distribution of the silhouette statistic


Advanced Analytics | Data Visualization


Kevin Scott, May 16, 2023

The Empirical Mode Decomposition for handling non-stationary time series

Empirical Mode Decomposition (EMD) is a powerful time-frequency analysis technique that allows for the decomposition of a non-stationary and non-linear signal into a series of intrinsic mode functions (IMFs). The method was first introduced by Huang et al. in 1998 and has since been widely used in various fields, such as signal processing, image analysis, and biomedical engineering.


Analytics | Data Visualization

Rick Wicklin, May 15, 2023

What is the silhouette statistic in cluster analysis?

Assigning observations into clusters can be challenging. One challenge is deciding how many clusters are in the data. Another is identifying which observations are potentially misclassified because they are on the boundary between two different clusters. Ralph Abbey's 2019 paper ("How to Evaluate Different Clustering Results") is a good way

