Blogs

Search Results: ellipse (50)

Analytics | Data Visualization | Programming Tips

Rick Wicklin, August 28, 2023

Generate random uniform points in an ellipse

I have previously written about how to efficiently generate points uniformly at random inside a sphere (often called a ball by mathematicians). The method uses a mathematical fact from multivariate statistics: If X is drawn from the uncorrelated multivariate normal distribution in dimension d, then S = r*X / ||X|| has

Read More
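As a rough illustration of the fact quoted in this teaser, here is a minimal Python sketch (it is not the SAS code from the article). It assumes the radius is r = U^(1/d) for U uniform on (0, 1), which is the standard choice for filling a ball uniformly. The function name `random_point_in_ball` is made up for this example.

```python
import math
import random

def random_point_in_ball(d, rng=random):
    """Return one point drawn uniformly from the unit ball in dimension d."""
    # X: d independent standard normal coordinates (uncorrelated MVN).
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in x))
    # Assumption: r = U**(1/d) with U ~ Uniform(0, 1), so that the
    # probability of landing within radius r grows like r**d (volume).
    r = rng.random() ** (1.0 / d)
    # X / ||X|| is a uniform direction; scaling by r moves it inside the ball.
    return [r * v / norm for v in x]

rng = random.Random(12345)
points = [random_point_in_ball(2, rng) for _ in range(1000)]
```

A linear map sends the uniform distribution on a disk to the uniform distribution on an ellipse, so multiplying the two coordinates by semi-axis lengths a and b is one way to adapt the sketch to an ellipse.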

Rick Wicklin, July 23, 2014

Computing prediction ellipses from a covariance matrix

In a previous blog post, I showed how to overlay a prediction ellipse on a scatter plot in SAS by using the ELLIPSE statement in PROC SGPLOT. The ELLIPSE statement draws the ellipse by using a standard technique that assumes the sample is bivariate normal. Today's article describes the technique

Read More
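For readers who want a feel for the geometry, here is a hedged Python sketch that turns a 2 x 2 covariance matrix into the semi-axes and rotation angle of an ellipse. It assumes the covariance matrix is positive definite and treated as known, so the scaling constant is a chi-square quantile with 2 degrees of freedom. A prediction ellipse for a finite sample typically uses a constant that also depends on the sample size, so this is an approximation and not necessarily the computation the article describes. The function name is hypothetical.

```python
import math

def ellipse_from_cov(sxx, sxy, syy, p=0.95):
    """Semi-axis lengths and rotation angle (radians) of the probability-p ellipse."""
    # The chi-square quantile with 2 degrees of freedom has a closed form.
    c = -2.0 * math.log(1.0 - p)
    # Eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2.0
    delta = math.hypot((sxx - syy) / 2.0, sxy)
    lam1, lam2 = half_trace + delta, half_trace - delta
    # Angle of the major axis (eigenvector for lam1) measured from the x axis.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.sqrt(c * lam1), math.sqrt(c * lam2), theta

a, b, theta = ellipse_from_cov(4.0, 0.0, 1.0, p=0.95)
print(a, b, theta)
```

The ellipse is centered at the mean; to draw it, rotate the points (a cos t, b sin t) by theta and add the mean.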

Rick Wicklin, July 21, 2014

Add a prediction ellipse to a scatter plot in SAS

It is common in statistical graphics to overlay a prediction ellipse on a scatter plot. This article describes two easy ways to overlay prediction ellipses on a scatter plot by using SAS software. It also describes how to overlay multiple prediction ellipses for subpopulations. What is a prediction ellipse? A

Read More

Learn SAS

Rick Wicklin, February 7, 2024

The elliptical heart

Some hearts are famous. For example, there is the "Heart of Gold" (Neil Young), the "Heart of Glass" (Blondie), and the Heart of Darkness (Joseph Conrad). But have you heard of the "Heart of Ellipses"? No? Well, in 2023, Ted Conway published an amusingly titled article, "Total Ellipse of the

Read More

Learn SAS | Programming Tips

Rick Wicklin, April 24, 2023

Venn diagrams that illustrate relationships between sets

A previous article discusses how to compute the union, intersection, and other subsets of a pair of sets. In that article, I displayed a simple Venn diagram (reproduced to the right) that illustrates the intersection and difference between two sets. The diagram uses a red disk for one set, a

Read More

Analytics | Learn SAS

Rick Wicklin, April 3, 2023

Why use rank correlation?

A previous article discusses rank correlation and lists some advantages of using rank correlation. However, the article does not show examples where an analyst might prefer to report the rank correlation instead of the traditional Pearson product-moment correlation. This article provides three examples where the rank correlation is a better

Read More
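To make the distinction concrete, here is a small Python sketch with made-up data (it is not one of the article's three examples). It computes the Spearman rank correlation as the Pearson correlation of the ranks, and it assumes there are no tied values.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n; assumes no tied values, to keep the sketch short."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [math.exp(v) for v in x]   # monotone but strongly nonlinear
print(pearson(x, y), spearman(x, y))
```

Because y increases whenever x increases, the rank correlation is 1 even though the Pearson correlation is noticeably smaller.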

Analytics | Learn SAS | Programming Tips

Rick Wicklin, March 22, 2023

Partial correlation: controlling for confounding variables

A data analyst wanted to estimate the correlation between two variables, but he was concerned about the influence of a confounding variable that is correlated with them. The correlation might affect the apparent relationship between the two main variables in the study. A common confounding variable is age because young people

Read More
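For a single confounder, the partial correlation has a well-known closed form in terms of the three pairwise correlations. The Python sketch below uses that textbook formula with hypothetical correlation values; it is not taken from the article, which may compute the statistic differently.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y, controlling for Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical values: X and Y are each strongly correlated with age (Z).
print(partial_corr(r_xy=0.6, r_xz=0.7, r_yz=0.8))
```

With these made-up inputs, the raw correlation of 0.6 shrinks to about 0.09 after controlling for Z, which is the kind of change that motivates checking for confounders.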

Analytics

Ryosuke Horiuchi (堀内亮佑), September 20, 2022

Natural language processing and SAS (3)

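Hello! This is Horiuchi from SAS Institute Japan. This post continues the series on natural language processing. The previous post walked through an example of using SAS to process Japanese text. This final post covers applications: it introduces representative natural language processing tasks and shows how to implement them in SAS. Here, "task" means roughly "a formalized, commonly shared problem." Natural language processing includes several tasks, and although they share some elements, the approach to solving each one is fundamentally different. SAS provides a dedicated action set for each task.

Summarization task

As the name suggests, this task summarizes a document. In SAS, the textSummarize action in the textSummarization action set handles it. Here, let's summarize the body of the NHK news commentary article 「気になる頭痛・めまい　天気が影響？対処法は？」 ("Troubling headaches and dizziness: Is the weather a factor? How do you cope?") (https://www.nhk.or.jp/kaisetsu-blog/700/471220.html) in five sentences.

    import swat

    # Connect to the CAS server (host, port, and credentials are placeholders).
    conn = swat.CAS('mycashost.com', 5570, 'username', 'password')
    conn.builtins.loadActionSet(actionSet='textSummarization')
    # CFG is the author's configuration object; it is not defined in this excerpt.
    conn.textSummarization.textSummarize(
        addEllipses=False,
        corpusSummaries=dict(name='corpusSummaries', compress=False, replace=True),
        documentSummaries=dict(name='documentSummaries', compress=False, replace=True),
        id='Id',
        numberOfSentences=5,
        table={'name': CFG.in_cas_table_name},
        text='text',
        useTerms=True,
        language='JAPANESE')
    conn.table.fetch(table={'name': 'corpusSummaries'})

The numberOfSentences parameter specifies the number of sentences in the summary. The result is as follows (translated here from the Japanese output).

'First, keeping a diary of changes in your physical condition along with the weather, temperature, humidity, and air pressure helps you determine whether the weather is really affecting you and identify the patterns in which you feel unwell. Besides temperature and humidity, air pressure can also worsen your condition and sometimes trigger illness. Our bodies constantly adjust to changes in air pressure by means of a pressure sensor that is said to be located in the inner ear. However, when we spoke with Visiting Professor Sato of Aichi Medical University, who studies the effects of weather on the body, he said that "many people feel unwell, with headaches for example, before a typhoon makes its closest approach, that is, before the air pressure drops sharply." In people with a sensitive inner ear, the ear overreacts to slight changes in air pressure and relays that information to the brain; the brain experiences stress, the autonomic nervous system that keeps the body in balance is disrupted, blood vessels constrict, muscles tense, and the result is various physical complaints such as headaches and dizziness.'

You can see that the important sentences have been extracted.

Text classification task

This task classifies documents into categories. The special case of classifying whether the impression of a document is positive or negative is called sentiment analysis. Here, let's judge whether sentences from Japanese securities reports are positive or negative. The data set is https://github.com/chakki-works/chABSA-dataset (Note that this data set does not originally include a label that indicates whether each sentence is positive or negative. We synthesized the labels by summing the scores that are assigned to specific phrases within each sentence. The result is 1,670 positive sentences and 1,143 negative sentences, for a total of 2,813 sentences. For details of how the labels were synthesized, see this blog.) Let's check the data after loading it into a pandas data frame.

    import pandas as pd

    # display() is available in Jupyter/IPython sessions.
    df = pd.read_csv(CFG.local_input_file_path)
    display(df)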

Read More

Analytics | Data Visualization | Programming Tips

Rick Wicklin, June 29, 2022

Compute the multivariate t density function

A previous article shows how to compute the probability density function (PDF) for the multivariate normal distribution. In a similar way, you can compute the density function for the multivariate t distribution. This article discusses the density function for the multivariate t distribution, shows how to compute it, and visualizes

Read More

Learn SAS | Programming Tips

Rick Wicklin, March 14, 2022

The pi_th roots of unity

On this Pi Day, let's explore the "πth roots of unity." (Pi Day is celebrated in the US on 3/14 to celebrate π ≈ 3.14159....) It's okay if you've never heard of the πth roots of unity. This article starts by reviewing the better-known nth roots of unity. It then

Read More
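As a reminder of the better-known case, the nth roots of unity are the complex numbers exp(2πik/n) for k = 0, 1, ..., n-1. The short Python sketch below computes them for a positive integer n; it does not attempt the πth roots that the article goes on to discuss, and the function name is made up for this example.

```python
import cmath

def roots_of_unity(n):
    """Return the n complex solutions of z**n == 1 for a positive integer n."""
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

# Print the five 5th roots, which lie evenly spaced on the unit circle.
for z in roots_of_unity(5):
    print(f"{z.real:+.4f} {z.imag:+.4f}i")
```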

Learn SAS | Programming Tips

Rick Wicklin, February 7, 2022

Billiards on an elliptical table

I recently showed how to find the intersection between a line and a circle. While working on the problem, I was reminded of a fun mathematical game. Suppose you make a billiard table in the shape of a circle or an ellipse. What is the path for a ball at

Read More

Analytics | Programming Tips

Rick Wicklin, December 6, 2021

The expected number of points on a convex hull

While discussing how to compute convex hulls in SAS with a colleague, we wondered how the size of the convex hull compares to the size of the sample. For most distributions of points, I claimed that the size of the convex hull is much less than the size of the

Read More
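One way to explore the claim empirically is to simulate samples and count the hull vertices. The Python sketch below does this with Andrew's monotone chain algorithm and bivariate standard normal samples. Both the algorithm and the choice of distribution are assumptions made for illustration; the article uses SAS and may examine other distributions.

```python
import random

def cross(o, a, b):
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Pop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

rng = random.Random(1)
for n in (100, 1000, 10000):
    sample = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    print(n, len(convex_hull(sample)))
```

Running the loop prints the sample size next to the number of hull vertices, which makes it easy to see how slowly the hull grows as the sample gets larger.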

Analytics | Data Visualization

Predicted probabilities for a logistic regression model

Rick Wicklin, November 18, 2020

Create scoring data when regressors are correlated

To help visualize regression models, SAS provides the EFFECTPLOT statement in several regression procedures and in PROC PLM, which is a general-purpose procedure for post-fitting analysis of linear models. When scoring and visualizing a model, it is important to use reasonable combinations of the explanatory variables for the visualization. When

Read More

Advanced Analytics | Machine Learning | Programming Tips

Rick Wicklin, July 23, 2020

Fit a multivariate Gaussian mixture model by using the expectation-maximization (EM) algorithm

Last month a SAS programmer asked how to fit a multivariate Gaussian mixture model in SAS. For univariate data, you can use the FMM Procedure, which fits a large variety of finite mixture models. If your company is using SAS Viya, you can use the MBC or GMM procedures, which

Read More

Learn SAS | Programming Tips

Rick Wicklin, July 20, 2020

Compute within-group multivariate statistics and store them in a list

I recently showed how to compute within-group multivariate statistics by using the SAS/IML language. However, a principle of good software design is to encapsulate functionality and write self-contained functions that compute and return the results. What is the best way to return multiple statistics from a SAS/IML module? A convenient

Read More

Advanced Analytics | Programming Tips

Rick Wicklin, July 15, 2020

How to evaluate the multivariate normal log likelihood

The multivariate normal distribution is used frequently in multivariate statistics and machine learning. In many applications, you need to evaluate the log-likelihood function in order to compare how well different models fit the data. The log-likelihood for a vector x is the natural logarithm of the multivariate normal (MVN) density

Read More
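For a d-dimensional vector x, the log of the MVN density is -0.5 * (d*log(2π) + log|Σ| + MD²), where MD is the Mahalanobis distance from x to the mean. The Python sketch below evaluates that expression with a hand-written Cholesky factorization so that it needs only the standard library. It assumes Σ is symmetric positive definite; the function names are illustrative, and the article's SAS implementation may be organized differently.

```python
import math

def cholesky(a):
    """Lower-triangular L with L * L^T = a, for a symmetric positive definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def mvn_logpdf(x, mu, sigma):
    """Log density of the multivariate normal N(mu, sigma) evaluated at x."""
    d = len(x)
    L = cholesky(sigma)
    # Solve L z = x - mu by forward substitution; then z.z is MD^2.
    z = []
    for i in range(d):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((x[i] - mu[i] - s) / L[i][i])
    md2 = sum(v * v for v in z)
    # log|sigma| is twice the sum of the logs of the diagonal of L.
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2.0 * math.pi) + logdet + md2)

print(mvn_logpdf([0.5, -0.2], [0.0, 0.0], [[1.0, 0.3], [0.3, 2.0]]))
```

The log-likelihood of a sample is the sum of `mvn_logpdf` over the observations, which is why working on the log scale is convenient for comparing models.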

Analytics | Data Visualization | Learn SAS

Rick Wicklin, July 1, 2020

Pooled, within-group, and between-group covariance matrices

A previous article discusses the pooled variance for two or more groups of univariate data. The pooled variance is often used during a t test of two independent samples. For multivariate data, the analogous concept is the pooled covariance matrix, which is an average of the sample covariance matrices of the

Read More
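The univariate case is easy to sketch in Python. The code below computes the pooled variance as the average of the group variances weighted by their degrees of freedom, which is the usual textbook definition; the data are made up. The pooled covariance matrix applies the same weighting to each group's covariance matrix.

```python
from statistics import variance

def pooled_variance(groups):
    """Pooled variance: sum of (n_i - 1) * s_i^2 divided by (N - k)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    weighted = sum((len(g) - 1) * variance(g) for g in groups)
    return weighted / (n_total - k)

# Two hypothetical groups of different sizes.
print(pooled_variance([[1, 2, 3], [2, 4, 6, 8]]))
```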

Analytics | Data Visualization | Learn SAS

Rick Wicklin, June 19, 2019

Influential observations in a linear regression model: The DFFITS and Cook's D statistics

A previous article describes the DFBETAS statistics for detecting influential observations, where "influential" means that if you delete the observation and refit the model, the estimates for the regression coefficients change substantially. Of course, there are other statistics that you could use to measure influence. Two popular ones are the

Read More

Data Visualization | Programming Tips

Robert Allison, April 25, 2019

Polar graph remix - using sgplot (no gtl/sgrender!)

A few years ago Sanjay showed how to create a polar graph by creating a gtl template, and then plotting it using Proc SGRender. These days, Proc SGPlot has all the functionality you need to create this graph, so I've rewritten the example to use only SGPlot. And while I

Read More

Analytics | Data Visualization

Rick Wicklin, March 27, 2019

How to simulate multivariate outliers

In simulation studies, sometimes you need to simulate outliers. For example, in a simulation study of regression techniques, you might want to generate outliers in the explanatory variables to see how the technique handles high-leverage points. This article shows how to generate outliers in multivariate normal data that are a

Read More

Programming Tips

Schematic diagram of outliers in bivariate normal data. The point 'A' has large univariate z scores but a small Mahalanobis distance. The point 'B' has a large Mahalanobis distance. Only 'B' is a multivariate outlier.

Rick Wicklin, March 25, 2019

The geometry of multivariate versus univariate outliers

An important concept in multivariate statistical analysis is the Mahalanobis distance. The Mahalanobis distance provides a way to measure how far away an observation is from the center of a sample while accounting for correlations in the data. The Mahalanobis distance is a good way to detect outliers in multivariate

Read More
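To see how correlation changes the notion of distance, here is a minimal Python sketch of the Mahalanobis distance for bivariate data. It uses the closed-form inverse of a 2 x 2 covariance matrix, and the covariance values and the two points are hypothetical, not taken from the article.

```python
import math

def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance from point to mean for a 2 x 2 covariance matrix."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    (a, b), (_, c) = cov
    det = a * c - b * b
    # Quadratic form with the inverse covariance (1/det) * [[c, -b], [-b, a]].
    md2 = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.sqrt(md2)

mean = (0.0, 0.0)
cov = [[1.0, 0.9], [0.9, 1.0]]   # unit variances, correlation 0.9 (assumed)
A = (2.5, 2.5)     # larger z scores, but lies along the correlation direction
B = (1.5, -1.5)    # smaller z scores, but lies against the correlation direction
print(mahalanobis_2d(A, mean, cov), mahalanobis_2d(B, mean, cov))
```

Although each coordinate of B is closer to the mean than the corresponding coordinate of A, B has the larger Mahalanobis distance because it lies in a direction where the data vary little.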

Data Visualization | Learn SAS

Rick Wicklin, December 3, 2018

5 tips for customizing legends in PROC SGPLOT in SAS

When a graph includes several markers or line styles, it is often useful to create a legend that explains the relationship between the data and the symbols, color, and line styles in the graph. The SGPLOT procedure does a good job of automatically creating and placing a legend for most

Read More

Data Visualization | Learn SAS | Programming Tips

Sanjay Matange, November 1, 2018

Text plot can do that

The TEXT plot was introduced with SAS 9.4M2 to facilitate placement of text strings in a graph. This replaces the MARKERCHAR feature of the SCATTER plot statement, which is still available, but it is better to use TEXT plot in most cases. The syntax is: text x=column y=column text=column </

Read More

Data Visualization | Learn SAS | Programming Tips

Rick Wicklin, October 31, 2018

A trick to plot groups in PROC SGPLOT

A useful feature in PROC SGPLOT is the ability to easily visualize subgroups of data. Most statements in the SGPLOT procedure support a GROUP= option that enables you to overlay plots of subgroups. When you use the GROUP= option, observations are assigned attributes (colors, line patterns, symbols, ...) that indicate

Read More

Data Visualization | Learn SAS | Programming Tips

Sanjay Matange, October 30, 2017

Fill patterns

When a plot is classified by one or more variables, the different class values are displayed in the graph either by position or by using different plot attributes such as color, marker shape, or line pattern. For plots that display the visual by a filled area (bar, bin, band, bubble,

Read More

Government | Health Care | Insurance | Life Sciences

Analytics | Data Visualization | Programming Tips

Robert Allison, August 25, 2017

When's the next total solar eclipse?

The U.S. was really fortunate in having the recent total solar eclipse pass through so many of its states! This gave lots of people an opportunity to see it, with just a short (or moderate) drive. I think a little kid spoke for all of us when they said "Let's

Read More

Advanced Analytics

Rick Wicklin, December 7, 2016

Simultaneous confidence intervals for a multivariate mean

Many SAS procedures compute statistics and also compute confidence intervals for the associated parameters. For example, PROC MEANS can compute the estimate of a univariate mean, and you can use the CLM option to get a confidence interval for the population mean. Many parametric regression procedures (such as PROC GLM)

Read More

Rick Wicklin, November 30, 2016

Append data to add markers to SAS graphs

Do you want to create customized SAS graphs by using PROC SGPLOT and the other ODS graphics procedures? An essential skill that you need to learn is how to merge, join, append, and concatenate SAS data sets that come from different sources. The SAS statistical graphics procedures (SG procedures) enable

Read More

Rick Wicklin, August 1, 2016

Compute highest density regions in SAS

In a scatter plot, the regions where observations are packed tightly are areas of high density. A contour plot or heat map of a bivariate kernel density estimate (KDE) is one way to visualize regions of high density. A SAS customer asked whether it is possible to use SAS to

Read More
