
A close look at the first of five DevOps ideals in Gene Kim's book, "The Unicorn Project."
I was recently asked to be on the jury for the Asia-Pacific track of the 2023 SAS Hackathon. As many of my colleagues have said before, judging the hackathon is a great experience. We were exposed to lots of innovative ideas about how to apply AI and analytics to a wide
The 2023 International Congress of Actuaries (ICA2023) brought together industry leaders and actuaries worldwide to explore the challenges and opportunities for actuaries. Over the years, SAS has emerged as a variance and regression analysis software, revolutionising the actuarial field. Since then, it has evolved into SAS Viya 4.0, a cloud-native
Given a set of N points in k-dimensional space, can you find the location that minimizes the sum of the distances to the points? The location that minimizes the distances is called the geometric median of the points. For univariate data, the "points" are merely a set of numbers $${p_1,
This July I'm refreshing the previous Strengthening Your Relationship email series from 2021. The email series offers quick ideas and information within three categories: research and insight, questions for closeness, and weekend activity ideas. As I revisited this content, I was reminded of the weekly meeting practice I wrote about
While writing an article about labeling a polygon by using the centroid, I almost made a false claim about the centroid. I almost claimed that the centroid is the point in a polygon that minimizes the sum of the distances to the vertices. It is not. The point that
A colleague asked how to compute the barycentric coordinates of a point inside a triangle. Given a triangle in the plane with vertices p1, p2, and p3, every point in the triangle can be represented as a convex combination of the vertices: c1*p1 + c2*p2 + c3*p3, where c1,c2,c3 ≥
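As a rough illustration of the barycentric-coordinate idea in the excerpt above, here is a minimal Python sketch. It assumes the standard definition (the coefficients are nonnegative inside the triangle and sum to 1) and solves the resulting 2 x 2 linear system in closed form. The function name `barycentric` and the sample triangle are hypothetical and are not taken from the original article.

```python
def barycentric(p, p1, p2, p3):
    """Return (c1, c2, c3) such that p = c1*p1 + c2*p2 + c3*p3 and c1 + c2 + c3 = 1.

    Points are (x, y) tuples. This is an illustrative sketch, not the
    original article's implementation.
    """
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, p1, p2, p3
    # Determinant of the 2 x 2 system; zero means the vertices are collinear.
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle: vertices are collinear")
    c1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    c2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    c3 = 1.0 - c1 - c2  # the coefficients sum to 1 by construction
    return c1, c2, c3


# Hypothetical example: a right triangle and a point inside it.
coords = barycentric((0.25, 0.25), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
print(coords)
```

A point lies inside (or on the boundary of) the triangle exactly when all three returned coefficients are nonnegative.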
SAS' Phuong Ngo demonstrates an automated shift-left CI/CD security workflow.
Part of the power of the SAS ODS system is the ability to visualize data by using ODS templates. An ODS template describes how to render data as a table or as a graph. A lot of papers and documentation have been written about how to define a custom template
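Where data science pays off: the stark difference between offense and defense. In a previous blog post, I discussed offense and defense in the use of data. This time, taking retail as an example, I describe the traps that many data projects tend to fall into and how to achieve the real objective. The goal of a retailer is, of course, the same as that of a company in any other industry: maximizing profit. More and more companies now use data analysis as a weapon to maximize sales, reduce costs, and improve the productivity of business processes. In some cases, data scientists run these projects and make full use of data science. Let us reconsider what benefit the data science or AI project that you are working on now, or are about to start, will bring to your company. Because we have recently received a great many inquiries about demand forecasting, let us consider demand forecasting here. Among the cases brought to us, no small number of companies regard demand forecasting as what this blog calls a "defensive decision." In many cases, they invest, or plan to invest, in projects that try to reduce overstock and stockouts by forecasting future demand from past results. In other words, they use past results as training data to build a model that predicts the future, produce a single forecast of future demand, factor in inventory, and turn the result into orders. The real objective that can be lost when the means become the end: This is probably a very typical use of AI and data science, but phrases such as "prediction with AI" and "prediction with machine learning" can create the illusion that a company is doing cutting-edge work with data. Typical demand forecasting with machine learning, which has been done for decades and is still done the same way today, is "defense." That is, regardless of how many kinds of data are used, if the forecasting model is built on the premise that past trends will continue into the future, it merely raises forecast accuracy in order to minimize lost opportunities under predefined assumptions and business-process constraints. Demand forecasting under such premises therefore has a limited expected effect on retail profit. So what should be done to ultimately maximize profit? Improving profit comes down to selling more products. To sell more products, you need to maximize sales against the purchase opportunities that exist in customers' purchasing psychology. It may also be necessary to turn that purchasing psychology itself from latent to manifest. In short, making the most of sales opportunities means thinking in a customer-centric way rather than a store-centric way. One offensive use of data in retail is assortment optimization: Only by thinking in a customer-centric way does a hypothesis-testing cycle for the optimal assortment become possible. Past data is merely the result of past corporate activity and does not represent the "truth" of the world, which here means how customers really think about purchases. Reaching the truth is possible only through hypothesis-driven experiments. Put simply, this is the familiar A/B test. Optimizing the assortment through such experiments makes it possible to maximize sales opportunities. It is important to practice operational demand forecasting in parallel with that process. The evolution of demand forecasting and assortment optimization: Amid trends such as the AI boom, the data scientist boom, labor shortages, and work-style reform, investment is progressing even among retailers that had not previously invested in the use of data. However, in many cases they remain at the defensive use of data described so far, or stay with methods and techniques that have long been practiced. By learning from history, you can avoid needless POCs and inefficient investments. I hope that considering where your company's current work sits in this history will help you improve the efficiency of your investments. A framework for maximizing the ROI of data use in retail: For many years, SAS has worked with retail and consumer goods customers to solve their business problems. Through that work, we have concluded that what matters is not optimizing the individual business processes within a retailer or consumer goods manufacturer separately, but an enterprise decision-making framework that integrates those processes. I hope you will use it as a compass for maximizing the return on investment in data-driven decision making, with AI and data science as the means.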
While writing an article about Toeplitz matrices, I saw an interesting fact about the eigenvalues of tridiagonal Toeplitz matrices on Nick Higham's blog. Recall that a Toeplitz matrix is a banded matrix that is constant along each diagonal. A tridiagonal Toeplitz matrix is zero except for the main diagonal, the
From July 17 to 28, 2023, SAS will train Latin American students and recent graduates free of charge to develop the skills that are in demand in the job market. The full training has an approximate value of 7,500 dollars. Upon completion, students will be able to get certified for free. The
A Toeplitz matrix is a banded matrix. You can construct it by specifying the parameters that are constant along each diagonal, including sub- and super-diagonals. For a square N x N matrix, there is one main diagonal, N-1 sub-diagonals, and N-1 super-diagonals, for a total of 2N-1 parameters. In statistics and applied
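To make the 2N-1 parameter count in the excerpt above concrete, here is a minimal Python sketch that builds an N x N Toeplitz matrix from a list of 2N-1 diagonal constants. The function name `toeplitz_from_diagonals` and the parameter ordering (lowest sub-diagonal first, main diagonal in the middle, highest super-diagonal last) are assumptions made for illustration; they are not the original article's code or any library's API.

```python
def toeplitz_from_diagonals(params):
    """Build an N x N Toeplitz matrix (list of rows) from 2N-1 constants.

    params[0] fills the lowest sub-diagonal, params[N-1] fills the main
    diagonal, and params[2N-2] fills the highest super-diagonal.
    (This ordering is an assumption chosen for the sketch.)
    """
    if len(params) % 2 == 0:
        raise ValueError("expected an odd number (2N-1) of parameters")
    n = (len(params) + 1) // 2
    # Element (i, j) depends only on the diagonal offset j - i.
    return [[params[(j - i) + (n - 1)] for j in range(n)] for i in range(n)]


# Hypothetical example with N = 3, so 2N-1 = 5 parameters.
for row in toeplitz_from_diagonals([1, 2, 3, 4, 5]):
    print(row)
```

Because each entry depends only on `j - i`, every diagonal of the result is constant, which is the defining property described in the excerpt.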
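Learning data science the easy way with SoDA #1 SoDA (SAS OnDemand for Academics) is an educational data analysis software program that SAS provides free of charge. Over the next four installments, we will introduce "Learning data science the easy way with SoDA" in detail. What is SoDA? To let beginners who are new to data science learn SAS for free, SoDA (SAS OnDemand for Academics)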
A previous article explains the Spearman rank correlation, which is a robust cousin to the more familiar Pearson correlation. I've also discussed why you might want to use rank correlation, and how to interpret the strength of a rank correlation. This article gives a short example that helps you to
Since the COVID-19 pandemic began, video presentations and webcasts have become a regular routine for many of us. On days that I will be using my webcam, I wear a solid-color shirt. If I don't plan to be on camera, I can wear a pinstripe Oxford shirt. Why the difference?
Real-world data often exhibits extreme skewness. It is not unusual to have data span many orders of magnitude. Classic examples are the distributions of incomes (impoverished and billionaires) and population sizes (small countries and populous nations). The readership of books and blog posts shows a similar distribution, which is sometimes
This year, SAS Innovate in Orlando brought executives and other industry luminaries together to inspire attendees to use analytics to outpace tomorrow. "Data and analytics enables you to get to the true value of a human being." (Michael Lewis, author of Moneyball) Chief Marketing Officer Jenn Chase, who hosted the event,
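Istanbul Metropolitan Municipality and the global brick manufacturer Wienerberger achieve both product quality and improved sustainability with SAS Viya. To support environmentally friendly business strategies, leading companies are relying more and more on AI, machine learning, and Internet of Things (IoT) analytics. These technologies improve sustainability by reducing carbon and waste emissions, and they help companies develop smarter, more efficient ways to operate. SAS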
Labeling objects in graphs can be difficult. SAS has a long history of providing support for labeling markers in scatter plots and for labeling regions on a map. This article discusses how the SGPLOT procedure decides where to put a label for a polygon. It discusses the advantages and disadvantages
This year six teams from Benelux joined the global competition. These teams took on exciting challenges: from unlocking privacy-sensitive healthcare data with synthetic data to optimizing cheese production and much more! With access to the latest SAS software and SAS mentors, they were building innovative solutions to real-world problems. Two
SAS supports many ways to compute the rank of a numeric variable and to handle tied values. However, sometimes I need to rank the values in a character categorical variable. For example, the values {"Male", "Female", "Male"} have ranks {2, 1, 2} because, in alphabetical order, "Female" is the first-ranked
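The ranking idea in the excerpt above can be sketched in a few lines of Python. This is only an illustration of the concept, not the SAS approach the article goes on to describe. The function name `rank_categories` is hypothetical, and the sketch assumes that ties share a rank and that "alphabetical order" means Python's default string ordering, which is case-sensitive.

```python
def rank_categories(values):
    """Rank character values by the sorted order of their distinct levels.

    Tied values receive the same rank. Ordering uses Python's default
    (case-sensitive, code-point) string comparison, which is an
    assumption of this sketch.
    """
    levels = sorted(set(values))
    rank_of = {level: i + 1 for i, level in enumerate(levels)}
    return [rank_of[v] for v in values]


ranks = rank_categories(["Male", "Female", "Male"])
print(ranks)  # [2, 1, 2], matching the example in the excerpt
```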
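Latest AI and cloud analytics technologies and case studies presented at SAS Innovate 2023. SAS, a global leader in analytics, held SAS Innovate 2023 in Orlando, Florida, USA, from May 8 to 10 (US local time). SAS Innovate is SAS's annual business conference, attended by industry experts and opinion leaders from around the world. At this event, SAS Viya's remarkable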
"For those not knee-deep in the ModelOps process, the process may seem simple," says Ankit Sinha, Director of Product Management at Experian: "You build the model, deploy the model and reap the benefits." But the process starts to become very complex when you're using multiple database systems and data sources
A previous article defines the silhouette statistic (Rousseeuw, 1987) and shows how to use it to identify observations in a cluster analysis that are potentially misclassified. The article provides many graphs, including the silhouette plot, which is a bar chart or histogram that displays the distribution of the silhouette statistic
"Demand planning is a team sport," says Davis Wu, Global Lead of Demand Planning and Analytics at Nestlé S.A. The team at Nestlé includes demand planners, demand analysts and data scientists, who all work together to ensure that Nestlé products remain in stock. Many demand analysts can be seen as
Empirical Mode Decomposition (EMD) is a powerful time-frequency analysis technique that allows for the decomposition of a non-stationary and non-linear signal into a series of intrinsic mode functions (IMFs). The method was first introduced by Huang et al. in 1998 and has since been widely used in various fields, such as signal processing, image analysis, and biomedical engineering.
In the field of security, investigative capability is essential. However, in an increasingly digital world, it can be hard to keep up with the latest tools and techniques for conducting investigations efficiently and effectively. In this post, we will explore the best practices and benefits
Mental Health Month is an important time to honor and raise awareness around mental illness and mental wellness. Correcting and combating stigma and discrimination, including with data, is one of the month’s major goals. It’s hard to talk about mental health without also addressing substance use disorders (including opioids), homelessness
Assigning observations into clusters can be challenging. One challenge is deciding how many clusters are in the data. Another is identifying which observations are potentially misclassified because they are on the boundary between two different clusters. Ralph Abbey's 2019 paper ("How to Evaluate Different Clustering Results") is a good way