Decision trees are among the most popular machine learning algorithms used by data scientists. A decision tree is a supervised learning method that is commonly used for classification problems. Even if you are not a data scientist, chances are you can interpret the visual output from a decision tree.
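The next example makes the supervised-learning idea concrete with the simplest possible decision tree: a one-split "stump" on a single numeric feature. It is a minimal Python sketch, and the function names are hypothetical. The stump learns its threshold from labeled examples by minimizing misclassifications. Real decision-tree software grows deeper trees and typically uses split criteria such as Gini impurity or entropy, so treat this as an illustration only.

```python
# Minimal sketch (hypothetical helpers): a depth-1 decision tree ("stump")
# learned from labeled 1-D data by minimizing the number of misclassifications.

def fit_stump(xs, ys):
    """Return (threshold, left_label, right_label) with the fewest errors."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between two equal x values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= thr]
        right = [y for x, y in pairs if x > thr]
        # Each leaf predicts its majority label (ties broken arbitrarily).
        left_lab = max(set(left), key=left.count)
        right_lab = max(set(right), key=right.count)
        errors = (sum(y != left_lab for y in left)
                  + sum(y != right_lab for y in right))
        if best is None or errors < best[0]:
            best = (errors, thr, left_lab, right_lab)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2], best[3]

def predict(stump, x):
    """Follow the single split to a leaf and return its label."""
    thr, left_lab, right_lab = stump
    return left_lab if x <= thr else right_lab
```

For example, `fit_stump([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])` places the split at 6.5, so `predict` returns 0 for 5 and 1 for 11.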
I previously wrote about how to understand standardized regression coefficients in PROC REG in SAS. You can obtain the standardized estimates by using the STB option on the MODEL statement in PROC REG. Several readers have written to ask whether I could write a similar article about the STDCOEF option
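You can reproduce a standardized estimate by hand. The following minimal Python sketch assumes ordinary least squares with an intercept and a single regressor, and its helper names are hypothetical (this is not SAS code). It rescales the raw slope by s_x / s_y, which is equivalent to dividing the estimate by s_y / s_x. The result matches the slope you get after standardizing both variables.

```python
import math

def mean(v):
    return sum(v) / len(v)

def sd(v):
    """Sample standard deviation (n - 1 denominator)."""
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def ols_slope(x, y):
    """Least-squares slope for y = b0 + b1*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def standardized_slope(x, y):
    """Rescale the raw slope: b1 * s_x / s_y."""
    return ols_slope(x, y) * sd(x) / sd(y)

def slope_on_standardized_data(x, y):
    """Standardize both variables first, then fit the slope."""
    zx = [(a - mean(x)) / sd(x) for a in x]
    zy = [(b - mean(y)) / sd(y) for b in y]
    return ols_slope(zx, zy)
```

The two approaches agree up to floating-point error. With only one regressor, both also equal the Pearson correlation between x and y.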
Here in the US, the pipeline that supplies gasoline to North Carolina (and much of the Southeast) was hacked and had to be shut down for several days. This caused gasoline shortages and long lines at the gas pumps (as shown in the picture below, by my friend Daniel). But
It’s not too late to register for Tuesday’s (May 18) SAS Global Forum kickoff. And there are so many online sessions that it’s going to be hard to choose which to attend. Business and analytics experts are leading dozens of virtual sessions for industries, from agtech, banking and financial services, government,
Lake Superior is the largest freshwater lake (by surface area) in the world. And its largest island is Isle Royale. And its largest lake is Lake Siskiwit. And its largest island is Ryan Island. Ryan Island's largest (seasonal) pond is called Moose Flats. And it contains an 'island' (or a
It’s almost time for another SAS Global Forum, and this year even I’m surprised by the range of topics and experiences available for attendees. With the conference theme of “New day, new answers, inspired by curiosity,” you probably guessed we’ll be talking about analytics and pandemic recovery. And, of course,
The COVID-19 pandemic forced government agencies to accelerate digital transformation efforts, and the Internal Revenue Service (IRS) was no exception. The IRS was already in a significant transition period, spurred by the Taxpayer First Act, which was signed into law in July 2019. The Act was designed to improve the
You can standardize a numerical variable by subtracting a location parameter from each observation and then dividing by a scale parameter. Often, the parameters depend on the data that you are standardizing. For example, the most common way to standardize a variable is to subtract the sample mean and divide
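The following minimal Python sketch shows the location/scale idea. It uses a hypothetical `standardize` helper and the standard-library `statistics` module. The classical choice subtracts the sample mean and divides by the sample standard deviation. As an illustration only, the sketch also shows a robust alternative (median and median absolute deviation), which is an assumption here rather than something this excerpt specifies.

```python
import statistics

def standardize(values, location, scale):
    """Return (x - location) / scale for each observation."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return [(x - location) / scale for x in values]

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Classical: sample mean and sample standard deviation.
z = standardize(data, statistics.mean(data), statistics.stdev(data))

# Robust alternative (illustrative choice): median and MAD.
med = statistics.median(data)
mad = statistics.median([abs(x - med) for x in data])
robust = standardize(data, med, mad)
```

After the classical transformation, `z` has mean 0 and standard deviation 1. After the robust one, `robust` has median 0.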
With my collaborators Len Tashman (Editor of Foresight) and Udo Sglavo (VP of Analytics R&D at SAS), we are happy to announce the release of our new collection, Business Forecasting: The Emerging Role of Artificial Intelligence and Machine Learning (Wiley, 2021). Building upon our previous collection Business Forecasting: Practical Problems and
Early in her career, a mentor gave Dawnté Early some career advice that would ultimately change her life. “If you can learn how to analyze and dig into your own data, instead of paying others to do it for you, you’ll be able to do your own research and write
Odani's truism is a mathematical result that says that if you want to compare the fractions a/b and c/d, it often is sufficient to compare the sums (a+d) and (b+c) rather than the products a*d and b*c. (All of the integers a, b, c, and d are positive.) If you
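The following minimal Python sketch compares the sum-based guess with the exact cross-product comparison; the function names are hypothetical. The `agreement_rate` helper brute-forces all 4-tuples in 1..n and reports how often the guess is right. Skipping equal fractions and tied sums is my own choice for the sketch. No particular rate is claimed here; run it to see the empirical value.

```python
def exact_greater(a, b, c, d):
    """True if a/b > c/d, via exact integer cross-multiplication."""
    return a * d > b * c

def truism_greater(a, b, c, d):
    """Sum-based guess: predict a/b > c/d when a + d > b + c."""
    return a + d > b + c

def agreement_rate(n):
    """Fraction of tuples in 1..n where the guess matches the exact answer.

    Tuples with equal fractions or tied sums are skipped (sketch choice).
    """
    agree = total = 0
    rng = range(1, n + 1)
    for a in rng:
        for b in rng:
            for c in rng:
                for d in rng:
                    if a * d == b * c or a + d == b + c:
                        continue
                    total += 1
                    agree += exact_greater(a, b, c, d) == truism_greater(a, b, c, d)
    return agree / total
```

The guess is only a heuristic. For 1/2 versus 9/11, the sums (12 > 11) predict that 1/2 is larger, but the products (11 < 18) show that 9/11 is larger.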
I think one of the great uses of analytics and graphics is to show things like cancer clusters on a map. There are many factors that can lead to a higher incidence of diseases in geographical areas, and chemicals are often the culprit. For example, paraquat has been potentially linked
Dr. Ayesha Khanna has a vision for blending our physical infrastructures with cloud and AI infrastructures to improve the way we live, work and learn. As the co-founder and CEO of ADDO AI, an AI solutions firm and incubator, she advises corporations and governments on AI and smart cities. She
Break down silos, uncover problems, support mental health solutions – with data and analytics.
Quick! Which fraction is bigger, 40/83 or 27/56? It's not always easy to mentally compare two fractions to determine which is larger. For this example, you can easily see that both fractions are a little less than 1/2, but to compare the numbers you need to compare the products 40*56
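The exact comparison for this example takes only a few lines. This Python sketch cross-multiplies and then double-checks the result with the standard-library `fractions.Fraction` type.

```python
from fractions import Fraction

a, b, c, d = 40, 83, 27, 56

# For positive integers, a/b < c/d exactly when a*d < b*c.
left, right = a * d, b * c
print(left, right)                      # 2240 2241
print(left < right)                     # True, so 40/83 < 27/56

# Cross-check with exact rational arithmetic.
print(Fraction(a, b) < Fraction(c, d))  # True

# The sums point the same way: a + d = 96 < b + c = 110.
print(a + d, b + c)                     # 96 110
```

The products differ by only 1, which is why the comparison is hard to do mentally.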
By Paul Ridge, Head of Insurance, SAS UK & Ireland. Black swan events are not unknown to the insurance industry, whether they are natural disasters such as the Japanese earthquake in 2011, man-made catastrophes such as the Chernobyl disaster in 1986, or pandemics such as the SARS outbreak in 2002. By their nature, these kinds
An estimated 44% of people in jail and 37% of those in prison have a mental health condition. When I worked at the San Bernardino County Department of Behavioral Health, the Sheriff and Probation Departments were close partners with us. My Research & Evaluation team worked with their data teams to evaluate
A previous article discusses the definition of the Hoeffding D statistic and how to compute it in SAS. The letter D stands for "dependence." Unlike the Pearson correlation, which measures linear relationships, the Hoeffding D statistic tests whether two random variables are independent. Dependent variables have a Hoeffding D statistic
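The following minimal Python sketch computes the scaled Hoeffding D statistic, assuming there are no tied values in either variable (tie handling is omitted). Function names are hypothetical. The bivariate rank Q_i is 1 plus the number of points that lie below point i in both coordinates. The formula D = 30[(n-2)(n-3)D1 + D2 - 2(n-2)D3] / [n(n-1)(n-2)(n-3)(n-4)] follows my reading of the no-ties formula in the PROC CORR documentation, so verify it there before relying on it.

```python
def ranks_no_ties(v):
    """Ranks 1..n; assumes all values are distinct."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def hoeffding_d(x, y):
    """Hoeffding's D scaled by 30 (assumes no ties, n >= 5)."""
    n = len(x)
    if n < 5:
        raise ValueError("need at least 5 observations")
    R, S = ranks_no_ties(x), ranks_no_ties(y)
    # Bivariate rank: 1 + number of points below point i in both coordinates.
    Q = [1 + sum(1 for j in range(n) if R[j] < R[i] and S[j] < S[i])
         for i in range(n)]
    D1 = sum((q - 1) * (q - 2) for q in Q)
    D2 = sum((r - 1) * (r - 2) * (s - 1) * (s - 2) for r, s in zip(R, S))
    D3 = sum((r - 2) * (s - 2) * (q - 1) for r, s, q in zip(R, S, Q))
    num = (n - 2) * (n - 3) * D1 + D2 - 2 * (n - 2) * D3
    den = n * (n - 1) * (n - 2) * (n - 3) * (n - 4)
    return 30 * num / den
```

For x = 1, ..., 5, this sketch returns 1 for both y = x and y = 6 - x. The statistic measures dependence, not the sign of the relationship.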
Every 10 years the United States conducts a Census where we count all the people. This week the 2020 Census population totals for each state were released. And how might these numbers affect you? ... One thing the Census numbers are used for is to determine how many of the
With the release of SAS Viya 2020.1.4, text categories and concept models can now be deployed into production with just a few clicks and used to score data in-batch and via API! You can also now use these models in decision flows.
Data, AI and digital transformation will define the industry of the future. The die is cast. Without an industrial approach to analytics, there will be no future!
Diamonds are forever
Khepri, a deity of ancient Egypt, symbolized the morning rebirth of the sun. Khepri is also said to have inspired
There are many statistics that measure whether two continuous random variables are independent or whether they are related to each other in some way. The most well-known statistic is Pearson's correlation, which is a parametric measure of the linear relationship between two variables. A related measure is Spearman's rank correlation,
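The difference between the two correlations is easy to see in code. Spearman's rank correlation is the Pearson correlation computed on the ranks of the data. This minimal Python sketch assumes distinct values (no tie handling), and its helper names are hypothetical. It uses a relationship that is monotone but not linear.

```python
import math

def pearson(x, y):
    """Pearson correlation: measures linear association."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simple_ranks(v):
    """Ranks 1..n, assuming all values are distinct."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(simple_ranks(x), simple_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]            # monotone but not linear
print(round(pearson(x, y), 4))     # 0.9431
print(spearman(x, y))              # 1.0
```

Because the cubic relationship is perfectly monotone, the rank-based measure is exactly 1. The linear measure falls short of 1.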
Statewide longitudinal data systems (SLDS) have been around for many years, helping states understand students’ paths through the education system and beyond. The COVID-19 pandemic was an opportunity for one state’s SLDS to step up in new ways that helped feed children in need. With the US Department of Agriculture
Note from Gül Ege, Sr. Director, Analytics R&D, IoT: The pattern of training in the Cloud with your choice of framework, and inferencing at the Edge in a target environment, is especially common in the Internet of Things (IoT). In IoT, there is a proliferation of hardware environments on the Edge.
My coworker recently shared this post on Grown and Flown from the mother of a teen who lives with both depression and ADHD. This is a timely share as we approach May, which is Mental Health Awareness Month. I’d encourage you to read to better understand this experience or, perhaps,
Everybody likes to learn a bit of interesting trivia... It could make you look smart, or might help you win a bet in a bar. Or maybe give you something to amaze your kids with. Do you know what the biggest non-domesticated land animal in your state is? How about all
SAS/IML programmers often create and call user-defined modules. Recall that a module is a user-defined subroutine or function. A function returns a value; a subroutine can change one or more of its input arguments. I have written a complete guide to understanding SAS/IML modules, which contains many tips for working
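The following Python analogy (not SAS/IML syntax) illustrates the difference between a function and a subroutine. The names are hypothetical. A function returns a new value and leaves its input alone, while a "subroutine" returns nothing but changes its argument. SAS/IML modules receive their arguments by reference. In Python, only in-place mutation of a mutable object is visible to the caller, so the analogy is loose.

```python
def centered(v):
    """Function style: return a new mean-centered list; v is unchanged."""
    m = sum(v) / len(v)
    return [x - m for x in v]

def center_in_place(v):
    """Subroutine style: return None but overwrite the caller's list."""
    m = sum(v) / len(v)
    for i in range(len(v)):
        v[i] -= m

data = [1.0, 2.0, 3.0]
c = centered(data)        # data is still [1.0, 2.0, 3.0]
center_in_place(data)     # data is now [-1.0, 0.0, 1.0]
```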
Government procurement teams are responsible for managing billions of pounds of public expenditure, and taxpayers want more transparency on how their money is being spent. However, experts estimate that procurement errors, waste and abuse can cost central government up to 4.7% of procurement spend.[1] And when government procurement fraud scandals hit
Analyzing climate change business risk helps companies choose the most effective actions.
Ranking is a fundamental concept in statistics. Ranks of univariate data are used by statisticians to estimate statistics such as percentiles (quantiles) and empirical distributions. A more advanced use is to compute various rank-based measures of correlation or association between pairs of variables. For example, ranks are used to compute
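The following minimal Python sketch shows two of the uses the paragraph mentions; the function names are hypothetical. It computes ranks in which tied values share the average of their positions, which is one common convention among several. It also computes the empirical distribution function, the fraction of observations at or below a value.

```python
def average_ranks(v):
    """Ranks 1..n; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with v[order[i]].
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # convert 0-based positions to 1-based rank
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def ecdf(v, t):
    """Empirical CDF: fraction of observations <= t."""
    return sum(1 for x in v if x <= t) / len(v)
```

For the data 10, 20, 20, 30, the two 20s share rank 2.5, and the empirical CDF at 20 is 3/4.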