New SAS Training Course: Statistics You Need to Know for Machine Learning

Developing an accurate understanding of statistics will help you build robust machine learning models that are optimized for a given business problem. SAS launched a new course that provides a comprehensive overview of the fundamentals of statistics that you'll need to start your data science journey. This course is also a prerequisite to many courses in the SAS data science curriculum.

Crises like the COVID-19 pandemic have increased the demand for public health experts who possess advanced analytics skills. After all, data – when properly collected, analyzed and understood – has immense power to inform decision-making. And in areas like public health, informed decision-making can save lives. Azhar Nizam has

Summarizing data

Because it is near the end of the year, I thought a blog about "Summarizing" data might be in order. For these examples, I am going to use a simulated data set called Drug_Study, containing some categorical and numerical variables. For interested readers, the SAS code that I used

Thomas Bayes’ theorem and “inverse probability”

The following is an excerpt from Cautionary Tales in Designed Experiments by David Salsburg. This book is available to download for free from SAS Press. The book aims to explain statistical design of experiments (DOE) to readers with minimal mathematical knowledge and skills. In this excerpt, you will learn about

6 questions about the future of AI

What does the AI enterprise of the future look like? That’s a tough question that I’ve been asked to consider, along with a distinguished panel at Valley ML AI Expo 2020. The title of the panel is, “Life, the Universe and the AI Enterprise of the Future.” Based on an initial chat with panel chair Gautam Khera, I’ve written up some possible topics we’ll be covering on the panel. Consider

Testing the Assumption of Normality for Parametric Tests

The t-test is a very useful test that compares one variable (perhaps blood pressure) between two groups. T-tests are called t-tests because the test results are all based on t-values. T-values are an example of what statisticians call test statistics. A test statistic is a standardized value that is calculated
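To make "standardized value" concrete, here is a minimal sketch of one such test statistic: the classic pooled two-sample t-statistic, which assumes both groups have equal variances (Welch's t-test drops that assumption). It divides the difference between the two group means by the standard error of that difference. The function name and the blood pressure readings are hypothetical, for illustration only.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Two-sample t-statistic assuming equal variances (pooled estimate).

    Hypothetical helper for illustration; each group needs at least two values.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Sample variances (statistics.variance divides by n - 1).
    var_a, var_b = variance(group_a), variance(group_b)
    # Pooled variance: each group's variance weighted by its degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    # Standard error of the difference between the two sample means.
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    # Standardize: observed difference in means divided by its standard error.
    return (mean(group_a) - mean(group_b)) / std_err


# Hypothetical systolic blood pressure readings (mmHg) for two groups.
treatment = [118, 122, 115, 120, 119, 121]
control = [125, 130, 128, 127, 126, 129]

t = pooled_t_statistic(treatment, control)
df = len(treatment) + len(control) - 2  # degrees of freedom for the pooled test
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The resulting t-value would then be compared against a t distribution with `df` degrees of freedom to get a p-value; that comparison is only trustworthy when the data in each group are approximately normal, which is why the normality assumption matters.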

Summarization in CASL

Summarizing numeric data is an important step in analyzing your data. CASL provides multiple actions that generate summary statistics. This blog provides a quick overview of three of those actions: SIMPLE.SUMMARY, AGGREGATION.AGGREGATE, and DATAPREPROCESS.RUSTATS.

Why is it important to add a human touch to big data?

In the hype and excitement surrounding artificial intelligence and big data, most of us miss out on critical aspects related to collecting, processing, handling and analyzing data. It's important for data science practitioners to understand these critical aspects and add a human touch to big data. What are these aspects?

A statistical crossword puzzle to exercise your brain

I recently read a very interesting article describing how analytics is being used to detect cheating/copying/re-use in crossword puzzle creation, in some of the major news publications. This inspired me to try my hand at creating a totally new & unique crossword puzzle ... of course using SAS software! :) My grandmother

The analytics of March basketball brackets

April 7, 2003 will go down in the history books for me. The streets of Syracuse, New York, were abuzz. I was a junior television major, and our men’s basketball team had just won its first NCAA basketball title. Our three-seed Orangemen had bested #2 Kansas in New Orleans, but the

6 machine learning resources for getting started

If you tuned in for my recent webinar, Machine Learning: Principles and Practice, you may have heard me talking about some of my favorite machine learning resources, including recent white papers and some classic studies. As I mentioned in the webinar, machine learning is not new. SAS has been pursuing

What if the Romans had analytics software? Analytics 2015 goes to Rome

SAS is hosting this year’s European Analytics 2015 conference in Rome November 9 – 11. This inspiring three-day event will give you the chance to boost your company’s analytics culture in an international environment and make sure your knowledge and expertise meet the demands of the digital era. But what if

Numeric validation for analytical software testing

There is a job category unfamiliar to most people that plays a crucial role in the creation of analytics software. Most can surmise that SAS hires software developers with backgrounds in statistics, econometrics, forecasting or operations research to create our analytical software; however, most do not realize there is another

Diagnosis: Your data is not “normal”

“Let’s assume a normal distribution …” Ugh! That was your first mistake. Why do we make this assumption? It can’t be because we want to be able to mentally compute standard deviations, because we can’t and don’t do it that way in practice. No, we assume a normal distribution to simplify

The Importance of Being a Data Scientist

This post is a nod to one of my favourite plays, The Importance of Being Earnest by Oscar Wilde. As the title ‘Data Scientist’ becomes more common, what can we learn about the importance of titles and labelling from this century-old play? For those who haven't read it, the

Statistics in the era of big data and the data scientist

Depending on whether you are a half-full or a half-empty kind of person, the "big data" revolution is either a tremendous windfall for the career of a statistician, or the makings of a real existential crisis. As with most things, it’s probably a bit of both. On the one hand,

The Chicken Man versus the Data Scientist

In my previous post Sisyphus didn’t need a fitness tracker, I recommended that you only collect, measure and analyze big data if it helps you make a better decision or change your actions. Unfortunately, it’s difficult to know ahead of time which data will meet that criterion. We often, therefore, collect, measure and analyze