Developing an accurate understanding of statistics will help you build robust machine learning models that are optimized for a given business problem. SAS launched a new course that provides a comprehensive overview of the fundamentals of statistics that you'll need to start your data science journey. This course is also a prerequisite to many courses in the SAS data science curriculum.
Tag: Statistics
When you use PROC MEANS or PROC SUMMARY to create a summary data set and include a CLASS statement, SAS includes two variables, _FREQ_ and _TYPE_, in the output data set. This blog shows you two ways to interpret and use _TYPE_ using the data set Shoes in the SASHELP
Crises like the COVID-19 pandemic have increased the demand for public health experts who possess advanced analytics skills. After all, data – when properly collected, analyzed and understood – has immense power to inform decision-making. And in areas like public health, informed decision-making can save lives. Azhar Nizam has
Because it is near the end of the year, I thought a blog about "Summarizing" data might be in order. For these examples, I am going to use a simulated data set called Drug_Study, containing some categorical and numerical variables. For those interested readers, the SAS code that I used
The following is an excerpt from Cautionary Tales in Designed Experiments by David Salsburg. This book is available to download for free from SAS Press. The book aims to explain statistical design of experiments (DOE) to readers with minimal mathematical knowledge and skills. In this excerpt, you will learn about
What does the AI enterprise of the future look like? That’s a tough question that I’ve been asked to consider, along with a distinguished panel at Valley ML AI Expo 2020. The title of the panel is, “Life, the Universe and the AI Enterprise of the Future.” Based on an initial chat with panel chair Gautam Khera, I’ve written up some possible topics we’ll be covering on the panel. Consider
“Technology is an industry that eats its young, it is rare to come across providers that have been around for more than a human generation.” (Tony Baer, Big on Data) With more than 40 years in the market, SAS is one of the rare technology providers that has been around
A note from Udo Sglavo: The need for randomization in experimental design was introduced by the statistician R. A. Fisher in 1925, in his book Statistical Methods for Research Workers. You would assume that developing a successful treatment for COVID-19, the illness caused by the SARS-CoV-2 virus, will eventually conclude in
Learn how to use the SGPLOT procedure for graphical representation when you perform statistical analysis for a quadratic ANCOVA model with the GLM procedure.
A note from Udo Sglavo: In Digital transformation, scientific computing, and peace of mind, I mention that the COVID-19 pandemic is paralyzing the world. However, new challenges are also inspiring new ideas to tackle those challenges. We might ask questions about what is causal in nature, trying to figure out
Remember Subconscious Musings? It was the name of the blog Radhika Kulkarni (now retired Vice President of SAS R&D) started in 2012. She wrote about trends that drove innovation and challenges that expanded the boundaries of what we thought was possible. It eventually evolved into what we now know as
One of the first and most important steps in analyzing data, whether for descriptive or inferential statistical tasks, is to check for possible errors in your data. In my book, Cody's Data Cleaning Techniques Using SAS, Third Edition, I describe a macro called %Auto_Outliers. This macro allows you to search
The t-test is a very useful test that compares one variable (perhaps blood pressure) between two groups. T-tests are called t-tests because the test results are all based on t-values. T-values are an example of what statisticians call test statistics. A test statistic is a standardized value that is calculated
Summarizing numeric data is an important step in analyzing your data. CASL provides multiple actions that generate summary statistics. This blog provides a quick overview of three of those actions: SIMPLE.SUMMARY, AGGREGATION.AGGREGATE, and DATAPREPROCESS.RUSTATS.
In the hype and excitement surrounding artificial intelligence and big data, most of us miss out on critical aspects related to collecting, processing, handling and analyzing data. It's important for data science practitioners to understand these critical aspects and add a human touch to big data. What are these aspects?
I recently read a very interesting article describing how analytics is being used to detect cheating/copying/re-use in crossword puzzle creation, in some of the major news publications. This inspired me to try my hand at creating a totally new & unique crossword puzzle ... of course using SAS software! :) My grandmother
We live in a complex world that overflows with information. As human beings, we are very good at navigating this maze, where different types of input hit us from every possible direction. Without really thinking about it, we take in the inputs, evaluate the new information, combine it with our
April 7, 2003 will go down in the history books for me. The streets of Syracuse, New York, were abuzz. I was a junior television major, and our men’s basketball team had just won its first NCAA basketball title. Our three-seed Orangemen had bested #2 Kansas in New Orleans, but the
In “Explaining statistical methods to the terrified & disinterested: A focus on metaphors”, I discuss the usefulness of metaphors for explaining abstract statistical concepts to non-technical readers. This is an approach taken in my new SAS Press book, Business Statistics Made Easy in SAS®, since many readers of this level
In a previous blog I suggested that many readers in many applied areas are reading statistics texts under duress for a course or project, and are in truth somewhere between disinterested and terrified. In my new SAS Press book Business Statistics Made Easy in SAS® I make use of various
If you tuned in for my recent webinar, Machine Learning: Principles and Practice, you may have heard me talking about some of my favorite machine learning resources, including recent white papers and some classic studies. As I mentioned in the webinar, machine learning is not new. SAS has been pursuing
One question I get asked a lot is: What is the most exciting new statistical feature in the 14.1 release? And they get a bit frustrated when I say: It depends. But it does depend! SAS statistical software provides a broad array of capabilities that help users track disease outbreaks,
SAS is hosting this year’s European Analytics 2015 conference in Rome November 9 – 11. This three-day inspiring event will give you the chance to boost your company’s analytics culture in an international environment to make sure your knowledge and expertise meet the demands of the digital era. But what if
There is a job category unfamiliar to most people that plays a crucial role in the creation of analytics software. Most can surmise that SAS hires software developers with backgrounds in statistics, econometrics, forecasting or operations research to create our analytical software; however, most do not realize there is another
“Let’s assume a normal distribution …” Ugh! That was your first mistake. Why do we make this assumption? It can’t be because we want to be able to mentally compute standard deviations, because we can’t and don’t do it that way in practice. No, we assume a normal distribution to simplify
This post is a nod to one of my favourite plays, The Importance of Being Earnest by Oscar Wilde. As the title ‘Data Scientist’ becomes more common, what can we learn about the importance of titles and labelling from this century-old play? For those that haven't read it, the
Depending on whether you are a half-full or a half-empty kind of person, the "big data" revolution is either a tremendous windfall for the career of a statistician, or the makings of a real existential crisis. As with most things, it’s probably a bit of both. On the one hand,
In my previous post Sisyphus didn’t need a fitness tracker, I recommended that you only collect, measure and analyze big data if it helps you make a better decision or change your actions. Unfortunately, it’s difficult to know ahead of time which data will meet those criteria. We often, therefore, collect, measure and analyze
Business Intelligence (BI) can mean many things to many people, but generally BI is associated with business reports. When you fold business analytics (BA), especially advanced analytics that are predictive or prescriptive, under the BI umbrella you inherently dilute the value proposition that analytics can provide to an organization. Why
So far in our Ask the Statistician blog and video series, we have heard responses from statisticians at the Analytics 2013 conference about: The many ways statistics benefit their organizations. The types of statistical analyses used to solve business issues. Best practices for explaining results. How they put statistical