
Todd Wright says Hadoop can be difficult – but data management can help.
Datasets are rarely ready for analysis, and one of the most prevalent problems is missing data. This post is the first in a short series focusing on how to think about missingness, how JMP13 can help us determine the scope of missing data in a given table, and how to
St. Louis Union Station welcomed its first passenger train on Sept. 2, 1894 at 1:45 pm and became one of the largest and busiest passenger rail terminals in the world. Back in those days, the North American railroads widely used a system called Timetable and Train Order Operation to establish
In the digital world where billions of customers are making trillions of visits in a multi-channel marketing environment, big data has drawn researchers’ attention all over the world. Customers leave behind a huge trail of data in digital channels. It is becoming an extremely difficult task finding the right
In May, the New York Times published an article “The Little Known Statistician Who Taught Us to Measure Teachers” which profiled the life of Dr. Bill Sanders. The author reflected on Dr. Sanders’ life and work improving education for all students. For those of us in education, Dr. Sanders’ work
SAS power users (and actually, power users of any application) like to customize their environment for maximum productivity. Long-time SAS users remember the KEYS window in SAS display manager, which allows you to assign SAS commands to "hot keys" in your SAS session. These users will invest many hours to
My previous blog post focused on a graph, showing the % of women earning STEM degrees in various fields. While that graph was designed to answer a very specific question, let's now look at the data from a broader perspective. Let's look at the total number of STEM degrees
An important problem in machine learning is the "classification problem." In this supervised learning problem, you build a statistical model that predicts a set of categorical outcomes (responses) based on a set of input features (explanatory variables). You do this by training the model on data for which the outcomes
Editor-in-Chief Len Tashman's Preview The forecasting field is surely cross-disciplinary, as exemplified by the diverse membership of the International Institute of Forecasters (the publisher of this journal), but it is also multidimensional, as can be clearly seen in this Summer 2017 issue. The articles you’ll read here encompass sales and
Phil Simon chimes in on the immediacy of enterprise data.
For the past several years, efforts have been under way to recruit more women into the STEM (science, technology, engineering, and math) fields. I recently saw an interesting graph showing the percentage of bachelor's degrees conferred to women in the US, and I wondered if I could tweak that graph
I started my training in machine learning at the University of Tennessee in the late 1980s. Of course, we didn’t call it machine learning then, and we didn’t call ourselves data scientists yet either. We used terms like statistics, analytics, data mining and data modeling. Regardless of what you call
Artificial intelligence promises to transform society on the scale of the industrial, technical, and digital revolutions before it. Machines that can sense, reason and act will accelerate solutions to large-scale problems in a myriad of fields, including science, finance, medicine and education, augmenting human capability and helping us to go further,
By now most of us have heard about grass-fed beef and dairy, but you may still be confused about what that really means and if the benefits are worth the extra cost. First let me point out that “organic” has nothing to do with whether an animal product is grass
I recently showed how to compute a bootstrap percentile confidence interval in SAS. The percentile interval is a simple "first-order" interval that is formed from quantiles of the bootstrap distribution. However, it has two limitations. First, it does not use the estimate for the original data; it is based only
This post presents some basic aspects of ODS Graphics: enabling, selecting, and displaying graphs.
Carbon Dioxide ... CO2. Humans breathe out 2.3 pounds of it per day. It's also produced when we burn organic materials & fossil fuels (such as coal, oil, and natural gas). Plants use it for photosynthesis, which in turn produces oxygen. It is also a greenhouse gas, which many claim
I don't have a big vacation planned this summer. Don't feel sorry for me... I am going to Germany for a week in October and on Friday I leave for my second weekend at the beach. I have recently been reading about what makes a vacation "restorative". There is some cool
Everyone is talking about artificial intelligence (AI). In fact, many SAS customers who've been using our analytics capabilities for years or even decades are asking: What can we do with AI? What exactly is AI from a software perspective? How can we infuse cognitive computing into our customer interactions and on the customer
In SAS Viya 3.2, SAS Visual Data Builder provides a mechanism for performing simple, self-service data preparation tasks for SAS Visual Analytics or other applications. SAS Visual Data Builder is NOT an Extract, Transform and Load (ETL) or data quality tool. You may still need one of those tools to
Joyce Norris-Montanari explains why it's so important to pick the right tools to manage your big data.
I previously wrote about how to compute a bootstrap confidence interval in Base SAS. As a reminder, the bootstrap method consists of the following steps: (1) Compute the statistic of interest for the original data. (2) Resample B times from the data to form B bootstrap samples. B is usually a large
For the fifteenth year, the International Institute of Forecasters, in collaboration with SAS®, is proud to announce research grants for how to improve forecasting methods and business forecasting practice. The award for the 2017-2018 year will be two $5,000 grants, in Business Applications and Methodology. Criteria for the award of
Two of my colleagues have shared their experiences as a statistic and a child who could have been left behind. I too have my own story that helped drive my passion. All of us define equity in different ways. Equity is a concept that is hard to define, and we
For colleges and universities, awarding financial aid today requires sophisticated analysis. When higher education leaders ask, “How can we use financial aid to help meet our institutional goals?” they need to consider many scenarios to balance strategic enrollment goals, student need, and institutional finances in order to optimize yield and
Lengthens inner thigh and side waist. Engages core and lengthens spine while opening chest, arms and shoulders. Gate Pose (Sanskrit: Parighasana) Begin on mat kneeling with toes curled under and torso lengthened with crown reaching tall. Extend your right leg out to the right and press your entire foot into
As many of the regular readers of this blog know, SGPLOT and GTL provide extensive tools to build complex graphs by layering plot statements together. These plots work with axes, legends and attribute maps to create graphs that can scale easily to different data. There are, however, many instances where
In an IoT world, everything is connected. But what does it mean to be connected? Does it mean being plugged in to your phone, car, home, TV, favorite apps and retailers? Does it mean knowing what’s happening all around you? And having the “things” you’re connected to acting as recommender
When building models, data scientists and statisticians often talk about penalty, regularization and shrinkage. What do these terms mean and why are they important? According to Wikipedia, regularization "refers to a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting. This information usually
In the fourth post of the 10 Commandments of Applied Econometrics series we discussed the issues of keeping the solutions sensibly simple and applying model validation. Today, I will present another commandment related to data mining techniques. Use data mining reasonably. In the econometric community, data mining is a controversial and highly emotional