In my previous post, I showed how to approximate a cumulative distribution function (CDF) by evaluating only the probability density function (PDF). The technique uses the trapezoidal rule of integration to approximate the CDF from the PDF. For common probability distributions, you can use the CDF function in Base SAS to
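As a rough illustration of the trapezoidal idea mentioned in this excerpt, here is a minimal Python sketch (not the SAS code from the post). The function names `normal_pdf` and `cdf_from_pdf`, the standard normal example, and the grid on [-8, 8] are all assumptions chosen for illustration; the sketch also assumes that the probability mass below the lower grid limit is negligible.

```python
import math

def normal_pdf(x):
    """Standard normal density (illustrative choice of PDF)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf_from_pdf(pdf, lo, hi, n=2000):
    """Approximate the CDF on [lo, hi] with cumulative trapezoidal sums.

    Assumes the probability mass below `lo` is negligible.
    Returns two lists of length n + 1: the grid points and the CDF values.
    """
    h = (hi - lo) / n
    xs = [lo + i * h for i in range(n + 1)]
    dens = [pdf(x) for x in xs]
    cdf = [0.0]
    for i in range(1, n + 1):
        # Area of the trapezoid between consecutive grid points
        cdf.append(cdf[-1] + 0.5 * h * (dens[i - 1] + dens[i]))
    return xs, cdf

xs, cdf = cdf_from_pdf(normal_pdf, -8.0, 8.0)
```

Only PDF evaluations are needed; each CDF value reuses the running sum from the previous grid point, so the whole curve costs one pass over the grid.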
Does your forecast look like a radio? No? Then don't treat it like one. A radio's tuning knob serves a valid purpose. It lets you make fine adjustments, improving reception of the incoming signal, resulting in a clearer and more enjoyable listening experience. But just because you can make fine adjustments to
The old adage is that “Data is the lifeblood of the insurance industry.” However, for many insurance companies, data is like the red-headed stepchild. No one is willing to take care of it or take responsibility for it. In the past, insurance companies have created data governance programs, but these have often
When I saw Robert Kosoro's cool ZIPScribble map, I knew I had to create a SAS version - and of course I had to add a few enhancements along the way.... I was perusing some of the examples on dadaviz.com, and Kosoro's ZIPScribble map caught my attention. It wasn't a particularly useful
One of the common traps I see data quality analysts falling into is measuring data quality in a uniform way across the entire data landscape. For example, you may have a transactional dataset that has hundreds of records with missing values or badly entered formats. In contrast, you may have
You might be surprised at how many movies and TV shows are made in North Carolina - especially within the last few years. This blog provides a SAS graph that will make the list of films even easier to read! A recent story by the Tar Heel Traveler, and an exhibit
In The Princess Bride, one of my favorite movies, our hero Westley – in an attempt to save his love, Buttercup – has to navigate the Fire Swamp. There, Westley and Buttercup encounter fire spouts, quicksand and the dreaded rodents of unusual size (ROUSes). Each time he has a response to the
Evaluating a cumulative distribution function (CDF) can be an expensive operation. Each time you evaluate the CDF for a continuous probability distribution, the software has to perform a numerical integration. (Recall that the CDF at a point x is the integral under the probability density function (PDF) where x is
Warranties have a long and, some might say, interesting past. But the future is even brighter. New technologies and data sources are transforming our understanding of field quality, enabling deeper insights into product performance and customer preferences. These breakthroughs are accelerating the quest to reduce defects and satisfy customers.
I don’t know about you, but I get pretty determined to prove people wrong when they tell me that I cannot do something. I am not talking about fantastical things such as flying through the heart of the sun and out the other side without getting burned. Nor am I
In my last post, I pointed out that an uninformed approach to running queries on top of data stored in Hadoop HDFS may lead to unexpected performance degradation for reporting and analysis. The key issue had to do with JOINs in which all the records in one data set needed
Helping students to reason statistically is challenging enough without also having to provide in-class software instruction. “Practical Data Analysis with JMP, Second Edition” walks students through the process of analysis with JMP at their own speed at home, allowing faculty to devote class time to crucial or subtle statistical concepts
Under the banner "Big Data: Deployment Experience, Decision Processes, Effects," 580 big data leaders and interested practitioners from every industry met yesterday in Hanau to exchange experiences and take home ideas for big data initiatives in their own companies. I would like to share my takeaways in this blog and relate them in particular to
Financial institutions are mired in large pools of historic data across multiple lines of business and systems. However, much of the recent data is being produced externally and is isolated from the decision-making and operational banking processes. The limitations of existing banking systems combined with inward-looking and confined data practices
Small data is akin to algebra; big data is like calculus.
Last week I received a message from SAS Technical Support saying that a customer's IML program was running slowly. Could I look at it to see whether it could be improved? What I discovered is a good reminder about the importance of vectorizing user-defined modules. The program in this blog
From the pressures of a highly competitive marketplace to changing economic conditions to the evolution of the distribution network, the challenges facing the hospitality industry are many and varied. In this video, SAS asked a panel of experts to share their views on the issues that will challenge the hospitality
I recently wrote about how to overlay multiple curves on a single graph by reshaping wide data (with many variables) into long data (with a grouping variable). The implementation used PROC TRANSPOSE, which is a procedure in Base SAS. When you program in the SAS/IML language, you might encounter data
BI sparks heated debate. It always has, and it will for a long time to come. Right now, two camps are in the ring. But they are not arguing about whether the glass of water is half full or half empty. No, they are talking about how heavy each variant is. And that depends,
The electoral battlespace for the upcoming general election in the United Kingdom is starting to take shape. Campaigners are busily debating the political landscape. They want to own the high ground that dominates areas that matter most to voters – the NHS and the economy. With an ageing population and
In the movie Big, a 12-year-old boy, after being embarrassed in front of an older girl he was trying to impress by being told he was too short for a carnival ride, puts a coin into an antique arcade fortune teller machine called Zoltar Speaks, makes a wish to be big,
Data. To a statistician, data are the observed values. To a SAS programmer, analyzing data requires knowledge of the values and how the data are arranged in a data set. Sometimes the data are in a "wide form" in which there are many variables. However, to perform a certain analysis
Despite an increase in the availability of data in the federal government over the past few years, data and analytics could be doing even more for federal agencies. A strategic approach to managing and analyzing the data is needed. And, like many technology challenges – that’s a people problem. A
"You show me a successful complex system, and I will show you a system that has evolved through trial and error." ~ Tim Harford TED Talk link: http://www.ted.com/talks/tim_harford Karl Marx died thinking that the first communist revolution would occur in Great Britain, driven by the long hours and unsafe
Data science, and the demand for experts who practice it, has picked up tremendous momentum. On closer inspection, though, it turns out that there are almost as many different interpretations of the term as there are open positions to fill. This is shown by, among other things, the personality test that we developed together with a team of English psychologists. We therefore invite
This week’s author tip is from Robert Virgile and his book SAS Macro Language Magic: Discovering Advanced Techniques. Virgile chose this tip because even good programmers make errors. We hope you find this tip useful. You can also read an excerpt from Virgile’s book. Even good programmers make errors. In
Mobile World Congress is quickly approaching. Attendees and exhibitors are feverishly scheduling meetings, doing research, and determining their areas of focus to maximize their experience of the event. If you're hoping to learn more about big data analytics at the conference, here are some helpful insights and resources to help you
Sports provide us with many familiar clichés about playing defense, such as: Defense wins championships. The best defense is a good offense. Or my favorite: The best defense is the one that ranks first statistically in overall defensive performance, after controlling for the quality of the offenses it has faced. Perhaps not
Since the legalization of recreational marijuana use in Colorado in 2012, marijuana has been a much more frequent news topic than before - even from a data analysis perspective... I was recently looking for 'interesting' data to analyze with SAS, and I noticed some articles about the increasing potency of marijuana in
SAS procedures usually handle missing values automatically. Univariate procedures such as PROC MEANS automatically delete missing values when computing basic descriptive statistics. Many multivariate procedures such as PROC REG delete an entire observation if any variable in the analysis has a missing value. This is called listwise deletion or using
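To make the distinction in this excerpt concrete, here is a small Python sketch (not SAS, and not taken from the post) that contrasts the two behaviors: dropping missing values one variable at a time, as a univariate summary does, versus listwise deletion, which discards an entire observation if any analysis variable is missing. The toy data and the helper names `mean_ignoring_missing` and `listwise_delete` are hypothetical, and `None` stands in for a SAS missing value.

```python
# Toy data set: None marks a missing value (illustrative data only)
rows = [
    {"x": 1.0, "y": 2.0},
    {"x": None, "y": 3.0},
    {"x": 4.0, "y": None},
    {"x": 5.0, "y": 6.0},
]

def mean_ignoring_missing(rows, variable):
    """Univariate behavior: skip only the missing values of one variable."""
    vals = [r[variable] for r in rows if r[variable] is not None]
    return sum(vals) / len(vals)

def listwise_delete(rows, variables):
    """Multivariate behavior: keep a row only if every listed variable is present."""
    return [r for r in rows if all(r[v] is not None for v in variables)]

# Only the first and last rows have both x and y, so two observations remain
complete = listwise_delete(rows, ["x", "y"])
```

In this toy example the univariate mean of `y` uses three values, whereas an analysis of `x` and `y` together after listwise deletion uses only two observations.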