Although the NSA and news media have given metadata a bad name in the popular press, the fact remains that information about the nature of your data is extremely valuable. For example, I posted an article yesterday about data cardinality. Cardinality measures the uniqueness of values in a variable. Cardinality
As you've probably guessed, I'm a "visual" person - I like to see things (in a chart/graph/map) rather than just reading about them (in a data table and summary statistic). Don't get me wrong - I'm a big fan of statistics and analytics -- but I'm an even bigger fan of
If you have been reading the articles in this blog, you already know that the Graph Template Language (GTL) forms the underlying foundation for all graphs produced automatically from SAS analytical procedures and custom graphs created with the SG Procedures and the ODS Graphics Designer. SG procedures and Designer provide
Data cardinality is an important topic for data analysis. In mathematical terms, cardinality is simply the number of elements in a set. But in data applications, cardinality signifies the number of unique values for a given field of data. Related terms include number of levels (thank you, PROC FREQ
On Kaiser Fung's Junk Charts blog, he showed a bar chart that was "published by Teach for America, touting its diversity." Kaiser objected to the chart because the bar lengths did not accurately depict the proportions of the Teach for America corps members. The chart bothers me for another reason:
The Inside SAS Global Forum video series has been officially launched for 2014. Yes, the conference is still several months away, but it is not too soon to plan! Your first item of business: submit your proposal for content. Then next: register for the conference. Anna Brown and I provide
Should you ever guess on the SAT® or PSAT standardized tests? My son is getting ready to take the preliminary SAT (PSAT), which is a practice test for the SAT. A teacher gave his class this advice regarding guessing: For a multiple-choice question, if you can eliminate one or two
My aunt Susanne is an elderly lady who lives in the countryside and looks forward to celebrating her 80th birthday soon. Since the 1960s she has had a telephone connection with her fixed line provider. At that time, and for many years afterward, in the country where my aunt lives,
In this quarter's installment of the SAS/Foresight Webinar Series, Steve Morlidge will discuss his promising new approach to evaluating the "Avoidability of Forecast Error." Based on his article in the Summer 2013 issue of Foresight, and examined in a four-part series on The BFD blog (Part 1, Part 2, Part
With Data Stewards Day fast approaching, I started to reminisce about the many data stewards I’ve had the pleasure of working with in the past. What struck me was just how many people take on the role of data steward - but under the guise of conventional roles. For example,
In my last blog post I described how to implement a "runs test" in the SAS/IML language. The runs test determines whether a sequence of two values (for example, heads and tails) is likely to have been generated by random chance. This article describes two applications of the runs test.
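For readers who want a feel for what a runs test computes, here is a minimal sketch in Python rather than SAS/IML. It is not the implementation from the post. It assumes the standard Wald-Wolfowitz normal approximation, and the function name `runs_test` is hypothetical, chosen only for illustration.

```python
import math

def runs_test(seq):
    """Wald-Wolfowitz runs test for a two-valued sequence (normal approximation).

    Returns (number of runs, z statistic, two-sided p-value).
    Illustrative sketch only; the approximation is rough for short sequences.
    """
    values = sorted(set(seq))
    if len(values) != 2:
        raise ValueError("sequence must contain exactly two distinct values")

    n1 = sum(1 for x in seq if x == values[0])  # count of the first value
    n2 = len(seq) - n1                          # count of the second value
    n = n1 + n2

    # A new run starts at every position where the value changes.
    runs = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)

    # Mean and variance of the number of runs under the randomness hypothesis.
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))

    z = (runs - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))      # two-sided normal p-value
    return runs, z, p

# A perfectly alternating sequence has more runs than chance would suggest.
runs, z, p = runs_test("HTHTHTHTHT")
print(runs, round(z, 3), round(p, 4))
```

A large positive z means too many runs (the sequence alternates too often), and a large negative z means too few (the values cluster).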
Is big data becoming too big to ignore? An increasing number of organizations seem to think so. As Matt Asay on ReadWriteWeb writes: According to a recent Gartner report, 64% of enterprises surveyed indicate that they're deploying or planning Big Data projects (emphasis mine). Yet even more acknowledge that they
While walking in the woods, a statistician named Goldilocks wanders into a cottage and discovers three bears. The bears, being hungry, threaten to eat the young lady, but Goldilocks begs them to give her a chance to win her freedom. The bears agree. While Mama Bear and Papa Bear block
Traditionally, SAS users like their processes to behave like Ron Popeil's famous rotisserie: they want to set it and forget it. That's the definition of a batch process. You work like heck to get it ready to run, then you push the button (or schedule it) and walk away. But
What is the best way to share SAS/IML functions with your colleagues? Give them the source code? Create a function library that they can use? This article describes three techniques that make your SAS/IML functions accessible to others. As background, remember that you can define new functions and subroutines in
Forecast Value Added (FVA) is a metric for comparing the performance of your organization’s forecasting process to “doing nothing” and using a naïve model to generate your forecasts. The idea is, if all the resources and effort we put into forecasting are not providing forecasts that are better than using
"It slices, it dices ... it helps test laboratory mices!" In a joking way, this is a perfect description of SAS software, don't you think!?! :) And to prove it, this blog contains a collection of 32 examples, showing a variety of ways SAS can be used to graph data
Massive open online courses (MOOCs) are all the rage today. Some people see free online courses as a convenient way to introduce statistical concepts to tens of thousands of students who would not otherwise have an opportunity to learn about data analysis. Whereas 2013 is the International Year of Statistics,
Some of you may have already noticed the small graphical icon on the lower right side of the blog article labeled "Graphically Speaking Index". Yes, it is a link to a visual index for all articles published in this blog. Well, eventually it will have all the articles. So far, I
Has this ever happened to you? You have a SAS program with statements that you wrote, then you make use of a macro function or %include file supplied by a helpful colleague, and when your SAS code resumes, you find that all of your SAS titles have been changed out
The second part of my data governance primer series addresses ways to "mind your metadata." I can just hear the collective groans, and perhaps a stifled yawn. Sorry, but metadata collection is one of those necessary evils that may not be fun, but having it available as a resource to
Last week I described how to generate permutations in SAS. A related concept is the "combination." In probability and statistics, a combination is a subset of k items chosen from a set that contains N items. Order does not matter, so although the ordered triplets (B, A, C) and (C,
Some recent press articles question the value of big data while a book takes the opposite approach; I’ll choose the middle way. The New York Times article ‘Is Big Data an Economic Big Dud?’ questions the value of digital data and the resulting increase in the amount of data. This
"It's a floor wax, and a dessert topping" - this pretty much describes SAS/Graph! (bonus points if you know where this quote came from!) Some people think of SAS as just a quality control tool. Others think of it as just a sales & marketing tool. And yet others think
Businesses need to know who their customers are, and how much money they should invest in marketing to them. It’s an obvious idea, but it also served as pretty much the sum of my knowledge of Customer Lifetime Value (CLV). That is, until Edward Malthouse came into my life. Ed’s
This is the last post in my recent series of articles on computing contours in SAS. Last month a SAS customer asked how to compute the contours of the bivariate normal cumulative distribution function (CDF). Answering that question in a single blog post would have resulted in a long article,
I've written several articles that show how to generate permutations in SAS. In the SAS DATA step, you can use the ALLPERM subroutine to generate all permutations of a DATA step array that contains a small number (18 or fewer) of elements. In addition, the PLAN procedure enables you to generate
Your biggest problem with maps used to be learning how to fold a paper road map. Today, with the advent of GPS, Google Maps, and location-specific data, the bar has been raised! ... you now need to know how to plot your data on a map! Below are several examples of different kinds
I'm spoiled by the internet. I've grown so accustomed to being able to instantly find an answer to any query—no matter how obscure—that I am surprised when I don't find what I am looking for. The other day I was trying to find a mathematical result: a formula for the
Sometimes, your first impulse may not be correct, like trading in your practical sedan for a hot 2-seater. Other times, your first impulse is perfect, as in the examples below. Suppose the automobile data you wish to analyze resides in a CSV file. Naturally, your first impulse is to import