In this blog series, I am exploring if it’s wise to crowdsource data improvement, and if the power of the crowd can enable organizations to incorporate better enterprise data quality practices. In Part 1, I provided a high-level definition of crowdsourcing and explained that while it can be applied to a wide range of projects
The xkcd comic often makes me think and laugh. The comic features physics, math, and statistics among its topics. Many years ago, the comic showed a "binary heart": a grid of binary (0/1) numbers with certain numbers colored red so that they formed a heart. Some years later, I
.@philsimon on the reliability of social numbers.
Brilliant, humorous, and obscure. Those words could describe two of my favorite comedians, Emo Philips* and the late Dennis Wolfberg. They could also describe, with the addition of "exceedingly" brilliant, "scathingly" humorous, and "apparently totally" obscure, a 1957 article, "Two Important Problems in Sales Forecasting" by James H. Lorie (The
We’re all about numbers here at SAS. So when the Global Certification program hit its 75,000th credential, we had to make it a big deal. We tracked down the 75,000th credential holder, Susan Langan, a research analyst in Maryland, and what’s even more special than Langan holding the
Once in a while, people run into an issue with the data that doesn't really need to be fixed right to ensure success of a specific project. So, the data issues are put into production and forgotten. Everyone always says, “We will go back and correct this later.” But that
The SAS DATA step supports multidimensional arrays. However, matrices in SAS/IML are like mathematical matrices: they are always two dimensional. In simulation studies you might need to generate and store thousands of matrices for a later statistical analysis of their properties. How can you accomplish that unless you can create
A 2012 report estimated that over the following five years the US Internal Revenue Service (IRS) would issue more than $20 billion in potentially fraudulent tax refunds. Figures like this do little to boost taxpayers’ confidence in our nation’s tax system. And tax fraud is not
I'm ramping up my visualization skills in preparation for the next big election, and I invite you to do the same! Let's start by plotting some county-level election data on a map... To get you into the spirit of elections, here's a picture of my friend Sara's dad, when he was
Because finding analytical talent continues to be a challenge for most, here I offer tips 5, 6, and 7 of my ten tips for finding data scientists, based on best practices at SAS and illustrated with some of our own “unicorns.” You can read my first blog post for why they
Regulatory compliance is a principal driver for data quality and data governance initiatives in many organisations right now, particularly in the banking sector. It is interesting to observe how many financial institutions immediately demand longer timeframes to help get their 'house in order' in preparation for each directive. To the
Part 1 of this topic presented a simple Sudoku solver. By treating Sudoku as an exact cover problem, the algorithm efficiently found solutions to simple Sudoku problems using basic logic. Unfortunately, the simple solver fails when presented with more difficult Sudoku problems. The puzzle on the right was obtained from
Insurance can be a complex business, so filing an insurance claim can be a daunting task for many small businesses. When an incident does occur, be it property damage, business interruption, professional indemnity, public liability, or any of the myriad other potential causes of loss, it is typically a period of
In many ways it’s open season for open data; open data is one of those phrases we hear a lot but it’s not always appreciated as having value. The fact that it’s openly available is seen by some as proof that there’s no value in the data – unlike, for
In SAS, the order of variables in a data set is usually unimportant. However, occasionally SAS programmers need to reorder the variables to make a special graph or to simplify a computation. Reordering variables in the DATA step is slightly tricky. There are Knowledge Base articles about how
Staying competitive in a big data world means working fast and making decisions even faster. You need to assess conditions, approve access, stop transactions and reroute activities quickly so you can seize opportunities or prevent problems. With increasing data volumes from the Internet of Things (Cisco predicts that fifty billion
North Carolina is one of those lucky states that has a huge variety of scenic destinations, such as mountains, piedmont, coastal plains, beaches, and 'outer banks' islands. We have state parks in all of these areas, but can you guess which state park has been trending the most during the past
There are companies that have no data quality initiative and truly believe that if they see no data problem, there is none. In effect, they say that if it does not interfere with day-to-day business, then there is no data quality problem. From what I have seen in my consulting experience, it usually
We asked our partners at the Cornell Center for Hospitality Research to poll the research faculty at the Hotel School to understand their guidance about what to expect in 2015. We were also able to get a preview of what the faculty will be working on in terms of research
Over my last two posts, I suggested that our expectations for data quality morph over the duration of business processes, and that only once the process has completed can we demand that all statically applied data quality rules be observed. However, over the duration of the
A SAS/IML programmer asked a question on a discussion forum, which I paraphrase below: I've written a SAS/IML function that takes several arguments. Some of the arguments have default values. When the module is called, I want to compute some quantity, but I only want to compute it for the
Google recently announced that they will be adding Google Fiber high speed network and TV to my area. This was great news, because it will give us more choices ... and a little competition among providers tends to make them all 'try harder' to please the customer. :-) I was curious what other
It's an exciting time for reality! We've been technologically enhancing reality for a long time -- eye glasses, telescopes, binoculars, microscopes, photography, moving pictures, live streaming video over the Internet, etc. But whether it's augmented reality, virtual reality or somewhere in between, a new wave of eye wear technology is
This week’s author tip is from Robert Virgile and his book “SAS Macro Language Magic: Discovering Advanced Techniques”. Virgile chose this tip because discovering and developing this technique will help you make the most of macros. We hope you find this tip useful. You can also read an excerpt from
This week, I finally ate some liver, for the first time in over 20 years - and I realized it's a lot like prepping data (which I'll explain in this blog post). Here are a few of the similarities: They're both good for you. Thinking about them makes you go
This year the American Statistical Association Conference on Statistical Practice (CSP) has some weighty themes including Big Data Prediction and Analysis and all of its exciting applications. But just as important is the theme Communication and Impact. Everyone knows that if you have a great idea or discovery but you
In my book Simulating Data with SAS, I discuss a relationship between the skewness and kurtosis of probability distributions that might not be familiar to some statistical programmers. Namely, the skewness and kurtosis of a probability distribution are not independent. If κ is the full kurtosis of a distribution and
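The relationship usually cited here is Pearson's inequality, κ ≥ γ² + 1, where γ is the skewness and κ the full (not excess) kurtosis. As a hedged illustration, not code from the book, the Python sketch below computes both quantities for a sample using divide-by-n moments, which treats the sample as a discrete distribution so the bound must hold for it too. The function names `skew_and_full_kurtosis` and `satisfies_bound` are hypothetical.

```python
import random


def skew_and_full_kurtosis(data):
    """Return (skewness, full kurtosis) of data, using divide-by-n moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    if m2 == 0:
        raise ValueError("data must not be constant")
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # full kurtosis: no "minus 3" excess adjustment
    return skew, kurt


def satisfies_bound(data, tol=1e-9):
    """Check kurtosis >= skewness**2 + 1, allowing for rounding error."""
    skew, kurt = skew_and_full_kurtosis(data)
    return kurt >= skew ** 2 + 1 - tol


if __name__ == "__main__":
    rng = random.Random(2014)
    sample = [rng.expovariate(1.0) for _ in range(1000)]
    g, k = skew_and_full_kurtosis(sample)
    print(f"skewness={g:.3f}  full kurtosis={k:.3f}  bound={g**2 + 1:.3f}")
    print(satisfies_bound(sample))
```

Two-point data attains the bound exactly: for `[0, 0, 1]` the squared skewness is 0.5 and the full kurtosis is 1.5.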
SAS Programming Professionals, are you doing things SAS backwards? Yeah, we have all been there. A manager, a user, or a client asks you to perform an analysis or create a data set and gives you a sketchy description of what they want. Maybe they say "It's like that
We asked our partners at the Cornell Center for Hospitality Research to comment on what they are seeing in terms of trends that will impact the hospitality industry in 2015. Cathy Enz, full professor in strategy and The Lewis G. Schaeneman Jr. Professor of Innovation and Dynamic Management at the