In this blog series, I am exploring if it’s wise to crowdsource data improvement, and if the power of the crowd can enable organizations to incorporate better enterprise data quality practices. In Part 1, I provided a high-level definition of crowdsourcing and explained that while it can be applied to a wide range of projects
Have you ever sprinkled sugar over your salad? Probably not, but as it turns out, food companies have already done it for you! Dressings can have up to a whopping 2 teaspoons of sugar in just 2 tablespoons: that’s 1/3 of your recommended sugar for the entire day. Wouldn’t you rather
The xkcd comic often makes me think and laugh. The comic features physics, math, and statistics among its topics. Many years ago, the comic showed a "binary heart": a grid of binary (0/1) numbers with certain numbers colored red so that they formed a heart. Some years later, I
.@philsimon on the reliability of social numbers.
Brilliant, humorous, and obscure. Those words could describe two of my favorite comedians, Emo Philips* and the late Dennis Wolfberg. They could also describe, with the addition of "exceedingly" brilliant, "scathingly" humorous, and "apparently totally" obscure, a 1957 article, "Two Important Problems in Sales Forecasting" by James H. Lorie (The
We’re all about numbers here at SAS. So when the Global Certification program hit its 75,000th credential – we had to make it a big deal. We tracked down the 75,000th credential holder to Susan Langan, a research analyst in Maryland, and what’s even more special than Langan holding the
Once in a while, people run into an issue with the data that doesn't really need to be fixed right away to ensure success of a specific project. So, the data issues are put into production and forgotten. Everyone always says, “We will go back and correct this later.” But that
The SAS DATA step supports multidimensional arrays. However, matrices in SAS/IML are like mathematical matrices: they are always two dimensional. In simulation studies you might need to generate and store thousands of matrices for a later statistical analysis of their properties. How can you accomplish that unless you can create
Suppose someone needs a kidney transplant and a family member is willing to donate one. If the donor and recipient are incompatible (because of blood types, tissue mismatch, and so on), the transplant cannot happen. Now suppose two donor-recipient pairs A and B are in this situation, but donor A
According to a 2012 report, it was estimated that over the next five years the US Internal Revenue Service (IRS) will issue more than $20 billion in potentially fraudulent tax refunds. Figures like this do little to boost taxpayers’ confidence in our nation’s tax system. And tax fraud is not
I'm ramping up my visualization skills in preparation for the next big election, and I invite you to do the same! Let's start by plotting some county-level election data on a map... To get you into the spirit of elections, here's a picture of my friend Sara's dad, when he was
Because finding analytical talent continues to be a challenge for most, here I offer tips 5, 6, and 7 of my ten tips for finding data scientists, based on best practices at SAS and illustrated with some of our own “unicorns.” You can read my first blog post for why they
Regulatory compliance is a principal driver for data quality and data governance initiatives in many organisations right now, particularly in the banking sector. It is interesting to observe how many financial institutions immediately demand longer timeframes to help get their 'house in order' in preparation for each directive. To the
Part 1 of this topic presented a simple Sudoku solver. By treating Sudoku as an exact cover problem, the algorithm efficiently found solutions to simple Sudoku problems using basic logic. Unfortunately, the simple solver fails when presented with more difficult Sudoku problems. The puzzle on the right was obtained from
In many ways it’s open season for open data; open data is one of those phrases we hear a lot but it’s not always appreciated as having value. The fact that it’s openly available is seen by some as proof that there’s no value in the data – unlike, for
It’s February, so love is in the air (or at least hearts, chocolate, and roses are lining the aisles at the grocery store) in the weeks before Valentine’s Day. For the singles in the house, don’t stop here! The stats are in, and according to http://www.pursuit-of-happiness.org/, people who have
In SAS, the order of variables in a data set is usually unimportant. However, occasionally SAS programmers need to reorder the variables in order to make a special graph or to simplify a computation. Reordering variables in the DATA step is slightly tricky. There are Knowledge Base articles about how
Staying competitive in a big data world means working fast and making decisions even faster. You need to assess conditions, approve access, stop transactions and reroute activities quickly so you can seize opportunities or prevent problems. With increasing data volumes from the Internet of Things (Cisco predicts that fifty billion
North Carolina is one of those lucky states that has a huge variety of scenic destinations, such as mountains, piedmont, coastal plains, beaches, and 'outer banks' islands. We have state parks in all of these areas, but can you guess which state park has been trending the most during the past
I stated in my previous blog about the value and benefits of volunteering that SAS Global Forum is designed to bring users with questions together with users with know-how. This goal is accomplished primarily in breakout and ePoster presentations. During his keynote address at SAS Global Forum 2014, Futurist Thornton
There are companies that have no data quality initiative and truly believe that they have no data problem. In effect, they say that if it does not interfere with day-to-day business, then there is no data quality problem. From what I have seen in my consulting experience, it usually
We asked our partners at the Cornell Center for Hospitality Research to poll the research faculty at the Hotel School to understand their guidance about what to expect in 2015. We were also able to get a preview of what the faculty will be working on in terms of research
Over my last two posts, I suggested that our expectations for data quality morph over the duration of business processes, and it is only once the process has completed that we can demand that all statically-applied data quality rules be observed. However, over the duration of the
I love to teach, but it took several years of teaching before I felt comfortable being in front of a class. And having taught for over 20 years, the fear of presenting in the classroom has passed, but what about presenting at professional meetings or in front of my peers?
A SAS/IML programmer asked a question on a discussion forum, which I paraphrase below: I've written a SAS/IML function that takes several arguments. Some of the arguments have default values. When the module is called, I want to compute some quantity, but I only want to compute it for the
Significant progress in the reduction of cancer mortality is shown in a graph that I noticed recently on the Cancer Network web site. This graph showed the actual and projected cancer mortality by year for males. The graph is shown on the right. The graph plots the projected and actual numbers
Google recently announced that they will be adding Google Fiber high speed network and TV to my area. This was great news, because it will give us more choices ... and a little competition among providers tends to make them all 'try harder' to please the customer. :-) I was curious what other
It's an exciting time for reality! We've been technologically enhancing reality for a long time -- eye glasses, telescopes, binoculars, microscopes, photography, moving pictures, live streaming video over the Internet, etc. But whether it's augmented reality, virtual reality or somewhere in between, a new wave of eye wear technology is
In the latest release of SAS Visual Analytics Designer, a parameter is a variable whose value can be changed and that can be referenced by other report objects. Why is this an important introduction? This addition means that, not only can you design interactive reports via prompt controls, those controls