“Garbage in, garbage out” is more than a catchphrase – it’s the unfortunate reality in many analytics initiatives. For most analytical applications, the biggest problem lies not in the predictive modeling, but in gathering and preparing data for analysis. When the analytics seems to be underperforming, the problem almost invariably
If you know me, you know two undeniable things (other than my love for froyo): I consider shopping a sport and I am an Analytics geek. Being an Analytics geek means that I see potential for using data everywhere, and never more than when it’s my data as a customer.
Sustainability is an idea whose time has come. Individuals, organizations and governments are increasingly recognizing that it doesn’t make sense to compromise the future to meet the needs of the present. To that end, the UN recently replaced the Millennium Development Goals (MDGs) with the Sustainable Development Goals (SDGs). It’s
There's a lot of buzz about Hadoop these days. I started checking into it, and there seemed to be a gazillion releases. So, being The Graph Guy, I decided to create a graph to make it a little easier to digest! During my search for Hadoop information, I found the
A few of our clients are exploring the use of a data lake as both a landing pad and a repository for collection of enterprise data sets. However, after probing a little bit about what they expected to do with this data lake, I found that the simple use of
Choosing nutrient-dense foods is only half the battle when it comes to eating right. The other half is making sure you’re getting the most benefits from your great food choices. How do you do that, you ask? The answer is pretty simple. Paying attention to a few common food prep
While discussing ways and means to improve Sales and Operations Planning (S&OP) and forecasting, many a time business executives ask “What can we do with social media?” This was definitely NOT a usual topic in S&OP forums just a few years back! Most of the time, I push back the
Last month I wrote about how to simulate a drunkard's walk in SAS for a drunkard who can move only left or right in one dimension. A reader asked whether the problem could be generalized to two dimensions. Yes! This article shows how to simulate a 2-D drunkard's walk, also
Machine learning is moving into the mainstream. Once the sole purview of academic researchers and advanced technology firms, machine learning is now being used by many companies in more traditional industry verticals. Machine learning uses mathematical (not necessarily statistical) models to learn about data. In this context, learning about
As technology and analytics continue to evolve, we're seeing new opportunities not only in the way that we analyze data, but also in deployment options. More specifically, real-time deployment of analytical algorithms that enable organizations to detect and respond to security threats, offer timely incentives to customers, and mitigate risk by detecting compliance
There's big money in professional sports these days - we're talking billions of dollars! Do you know which teams are the most valuable? The graphs in this blog will show you... I recently saw a bar chart on dadaviz.com showing the world's most valuable sports teams. It was the right kind
Why visualization? Several reasons, actually, the most compelling being that sometimes visualization literally solves the problem for you. I remember an exercise in eighth grade English class where we were asked to describe, in words only, an object set in front of us with sufficient clarity such that our classmates,
Krystian Matusz is what I’d call a super SAS user. He currently holds seven out of the nine credentials SAS offers: SAS Certified Advanced Programmer for SAS 9; SAS Certified Base Programmer for SAS 9; SAS Certified BI Content Developer for SAS 9; SAS Certified Clinical Trials Programmer Using SAS
While I've often written about how to get your SAS data to Microsoft Excel in some automated way, I haven't really addressed what's probably the most frequently used method: copy and paste. SAS Enterprise Guide 7.1 added a nifty little feature that makes copy-and-paste even more useful. The new "Copy
Many people who plan data governance initiatives ignore the need for a business case. "We've already had approval for the project; why do we need a business case when we've got the budget signed off?" The perception is that because they have a strong commitment, there is no need to get
When using SAS to format a number as a percentage, there is a little trick that you need to remember: the width of the formatted value must include room for the decimal point, the percent sign, and the possibility of two parentheses that indicate negative values. The field width must
Right now I’m crossing the Pacific toward Australia and New Zealand for the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (a.k.a. KDD), a Data Science Melbourne MeetUp, and the SAS Users of New Zealand conference. New Zealand is the birthplace of open source R. So this trip
It’s been very hot here in Northern Italy: electricity provision has struggled to keep up and we’ve had frequent power outages in the area, even within our apartment building. A bit inconvenient? Don’t get me started. I feel like my home appliances have turned against me, taking me back to
In my previous post I talked to John Cassara about the growing threat of mobile payments and how mobile phones can be used to launder illicit funds globally. I spoke with him again recently on the topic of financial intelligence. Here are the highlights from our discussion. So what is
I enjoy watching TV crime series like Law and Order, Crime Scene Investigation (CSI), Criminal Minds, Numb3rs, Person of Interest, as well as real-life mystery stories on shows like 20/20 and others. Obviously, the popularity of these types of shows means I'm not the only one who enjoys this type of entertainment. Here at SAS,
Last week's post about odds ratio plots in SAS made me think about a similar plot that visualizes the parameter estimates for a regression analysis. The so-called regression coefficient plot is a scatter plot of the estimates for each effect in the model, with lines that indicate the width of
I've heard lots of people quote statistics about marriage & divorce, but the experts don't always agree on what the data means. So I decided to run the data through a SAS graphical analysis, and see what the numbers say ... Before we get into the numbers though, let's have a
Financial institutions have been managing their AML models to meet regulatory expectations for some time. But what about customer risk rating models? We’re seeing a trend where firms are re-evaluating whether their heuristic, rules-based customer risk rating models can withstand regulatory expectations. Rules-based models follow simple analytical formulas, such as,
With apologies to this candy advertisement from the 1980s: "Hey, you got your Lua in my SAS program." "You got your SAS code in my Lua program!" Announcer: "PROC LUA: Two great programming languages that program great together!" What is Lua? It's an embeddable scripting language that is often used
I recently saw a cool graph showing the US import/export trade deficit. But after studying it a bit, I realized I was perceiving it wrong. Follow along in this blog to find out what the problem was and how I redesigned the graph to avoid it. I was looking through dadaviz.com
Bigger doesn’t always mean better. And that’s often the case with big data. Your data quality (DQ) problem – no denial, please – often only magnifies when you get bigger data sets. Having more unstructured data adds another level of complexity. The need for data quality on Hadoop is shown by user
Imagine the following scenario. You have many data sets from various sources, such as individual stores or hospitals. You use the SAS DATA step to concatenate the many data sets into a single large data set. You give the big data set to a colleague who will analyze it. Later
In the oil and gas industry, analytics are used to improve both upstream and downstream operations, from optimizing exploration and forecasting production to reducing commodity trading risk and understanding customers' energy needs. If you plan to derive value from the digital oil field, big data, and analytics, one of the first things
In my quest for interesting data to graph, I found some Drug Enforcement Administration (DEA) data on US domestic cannabis eradication. Does the data say anything interesting? Read on to find out! ... While doing some searches for other data, I happened across a table on the DEA website titled