The taxman cometh - for Amazon.com!

Do you order things online to avoid paying sales tax? Those "good old days" might be coming to an end soon...

Here's a snapshot of my latest purchase from Amazon.com (a little something for the Talk Like a Pirate party I had on Sept 19):

pirate_rings

In the US, each of the 50 states handles sales taxes a little differently, especially when it comes to online purchases. In general, if an online retailer has a physical presence in your state (such as a store or warehouse), then that retailer must charge you sales tax on your online purchases. And in my state (North Carolina), even if the online retailer does not charge you sales tax, you are supposed to pay a use tax when you file your taxes at the end of the year.

As consumers have been buying more online, and less in local stores (for convenience, price, etc), the states have seen a decrease in sales tax revenue. Therefore many states are pressuring online retailers to collect sales taxes for the state - especially the large online retailers like Amazon.com.

Being an Amazon Prime customer myself, I wondered how many states currently force Amazon to collect sales taxes. I did a few searches, and found a nice detailed map in a Wall Street Journal article that showed what I was looking for. But their map was somewhat 'busy' (showing 4 different categories of taxation), and took a while for me to understand. Therefore I decided to create a simplified version using SAS.

In my SAS map, I make the states where Amazon.com has to collect sales tax red (and all other states a light/subdued color). I also added a timestamp, which will become important as the states which do/don't charge tax will likely change in the future.

amazon_sales_tax

 

Do you have to pay sales taxes for your online purchases? What are other countries doing? What's your suggestion on the best/most-equitable way to handle it?

 


Which cars get the most speeding tickets?

Is the type of car you drive more likely, or less likely, to get a speeding ticket? Let's analyze some data to find out!

Do red cars attract more attention from the police, and get more tickets? How about cars with a 'racing stripe'? Or cars with a big chromed motor, a blower, and side pipes (such as the one in the picture below that I took at a local car show)?  Zoom, zoom!

fast_motor

Of course, cars don't get speeding tickets - people do. But perhaps people who drive fast (and get lots of tickets) tend to drive certain types of cars? A recent CNN article used data from people who had gotten a quote from insurance.com, and listed the 20 cars with the highest percentage of quote-seekers who had a recent ticket.

I would imagine the insurance quote data is a little biased. For example, the people looking for an insurance quote might be more likely to have tickets than the general population (that might be why they're looking). But nonetheless, the data is 'interesting' so let's go with it!

The CNN article showed each of the top 20 cars on a separate page, which made it time-consuming to see all 20, and also made it difficult to compare them. Therefore I created a simple SAS bar chart to overcome those problems:

most_ticketed_cars

Seeing the 20 cars with the most tickets was interesting, but it made me curious about the 20 cars with the fewest tickets. Therefore I dug up that data, and created a similar bar chart for the fewest tickets. Note that I scaled it the same as the previous chart, so it would be easy to visually compare the two charts:

most_ticketed_cars1

Of course, while I was scraping around to find the data for the above charts, I also got the data for all the cars in between (over 500 different models in all). And with all that data, I had to try to visualize it all at once! I created a scatter plot, with the data grouped by make along the vertical axis (similar to the bar chart layout), and sorted the makes by their average number of tickets.

most_ticketed_cars2
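
For readers who want to try a similar layout, here is a minimal sketch using PROC SGPLOT. It is not the code behind the graph above: it assumes the SASHELP.CARS sample data set that ships with SAS, and uses city MPG as a stand-in measure because the ticket data is not reproduced here. The table name cars_sorted and the column avg_mpg are made up for illustration.

```sas
/* Sketch only: SASHELP.CARS and MPG_City stand in for the ticket data. */
/* Compute each make's average and sort the rows by it.                 */
proc sql;
   create table cars_sorted as
   select make, model, mpg_city,
          mean(mpg_city) as avg_mpg   /* per-make average, remerged onto each row */
   from sashelp.cars
   group by make
   order by avg_mpg desc, make;
quit;

/* One marker per model, with makes along the vertical axis in sorted order */
proc sgplot data=cars_sorted;
   scatter x=mpg_city y=make;
   yaxis discreteorder=data;   /* keep the order of the sorted data */
run;
```

PROC SQL writes a note about remerging summary statistics when it attaches the per-make average to every row; that is expected here.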

You can click any of the graphs above to see the interactive version, with html hover-text, and drilldowns that do a Google search for images of that vehicle!

Did any of the cars in the best and worst 20 surprise you? Do you own one of those cars, and can you confirm whether or not you have speeding ticket(s)? What other factors do you think influence your probability of getting a speeding ticket? Do you have any 'tricks' for not getting speeding tickets, that you'd like to share?

 


Analysis of credit scores, and automobile loans

Have you heard the old saying that "Banks only loan money to people who don't need it"?  Let's analyze the data and see if that is true!...

I'm very much a car-guy, and I love learning about all the new vehicles, and love the new-car feel ... and even the smell.  It's hard to not like a nicely detailed sporty vehicle. For example, here's a picture of the Miata a co-worker (and fellow car enthusiast) recently bought. Looks really nice sitting there on the Blue Ridge Parkway, doesn't it!

jims_miata2

 ... and with the price of vehicles these days, most people need a loan to buy one. Speaking of car loans, I recently saw a very interesting article by Liberty Street Economics showing the dollar amount of car loans originated, grouped by credit score. I found the raw data, downloaded it, and created my own SAS version of the graph. I kept mine very similar to their original, but cleaned up the time axis a little (only showing the year at each tick mark), stacked the color legend values, and included markers on the lines (which I think provide a little more visual insight into how fast the data is changing, etc).

auto_loan_originations

As you can see in the graph, subprime lending (to people with lower credit scores) took the biggest hit during the recent recession, but is currently making a comeback.

Later in the article, they show the same graph split into 2 categories - auto finance companies, and banks & credit unions. The auto finance companies tend to cater to subprime borrowers more than the banks & credit unions do. Rather than scaling them both to the same axis as the first graph ($30 billion), I let each of these auto-scale to show the spread of the data in my SAS version.

auto_loan_originations1

auto_loan_originations2

And, I guess in answer to the original question, it appears that banks do loan money to people who need it (i.e., people who have low credit scores) - close to $6 billion this year. But they loan a lot more money to people with higher credit scores.

Anybody got any inside-insight into this data, or ideas about other ways to graph this data? - Feel free to share it in a comment!

 


Help! Why does the WHERE clause choke on the INPUT function?

A student brought in this coding problem after her manager had been struggling with it for a while. They played guessing games, but to no avail. Here’s what happened when they submitted DATA step and PROC SQL code using a WHERE clause with an INPUT function:

 data aileen;
length hcn $10.;
input prov $ hcn $;
datalines;
BC 9999999698 
AB 612345800 
99 1 
CA V79999915 
QC NIGS999996 
ON 0 
ON 9876543210 
;
run;
 
 
NOTE: The data set WORK.AILEEN has 7 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.07 seconds
      cpu time            0.09 seconds
 
 
data dswarn;
set aileen;
where input(hcn,10.)>=1000000000; 
run;
 
WARNING: INPUT function reported 'WARNING: Illegal first argument to function' while processing
         WHERE clause.
WARNING: INPUT function reported 'WARNING: Illegal first argument to function' while processing
         WHERE clause.
NOTE: There were 2 observations read from the data set WORK.AILEEN.
      WHERE INPUT(hcn, 10.)>=1000000000;
NOTE: The data set WORK.DSWARN has 2 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds
 
 
proc sql;
create table sqlwarn as
select * from aileen
where input(hcn,10.)>=1000000000;
quit;
 
WARNING: INPUT function reported 'WARNING: Illegal first argument to function' while processing
         WHERE clause.
WARNING: INPUT function reported 'WARNING: Illegal first argument to function' while processing
         WHERE clause.
NOTE: Table WORK.SQLWARN created, with 2 rows and 2 columns.
 
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

Their take:

The data step and sql procedure both generated this warning twice. (I think it is the 2 HCNs with leading characters that generated the warnings)

The Solution:

They are on the right track. The WHERE clause issues a warning whenever an invalid value is passed through the INPUT function. Two records hold hcn values with leading characters, and the INPUT function cannot convert them to a numeric value with the 10. informat, hence the two warnings. I fixed it by using the “?” informat modifier in the INPUT function. With the modifier, SAS no longer reports the failed conversions, resulting in a clean log. The output data set is still the same: the 2 values with leading characters convert to a missing value, so those observations fail the WHERE condition just as before. The big takeaway? Using the “?” modifier just ensures a clean log.

data dswarn;
set aileen;
where input(hcn,?10.)>=1000000000; 
run;
 
 
NOTE: There were 2 observations read from the data set WORK.AILEEN.
      WHERE INPUT(hcn, 10., '?')>=1000000000;
NOTE: The data set WORK.DSWARN has 2 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.07 seconds
      cpu time            0.06 seconds
 
 
proc sql;
create table sqlwarn as
select * from aileen
where input(hcn,?10.)>=1000000000;
quit;
 
NOTE: Table WORK.SQLWARN created, with 2 rows and 2 columns.
 
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
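
To see which values the modifier is quietly converting to missing, one option is to do the conversion in an assignment statement and print the result. The sketch below is an addition, not part of the original solution. It uses the “??” form of the modifier, which in a DATA step also keeps the automatic _ERROR_ variable from being set. The names check, hcn_num, and keep_row are made up for illustration.

```sas
/* Sketch: show how each hcn value converts (assumes WORK.AILEEN from above) */
data check;
   set aileen;
   hcn_num  = input(hcn, ?? 10.);        /* missing when hcn is not numeric     */
   keep_row = (hcn_num >= 1000000000);   /* 1 if the WHERE condition would pass */
run;

proc print data=check;
   format hcn_num 10.;
run;
```

A missing value sorts below every number in SAS, so the rows with a missing hcn_num fail the >= comparison. That is why the WHERE clause returns the same 2 observations with or without the modifier.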

Resources:

I’d love to claim I came up with the solution. However, it was the encyclopedic support.sas.com to the rescue. Here’s where I learned about this error and how to fix it.

Isn’t that pretty amazing? I’m always pleasantly surprised by how much knowledge is available on support.sas.com and it’s all free!!


Just say no (not only) to OLS

Zubin Dowlaty


This guest post was written by Zubin Dowlaty. He has 20+ years’ experience in the business intelligence and analytics space. At Mu Sigma, he works closely with Fortune 500 companies, counseling them on how to institutionalize data-driven decision-making. Zubin is focusing his efforts on managing an agenda of rapidly implementing innovative analytics technology and statistical techniques into the Mu Sigma ecosystem.

There is an old adage that goes, “don’t put all your eggs in one basket,” and for those who like financial advice, the only free lunch is diversification. The spirit of these ideas emphasizes risk reduction with a behavior change. The idea of minimizing risk when interpreting and generalizing analytical models is not new: guarding against over-fitting and methods such as bootstrapping are already used for this purpose. However, the idea of diversification tends to be underutilized in analytics workflows.

In the big data space, we are witnessing a trend towards NoSQL technologies, where we have multiple tools and frameworks to access data at our disposal. For the data analyst, prepping data for modeling consumes a tremendous amount of time. Anecdotal estimates usually put 60%-80% of the data scientist’s time into preparing the ‘model-ready’ data set.

Why then, when we complete the data prep, do most analysts estimate OLS regression models and stop? Even when that’s not the case, only one modeling technique tends to be selected. The analyst will then interpret, refine the model and present results. At least in corporate America there is a clear bias towards running only one technique. This one-model bias clearly goes against the spirit of minimizing risk by utilizing a portfolio approach. Ensembles, the technical term for running multiple models, should be the default method, not the single-model approach.

Design thinking and design principles are beginning to be taught in major graduate business schools. Two of the major principles of design thinking are prototyping and ideation. Design concepts are also in alignment with the ensemble approach, especially for measurement and forecasting use cases. Furthermore, the ensemble approach brings a natural ‘po’ concept, which translates to a provocation, to the champion model: one should explore and provoke the ‘champion’ model.

Let’s say you have selected an OLS model to be your champion. One should challenge this model by running a portfolio, or ensemble, of methods to improve insight and generalization. Given the various assumptions around robustness, functional form, and structural change within our data, it can be very risky to estimate one model. The idea is not to run various models like in the traditional ensemble sense, but rather to aggregate the various models in some form to get a better predictor. Again, it would be ideal to run an ensemble, leveraging the advantages of robustness, variable importance, and functional form in other techniques, in order to improve your dominant champion model, not replace it.
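
As a rough illustration of challenging a champion, the sketch below fits an OLS model and one challenger (a robust regression) to the same data and lines up their predictions. It is a toy under stated assumptions, not the workflow from the conference talk: it uses the SASHELP.CLASS sample data, requires SAS/STAT, and the data set and variable names (ols, rob, compare, pred_ols, pred_rob, diff) are hypothetical.

```sas
/* Champion: ordinary least squares */
proc reg data=sashelp.class;
   model weight = height age;
   output out=ols p=pred_ols;
run;
quit;

/* Challenger: robust regression (M estimation) on the same model */
proc robustreg data=sashelp.class method=m;
   model weight = height age;
   output out=rob p=pred_rob;
run;

/* Line up the two sets of predictions; both outputs keep the input row order */
data compare;
   merge ols rob(keep=pred_rob);
   diff = pred_ols - pred_rob;
run;

proc print data=compare;
   var name weight pred_ols pred_rob diff;
run;
```

Large values of diff point to observations where the two methods disagree, which is one place to start probing the champion's assumptions. A fuller portfolio would add more challengers in the same pattern.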

With today’s computational resources, the marginal time and cost of running many models are near zero – there is no longer any excuse. The bottom line: it’s a mindset change.

Join me in a discussion of these ideas at the Analytics 2014 conference.  We will review over a dozen models used in a business use case, in order to harden the champion model.  We will demonstrate how the ensemble approach to improving your champion model can significantly improve interpretability as well as trust in your model outcomes.


IMDb ratings for "The Big Bang Theory"

In my previous blog, we visualized how many people viewed each episode of The Big Bang Theory TV series. Now let's analyze how well people liked each episode...

As a starting point, I looked around to see if anyone else already had a plot of the episode ratings. I found a plot, but it didn't give me as much information as I wanted. So I downloaded the IMDb data, imported it into SAS, and started working on a new/improved version of the plot.

I decided to create 2 plots - one with the axis going from 0-10 (to see the "big picture"), and then a second plot 'zoomed-in' to better see the more subtle changes in ratings. Below are snapshots of my two graphs - click them to see the full-size interactive versions. My interactive graphs have html hover-text and drill down for each marker - I think this adds a huge amount of value to the graph, over the original graph!

big_bang_theory_ratings

big_bang_theory_ratings1

So although the number of viewers has gone up by several million people (as shown in the graph in my previous blog), the ratings have generally been going down slightly. What's your theory on the reason for that, and what do you think about the few outlier episodes that were higher/lower than the rest?


SAS on "The Big Bang Theory" TV series!

If you're a SAS user, chances are you're a bit of a science/technology/engineering/math nerd -- and also a fan of The Big Bang Theory. Therefore this SAS analysis on The Big Bang Theory should be right up your alley!

Yesterday (September 22) was the start of the 8th season for the TV series, and I saw a graph representing the number of viewers for each episode of the first 7 seasons. It was eye-catching and interesting, and showed a general upward trend in viewers. But I immediately noticed several things I'd do differently with a SAS graph...

So I found the raw data on Wikipedia, and imported it into SAS and started working on my own graph. Here's what I came up with (click the snapshot of the graph below to see the full-size interactive version):

big_bang_theory

The original graph used an 'area under the curve' graph, but I used a bar chart instead because showing angled lines between episodes is somewhat misleading (each episode is a discrete event, not a snapshot of a continuous event). And by using discrete bars, I was able to add html hover-text with the episode number & title for each bar.

The original graph only showed response axis numbers up to 20 million, and it was therefore not obvious that some of the episodes in Season 7 had more than 20 million viewers. I show the full axis in my graph, and add reference lines so you can easily see that some episodes had over 20 million viewers.

I also added a title to my graph, so you'd know what it represents! :)

So, what's your theory on why The Big Bang Theory is so popular?


Chris Hemedinger reveals all the secrets to custom tasks in SAS EG

To thousands of end users around the world, SAS® Enterprise Guide is a productivity tool for data management, analytics, reporting and SAS programming.  As many features as SAS Enterprise Guide offers, these still represent just a fraction of what people can do with SAS.

That's why a growing number of people see SAS Enterprise Guide as something else: a platform for custom features.  In this era of "apps" that accomplish focused tasks, SAS developers are looking for a way to distribute their ideas for custom features to the SAS users that they support.  And SAS custom tasks – which can work in SAS Enterprise Guide and the SAS Add-In for Microsoft Office – offer a robust method for doing just that.

For someone who is new to application development, the idea of building a SAS custom task can be intimidating.  After all, it requires not just SAS programming skills, but also the use of object-oriented programming in Microsoft Visual Studio.  You can choose your programming language (C# or Visual Basic), but you still need to step outside the traditional SAS application framework.

SAS_chris1

The benefits are worth it.  SAS custom tasks offer almost limitless options for the user interface experience, and you have tremendous flexibility for capturing your custom business logic to help your end user to do more with SAS.

SAS_chris2

Getting started doesn't have to be a scary experience – all you need is some guidance.  And fortunately, we've got that "know-how" ready to deliver in two forms: a SAS Press book, and a brand-new SAS Business Knowledge Series course!

The book is Custom Tasks for SAS Enterprise Guide Using Microsoft .NET, by Chris Hemedinger.  In the book, Chris introduces you to the tools you need and outlines the best practices for building good, solid custom tasks.  He also provides templates and dozens of examples to get you started.

Alan Churchill from Savian had this to say: “The meat of the book is how to create custom tasks in .NET for SAS. This is where it shines. Hemedinger lays out what interfaces are needed, how to build the .NET project, tips on debugging, and lots of sample code. For a beginner in this area, he walks them through the process. For a more experienced .NET developer, Hemedinger saves a lot of time by pointing out exactly what interfaces should be used; he then provides that all important code sample for making it happen. A great book and one I will use again and again.”

In his new course, Developing Custom Tasks for SAS Enterprise Guide, Chris brings that knowledge into the classroom for a two-day hands-on experience.  You'll learn how to use the APIs and tools to build new tasks, examine lots of examples and even build your own task from scratch!  With Chris as your instructor, you'll be learning from someone who has built more tasks than he can count…and who even helped to develop the custom task framework that you'll be using.  Plus…you'll receive a free copy of Chris' book!

So, relax and let Chris be your guide to extending or customizing your SAS Enterprise Guide environment to fit your needs and those of your industry or business.


If You Don’t Know Where You Are Going, Any Road Will Take You There!

Dr. Jay Liebowitz


This guest post was written by Dr. Jay Liebowitz, DiSanto Visiting Chair in Applied Business and Finance at Harrisburg University of Science and Technology. He is also the author of several books, including “Big Data and Business Analytics,” “Business Analytics: An Introduction,” and “Bursting the Big Data Bubble.”

Next month, Liebowitz will be a keynote presenter at the Analytics 2014 conference in Las Vegas on Oct. 20. His keynote, “Analytics + Intuition = Success!” will focus on the interplay between analytics and intuition in terms of executive decision making.

As we look at the growing field of analytics, it’s pretty clear that they can provide the signposts to help organizations gauge how well they are doing. Speaking of signposts, this reminds me of one of my pet peeves over the years: the lack of proper signage, especially at airports. Nothing is more frustrating than not being able to find your way to and within the airport. Case in point: traveling from Montreal to the Burlington, Vermont airport, a sign says “New York or Vermont”. If you follow the Vermont sign, which would be the most natural choice, it takes you all around the state of Vermont and you’ll never make your flight out of Burlington (I know; it is the only flight I have ever missed in all my years of flying). It is even more annoying to follow signs at the airport that are blurred. That’s right: according to Alice Rawsthorn’s October 21, 2012 New York Times article, “Designers of the Signs that Guide You,” the new signs in the Vienna Airport (see recent photo) are intentionally blurred. This can be troublesome for those who might have jet lag and haven’t slept well on the plane, as well as for those who are vision impaired.

Airport signage needs to be clear (pun intended) and concise, and it should minimize customer dissatisfaction. Some airports are getting better with their signage. Instead of saying “Arrivals” and “Departures”, some say “Ticketing/Check-in” and “Passenger Pick-up”. However, some airports, like Brussels, are still using Helvetica type, which is one of the poorest fonts for readability. And, with car rental return signs, drivers are still being confused at airports such as Orlando and Florence, Italy, according to blogs and newspaper accounts.

In the same manner, analytics need to convey and capture the right measures, such as Key Performance Indicators in the organization’s executive dashboards.  They need to not only report on what has happened (descriptive analytics), but also what will happen (predictive analytics) and ultimately what are the optimal conditions (prescriptive analytics for optimization).

What can we learn about what still needs to be done in analytics by looking at how airport signage can increase customer satisfaction? First, don’t get fancy with the airport signage--people want to be able to recognize the signs quickly (whether driving or catching flights in the airport). The signs need to communicate the intent clearly (both visually and content-wise). In much the same way, analytics should use the KISS philosophy (Keep It Simple, Stupid) and provide the appropriate messages and signals. Second, design signs with the lowest common denominator in mind. That is, as international and domestic visitors travel throughout airports, include universal symbols, colors, and verbiage so that the typical traveler can understand. Analytics can also use this guidance in terms of their respective end users. Last, continue to embed an analytics culture throughout the organization, in the same way that airport signs should also be intuitive.

Similar to using analytics for improving the business user’s experience, I am also trying to suggest ways to improve the airport signage for the average traveler. And it’s not just the airline industry; the same applies across other transportation industries as well. For example, I noticed that there was an electronic sign at the front of each Amtrak car that stated “Exit” and whether the rest room was occupied. Why couldn’t the sign also show the name of each station as the train arrives?

Call me crazy, but I’m still “waiting for a sign”!


Shocking data about your electricity rate!

Did you know that different states charge different $$ rates for electricity? The graphs in this blog will let you easily compare your rate to the rates in other states ...

Did you have a portable radio (aka "boombox") back in the 1980s? Do you remember how much it cost to buy batteries for it? Here's a picture of my latest vintage boombox (given to me by my good friend Reggie).  It consumes quite a bit of power, and uses 10 D-cell batteries (can you guess exactly what make & model it is? - leave your guess in a comment!)

boombox

Of course, rather than buying batteries, it's a lot cheaper to plug it into a wall outlet. Even then, electricity isn't free. But just how much does your electricity cost, and are other people getting a 'better deal' than you?

I found a table on the Web that lists how much the utility companies in each state charge for a kilowatthour of power. It was interesting to see that there was quite a difference from state to state. Here's a screen-capture of part of the table:

power_rates

It's great to have a table of the data! And a table is fine if you just want to look up one or two values. But it sure is difficult to see the 'big picture' in a table, and compare the values of all the states. So, of course, I imported the data into SAS, and created some graphs!

First a map, and then a bar chart (sorted from highest to lowest cost). You can click on the images below to see all my electric rate maps and graphs, with html hover-text to easily see the exact values:

electricity

electricity5

How well did your state do? Do you have any theories as to why electricity costs vary by state? Leave a comment with your thoughts!
