Are you afraid that if you take a ride in a taxi, you might get "taken for a ride"? If working out a reasonable price for a taxi ride feels like a bit of a black box to you, here is a SAS data analysis of over 12 million NYC Yellow Cab rides that will hopefully get you in the right ballpark!
Before we get started, here's a picture of a taxi I saw on my trip to Cuba last fall - I've never been to NYC, but I imagine taxi rides are a little different there! :)
I recently came across an interesting graph posted by reddit user 'badgraphs' that analyzed ~1,000,000 NYC Yellow Cab rides from January 2015. His goal was to estimate the "effective rate for an average yellow cab trip in NYC" ($/mile). Below is a copy of his graph:
I found his graph interesting (mainly because I had no idea that this detailed data from all the NYC Yellow Cabs was available!), and the combination of his graph and his write-up answered many questions about the data. But I wondered if I could create a better graph, one that was a little more self-explanatory and didn't need an accompanying article to help users know what was going on in the graph.
I located the data on the nyc.gov website, downloaded the csv file, and imported it into SAS. There were actually over 12 million rides in the data for January 2015 (whereas the graph above only plots ~1 million rides), and of course I included all 12 million in my graph, since SAS can handle that. I decided to let the data speak for itself rather than using regression lines and such, and I found it useful to color the data by the RateCodeID. The coloring helps explain several of the visual features in the graph.
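If you'd like to poke at the same idea outside of SAS, here is a minimal Python sketch (not the SAS code used for the graphs in this post) that groups rides by rate code and computes a median fare per mile for each group. The column names `trip_distance`, `RateCodeID`, and `fare_amount` are assumptions about the csv layout, and the handful of rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Made-up sample rows; the column names are assumed, not taken from the post.
SAMPLE_CSV = """trip_distance,RateCodeID,fare_amount
1.0,1,6.00
2.0,1,9.00
4.0,1,14.00
17.0,2,52.00
0.0,1,2.50
"""

def rate_per_mile_by_code(csv_file):
    """Return the median fare per mile for each rate code in the file."""
    rates = defaultdict(list)
    for row in csv.DictReader(csv_file):
        miles = float(row["trip_distance"])
        fare = float(row["fare_amount"])
        if miles <= 0:
            # A zero-distance ride has no meaningful $/mile, so skip it.
            continue
        rates[row["RateCodeID"]].append(fare / miles)
    return {code: median(values) for code, values in rates.items()}

medians = rate_per_mile_by_code(io.StringIO(SAMPLE_CSV))
for code, rate in sorted(medians.items()):
    print(f"RateCodeID {code}: ${rate:.2f} per mile")
```

To try it on the real file, you could pass an open file handle instead of the `io.StringIO` sample. The median is used here only because it is less sensitive to odd rides than a mean; that is a choice made for this sketch, not something the graphs above rely on.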
Showing the cab fare -vs- distance was interesting, but I had a more direct question ... how much do people generally pay for a cab ride? Therefore I rounded the cost of each cab ride to the nearest dollar, and created a histogram. Looks like the typical ride in a NYC Yellow Cab is around $9 (good to know, eh!?!)
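The rounding-and-counting step behind a histogram like this can be sketched in a few lines of Python. This is an illustration only, not the SAS code from this post: the sample amounts are made up, and the helper names `round_to_dollar` and `fare_histogram` are hypothetical.

```python
import math
from collections import Counter

# Made-up ride costs in dollars; the real data has over 12 million rows.
sample_amounts = [8.80, 9.30, 9.49, 12.50, 6.30, 9.00, 52.00, 7.55, 8.50]

def round_to_dollar(amount):
    """Round half up to a whole dollar (Python's round() rounds halves to even)."""
    return math.floor(amount + 0.5)

def fare_histogram(amounts):
    """Count how many rides fall into each whole-dollar bin."""
    return Counter(round_to_dollar(a) for a in amounts)

histogram = fare_histogram(sample_amounts)
most_common_dollar, ride_count = histogram.most_common(1)[0]

# Print a tiny text histogram, one row per dollar bin.
for dollars in sorted(histogram):
    print(f"${dollars:>3} {'#' * histogram[dollars]}")
print(f"Most common bin: ${most_common_dollar} ({ride_count} rides)")
```

The tallest bar is simply the bin with the highest count, which is what `most_common(1)` picks out.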
Inquiring minds also want to know ... how much do people generally tip a cab driver? Luckily, the tip amount was in the data, so I was able to plot the tip amount -vs- the total bill. I noticed some clear trends in this plot, and after annotating some green reference lines, I could see that people tended to tip in even increments of 5% - and it looks like the most common amounts are 20% and 25%.
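One way to check for that kind of clustering without a plot is to convert each tip to a percentage and snap it to the nearest 5% step. Here is a hedged Python sketch of that idea (again, not the SAS code behind the graph). The (bill, tip) pairs are made up, the helper names are hypothetical, and which amount you treat as the "bill" (with or without the tip included) is an assumption you would need to settle for the real data.

```python
from collections import Counter

# Made-up (bill, tip) pairs in dollars, for illustration only.
sample_rides = [
    (10.00, 2.00),
    (20.00, 5.00),
    (8.00, 1.60),
    (12.00, 3.00),
    (15.00, 1.50),
    (9.50, 0.00),
]

def tip_percent(bill, tip):
    """Tip as a percentage of the bill."""
    return 100.0 * tip / bill

def nearest_five(percent):
    """Snap a non-negative percentage to the nearest multiple of 5."""
    return 5 * int(percent / 5 + 0.5)

# Skip rides with a zero bill so the division is always defined.
tip_steps = Counter(
    nearest_five(tip_percent(bill, tip))
    for bill, tip in sample_rides
    if bill > 0
)

for step in sorted(tip_steps):
    print(f"{step:>3}%: {tip_steps[step]} rides")
```

With real data, tall counts at 20 and 25 would match the reference lines in the plot.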
Hopefully these graphs provide you with some useful insight into your future taxi rides. And if you have access to data from other cities (or other ride services such as Uber or Lyft), I think it would be very interesting to see graphs of that data, and compare them to these NYC Yellow Cab graphs. Here's a link to the SAS code I used (data, graph1, graph2, graph3), if you'd like to download it to use as a starting point for your own graphs.