If you do much traveling in the United States, you're bound to hear a few words and expressions that are unique to certain areas. Well y'all get ready, because I'm fixin' to analyze some of those words for ya!
I recently found a really neat web application called The Great American Word Mapper that lets you enter words and see maps of where those words were used most frequently in Twitter posts. It's pretty cool, and almost addictive! Here's an example showing the two words I found most interesting - yall and yinz:
And as with any cool map, I felt compelled to try to create a similar one with SAS software!
Fortunately, they provided a link to their data on Google Drive, which made my endeavor a lot easier. They provided a separate CSV file for each letter of the alphabet and each level of smoothing (none, low, med, and high). Since yall and yinz both start with the same letter, I only needed the 'y' data files, and I decided to go with the 'medium' smoothed data, since those maps looked the best to me. I used the DMS SAS File->Import wizard, which wrote me a bit of Proc Import code that imported the data quickly & easily. I was then able to plot the data fairly easily using Proc Gmap. Here are my SAS maps showing the smoothed data for yall and yinz:
Here are a few changes (hopefully improvements) in my version of the maps:
- I added titles/text to explain more about what is represented in the map, and where the data came from.
- I made the state outlines darker, and left out the county outlines, shifting the focus from counties (which most people aren't familiar with) to states (which most people are familiar with).
- I left out the city labels, because they obscure parts of the map and I think the state outlines suffice.
- I added HTML mouse-over text to show the state names (click the map image snapshots above to see the interactive versions with the mouse-over text).
I liked the original maps, but I like my versions even better! The 'yall' map showed just about what I expected - common usage throughout the southeast, with the exception of Florida (where a lot of retirees from up north live). The 'yinz' map showed a high concentration in the western half of Pennsylvania, which is correct (according to Wikipedia and my friend who grew up in that area). But I was puzzled by a second yinz concentration encompassing several counties located along the border of North Carolina & Virginia. I've never really heard the word yinz used in that area, so I was a bit skeptical, and I decided to dig a little deeper...
Any time smoothing is used, there is a possibility it will distort the true nature of the data. Therefore, I decided to plot the unsmoothed yinz data, to see if it might shed some additional light on this odd NC/VA concentration. As I suspected, the raw data map showed that it was really only a couple of counties (Orange and Person) that had a high number of Twitter posts containing the word yinz. So in this particular case, the smoothing exaggerated the NC/VA yinz hotspot quite a bit, and it's probably better to use the unsmoothed data. (Which reinforces my suggestion to always plot your data in several different ways!)
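If you'd like a feel for how smoothing can turn a couple of counties into a much bigger-looking hotspot, here's a tiny toy sketch. It's written in Python rather than SAS, the numbers are made up (not the Word Mapper data), and the simple neighbor-averaging below is only my assumption for illustration. I don't know exactly which smoothing method the Word Mapper uses. The names `smooth` and `hotspot_size` are hypothetical helpers, and the "map" is just a single row of 8 counties.

```python
def smooth(values, weight=0.5):
    """Blend each value with the mean of its immediate left/right neighbors.

    This is a toy 1-D stand-in for spatial smoothing (an assumption for
    illustration, not the Word Mapper's actual method).
    """
    smoothed = []
    for i, v in enumerate(values):
        # Neighbors are the cells directly to the left and right (if any).
        neighbors = values[max(i - 1, 0):i] + values[i + 1:i + 2]
        neighbor_mean = sum(neighbors) / len(neighbors)
        smoothed.append((1 - weight) * v + weight * neighbor_mean)
    return smoothed


def hotspot_size(values, threshold=0.5):
    """Count how many counties would be shaded as part of the hotspot."""
    return sum(1 for v in values if v >= threshold)


# Made-up usage rates for a row of 8 counties: only two are truly "hot".
raw = [0, 0, 0, 10, 9, 0, 0, 0]
once = smooth(raw)    # one pass of smoothing
twice = smooth(once)  # a second pass, i.e. heavier smoothing

for label, row in [("raw", raw), ("smoothed once", once), ("smoothed twice", twice)]:
    print(f"{label:15s} hotspot counties: {hotspot_size(row)}  peak: {max(row):.2f}")
```

In this toy row, the hotspot grows from 2 counties to 4 and then 6 as the smoothing gets heavier, while the peak value shrinks. That's the same kind of spreading I suspect made the NC/VA yinz area look larger than the raw data supports.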
And now for a fun example ... My friend Margie is a bit of a local legend. Her passion is to create clever signs to hold up during sporting events (especially ice hockey) - they are frequently shown on the jumbo screen, and sometimes even on television. She's earned the nickname Clever Sign Chick, which she wears proudly. She spent her early childhood in western Pennsylvania, and therefore she's familiar with their local words such as 'yinz'. So when the Pittsburgh team came down to play our NC team, she greeted them with the following sign (all in good fun, of course!) ... which is an example of a perfect combination of the slang words yinz (commonly used around Pittsburgh), and ain't (commonly used in NC).
What special slang words are unique to your area? Feel free to share in the comments!