If you write a blog, you deal with spam comments. That's just part of the deal. Spammers are forever inventing new and creative methods for "tricking" you into accepting their spam comments. These comments have nothing to do with your blog topic but do contain trackback links to their own irrelevant websites. Apparently, adding more links translates to higher ranking on search engines. At least, that's the hope of the spammer.
We've had spammers come to visit ever since SAS blogs began in 2007. As blog authors, we moderate each incoming comment, so a spam comment cannot appear on a blog unless an author is duped into approving it. Sometimes spam comments look a lot like legitimate comments, which means that each comment requires some scrutiny.
Evaluating spam comments adds a bit of overhead to a blogger's workload, but until recently this was just a small bit of noise that was mixed in with all of the high-quality blog content and comments that we enjoy with our community of readers. (SAS blogs have accumulated almost 10,000 legitimate comments from engaged readers.)
Then, during November 2012, the spam activity spiked -- at least, that's the way it initially seemed to me. The volume of spam just felt different, but because I was in the habit of marking such comments as "spam" immediately, it was difficult to know for certain how much the volume had increased. Since I'm just one author out of dozens of active blog authors at SAS, I wondered whether others were having the same experience.
Well, I did not have to wonder for very long. We have data: our SAS blogs are hosted on a WordPress platform, which uses a MySQL database to store content and activity. And we have SAS. Using the two together, I produced this:
The red dots indicate the number of spam comments received in a weekly period, and the gray line shows the overall trend. This chart shows clearly that at the end of 2012, the spammers really started going to town.
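For readers who want to try a similar weekly tally, here is a minimal sketch. It is not the SAS program behind the chart above; it is written in Python and assumes the comment rows have already been pulled from the WordPress `wp_comments` table as pairs of comment date and `comment_approved` status (WordPress stores the value `spam` there for comments marked as spam). The function names and the sample rows are made up for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(d):
    """Return the Monday of the week that contains date d."""
    return d - timedelta(days=d.weekday())


def weekly_spam_counts(rows):
    """Count spam comments per week.

    rows: iterable of (comment_date, comment_approved) pairs.
    Returns a dict that maps each week's Monday to its spam count.
    """
    counts = Counter(
        week_start(d) for d, status in rows if status == "spam"
    )
    return dict(sorted(counts.items()))


# Made-up sample rows, for illustration only (not real blog data).
sample_rows = [
    (date(2012, 11, 5), "spam"),
    (date(2012, 11, 6), "1"),      # an approved, legitimate comment
    (date(2012, 11, 7), "spam"),
    (date(2012, 11, 12), "spam"),
]

weekly = weekly_spam_counts(sample_rows)
for monday, n in weekly.items():
    print(monday.isoformat(), n)
```

Each weekly total becomes one dot on a chart like the one described above; the trend line would need a separate smoothing step.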
We know about commercial anti-spam solutions that can be applied to this problem, but we had to ask our IT staff (who support our blogging platform along with all of our website infrastructure) for a slice of their time to help us.
At SAS, we are lucky to have a tremendously responsive IT team. When we have a business problem that IT can help to solve, they are always happy to partner with us. But they are very busy with lots of projects, all of which compete for resources based on various priorities. As bloggers, we were able to say, "we are receiving a lot more spam than we used to." But what impact does that have, and how should we prioritize the remedy?
I created a report that includes the chart shown earlier plus the one below. You can see the pace at which spam arrives: sometimes 200 spam messages per day, or over 1,000 per week. Even if each spam message requires just a few seconds of attention, the cost (in time) adds up quickly.
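To put a rough number on that cost, here is a back-of-the-envelope sketch in Python. The figure of 5 seconds per message is an assumption standing in for "a few seconds"; the 1,000 messages per week comes from the report.

```python
def weekly_minutes(messages_per_week, seconds_per_message):
    """Minutes per week spent reviewing spam comments."""
    return messages_per_week * seconds_per_message / 60


# 1,000 spam messages per week (from the report), 5 seconds each (assumed).
minutes = weekly_minutes(1000, 5)
print(f"{minutes:.0f} minutes per week")
```

Under that assumption, the bloggers collectively lose well over an hour each week to spam alone.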
I wish I could say that these reports resulted in a "drop everything! solve this!" reaction from IT. Alas, as important as blogs and bloggers are, the SAS IT staff still had bigger fish to fry on other projects. But these reports did provide unambiguous, objective evidence of my hunch: spam was a serious problem and a productivity issue for SAS bloggers. And everyone agreed that something had to be done.
Today, thanks to our IT staff, we have an effective spam solution in place. The spam noise has been reduced to a slight hum. My historical spam reports are completely broken because spam messages are now automatically categorized and deleted. But that's okay -- the reports have served their purpose.
By the way, if you are curious about which blogs seem to attract the most spam, you're reading one right now. The SAS Dummy and The DO Loop both top the list for the most spam accumulated, even though neither of these blogs contains the highest number of articles.
At Rick Wicklin's suggestion I also created a plot of blog-post-volume versus spam-comment-volume. Here's the result, which shows that even though SAS Voices has the highest number of posts, it hasn't attracted nearly as much spam as some of the other blogs. I guess Rick and I each cover topics that the spammers love:
I even used a technique that Rick shared for labeling only certain points within the plot. Wow, these blogs are really useful after all!