Comment spammers are definitely getting trickier, as Mark Ghosh noted in his article Comment Spam with more Kung Fu?. I’m now seeing spammers simply copy an existing comment and submit it as their own.
My guidelines for identifying comment spam that doesn’t get caught by Akismet:
- If the “name” is not a first and last name, the comment is probably spam. It will at least get a closer look.
- If the “name” is not a name at all (furniture, travel deals, SEO-anything) the comment is spam.
- If the “email” is from China or Russia, the comment is spam.
- If the “email” looks fake, the comment is spam.
- If the site at the “URL” is not in English, the comment is spam.
- If the site at the “URL” feels spammy (an entirely subjective opinion), the comment is spam.
- If the comment itself has links to the poster’s web site, like an email signature, I remove that signature from the comment. And if the signature URL is different from the “URL” field, the comment is spam.
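The mechanical parts of this checklist can be roughed out in Python. This is only a sketch of my manual review, not something Akismet or WordPress provides: the function names `review_flags` and `host` are made up for illustration, and judging “from China or Russia” by a `.cn` or `.ru` email domain is an assumption. The subjective checks (a fake-looking email, a spammy-feeling site) are left to a human.

```python
import re
from urllib.parse import urlparse

# Assumption: "from China or Russia" is judged by the email domain's TLD.
SUSPECT_TLDS = (".cn", ".ru")
URL_RE = re.compile(r'https?://[^\s"<>]+')


def host(url):
    """Return the lowercase hostname of a URL, without a leading 'www.'."""
    h = urlparse(url).hostname or ""
    return h[4:] if h.startswith("www.") else h


def review_flags(name, email, url, body):
    """Return the reasons a comment deserves a closer look (hypothetical helper)."""
    flags = []
    # A single word is not a first and last name.
    if len(name.split()) < 2:
        flags.append("name is not a first and last name")
    domain = email.rsplit("@", 1)[-1].lower()
    if domain.endswith(SUSPECT_TLDS):
        flags.append("email domain is .cn or .ru")
    # Links in the comment body act like an email signature.
    body_hosts = {host(u) for u in URL_RE.findall(body)}
    if body_hosts:
        flags.append("comment body contains links")
        if any(h != host(url) for h in body_hosts):
            flags.append("signature URL differs from URL field")
    return flags
```

An empty list does not mean the comment is clean; it only means none of the mechanical checks fired.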
If the comment passes all of these items, then I check my web server’s log files to find where the user came from. Look at this spammer:
75.101.138.119 - - [22/Sep/2008:21:35:36 -0400] "GET /2008/06/26/helo-bot-hostname/ HTTP/1.1" 200 23602 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
75.101.138.119 - - [22/Sep/2008:21:35:39 -0400] "POST /wp-comments-post.php HTTP/1.1" 302 1 "http://www.planetmike.com/2008/06/26/helo-bot-hostname/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
75.101.138.119 - - [22/Sep/2008:21:35:43 -0400] "GET /2008/06/26/helo-bot-hostname/#comment-15818 HTTP/1.1" 404 19992 "http://www.planetmike.com/2008/06/26/helo-bot-hostname/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
They’re coming from Amazon Web Services (75.101.138.119 is ec2-75-101-138-119.compute-1.amazonaws.com). Hmmm, weird. They arrived at my site with no referrer, i.e., from a bookmark or by typing my URL into their IE 6 web browser. It took them only three seconds to read the page, enter their comment, and submit it. Not likely. This comment was spam.
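The timing check can be sketched in Python. This is a minimal sketch, assuming the raw log file uses the standard Apache combined format with plain ASCII quotes and hyphens; the names `parse_line` and `seconds_before_comment` are hypothetical, and the sample lines are the first two entries from the log above.

```python
import re
from datetime import datetime

# Apache combined log format: ip, identity, user, [time], "request", status, size, "referer", "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


def seconds_before_comment(lines, ip, comment_path="/wp-comments-post.php"):
    """Return (seconds between the IP's last GET and its comment POST,
    referer of that GET), or None if the IP never posted a comment."""
    last_get = None
    for line in lines:
        entry = parse_line(line)
        if entry is None or entry["ip"] != ip:
            continue
        if entry["method"] == "GET":
            last_get = entry
        elif entry["method"] == "POST" and entry["path"] == comment_path and last_get:
            delay = (entry["time"] - last_get["time"]).total_seconds()
            return delay, last_get["referer"]
    return None


SAMPLE = [
    '75.101.138.119 - - [22/Sep/2008:21:35:36 -0400] "GET /2008/06/26/helo-bot-hostname/ HTTP/1.1" 200 23602 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"',
    '75.101.138.119 - - [22/Sep/2008:21:35:39 -0400] "POST /wp-comments-post.php HTTP/1.1" 302 1 "http://www.planetmike.com/2008/06/26/helo-bot-hostname/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"',
]

print(seconds_before_comment(SAMPLE, "75.101.138.119"))  # (3.0, '-')
```

A three-second gap combined with a referer of "-" is the pattern described above; what counts as “too fast” is still a judgment call.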
After looking at the server log, if I’m still not sure whether it’s a real comment, I will generally, but not always, approve it, though I may remove the URL so the comment doesn’t link anywhere.