
Blocking WordPress Comment Spam

Back in January I tried renaming the wp-comments-post.php file to avoid comment spammers. That worked for about 10 hours, then they started using the new file name. So I switched back to the default filename. As I said back then: “So unless you change the comment post filename regularly, it doesn’t do much good.”

Well, duh, what if I change the filename regularly? Over the last week I’ve been experimenting on a couple of my blogs, manually changing the filename about once a day. Each new filename got picked up and used, although there were still a lot of hits to wp-comments-post.php. Any IP address that attempts to “POST” to the non-existent wp-comments-post.php file should be firewalled.
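Finding those addresses takes one pass over the access log. A minimal sketch, assuming Apache’s combined log format (client IP in the first field, the quoted request method in the sixth, the path in the seventh) and that the real handler has already been renamed, so every POST to wp-comments-post.php is spam. The function name is made up:

```shell
#!/bin/bash
# Print each client IP that POSTed to wp-comments-post.php, once per IP.
# Reads an Apache combined-format access log on stdin. Assumes the real
# comment handler has been renamed, so these requests are all spam.
stale_comment_posters() {
  awk '$6 == "\"POST" && $7 ~ /\/wp-comments-post\.php$/ { print $1 }' | sort -u
}

# Example (the log path is an assumption; use your own):
#   stale_comment_posters < /var/log/apache2/access.log
```

GETs are ignored on purpose, since a spider merely reading the old URL isn’t proof of anything. The resulting list can then be fed to whatever firewall you use.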

I started wondering about the possibility of (1) changing the filename for every request, and (2) preventing spammers from storing that filename. So I’ve come up with code that changes the filename on every request. Here’s how I am currently doing it. Each page view gets a comment form that posts to a file named after the visitor’s IP address (e.g. 1.2.3.4.php):

1. Rename your wp-comments-post.php file to something random-ish. This new filename will never be visible to the public. This is called security by obscurity.

mv wp-comments-post.php roses-are-red.php

2. Create a new directory, accessible under your blog directory. You can call it anything you like.

mkdir kittens

3. Change to that directory

cd kittens

4. Create a .htaccess file

vi .htaccess

Put these two lines into it:

RewriteEngine on
RewriteRule ^.*$ /roses-are-red.php

The filename at the end of the second line should be the same filename you used in step 1 above. With these rules, a request for any filename in the kittens directory is internally rewritten to the renamed wp-comments-post.php file.

5. Edit your theme’s comments.php file. This will be in (your blog directory)/wp-content/themes/(theme name). Look for the line that opens the form posting to the comment submission page. In the default Kubrick theme, this is on line 72. Comment that line out by adding <!-- before it and --> after it:

<!--<form action="<?php echo get_option('siteurl'); ?>/wp-comments-post.php" method="post" id="commentform">-->

You comment the line out, rather than delete it, so that spammers’ spiders looking for the post page will find this old one, and not the “real” post page. Then add these lines after the commented line:

<form action="<?php
// Post to a per-visitor filename, e.g. /kittens/1.2.3.4.php
$ip = $_SERVER['REMOTE_ADDR'];
echo get_option('siteurl') . "/kittens/" . $ip . ".php"; ?>" method="post" id="commentform">
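If you manage several blogs, steps 1 through 4 can be scripted. This is a sketch, not a tested installer: it assumes you run it from the blog’s root directory and that the blog lives at the site root (the /roses-are-red.php rewrite target in step 4 assumes the same). The function name is hypothetical, and step 5 still has to be done by hand in the theme.

```shell
#!/bin/bash
# Sketch of steps 1-4: rename the comment handler, create the rewrite
# directory, and write its two-line .htaccess.
# Run from the blog's root directory. Both arguments are your own choices:
#   $1 = new handler filename (e.g. roses-are-red.php)
#   $2 = rewrite directory    (e.g. kittens)
setup_comment_rewrite() {
  local handler="$1" dir="$2"
  if [ -z "$handler" ] || [ -z "$dir" ]; then
    echo "usage: setup_comment_rewrite new-handler.php directory" >&2
    return 1
  fi
  if [ ! -f wp-comments-post.php ]; then
    echo "wp-comments-post.php not found in $(pwd)" >&2
    return 1
  fi
  mv wp-comments-post.php "$handler" || return 1
  mkdir -p "$dir" || return 1
  # Rewrite every request in $dir to /$handler (assumes the blog is at
  # the site root, as the RewriteRule in step 4 does).
  printf 'RewriteEngine on\nRewriteRule ^.*$ /%s\n' "$handler" > "$dir/.htaccess"
}
```

Usage: setup_comment_rewrite roses-are-red.php kittens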

Now if a comment spammer spiders my site and later tries to send spam through the comment submission page, all I have to do is check whether the IP address making the POST matches the IP address in the filename. If they don’t match, someone stored the comment submission page URL and is trying to spam through it.

So for example, this line was in my log file this morning:

192.107.152.61 - - [02/Apr/2007:07:00:16 -0400] "POST /kittens/72.36.205.226.php HTTP/1.1"
   302 - "http://www.example.com/2007/04/01/exampleurl/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows
   NT 5.0; H010818; InfoPath.1)"

Note the request came from 192.107.152.61, but the comment was submitted to 72.36.205.226.php. So when I grep through the log for the IP address “72.36.205.226” I find this line:

72.36.205.226 - - [02/Apr/2007:06:52:22 -0400] "GET /2007/04/01/exampleurl/ HTTP/1.0" 200
   16942 "-" "topicblogs/0.9"

Googling topicblogs turns up lots of reports that topicblogs may be a spammer. Well, there’s the proof.
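That check is easy to automate. A minimal sketch, assuming Apache’s combined log format, a blog at the site root, and the kittens directory from step 2 (the function name is made up): it prints the client IP and the IP embedded in the filename for every POST where the two differ.

```shell
#!/bin/bash
# Report comment POSTs whose client IP differs from the IP embedded in the
# per-visitor filename. Reads an Apache combined-format log on stdin and
# prints "client-ip filename-ip" for each mismatch.
# $1 = rewrite directory name (defaults to kittens, as in step 2).
mismatched_comment_posts() {
  local dir="${1:-kittens}"
  awk -v dir="$dir" '
    $6 == "\"POST" {
      prefix = "/" dir "/"
      if (index($7, prefix) != 1) next        # not a per-visitor form post
      file = substr($7, length(prefix) + 1)    # e.g. 72.36.205.226.php
      sub(/\.php$/, "", file)                  # drop the extension
      if (file != $1) print $1, file           # mismatch: a stored URL
    }'
}
```

Fed the 192.107.152.61 line above, it prints 192.107.152.61 72.36.205.226; the second address is the one to grep for.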

The code in step 5 above could easily be tweaked to encode whatever information you want to store in the filename. I started out by creating an MD5 hash, but decided to start simple and work up to a more complicated tracking system.
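For the MD5 approach, one possibility (a hypothetical sketch, not what’s running here) is to name the file after an MD5 hash of a private secret plus the visitor’s IP. The filename then doesn’t reveal the IP, and a spammer can’t compute a valid name for another address without the secret. On the theme side that would mean echoing something like md5($secret . ' ' . $ip) in place of $ip; the log check recomputes the hash from the client IP. SECRET and both function names are assumptions.

```shell
#!/bin/bash
# Hypothetical variant of step 5: use an MD5 token of a private secret
# plus the visitor's IP as the filename, instead of the raw IP.
SECRET="change-me"   # assumption: a private string only you know

# 32-character hex token for an IP address (no trailing newline is hashed,
# matching PHP's md5($secret . ' ' . $ip)).
token_for_ip() {
  printf '%s %s' "$SECRET" "$1" | md5sum | cut -c 1-32
}

# Succeeds if the token in a POSTed filename belongs to the client IP.
# $1 = client IP from the log, $2 = token from the filename (no .php)
token_matches() {
  [ "$(token_for_ip "$1")" = "$2" ]
}
```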

I also tried to create this as a WordPress plugin, but there doesn’t seem to be a WordPress function or setting that controls the filename of the wp-comments-post.php file.

If a visitor surfs the web through some kind of proxy, they may be caught by this. Their original page view would generate a form that posts to their-ip-address.php, but in the few minutes it takes to write a comment, the proxy may change their IP address, so the comment would arrive from a different IP address than the one in the filename.

“Human” Comment Spammer

Wow, I’ve just received my first batch of comment spam from a real “person” (if spammers can be considered people). It’s all from the IP address 121.35.151.27, which is in China. The first hit was this morning at 2:03am:

121.35.151.27 - - [13/Mar/2007:02:03:31 -0500] "GET /journal/2007/02/08/powered-by-wordpress-directory/ HTTP/1.1" 200 5080 "http://www.google.com/search?q=powered+by+wordpress&hl=en&newwindow=1&start=20&sa=N" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

So he started out by googling for “Powered by WordPress.” Then he actually browsed my web site (the home page, another blog entry, my jokes listing, a few jokes, back to the blog), submitted comments on several entries, and left at 2:48. He spent 45 minutes on my site, for no real purpose. I guess this is the next wave of things to fight against.

This clickstream would have gotten past Bad Behavior, and Akismet missed all of the comments as well. A human would also get past a CAPTCHA or math question. Hmm, add all of ChinaNet to the firewall? That would only work until the spammers start using infected Windows machines as proxies, so that they look like they are surfing from the US.

Using Tagged Email Addresses for Fun and Profit

Pascal van Hecke recently caught some spam that was sent to a unique email address he gave to Performancing.com. Read the details. One problem is that he used a very simple tagged address that can easily be guessed. For example, I would guess that his MyBlogLog email address is mybloglog.com[at]vanhecke.info. So what’s to stop spammers from brute-forcing addresses made from popular domain names at other domains, such as amazon.com@whatever?

That’s why I use a bash script to create unique email addresses when I register at a new site. The script hashes a secret password plus the site name with MD5 to create a unique 32-character address. Then it adds the address to a list for my mail server’s virtusertable file. Here’s the script:

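#!/bin/bash
# Usage: ./this-script site-name   (e.g. amazon.com)
domain=example.com
password=pick-a-strong-password
site="$1"
if [ -z "$site" ]; then
  echo "usage: $0 site-name" >&2
  exit 1
fi
ts=$(date)
# MD5 of the secret password plus the site name; the 32 hex characters
# of the digest become the local part of the address.
address="$(echo "$password $site" | md5sum - | cut -c 1-32)@$domain"
echo "$ts"
echo "$site"
echo "$address"
# Record which address went to which site.
echo "$password $site $ts" >> listing.txt
echo "$address" >> listing.txt
echo >> listing.txt
# Line for the mail server's virtusertable, delivering to pm-list.
echo "$address pm-list" >> virtusertable-list.txt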

This is much safer than simply using their-domain-name@your-domain-name. See Bruce Schneier’s Crypto-Gram Newsletter for May 15, 2003, “Unique E-mail Addresses and Spam,” for similar thoughts.
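Since listing.txt stores the password, the site name, and the timestamp on the line just before each generated address, a spammed address can be traced back to the site it was given to. A minimal sketch (the function name is hypothetical; it assumes the password contains no spaces, so the site name is the second word on that line):

```shell
#!/bin/bash
# Given an address that received spam, print the site it was generated for.
# listing.txt (written by the script above) holds "password site date" on
# the line just before each address. Assumes the password has no spaces.
# $1 = address, $2 = listing file (defaults to listing.txt)
site_for_address() {
  awk -v addr="$1" '
    $0 == addr { split(prev, f, " "); print f[2] }   # site is word 2
    { prev = $0 }
  ' "${2:-listing.txt}"
}
```

An address that was never generated prints nothing.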

216,031 pieces of spam collected at Gmail

In the last 3 weeks, my spam-only domain at Gmail has collected 216,031 pieces of spam. I’m interested in selling the domain. If you’re interested, email me privately.

Google sending email to webmasters?

In Better badware notifications for webmasters, Google mentions they may send email to generic addresses (like webmaster@ and admin@) to help alert webmasters that their site has “bad” stuff on it.

Please don’t send alerts to “generic” addresses. Those get only spam for me. It would be great if webmasters could set up an address for notifications.

And why in the world does Google make it so difficult to leave a comment on its blog? You have to be a registered Blogger user to do that. Their trackback mechanism isn’t intuitive either: they want you to sign into your Blogger account again, and who knows what happens after that. You know, I would think that Google would be aware that there are other services out on the Intertubes that aren’t affiliated with Google.

Powered by WordPress Directory

Fitting into the “Doh! Why didn’t I think of that?” category, Powered By WordPress – PoweredbyWP.com launched recently. It’s a directory of the zillions of blogs and web sites that run on the WordPress CMS (Content Management System). I’ve started submitting my blogs to it; hopefully they will be approved.

Web 2.0 Explained in Under Five Minutes

A very cool video explaining how and why the Web is so important.

Adding An Advertisement After the First Post on a WordPress Blog

On one of my other sites we run advertising, a banner ad. Because I haven’t had time until today to figure out how to get the ad to appear after the first posting on a page, we’ve been putting the ad on every page of the site, except the home page. The include code is in the footer of the page, so the ad ended up down at the bottom of the page. Not really an optimal place for visibility. So I’ve been digging through the WordPress docs and found the way to do it.

I tried adding an if statement inside the loop, using next_post_link(). But that gives the same result for the entire page. So then I looked for a “first_post” variable, but there isn’t one. So I created one.

In index.php in my template, I added this code:

<?php
// $ppl counts posts as the loop runs; it starts out unset, so begin at 1.
$ppl = isset($ppl) ? $ppl + 1 : 1;
if ($ppl == 1) include("ad-block.html"); ?>

just after the <p class="postmetadata"> section, and before the <?php endwhile; ?> section. Now when $ppl equals one, the ad-block.html file is included; at other times, nothing is. You could easily use this to have different text (or a different included file) show up after different posts: for example, after every second post, or a different piece of text for the 4th post on a page.

You can see this on my DC Area Theater information blog, ShowBizRadio.net.

Google Appliance Mail Update 2

Wow. Seeing is believing. After nearly three days, my catchall account in the Google Domain Appliance has caught 67,930 messages and has 3,185 messages in the Inbox. The amazing stat is at the bottom of the mailbox: “You are currently using 864 MB (42%) of your 2048 MB.” Looking through the messages, the attachments on the spam seem to be using a giant hunk of the space. I’ll need to clean out the Spam box much sooner than every 30 days, or incoming mail will start to bounce.

I wonder if there’s an easy way to chart the mail volume coming into a Gmail account? I could download the mail with POP, but that seems like a waste, and an uphill battle. I thought about turning this mailbox into an automagically updating WordPress blog, but I’m not sure why I would. Would publishing all this data be useful to anyone? I see mostly genuine non-existent-user error messages, but there are a few vacation messages, spam complaints, and other backscatter. Thoughts?

If I tag the Inbox messages as spam, will that affect other users of Google’s mail appliances, or just this one account? I don’t want to mess up anyone else’s email.

(Now up to 68,123 spam, 3,187 Inbox, 866MB used)

Google Appliance Mail Update 1

I just logged into the catchall account for the domain I added to Google’s Web appliance. Spam to the old box in my office has slowed to a trickle (only 61 messages last hour, down from 1,237 for the same hour yesterday). In the Google Web Appliance, there are 2,596 spam messages caught, and 12 missed that ended up in the Inbox. In 3.4 hours, that’s a rate of about 764 spam per hour, 12.7 per minute, or roughly one spam every 5 seconds.

I’m using 10MB of space so far. Extrapolating at roughly 3MB per hour, I’ll be at 2,160MB in 30 days. The size limit appears to be 2,048MB, so I’ve got a shot at filling the box up.
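The rate arithmetic above is easy to re-run as the numbers change. A small sketch (the function name is made up) that takes the messages caught, the hours elapsed, and the megabytes used so far:

```shell
#!/bin/bash
# Spam rate and 30-day storage projection from three observed numbers.
# $1 = messages caught, $2 = hours elapsed, $3 = MB used so far
spam_rates() {
  awk -v n="$1" -v h="$2" -v mb="$3" 'BEGIN {
    per_hour = n / h
    printf "%.1f per hour\n", per_hour
    printf "%.1f per minute\n", per_hour / 60
    printf "%.1f seconds between messages\n", 3600 / per_hour
    printf "%.0f MB after 30 days\n", mb / h * 24 * 30
  }'
}
```

With 2,596 messages, 3.4 hours and 10MB (spam_rates 2596 3.4 10), it reports 763.5 per hour, 12.7 per minute, one every 4.7 seconds, and about 2,118MB after 30 days. That’s a bit under the 2,160MB figure, which rounds the rate up to 3MB per hour, but still over the 2,048MB limit.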