The logs of my (Drupal-powered) website show a lot of referer spam. Some time ago I had a statistics page listing the last 10 pages my site's visitors came from (aka referers). Spambots soon found out and spammed this list. I made the list invisible to anonymous visitors, but spambots still target my site (less frequently than when the list was visible, however): they pollute my stats, use bandwidth, use processing power and kill those cute little puppies. Now I went a bit further to block those dirty spambots ...

There are some Drupal modules for dealing with different sorts of spam, but I found another solution that blocks the spambots before Drupal kicks in to generate web pages. The trick is to use the .htaccess file to tweak the Apache HTTP server's behavior. I added the following lines to Drupal's .htaccess file (inside the mod_rewrite block):

# Answer 403 Forbidden when the referer contains a typical spam word
RewriteCond %{HTTP_REFERER} (poker) [NC,OR]
RewriteCond %{HTTP_REFERER} (viagra) [NC,OR]
RewriteCond %{HTTP_REFERER} (casino) [NC]
RewriteRule .* - [F]

What this means: if the HTTP referer contains 'viagra', 'poker' or 'casino' (typical words in spam referers), the web server answers with "forbidden" (HTTP response 403). NC makes the patterns case-insensitive ("no case"), OR is the glue between the conditions (conditions are AND-ed by default; the last condition needs no OR because nothing follows it) and F stands for "forbidden". The result is that the corresponding spambots don't get in.
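To try out a word list on sample referers before touching .htaccess, here is a rough shell sketch. The helper name is_spam_referer is made up for this illustration, and grep's extended regexes are not the same engine Apache uses, so take it as an approximation of the three conditions rather than the real thing:

```shell
# is_spam_referer REFERER: exit status 0 if the referer contains a spam word.
# Hypothetical helper; it only imitates the three RewriteCond lines.
is_spam_referer() {
  # -E: extended regex, -i: ignore case (like NC), -q: no output, status only
  printf '%s\n' "$1" | grep -Eiq 'poker|viagra|casino'
}

# Play a spambot and a normal visitor
for ref in "http://www.poker-stinks.com" "http://www.google.com/search?q=drupal"; do
  if is_spam_referer "$ref"; then
    echo "403 $ref"
  else
    echo "ok  $ref"
  fi
done
```

The `|` in the pattern does the same job as the OR flags. In .htaccess the three conditions could likewise be written as a single one: RewriteCond %{HTTP_REFERER} (poker|viagra|casino) [NC]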

More examples and information on how this works can be found in the Apache mod_rewrite documentation.

Here's a simple test to see the spambot blocking in action. With the wget utility we'll play a spambot ourselves with referer "http://www.poker-stinks.com":

$> wget --referer="http://www.poker-stinks.com" http://example.com/
--19:56:07--  http://example.com/
           => `index.html'
Resolving example.com... 357.593.740.825
Connecting to example.com|357.593.740.825|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
19:56:07 ERROR 403: Forbidden.

Hooray, it works.

At the time of writing, 9 of the latest 10 (non-Google) referers are spam entries. I hope that declines from now on, maybe after I add some more conditions matching spam domains.
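To decide which words or domains to add next, it helps to see which referers show up most often. A minimal sketch, assuming the access log is in Apache's "combined" format (referer in the second quoted field after the request line); the function name top_referers and the log path in the usage comment are made up, and a request line containing an escaped quote would throw the field splitting off:

```shell
# top_referers FILE [N]: print the N most frequent referers (default 10)
# from a combined-format access log, most frequent first.
top_referers() {
  # Split each line on double quotes: field 2 is the request line,
  # field 4 the referer. Skip empty and "-" (no referer) entries.
  awk -F'"' '$4 != "" && $4 != "-" { print $4 }' "$1" \
    | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Usage (the path is an assumption, adjust to your server):
#   top_referers /var/log/apache2/access.log 20
```

Each output line is a count followed by the referer, so the spammiest entries float to the top.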

PS. It seems that "referer" is a misspelling of "referrer" that made it into the official HTTP specification.