The logs of my (Drupal-powered) website show a lot of referer spam. Some time ago I had a statistics page that listed the last 10 pages my site's visitors came from (aka referers). Spambots soon found out and spammed this list. I made the list invisible to anonymous visitors, but spambots still target my site (less frequently than when the list was visible), pollute my stats, use bandwidth, use processing power and kill those cute little puppies. Now I went a bit further to block those dirty spambots ...
There are some Drupal modules that deal with different sorts of spam, but I found another solution that blocks the spambots before Drupal kicks in to generate webpages. The trick is to use the .htaccess file to tweak the Apache HTTP server's behavior. I added the following lines to Drupal's .htaccess file (inside the mod_rewrite block):
RewriteCond %{HTTP_REFERER} (poker) [NC,OR]
RewriteCond %{HTTP_REFERER} (viagra) [NC,OR]
RewriteCond %{HTTP_REFERER} (casino) [NC]
RewriteRule .* - [F]
What this means: if the HTTP referer contains 'viagra', 'poker' or 'casino' (typical words in spam referers), the web server answers with "Forbidden" (HTTP response 403). The NC flag makes the patterns case-insensitive ("no case"), and the OR flag is the glue between the conditions: it combines them with a logical or, which is why the last condition does not need an OR. In the RewriteRule, the pattern .* matches every requested URL, the - means "no substitution", and the F flag stands for "forbidden". The result is that the corresponding spambots don't get in.
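To make the condition chain concrete, here is a small Python sketch that mimics what the three RewriteCond lines plus the RewriteRule decide for a given referer. It is only an illustration, not how Apache is implemented: the names SPAM_PATTERNS, is_forbidden and status_for are hypothetical, and Python's re module stands in for Apache's regex engine (for plain words like these the two behave the same). The 200 for a clean referer is an assumption standing in for "the request falls through to Drupal".

```python
import re

# Hypothetical keyword list mirroring the three RewriteCond lines above.
SPAM_PATTERNS = ["poker", "viagra", "casino"]


def is_forbidden(referer: str) -> bool:
    """Return True if any pattern occurs anywhere in the referer.

    re.search is unanchored, like a RewriteCond pattern, and
    re.IGNORECASE plays the role of the [NC] flag. any() gives the
    OR-combination: a single matching condition is enough.
    """
    return any(re.search(p, referer, re.IGNORECASE) for p in SPAM_PATTERNS)


def status_for(referer: str) -> int:
    # RewriteRule .* - [F] answers 403 when the condition chain matches;
    # otherwise the request would go on to Drupal (shown here as 200).
    return 403 if is_forbidden(referer) else 200
```

For example, status_for("http://www.Poker-Stinks.com") gives 403 despite the capital letters, while an empty referer or a search-engine referer gives 200. The sketch also shows the price of plain substring matching: a referer that merely contains one of the words (say, a page about "pokerface") is blocked as well.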
More examples and information on how this works:
- http://drupal.org/node/24302
- http://www.spywareinfo.com/articles/referer_spam/
- The mod_rewrite module (Apache http server documentation)
Here's a simple test to see the spambot blocking in action. With the wget utility we'll play a spambot ourselves with referer "http://www.poker-stinks.com":
$> wget --referer="http://www.poker-stinks.com" http://example.com/
--19:56:07-- http://example.com/
=> `index.html'
Resolving example.com... 357.593.740.825
Connecting to example.com|357.593.740.825|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
19:56:07 ERROR 403: Forbidden.
Hooray, it works.
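If wget isn't at hand, the same check can be scripted. The following is a minimal Python sketch using only the standard library: it sends a request with a forged Referer header and returns the HTTP status code. The function name status_with_referer is hypothetical. Note that urllib raises HTTPError for 4xx responses, so the 403 is read from the exception rather than from a normal response.

```python
import urllib.error
import urllib.request


def status_with_referer(url: str, referer: str) -> int:
    """Request url while pretending to come from referer; return the status code."""
    req = urllib.request.Request(url, headers={"Referer": referer})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; a blocked spambot ends up here.
        return err.code
```

With the rules in place, status_with_referer("http://example.com/", "http://www.poker-stinks.com") should return 403 for your own site (example.com is a placeholder, as in the wget example), while a harmless referer should get through.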
At the time of this writing, 9 out of the latest 10 (non-Google) referers are spam entries. I hope that number declines from now on, maybe after I add some more conditions matching spam domains.
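Adding conditions by hand makes it easy to forget the rule that every condition except the last needs OR. The sketch below generates such a block from a keyword list in the same style as the lines above. It is a hypothetical helper (rewrite_block is not part of Drupal or Apache), and escaping each keyword with re.escape is a design choice of this sketch: it keeps the dots in domain names from acting as "any character" wildcards in Apache's regular expressions.

```python
import re


def rewrite_block(keywords: list[str]) -> str:
    """Build a referer-blocking block in the style used in this post.

    Every condition except the last gets [NC,OR]; the last gets only [NC],
    so the chain reads 'kw1 OR kw2 OR ... OR kwN'.
    """
    if not keywords:
        raise ValueError("need at least one keyword")
    lines = []
    for i, kw in enumerate(keywords):
        flags = "[NC]" if i == len(keywords) - 1 else "[NC,OR]"
        # re.escape turns e.g. "spamdomain.example" into "spamdomain\.example".
        lines.append(f"RewriteCond %{{HTTP_REFERER}} ({re.escape(kw)}) {flags}")
    lines.append("RewriteRule .* - [F]")
    return "\n".join(lines)
```

Calling rewrite_block(["poker", "viagra", "casino"]) reproduces exactly the four lines shown earlier; paste the output inside the mod_rewrite block of .htaccess.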