Wednesday, March 10, 2010

Being Evil

Someone was using SiteSucker on a demo site and was behaving badly. The same server hosts our Redmine install on the same DB, and the crawl started to slow down my work. To block it we had a few options: iptables the IP, block the IP within Apache (htaccess), or do the evil thing and just block based on their user-agent. I chose the latter.
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^SiteSucker2\.3\.1
RewriteRule ^(.*) /badbot.html [L]
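For reference, the other two options could have looked something like this. The IP below is a placeholder (the crawler's real address isn't shown in the logs), and the SetEnvIfNoCase variant is the Apache 2.2 access-control flavor of the same user-agent match:

```apache
# Firewall option -- drop all traffic from the crawler's address
# (placeholder IP; run on the server, not in httpd.conf):
#   iptables -A INPUT -s 192.0.2.10 -j DROP

# Apache option -- deny based on the user-agent instead of rewriting:
SetEnvIfNoCase User-Agent "^SiteSucker" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

The Deny variant returns a 403 instead of serving a page, which is less evil but also less confusing for the person on the other end.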
This is evil since the person's crawler will get our badbot.html page. The person will then use Firefox/IE/whatever to browse to the page to see why it's not working, but since the browser sends a different user-agent, they will be allowed through. For anyone who knows how to configure a crawler, changing the supplied agent isn't an issue, but then again that person would likely also be able to throttle their crawler rather than kill my web server. Here's a snapshot of the logs showing the crawler pulling badbot.html (size 174), followed by a browse attempt from Safari:

- - [09/Mar/2010:19:52:41 -0500] "GET /cgi-bin/isadg/ HTTP/1.1" 200 174 "" "SiteSucker2.3.1 CFNetwork/438.14 Darwin/9.8.0 (i386) (MacBook1%2C1)"
- - [09/Mar/2010:19:52:42 -0500] "GET /cgi-bin/isadg/ HTTP/1.1" 200 174 "" "SiteSucker2.3.1 CFNetwork/438.14 Darwin/9.8.0 (i386) (MacBook1%2C1)"
- - [10/Mar/2010:11:03:13 -0500] "GET /cgi-bin/isadg/ HTTP/1.1" 200 2286 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; fr-fr) AppleWebKit/531.21.8 (KHTML, like Gecko) Version/4.0.4 Safari/531.21.10"
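Spotting this kind of abuse in the first place is just a matter of tallying requests per user-agent. A quick sketch, assuming the standard combined log format where the user-agent is the last quoted field on each line (the sample entries are the ones from above):

```python
import re
from collections import Counter

# The user-agent is the last double-quoted string on a combined-format line.
AGENT_RE = re.compile(r'"([^"]*)"\s*$')

def agent_counts(lines):
    """Tally the number of requests seen per user-agent string."""
    counts = Counter()
    for line in lines:
        m = AGENT_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

log = [
    '- - [09/Mar/2010:19:52:41 -0500] "GET /cgi-bin/isadg/ HTTP/1.1" 200 174 "" "SiteSucker2.3.1 CFNetwork/438.14 Darwin/9.8.0 (i386) (MacBook1%2C1)"',
    '- - [09/Mar/2010:19:52:42 -0500] "GET /cgi-bin/isadg/ HTTP/1.1" 200 174 "" "SiteSucker2.3.1 CFNetwork/438.14 Darwin/9.8.0 (i386) (MacBook1%2C1)"',
    '- - [10/Mar/2010:11:03:13 -0500] "GET /cgi-bin/isadg/ HTTP/1.1" 200 2286 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; fr-fr) AppleWebKit/531.21.8 (KHTML, like Gecko) Version/4.0.4 Safari/531.21.10"',
]

for agent, n in agent_counts(log).most_common():
    print(n, agent)
```

On a real server you'd feed it the whole access log; the runaway crawler shows up at the top of the list.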