[BusyBox] SVN tarball snapshots?

Ralph Siemsen ralphs at netwinder.org
Fri Mar 11 19:58:41 UTC 2005


Erik Andersen wrote:

> Disobeying the rules was precisely the problem...  I allow google
> to crawl through everything, but they understandably choose not
> to scrounge about under cgi-bin.  All other agents are disallowed
> from crawling cgi-bin, but they do anyways.

We had the "mischievous spider" problem on netwinder.org, whose tiny 
little 200MHz processor was easily overwhelmed by a few concurrent 
on-the-fly tar and gzip processes.  Based on observation of the log 
files, there are actually a few spiders that specifically look in 
/cgi-bin, and that explicitly go where robots.txt tells them not to, in 
hope of "finding" something sensitive or confidential.

The solution I adopted was to put a fictitious entry into robots.txt 
that was not advertised or linked from anywhere else.  Apache has a 
handler for this fictitious entry, pointing to a CGI that blackholes the 
sender via a firewall rule.  This worked pretty well in practice.
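For the curious, a rough sketch of the idea (the path, script name, and
addresses below are made up for illustration; the real config differed):

```shell
#!/bin/sh
# trap.cgi -- a sketch of the robots.txt honeypot described above.
#
# In robots.txt, list a path that is linked from nowhere, e.g.:
#     User-agent: *
#     Disallow: /cgi-bin/trap.cgi
# and in httpd.conf point that path at this CGI, e.g.:
#     ScriptAlias /cgi-bin/trap.cgi /usr/local/apache/cgi-bin/trap.cgi
# No well-behaved crawler will ever fetch it, so any client that does
# has ignored robots.txt and can be dropped.

# Apache sets REMOTE_ADDR for CGI requests; default to a documentation
# address (RFC 5737) so the sketch can be run standalone.
REMOTE_ADDR="${REMOTE_ADDR:-192.0.2.1}"

# Send a minimal valid CGI response before blackholing the client.
printf 'Content-type: text/plain\r\n\r\n'
echo "Nothing here."

# Blackhole the offender with a firewall rule (needs root; shown as an
# echo here so the sketch is harmless to run outside a real server).
BLOCK_CMD="/sbin/iptables -I INPUT -s $REMOTE_ADDR -j DROP"
echo "$BLOCK_CMD"
```

In a real deployment you would run the iptables command instead of
echoing it, probably via sudo or a privileged helper, and expire old
rules periodically so a reassigned dynamic IP isn't blocked forever.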

-Ralph
