[BusyBox] SVN tarball snapshots?
Ralph Siemsen
ralphs at netwinder.org
Fri Mar 11 19:58:41 UTC 2005
Erik Andersen wrote:
> Disobeying the rules was precisely the problem... I allow google
> to crawl through everything, but they understandably choose not
> to scrounge about under cgi-bin. All other agents are disallowed
> from crawling cgi-bin, but they do anyways.
We had the "mischievous spider" problem on netwinder.org, whose tiny
little 200MHz processor was easily overwhelmed by a few concurrent
on-the-fly tar and gzip processes. Based on observation of log files,
there are actually a few spiders that specifically look in /cgi-bin, and
that explicitly go where robots.txt tells them not to, in hope of
"finding" something sensitive or confidential.
The solution I adopted was to put a fictitious entry into robots.txt
that was not advertised or linked from anywhere else. Apache has a
handler for this fictitious entry, pointing to a CGI that blackholes the
sender via a firewall rule. This worked pretty well in practice.
-Ralph