We check logs to make sure things are working. Nothing spoils your day like a huge number of failed requests. So some things stick out. Like 1 request per second for 10,000+ seconds from a single site, in this case one in France. Or a bot getting stuck in a calendar, like the Microsoft bot.
The former happened this morning. The easiest thing to do is simply to firewall them off. You can’t fix someone else’s broken machine or configuration, and I believe we should treat these things as configuration errors until they are demonstrated to be malicious. However, the response is the same either way.
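Spotting this kind of client in the logs can be scripted. Here is a minimal sketch, assuming the access log is in Common Log Format with entries in chronological order; the function name `find_persistent_clients` and the threshold are my own illustrative choices, not something from our actual tooling.

```python
import re
from collections import defaultdict
from datetime import datetime

# Client address and timestamp at the start of a Common Log Format line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def find_persistent_clients(lines, min_requests=10000):
    """Return {ip: (request_count, seconds_between_first_and_last_request)}
    for every client with at least min_requests requests.

    Assumes the lines are in chronological order.
    """
    counts = defaultdict(int)
    first_seen = {}
    last_seen = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip, stamp = match.groups()
        when = datetime.strptime(stamp, TIME_FORMAT)
        counts[ip] += 1
        first_seen.setdefault(ip, when)
        last_seen[ip] = when
    return {
        ip: (n, (last_seen[ip] - first_seen[ip]).total_seconds())
        for ip, n in counts.items()
        if n >= min_requests
    }
```

A count close to the elapsed seconds is the "1 request per second" pattern. Any address this reports is a candidate for the firewall, whether the cause is a misconfiguration or something worse.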
The other class is annoying.
These are pre-programmed bots that don’t understand how web service URLs work, and get lost following the “next” link. Or the “previous” link. Such bots ignore robots.txt to pursue the additional information they crave, and demonstrate the bot authors’ ignorance of how things actually work on real web sites. Or they use URLs that haven’t existed in 6 months as their starting point.
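For comparison, this is roughly what a well-behaved crawler is expected to do before following a calendar link. It is a sketch using Python's standard `urllib.robotparser`; the robots.txt rules, the `/calendar/` path, and the `ExampleBot` name are hypothetical and are not our real configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps crawlers out of an endless calendar.
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar/
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite bot checks before following each "next" or "previous" link.
print(allowed("ExampleBot", "https://example.com/calendar/2009/05?next=1"))
print(allowed("ExampleBot", "https://example.com/about"))
```

A crawler that runs a check like this never enters the calendar in the first place, so it has no chain of "next" links to get lost in.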
Some company in Redmond ought to either fix their bots, or outsource to someone who does a far better job of crawling responsibly. If they keep coming in for old/broken/non-existent URLs, I can programmatically redirect their queries to their competitors. I don’t like them abusing our resources, as they should know better. Their competitors do.
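The redirect itself would not take much. Here is a sketch of the decision logic only, not tied to any particular web server; the retired path prefix, the user-agent marker, and the target URL are all placeholders I made up for illustration.

```python
# All three values below are hypothetical placeholders.
RETIRED_PREFIXES = ("/old-calendar/",)            # URLs that no longer exist
BOT_MARKERS = ("examplebot",)                     # lowercase user-agent substrings
REDIRECT_TARGET = "https://search.example.com/"   # where the bot gets sent


def respond_to(user_agent, path):
    """Return (status, headers) for a misbehaving bot asking for a retired
    URL, or None if the request should be handled normally."""
    is_bot = any(marker in user_agent.lower() for marker in BOT_MARKERS)
    is_retired = path.startswith(RETIRED_PREFIXES)
    if is_bot and is_retired:
        return 302, {"Location": REDIRECT_TARGET}
    return None
```

Everyone else, including the same bot on URLs that still exist, falls through to normal handling.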