Fixing Baidu’s broken search bot

It seems the bot was generating some effectively random broken URLs. Or maybe not so random: I saw endpoints in the logs that haven’t been in use for at least 7 years. I can’t tell whether this was simply a harmless bug or … maybe? … a search for moved/renamed endpoints.

Since the web server is now set up very differently than it was in the past, the missing endpoints merely generated log spam. And messed up our analysis.

So I needed a way to fix their code, without … their code.

So, on our front-end server, I marked the bot’s IP range as a bad_user:

# geo belongs in the http {} context; 0 by default, 1 for the bot's range
geo $bad_user {
    default 0;
    180.76.15.0/24 1;
}
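To sanity-check which addresses the geo block flags, here is a small Python model of the same lookup. It is not nginx, just a sketch that mirrors the block above (default 0, 1 inside 180.76.15.0/24) using the standard ipaddress module; the function name bad_user is only for illustration.

```python
import ipaddress

# Model of the geo block: 1 for addresses inside a flagged range, else the default 0.
BAD_RANGES = [ipaddress.ip_network("180.76.15.0/24")]


def bad_user(client_ip: str) -> int:
    """Return 1 if client_ip falls inside a flagged range, else 0."""
    addr = ipaddress.ip_address(client_ip)
    # An IPv6 address is never "in" an IPv4 network, so it falls through to 0.
    return 1 if any(addr in net for net in BAD_RANGES) else 0


if __name__ == "__main__":
    for ip in ("180.76.15.10", "180.76.15.255", "180.76.16.1", "8.8.8.8"):
        print(ip, bad_user(ip))
```

Adding another range to the model (or to the geo block) is just one more entry in the list.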

Then I told the server to rewrite requests whenever the client was a bad_user:

if ($bad_user) {
    # Send any path other than "/" to the front page (302 redirect)
    rewrite ^/.+$ https://scalableinformatics.com redirect;
}

This isn’t redirecting them to a bad place; it’s redirecting them to our front page whenever they ask for anything other than the front page. I figure after a day of this, they’ll get the idea that mebbe something is borked in their code/db and clean it up.
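The intended behavior can be modeled in a few lines of Python. This is a sketch, not nginx: the function redirect_target is hypothetical, and the model ignores query strings entirely. The key point is that the pattern must not match "/" itself. A catch-all pattern such as ^(.*)$ also matches the front page, which would redirect the front page back to itself in a loop.

```python
import re

FRONT_PAGE = "https://scalableinformatics.com"
# At least one character after the leading "/", i.e. anything but the front page.
NON_FRONT_PAGE = re.compile(r"/.+")


def redirect_target(request_uri: str, is_bad_user: bool):
    """Return the redirect URL for a flagged client, or None to serve normally."""
    if not is_bad_user:
        return None
    # nginx tests rewrite patterns against the path, not the query string.
    path = request_uri.partition("?")[0] or "/"
    return FRONT_PAGE if NON_FRONT_PAGE.fullmatch(path) else None


if __name__ == "__main__":
    for uri in ("/", "/?q=1", "/old/endpoint", "/old.php?id=7"):
        print(uri, redirect_target(uri, True))
```

Normal clients always get None, so only the flagged range ever sees the redirect.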

Annoying that I have to resort to this.

Let’s see if it helps.

