Most of our traffic on the day job site now comes from Baidu

Well, their web crawler.

Way, way back in the day, I complained about broken Bingbots. This was 8 years ago. Bing was fairly crappy at crawling, but it seems to have improved. Google is still the lightest touch. Least impactful. Deep in the traffic noise.

Not Baidu. Their bot is, for lack of a better term, broken. It's not at DoS levels, but it is wasting traffic and resources, and generating lots of log spam.

I am guessing it's a programming bug on their part. I see them issuing requests with concatenated paths that make no sense and aren't represented anywhere in our tree. So they exercise our 404 page quite a bit.

I could likely help them out by completely rewriting the path they are requesting: an old, historical endpoint, unused for over two years, that they are still constructing and using incorrectly.

I suspect they are (mis)integrating historical data and current data.

But as of now, they are the dominant user of our website, and we get lots of log spam from them. Hopefully they will fix the problem.

