Blogmark
AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt
via jbranchaud@gmail.com
Fascinating!
Aaron clearly warns users that Nepenthes is aggressive malware. It's not to be deployed by site owners uncomfortable with trapping AI crawlers and sending them down an "infinite maze" of static files with no exit links, where they "get stuck" and "thrash around" for months, he tells users. Once trapped, the crawlers can be fed gibberish data, aka Markov babble, which is designed to poison AI models.
I imagine the basic idea is to have a page that a crawler would only find if it ignored your robots.txt. It would be served with nonsense content and tons of dynamically generated internal links, each leading to more pages full of nonsense content and tons more dynamically generated internal links. Less of a maze and more of a never-ending tree.
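Here's a minimal sketch of that idea in Python, using only the standard library. The seeded link tree, the bigram babble, and the ten-links-per-page count are all my own guesses at the mechanics, not Nepenthes' actual implementation:

```python
#!/usr/bin/env python3
"""Tarpit sketch: every URL returns babble plus links that lead deeper in."""
import hashlib
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

# Tiny corpus to babble from; a real tarpit would train a Markov chain
# on a much larger text so the output reads more like prose.
CORPUS = ("the quick brown fox jumps over the lazy dog while the "
          "crawler follows every link it finds and never reaches the end").split()

def markov_babble(rng, n_words=200):
    """Cheap stand-in for Markov babble: walk bigrams from the corpus."""
    nexts = {}  # word -> list of words that follow it in the corpus
    for a, b in zip(CORPUS, CORPUS[1:]):
        nexts.setdefault(a, []).append(b)
    word = rng.choice(CORPUS)
    out = [word]
    for _ in range(n_words - 1):
        word = rng.choice(nexts.get(word, CORPUS))
        out.append(word)
    return " ".join(out)

class Tarpit(BaseHTTPRequestHandler):
    def do_GET(self):
        # Seed the RNG from a hash of the path so the same URL always
        # yields the same page and the same child links: a stable,
        # effectively infinite tree with no state stored server-side.
        seed = hashlib.sha256(self.path.encode()).hexdigest()
        rng = random.Random(seed)
        links = " ".join(
            f'<a href="{self.path.rstrip("/")}/{rng.getrandbits(64):x}">more</a>'
            for _ in range(10)
        )
        body = f"<html><body><p>{markov_babble(rng)}</p>{links}</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body.encode())

if __name__ == "__main__":
    HTTPServer(("", 8000), Tarpit).serve_forever()
```

Seeding the RNG from the path is what makes this cheap to run: every URL renders identically on every visit and every page links ten levels deeper, so the defender stores nothing while a crawler that ignores robots.txt just keeps descending.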
But efforts to poison AI or waste AI resources don't just mess with the tech industry. Governments globally are seeking to leverage AI to solve societal problems, and attacks on AI's resilience seemingly threaten to disrupt that progress.
Weird to make this unqualified claim with no examples of how governments are trying to solve societal problems with AI -- I'm certainly having a hard time thinking of any.