What is a bot motel and how do you run one?

m3047 · on Dec 22, 2024

The easy way is to implement, e.g., a 4xx handler that serves content whose links generate further 4xx errors, and to rewrite the status code to something like 200 before the response is sent to the requester. Load the garbage pages up with... garbage.
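A rough sketch of that handler, using only Python's standard library. Everything named here (REAL_PAGES, trap_links, trap_page, MotelHandler, the /archive/ prefix, port 8080) is made up for illustration, and a real deployment would more likely hook the error handler of whatever web server is already in use:

```python
import hashlib
import html
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for the site's genuine content.
REAL_PAGES = {"/": "<h1>Home</h1>"}


def trap_links(path: str, count: int = 5) -> list[str]:
    """Derive links to pages that do not exist, deterministically from the path."""
    digest = hashlib.sha256(path.encode()).hexdigest()
    return [f"/archive/{digest[i * 8:(i + 1) * 8]}" for i in range(count)]


def trap_page(path: str) -> str:
    """Build a filler page whose links all lead to further missing pages."""
    items = "".join(f'<li><a href="{u}">{u}</a></li>' for u in trap_links(path))
    return (
        f"<html><body><p>Results for {html.escape(path)}</p>"
        f"<ul>{items}</ul></body></html>"
    )


class MotelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = REAL_PAGES.get(self.path)
        if body is None:
            # This is where a 404 would normally go out; serve a trap page instead.
            body = trap_page(self.path)
        data = body.encode("utf-8")
        self.send_response(200)  # status rewritten: the requester always sees 200
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8080) -> None:
    """Run the sketch locally; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), MotelHandler).serve_forever()
```

Calling serve() and requesting any unknown path returns a 200 page whose links each lead to another unknown path, so a crawler that follows them never runs out of pages.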

m3047 · on Dec 23, 2024

Since this is getting upvoted, I will put forth a suggestion I've made to the people who've paid me to help with this sort of subterfuge: turn your 404 handler into search. Then a human who goes there has a way out. But absolutely, load it up with garbage and broken links.
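To make the "404 handler as search" idea concrete, here is a minimal sketch. The SITE_INDEX contents, the function names, and the /search endpoint are all assumptions for illustration; the point is only that words from the missing URL get matched against real pages so a human has a way out:

```python
import html
import re

# Hypothetical index of real pages: URL -> title.
SITE_INDEX = {
    "/docs/install": "Installation guide",
    "/docs/dns-setup": "DNS setup",
    "/blog/bot-traps": "Notes on bot traps",
}


def search_from_path(path: str) -> list[str]:
    """Treat the words in a missing URL as a search query over real pages."""
    terms = {t for t in re.split(r"[^a-z0-9]+", path.lower()) if len(t) > 2}
    hits = []
    for url, title in SITE_INDEX.items():
        haystack = f"{url} {title}".lower()
        if any(t in haystack for t in terms):
            hits.append(url)
    return hits


def not_found_page(path: str) -> str:
    """Render a search box, prefilled from the URL, plus any matching real pages."""
    items = "".join(
        f'<li><a href="{u}">{html.escape(SITE_INDEX[u])}</a></li>'
        for u in search_from_path(path)
    )
    query = html.escape(path.strip("/"), quote=True)
    return (
        '<form action="/search" method="get">'
        f'<input name="q" value="{query}"><button>Search</button></form>'
        f"<ul>{items}</ul>"
    )
```

The garbage and broken links would be appended below the search results; they are left out here to keep the sketch short.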

throaway89 · on Dec 22, 2024

Thanks, and you can make money with this? Sorry I'm a total noob in this area.

shadowgovt · on Dec 23, 2024

Not really... You cost the bots money.

Many are trying to index the web for whatever reason. By feeding them a Library of Babel, you can clog up their storage with noise.
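For a sense of how cheap the noise is to produce, a minimal sketch of a filler-text generator follows. The word list and the babel_text name are invented for illustration; seeding from the URL is one possible choice, so the same address always yields the same page and looks like stable content:

```python
import random

# Hypothetical vocabulary; any word list would do.
WORDS = ["lorem", "gallery", "archive", "index", "shelf", "hexagon", "volume", "catalog"]


def babel_text(seed: str, sentences: int = 4) -> str:
    """Generate meaningless but repeatable filler text for a given URL."""
    rng = random.Random(seed)  # string seeds are deterministic across runs
    out = []
    for _ in range(sentences):
        words = [rng.choice(WORDS) for _ in range(rng.randint(5, 12))]
        out.append(" ".join(words).capitalize() + ".")
    return " ".join(out)
```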

m3047 · on Dec 23, 2024

Once in a while people pay you to do something you enjoy doing, like making people cry and wish they had jobs flipping burgers instead. But I do it on my own systems for fun, honestly.

yesco · on Dec 22, 2024

The idea is that bots cope poorly with deviations from accepted norms and can't actually "see" rendered browser content. So if your generic 404 and 403 error pages return a 200 status instead, with invisible links to other inaccessible pages, the bots will follow the links but real users will not. That traps the bots in a kind of isolated labyrinth of recursive links (the URLs should be slightly different each time, though). It's basically how a lobster trap works, if you want a visual metaphor.
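A minimal sketch of the invisible-link part, assuming the links are hidden with inline CSS and that each URL gets a random suffix so no two are the same. The function name, the "more" link text, and the token format are made up for illustration:

```python
import html
import random


def hidden_trap_links(path: str, count: int = 3, rng=None) -> str:
    """Return anchors a browser will not display but a naive crawler will follow."""
    rng = rng or random.Random()
    base = html.escape(path.rstrip("/"), quote=True)
    anchors = []
    for _ in range(count):
        token = f"{rng.getrandbits(32):08x}"  # makes each URL slightly different
        anchors.append(
            f'<a href="{base}/{token}" style="display:none" '
            'tabindex="-1" aria-hidden="true">more</a>'
        )
    return "\n".join(anchors)
```

The tabindex and aria-hidden attributes are there so keyboard and screen-reader users do not land on the hidden links either. A crawler that renders CSS would presumably skip them too, so this only catches the ones that parse raw HTML.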

The important part here is to do this chaotically. The worst sites to scrape are buggy ones. You are, in essence, deliberately following bad practices in a way that real users wouldn't notice but that still affects bots.