
Block AI Bots from Crawling Websites Using Robots.txt

A list of User-Agents to block (or troll).

EDIT: So I put this in my Apache config:
RewriteCond "%{HTTP_USER_AGENT}" "(ChatGPT-User|Meta-ExternalFetcher|Amazonbot|Applebot|OAI-SearchBot|PerplexityBot|YouBot|Applebot-Extended|Bytespider|CCBot|ClaudeBot|Diffbot|FacebookBot|Google-Extended|GPTBot|Meta-ExternalAgent|omgili|anthropic-ai|claude-web|cohere-ai|Ai2Bot|Ai2Bot-Dolma|GoogleOther|GoogleOther-Image|GoogleOther-Video|ImagesiftBot|PetalBot|Scrapy|Timpibot|VelenPublicWebCrawler|Webzio-Extended|facebookexternalhit)" [NC]
RewriteRule .* - [R=429,L]

(HTTP 429 is "Too Many Requests".)

Of course I know this isn't enough (many bots lie about their User-Agent), but it's a start.
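For reference, the polite version of this (the robots.txt the title refers to) lists the same agents and asks them not to crawl. It's purely advisory, so it only works on bots that choose to honor it, but it costs nothing to serve. A shortened sketch with a few of the agents above:

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
User-agent: PerplexityBot
Disallow: /

Consecutive User-agent lines share the Disallow rule that follows them, so one block covers the whole list. The Apache rewrite rule remains the enforcement layer for bots that ignore it.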

EDIT: See also: https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.txt