AI Bots Are Looting Libraries and Repositories
Cloudflare and the GLAM-E Lab (NYU and the University of Exeter) are sounding the alarm about increasingly malicious and excessive scraping by bots that AI companies deploy in their quest for LLM training data.
Michael Weinberg (GLAM-E Lab): Are AI Bots Knocking Cultural Heritage Offline? https://www.glamelab.org/products/are-ai-bots-knocking-cultural-heritage-offline “This report captures the impact that bots building datasets for AI model training are having on online cultural collections in early 2025.”
Cloudflare: Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives (Aug 4, 2025) https://blog.cloudflare.com/perplexity-is-using-stealth-undeclared-crawlers-to-evade-website-no-crawl-directives/
Matt Enis (Library Journal): AI Bots Swarm Library, Cultural Heritage Sites, Causing Slowdowns and Crashes (Jul 21, 2025) https://www.libraryjournal.com/story/ai-bots-swarm-library-cultural-heritage-sites-causing-slowdowns-and-crashes