By mid-2025, 35% of newly published websites were AI-generated or AI-assisted, up from zero before ChatGPT’s November 2022 launch.
The confirmed effects are semantic contraction and synthetic positivity, not misinformation or stylistic homogeneity, regardless of what most people believe.
At 35% AI prevalence, model collapse risk shifts from a theoretical concern to an empirical one for the next generation of foundation models.
A new study has a number for how much of the internet is now AI-generated: 35%. That is the share of newly published websites classified as AI-generated or AI-assisted by mid-2025, according to research from Stanford University, Imperial College London, and the Internet Archive. The figure was essentially zero before ChatGPT launched in November 2022.
“I find the sheer speed of the AI takeover of the web quite staggering,” Jonáš Doležal, researcher at Imperial College London and co-author of the paper, told 404 Media. “After decades of humans shaping it, a significant portion of the internet has become defined by AI in just three years.”
The study, titled “The Impact of AI-Generated Text on the Internet,” drew on 33 months of website snapshots from the Internet Archive’s Wayback Machine and used an AI text detector called Pangram v3 to classify each page.
The confirmed harms: vibes, not facts
Researchers tested six hypotheses about what AI content does to the web. Only two held up under data scrutiny.
The first: We’re turning into a horde of dumb NPCs acting in the same way… Or more scientifically put, the web is becoming less semantically diverse.
AI-generated sites showed pairwise semantic similarity scores 33% higher than human-written ones. The same ideas keep getting expressed in nearly the same ways.
The paper suggests the online Overton window may be narrowing, not by censorship or coordinated campaigns, but because language models optimize for outputs close to their training distribution.
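For readers wondering what a "pairwise semantic similarity score" looks like in practice, here is a minimal sketch. It assumes each page has already been turned into an embedding vector and that similarity is measured with cosine similarity averaged over every pair of pages. The article does not say which embedding model or similarity measure the researchers used, so both are assumptions, and the vectors below are invented toy values, not study data.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over every unordered pair of vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical 2-D "embeddings" (real ones have hundreds of dimensions).
tight_cluster = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]   # pages saying the same thing
spread_out = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]     # pages pointing in different directions

tight_score = mean_pairwise_similarity(tight_cluster)
spread_score = mean_pairwise_similarity(spread_out)
print(f"tight cluster: {tight_score:.3f}, spread out: {spread_score:.3f}")
```

A higher average means the pages sit closer together in meaning, which is the sense in which a corpus can be called less semantically diverse.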
The second: The web is getting aggressively cheerful.
AI content showed positive sentiment scores more than 107% higher than human content. Researchers tie this to the well-documented sycophantic tendencies of LLMs: trained on human approval signals, they produce text that feels sanitized, friction-free, and relentlessly upbeat.
An internet flooded with cheerful, homogenized content could marginalize human dissent at scale without anyone pulling a lever.
Despite widespread public belief, the study found no statistically significant evidence that AI content is making the internet less factually accurate. Researchers found no meaningful correlation between AI prevalence and factual error rate.
The stylistic monoculture hypothesis, that AI flattens individual voices into a generic uniform register, was the belief respondents held most strongly (83% agreed). The data did not confirm it. Character-level analysis found no statistically significant increase in stylistic homogeneity tied to AI prevalence.
The model collapse problem just got real
The broader stakes go beyond discourse quality. At 35% AI prevalence, the theoretical risk of model collapse, where future models degrade after training on AI-generated data, shifts from academic concern to empirical reality. Future foundation models trained on contemporary web crawls will inevitably ingest data that’s significantly AI-generated and measurably less semantically diverse.
The team is now working with the Internet Archive to turn the study into a continuous, live monitoring tool, tracking AI’s share of the web in real time rather than as a one-off snapshot.
A U.S. survey conducted alongside the study found most Americans already believe all six negative hypotheses, including the ones the data does not support. People who use AI occasionally were 12% more likely to believe in the harms than frequent users. Dead Internet Theory believers, meet the data: The internet is not dead, but 35% of what is new may be zombie content in some way.