For a while now I’ve been speculating about what would happen when AI agents started being able to write papers indistinguishable in quality from those that have been typical of the sad state of hep-th for quite a while. Sabine Hossenfelder today has AI Is Bringing “The End of Theory”, in which she gives her cynical take that the past system of grant-holding PIs using grad students/postdocs to produce lots of mediocre papers with the PI’s name on them is about to change dramatically. Once AI agents can produce mediocre papers much more quickly than the grad students/postdocs, then anyone can play and we’ll get flooded by such papers from not just those PIs, but everyone else.
I decided to take a look at the arXiv hep-th submissions, and quickly generated the following numbers, by simple searches using
https://arxiv.org/search/advanced
to find all hep-th submissions in various date ranges.
For 12/1 to 12/31 the numbers were
2022: 634
2023: 684
2024: 780
2025: 1192
For 1/1 to 2/1
2022: 583
2023: 531
2024: 626
2025: 659
2026: 1137
For 2/1 to 2/15
2022: 299
2023: 266
2024: 271
2025: 333
2026: 581
From this very limited data it looks like submission numbers in the last couple months have nearly doubled with respect to the stable numbers of previous years.
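For anyone who wants to reproduce or extend these counts programmatically, here's a minimal sketch using the arXiv Atom API rather than the advanced-search web form used above. The `submittedDate` filter and the `opensearch:totalResults` field are documented features of the API, though counts may differ slightly from the web interface:

```python
# Sketch: count arXiv submissions in a category and date range
# via the arXiv Atom API (http://export.arxiv.org/api/query).
from urllib.parse import urlencode
from xml.etree import ElementTree

API = "http://export.arxiv.org/api/query"
TOTAL = "{http://a9.com/spec/opensearch/1.1/}totalResults"

def count_query_url(category: str, start: str, end: str) -> str:
    """Build the query URL; start/end are YYYYMMDDHHMM strings."""
    q = f"cat:{category} AND submittedDate:[{start} TO {end}]"
    # max_results=0: we only need the totalResults count, not the entries
    return API + "?" + urlencode({"search_query": q, "max_results": 0})

def parse_total(atom_xml: str) -> int:
    """Extract the totalResults count from an Atom API response."""
    root = ElementTree.fromstring(atom_xml)
    return int(root.find(TOTAL).text)
```

Fetching `count_query_url("hep-th", "202512010000", "202601010000")` with `urllib.request.urlopen` and passing the response body to `parse_total` should return the December 2025 count; looping over years and categories would automate the tables above.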
I thought about spending more time I don’t have looking into this, then realized “this is a job for AI!”. Surely an AI agent could do a much better job than me of gathering such data, figuring out things like whether you can recognize the AI agent papers or not, and writing up a detailed analysis. I’m still resisting learning how to use AI agents, so someone else will have to do this.
One of my main problems with the comments here has been that it’s increasingly hard to tell the difference between human and AI generated ones. In this case, maybe the AI generated ones would be better than those from meatspace. So, unless you have something really substantive (like an explanation for why these numbers don’t mean what it looks like they mean, or know what the arXiv is doing about this) please resist commenting. I’ll moderate comments for things like irrelevance and hallucinations, but won’t delete comments just because they are non-human.


Many theses are never written up and submitted to journals, for many reasons, including the time required to improve their quality to the point where the PI is comfortable submitting. I wonder if this could reflect a backlog breaking through as well as a higher new baseline, from both traditional academia and the broader public?
SB,
This is about arXiv submissions, not really anything to do with how theses are handled. Also, we’re talking about a year-to-year doubling in the size of the literature; theses are far too small in number to account for this.
Seriously, I hope someone puts an AI agent to work comparing the pre-2025 literature to the current literature, getting some actual data about what has changed and what effects are driving this.
There’s been a similar rapid increase in hep-ph (for January, 504 -> 601 -> 667 -> 1007). The funny thing is, until you pointed it out, I hadn’t consciously noticed the increase in volume; instead what I noticed was a sharp decrease in interesting papers. Our local journal club is having a lot of trouble finding new papers worth discussing. There seems to be an increase in very incremental papers, calculating random things in random models, or applying technique X to dataset Y yet again.
For a more complete sample, I do try to at least skim everything that comes out in my small subsubfield in hep-ph. In late 2024, I noticed the first obviously poorly AI-generated paper. It was this weird combination of big claims interspersed with total triviality (“here’s a Python plot of e^(-x) so you know what it looks like!”). The equations were all standard from different parts of physics, but none of them were actually connected to each other. The authors had generated 4 such papers in a month, all in different fields.
Throughout 2025 the rate in my subsubfield accelerated from one per month to one per week. (I keep a folder of them!) By skimming so many I picked up on some patterns. For instance, the authors were often a bunch of students with no experience, or a very old physicist who hadn’t had a student in a long time. Also, while the papers had increasingly coherent explanations of why their new thing was important, the logic would suddenly go away in the crucial part in the middle, where the new thing was supposed to be justified. In 2026 I stopped keeping track.
The common narrative that AI will democratize physics is clearly wrong. Good physicists can use AI as a tool to write good papers, by giving it good problems and frequent feedback. Others can use it to churn out mediocre papers, by giving it incremental (but at least well-defined) problems and frequent feedback. Nonphysicists can’t supply meaningful feedback at all. They just poison the LLMs with nonsense (“add more topological fractal complexity and ether vortex dynamics”), yielding the content on r/LLMPhysics. There may be a time where AI no longer needs high quality input, but there will never be a time where it benefits from low quality input.
Sociologically, people will just retreat to private channels, like Slack or Discord, or talking at the journal club or the coffee machine. When I was a young student, senior people encouraged me to learn by checking arXiv every day, but now I run into senior people who declare they no longer read it at all. It would be far from the first time a public online forum is ruined.