Looks Like it is Happening…

Update: Sorry, but a commenter points out that this may just be an artifact of counting based on the most recent modification date rather than the original submission date.

Numbers using original, not most recent, submission dates

For 12/1 to 12/31 the numbers were
2022: 800
2023: 811
2024: 815
2025: 855

For 1/1 to 2/1
2022: 510
2023: 490
2024: 501
2025: 544
2026: 617

For 2/1 to 2/15
2022: 255
2023: 221
2024: 280
2025: 276
2026: 311

These do show significant increases year to year for the last couple months, but not the near doubling indicated by the other numbers. The hep-th arxiv apocalypse is not here yet.

For a while now I’ve been speculating about what would happen when AI agents started being able to write papers indistinguishable in quality from those that have been typical of the sad state of hep-th for quite a while. Sabine Hossenfelder today has AI Is Bringing “The End of Theory”, in which she gives her cynical take that the past system of grant-holding PIs using grad students/postdocs to produce lots of mediocre papers with the PI’s name on them is about to change dramatically. Once AI agents can produce mediocre papers much more quickly than the grad students/postdocs, then anyone can play and we’ll get flooded by such papers from not just those PIs, but everyone else.

I decided to take a look at the arXiv hep-th submissions, and quickly generated the following numbers, by simple searches using
https://arxiv.org/search/advanced
to find all hep-th submissions in various date ranges.
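(The counts below came from the advanced-search web form, but for anyone who wants to reproduce or extend them, the public arXiv API supports the same kind of query. The sketch below is an illustration, not the method used for this post; it assumes the documented `submittedDate`/`lastUpdatedDate` range filters and reads the total count from the Atom feed.)

```python
# Sketch: count arXiv submissions in a date range via the public API.
# The API supports both submittedDate (original submission) and
# lastUpdatedDate (most recent version) range filters.
import re
import urllib.request

API = "http://export.arxiv.org/api/query"

def count_url(category, start, end, field="submittedDate"):
    """Build an API query URL counting papers in [start, end].

    start/end are YYYYMMDDHHMM strings; field is 'submittedDate' or
    'lastUpdatedDate'. max_results=0 asks for only the total count.
    """
    q = f"cat:{category}+AND+{field}:[{start}+TO+{end}]"
    return f"{API}?search_query={q}&max_results=0"

def count(category, start, end, field="submittedDate"):
    """Fetch the query and pull <opensearch:totalResults> from the Atom feed."""
    with urllib.request.urlopen(count_url(category, start, end, field)) as r:
        feed = r.read().decode()
    m = re.search(r"totalResults[^>]*>(\d+)<", feed)
    return int(m.group(1)) if m else None

# e.g. count("hep-th", "202512010000", "202601010000") for December 2025
```

Swapping `submittedDate` for `lastUpdatedDate` is exactly the difference between the two sets of numbers in this post.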

For 12/1 to 12/31 the numbers were
2022: 634
2023: 684
2024: 780
2025: 1192

For 1/1 to 2/1
2022: 583
2023: 531
2024: 626
2025: 659
2026: 1137

For 2/1 to 2/15
2022: 299
2023: 266
2024: 271
2025: 333
2026: 581

From this very limited data it looks like submission numbers in the last couple months have nearly doubled with respect to the stable numbers of previous years.

I thought about spending more time I don’t have looking into this, then realized “this is a job for AI!”. Surely an AI agent could do a lot better job than me in gathering such data, figuring out things like whether you can recognize the AI agent papers or not, and writing up a detailed analysis. I’m still resisting learning how to use AI agents, so someone else will have to do this.

One of my main problems with the comments here has been that it’s increasingly hard to tell the difference between human and AI generated ones. In this case, maybe the AI generated ones would be better than those from meatspace. So, unless you have something really substantive (like an explanation for why these numbers don’t mean what it looks like they mean, or know what the arXiv is doing about this) please resist commenting. I’ll moderate comments for things like irrelevance and hallucinations, but won’t delete comments just because they are non-human.

This entry was posted in Uncategorized.

18 Responses to Looks Like it is Happening…

  1. SB says:

Many theses are never prepared and submitted to journals, for many reasons, including the time required to improve their quality to the point where the PI is comfortable submitting. I wonder if this could reflect a backlog breaking through, as well as a higher new baseline, from both traditional academia and the broader public?

  2. Peter Woit says:

    SB,
This is about arXiv submissions, not really anything to do with how theses are handled. Also, we’re talking about a year-to-year doubling in the size of the literature; theses are far too small in number to account for this.

Seriously, I hope someone puts an AI agent to work comparing the pre-2025 literature to the current literature, getting some actual data about what has changed and what effects are driving this.

  3. Kevin Zhou says:

    There’s been a similar rapid increase in hep-ph (for January, 504 -> 601 -> 667 -> 1007). The funny thing is, until you pointed it out, I hadn’t consciously noticed the increase in volume; instead what I noticed was a sharp decrease in interesting papers. Our local journal club is having a lot of trouble finding new papers worth discussing. There seems to be an increase in very incremental papers, calculating random things in random models, or applying technique X to dataset Y yet again.

    For a more complete sample, I do try to at least skim everything that comes out in my small subsubfield in hep-ph. In late 2024, I noticed the first obviously poorly AI-generated paper. It was this weird combination of big claims interspersed with total triviality (“here’s a Python plot of e^(-x) so you know what it looks like!”). The equations were all standard from different parts of physics, but none of them were actually connected to each other. The authors had generated 4 such papers in a month, all in different fields.

    Throughout 2025 the rate in my subsubfield accelerated from one per month to one per week. (I keep a folder of them!) By skimming so many I picked up on some patterns. For instance, the authors were often a bunch of students with no experience, or a very old physicist who hadn’t had a student in a long time. Also, while the papers had increasingly coherent explanations of why their new thing was important, the logic would suddenly go away in the crucial part in the middle, where the new thing was supposed to be justified. In 2026 I stopped keeping track.

    The common narrative that AI will democratize physics is clearly wrong. Good physicists can use AI as a tool to write good papers, by giving it good problems and frequent feedback. Others can use it to churn out mediocre papers, by giving it incremental (but at least well-defined) problems and frequent feedback. Nonphysicists can’t supply meaningful feedback at all. They just poison the LLMs with nonsense (“add more topological fractal complexity and ether vortex dynamics”), yielding the content on r/LLMPhysics. There may be a time where AI no longer needs high quality input, but there will never be a time where it benefits from low quality input.

    Sociologically, people will just retreat to private channels, like Slack or Discord, or talking at the journal club or the coffee machine. When I was a young student, senior people encouraged me to learn by checking arXiv every day, but now I run into senior people who declare they no longer read it at all. It would be far from the first time a public online forum is ruined.

  4. Jerry Ling says:

The effect goes away if you search properly using the original submission date instead of the most recent submission date. By using the most recent submission date, your analysis is biased: we’re so close to the beginning of 2026 that of course we see a peak, which is just people who have recently modified their submissions.

  5. Peter Woit says:

    Jerry Ling,
    Thanks for pointing this out. I’ve put up numbers using original submission dates. There is an increase this year (about 13%), but quite a lot smaller than the doubling I was seeing counting most recent submission dates.

  6. Philip Chang says:

Jerry and I were discussing this, and here is the AI-driven analysis you were looking for: https://github.com/sgnoohc/arxiv-submission-analysis

  7. Peter Woit says:

    Philip Chang,
Thanks, that’s great! I’m still mystified by this big change in the number of revisions/replacements.

    Looking at January and February numbers, I do see larger year to year increases (about 13%) in the original submission numbers.

  8. Philip Chang says:

Our thought was that whenever a new revision comes in, the paper gets associated with the date of the latest version, so the red is constantly “shifting to the future and piling up near recent times” as time goes on.

This probably implies that the integral of blue should be higher than the integral of red at earlier times, and although it’s not very clear, the blue spikes are above red. The cs.AI area is much clearer: red is consistently lower, with the recent revisions piling up near “today”.

  9. Sabine says:

    Thanks for the link!

    It’s become rather difficult to interpret arXiv submission stats because their endorsement and moderation rules have changed so much. You also have to take into account that we have seen exponential growth of the number of scientists for the past century or so, and the number of papers has grown in proportion with that. And then there is the issue that the number of scientists has grown dramatically faster in some regions of the world, notably China.

    Altogether it’s a tricky question how to interpret the numbers. Generally you expect that as submissions increase (whether for online repositories or journals), fast rejections also increase, so it is likely that we are only seeing a small part of the problem.

  10. Other Andrew says:

You need a statistical model, and to run simulations, to see what’s happening. I have a theory: the typical pattern is that you submit a paper (v1), submit a revision a week or so later to fix typos and incorporate reference requests (v2), then submit a final version to the arXiv some time later, after revisions for and acceptance at a journal (v3).

This may lead to a spike in revisions at the current time, since the most recent months contain both v2 and v3 revisions: the v2 revisions give a spike over the baseline of v3 revisions. Look again in 12 months and those v2 revisions that gave the spike will have been replaced by v3 revisions, spread out over time by the review process.

    I’m not sure about this though and would like a simulation 🙂 I would do

    * new submissions per month ~ Poisson(some rate)
    * each of them revised that same month
    * each of them revised again at d ~ Poisson(review time in months) months later
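    The recipe above can be coded directly. This is a minimal illustrative sketch (the rates and review times are made-up numbers, not fit to arXiv data): counting each paper by the month of its most recent *existing* version makes the final months spike, because recent papers’ journal versions haven’t arrived yet.

    ```python
    import math
    import random

    def poisson(rng, lam):
        """Knuth's Poisson sampler (fine for modest lam)."""
        l, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= l:
                return k
            k += 1

    def simulate(months=60, rate=50, mean_review=6, seed=1):
        """Count papers per month by the date of their most recent existing
        version, as of 'now' = end of month months - 1."""
        rng = random.Random(seed)
        last_modified = [0] * months
        for m in range(months):
            for _ in range(poisson(rng, rate)):
                # v1 and the quick v2 both land in month m; the journal
                # version v3 lands after a Poisson-distributed review delay
                v3 = m + 1 + poisson(rng, mean_review)
                # if v3 is still in the future, the latest version is the
                # month-m v2; otherwise it is the v3
                last_modified[v3 if v3 < months else m] += 1
        return last_modified

    counts = simulate()
    # Steady-state months see roughly `rate` papers each; the last few
    # months show a surge, since their v3 replacements are still pending.
    print(sum(counts[30:40]) / 10, sum(counts[-2:]) / 2)
    ```

    In the last simulated month every new paper is counted there (its v3 is always in the future) on top of the arriving v3s of older papers, which is the near-doubling effect.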

  11. Peter Woit says:

    Sabine,
If AI agents do reach the point of being able to write hep-th papers indistinguishable from current ones, one would expect to start seeing a distinctive signal of larger submission numbers then. There are many other things going on (e.g. growth in the number of scientists in China), but the effects of those would, I think, play out on longer time-scales.

    I haven’t seen any reports from people trying to do this (write hep-th papers using AI agents) other than the recent heavily publicized amplitudes result. There are a lot of reports emerging from math researchers, see for example Daniel Litt here
    https://www.daniellitt.com/blog/2026/2/20/mathematics-in-the-library-of-babel
These seem to indicate that AI agents at the moment are sometimes a useful tool, though not yet capable of doing a complete piece of publishable work on their own. In math the issues are somewhat different, since what’s needed is a rigorous proof, checkable either by expert humans or formalizable in Lean.

It looks like this same issue is coming up in just about every scientific field. If you believe half of what the enthusiasts for this new world are saying, we should soon see definitive evidence. The 13% or so increase in submissions over the past month or two versus last year looks to me like a 2-3 sigma signal. By the end of the year, if it’s real, it should be 5 sigma…
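    (The quick version of that estimate, treating last year’s count as the expectation with a naive Poisson sqrt(N) error and ignoring any underlying growth trend, using the original-submission-date numbers from the update:)

    ```python
    import math

    def excess_sigma(observed, expected):
        # Naive significance: last year's count is the expectation,
        # with a Poisson sqrt(N) counting error on it.
        return (observed - expected) / math.sqrt(expected)

    # Original-submission-date counts from the update above:
    print(round(excess_sigma(617, 544), 1))  # January, 2026 vs 2025: 3.1
    print(round(excess_sigma(311, 276), 1))  # February 1-15, 2026 vs 2025: 2.1
    ```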

  12. Cobi says:

    Note that at some point in the last year arXiv started to list revisions in hep-th with full abstract. This increases visibility for papers upon revision and can create an additional incentive to revise.

  13. Other Andrew says:

    Here is the simulation.

    https://andrewfowlie.github.io/blog/curious-trends-in-arxiv-submission-data.html

The results are as I guessed. The fact that the most recent months’ versions haven’t yet been replaced (as they will be in the future) produces a surge in the current months.

  14. Peter Woit says:

There are some papers appearing in hep-th that are clearly AI generated, e.g. today’s
    https://arxiv.org/abs/2602.22503
    Picking a random paragraph, an online Chat-GPT detector says 84% probability AI generated.
    This paper has already been published in Nuclear Physics B
    https://www.sciencedirect.com/science/article/pii/S0550321326000635

  15. thphys says:

Peter, this post and your last comment make interesting observations. First, in general, egregiously bad AI-written papers are often simply not posted to arXiv in the first place. They are, however, submitted to legitimate journals, and journals have seen a huge spike in low-quality submissions in the last year or two. Second, arXiv may push an article to moderation or even reject it according to their criteria. However, this is reconsidered if the paper is published in a reputable journal. So authors of low-quality AI slop have an incentive to submit to a journal, because if the paper happens to be accepted, it may then be posted on arXiv. Once an author posts on arXiv, I would assume their later submissions are more likely to be accepted as well (though perhaps there is still some secondary moderation).

  16. Peter Woit says:

    thphys,

I’d noticed that about this paper (“preprint” submitted to the arXiv after already being published in a reputable journal). This is something that easily could have been (and should have been…) rejected by the arXiv; it’s harder for them to do so once a journal has accepted it.
Maybe we’re going to see an inversion of the traditional system of preprints being less reliable than published articles, with the journals making money publishing obvious AI-generated slop while the arXiv moderators are sometimes able to stop it.

    Just tried testing another paragraph with the AI detector. 87% chance AI it says.

  17. John Baez says:

    On the Category Theory Community Server, where I spend a lot of my social media time, some people are keeping track of new papers on the arXiv that appear to be either AI-generated or written with the help of AI by someone who doesn’t understand category theory. Some candidates:

    Homotopical observables and the Langlands program via infinity-topoi, https://arxiv.org/abs/2505.22558

    Recursive difference categories and topos-theoretic universality, https://arxiv.org/abs/2505.22931

    Tunnel geometry and proliferation logic: a strict categorical equivalence, https://arxiv.org/abs/2601.00803

    Some such papers are being moved by arXiv moderators to math.GM (general mathematics), but not all.

  18. Alessandro Strumia says:

    The AI detector likely gives another irrelevant result, as many authors use AI to improve the grammar. For example, I stopped writing trough instead of through.

    The problem I see is that arXiv hep papers got long and boring.

Leave a Reply

Informed comments relevant to the posting are very welcome and strongly encouraged. Comments that just add noise and/or hostility are not. Off-topic comments better be interesting... In addition, remember that this is not a general physics discussion board, or a place for people to promote their favorite ideas about fundamental physics.