*depending strongly on who you are buying from, what the topic and preferred venue are and what authorship slot you want.
For the uninitiated: paper mills are illicit organizations that sell services to boost an academic’s publication profile. Although a paper mill can operate by a variety of business models, the prototypical paper mill mass produces fraudulent scholarly manuscripts, sells authorship slots on those manuscripts and colludes with editors at target journals to get those manuscripts published.
Those of us researching paper mills are frequently asked how much these services cost. In part to make answering this question easier, my colleagues Spencer Hong and Anna Abalkina and I created BuyTheBy, to date the largest structured dataset of paper mill advertisements. BuyTheBy v1.0 contains price data from 18,710 advertisements made by seven paper mills operating out of seven different countries.
Check out the preprint on arXiv:
The study of paper mills and similar businesses operating in the market for academic and education fraud services is frustrated by the lack of market price data on their various offerings. Here, we assemble BuyTheBy, a large, annotated dataset of timestamped, text-based paper mill advertisements from seven businesses operating out of seven different countries. The dataset consists of 18,710 individual advertisements, of which 15,839 have prices listed. Among these there are 20,598 positions listed as for sale on 5,567 unique products in 14 different product categories with 51,812 timestamped price data points. We perform elementary analysis of this dataset to demonstrate its utility for quantitative understanding of markets for academic fraud services and suggest future use cases.
And the dataset itself on Zenodo:
There are many useful things you can do with this dataset and interesting tidbits contained therein (for ideas of analyses that could be performed here and what you might find, I defer to the preprint and coverage by Nature, Science, Retraction Watch, C&EN and Times Higher Education). I’ll use this post to take a stab at the recurrent question that launched the development of this dataset: how much does a paper mill product actually cost?
The prices offered for authorship slots on articles vary dramatically, even within a single paper mill’s catalog. For instance, the advertisements we processed from an Uzbekistani paper mill listed prices between $100 and $2,000 for first authorship. Clearly, paper mills will offer a variety of products at a variety of price points. Prices also varied wildly between paper mills; while our Indian paper mill was not selling any article authorship positions at a price greater than $200, our Russian paper mill was selling positions for as much as $5,600.

Then the answer is, predictably, it depends. If this answer doesn’t cut it for you, you can look past all limitations and conditionals for a single central estimate, which the dataset will provide: the median first authorship slot offered by these seven paper mills goes for about $800. Based on median listed prices, buying every slot on a five-author manuscript probably costs around $3,000.
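For readers who want to reproduce this kind of central estimate themselves, here is a minimal sketch of the arithmetic: take the median listed price for each authorship position, then sum those medians across positions to approximate the cost of buying a whole author list. The records and the field names (`position`, `price_usd`) are invented for illustration and are an assumption, not the actual BuyTheBy schema or its values, so the numbers it produces mean nothing beyond the toy example.

```python
from collections import defaultdict
from statistics import median

# Toy records invented purely for illustration; these are NOT BuyTheBy data,
# and the field names are hypothetical rather than the dataset's real columns.
listings = [
    {"position": 1, "price_usd": 100},
    {"position": 1, "price_usd": 300},
    {"position": 1, "price_usd": 200},
    {"position": 2, "price_usd": 150},
    {"position": 2, "price_usd": 50},
    {"position": 3, "price_usd": 80},
    {"position": 3, "price_usd": None},  # advertisement with no listed price
]


def median_price_by_position(rows):
    """Return {authorship position: median listed price}, skipping unpriced rows."""
    prices_by_position = defaultdict(list)
    for row in rows:
        if row["price_usd"] is not None:
            prices_by_position[row["position"]].append(row["price_usd"])
    return {pos: median(prices) for pos, prices in sorted(prices_by_position.items())}


def full_manuscript_cost(medians, n_authors):
    """Sum the median prices of positions 1..n_authors.

    Raises KeyError if any of those positions has no priced listing.
    """
    return sum(medians[pos] for pos in range(1, n_authors + 1))


medians = median_price_by_position(listings)
print(medians)
print(full_manuscript_cost(medians, n_authors=3))
```

Summing per-position medians is only one reasonable way to build a whole-manuscript figure; it ignores bundle discounts and the fact that not every advertisement lists every position.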
I hope that one way others use BuyTheBy is by mapping titles offered in these advertisements to their eventually-published products. For a demonstration of this, check out this impressive piece of work by Anna Abalkina, Marie Kunešová, Yagmur Ozturk and Solal Pirelli, for which they tracked more than 1,700 advertised titles to articles that were eventually published in conference proceedings:
Opening Pandora’s box: Paper mills in conference proceedings
Paper mills are a growing threat to the integrity of science, yet their penetration in conference proceedings remains underexplored despite conferences being more important than journals in some scientific subfields. This study aims to identify papers in conference proceedings whose titles have been offered for sale on social media platforms. We collected more than 4,000 unique publication offers from more than 200 social media channels and used semi-automated methods along with human assessment to match offers with papers published in IEEE conference proceedings. We identified 1,720 papers in 286 IEEE conference proceedings, accounting for up to 23.51% of an individual conference. These problematic papers are co-authored by more than 6,500 researchers from over 3,500 affiliations in 55 countries. The identified papers demonstrate collaboration anomalies, high diversity of affiliations per paper, citation manipulation, a predominance of six-author papers, and content-based irregularities. Our findings show that paper mills are a large, organized, and often public market that commercializes scientific misconduct, not limited to papers, but infiltrating multiple parts of the research ecosystem.
Both of these manuscripts demonstrate that paper mills offer a variety of different products, from authorship slots on journal articles to authorship slots on conference proceedings to editorship on textbooks to bogus prizes to citation boosting to “design patents” (which are actually not patents in the jurisdictions being advertised, more on that here). More generally, they demonstrate that although paper mills operate largely in secret, they conduct enough business out in the open that we do not have to content ourselves with just guessing.
The header image for this post is an illustration from a manuscript of Nicole Oresme’s 14th century translation of Aristotle’s Ethics, Politics, and Economics.

