Reese Richardson

Case studies in scientific reproducibility

Datasets

This page links to various datasets that I have collected or curated that others might find useful. If you wind up doing anything interesting with one, I would love to hear about it!

Peer review archives

Many journals now practice “transparent peer review”, meaning that peer review reports and author responses to those reports are made available to read alongside the published article. Access to a large number of these peer review reports can be useful for scholarly study of the peer review process. However, publishers seldom make these reports available to download in bulk. I have downloaded the complete archives of peer review reports and author responses for several open access journals.

BMC Anesthesiology (downloaded 20 September 2025)
BMC Cancer (downloaded 31 August 2025)
BMC Chemistry (downloaded 14 September 2025)
BMC Complementary Medicine and Therapies (downloaded 14 September 2025)
BMC Gastroenterology (downloaded 31 August 2025)
BMC Genomics (downloaded 15 September 2025)
BMC Microbiology (downloaded 14 September 2025)
BMC Medical Genomics (downloaded 31 August 2025)
BMC Medicine (downloaded 31 August 2025)
BMC Nursing (downloaded 14 September 2025)
BMC Plant Biology (downloaded 14 September 2025)
BMC Public Health (downloaded 15 September 2025)
BMC Surgery (downloaded 14 September 2025)
BMC Women’s Health (downloaded 14 September 2025)

Publisher archives

Some publishers, like PLOS, make all of their article metadata available without restrictions in a single large archive of XML files following the JATS standard. Hindawi (an ill-fated Wiley acquisition) used to do this, but their XML dump disappeared shortly after Wiley folded the Hindawi imprint. Luckily, we downloaded a copy shortly before its disappearance.

Hindawi XML Corpus (downloaded 2 April 2024)
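
For anyone who wants to pull basic metadata out of a JATS corpus like this one, here is a minimal Python sketch. It assumes each file follows the usual JATS layout, with an `article-meta` element holding an `article-id` tagged `pub-id-type="doi"` and a `title-group/article-title`. The function name `jats_metadata` and the sample record are made up for illustration, and individual files in a real archive may differ in detail (for example, by using namespaces or entities this sketch does not handle).

```python
import xml.etree.ElementTree as ET


def jats_metadata(xml_text):
    """Return the DOI and title found in one JATS article document."""
    root = ET.fromstring(xml_text)
    meta = root.find(".//article-meta")
    if meta is None:
        return {"doi": None, "title": None}

    doi_el = meta.find("article-id[@pub-id-type='doi']")
    title_el = meta.find("title-group/article-title")

    doi = doi_el.text.strip() if doi_el is not None and doi_el.text else None
    title = None
    if title_el is not None:
        # itertext() flattens inline markup such as <italic> inside the title;
        # split/join collapses runs of whitespace.
        title = " ".join("".join(title_el.itertext()).split())
    return {"doi": doi, "title": title}


# Made-up record for illustration only (not taken from any archive).
SAMPLE = """<article>
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.0000/example.0001</article-id>
      <title-group>
        <article-title>A <italic>made-up</italic> example title</article-title>
      </title-group>
    </article-meta>
  </front>
</article>"""

record = jats_metadata(SAMPLE)
print(record)
```

To process a whole archive, the same function could be applied to the contents of each XML file in turn, skipping or logging any file that fails to parse.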

Predatory/hijacked journal archives

Predatory journals can be ephemeral. To preserve their footprints, I’ve downloaded complete PDF archives of several predatory, hijacked or otherwise shady journals.

PalArch’s Journal of Archaeology of Egypt/Egyptology (downloaded 27 May 2023)
Res Militaris (downloaded 27 May 2023)
Journal of Pharmaceutical Negative Results (downloaded 27 May 2023)
Russian Law Journal (downloaded 27 May 2023)
HIV Nursing (downloaded 27 May 2023)
Journal of Science Technology and Research (JSTAR) and International Journal of Engineering Innovations and Management Strategies (IJIEMS) (downloaded 5 May 2025)

Find My Understudied Genes (FMUG)

Find My Understudied Genes (FMUG) is a data-driven tool to help biomedical researchers identify understudied human genes and characterize their tractability for future research. FMUG is built on an aggregate database synthesizing a diverse array of bibliometric, experimental and molecular data.

FMUG site, associated eLife article

Articles with associated datasets

Some potentially useful datasets are described by the articles below.

“The entities enabling scientific fraud at scale are large, resilient, and growing rapidly” (2025)
“Widespread misidentification of scanning electron microscope instruments in the peer-reviewed materials science and engineering literature” (2025)