Reese Richardson

Case studies in scientific reproducibility

The scientific enterprise is a human enterprise

The following is an excerpt from the preface of my newly defended PhD dissertation in Interdisciplinary Biological Sciences, titled “Metascientific studies in reproducibility, bias and fraud”.

Three of the four chapters of my thesis have been published: “Understudied genes are lost in a leaky pipeline between genome-wide assays and reporting of results”, “The entities enabling scientific fraud at scale are large, resilient, and growing rapidly” and “Widespread misidentification of scanning electron microscope instruments in the peer-reviewed materials science and engineering literature”. The remainder of my dissertation is currently under embargo.

The header image for this post is the Babylonian tablet Plimpton 322.

In 1947, a young Osamu Shimomura enrolled in the College of Pharmaceutical Sciences of Nagasaki Medical College [1]. The college had been destroyed in the bombing of Nagasaki and had relocated to a temporary campus near Shimomura’s home. Although he had no interest in pharmacy at the time, he saw it as his only opportunity to earn an education. He went on to discover green fluorescent protein (GFP), for which he was awarded the Nobel Prize in Chemistry in 2008. GFP has since shaped the design of countless biological experiments [2].

On May 10, 1989, a fire broke out in a mouse breeding facility at the Jackson Laboratory in Bar Harbor, Maine; it would grow to consume half of the breeding laboratories and kill almost half a million mice [3,4]. The Jackson Laboratory was a leading supplier of mice for biomedical research in the United States. The fire caused many strains of mice to suddenly become unavailable to researchers for months, if not years. A recent analysis found that significantly fewer publications featured the affected strains in the months that followed, and that scientists whose work relied on these strains tended to publish less and were more likely to change their topic of study entirely [5]. The authors of this analysis found that other laboratory disasters showed similar effects on scientific productivity [6].

In the 1990s, a family of proteins called nuclear receptors was characterized and rapidly earned substantial research attention [7]. Researchers believed that every member of the family could be a tractable therapeutic target for one disease or another, and that hundreds of papers could be written about each protein. However, over the next couple of decades, only a small subset of these proteins would be investigated in depth, receiving thousands more citations than the rest of the family combined. This subset comprises the proteins in the family for which a protein-specific chemical probe was widely commercially available. Because such probes have long been unavailable for the other members of the family, biologists have largely elected not to study those members.

Mechanical Turk (MTurk) is a crowdsourcing service launched by Amazon in 2005 [8]. In the MTurk framework, users (“requesters”) temporarily recruit other users (“workers”) to perform menial digital tasks (“microtasks”) for wages (“rewards”). These rewards are often pennies per task [9,10]. MTurk has proven to be of great use to social science researchers as a simple and cost-effective means of gathering participant data. To date, thousands of studies featuring data collected through MTurk have been published [11]; a 2016 study found that up to 40% of surveyed social science articles contained at least one MTurk-aided study [12]. However, because workers are paid per task completed, concerns have recently emerged that an increasing proportion of MTurk-collected survey responses are low-effort, made in bad faith, or submitted by a single worker using multiple accounts [13]. Indian and Venezuelan workers are suspected of using software to make it appear as though they are based in the United States, allowing them to participate in and earn rewards for surveys targeting Americans [11]. More recently, a large number of MTurk survey responses have borne the signatures of workers using AI chatbots to expedite their work [14]. The extent to which this widespread contamination of data has affected the social sciences literature is unclear.

These anecdotes serve to demonstrate what I believe has been the overarching theme of my doctoral studies: that the scientific enterprise is a human enterprise. Although this observation is trivial to scientists themselves, science’s human aspect is often outright ignored. Popular maxims like “trust the science”, “we believe in science” and “science is self-correcting” reduce the scientific enterprise to a singular, infallible entity, withholding due recognition of the human and material processes behind the workings of science.

These processes are also neglected when any single scientist is lauded as a genius or trailblazer. While it is exciting to attribute grand discoveries to individuals’ tenacity or brilliance, this view fails to capture the mechanics underlying those discoveries. Advances in understanding owe as much to circumstance as to scientists’ individual cleverness and determination: certain tools became commercially available to those scientists, standards evolved in response to changing consumer technologies, or funds were allocated to a particular program.

Science, and thus our collective knowledge, is inexorably shaped by the material and practical constraints of its societal context. In the abstract, a primary goal of metascience (or the science of science) is to better understand how the scientific enterprise functions within its human context. Metascientists contend that obtaining such an understanding allows policymakers to better direct the scientific enterprise, accelerating innovation, widening our collective knowledge and, in turn, improving human well-being [15,16].

Obtaining such a high-level mechanistic understanding is incredibly difficult. The principal object of analysis in most metascience studies is the scientific article. However, every scientific article is an overwrought palimpsest reflecting the intentions of the authors as well as their institutions, their sources of funding, the peer reviewers, the editors, and the publisher. Except in rare instances where intermediary records are available [17], this product is the sole lasting reflection of the scientific process at work. From these records and the citations that connect them, we are charged with extracting generalizable insights about scientific productivity, creativity and innovation.

These difficulties notwithstanding, this challenge has never been more urgent. Recent global events have strained the relatively strong public trust in science [18-20]. Recent large-scale evaluations have cast doubt on the reproducibility of large swaths of the published literature [21-24], fueling concerns about a “reproducibility crisis” [25]. Meanwhile, scientific articles are now being retracted at a record rate [26]. At the heart of this trend has been the apparent emergence of paper mills, businesses that exploit the career pressure on scientists to publish and fill the peer-reviewed literature with low-quality and outright fraudulent studies [27-29]. These businesses seem to have found the most customers among academics and physicians in low- and middle-income countries, where the opportunities for growth of the scientific enterprise are greatest. In some countries with burgeoning scientific enterprises, it can be argued that such businesses have completely upended the ethical norms of scientific research and publishing [30-32]. In these contexts, even those who wish in good faith to have impactful and meaningful scientific careers may be forced into misconduct by their circumstances. Worse still, scientists trained in these contexts will be subject to the assumption that bogus science is all they could ever produce; a common reaction to the discovery of paper mill activity is to conclude that we simply cannot trust any research coming out of China, or Iran, or Russia, or Egypt, or India, and so on. Aspiring and established scientists in the global scientific periphery deserve better.

One might argue that the development of paper mills and other forms of systematic scientific fraud is the indirect result of metascientists creating and popularizing quantifiable heuristics, such as the h-index and the impact factor, for evaluating scientists [33,34] (as an aspiring metascientist, I would be loath not to cite Goodhart’s law, stated by Strathern as “when a measure becomes a target, it ceases to be a good measure” [35,36]). The scientific enterprise is an unwieldy behemoth, ever-growing in size and complexity [37]. In taking up the challenge of characterizing this beast, metascientists of the past may be indirectly responsible for many of its current ills. It is then incumbent upon metascientists of the present and future to use their craft to work towards a more equal and more just scientific enterprise in addition to a more efficient one.
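
For concreteness, the h-index is simply the largest number h such that a researcher has at least h papers with at least h citations each. The short Python sketch below is an illustration of my own (it is not material from the dissertation, and the function name is mine); it shows how little information the metric actually condenses, which is arguably part of what makes it such an easy target in the Goodhart sense.

    # Illustrative only: compute the h-index from a list of citation counts,
    # i.e. the largest h such that at least h papers have at least h citations.
    def h_index(citation_counts):
        counts = sorted(citation_counts, reverse=True)
        h = 0
        for rank, citations in enumerate(counts, start=1):
            if citations >= rank:
                h = rank
            else:
                break
        return h

    # Example: five papers cited 10, 8, 5, 2 and 1 times give an h-index of 3.
    print(h_index([10, 8, 5, 2, 1]))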

References

  1. Osamu Shimomura Nobel Lecture. Dec. 4, 2008.
  2. Aldo Roda. “Discovery and development of the green fluorescent protein, GFP: the 2008 Nobel Prize”. In: Analytical and bioanalytical chemistry 396.5 (2010), pp. 1619–1622.
  3. Joseph Palca. “Fire strikes Jackson Laboratory.” In: Nature 339.6221 (1989).
  4. Elaine Blume. “NCI awards Jackson Lab $9.5 million to rebuild”. In: Journal of the National Cancer Institute 82.21 (1990), p. 1674.
  5. Stefano Horst Baruffaldi, Dennis Byrski, and Fabian Gaessler. “Fire and Mice: The Effect of Supply Shocks on Basic Science”. In: Academy of Management Proceedings. Vol. 2020. Academy of Management, 2020, p. 14405.
  6. Stefano Baruffaldi and Fabian Gaessler. “The returns to physical capital in knowledge production: Evidence from lab disasters”. In: Max Planck Institute for Innovation & Competition Research Paper 21-19 (2021).
  7. Aled M Edwards et al. “Too many roads not taken”. In: Nature 470.7333 (2011), pp. 163–165.
  8. Gabriele Paolacci, Jesse Chandler, and Panagiotis G Ipeirotis. “Running experiments on Amazon Mechanical Turk”. In: Judgment and Decision Making 5.5 (2010), pp. 411–419.
  9. Paul Hitlin. Research in the crowdsourcing age: A case study. 2016.
  10. Douglas J Ahler, Carolyn E Roush, and Gaurav Sood. “The micro-task market for lemons: Data quality on Amazon’s Mechanical Turk”. In: Political Science Research and Methods (2021), pp. 1–20.
  11. Ryan Kennedy et al. “The shape of and solutions to the MTurk quality crisis”. In: Political Science Research and Methods 8.4 (2020), pp. 614–629.
  12. Haotian Zhou and Ayelet Fishbach. “The pitfall of experimenting on the web: How unattended selective attrition leads to surprising (yet false) research conclusions.” In: Journal of personality and social psychology 111.4 (2016), p. 493.
  13. Catherine C Marshall et al. “Who broke Amazon Mechanical Turk? An analysis of crowdsourcing data quality over time”. In: Proceedings of the 15th ACM Web Science Conference 2023, pp. 335–345.
  14. Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West. “Artificial artificial artificial intelligence: Crowd workers widely use large language models for text production tasks”. In: arXiv preprint arXiv:2306.07899 (2023).
  15. Dashun Wang and Albert-László Barabási. The Science of Science. Cambridge University Press, 2021.
  16. Santo Fortunato et al. “Science of science”. In: Science 359.6379 (2018), eaao0185.
  17. Mohammad Hosseini et al. “Ethical considerations in utilizing artificial intelligence for analyzing the NHGRI’s History of Genomics and Human Genome Project archives”. In: Journal of eScience Librarianship 13.1 (2024).
  18. Viktoria Cologna et al. “Trust in scientists and their role in society across 67 countries”. In: OSF Preprints (2024).
  19. Arthur Lupia et al. “Trends in US public confidence in science and opportunities for progress”. In: Proceedings of the National Academy of Sciences 121.11 (2024), e2319488121.
  20. Brian Kennedy and Alec Tyson. “Americans’ trust in scientists, positive views of science continue to decline”. Pew Research Center (2023).
  21. Open Science Collaboration. “Estimating the reproducibility of psychological science”. In: Science 349.6251 (2015), aac4716.
  22. Colin F Camerer et al. “Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015”. In: Nature human behaviour 2.9 (2018), pp. 637–644.
  23. Timothy M Errington et al. “Investigating the replicability of preclinical cancer biology”. In: Elife 10 (2021), e71601.
  24. Jongwoo Park, Joshua D Howe, and David S Sholl. “How reproducible are isotherm measurements in metal–organic frameworks?” In: Chemistry of Materials 29.24 (2017), pp. 10487–10495.
  25. Monya Baker. “1,500 scientists lift the lid on reproducibility”. In: Nature 533.7604 (2016).
  26. Richard Van Noorden. “More than 10,000 research papers were retracted in 2023—a new record”. In: Nature 624.7992 (2023), pp. 479–481.
  27. Richard Van Noorden. “How big is science’s fake-paper problem?” In: Nature 623.7987 (2023), pp. 466–467.
  28. Jennifer A Byrne et al. “Protection of the human gene research literature from contract cheating organizations known as research paper mills”. In: Nucleic Acids Research 50.21 (2022), pp. 12058–12070.
  29. Jennifer A Byrne and Cyril Labbé. “Striking similarities between publications from China describing single gene knockdown experiments in human cancer cell lines”. In: Scientometrics 110.3 (2017), pp. 1471–1493.
  30. Lulin Chen et al. “Knowledge, attitudes and practices about research misconduct among medical residents in southwest China: a cross-sectional study”. In: BMC Medical Education 24.1 (2024), p. 284.
  31. Richard Stone. “In Iran, a shady market for papers flourishes”. In: Science (2016).
  32. Pratap R Patnaik. “Scientific misconduct in India: Causes and perpetuation”. In: Science and Engineering Ethics 22.4 (2016), pp. 1245–1249.
  33. Jorge E Hirsch. “An index to quantify an individual’s scientific research output”. In: Proceedings of the National Academy of Sciences 102.46 (2005), pp. 16569–16572.
  34. Eugene Garfield. “The history and meaning of the journal impact factor”. In: JAMA 295.1 (2006), pp. 90–93.
  35. Marilyn Strathern. “‘Improving ratings’: audit in the British University system”. In: European review 5.3 (1997), pp. 305–321.
  36. Charles AE Goodhart. Problems of monetary management: the UK experience. Springer, 1984.
  37. Lutz Bornmann, Robin Haunschild, and Rüdiger Mutz. “Growth rates of modern science: a latent piecewise growth curve approach to model publication numbers from established and new literature databases”. In: Humanities and Social Sciences Communications 8.1 (2021), pp. 1–15.
