All resources in Researchers

Discrimination and Collaboration in Science

We use game-theoretic models to take an in-depth look at the dynamics of discrimination and academic collaboration. We find that in collaboration networks, small minority groups may be more likely to end up being discriminated against while collaborating. We also find that discrimination can lead members of different social groups to mostly collaborate with in-group members, decreasing the effective diversity of the social network. Drawing on previous work, we discuss how decreases in the diversity of scientific collaborations might negatively impact the progress of epistemic communities.
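
The paper's models sit in the evolutionary game theory tradition. As a rough illustration of how models of this kind are built, the Python sketch below runs a toy Nash demand game with group-conditioned best-response learning. The group sizes, demand values, and learning rule are illustrative assumptions of this sketch, not the authors' specification, and results vary from run to run.

```python
import random
from collections import Counter

# Toy agent-based Nash demand game between a small minority and a large majority.
# Assumptions (not the authors' model): demands of 4, 5, or 6 split a resource of 10;
# a pairing fails (both get 0) if the demands sum above 10; each agent best-responds
# to the demands it has observed from the other group so far.
DEMANDS = [4, 5, 6]
N_MINORITY, N_MAJORITY, ROUNDS = 10, 90, 20000
random.seed(1)

def best_response(seen):
    """Pick the demand with the highest expected payoff against observed out-group play."""
    if not seen:
        return random.choice(DEMANDS)
    total = sum(seen.values())
    def expected(d):
        return d * sum(n for other, n in seen.items() if d + other <= 10) / total
    return max(DEMANDS, key=expected)

# Each agent remembers the out-group demands it has encountered.
agents = [{"group": "min" if i < N_MINORITY else "maj", "seen": Counter()}
          for i in range(N_MINORITY + N_MAJORITY)]

for _ in range(ROUNDS):
    a, b = random.sample(agents, 2)
    if a["group"] == b["group"]:
        continue  # track only cross-group bargaining in this toy version
    da, db = best_response(a["seen"]), best_response(b["seen"])
    a["seen"][db] += 1
    b["seen"][da] += 1

for g in ("min", "maj"):
    demands = [best_response(ag["seen"]) for ag in agents if ag["group"] == g]
    print(g, "mean cross-group demand:", sum(demands) / len(demands))
```

Because minority agents meet out-group partners far more often per capita, they accumulate observations and settle on a response faster, which is one mechanism by which such models generate minority disadvantage.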

Material Type: Reading

Authors: Cailin O'Connor, Hannah Rubin

Should I test more babies? Solutions for transparent data peeking

Research with infants is often slow and time-consuming, so infant researchers face great pressure to use the available participants in an efficient way. One strategy that researchers sometimes use to optimize efficiency is data peeking (or “optional stopping”), that is, doing a preliminary analysis (whether a formal significance test or informal eyeballing) of collected data. Data peeking helps researchers decide whether to abandon or tweak a study, decide that a sample is complete, or decide to continue adding data points. Unfortunately, data peeking can have negative consequences such as increased rates of false positives (wrongly concluding that an effect is present when it is not). We argue that, with simple corrections, the benefits of data peeking can be harnessed to use participants more efficiently. We review two corrections that can be transparently reported: one can be applied at the beginning of a study to lay out a plan for data peeking, and a second can be applied after data collection has already started. These corrections are easy to implement in the current framework of infancy research. The use of these corrections, together with transparent reporting, can increase the replicability of infant research.
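
As a rough illustration of the false-positive inflation the authors describe, the Python sketch below simulates repeated data peeking when there is no true effect and compares it against a simple Bonferroni-style per-peek threshold. The batch size, number of peeks, and this particular correction are assumptions of the sketch; the corrections proposed in the paper differ.

```python
import numpy as np
from scipy import stats

# Simulate a two-group comparison with no true effect, peeking after each batch.
rng = np.random.default_rng(0)
n_sims, batch, n_peeks, alpha = 2000, 8, 5, 0.05

def false_positive_rate(threshold):
    false_positives = 0
    for _ in range(n_sims):
        a = np.empty(0)
        b = np.empty(0)
        for _ in range(n_peeks):
            a = np.concatenate([a, rng.normal(size=batch)])
            b = np.concatenate([b, rng.normal(size=batch)])  # same distribution: null is true
            if stats.ttest_ind(a, b).pvalue < threshold:
                false_positives += 1
                break  # stop as soon as a peek looks "significant"
    return false_positives / n_sims

print("peek at alpha = .05:      ", false_positive_rate(alpha))            # typically well above .05
print("peek at alpha / n_peeks:  ", false_positive_rate(alpha / n_peeks))  # at or below .05
```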

Material Type: Reading

Authors: Esther Schott, Krista Byers-Heinlein, Mijke Rhemtulla

Building a collaborative Psychological Science: Lessons learned from ManyBabies 1

The field of infancy research faces a difficult challenge: some questions require samples that are simply too large for any one lab to recruit and test. ManyBabies aims to address this problem by forming large-scale collaborations on key theoretical questions in developmental science, while promoting the uptake of Open Science practices. Here, we look back on the first project completed under the ManyBabies umbrella – ManyBabies 1 – which tested the development of infant-directed speech preference. Our goal is to share the lessons learned over the course of the project and to articulate our vision for the role of large-scale collaborations in the field. First, we consider the decisions made in scaling up experimental research for a collaboration involving 100+ researchers and 70+ labs. Next, we discuss successes and challenges over the course of the project, including: protocol design and implementation, data analysis, organizational structures and collaborative workflows, securing funding, and encouraging broad participation in the project. Finally, we discuss the benefits we see both in ongoing ManyBabies projects and in future large-scale collaborations in general, with a particular eye towards developing best practices and increasing growth and diversity in infancy research and psychological science in general. Throughout the paper, we include first-hand narrative experiences, in order to illustrate the perspectives of researchers playing different roles within the project. While this project focused on the unique challenges of infant research, many of the insights we gained can be applied to large-scale collaborations across the broader field of psychology.

Material Type: Reading

Authors: Casey Lew-Williams, Catherine Davies, Christina Bergmann, Connor P. G. Waddell, Jessica E. Kosie, J. Kiley Hamlin, Jonathan F. Kominsky, Krista Byers-Heinlein, Leher Singh, Liquan Liu, Martin Zettersten, Meghan Mastroberardino, Melanie Soderstrom, Melissa Kline, Michael C. Frank

Open Science in Software Engineering

Open science describes the movement of making any research artefact available to the public and includes, but is not limited to, open access, open data, and open source. While open science is becoming generally accepted as a norm in other scientific disciplines, in software engineering we are still struggling to adapt open science to the particularities of our discipline, rendering progress in our scientific community cumbersome. In this chapter, we reflect upon the essentials of open science for software engineering, including what open science is, why we should engage in it, and how we should do it. We draw in particular on our experience as conference chairs implementing open science initiatives and as researchers actively engaging in open science to critically discuss challenges and pitfalls, and to address more advanced topics such as how and under which conditions to share preprints, what infrastructure and licence model to choose, and how to do all this within the limitations of different reviewing models, such as double-blind reviewing. Our hope is to help establish a common ground and to contribute to making open science a norm in software engineering as well.

Material Type: Reading

Authors: Daniel Graziotin, Daniel Mendez, Heidi Seibold, Stefan Wagner

The multiplicity of analysis strategies jeopardizes replicability: lessons learned across disciplines

For a given research question, there is usually a large variety of possible analysis strategies that are acceptable according to the scientific standards of the field, and there are concerns that this multiplicity of analysis strategies plays an important role in the non-replicability of research findings. Here, we define a general framework of common sources of uncertainty arising in computational analyses that lead to this multiplicity, and apply this framework in an overview of approaches proposed across disciplines to address the issue. Armed with this framework, and a set of recommendations derived from it, researchers will be able to recognize strategies applicable to their field and use them to generate findings that are more likely to be replicated in future studies, ultimately improving the credibility of the scientific process.
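
To make the multiplicity concrete, here is a small Python sketch, not taken from the paper's framework: it analyses one simulated dataset under an arbitrary grid of defensible choices (outlier exclusion rule, correlation test) and shows the resulting spread of p-values.

```python
import numpy as np
from scipy import stats

# One simulated dataset with a weak true association and a few outliers.
rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 0.1 * x + rng.normal(size=200)
y[rng.choice(200, 5, replace=False)] += 4   # a handful of outlying observations

results = []
for cutoff in (2.0, 2.5, 3.0, None):                                   # outlier exclusion rule (z-score)
    for test_name, test in (("pearson", stats.pearsonr), ("spearman", stats.spearmanr)):
        keep = np.ones_like(y, dtype=bool) if cutoff is None else \
               np.abs((y - y.mean()) / y.std()) < cutoff
        r, p = test(x[keep], y[keep])
        results.append((cutoff, test_name, p))

# Equally "acceptable" strategies can land on both sides of the .05 threshold.
for cutoff, test_name, p in results:
    print(f"cutoff={cutoff}, test={test_name}: p={p:.3f}")
```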

Material Type: Reading

Authors: Anne-Laure Boulesteix, Felix Schönbrodt, Ralf Elsas, Rory Wilson, Sabine Hoffman, Ulrich Strasser

OpenML: An R Package to Connect to the Machine Learning Platform OpenML

OpenML is an online machine learning platform where researchers can easily share data, machine learning tasks, and experiments, as well as organize them online to work and collaborate more efficiently. In this paper, we present an R package to interface with the OpenML platform and illustrate its usage in combination with the machine learning R package mlr (Bischl et al., 2016). We show how the OpenML package allows R users to easily search, download, and upload data sets and machine learning tasks. Furthermore, we show how to upload the results of experiments, share them with others, and download results from other users. Beyond ensuring reproducibility of results, the OpenML platform automates much of the drudge work, speeds up research, facilitates collaboration, and increases the users' visibility online.
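
The paper presents the R interface; for readers working in Python, the separate openml Python client exposes similar platform operations. The sketch below is a rough equivalent, and the dataset and task ids as well as the exact call signatures are assumptions to be checked against the client's current documentation.

```python
import openml
from sklearn.ensemble import RandomForestClassifier

# Download a dataset by id (assumption: id 61 is the "iris" dataset on openml.org).
dataset = openml.datasets.get_dataset(61)
X, y, categorical, names = dataset.get_data(target=dataset.default_target_attribute)

# Fetch a predefined machine learning task and run a model on it
# (assumption: task id 59 is a supervised classification task on the same data).
task = openml.tasks.get_task(59)
run = openml.runs.run_model_on_task(RandomForestClassifier(), task)

# run.publish()  # uploading results requires an OpenML account and API key
```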

Material Type: Reading

Authors: Benjamin Hofner, Bernd Bischl, Dominik Kirchhoff, Giuseppe Casalicchio, Heidi Seibold, Jakob Bossek, Joaquin Vanschoren, Michel Lang, Pascal Kerschke

The Methodologists: a Unique Category of Scientific Actors

This essay introduces a new analytical category of scientific actors: the methodologists. These actors are distinguished by their tendency to continue probing scientific objects that their peers consider to be settled. The methodologists are a useful category of actors for science and technology studies (STS) scholars to follow because they reveal contingencies and uncertainties in taken-for-granted science. Identifying methodologists is useful for STS analysts seeking a way into science in moments when it is no longer “in the making” or there is little active controversy. Studying methodologists is also useful for scholars seeking to understand the genesis of scientific controversies, particularly controversies about long-established methods, facts, or premises.

Material Type: Reading

Author: Nicole C. Nelson

Truth, Proof, and Reproducibility: There’s No Counter-Attack for the Codeless

Current concerns about reproducibility in many research communities can be traced back to a high value placed on empirical reproducibility of the physical details of scientific experiments and observations. For example, the detailed descriptions by 17th century scientist Robert Boyle of his vacuum pump experiments are often held to be the ideal of reproducibility as a cornerstone of scientific practice. Victoria Stodden has claimed that the computer is an analog for Boyle’s pump – another kind of scientific instrument that needs detailed descriptions of how it generates results. In the place of Boyle’s hand-written notes, we now expect code in open source programming languages to be available to enable others to reproduce and extend computational experiments. In this paper we show that there is another genealogy for reproducibility, starting at least from Euclid, in the production of proofs in mathematics. Proofs have a distinctive quality of being necessarily reproducible, and are the cornerstone of mathematical science. However, the task of the modern mathematical scientist has drifted from that of blackboard rhetorician, where the craft of proof reigned, to a scientific workflow that now more closely resembles that of an experimental scientist. So, what is proof in modern mathematics? And, if proof is unattainable in other fields, what is due scientific diligence in a computational experimental environment? How do we measure truth in the context of uncertainty? Adopting a manner of Lakatosian conversant conjecture between two mathematicians, we examine how proof informs our practice of computational statistical inquiry. We propose that a reorientation of mathematical science is necessary so that its reproducibility can be readily assessed.

Material Type: Reading

Authors: Ben Marwick, Charles T. Gray

code::proof: Prepare for most weather conditions

Computational tools for data analysis are being released daily on repositories such as the Comprehensive R Archive Network. How we integrate these tools to solve a problem in research is increasingly complex and requires frequent updates. In this manuscript we propose a toolchain walkthrough, an opinionated documentation of a scientific workflow. As a practical complement to our proof-based argument (Gray and Marwick, arXiv, 2019) for reproducible data analysis, here we focus on the practicality of setting up a research compendium with unit tests as a measure of code::proof, a reproducible research compendium that provides a measure of confidence in computational algorithms.
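
As a purely illustrative example of the kind of unit test the abstract alludes to, a minimal Python research compendium might pair an analysis function with tests like the ones below; the file, function, and test names are assumptions of this sketch, not material from the paper.

```python
# Hypothetical layout of a small research compendium:
#   analysis/effects.py   -- analysis code under test
#   tests/test_effects.py -- unit tests run with `pytest`

# analysis/effects.py
import statistics

def cohens_d(a, b):
    """Standardised mean difference using a pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5

# tests/test_effects.py
def test_identical_groups_have_zero_effect():
    assert cohens_d([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) == 0.0

def test_sign_follows_group_order():
    assert cohens_d([2.0, 3.0, 4.0], [1.0, 2.0, 3.0]) > 0
```

Tests like these are what give a compendium its "measure of confidence": anyone re-running the analysis can first confirm that the building blocks still behave as documented.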

Material Type: Reading

Author: Charles T. Gray

An empirical analysis of journal policy effectiveness for computational reproducibility

A key component of scientific communication is sufficient information for other researchers in the field to reproduce published findings. For computational and data-enabled research, this has often been interpreted to mean making available the raw data from which results were generated, the computer code that generated the findings, and any additional information needed such as workflows and input parameters. Many journals are revising author guidelines to include data and code availability. This work evaluates the effectiveness of journal policy that requires the data and code necessary for reproducibility be made available postpublication by the authors upon request. We assess the effectiveness of such a policy by (i) requesting data and code from authors and (ii) attempting replication of the published findings. We chose a random sample of 204 scientific papers published in the journal Science after the implementation of their policy in February 2011. We found that we were able to obtain artifacts from 44% of our sample and were able to reproduce the findings for 26%. We find this policy—author remission of data and code postpublication upon request—an improvement over no policy, but currently insufficient for reproducibility.

Material Type: Reading

Authors: Jennifer Seiler, Victoria Stodden, Zhaokun Ma

Praxis of Reproducible Computational Science

Among the top challenges of reproducible computational science are: (1) creation, curation, usage and publication of research software; (2) acceptance, adoption and standardization of open-science practices; (3) misalignment with academic incentive structures and institutional processes for career progression. I will address here mainly the first two, proposing a praxis of reproducible computational science.

Material Type: Reading

Author: Lorena A. Barba

Reproducible Workflow on a Public Cloud for Computational Fluid Dynamics

In a new effort to make our research transparent and reproducible by others, we developed a workflow to run and share computational studies on the public cloud Microsoft Azure. It uses Docker containers to create an image of the application software stack. We also adopt several tools that facilitate creating and managing virtual machines on compute nodes and submitting jobs to these nodes. The configuration files for these tools are part of an expanded “reproducibility package” that includes workflow definitions for cloud computing, in addition to input files and instructions. This facilitates re-creating the cloud environment to re-run the computations under the same conditions. Although cloud providers have improved their offerings, many researchers using high-performance computing (HPC) are still skeptical about cloud computing. Thus, we ran benchmarks for tightly coupled applications to confirm that the latest HPC nodes of Microsoft Azure are indeed a viable alternative to traditional on-site HPC clusters. We also show that cloud offerings are now adequate to complete computational fluid dynamics studies with in-house research software that uses parallel computing with GPUs. Finally, we share with the community what we have learned from nearly two years of using Azure cloud to enhance transparency and reproducibility in our computational simulations.

Material Type: Reading

Authors: Lorena A. Barba, Olivier Mesnard

cleanBib - measure gender bias in your citations

The goal of this coding notebook is to clean your .bib file so that it contains only references cited in your manuscript. The cleaned .bib is then used to generate a data table of full first names, which is used to query probabilistic gender (Gender API) and race (ethnicolr) classifiers. Proportions of the predicted gender of first- and last-author pairs (man/man, man/woman, woman/man, and woman/woman) are then calculated.
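
The notebook handles this itself, but as a rough sketch of its first step (filtering a .bib file down to the entries actually cited in a .tex manuscript), a naive Python version might look like the following. The file names and regular expressions are illustrative assumptions and cover only common \cite variants and simple .bib layouts.

```python
import re

# Naive sketch: keep only .bib entries whose keys appear in \cite commands.
tex = open("manuscript.tex", encoding="utf-8").read()
bib = open("references.bib", encoding="utf-8").read()

# Collect citation keys from \cite, \citep, \citet, etc.
cited = set()
for match in re.finditer(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}", tex):
    cited.update(key.strip() for key in match.group(1).split(","))

# Split the .bib at each entry header and keep entries with cited keys.
entries = re.split(r"(?=@\w+\{)", bib)
kept = [e for e in entries
        if (m := re.match(r"@\w+\{([^,]+),", e)) and m.group(1).strip() in cited]

with open("cleaned.bib", "w", encoding="utf-8") as out:
    out.write("".join(kept))
print(f"kept {len(kept)} of {sum(e.startswith('@') for e in entries)} entries")
```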

Material Type: Interactive

Authors: Ann Sizemore Blevins, Christopher Camp, Cleanthis Michael, Dale Zhou, Eli Cornblath, Erin Teich, Jeni Stiso, Jordan Dworkin, Kendra Oudyk, Max Bertolero, virtualmarioe

Descriptive Psychology and Völkerpsychologie—in the Contexts of Historicism, Relativism, and Naturalism

This special issue focuses on two important forms of psychology that emerged in late nineteenth-century German-speaking academia: Völkerpsychologie and descriptive psychology. The main representatives of these currents were Moritz Lazarus, Chaim H. Steinthal, and Wilhelm Dilthey. They had many followers, including Hermann Cohen, Gustav Glogau, Georg Simmel, Wilhelm Wundt, Karl Mannheim, Paul Natorp, Rudolf Carnap, Eduard Spranger, and Erich Rothacker.

Material Type: Reading

Authors: Christian Damböck, Martin Kusch, Uljana Feest

Why replication is overrated

Current debates about the replication crisis in psychology take it for granted that direct replication is valuable, largely focusing on its role in uncovering questionable statistical practices. This article takes a broader look at the notion of replication in psychological experiments. It is argued that all experimentation/replication involves individuation judgments and that research in experimental psychology frequently turns on probing the adequacy of such judgments. In this vein, I highlight the ubiquity of conceptual and material questions in research, arguing that replication has its place but is not as central to psychological research as it is sometimes taken to be.

Material Type: Reading

Author: Uljana Feest

Mapping the discursive dimensions of the reproducibility crisis: A mixed methods analysis

Addressing issues with the reproducibility of results is critical for scientific progress, but conflicting ideas about the sources of and solutions to irreproducibility are a barrier to change. Prior work has attempted to address this problem by creating analytical definitions of reproducibility. We take a novel empirical, mixed methods approach to understanding variation in reproducibility conversations, which yields a map of the discursive dimensions of these conversations. This analysis demonstrates that concerns about the incentive structure of science, the transparency of methods and data, and the need to reform academic publishing form the core of reproducibility discussions. We also identify three clusters of discussion that are distinct from the main group: one focused on reagents, another on statistical methods, and a final cluster focused on the heterogeneity of the natural world. Although there are discursive differences between scientific and popular articles, there are no strong differences in how scientists and journalists write about the reproducibility crisis. Our findings show that conversations about reproducibility have a clear underlying structure, despite the broad scope and scale of the crisis. Our map demonstrates the value of using qualitative methods to identify the bounds and features of reproducibility discourse, and identifies distinct vocabularies and constituencies that reformers should engage with to promote change.
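
For readers unfamiliar with this style of analysis, the Python sketch below is a generic illustration of mapping a discourse by clustering article texts. It is not the authors' pipeline (theirs is a mixed qualitative and quantitative design); the example documents and the number of clusters are assumptions of the sketch.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Tiny stand-in corpus of sentences about reproducibility (invented for illustration).
documents = [
    "incentive structures reward novelty over careful replication",
    "methods and data should be shared transparently",
    "publishing reform and preregistration could reduce bias",
    "antibody and reagent variability undermines repeat experiments",
    "flexible statistical analyses inflate false positive rates",
    "biological systems are heterogeneous across labs and samples",
]

# Represent each document as TF-IDF weights, then group similar documents.
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(documents)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

for doc, label in zip(documents, labels):
    print(label, doc)
```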

Material Type: Reading

Authors: Julie Chung, Kelsey Ichikawa, Momin Malik, Nicole C. Nelson

Improving the credibility of empirical legal research: practical suggestions for researchers, journals, and law schools

Fields closely related to empirical legal research are enhancing their methods to improve the credibility of their findings. This includes making data, analysis code, and other materials openly available, and preregistering studies. Empirical legal research appears to be lagging behind other fields. This may be due, in part, to a lack of meta-research and guidance on empirical legal studies. The authors seek to fill that gap by evaluating some indicators of credibility in empirical legal research, including a review of guidelines at legal journals. They then provide both general recommendations for researchers, and more specific recommendations aimed at three commonly used empirical legal methods: case law analysis, surveys, and qualitative studies. They end with suggestions for policies and incentive systems that may be implemented by journals and law schools.

Material Type: Reading

Authors: Alexander DeHaven, Alex Holcombe, Crystal N. Steltenpohl, David Mellor, Jason Chin, Justin Pickett, Kathryn Zeiler, Simine Vazire, Tobias Heycke

Funder Mandates and Trends in Open Science

Research funders are requiring or strongly encouraging open and reproducible methods at increased rates, leading researchers to rely on more data management tools while institutions continue to provide services to support them. Research support staff adapt quickly to guide their stakeholders and provide resources, while administrators must find methods to determine adoption and success across the community. In this webinar, COS Director of Policy David Mellor shares an update on funder expectations like preregistration, data sharing, and open access outputs, as well as strategies to highlight these practices in funding proposals. COS Director of Product Nici Pfeiffer also discusses OSF features that enable researchers to meet and exceed these expectations, as well as provide unique activity insights for administrators, and how COS continues to work with the funder and institution communities to facilitate transparent practices across the lifecycle.

Material Type: Primary Source

Streamline your research workflow using OSF and storage integrations

Open Science accelerates the discovery of cures and advances new knowledge by improving the rigor and transparency of research and the reusability of resulting data, materials, and code. But acceleration requires efficiency in managing the research lifecycle, with integrated tools that work together to reduce the burden on investigators to manage, collaborate, and share as they work. OSF provides the interface for research collaboration, with increased efficiency for the research team through integrated storage provider tools and citation managers. Get familiar with the OSF project interface, understand how it can accelerate collaboration, transparency, sharing, and reuse of research outputs, and take a deep dive into the integrations that connect directly to OSF for efficiency in researcher workflows. Visit osf.io and help.osf.io to learn more.

Material Type: Primary Source

Your Questions Answered: How to Retain Copyright While Others Distribute and Build Upon Your Work

In this webinar, a panel discusses licensing options and the fundamentals of choosing a license for your research, and answers questions about licensing scholarship. The panel is moderated by Joanna Schimizzi, Professional Learning Specialist at the Institute for the Study of Knowledge Management in Education, and includes panelists Brandon Butler, Director of Information Policy at the University of Virginia Library, and Becca Neel, Assistant Director for Resource Management & User Experience at the University of Southern Indiana. Accessible and further resources for this event are available on OSF: https://osf.io/s4wdf/

Material Type: Lesson