
Tools for Reproducible Research


Course summary: A minimal standard for data analysis and other scientific computations is that they be reproducible: the code and data are assembled in such a way that another group can re-create all of the results (e.g., the figures in a paper). The importance of such reproducibility is now widely recognized, but it is still not practiced as widely as it should be, in large part because many computational scientists (and particularly statisticians) have not fully adopted the required tools for reproducible research. In this course, we will discuss general principles for reproducible research but will focus primarily on the use of relevant tools (particularly make, git, and knitr), with the goal that students leave the course ready and willing to ensure that all aspects of their computational research (software, data analyses, papers, presentations, posters) are reproducible.
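The summary above hinges on one idea: every result should be derivable from the raw data by code alone, so rerunning the code re-creates the results exactly. A minimal Python sketch of that principle (the data, column names, and statistic are hypothetical stand-ins, not course material):

```python
# Illustrative sketch only (not course material): a fully scripted analysis.
# The data, column names, and statistic are hypothetical stand-ins.
import csv
import io
import statistics

# Stand-in for a raw data file that would be checked into the project.
RAW_DATA = """group,value
a,1.2
a,1.4
b,2.1
b,2.3
"""

def analyze(raw: str) -> dict:
    """Compute per-group means. Deterministic: same input, same output."""
    rows = csv.DictReader(io.StringIO(raw))
    groups: dict[str, list[float]] = {}
    for row in rows:
        groups.setdefault(row["group"], []).append(float(row["value"]))
    return {g: statistics.mean(vals) for g, vals in groups.items()}

results = analyze(RAW_DATA)
print(results)
```

Because the analysis is a pure function of the data, anyone with the code and the data can regenerate the results; the course's tools (make, git, knitr) enforce this discipline at project scale.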

Material Type: Full Course

Author: Karl Broman

Reproducible Research Methods


This is the website for the Autumn 2014 course “Reproducible Research Methods,” taught by Eric C. Anderson at NOAA’s Southwest Fisheries Science Center. The course meets on Tuesdays and Thursdays from 3:30 to 4:30 PM in Room 188 of the Fisheries Ecology Division and runs from October 7 to December 18. The goal of the course is for scientists, researchers, and students to learn to: write programs in the R language to manipulate and analyze data; integrate data analysis with report generation and article preparation using knitr; work fluently within the RStudio integrated development environment for R; and use the git version control software and GitHub to manage source code effectively, collaborate efficiently with other researchers, and package their research neatly.

Material Type: Full Course

Author: Eric C. Anderson

Reproducible Science Curriculum Lesson for Publication


Workshop goals
- Why are we teaching this, and why is it important: for future and current you, and for research as a whole
- Lack of reproducibility in research is a real problem

Materials and how we'll use them
- Workshop landing page, with links to the materials and the schedule
- Structure oriented along the Four Facets of Reproducibility: Documentation, Organization, Automation, Dissemination
- Materials will be available after the workshop

How this workshop is run
- This is a Carpentries workshop: a friendly learning environment with a Code of Conduct
- Active learning: work with the people next to you, and ask for help

Material Type: Module

Authors: Dave Clements, Hilmar Lapp, Karen Cranston

Reproducible Science Curriculum Lesson for Automation



Material Type: Module

Authors: François Michonneau, Kim Gilbert, Matt Pennell

Reproducible Science Curriculum Lesson for Version Control



Material Type: Module

Authors: Ciera Martinez, Hilmar Lapp, Karen Cranston

Reproducible Science Curriculum Lesson for Literate Programming



Material Type: Module

Authors: Ciera Martinez, Courtney Soderberg, Hilmar Lapp, Jennifer Bryan, Kristina Riemer, Naupaka Zimmerman

Reproducible Science Curriculum Lesson for Organization



Material Type: Module

Authors: Ciera Martinez, Courtney Soderberg, Hilmar Lapp, Jennifer Bryan, Kristina Riemer, Naupaka Zimmerman

Introduction materials for Reproducible Research Curriculum



Material Type: Module

Authors: Kristina Riemer, Mine Çetinkaya-Rundel, Pat Schloss, Paul Magwene

Reproducible Science Workshop



Material Type: Module

Author: Dan Leehr

Research project initialization and organization following reproducible research guidelines



Material Type: Module

Author: Hilmar Lapp

Statistics of DOOM


This page and the accompanying YouTube channel were created to help people learn statistics, with step-by-step instructions for SPSS, R, Excel, and other programs. Demonstrations cover power, data screening, analysis, write-up tips, effect sizes, and graphs. Help guides and course materials are also provided.

When I originally started posting my videos on YouTube, I never really thought people would be interested in them, minus a few overachieving students. I am glad that I've been able to help so many folks! I have taught many statistics courses; you can view full classes by using the Learn tab in the top right. I have also taught cognitive and language courses, some with coding (see the NLP and Language Modeling courses) and some without (see Other Courses). I hope this website provides structure to all my materials for you to use for yourself or your classroom. Each page has an example syllabus, video lectures laid out with that syllabus (if I have them!), and links to the appropriate materials. Any broken links can be reported by sending me an email (linked at the bottom).

Stats Tools was designed for learning statistics, which morphed into learning coding, open science, statistics, and more! Recommendations, comments, and other questions are welcome; the general suggestion is to post on the specific video or page your question concerns. I do my best to answer, but I also work a full-time job.

These resources wouldn't be possible without the help of many fantastic people over the years, including all the Help Desk TAs (Rachel E. Monroe, Marshall Beauchamp, Louis Oberdiear, Simone Donaldson, Kim Koch, Jessica Willis, Samantha Hunter, Flora Forbes, Tabatha Hopke), research colleagues (K.D. Valentine, John E. Scofield, Jeff Pavlacic), and more! Pages with specific content made by others are noted on that page.

Material Type: Lecture

Author: Erin M. Buchanan

Reproducible Research


Modern scientific research takes advantage of programs such as Python and R that are open source. As such, they can be modified and shared by the wider community. Additional programs and packages, such as IPython, Sweave, and Shiny, extend their functionality. These packages can be used not only to execute data analyses but also to present data and results consistently across platforms (e.g., blogs, websites, repositories, and traditional publishing venues). The goal of the course is to show how to implement analyses and share them, using IPython for Python and using Sweave and knitr for RStudio, to create documents that are shareable and analyses that are reproducible. The course outline is as follows:
1) Use of IPython notebooks to demonstrate and explain code, visualize data, and display analysis results
2) Applications of Python modules such as SymPy, NumPy, pandas, and SciPy
3) Use of Sweave to demonstrate and explain code, visualize data, display analysis results, and create documents and presentations
4) Integration and execution of IPython and R code and analyses using the IPython notebook
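The literate-programming workflow this course teaches (IPython notebooks, Sweave, knitr) weaves computed results directly into the report text. A minimal Python sketch of that idea, with hypothetical measurements:

```python
# Hedged sketch of the literate-programming idea behind IPython, Sweave,
# and knitr: computed values are interpolated into the report text by code,
# so the prose cannot drift out of sync with the analysis.
# The measurements below are hypothetical.
import statistics

measurements = [4.1, 3.9, 4.3, 4.0, 4.2]

mean = statistics.mean(measurements)
sd = statistics.stdev(measurements)

report = (
    f"We observed a mean of {mean:.2f} "
    f"(SD = {sd:.2f}, n = {len(measurements)})."
)
print(report)
```

If the data change, rerunning the document regenerates every number in the prose, which is the core reproducibility benefit of these tools.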

Material Type: Full Course

Author: Christopher Ahern

Open Source Tools: Train-the-Trainer Course


An ecosystem of free open source tools for improving the rigor and reproducibility of research is thriving. Information professionals at research institutions must stay informed about what tools are available and how they compare. Ideally, information professionals can also onboard researchers to kickstart adoption of these tools. However, developing quality curriculum to train researchers on new tools requires expertise in the tool itself, which leaves many researchers without training on tools that may benefit their research.

This course will train participants to run hands-on, quality modules designed to onboard researchers to four free open source tools. Participants will experience each module, practice the exercises, and explore the training material needed to run the module themselves. An instructor guide that includes the module outline, objectives, description, frequently asked questions, pre- and post-participant surveys, target audience, and instructions for running a successful module is provided for each tool taught. The course covers unique aspects of each tool:
- Binder: share your computational environment, code, and research notebooks.
- Renku: document and share your analysis pipelines.
- Open Science Framework: create a centralized, structured workspace for your research materials.
- KnitR: knit your R code with your analysis narrative in one executable research notebook and capture your dependencies.

Many participants already run short-duration training events at their institutions. This course is ideal for those who wish to improve the quality and variety of the training they already offer to researchers. Participants who do not currently run short-duration training events will benefit by learning an accessible and efficient way of getting started with these four modules.

Material Type: Full Course

Authors: April Clyburne-Sherin, Seth Ariel Green

The case for formal methodology in scientific reform


Current attempts at methodological reform in the sciences come in response to an overall lack of rigor in methodological and scientific practices in experimental sciences. However, some of these reform attempts suffer from the same mistakes and over-generalizations they purport to address. Considering the costs of allowing false claims to become canonized, we argue for more rigor and nuance in methodological reform. By way of example, we present a formal analysis of three common claims in the metascientific literature: (a) that reproducibility is the cornerstone of science; (b) that data must not be used twice in any analysis; and (c) that exploratory projects are characterized by poor statistical practice. We show that none of these three claims holds in general, and we explore when they do and do not hold.

Material Type: Primary Source

Authors: Berna Devezer, Danielle J. Navarro, Erkan Ozge Buzbas, Joachim Vandekerckhove

Openness and Reproducibility: Insights from a Model-Centric Approach


This paper investigates the conceptual relationship between openness and reproducibility using a model-centric approach, heavily informed by probability theory and statistics. We first clarify the concepts of reliability, auditability, replicability, and reproducibility, each of which denotes a potential scientific objective. Then we advance a conceptual analysis to delineate the relationship between open scientific practices and these objectives. Using the notion of an idealized experiment, we identify which components of an experiment need to be reported and which need to be repeated to achieve the relevant objective. The model-centric framework we propose aims to contribute precision and clarity to the discussions surrounding the so-called reproducibility crisis.

Material Type: Primary Source

Authors: Berna Devezer, Bert Baumgaertner, Erkan Ozge Buzbas, Luis G. Nardin

Scientific discovery in a model-centric framework: Reproducibility, innovation, and epistemic diversity


Consistent confirmations obtained independently of each other lend credibility to a scientific result. We refer to results satisfying this consistency as reproducible and assume that reproducibility is a desirable property of scientific discovery. Yet seemingly science also progresses despite irreproducible results, indicating that the relationship between reproducibility and other desirable properties of scientific discovery is not well understood. These properties include early discovery of truth, persistence on truth once it is discovered, and time spent on truth in a long-term scientific inquiry. We build a mathematical model of scientific discovery that presents a viable framework to study its desirable properties, including reproducibility. In this framework, we assume that scientists adopt a model-centric approach to discover the true model generating data in a stochastic process of scientific discovery. We analyze the properties of this process using Markov chain theory, Monte Carlo methods, and agent-based modeling. We show that the scientific process may not converge to truth even if scientific results are reproducible, and that irreproducible results do not necessarily imply untrue results. The proportion of different research strategies represented in the scientific population, scientists’ choice of methodology, the complexity of truth, and the strength of signal contribute to this counter-intuitive finding. Important insights include that innovative research speeds up the discovery of scientific truth by facilitating the exploration of model space, and that epistemic diversity optimizes across the desirable properties of scientific discovery.
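As a rough intuition for the kind of stochastic-process analysis the abstract describes, here is a toy two-state Markov chain (my own illustration, not the paper's model) in which the scientific process is either on or off the true model; the transition probabilities are arbitrary placeholders:

```python
# Toy illustration (mine, not the paper's model): a two-state Markov chain in
# which the scientific process is either "on" the true model or "off" it.
# Transition probabilities are arbitrary placeholders.
import random

P_LEAVE_TRUTH = 0.1  # prob. of abandoning the true model per step (hypothetical)
P_FIND_TRUTH = 0.3   # prob. of (re)discovering it per step (hypothetical)

def simulate(steps: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the long-run fraction of time spent on truth."""
    rng = random.Random(seed)
    on_truth = False
    time_on_truth = 0
    for _ in range(steps):
        if on_truth:
            on_truth = rng.random() >= P_LEAVE_TRUTH
        else:
            on_truth = rng.random() < P_FIND_TRUTH
        time_on_truth += on_truth
    return time_on_truth / steps

# Stationary probability of the "truth" state, from Markov chain theory:
stationary = P_FIND_TRUTH / (P_FIND_TRUTH + P_LEAVE_TRUTH)
estimate = simulate(100_000)
print(stationary, estimate)
```

Even in this toy version, the chain spends a fixed long-run fraction of time on the true model rather than converging to it permanently, echoing the paper's point that "time spent on truth" is a distinct property from discovery itself.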

Material Type: Primary Source

Authors: Berna Devezer, Bert Baumgaertner, Erkan Ozge Buzbas, Luis G. Nardin

Implementations are not specifications: Specification, replication and experimentation in computational cognitive modeling


Contemporary methods of computational cognitive modeling have recently been criticized by Addyman and French (2012) on the grounds that they have not kept up with developments in computer technology and human–computer interaction. They present a manifesto for change according to which, it is argued, modelers should devote more effort to making their models accessible, both to non-modelers (with an appropriate easy-to-use user interface) and modelers alike. We agree that models, like data, should be freely available according to the normal standards of science, but caution against confusing implementations with specifications. Models may embody theories, but they generally also include implementation assumptions. Cognitive modeling methodology needs to be sensitive to this. We argue that specification, replication and experimentation are methodological approaches that can address this issue.
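The distinction the authors draw between a specification and its implementation can be made concrete with a toy example (mine, not the paper's): the same theoretical claim implemented two ways, where an implementation assumption changes the model's behavior.

```python
# Toy illustration (mine, not the paper's example): one specification, two
# implementations. The "theory" specifies exponential decay of activation at
# rate K; the Euler step size dt is an implementation assumption that is not
# part of the specification, yet it changes the model's output.
import math

K, T = 0.5, 2.0  # decay rate and elapsed time (hypothetical values)

def decay_closed_form(a0: float) -> float:
    """The specification itself: a(T) = a0 * exp(-K * T)."""
    return a0 * math.exp(-K * T)

def decay_euler(a0: float, dt: float) -> float:
    """A discrete implementation; dt is an extra, untheorized assumption."""
    a = a0
    for _ in range(int(round(T / dt))):
        a -= K * a * dt  # forward-Euler update
    return a

exact = decay_closed_form(1.0)
coarse = decay_euler(1.0, dt=0.5)   # coarse steps: visibly diverges
fine = decay_euler(1.0, dt=0.001)   # fine steps: close to the specification
print(exact, coarse, fine)
```

Two researchers "replicating" the same theory with different step sizes would get different predictions, which is why the paper argues that replication must target specifications, not particular implementations.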

Material Type: Primary Source

Authors: Olivia Guest, Richard P. Cooper

Is preregistration worthwhile?


Proponents of preregistration argue that, among other benefits, it improves the diagnosticity of statistical tests. In the strong version of this argument, preregistration does this by solving statistical problems, such as family-wise error rates. In the weak version, it nudges people to think more deeply about their theories, methods, and analyses. We argue against both: the diagnosticity of statistical tests depends entirely on how well statistical models map onto underlying theories, and so improving statistical techniques does little to improve theories when the mapping is weak. There is also little reason to expect that preregistration will spontaneously help researchers to develop better theories (and, hence, better methods and analyses).

Material Type: Primary Source

Authors: Aba Szollosi, Chris Donkin, Danielle J. Navarro, David Kellen, Iris van Rooij, Richard Shiffrin, Trisha van Zandt

Paths in strange places: A comment on preregistration


This is an archived version of a blog post on preregistration. The first half of the post argues that there is not a strong justification for preregistration as a tool to solve problems with statistical inference (p-hacking); the second half argues that preregistration has a stronger justification as one tool (among many) that can aid scientists in documenting our projects.

Material Type: Primary Source

Author: Danielle J. Navarro

Preregistration in infant research - A primer


Preregistration, the act of specifying a research plan in advance, is becoming a central step in the way science is conducted. Preregistration for infant researchers might differ from preregistration in other fields because of the specific challenges of testing infants. Infants are a hard-to-reach population, usually yielding small sample sizes; they have a short attention span, which can limit the number of trials; and they can be excluded based on hard-to-predict complications (e.g., parental interference, fussiness). In addition, as effects themselves potentially change with age and population, it is hard to calculate an a priori effect size. At the same time, these very factors make preregistration a valuable tool for infant studies. A priori examination of the planned study, including the hypotheses, sample size, and resulting statistical power, increases the credibility of single studies and thus adds value to the field. It may also improve explicit decision-making, leading to better studies. We present an in-depth discussion of the issues uniquely relevant to infant researchers and ways to contend with them in preregistration and study planning. We provide recommendations to researchers interested in following current best practices.
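The a priori power calculation the abstract mentions can be sketched in a few lines of Python. This is a normal-approximation shortcut, not the authors' procedure, and the effect size and group sizes below are hypothetical placeholders:

```python
# Hedged sketch of the a priori power analysis a preregistration asks for:
# a normal approximation to the power of a two-sided, two-sample comparison.
# The effect size d and group sizes are hypothetical placeholders; a real
# study would use an exact (noncentral t) calculation or simulation.
import math
from statistics import NormalDist

def power_two_sample(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power to detect a standardized effect d, two-sided test."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    noncentrality = d * math.sqrt(n_per_group / 2.0)
    return 1.0 - NormalDist().cdf(z_crit - noncentrality)

# Infant samples are often small; power for d = 0.5 with 20 infants per group:
print(round(power_two_sample(0.5, 20), 3))
```

Running numbers like these before data collection makes the trade-off between feasible sample sizes and detectable effects explicit, which is exactly the credibility benefit the abstract attributes to preregistration.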

Material Type: Primary Source

Authors: Christina Bergmann, Naomi Havron, Sho Tsuji