In a new effort to make our research transparent and reproducible by others, we developed a workflow to run and share computational studies on the public cloud Microsoft Azure. It uses Docker containers to create an image of the application software stack. We also adopt several tools that facilitate creating and managing virtual machines on compute nodes and submitting jobs to these nodes. The configuration files for these tools are part of an expanded “reproducibility package” that includes workflow definitions for cloud computing, in addition to input files and instructions. This facilitates re-creating the cloud environment to re-run the computations under the same conditions. Although cloud providers have improved their offerings, many researchers using high-performance computing (HPC) are still skeptical about cloud computing. Thus, we ran benchmarks for tightly coupled applications to confirm that the latest HPC nodes of Microsoft Azure are indeed a viable alternative to traditional on-site HPC clusters. We also show that cloud offerings are now adequate to complete computational fluid dynamics studies with in-house research software that uses parallel computing with GPUs. Finally, we share with the community what we have learned from nearly two years of using Azure cloud to enhance transparency and reproducibility in our computational simulations.
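The abstract above does not include the configuration files themselves. As a rough, illustrative sketch of the container-based approach it describes, the snippet below builds a Docker image from a reproducibility package and re-runs a case inside it; the directory name, image tag, and solver command are assumptions for illustration, not taken from the study.

```python
"""Illustrative sketch (not the authors' actual workflow): rebuild the
containerized software stack from a reproducibility package and re-run a case.
Assumes Docker is installed and the package ships a Dockerfile; the paths,
image tag, and solver entry point below are hypothetical."""
import subprocess

PACKAGE_DIR = "repro-package"          # hypothetical: Dockerfile + inputs + job scripts
IMAGE_TAG = "mygroup/cfd-solver:1.0"   # hypothetical image name

# Build the image that captures the application software stack.
subprocess.run(["docker", "build", "-t", IMAGE_TAG, PACKAGE_DIR], check=True)

# Re-run one case under the same software environment, mounting the inputs read-only.
subprocess.run(
    [
        "docker", "run", "--rm",
        "-v", f"{PACKAGE_DIR}/inputs:/case:ro",        # input files from the package
        IMAGE_TAG,
        "run-solver", "--config", "/case/case1.yaml",  # hypothetical entry point
    ],
    check=True,
)
```

Because the image, inputs, and job scripts travel together in the package, anyone with a container runtime can re-create the environment and repeat the computation under the same conditions.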
Background There is wide agreement in the biomedical research community that research data sharing is a primary ingredient for ensuring that science is more transparent and reproducible. Publishers could play an important role in facilitating and enforcing data sharing; however, many journals have not yet implemented data sharing policies and the requirements vary widely across journals. This study set out to analyze the pervasiveness and quality of data sharing policies in the biomedical literature. Methods The online author’s instructions and editorial policies for 318 biomedical journals were manually reviewed to analyze each journal’s data sharing requirements and characteristics. The data sharing policies were ranked using a rubric to determine whether data sharing was required, recommended, required only for omics data, or not addressed at all. The data sharing method and licensing recommendations were examined, as well as any mention of reproducibility or similar concepts. The data were analyzed for patterns relating to publishing volume, Journal Impact Factor, and the publishing model (open access or subscription) of each journal. Results A total of 11.9% of the journals analyzed explicitly stated that data sharing was required as a condition of publication. A total of 9.1% of journals required data sharing but did not state that it would affect publication decisions. A total of 23.3% of journals had a statement encouraging authors to share their data but did not require it. A total of 9.1% of journals mentioned data sharing indirectly, and only 14.8% addressed protein, proteomic, and/or genomic data sharing. There was no mention of data sharing in 31.8% of journals. Impact Factors were significantly higher for journals with the strongest data sharing policies compared to all other data sharing criteria. Open access journals were not more likely to require data sharing than subscription journals. Discussion Our study confirmed earlier investigations that observed that only a minority of biomedical journals require data sharing and that there is a significant association between higher Impact Factors and a data sharing requirement. Moreover, while 65.7% of the journals in our study that required data sharing addressed the concept of reproducibility, as with earlier investigations we found that most data sharing policies did not provide specific guidance on the practices that ensure data is maximally available and reusable.
The objective of this study was to evaluate the nature and extent of reproducible and transparent research practices in neurology publications. Methods The NLM catalog was used to identify MEDLINE-indexed neurology journals. A PubMed search of these journals was conducted to retrieve publications over a 5-year period from 2014 to 2018. A random sample of publications was extracted. Two authors conducted data extraction in a blinded, duplicate fashion using a pilot-tested Google form. This form prompted data extractors to determine whether publications provided access to items such as study materials, raw data, analysis scripts, and protocols. In addition, we determined if the publication was included in a replication study or systematic review, was preregistered, had a conflict of interest declaration, specified funding sources, and was open access. Results Our search identified 223,932 publications meeting the inclusion criteria, from which 400 were randomly sampled. Only 389 articles were accessible, yielding 271 publications with empirical data for analysis. Our results indicate that 9.4% provided access to materials, 9.2% provided access to raw data, 0.7% provided access to the analysis scripts, 0.7% linked the protocol, and 3.7% were preregistered. A third of sampled publications lacked funding or conflict of interest statements. No publications from our sample were included in replication studies, but a fifth were cited in a systematic review or meta-analysis. Conclusions Currently, published neurology research does not consistently provide information needed for reproducibility. The implications of poor research reporting can both affect patient care and increase research waste. Collaborative intervention by authors, peer reviewers, journals, and funding sources is needed to mitigate this problem.
Currently, there is a growing interest in ensuring the transparency and reproducibility of the published scientific literature. According to a previous evaluation of 441 biomedical journal articles published in 2000–2014, the biomedical literature largely lacked transparency in important dimensions. Here, we surveyed a random sample of 149 biomedical articles published between 2015 and 2017 and determined the proportion reporting sources of public and/or private funding and conflicts of interest, sharing protocols and raw data, and undergoing rigorous independent replication and reproducibility checks. We also investigated what can be learned about reproducibility and transparency indicators from open access data provided on PubMed. The majority of the 149 studies disclosed some information regarding funding (103, 69.1% [95% confidence interval, 61.0% to 76.3%]) or conflicts of interest (97, 65.1% [56.8% to 72.6%]). Among the 104 articles with empirical data in which protocols or data sharing would be pertinent, 19 (18.3% [11.6% to 27.3%]) discussed publicly available data; only one (1.0% [0.1% to 6.0%]) included a link to a full study protocol. Among the 97 articles in which replication in studies with different data would be pertinent, there were five replication efforts (5.2% [1.9% to 12.2%]). Although clinical trial identification numbers and funding details were often provided on PubMed, only two of the articles without a full text article in PubMed Central that discussed publicly available data at the full text level also contained information related to data sharing on PubMed; none had a conflicts of interest statement on PubMed. Our evaluation suggests that although there have been improvements over the last few years in certain key indicators of reproducibility and transparency, opportunities exist to improve reproducible research practices across the biomedical literature and to make features related to reproducibility more readily visible in PubMed.
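Each bracketed interval above is a 95% binomial confidence interval on a proportion (e.g., 19 of 104 articles). The abstract does not state which interval method was used; as a hedged illustration of the arithmetic, the sketch below computes the Clopper-Pearson exact interval, which gives values close to, but not necessarily identical to, those reported.

```python
"""Sketch: 95% Clopper-Pearson (exact) confidence interval for a proportion,
e.g. 19 of 104 articles discussing publicly available data. The interval
method used in the original study is not specified, so results may differ
slightly from the reported values."""
from scipy.stats import beta

def exact_ci(successes: int, n: int, alpha: float = 0.05):
    # Clopper-Pearson bounds from the beta distribution quantiles
    lower = beta.ppf(alpha / 2, successes, n - successes + 1) if successes > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, successes + 1, n - successes) if successes < n else 1.0
    return lower, upper

low, high = exact_ci(19, 104)
print(f"19/104 = {19/104:.1%}, 95% CI [{low:.1%}, {high:.1%}]")
```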
The replicability of research findings has recently been disputed across multiple scientific disciplines. In constructive reaction, the research culture in psychology is facing fundamental changes, but investigations of research practices that led to these improvements have almost exclusively focused on academic researchers. By contrast, we investigated the statistical reporting quality and selected indicators of questionable research practices (QRPs) in psychology students' master's theses. In a total of 250 theses, we investigated utilization and magnitude of standardized effect sizes, along with statistical power, the consistency and completeness of reported results, and possible indications of p-hacking and further testing. Effect sizes were reported for 36% of focal tests (median r = 0.19), and only a single formal power analysis was reported for sample size determination (median observed power 1 − β = 0.67). Statcheck revealed inconsistent p-values in 18% of cases, while 2% led to decision errors. There were no clear indications of p-hacking or further testing. We discuss our findings in the light of promoting open science standards in teaching and student supervision.
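The "observed power" figure quoted above is a post-hoc power calculation based on the reported effect sizes. As a hedged illustration of that calculation (the theses' tests and sample sizes are not given here, so the sample size below is hypothetical), here is one common approximation for the power of a two-sided correlation test using the Fisher z transformation.

```python
"""Sketch: approximate post-hoc power of a two-sided test of H0: rho = 0,
via the Fisher z approximation. r = 0.19 is the median effect size quoted
above; the sample size n is hypothetical, since the theses' n values are
not reported here."""
import math
from scipy.stats import norm

def correlation_power(r: float, n: int, alpha: float = 0.05) -> float:
    z_r = math.atanh(r)                # Fisher z of the assumed correlation
    se = 1.0 / math.sqrt(n - 3)        # standard error of Fisher z
    z_crit = norm.ppf(1 - alpha / 2)   # two-sided critical value
    # Probability of landing in either rejection region when rho = r
    return norm.sf(z_crit - z_r / se) + norm.cdf(-z_crit - z_r / se)

print(f"power at r = 0.19, n = 100: {correlation_power(0.19, 100):.2f}")
```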
Workshop goals
- Why are we teaching this
- Why is this important
  - For future and current you
  - For research as a whole
- Lack of reproducibility in research is a real problem
Materials and how we'll use them
- Workshop landing page, with
  - links to the Materials
  - schedule
Structure oriented along the Four Facets of Reproducibility:
How this workshop is run
- This is a Carpentries Workshop - that means
  - friendly learning environment
  - Code of Conduct
  - active learning
  - work with the people next to you
  - ask for help
Confirmation through competent replication is a founding principle of modern science. However, biomedical researchers are rewarded for innovation, and not for confirmation, and confirmatory research is often stigmatized as unoriginal and as a consequence faces barriers to publication. As a result, the current biomedical literature is dominated by exploration, which to complicate matters further is often disguised as confirmation. Only recently have scientists and the public begun to realize that high-profile research results in biomedicine can often not be replicated. Consequently, confirmation has taken center stage in the quest to safeguard the robustness of research findings. Research that pushes the boundaries of, or challenges, what is currently known must necessarily result in a plethora of false positive results. Thus, since discovery, the driving force of scientific progress, is unavoidably linked to high false positive rates and cannot support confirmatory inference, dedicated confirmatory investigation is needed for pivotal results. In this chapter I will argue that the tension between the two modes of research, exploration and confirmation, can be resolved if we conceptually and practically separate them. I will discuss the idiosyncrasies of exploratory and confirmatory studies, with a focus on the specific features of their design, analysis, and interpretation.
This lesson is part of a Software Carpentry workshop and teaches novice programmers to write modular code and best practices for using R for data analysis. It is an introduction to R for non-programmers using the Gapminder data. R is commonly used in many scientific disciplines for statistical analysis and for its array of third-party packages. We find that many scientists who come to Software Carpentry workshops use R and want to learn more. The emphasis of these materials is to give attendees a strong foundation in the fundamentals of R, and to teach best practices for scientific computing: breaking down analyses into modular units, task automation, and encapsulation. Note that this workshop will focus on teaching the fundamentals of the programming language R, and will not teach statistical analysis. The lesson contains more material than can be taught in a day. The instructor notes page has some suggested lesson plans suitable for a one-day or half-day workshop. A variety of third-party packages are used throughout this workshop. These are not necessarily the best, nor are they comprehensive, but they are packages we find useful, and have been chosen primarily for their usability.
A Data Carpentry lesson that is part of the Social Sciences curriculum; it teaches how to analyse and visualise data used by social scientists. Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. The lessons below were designed for those interested in working with social sciences data in R. This is an introduction to R designed for participants with no programming experience. These lessons can be taught in a day (~6 hours). They start with some basic information about R syntax and the RStudio interface, and then move through how to import CSV files, the structure of data frames, how to deal with factors, how to add/remove rows and columns, how to calculate summary statistics from a data frame, and a brief introduction to plotting.
Efforts to Instill the Fundamental Principles of Rigorous Research
Rigorous experimental procedures and transparent reporting of research results are vital to the continued success of the biomedical enterprise at both the preclinical and the clinical levels; therefore, NINDS convened major stakeholders in October 2018 to discuss how best to encourage rigorous biomedical research practices. The attendees discussed potential improvements to current training resources meant to instill the principles of rigorous research in current and future scientists, ideal attributes of a potential new educational resource, and cultural factors needed to ensure the success of such training. Please see the event website for more information about this workshop, including video recordings of the discussion, or the recent publication summarizing the workshop.
Rigor Champions
As described in this publication, enthusiastic individuals ("champions") who want to drive improvements in rigorous research practices, transparent reporting, and comprehensive education may come from all career stages and sectors, including undergraduate students, graduate students, postdoctoral fellows, researchers, educators, institutional leaders, journal editors, scientific societies, private industry, and funders. We encourage champions to organize themselves into intra- and inter-institutional communities to effect change within and across scientific institutions. These communities can then share resources and best practices, propose changes to current training and research infrastructure, build new tools to support better research practices, and support rigorous research on a daily basis. If you are interested in learning more, you can join this grassroots online workspace or email us at RigorChampions@nih.gov.
Rigor Resources
In order to understand the current landscape of training in the principles of rigorous research, NINDS is gathering a list of public resources that are, or can be made, freely accessible to the scientific community and beyond. We hope that compiling these resources will help identify gaps in training and stimulate discussion about proposed improvements and the building of new resources that facilitate training in transparency and other rigorous research practices. Please peruse the resources compiled thus far below, and contact us at RigorChampions@nih.gov to let us know about other potential resources. NINDS does not endorse any of these resources and leaves it to the scientific community to judge their quality.
Resources Table
Categories of resources listed in the table include Books and Articles, Guidelines and Protocols, Organizations and Training Programs, Software and Other Digital Resources, and Videos and Courses.
The information provided on this website is designed to assist the extramural community in addressing rigor and transparency in NIH grant applications and progress reports. Scientific rigor and transparency in conducting biomedical research are key to the successful application of knowledge toward improving health outcomes.
Definition: Scientific rigor is the strict application of the scientific method to ensure unbiased and well-controlled experimental design, methodology, analysis, interpretation and reporting of results.
Goals: The NIH strives to exemplify and promote the highest level of scientific integrity, public accountability, and social responsibility in the conduct of science. Grant application instructions and the criteria by which reviewers are asked to evaluate the scientific merit of the application are intended to:
• ensure that NIH is funding the best and most rigorous science,
• highlight the need for applicants to describe details that may have been previously overlooked,
• highlight the need for reviewers to consider such details in their reviews through updated review language, and
• minimize additional burden.
An introduction to R using the Gapminder data. The goal of this lesson is to teach novice programmers to write modular code and adopt good practices for using R for data analysis. R provides a set of third-party packages that are commonly used in many scientific disciplines for statistical analysis. We find that many scientists who attend Software Carpentry workshops use R and want to learn more. Our materials are relevant because they give attendees a solid foundation in the fundamentals of R and teach best practices for scientific computing: breaking analyses into modules, task automation, and encapsulation. Note that this workshop focuses on the fundamentals of the R programming language and not on statistical analysis. A variety of third-party packages are used throughout this workshop; they are not necessarily the best, nor are all of their features explained, but they are packages we consider useful, and they have been chosen primarily for their ease of use.
As part of the Swiss Open Research Data Grants, the Swiss Reproducibility Network (SwissRN) organized two half-day workshops for researchers in all empirical disciplines and at all levels at SwissRN institutional members in Switzerland: one about preregistration and registered reports (presented by Evie Vergauwe and Caro Hautekiet) and one about data and research management (presented by Eva Furrer and Rachel Heyard). The two half-day workshops were held at four different locations: the University of Zurich and ETH Zurich (May 6th), the University of Bern (May 31st) and the University of Geneva (June 7th).
In the preregistration and registered report workshop, we covered questions such as: (1) why and how to preregister a study; (2) what the difference is between a study preregistration and a registered report; and (3) how to deal with potential obstacles regarding study preregistration. In the practical part, we discussed situations one can encounter when preregistering a study or submitting a registered report, and how to deal with these situations. Additionally, participants got the opportunity to preregister a simplified example study to get a first, hands-on experience with preregistration.
In the data management workshop, we covered questions such as: (1) how best to manage your data and research projects; (2) what the FAIR principles are; and (3) how good metadata and documentation can improve your research output. In the practical part, participants got a first taste of version control using GitLab.
An academic scientist’s professional success depends on publishing. Publishing norms emphasize novel, positive results. As such, disciplinary incentives encourage design, analysis, and reporting decisions that elicit positive results and ignore negative results. Prior reports demonstrate how these incentives inflate the rate of false effects in published science. When incentives favor novelty over replication, false results persist in the literature unchallenged, reducing efficiency in knowledge accumulation. Previous suggestions to address this problem are unlikely to be effective. For example, a journal of negative results publishes otherwise unpublishable reports. This enshrines the low status of the journal and its content. The persistence of false findings can be meliorated with strategies that make the fundamental but abstract accuracy motive—getting it right—competitive with the more tangible and concrete incentive—getting it published. This article develops strategies for improving scientific practices and knowledge accumulation that account for ordinary human motivations and biases.
Consistent confirmations obtained independently of each other lend credibility to a scientific result. We refer to results satisfying this consistency as reproducible and assume that reproducibility is a desirable property of scientific discovery. Yet seemingly science also progresses despite irreproducible results, indicating that the relationship between reproducibility and other desirable properties of scientific discovery is not well understood. These properties include early discovery of truth, persistence on truth once it is discovered, and time spent on truth in a long-term scientific inquiry. We build a mathematical model of scientific discovery that presents a viable framework to study its desirable properties including reproducibility. In this framework, we assume that scientists adopt a model-centric approach to discover the true model generating data in a stochastic process of scientific discovery. We analyze the properties of this process using Markov chain theory, Monte Carlo methods, and agent-based modeling. We show that the scientific process may not converge to truth even if scientific results are reproducible and that irreproducible results do not necessarily imply untrue results. The proportion of different research strategies represented in the scientific population, scientists’ choice of methodology, the complexity of truth, and the strength of signal contribute to this counter-intuitive finding. Important insights include that innovative research speeds up the discovery of scientific truth by facilitating the exploration of model space and that epistemic diversity optimizes across desirable properties of scientific discovery.
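The abstract describes, but does not specify, its stochastic model of discovery. Purely as an illustrative toy (not the authors' model), the sketch below simulates a population in which scientists either explore new candidate models or replicate the current consensus, and tracks how much time the field spends on the true model; every parameter and rule here is an assumption made for illustration.

```python
"""Toy agent-based sketch, NOT the authors' model: the field either explores
(proposes a random candidate model) or replicates (re-tests the current
consensus), and noisy tests can give false positives or false negatives.
We track the fraction of time the consensus equals the true model.
All parameters are made up for illustration."""
import random

random.seed(1)
N_MODELS, TRUE_MODEL = 20, 0
STEPS, P_INNOVATE = 5000, 0.3       # share of exploratory (innovative) actions
FALSE_POS, FALSE_NEG = 0.05, 0.2    # chance a test wrongly supports / rejects a model

consensus = random.randrange(N_MODELS)   # currently accepted model
time_on_truth = 0

for _ in range(STEPS):
    if random.random() < P_INNOVATE:
        candidate = random.randrange(N_MODELS)   # explore a new candidate model
    else:
        candidate = consensus                    # attempt a replication
    truly_supported = (candidate == TRUE_MODEL)
    if truly_supported:
        observed_support = random.random() > FALSE_NEG   # true effect, may be missed
    else:
        observed_support = random.random() < FALSE_POS   # no effect, may appear positive
    if observed_support:
        consensus = candidate        # a positive result shifts the consensus
    time_on_truth += (consensus == TRUE_MODEL)

print(f"fraction of steps spent on the true model: {time_on_truth / STEPS:.2f}")
```

Varying the exploration rate and error rates in such a toy shows, qualitatively, how reproducible-looking results and time spent on the truth can come apart, which is the kind of question the modeled framework above investigates formally.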
The open science movement is rapidly changing the scientific landscape. Because exact definitions are often lacking and reforms are constantly evolving, accessible guides to open science are needed. This paper provides an introduction to open science and related reforms in the form of an annotated reading list of seven peer-reviewed articles, following the format of Etz, Gronau, Dablander, Edelsbrunner, and Baribault (2018). Written for researchers and students – particularly in psychological science – it highlights and introduces seven topics: understanding open science; open access; open data, materials, and code; reproducible analyses; preregistration and registered reports; replication research; and teaching open science. For each topic, we provide a detailed summary of one particularly informative and actionable article and suggest several further resources. Supporting a broader understanding of open science issues, this overview should enable researchers to engage with, improve, and implement current open, transparent, reproducible, replicable, and cumulative scientific practices.
A crescendo of incidents has raised concerns about whether scientific practices in psychology may be suboptimal, sometimes leading to the publication, dissemination, and application of unreliable or misinterpreted findings. Psychology has been a leader in identifying possibly suboptimal practices and proposing reforms that might enhance the efficiency of the scientific process and the publication of robust evidence and interpretations. To help shape future efforts, this paper offers a model of the psychological and socio-structural forces and processes that may influence scientists’ practices. The model identifies which practices are targeted by interventions and reforms, and which practices remain unaddressed. The model also suggests directions for empirical research to assess how best to enhance the effectiveness of psychological inquiry.
Workshop overview for the Data Carpentry Social Sciences curriculum. Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. This workshop teaches data management and analysis for social science research including best practices for data organization in spreadsheets, reproducible data cleaning with OpenRefine, and data analysis and visualization in R. This curriculum is designed to be taught over two full days of instruction. Materials for teaching data analysis and visualization in Python and extraction of information from relational databases using SQL are in development. Interested in teaching these materials? We have an onboarding video and accompanying slides available to prepare Instructors to teach these lessons. After watching this video, please contact team@carpentries.org so that we can record your status as an onboarded Instructor. Instructors who have completed onboarding will be given priority status for teaching at centrally-organized Data Carpentry Social Sciences workshops.
Since 1998, Software Carpentry has been teaching researchers the computing skills they need to get more done in less time and with less pain. Our volunteer instructors have run hundreds of events for more than 34,000 researchers since 2012. All of our lesson materials are freely reusable under the Creative Commons - Attribution license.
No restrictions on your remixing, redistributing, or making derivative works. Give credit to the author, as required.
Your remixing, redistributing, or making derivative works comes with some restrictions, including how it is shared.
Your redistributing comes with some restrictions. Do not remix or make derivative works.
Most restrictive license type. Prohibits most uses, sharing, and any changes.
Copyrighted materials, available under Fair Use and the TEACH Act for US-based educators, or other custom arrangements. Go to the resource provider to see their individual restrictions.