Efforts to make research results open and reproducible are increasingly reflected by journal policies encouraging or mandating authors to provide data availability statements. As a consequence, data availability statements have seen strong uptake in the recent literature. Nevertheless, it is still unclear what proportion of these statements actually contain well-formed links to data, for example via a URL or permanent identifier, and whether providing them adds value. We consider 531,889 journal articles published by PLOS and BMC which are part of the PubMed Open Access collection, categorize their data availability statements according to their content, and analyze the citation advantage of different statement categories via regression. We find that, following mandated publisher policies, data availability statements have by now become common, yet statements containing a link to a repository are still only a fraction of the total. We also find that articles with these statements in particular can have up to 25.36% higher citation impact on average: an encouraging result for all publishers and authors who make the effort of sharing their data. All our data and code are made available so that our results can be reproduced and extended.
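As a rough illustration of the kind of analysis described above (a sketch only, not the authors' code: the column names, the statement categories, and the model specification are assumptions), the citation advantage of different statement categories could be estimated with a log-linear regression on citation counts:

```python
# Illustrative sketch: estimating the citation advantage of different
# data availability statement (DAS) categories. Column names and
# categories are hypothetical; the real analysis would also control
# for publication year, journal, and field.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical input: one row per article.
df = pd.DataFrame({
    "citations":    [12, 3, 25, 7, 40, 5],
    "das_category": ["none", "on_request", "repository_link",
                     "in_paper", "repository_link", "none"],
})

# Log-transform citations so coefficients read (approximately) as
# relative differences from the baseline category.
df["log_citations"] = np.log1p(df["citations"])

model = smf.ols(
    "log_citations ~ C(das_category, Treatment('none'))",
    data=df,
).fit()
print(model.summary())
# exp(coef) - 1 for the repository_link term approximates the relative
# citation advantage of articles whose DAS links to a repository.
```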
Data Carpentry for Biologists is a set of teaching materials designed to teach biologists how to work with data through programming, database management, and computing more generally.
This repository contains the complete teaching materials (excluding exams and answers to assignments) and website for a university-style and self-guided course teaching computational data skills to biologists. The course is designed to work primarily as a flipped classroom, with students reading material and viewing videos before coming to class and then spending the bulk of class time working on exercises while the teacher answers questions and demos the concepts.
More information can be found on the project's GitHub page: https://github.com/datacarpentry/semester-biology/tree/v4.1.0
The widespread use of ‘statistical significance’ as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process (according to the American Statistical Association). We review why degrading p-values into ‘significant’ and ‘nonsignificant’ contributes to making studies irreproducible, or to making them seem irreproducible. A major problem is that we tend to take small p-values at face value but mistrust results with larger p-values. In either case, p-values tell little about the reliability of research, because they are hardly replicable even if an alternative hypothesis is true. Significance (p ≤ 0.05) is also hardly replicable: at a good statistical power of 80%, two studies of a true effect will be ‘conflicting’, meaning that one is significant and the other is not, in one third of cases. A replication therefore cannot be interpreted as having failed only because it is nonsignificant. Many apparent replication failures may thus reflect faulty judgment based on significance thresholds rather than a crisis of unreplicable research. Reliable conclusions on the replicability and practical importance of a finding can only be drawn using cumulative evidence from multiple independent studies. However, applying significance thresholds makes cumulative knowledge unreliable. One reason is that, with anything but ideal statistical power, significant effect sizes will be biased upwards. Interpreting inflated significant results while ignoring nonsignificant results thus leads to wrong conclusions. Yet current incentives to hunt for significance lead to selective reporting and to publication bias against nonsignificant findings. Data dredging, p-hacking, and publication bias should be addressed by removing fixed significance thresholds. Consistent with the recommendations of the late Ronald Fisher, p-values should be interpreted as graded measures of the strength of evidence against the null hypothesis. Larger p-values also offer some evidence against the null hypothesis; they cannot be interpreted as supporting it, and reading them that way falsely concludes that ‘there is no effect’. Information on the possible true effect sizes that are compatible with the data must be obtained from the point estimate, e.g., a sample average, and from the interval estimate, such as a confidence interval. We review how confusion about the interpretation of larger p-values can be traced back to historical disputes among the founders of modern statistics. We further discuss potential arguments against removing significance thresholds, for example that decision rules should instead be made more stringent, that sample sizes could decrease, or that p-values should be abandoned entirely. We conclude that whatever method of statistical inference we use, dichotomous threshold thinking must give way to non-automated informed judgment.
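The one-third figure follows from simple probability: if each of two independent studies detects a true effect with probability 0.8, exactly one of them is significant with probability 2 × 0.8 × 0.2 = 0.32. A minimal simulation (ours, not the paper's; the effect size and sample size below are chosen only to give roughly 80% power) illustrates this:

```python
# Simulate pairs of two-sample t-tests on the same true effect and count
# how often exactly one of the pair is 'significant' at alpha = 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
d, n, alpha, reps = 0.5, 64, 0.05, 20_000  # d=0.5, n=64/group ~ 80% power

def one_study():
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(d, 1.0, n)
    return stats.ttest_ind(a, b).pvalue < alpha

results = np.array([(one_study(), one_study()) for _ in range(reps)])
power = results.mean()
conflicting = (results[:, 0] != results[:, 1]).mean()
print(f"simulated power: {power:.2f}")          # ~0.80
print(f"conflicting pairs: {conflicting:.2f}")  # ~2 * 0.8 * 0.2 = 0.32
```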
Growth of the open science movement has drawn significant attention to data sharing and availability across the scientific community. In this study, we tested the ability to recover data collected under a particular funder-imposed requirement of public availability. We assessed overall data recovery success, tested whether characteristics of the data or the data creator were indicators of recovery success, and identified hurdles to data recovery. Overall, the majority of data were not recovered (26% recovery of 315 data projects), a result similar to journal-driven efforts to recover data. Field of research was the most important indicator of recovery success, but neither home agency sector nor age of data was a determinant of recovery. While we did not find a relationship between data recovery and the age of the data, age did predict whether we could find contact information for the grantee. The main hurdles to data recovery were associated with communication with the researcher: loss of contact with the data creator accounted for half (50%) of unrecoverable datasets, and unavailability of contact information accounted for 35% of unrecoverable datasets. Overall, our results suggest that funding agencies and journals face similar challenges in enforcing data requirements. We advocate that funding agencies could improve the availability of the data they fund by dedicating more resources to enforcing compliance with data requirements, providing data-sharing tools and technical support to awardees, and administering stricter consequences for those who ignore data-sharing preconditions.
DMP Bingo was developed as a hands-on activity for an introductory-level data management workshop for graduate students, faculty, and staff. The activity was designed to include a wide variety of participants at different career stages and with different levels of data and grant-proposal experience. It is usually preceded by a slideshow/discussion covering the basics of data management planning and the purpose of a Data Management Plan (DMP), and followed by a short discussion.
From January 2014, Psychological Science introduced new submission guidelines that encouraged the use of effect sizes, estimation, and meta-analysis (the “new statistics”), required extra detail of methods, and offered badges for the use of open science practices. We investigated the use of these practices in empirical articles published by Psychological Science and, for comparison, by the Journal of Experimental Psychology: General, during the period January 2013 to December 2015. The use of null hypothesis significance testing (NHST) was extremely high at all times and in both journals. In Psychological Science, the use of confidence intervals increased markedly overall, from 28% of articles in 2013 to 70% in 2015, as did the availability of open data (3% to 39%) and open materials (7% to 31%). The comparison journal showed smaller or much smaller changes. Our findings suggest that journal-specific submission guidelines may encourage desirable changes in authors’ practices.
Poor research design and data analysis encourage false-positive findings. Such poor methods persist despite perennial calls for improvement, suggesting that they result from something more than just misunderstanding. The persistence of poor methods results partly from incentives that favour them, leading to the natural selection of bad science. This dynamic requires no conscious strategizing—no deliberate cheating nor loafing—by scientists, only that publication is a principal factor for career advancement. Some normative methods of analysis have almost certainly been selected to further publication instead of discovery. In order to improve the culture of science, a shift must be made away from correcting misunderstandings and towards rewarding understanding. We support this argument with empirical evidence and computational modelling. We first present a 60-year meta-analysis of statistical power in the behavioural sciences and show that power has not improved despite repeated demonstrations of the necessity of increasing power. To demonstrate the logical consequences of structural incentives, we then present a dynamic model of scientific communities in which competing laboratories investigate novel or previously published hypotheses using culturally transmitted research methods. As in the real world, successful labs produce more ‘progeny,’ such that their methods are more often copied and their students are more likely to start labs of their own. Selection for high output leads to poorer methods and increasingly high false discovery rates. We additionally show that replication slows but does not stop the process of methodological deterioration. Improving the quality of research requires change at the institutional level.
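The link the authors draw between declining power and rising false discovery rates can be illustrated with a standard back-of-envelope calculation (the numbers below, in particular the assumed base rate of true hypotheses, are illustrative and not taken from the paper):

```python
# Back-of-envelope illustration: how the false discovery rate among
# 'significant' findings grows as statistical power falls, for a fixed
# alpha and a fixed (assumed) base rate of true hypotheses.
alpha = 0.05          # false-positive rate per test
base_rate = 0.10      # assumed share of tested hypotheses that are true

for power in (0.8, 0.5, 0.2, 0.1):
    false_pos = alpha * (1 - base_rate)
    true_pos = power * base_rate
    fdr = false_pos / (false_pos + true_pos)
    print(f"power={power:.1f}  ->  FDR={fdr:.2f}")
# power=0.8  ->  FDR=0.36
# power=0.2  ->  FDR=0.69   (values depend on the assumed base rate)
```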
It is widely believed that research that builds upon previously published findings has reproduced the original work. However, it is rare for researchers to perform or publish direct replications of existing results. The Reproducibility Project: Cancer Biology is an open investigation of reproducibility in preclinical cancer biology research. We have identified 50 high-impact cancer biology articles published in the period 2010-2012, and plan to replicate a subset of experimental results from each article. A Registered Report detailing the proposed experimental designs and protocols for each subset of experiments will be peer reviewed and published prior to data collection. The results of these experiments will then be published in a Replication Study. The resulting open methodology and dataset will provide evidence about the reproducibility of high-impact results, and an opportunity to identify predictors of reproducibility.
This study estimates the effect of data sharing on the citations of academic articles, using journal policies as a natural experiment. We begin by examining 17 high-impact journals that have adopted the requirement that data from published articles be publicly posted. We match these 17 journals to 13 journals without policy changes and find that empirical articles published just before the change in editorial policy have citation rates with no statistically significant difference from those published shortly after the shift. We then ask whether this null result stems from poor compliance with data sharing policies, and use the data sharing policy changes as instrumental variables to examine more closely two leading journals in economics and political science with relatively strong enforcement of new data policies. We find that articles that make their data available receive 97 additional citations (estimated standard error of 34). We conclude that: a) authors who share data may eventually be rewarded with additional scholarly citations, and b) data-posting policies alone do not increase the impact of articles published in a journal unless those policies are enforced.
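For readers unfamiliar with the instrumental-variables step, the sketch below shows a manual two-stage least squares estimate in the spirit of the design described above; the dataset and column names are hypothetical and this is not the authors' code:

```python
# Two-stage least squares sketch: use a journal's adoption of a
# data-sharing policy as an instrument for whether an article actually
# shares its data, then estimate the effect of sharing on citations.
import pandas as pd
import statsmodels.api as sm

# Hypothetical article-level data.
df = pd.DataFrame({
    "citations":   [10, 22, 5, 31, 14, 40, 8, 27],
    "data_shared": [0, 1, 0, 1, 0, 1, 0, 1],
    "post_policy": [0, 1, 0, 1, 1, 1, 0, 1],
})

# First stage: predict data sharing from the policy instrument.
first = sm.OLS(df["data_shared"], sm.add_constant(df["post_policy"])).fit()
df["data_shared_hat"] = first.fittedvalues

# Second stage: regress citations on the predicted sharing propensity.
second = sm.OLS(df["citations"], sm.add_constant(df["data_shared_hat"])).fit()
print(second.params)
# Note: manual 2SLS yields the right point estimate, but the second-stage
# OLS standard errors are not valid; a dedicated IV estimator would be
# used in practice.
```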
No restrictions on your remixing, redistributing, or making derivative works. Give credit to the author, as required.
Your remixing, redistributing, or making derivative works comes with some restrictions, including how it is shared.
Your redistributing comes with some restrictions. Do not remix or make derivative works.
Most restrictive license type. Prohibits most uses, sharing, and any changes.
Copyrighted materials, available under Fair Use and the TEACH Act for US-based educators, or other custom arrangements. Go to the resource provider to see their individual restrictions.