Reproducibility is unquestionably at the heart of science. Scientists face numerous challenges in this context, not least the lack of concepts, tools, and workflows for reproducible research in today's curricula. This short course introduces established and powerful tools that enable reproducibility of computational geoscientific research, statistical analyses, and visualisation of results using R (http://www.r-project.org/) in two lessons:

1. Reproducible Research with R Markdown

Open Data, Open Source, Open Reviews, and Open Science are important aspects of science today. In the first lesson, basic motivations and concepts for reproducible research touching on these topics are briefly introduced. During a hands-on session, the course participants write R Markdown (http://rmarkdown.rstudio.com/) documents, which include text and code and can be compiled to static documents (e.g. HTML, PDF). R Markdown is equally well suited for day-to-day digital notebooks as it is for scientific publications when using publisher templates.

2. GitLab and Docker

In the second lesson, the R Markdown files are published and enriched on an online collaboration platform. Participants learn how to save and version documents using GitLab (http://gitlab.com/) and compile them using Docker containers (https://docker.com/). These containers capture the full computational environment and can be transported, executed, examined, shared, and archived. Furthermore, GitLab's collaboration features are explored as an environment for Open Science.

Prerequisites: Participants should install required software (R, RStudio, a current browser) and register on GitLab (https://gitlab.com) before the course. This short course is especially relevant for early career scientists (ECS). Participants are welcome to bring their own data and R scripts to work with during the course. All material by the conveners will be shared publicly via OSF (https://osf.io/qd9nf/).
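As a point of reference, a minimal R Markdown document of the kind written in the first lesson might look as follows. The file name and the tiny analysis are hypothetical; the YAML header and the chunk syntax are standard R Markdown:

    ---
    title: "A reproducible analysis"
    output: html_document
    ---

    The mean sepal length in the built-in iris data set is:

    ```{r}
    mean(iris$Sepal.Length)
    ```

Saved as, say, analysis.Rmd, this file can be compiled from R with rmarkdown::render("analysis.Rmd"). The Rocker project publishes Docker images such as rocker/verse that bundle R and the rmarkdown package, so the same compilation can be reproduced inside a container.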
This course packet seeks to develop the upper-level engineering student's sense of audience and purpose in a research-based context with workplace constraints. It requires the student to choose a technical topic of interest and research it to solve a specific problem or to meet a typical industry need by way of several assignments: Unsolicited Research Proposal, Progress Report, Visual Aids, and Oral Presentation, all of which lead to the Formal Report. This approach readies students to write informatively and persuasively in the engineering workplace, and the packet provides excellent examples of each assignment, contributed by former students whose Formal Reports won first place in the annual Technical Writing Competition. Because users can rely on demonstrably excellent student examples, rather than on disparate textbook examples, to understand the concepts behind assignments that build on one another, they tend to write better and to be more confident producing documents and giving presentations. In short, they recognize they are among their own in a class that challenges many engineering students. Moreover, since all the Formal Reports have won awards, convincing students they are using good models with which to create their own documents is relatively easy. Finally, according to former students, mining excellent student documents makes certain skill sets clearer. For instance, students can follow along as the writer identifies and proves that a problem or need exists, creates the research objectives that lead to the method for addressing the issue, and develops persuasive strategies for convincing both executive and engineering readers. Similarly, these student papers demonstrate how to discern among results, conclusions, and recommendations, and show correct use of sources and visuals.
This collection uses primary sources to explore the Yellow Fever Epidemic of 1878. Digital Public Library of America Primary Source Sets are designed to help students develop their critical thinking skills and draw on diverse material from libraries, archives, and museums across the United States. Each set includes an overview, ten to fifteen primary sources, links to related resources, and a teaching guide. These sets were created and reviewed by the teachers on the DPLA's Education Advisory Committee.
The widespread use of ‘statistical significance’ as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process (according to the American Statistical Association). We review why degrading p-values into ‘significant’ and ‘nonsignificant’ contributes to making studies irreproducible, or to making them seem irreproducible. A major problem is that we tend to take small p-values at face value but to mistrust results with larger p-values. In either case, p-values tell little about the reliability of research, because they are hardly replicable even if an alternative hypothesis is true. Significance (p ≤ 0.05) is itself hardly replicable: at a good statistical power of 80%, two studies will be ‘conflicting’, meaning that one is significant and the other is not, in one third of cases if there is a true effect. A replication therefore cannot be interpreted as having failed only because it is nonsignificant. Many apparent replication failures may thus reflect faulty judgment based on significance thresholds rather than a crisis of unreplicable research. Reliable conclusions on the replicability and practical importance of a finding can only be drawn using cumulative evidence from multiple independent studies. However, applying significance thresholds makes cumulative knowledge unreliable. One reason is that with anything but ideal statistical power, significant effect sizes will be biased upwards. Interpreting inflated significant results while ignoring nonsignificant results thus leads to wrong conclusions. Yet current incentives to hunt for significance lead to selective reporting and to publication bias against nonsignificant findings. Data dredging, p-hacking, and publication bias should be addressed by removing fixed significance thresholds. Consistent with the recommendations of the late Ronald Fisher, p-values should be interpreted as graded measures of the strength of evidence against the null hypothesis. Even larger p-values offer some evidence against the null hypothesis; they cannot be interpreted as supporting it, and concluding from them that ‘there is no effect’ is false. Information on the possible true effect sizes that are compatible with the data must be obtained from the point estimate, e.g. a sample average, and from the interval estimate, such as a confidence interval. We review how confusion about the interpretation of larger p-values can be traced back to historical disputes among the founders of modern statistics. We further discuss potential arguments against removing significance thresholds, for example that decision rules should instead be made more stringent, that sample sizes could decrease, or that p-values should be abandoned completely. We conclude that whatever method of statistical inference we use, dichotomous threshold thinking must give way to non-automated informed judgment.
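The ‘one third conflicting’ figure follows directly from the definition of power and is easy to verify. A short R illustration; the effect size d = 0.8 and the group size n = 26 are hypothetical choices that yield roughly 80% power for a two-sample t-test:

    # With 80% power, the probability that exactly one of two independent
    # studies is significant, given a true effect, is 2 * 0.8 * 0.2 = 0.32.
    2 * 0.8 * (1 - 0.8)

    # Simulation: two-sample t-tests, n = 26 per group, true effect d = 0.8
    # (roughly 80% power). Count how often two studies 'conflict'.
    set.seed(1)
    p_value <- function() t.test(rnorm(26, mean = 0.8), rnorm(26))$p.value
    sig1 <- replicate(10000, p_value()) <= 0.05
    sig2 <- replicate(10000, p_value()) <= 0.05
    mean(sig1 != sig2)  # about 0.32: one study significant, the other not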
This resource is a video abstract of a research paper, created by Research Square on behalf of its authors. It provides a synopsis that is easy to understand and can be used to introduce the topics it covers to students, researchers, and the general public. The video's transcript is also provided in full; a portion is reproduced below as a preview:
"The gut microbiota is a diverse ecosystem. While bacteria are present in the greatest numbers, other microorganisms such as fungi and protists are also present, influencing many physiological functions. Analyses of the gut microbiome in livestock species have increased recently with improvements in technology and decreased cost. However, little is known about host genetic control over gut microbial communities. A recent study examined this relationship using healthy Duroc pigs. Using genome-wide association studies, researchers identified a gene regulatory network comprising 3,561 genes and 738,913 connections. Within this complex and polygenic network, five main regulators stood out. The proteins were associated with immune cell development, cell signaling in immune cells, and the vaccine response and a large number of predicted targets were genes associated with microbiota in pigs, mice, and humans. Host genetic variants associated with microbial functions were also identified..."
The rest of the transcript, along with a link to the research paper, is available on the resource itself.
Poor research design and data analysis encourage false-positive findings. Such poor methods persist despite perennial calls for improvement, suggesting that they result from something more than just misunderstanding. The persistence of poor methods results partly from incentives that favour them, leading to the natural selection of bad science. This dynamic requires no conscious strategizing by scientists (no deliberate cheating or loafing), only that publication is a principal factor for career advancement. Some normative methods of analysis have almost certainly been selected to further publication instead of discovery. To improve the culture of science, a shift must be made away from correcting misunderstandings and towards rewarding understanding. We support this argument with empirical evidence and computational modelling. We first present a 60-year meta-analysis of statistical power in the behavioural sciences and show that power has not improved despite repeated demonstrations of the necessity of increasing power. To demonstrate the logical consequences of structural incentives, we then present a dynamic model of scientific communities in which competing laboratories investigate novel or previously published hypotheses using culturally transmitted research methods. As in the real world, successful labs produce more ‘progeny’, such that their methods are more often copied and their students are more likely to start labs of their own. Selection for high output leads to poorer methods and increasingly high false discovery rates. We additionally show that replication slows but does not stop the process of methodological deterioration. Improving the quality of research requires change at the institutional level.
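A toy simulation conveys the selection dynamic. This is a deliberately simplified sketch with hypothetical parameters, not the authors' published model: labs that invest less effort in rigour produce more papers, and the most productive lab is preferentially copied.

    # Toy sketch (hypothetical parameters, not the published model):
    # lower methodological effort means more papers per generation.
    set.seed(42)
    n_labs <- 100
    effort <- runif(n_labs, min = 0.1, max = 1)   # rigour of each lab
    for (generation in 1:500) {
      papers <- rpois(n_labs, lambda = 10 * (1.5 - effort))   # output
      # The most productive lab is copied, with a small 'mutation',
      # replacing the least productive lab.
      copied <- effort[which.max(papers)] + rnorm(1, sd = 0.02)
      effort[which.min(papers)] <- min(max(copied, 0.1), 1)
    }
    mean(effort)   # drifts towards the minimum: low effort is selected for

In this sketch nothing penalizes the false positives that low-effort labs produce; adding a replication-based penalty would, as the paper argues, slow but not stop the decline.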