
Analysis

Deriving meaning and knowledge from data. Software, code, licensing, maintenance, statistics, methods, code sharing, documentation, and more.
 

110 affiliated resources

The Unix Shell
Unrestricted Use
CC BY

Software Carpentry lesson on how to use the shell to navigate the filesystem and write simple loops and scripts. The Unix shell has been around longer than most of its users have been alive. It has survived so long because it’s a power tool that allows people to do complex things with just a few keystrokes. More importantly, it helps them combine existing programs in new ways and automate repetitive tasks so they aren’t typing the same things over and over again. Use of the shell is fundamental to using a wide range of other powerful tools and computing resources (including “high-performance computing” supercomputers). These lessons will start you on a path towards using these resources effectively.

Subject:
Applied Science
Computer Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Adam Huffman
Adam James Orr
Adam Richie-Halford
AidaMirsalehi
Alex Kassil
Alex Mac
Alexander Konovalov
Alexander Morley
Alix Keener
Amy Brown
Andrea Bedini
Andrew Boughton
Andrew Reid
Andrew T. T. McRae
Andrew Walker
Ariel Rokem
Armin Sobhani
Ashwin Srinath
Bagus Tris Atmaja
Bartosz Telenczuk
Ben Bolker
Benjamin Gabriel
Bertie Seyffert
Bill Mills
Brian Ballsun-Stanton
BrianBill
Camille Marini
Chris Mentzel
Christina Koch
Colin Morris
Colin Sauze
Damien Irving
Dan Jones
Dana Brunson
Daniel Baird
Daniel McCloy
Daniel Standage
Danielle M. Nielsen
Dave Bridges
David Eyers
David McKain
David Vollmer
Dean Attali
Devinsuit
Dmytro Lituiev
Donny Winston
Doug Latornell
Dustin Lang
Elena Denisenko
Emily Dolson
Emily Jane McTavish
Eric Jankowski
Erin Alison Becker
Ethan P White
Evgenij Belikov
Farah Shamma
Fatma Deniz
Filipe Fernandes
Francis Gacenga
François Michonneau
Gabriel A. Devenyi
Gerard Capes
Giuseppe Profiti
Greg Wilson
Halle Burns
Hannah Burkhardt
Harriet Alexander
Hugues Fontenelle
Ian van der Linde
Inigo Aldazabal Mensa
Jackie Milhans
Jake Cowper Szamosi
James Guelfi
Jan T. Kim
Jarek Bryk
Jarno Rantaharju
Jason Macklin
Jay van Schyndel
Jens vdL
John Blischak
John Pellman
John Simpson
Jonah Duckles
Jonny Williams
Joshua Madin
Kai Blin
Kathy Chung
Katrin Leinweber
Kevin M. Buckley
Kirill Palamartchouk
Klemens Noga
Kristopher Keipert
Kunal Marwaha
Laurence
Lee Zamparo
Lex Nederbragt
M Carlise
Mahdi Sadjadi
Marc Rajeev Gouw
Marcel Stimberg
Maria Doyle
Marie-Helene Burle
Marisa Lim
Mark Mandel
Martha Robinson
Martin Feller
Matthew Gidden
Matthew Peterson
Megan Fritz
Michael Zingale
Mike Henry
Mike Jackson
Morgan Oneka
Murray Hoggett
Nicola Soranzo
Nicolas Barral
Noah D Brenowitz
Noam Ross
Norman Gray
Orion Buske
Owen Kaluza
Patrick McCann
Paul Gardner
Pauline Barmby
Peter R. Hoyt
Peter Steinbach
Philip Lijnzaad
Phillip Doehle
Piotr Banaszkiewicz
Rafi Ullah
Raniere Silva
Robert A Beagrie
Ruud Steltenpool
Ry4an Brase
Rémi Emonet
Sarah Mount
Sarah Simpkin
Scott Ritchie
Stephan Schmeing
Stephen Jones
Stephen Turner
Steve Leak
Stéphane Guillou
Susan Miller
Thomas Mellan
Tim Keighley
Tobin Magle
Tom Dowrick
Trevor Bekolay
Varda F. Hagh
Victor Koppejan
Vikram Chhatre
Yee Mey
csqrs
earkpr
ekaterinailin
nther
reshama shaikh
s-boardman
sjnair
Date Added:
03/20/2017
Version Control with Git
Unrestricted Use
CC BY

This lesson is part of the Software Carpentry workshops that teach how to use version control with Git. Wolfman and Dracula have been hired by Universal Missions (a space services spinoff from Euphoric State University) to investigate whether it is possible to send their next planetary lander to Mars. They want to be able to work on the plans at the same time, but they have run into problems doing this in the past. If they take turns, each one will spend a lot of time waiting for the other to finish, but if they work on their own copies and email changes back and forth, things will be lost, overwritten, or duplicated. A colleague suggests using version control to manage their work.

Version control is better than mailing files back and forth: nothing that is committed to version control is ever lost, unless you work really, really hard at it. Since all old versions of files are saved, it’s always possible to go back in time to see exactly who wrote what on a particular day, or what version of a program was used to generate a particular set of results. Because we have this record of who made what changes when, we know whom to ask if we have questions later on and, if needed, can revert to a previous version, much like the “undo” feature in an editor.

When several people collaborate on the same project, it’s possible to accidentally overlook or overwrite someone’s changes. The version control system automatically notifies users whenever there’s a conflict between one person’s work and another’s.

Teams are not the only ones to benefit from version control: lone researchers can benefit immensely. Keeping a record of what was changed, when, and why is extremely useful for all researchers if they ever need to come back to the project later on (e.g., a year later, when memory has faded).

Version control is the lab notebook of the digital world: it’s what professionals use to keep track of what they’ve done and to collaborate with other people. Every large software development project relies on it, and most programmers use it for their small jobs as well. And it isn’t just for software: books, papers, small data sets, and anything that changes over time or needs to be shared can and should be stored in a version control system.

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Alexander G. Zimmerman
Amiya Maji
Amy L Olex
Andrew Lonsdale
Annika Rockenberger
Begüm D. Topçuoğlu
Ben Bolker
Bill Sacks
Brian Moore
Casey Youngflesh
Charlotte Moragh Jones-Todd
Christoph Junghans
David Jennings
Erin Alison Becker
François Michonneau
Garrett Bachant
Grant Sayer
Holger Dinkel
Ian Lee
Jake Lever
James E McClure
James Tocknell
Janoš Vidali
Jeremy Teitelbaum
Jeyashree Krishnan
Jimmy O'Donnell
Joe Atzberger
Jonah Duckles
Jonathan Cooper
João Rodrigues
Katherine Koziar
Katrin Leinweber
Kunal Marwaha
Kurt Glaesemann
L.C. Karssen
Lauren Ko
Lex Nederbragt
Madicken Munk
Maneesha Sane
Marie-Helene Burle
Mark Woodbridge
Martino Sorbaro
Matt Critchlow
Matteo Ceschia
Matthew Bourque
Matthew Hartley
Maxim Belkin
Megan Potterbusch
Michael Torpey
Michael Zingale
Mingsheng Zhang
Nicola Soranzo
Nima Hejazi
Oscar Arbeláez
Peace Ossom Williamson
Pey Lian Lim
Raniere Silva
Rayna Michelle Harris
Rene Gassmoeller
Rich McCue
Richard Barnes
Ruud Steltenpool
Rémi Emonet
Samniqueka Halsey
Samuel Lelièvre
Sarah Stevens
Saskia Hiltemann
Schlauch, Tobias
Scott Bailey
Simon Waldman
Stefan Siegert
Thomas Morrell
Tommy Keswick
Traci P
Tracy Teal
Trevor Keller
TrevorLeeCline
Tyler Crawford Kelly
Tyler Reddy
Umihiko Hoshijima
Veronica Ikeshoji-Orlati
Wes Harrell
Will Usher
Wolmar Nyberg Åkerström
abracarambar
butterflyskip
jonestoddcm
Date Added:
03/20/2017
Version control with the OSF
Unrestricted Use
CC BY

This webinar will introduce the concept of version control and the version control features that are built into the Open Science Framework (OSF; https://osf.io). The OSF is a free, open-source web application built to help researchers manage their workflows. The OSF is part collaboration tool, part version control software, and part data archive. The OSF connects to popular tools researchers already use, like Dropbox, Box, GitHub, and Mendeley, to streamline workflows and increase efficiency. This webinar will discuss how keeping track of different file versions is important for efficient reproducible research practices, how version control works on the OSF, and how researchers can view and download previous versions of files.

Subject:
Applied Science
Computer Science
Information Science
Material Type:
Lecture
Provider:
Center for Open Science
Author:
Center for Open Science
Date Added:
08/07/2020
The What, Why, and How of Preregistration
Unrestricted Use
CC BY

More researchers are preregistering their studies as a way to combat publication bias and improve the credibility of research findings. Preregistration is at its core designed to distinguish between confirmatory and exploratory results. Both are important to the progress of science, but when they are conflated, problems arise. In this webinar, we discuss the What, Why, and How of preregistration and what it means for the future of science. Visit cos.io/prereg for additional resources.

Subject:
Applied Science
Computer Science
Information Science
Material Type:
Lecture
Provider:
Center for Open Science
Author:
Center for Open Science
Date Added:
08/07/2020
What is statistical power
Unrestricted Use
CC BY

This video is the first in a series of videos on the basics of power analysis. All materials shown in the video, as well as content from the other videos in the power analysis series, can be found here: https://osf.io/a4xhr/
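As a rough illustration of what such a power analysis computes (not material from the video; the effect size, sample size, and alpha below are arbitrary placeholders), here is a minimal Python sketch using the usual normal approximation for a two-sided, two-sample test:

    # Minimal sketch: approximate power of a two-sided, two-sample test via the
    # normal approximation.  Effect size, n, and alpha are placeholder values,
    # not numbers taken from the video series.
    from scipy.stats import norm

    def approx_power(effect_size, n_per_group, alpha=0.05):
        se = (2 / n_per_group) ** 0.5        # SE of the standardized mean difference
        z_crit = norm.ppf(1 - alpha / 2)     # two-sided critical value
        z_shift = effect_size / se           # shift of the test statistic under the alternative
        return norm.cdf(z_shift - z_crit) + norm.cdf(-z_shift - z_crit)

    print(approx_power(effect_size=0.5, n_per_group=64))  # roughly 0.80

With a "medium" standardized effect of 0.5, about 64 participants per group give roughly 80% power, a conventional target in power analysis.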

Subject:
Applied Science
Computer Science
Information Science
Material Type:
Lecture
Provider:
Center for Open Science
Author:
Center for Open Science
Date Added:
08/07/2020
Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results
Unrestricted Use
CC BY

Background: The widespread reluctance to share published research data is often hypothesized to be due to the authors' fear that reanalysis may expose errors in their work or may produce conclusions that contradict their own. However, these hypotheses have not previously been studied systematically.

Methods and Findings: We related the reluctance to share research data for reanalysis to 1148 statistically significant results reported in 49 papers published in two major psychology journals. We found the reluctance to share data to be associated with weaker evidence (against the null hypothesis of no effect) and a higher prevalence of apparent errors in the reporting of statistical results. The unwillingness to share data was particularly clear when reporting errors had a bearing on statistical significance.

Conclusions: Our findings on the basis of psychological papers suggest that statistical results are particularly hard to verify when reanalysis is more likely to lead to contrasting conclusions. This highlights the importance of establishing mandatory data archiving policies.

Subject:
Psychology
Social Science
Material Type:
Reading
Provider:
PLOS ONE
Author:
Dylan Molenaar
Jelte M. Wicherts
Marjan Bakker
Date Added:
08/07/2020
Workflow for Awarding Badges
Unrestricted Use
CC BY

Badges are a great way to signal that a journal values transparent research practices. Readers see the papers that have underlying data or methods available, colleagues see that norms are changing within a community and have ample opportunities to emulate better practices, and authors get recognition for taking a step into new techniques. In this webinar, Professor Stephen Lindsay of the University of Victoria discusses the workflow of a badging program, eligibility for badge issuance, and the pitfalls to avoid in launching a badging program. Visit cos.io/badges to learn more.

Subject:
Applied Science
Computer Science
Information Science
Material Type:
Lecture
Provider:
Center for Open Science
Author:
Center for Open Science
Date Added:
08/07/2020
Writing reproducible geoscience papers using R Markdown, Docker, and GitLab
Unrestricted Use
CC BY

Reproducibility is unquestionably at the heart of science. Scientists face numerous challenges in this context, not least the lack of concepts, tools, and workflows for reproducible research in today's curricula. This short course introduces established and powerful tools that enable reproducibility of computational geoscientific research, statistical analyses, and visualisation of results using R (http://www.r-project.org/) in two lessons:

1. Reproducible Research with R Markdown. Open Data, Open Source, Open Reviews, and Open Science are important aspects of science today. In the first lesson, basic motivations and concepts for reproducible research touching on these topics are briefly introduced. During a hands-on session, the course participants write R Markdown (http://rmarkdown.rstudio.com/) documents, which include text and code and can be compiled to static documents (e.g., HTML, PDF). R Markdown is equally well suited for day-to-day digital notebooks as it is for scientific publications when using publisher templates.

2. GitLab and Docker. In the second lesson, the R Markdown files are published and enriched on an online collaboration platform. Participants learn how to save and version documents using GitLab (http://gitlab.com/) and compile them using Docker containers (https://docker.com/). These containers capture the full computational environment and can be transported, executed, examined, shared, and archived. Furthermore, GitLab's collaboration features are explored as an environment for Open Science.

Prerequisites: Participants should install required software (R, RStudio, a current browser) and register on GitLab (https://gitlab.com) before the course. This short course is especially relevant for early career scientists (ECS). Participants are welcome to bring their own data and R scripts to work with during the course. All material by the conveners will be shared publicly via OSF (https://osf.io/qd9nf/).

Subject:
Physical Science
Material Type:
Activity/Lab
Provider:
New York University
Author:
Daniel Nüst
Edzer Pebesma
Markus Konkol
Rémi Rampin
Vicky Steeves
Date Added:
05/11/2018
The earth is flat (p > 0.05): significance thresholds and the crisis of unreplicable research
Unrestricted Use
CC BY

The widespread use of ‘statistical significance’ as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process (according to the American Statistical Association). We review why degrading p-values into ‘significant’ and ‘nonsignificant’ contributes to making studies irreproducible, or to making them seem irreproducible. A major problem is that we tend to take small p-values at face value, but mistrust results with larger p-values. In either case, p-values tell little about reliability of research, because they are hardly replicable even if an alternative hypothesis is true. Also significance (p ≤ 0.05) is hardly replicable: at a good statistical power of 80%, two studies will be ‘conflicting’, meaning that one is significant and the other is not, in one third of the cases if there is a true effect. A replication can therefore not be interpreted as having failed only because it is nonsignificant. Many apparent replication failures may thus reflect faulty judgment based on significance thresholds rather than a crisis of unreplicable research. Reliable conclusions on replicability and practical importance of a finding can only be drawn using cumulative evidence from multiple independent studies. However, applying significance thresholds makes cumulative knowledge unreliable. One reason is that with anything but ideal statistical power, significant effect sizes will be biased upwards. Interpreting inflated significant results while ignoring nonsignificant results will thus lead to wrong conclusions. But current incentives to hunt for significance lead to selective reporting and to publication bias against nonsignificant findings. Data dredging, p-hacking, and publication bias should be addressed by removing fixed significance thresholds. Consistent with the recommendations of the late Ronald Fisher, p-values should be interpreted as graded measures of the strength of evidence against the null hypothesis. Also larger p-values offer some evidence against the null hypothesis, and they cannot be interpreted as supporting the null hypothesis, falsely concluding that ‘there is no effect’. Information on possible true effect sizes that are compatible with the data must be obtained from the point estimate, e.g., from a sample average, and from the interval estimate, such as a confidence interval. We review how confusion about interpretation of larger p-values can be traced back to historical disputes among the founders of modern statistics. We further discuss potential arguments against removing significance thresholds, for example that decision rules should rather be more stringent, that sample sizes could decrease, or that p-values should better be completely abandoned. We conclude that whatever method of statistical inference we use, dichotomous threshold thinking must give way to non-automated informed judgment.
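The "one third" figure quoted in the abstract follows from a short calculation: if two independent studies each have 80% power against a true effect, the probability that exactly one of them comes out significant is 2 × 0.8 × 0.2 = 0.32. A small Python check (the 80% power value comes from the abstract; the simulation setup itself is only illustrative):

    # Probability that two independent replications 'conflict' (exactly one is
    # significant) when each has 80% power and a true effect exists.
    import random

    power = 0.80
    analytic = 2 * power * (1 - power)    # = 0.32, i.e. about one third

    trials = 100_000
    conflicts = sum(
        (random.random() < power) != (random.random() < power)  # exactly one significant
        for _ in range(trials)
    )
    print(analytic, conflicts / trials)   # both close to 0.32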

Subject:
Mathematics
Statistics and Probability
Material Type:
Reading
Provider:
PeerJ
Author:
Fränzi Korner-Nievergelt
Tobias Roth
Valentin Amrhein
Date Added:
08/07/2020
The natural selection of bad science
Unrestricted Use
CC BY

Poor research design and data analysis encourage false-positive findings. Such poor methods persist despite perennial calls for improvement, suggesting that they result from something more than just misunderstanding. The persistence of poor methods results partly from incentives that favour them, leading to the natural selection of bad science. This dynamic requires no conscious strategizing—no deliberate cheating nor loafing—by scientists, only that publication is a principal factor for career advancement. Some normative methods of analysis have almost certainly been selected to further publication instead of discovery. In order to improve the culture of science, a shift must be made away from correcting misunderstandings and towards rewarding understanding. We support this argument with empirical evidence and computational modelling. We first present a 60-year meta-analysis of statistical power in the behavioural sciences and show that power has not improved despite repeated demonstrations of the necessity of increasing power. To demonstrate the logical consequences of structural incentives, we then present a dynamic model of scientific communities in which competing laboratories investigate novel or previously published hypotheses using culturally transmitted research methods. As in the real world, successful labs produce more ‘progeny,’ such that their methods are more often copied and their students are more likely to start labs of their own. Selection for high output leads to poorer methods and increasingly high false discovery rates. We additionally show that replication slows but does not stop the process of methodological deterioration. Improving the quality of research requires change at the institutional level.
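The model itself is specified in the paper; as a deliberately crude toy version of the selection dynamic the abstract describes (the parameters and update rule here are invented for illustration, not taken from the authors' model), the sketch below lets low-effort, high-output labs be copied more often and watches average methodological effort decay:

    # Toy sketch of selection for output, loosely inspired by the dynamic the
    # abstract describes.  All parameters and the update rule are assumptions.
    import random

    random.seed(1)
    N_LABS, GENERATIONS, DRIFT = 100, 200, 0.01
    labs = [random.random() for _ in range(N_LABS)]   # each lab's methodological 'effort'

    def output(effort):
        # Lower effort -> more publications (including more false positives).
        return 1.0 - 0.8 * effort

    for _ in range(GENERATIONS):
        weights = [output(e) for e in labs]
        labs = random.choices(labs, weights=weights, k=N_LABS)   # successful labs are copied
        labs = [min(1.0, max(0.0, e + random.gauss(0, DRIFT))) for e in labs]  # small drift

    print(f"mean effort after selection: {sum(labs) / len(labs):.2f}")   # drifts toward 0

Even this minimal version shows mean effort falling over generations once publication count alone determines which labs are copied.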

Subject:
Mathematics
Statistics and Probability
Material Type:
Reading
Provider:
Royal Society Open Science
Author:
Paul E. Smaldino
Richard McElreath
Date Added:
08/07/2020