
Analysis

Deriving meaning and knowledge from data. Software, code, licensing, maintenance, statistics, methods, code sharing, documentation, and more.
 

110 affiliated resources

OSF101
Unrestricted Use
CC BY

This webinar walks you through the basics of creating an OSF project, structuring it to fit your research needs, adding collaborators, and tying your favorite online tools into your project structure. OSF is a free, open source web application built by the Center for Open Science, a non-profit dedicated to improving the alignment between scientific values and scientific practices. OSF is part collaboration tool, part version control software, and part data archive. It is designed to connect to popular tools researchers already use, like Dropbox, Box, GitHub, and Mendeley, to streamline workflows and increase efficiency.

Subject:
Applied Science
Computer Science
Information Science
Material Type:
Lecture
Provider:
Center for Open Science
Author:
Center for Open Science
Date Added:
08/07/2020
OSF In The Lab: Organizing related projects with Links, Forks, and Templates
Unrestricted Use
CC BY

Files for this webinar are available at https://osf.io/ewhvq/. This webinar focuses on how to use the Open Science Framework (OSF) to tie together and organize multiple projects. We look at example structures appropriate for organizing classroom projects, a line of research, or a whole lab's activity. We discuss the OSF's capabilities for using projects as templates, linking projects, and forking projects, as well as some considerations for using each of those capabilities when designing a structure for your own project. The OSF is a free, open source web application built to help researchers manage their workflows. The OSF is part collaboration tool, part version control software, and part data archive. The OSF connects to popular tools researchers already use, like Dropbox, Box, GitHub and Mendeley, to streamline workflows and increase efficiency.

Subject:
Applied Science
Computer Science
Information Science
Material Type:
Lecture
Provider:
Center for Open Science
Author:
Center for Open Science
Date Added:
08/07/2020
OSF in the Classroom
Unrestricted Use
CC BY

This webinar will introduce how to use the Open Science Framework (OSF; https://osf.io) in the classroom. The OSF is a free, open source web application built to help researchers manage their workflows. The OSF is part collaboration tool, part version control software, and part data archive. The OSF connects to popular tools researchers already use, like Dropbox, Box, GitHub and Mendeley, to streamline workflows and increase efficiency. This webinar will discuss how to introduce reproducible research practices to students, show ways of tracking student activity, and introduce the use of Templates and Forks on the OSF to allow students to easily make new class projects. The OSF is the flagship product of the Center for Open Science, a non-profit technology start-up dedicated to improving the alignment between scientific values and scientific practices. Learn more at cos.io and osf.io, or email contact@cos.io.

Subject:
Applied Science
Computer Science
Information Science
Material Type:
Lecture
Provider:
Center for Open Science
Author:
Center for Open Science
Date Added:
08/07/2020
On the Plurality of (Methodological) Worlds: Estimating the Analytic Flexibility of fMRI Experiments
Unrestricted Use
CC BY

How likely are published findings in the functional neuroimaging literature to be false? According to a recent mathematical model, the potential for false positives increases with the flexibility of analysis methods. Functional MRI (fMRI) experiments can be analyzed using a large number of commonly used tools, with little consensus on how, when, or whether to apply each one. This situation may lead to substantial variability in analysis outcomes. Thus, the present study sought to estimate the flexibility of neuroimaging analysis by submitting a single event-related fMRI experiment to a large number of unique analysis procedures. Ten analysis steps for which multiple strategies appear in the literature were identified, and two to four strategies were enumerated for each step. Considering all possible combinations of these strategies yielded 6,912 unique analysis pipelines. Activation maps from each pipeline were corrected for multiple comparisons using five thresholding approaches, yielding 34,560 significance maps. While some outcomes were relatively consistent across pipelines, others showed substantial methods-related variability in activation strength, location, and extent. Some analysis decisions contributed to this variability more than others, and different decisions were associated with distinct patterns of variability across the brain. Qualitative outcomes also varied with analysis parameters: many contrasts yielded significant activation under some pipelines but not others. Altogether, these results reveal considerable flexibility in the analysis of fMRI experiments. This observation, when combined with mathematical simulations linking analytic flexibility with elevated false positive rates, suggests that false positive results may be more prevalent than expected in the literature. This risk of inflated false positive rates may be mitigated by constraining the flexibility of analytic choices or by abstaining from selective analysis reporting.
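The pipeline counts above follow from simple combinatorics: when every analysis step offers a fixed set of strategies, the number of unique pipelines is the product of the per-step counts, and pairing each pipeline with each thresholding approach multiplies again. The Python sketch below illustrates this with hypothetical step names and option counts (not the study's actual ten steps), plus the 6,912 × 5 arithmetic reported in the abstract.

```python
from itertools import product
from math import prod

def count_pipelines(options_per_step):
    """Number of unique pipelines: the product of the strategy counts per step."""
    return prod(options_per_step)

def enumerate_pipelines(step_options):
    """Yield every combination that picks exactly one strategy per analysis step."""
    names = list(step_options)
    for combo in product(*(step_options[name] for name in names)):
        yield dict(zip(names, combo))

# Hypothetical toy example: three steps with 2, 3 and 2 strategies.
toy_steps = {
    "slice_timing": ["none", "interpolate"],
    "smoothing_fwhm_mm": [4, 6, 8],
    "motion_regressors": ["none", "six"],
}
pipelines = list(enumerate_pipelines(toy_steps))
print(len(pipelines))              # 12
print(count_pipelines([2, 3, 2]))  # 12

# Figures from the abstract: 6,912 pipelines, each thresholded 5 ways.
print(count_pipelines([6912, 5]))  # 34560
```

Enumerating the combinations explicitly, rather than only counting them, is what makes it possible to run every pipeline on the same data and compare the resulting maps.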

Subject:
Applied Science
Biology
Health, Medicine and Nursing
Life Science
Psychology
Social Science
Material Type:
Reading
Provider:
Frontiers in Neuroscience
Author:
Joshua Carp
Date Added:
08/07/2020
OpenRefine for Social Science Data
Unrestricted Use
CC BY

Lesson on OpenRefine for social scientists. Part of the data workflow is preparing the data for analysis. Some of this involves data cleaning, where errors in the data are identified and corrected, or formatting is made consistent. This step must be taken with the same care and attention to reproducibility as the analysis. OpenRefine (formerly Google Refine) is a powerful free and open source tool for working with messy data: cleaning it and transforming it from one format into another. This lesson will teach you to use OpenRefine to effectively clean and format data and automatically track any changes that you make. Many people comment that this tool saves them literally months of work trying to make these edits by hand.
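One cleaning step OpenRefine automates is clustering values that are spelled differently but mean the same thing. The Python sketch below is a simplified approximation of OpenRefine's "fingerprint" key-collision idea (it is not OpenRefine's code, and the real tool handles more normalization cases); the function names and sample values are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict

def fingerprint(value: str) -> str:
    """Simplified fingerprint key: case, accents, punctuation and word order are ignored."""
    value = value.strip().lower()
    # Fold accented characters to plain ASCII (e.g. "ó" becomes "o").
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Remove punctuation characters.
    value = value.translate(str.maketrans("", "", string.punctuation))
    # Keep unique tokens in sorted order so "Chile, University of" matches "University of Chile".
    return " ".join(sorted(set(value.split())))

def cluster(values):
    """Group raw values whose fingerprints collide; return only groups with 2+ members."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {key: members for key, members in groups.items() if len(members) > 1}

raw = ["University of Chile", "university of chile ", "Chile, University of", "Universidad Católica"]
for key, members in cluster(raw).items():
    print(key, "<-", members)
```

As in OpenRefine, a cluster is only a suggestion: a person still decides which spelling to keep before the values are merged.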

Subject:
Applied Science
Information Science
Mathematics
Measurement and Data
Social Science
Material Type:
Module
Provider:
The Carpentries
Author:
Erin Becker
François Michonneau
Geoff LaFlair
Karen Word
Lachlan Deer
Peter Smyth
Tracy Teal
Date Added:
08/07/2020
Open Science in Latin America
Unrestricted Use
CC BY

Note: This webinar was presented in Spanish. The slides presented during this webinar can be found here: https://osf.io/6qnse/ This webinar will focus on the state of open science in Latin America, from the efforts of individual researchers to open up their workflows, to tools that help researchers be open, to promising new networks and initiatives in open science. Ricardo Hartley (@ametodico) is a professor of research methodology at the Universidad Central de Chile and a researcher in reproductive biology and in the communication and valuation of knowledge. He organized OpenCon Santiago 2016 and 2017 and is a COS ambassador. Erin McKiernan is a professor in the Department of Physics, Biomedical Physics Program, at the Universidad Nacional Autónoma de México. She is also the founder of the Why Open Research? project, an educational site where researchers can learn how to share their work, funded in part by the Shuttleworth Foundation. Fernan Federici Noe is an assistant professor and researcher at the Universidad Católica de Chile and an international fellow of the OpenPlant Synthetic Biology Center, University of Cambridge. Fernan is a member of the Gathering for Open Science Hardware (GOSH) and TECNOx (www.tecnox.org).

Subject:
Applied Science
Computer Science
Information Science
Material Type:
Lecture
Provider:
Center for Open Science
Author:
Center for Open Science
Date Added:
08/07/2020
Open Source Guides
Unrestricted Use
CC BY

Open Source Guides (https://opensource.guide/) are a collection of resources for individuals, communities, and companies who want to learn how to run and contribute to an open source project.

Background: Open Source Guides were created and are curated by GitHub, along with input from outside community reviewers, but they are not exclusive to GitHub products. One reason we started this project is that we felt there weren't enough resources for people creating open source projects.

Our goal is to aggregate community best practices, not what GitHub (or any other individual or entity) thinks is best. Therefore, we try to use examples and quotations from others to illustrate our points.

Subject:
Applied Science
Life Science
Physical Science
Social Science
Material Type:
Reading
Provider:
github
Author:
GitHub
Date Added:
06/18/2020
Optimizing Research Collaboration
Unrestricted Use
CC BY

In this webinar, we demonstrate the OSF tools available for contributors, labs, centers, and institutions that support stronger collaborations. The demo includes useful practices like: contributor management, the OSF wiki as an electronic lab notebook, using OSF to manage online courses and syllabi, and more. Finally, we look at how OSF Institutions can provide discovery and intelligence gathering infrastructure so that you can focus on conducting and supporting exceptional research. The Center for Open Science’s ongoing mission is to provide community and technical resources to support your commitments to rigorous, transparent research practices. Visit cos.io/institutions to learn more.

Subject:
Applied Science
Computer Science
Information Science
Material Type:
Lecture
Provider:
Center for Open Science
Author:
Center for Open Science
Date Added:
08/07/2020
Plotting and Programming in Python
Unrestricted Use
CC BY

This lesson is part of Software Carpentry workshops and teaches an introduction to plotting and programming using Python. It is aimed at people with little or no previous programming experience. It uses plotting as its motivating example and is designed to be used in both Data Carpentry and Software Carpentry workshops. This lesson references JupyterLab, but it can also be taught using a regular Python interpreter. Please note that this lesson uses Python 3 rather than Python 2.

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Adam Steer
Allen Lee
Andreas Hilboll
Ashley Champagne
Benjamin
Benjamin Roberts
CanWood
Carlos Henrique Brandt
Carlos M Ortiz Marrero
Cephalopd
Cian Wilson
Dan Mønster
Daniel W Kerchner
Daria Orlowska
Dave Lampert
David Matten
Erin Alison Becker
Florian Goth
Francisco J. Martínez
Greg Wilson
Jacob Deppen
Jarno Rantaharju
Jeremy Zucker
Jonah Duckles
Kees den Heijer
Keith Gilbertson
Kyle E Niemeyer
Lex Nederbragt
Logan Cox
Louis Vernon
Lucy Dorothy Whalley
Madeleine Bonsma-Fisher
Mark Phillips
Mark Slater
Maxim Belkin
Michael Beyeler
Mike Henry
Narayanan Raghupathy
Nigel Bosch
Olav Vahtras
Pablo Hernandez-Cerdan
Paul Anzel
Phil Tooley
Raniere Silva
Robert Woodward
Ryan Avery
Ryan Gregory James
SBolo
Sarah M Brown
Shyam Dwaraknath
Sourav Singh
Steven Koenig
Stéphane Guillou
Taylor Smith
Thor Wikfeldt
Timothy Warren
Tyler Martin
Vasu Venkateshwaran
Vikas Pejaver
ian
mzc9
Date Added:
08/07/2020
Poor replication validity of biomedical association studies reported by newspapers
Unrestricted Use
CC BY

Objective: To investigate the replication validity of biomedical association studies covered by newspapers.

Methods: We used a database of 4723 primary studies included in 306 meta-analysis articles. These studies associated a risk factor with a disease in three biomedical domains: psychiatry, neurology, and four somatic diseases. They were classified into a lifestyle category (e.g., smoking) and a non-lifestyle category (e.g., genetic risk). Using the Dow Jones Factiva database, we investigated the newspaper coverage of each study. Their replication validity was assessed by comparison with the corresponding meta-analyses.

Results: Among the 5029 articles in our database, 156 primary studies (of which 63 were lifestyle studies) and 5 meta-analysis articles were reported in 1561 newspaper articles. The percentage of covered studies and the number of newspaper articles per study strongly increased with the impact factor of the journal that published each scientific study. Newspapers covered initial (5/39, 12.8%) and subsequent (58/600, 9.7%) lifestyle studies almost equally. In contrast, initial non-lifestyle studies were covered more often (48/366, 13.1%) than subsequent ones (45/3718, 1.2%). Newspapers never covered initial studies reporting null findings and rarely reported subsequent null observations. Only 48.7% of the 156 studies reported by newspapers were confirmed by the corresponding meta-analyses. Initial non-lifestyle studies were less often confirmed (16/48) than subsequent ones (29/45) and than lifestyle studies (31/63). Psychiatric studies covered by newspapers were less often confirmed (10/38) than neurological (26/41) or somatic (40/77) ones. This correlates with an even larger coverage of initial studies in psychiatry. Whereas 234 newspaper articles covered the 35 initial studies that were later disconfirmed, only four press articles covered a subsequent null finding and mentioned the refutation of an initial claim.

Conclusion: Journalists preferentially cover initial findings, although these are often contradicted by meta-analyses, and rarely inform the public when they are disconfirmed.
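The percentages in this abstract follow directly from the reported counts, and the two ways of splitting the 156 covered studies should add up to the same total. A short Python check; the counts are copied from the abstract, while the dictionary layout and helper names are our own.

```python
# Counts copied from the abstract: (numerator, denominator) per group.
coverage = {
    "initial lifestyle": (5, 39),
    "subsequent lifestyle": (58, 600),
    "initial non-lifestyle": (48, 366),
    "subsequent non-lifestyle": (45, 3718),
}
# Covered studies later confirmed by meta-analyses, split two different ways.
confirmed_by_type = {
    "initial non-lifestyle": (16, 48),
    "subsequent non-lifestyle": (29, 45),
    "lifestyle": (31, 63),
}
confirmed_by_domain = {
    "psychiatry": (10, 38),
    "neurology": (26, 41),
    "somatic": (40, 77),
}

def pct(numerator, denominator):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * numerator / denominator, 1)

def totals(groups):
    """Sum numerators and denominators across all groups."""
    return (sum(k for k, _ in groups.values()), sum(n for _, n in groups.values()))

for label, (k, n) in coverage.items():
    print(f"{label}: {k}/{n} covered ({pct(k, n)}%)")

# Both splits should describe the same 156 covered studies.
k, n = totals(confirmed_by_type)
print(f"confirmed overall: {k}/{n} ({pct(k, n)}%)")  # 76/156 (48.7%)
print(totals(confirmed_by_domain) == (k, n))         # True
```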

Subject:
Applied Science
Health, Medicine and Nursing
Material Type:
Reading
Provider:
PLOS ONE
Author:
Andy Smith
Estelle Dumas-Mallet
François Gonon
Thomas Boraud
Date Added:
08/07/2020
Pre-analysis Plans: A Stocktaking
Read the Fine Print

The evidence-based community has championed the public registration of pre-analysis plans (PAPs) as a solution to the problem of research credibility, but without any evidence that PAPs actually bolster the credibility of research. We analyze a representative sample of 195 PAPs from the American Economic Association (AEA) and Evidence in Governance and Politics (EGAP) registration platforms to assess whether PAPs are sufficiently clear, precise, and comprehensive to achieve their objectives of preventing “fishing” and reducing the scope for post-hoc adjustment of research hypotheses. We also analyze a subset of 93 PAPs from projects that have resulted in publicly available papers to ascertain how faithfully they adhere to their pre-registered specifications and hypotheses. We find significant variation in the extent to which PAPs are accomplishing the goals they were designed to achieve.

Subject:
Economics
Social Science
Material Type:
Reading
Author:
Daniel Posner
George Ofosu
Date Added:
08/07/2020
Preregistration: Improve Research Rigor, Reduce Bias
Unrestricted Use
CC BY

In this webinar Professor Brian Nosek, Executive Director of the Center for Open Science (https://cos.io), outlines the practice of Preregistration and how it can aid in increasing the rigor and reproducibility of research. The webinar is co-hosted by the Health Research Alliance, a collaborative member organization of nonprofit research funders. Slides available at: https://osf.io/9m6tx/

Subject:
Applied Science
Computer Science
Information Science
Material Type:
Lecture
Provider:
Center for Open Science
Author:
Center for Open Science
Date Added:
08/07/2020
Preregistration in Complex Contexts: A Preregistration Template for the Application of Cognitive Models
Unrestricted Use
CC BY

In recent years, open science practices have become increasingly popular in psychology and related sciences. These practices aim to increase rigour and transparency in science as a potential response to the challenges posed by the replication crisis. Many of these reforms, including the highly influential practice of preregistration, have been designed for experimental work that tests simple hypotheses with standard statistical analyses, such as assessing whether an experimental manipulation has an effect on a variable of interest. However, psychology is a diverse field of research, and the somewhat narrow focus of prevalent discussions of, and templates for, preregistration has led to debates on how appropriate these reforms are for areas of research with more diverse hypotheses and more complex methods of analysis, such as cognitive modelling research within mathematical psychology. Our article attempts to bridge the gap between open science and mathematical psychology, focusing on the type of cognitive modelling that Crüwell, Stefan, & Evans (2019) labelled model application, where researchers apply a cognitive model as a measurement tool to test hypotheses about parameters of the cognitive model. Specifically, we (1) discuss several potential researcher degrees of freedom within model application, (2) provide the first preregistration template for model application, and (3) provide an example of a preregistered model application using our preregistration template. More broadly, we hope that our discussions and proposals constructively advance the debate surrounding preregistration in cognitive modelling, and provide a guide for how preregistration templates may be developed in other diverse or complex research contexts.

Subject:
Applied Science
Life Science
Physical Science
Social Science
Material Type:
Reading
Author:
Nathan Evans
Sophia Crüwell
Date Added:
12/07/2019
Programming with MATLAB
Unrestricted Use
CC BY

The best way to learn how to program is to do something useful, so this introduction to MATLAB is built around a common scientific task: data analysis. Our real goal isn’t to teach you MATLAB, but to teach you the basic concepts that all programming depends on. We use MATLAB in our lessons because: we have to use something for examples; it’s well-documented; it has a large (and growing) user base among scientists in academia and industry; and it has a large library of packages available for performing diverse tasks. But the two most important things are to use whatever language your colleagues are using, so that you can share your work with them easily, and to use that language well.

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Gerard Capes
Date Added:
03/20/2017
Programming with Python
Unrestricted Use
CC BY

The best way to learn how to program is to do something useful, so this introduction to Python is built around a common scientific task: data analysis.

Arthritis Inflammation

We are studying inflammation in patients who have been given a new treatment for arthritis, and need to analyze the first dozen data sets of their daily inflammation. The data sets are stored in comma-separated values (CSV) format: each row holds information for a single patient, and the columns represent successive days. The first three rows of our first file look like this:

0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1

Each number represents the number of inflammation bouts that a particular patient experienced on a given day. For example, the value “6” at row 3, column 7 of the data set above means that the third patient experienced inflammation six times on the seventh day of the clinical study. So, we want to:

- Calculate the average inflammation per day across all patients.
- Plot the result to discuss and share with colleagues.

To do all that, we’ll have to learn a little bit about programming.
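The lesson itself does this with NumPy and then plots the result with Matplotlib. As a dependency-free illustration of the averaging step only, here is a Python sketch using the three rows quoted above; the helper names `load_rows` and `daily_means` are our own, and plotting is left out.

```python
import csv
import io

# The first three rows quoted above: one patient per row, one day per column.
# In the lesson these rows are read from a CSV file on disk.
SAMPLE = """\
0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
"""

def load_rows(text):
    """Parse CSV text into one list of integers per patient, skipping blank lines."""
    return [[int(x) for x in row] for row in csv.reader(io.StringIO(text)) if row]

def daily_means(rows):
    """Average inflammation per day: the mean of each column across all patients."""
    n_patients = len(rows)
    return [sum(day) / n_patients for day in zip(*rows)]

data = load_rows(SAMPLE)
print(data[2][6])          # 6: zero-based indices, so row 3, column 7
means = daily_means(data)
print(len(means))          # 40 days
print(round(means[6], 2))  # (4 + 3 + 6) / 3 = 4.33
```

`zip(*rows)` transposes the table so that each iteration sees one day's values for every patient, which is the same column-wise reduction the lesson performs with NumPy.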

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Anne Fouilloux
Lauren Ko
Maxim Belkin
Trevor Bekolay
Valentina Staneva
Date Added:
08/07/2020
Programming with R
Unrestricted Use
CC BY

The best way to learn how to program is to do something useful, so this introduction to R is built around a common scientific task: data analysis. Our real goal isn’t to teach you R, but to teach you the basic concepts that all programming depends on. We use R in our lessons because: we have to use something for examples; it’s free, well-documented, and runs almost everywhere; it has a large (and growing) user base among scientists; and it has a large library of external packages available for performing diverse tasks. But the two most important things are to use whatever language your colleagues are using, so you can share your work with them easily, and to use that language well.

We are studying inflammation in patients who have been given a new treatment for arthritis, and need to analyze the first dozen data sets of their daily inflammation. The data sets are stored in CSV format (comma-separated values): each row holds information for a single patient, and the columns represent successive days. The first few rows of our first file look like this:

0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
0,0,2,0,4,2,2,1,6,7,10,7,9,13,8,8,15,10,10,7,17,4,4,7,6,15,6,4,9,11,3,5,6,3,3,4,2,3,2,1
0,1,1,3,3,1,3,5,2,4,4,7,6,5,3,10,8,10,6,17,9,14,9,7,13,9,12,6,7,7,9,6,3,2,2,4,2,0,1,1

We want to: load that data into memory, calculate the average inflammation per day across all patients, and plot the result. To do all that, we’ll have to learn a little bit about programming.

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Diya Das
Katrin Leinweber
Rohit Goswami
Date Added:
03/20/2017
Project Organization and Management for Genomics
Unrestricted Use
CC BY

Data Carpentry Genomics workshop lesson to learn how to structure your metadata, organize and document your genomics data and bioinformatics workflow, and access data on the NCBI Sequence Read Archive (SRA) database. Good data organization is the foundation of any research project. It not only sets you up well for an analysis, but also makes it easier to come back to the project later and share it with collaborators, including your most important collaborator: future you.

Organizing a project that includes sequencing involves many components. There’s the experimental setup and conditions metadata, measurements of experimental parameters, sequencing preparation and sample information, the sequences themselves, and the files and workflow of any bioinformatics analysis. So much of the information in a sequencing project is digital, and we need to keep track of our digital records in the same way we keep a lab notebook and sample freezer. In this lesson, we’ll go through the project organization and documentation that will make an efficient bioinformatics workflow possible. Not only will this make you a more effective bioinformatics researcher, it also prepares your data and project for publication, as grant agencies and publishers increasingly require this information.

In this lesson, we’ll be using data from a study of experimental evolution using E. coli. More information about this dataset is available here. In this study there are several types of files:

- Spreadsheet data from the experiment that tracks the strains and their phenotype over time
- Spreadsheet data with information on the samples that were sequenced: the names of the samples, how they were prepared, and the sequencing conditions
- The sequence data

Throughout the analysis, we’ll also generate files from the steps in the bioinformatics pipeline and documentation on the tools and parameters that we used.

In this lesson you will learn:

- How to structure your metadata, tabular data and information about the experiment. The metadata is the information about the experiment and the samples you’re sequencing.
- How to prepare for, understand, organize and store the sequencing data that comes back from the sequencing center.
- How to access and download publicly available data that may need to be used in your bioinformatics analysis.
- The concepts of organizing the files and documenting the workflow of your bioinformatics analysis.
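The organizing advice above can be made concrete with a small script that creates a project skeleton and a README documenting it. A minimal Python sketch, assuming a hypothetical layout with `data/`, `results/` and `docs/` folders and a hypothetical project name; the lesson's own recommended structure may differ.

```python
import tempfile
from pathlib import Path

# Assumed layout (hypothetical): inputs, generated outputs, and documentation kept apart.
SUBDIRS = ("data", "results", "docs")

README_TEXT = (
    "data/: raw sequencing reads and sample metadata (treat as read-only)\n"
    "results/: files generated by the bioinformatics pipeline\n"
    "docs/: notes on tools, versions and parameters used\n"
)

def init_project(root):
    """Create the project skeleton and a README; safe to run more than once."""
    root = Path(root)
    created = []
    for name in SUBDIRS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    readme = root / "README.md"
    if not readme.exists():  # never overwrite notes someone has already edited
        readme.write_text(README_TEXT)
    return created

with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "ecoli_evolution"  # hypothetical project name
    init_project(project)
    print(sorted(p.name for p in project.iterdir()))  # ['README.md', 'data', 'docs', 'results']
```

Keeping raw data separate from generated results means any file under `results/` can be deleted and regenerated from `data/` plus the documented workflow.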

Subject:
Business and Communication
Genetics
Life Science
Management
Material Type:
Module
Provider:
The Carpentries
Author:
Amanda Charbonneau
Bérénice Batut
Daniel O. S. Ouso
Deborah Paul
Erin Alison Becker
François Michonneau
Jason Williams
Juan A. Ugalde
Kevin Weitemier
Laura Williams
Paula Andrea Martinez
Peter R. Hoyt
Rayna Michelle Harris
Taylor Reiter
Toby Hodges
Tracy Teal
Date Added:
08/07/2020
P values in display items are ubiquitous and almost invariably significant: A survey of top science journals
Unrestricted Use
CC BY

P values represent a widely used, but pervasively misunderstood and fiercely contested method of scientific inference. Display items, such as figures and tables, often contain the main results and are an important source of P values. We conducted a survey comparing the overall use of P values and the occurrence of significant P values in display items of a sample of articles in the three top multidisciplinary journals (Nature, Science, PNAS) in 2017 and in 1997. We also examined the reporting of multiplicity corrections and its potential influence on the proportion of statistically significant P values. Our findings demonstrated substantial and growing reliance on P values in display items, with increases of 2.5 to 14.5 times in 2017 compared to 1997. The overwhelming majority of P values (94%, 95% confidence interval [CI] 92% to 96%) were statistically significant. Methods to adjust for multiplicity were almost non-existent in 1997, but reported in many articles relying on P values in 2017 (Nature 68%, Science 48%, PNAS 38%). In their absence, almost all reported P values were statistically significant (98%, 95% CI 96% to 99%). Conversely, when any multiplicity corrections were described, 88% (95% CI 82% to 93%) of reported P values were statistically significant. Use of Bayesian methods was scant (2.5%), and articles rarely (0.7%) relied exclusively on Bayesian statistics. Overall, wider appreciation of the need for multiplicity corrections is a welcome evolution, but the rapid growth of reliance on P values and implausibly high rates of reported statistical significance are worrisome.
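Multiplicity corrections adjust P values when many tests are reported together, which lowers the share that stays below the significance threshold. As an illustration only (the survey does not say which corrections the articles used), here is a Python sketch of two common ones, Bonferroni and Holm's step-down method, applied to four hypothetical P values.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted P values: multiply each by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm step-down adjusted P values, returned in the original input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices from smallest P upward
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest P is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps adjusted values monotone.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

p = [0.01, 0.04, 0.03, 0.005]  # hypothetical P values
alpha = 0.05
print([q < alpha for q in p])              # [True, True, True, True]
print([q < alpha for q in bonferroni(p)])  # [True, False, False, True]
print([q < alpha for q in holm(p)])        # [True, False, False, True]
```

With these made-up values all four raw P values fall below 0.05, but only two survive either correction. Holm is never more conservative than Bonferroni, so it is often preferred when the tests are not assumed independent.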

Subject:
Mathematics
Statistics and Probability
Material Type:
Reading
Provider:
PLOS ONE
Author:
Ioana Alina Cristea
John P. A. Ioannidis
Date Added:
08/07/2020