Collection

Analysis

Deriving meaning and knowledge from data. Software, code, licensing, maintenance, statistics, methods, code sharing, documentation, and more.

110 affiliated resources

Open filters Close filters

Python for Harvesting Data on the Web

Conditional Remix & Share Permitted

CC BY-NC

Python for Harvesting Data on the Web

Rating

This session is an intermediate-to-advanced level class that offers some ideas for how to approach the following common data wrangling needs in research: 1) Obtain data and load it into a suitable data "container" for analysis, often via a web interface, especially an API, 2) parse the data retrieved via an API and turn it into a useful object for manipulation and analysis, and 3) perform some basic summary counts of records in a dataset and work up a quick visualization.

Subject:: Applied Science; Life Science; Physical Science; Social Science
Material Type:: Activity/Lab
Provider:: New York University
Author:: Nick Wolf; Vicky Steeves
Date Added:: 01/06/2020

Python for Humanities

Unrestricted Use

CC BY

Python for Humanities

Rating

Python is a general purpose programming language that is useful for writing scripts to work effectively and reproducibly with data. This is an introduction to Python designed for participants with no programming experience. These lessons can be taught in a day (~ 6 hours). They start with some basic information about Python syntax, the Jupyter notebook interface, and move through how to import CSV files, using the pandas package to work with data frames, how to calculate summary information from a data frame, and a brief introduction to plotting. The last lesson demonstrates how to work with databases directly from Python.

Subject:: Applied Science; Computer Science; Information Science; Mathematics; Measurement and Data
Material Type:: Module
Provider:: The Carpentries
Author:: Iain Emsley
Date Added:: 08/07/2020

Qualitative Research Using Open Tools

Unrestricted Use

CC BY

Qualitative Research Using Open Tools

Rating

Qualitative research has long suffered from a lack of free tools for analysis, leaving no options for researchers without significant funds for software licenses. This presents significant challenges for equity. This panel discussion will explore the first two free/libre open source qualitative analysis tools out there: qcoder (R package) and Taguette (desktop application). Drawing from the diverse backgrounds of the presenters (social science, library & information science, software engineering), we will discuss what openness and extensibility means for qualitative research, and how the two tools we've built facilitate equitable, open sharing.

Subject:: Applied Science; Life Science; Physical Science; Social Science
Material Type:: Lesson
Provider:: New York University
Author:: Beth M. Duckles; Vicky Steeves
Date Added:: 05/07/2019

Questionable research practices among italian research psychologists

Unrestricted Use

CC BY

Questionable research practices among italian research psychologists

Rating

A survey in the United States revealed that an alarmingly large percentage of university psychologists admitted having used questionable research practices that can contaminate the research literature with false positive and biased findings. We conducted a replication of this study among Italian research psychologists to investigate whether these findings generalize to other countries. All the original materials were translated into Italian, and members of the Italian Association of Psychology were invited to participate via an online survey. The percentages of Italian psychologists who admitted to having used ten questionable research practices were similar to the results obtained in the United States although there were small but significant differences in self-admission rates for some QRPs. Nearly all researchers (88%) admitted using at least one of the practices, and researchers generally considered a practice possibly defensible if they admitted using it, but Italian researchers were much less likely than US researchers to consider a practice defensible. Participants’ estimates of the percentage of researchers who have used these practices were greater than the self-admission rates, and participants estimated that researchers would be unlikely to admit it. In written responses, participants argued that some of these practices are not questionable and they have used some practices because reviewers and journals demand it. The similarity of results obtained in the United States, this study, and a related study conducted in Germany suggest that adoption of these practices is an international phenomenon and is likely due to systemic features of the international research and publication processes.

Subject:: Psychology; Social Science
Material Type:: Reading
Provider:: PLOS ONE
Author:: Coosje L. S. Veldkamp; Franca Agnoli; Jelte M. Wicherts; Paolo Albiero; Roberto Cubelli
Date Added:: 08/07/2020

Questionable research practices in ecology and evolution

Unrestricted Use

CC BY

Questionable research practices in ecology and evolution

Rating

We surveyed 807 researchers (494 ecologists and 313 evolutionary biologists) about their use of Questionable Research Practices (QRPs), including cherry picking statistically significant results, p hacking, and hypothesising after the results are known (HARKing). We also asked them to estimate the proportion of their colleagues that use each of these QRPs. Several of the QRPs were prevalent within the ecology and evolution research community. Across the two groups, we found 64% of surveyed researchers reported they had at least once failed to report results because they were not statistically significant (cherry picking); 42% had collected more data after inspecting whether results were statistically significant (a form of p hacking) and 51% had reported an unexpected finding as though it had been hypothesised from the start (HARKing). Such practices have been directly implicated in the low rates of reproducible results uncovered by recent large scale replication studies in psychology and other disciplines. The rates of QRPs found in this study are comparable with the rates seen in psychology, indicating that the reproducibility problems discovered in psychology are also likely to be present in ecology and evolution.

Subject:: Biology; Ecology; Life Science
Material Type:: Reading
Provider:: PLOS ONE
Author:: Ashley Barnett; Fiona Fidler; Hannah Fraser; Shinichi Nakagawa; Tim Parker
Date Added:: 08/07/2020

Raiders of the lost HARK: a reproducible inference framework for big data science

Unrestricted Use

CC BY

Raiders of the lost HARK: a reproducible inference framework for big data science

Rating

Hypothesizing after the results are known (HARK) has been disparaged as data dredging, and safeguards including hypothesis preregistration and statistically rigorous oversight have been recommended. Despite potential drawbacks, HARK has deepened thinking about complex causal processes. Some of the HARK precautions can conflict with the modern reality of researchersâ€™ obligations to use big, â€˜organicâ€™ data sourcesâ€”from high-throughput genomics to social media streams. We here propose a HARK-solid, reproducible inference framework suitable for big data, based on models that represent formalization of hypotheses. Reproducibility is attained by employing two levels of model validation: internal (relative to data collated around hypotheses) and external (independent to the hypotheses used to generate data or to the data used to generate hypotheses). With a model-centered paradigm, the reproducibility focus changes from the ability of others to reproduce both data and specific inferences from a study to the ability to evaluate models as representation of reality. Validation underpins â€˜natural selectionâ€™ in a knowledge base maintained by the scientific community. The community itself is thereby supported to be more productive in generating and critically evaluating theories that integrate wider, complex systems.

Subject:: Applied Science; Health, Medicine and Nursing
Material Type:: Reading
Provider:: Palgrave Communications
Author:: Iain E. Buchan; James S. Koopman; Jiang Bian; Matthew Sperrin; Mattia Prosperi; Mo Wang
Date Added:: 08/07/2020

Reproducibility Immersive Course

Conditional Remix & Share Permitted

CC BY-SA

Reproducibility Immersive Course

Rating

Various fields in the natural and social sciences face a ‘crisis of confidence’. Broadly, this crisis amounts to a pervasiveness of non-reproducible results in the published literature. For example, in the field of biomedicine, Amgen published findings that out of 53 landmark published results of pre-clinical studies, only 11% could be replicated successfully. This crisis is not confined to biomedicine. Areas that have recently received attention for non-reproducibility include biomedicine, economics, political science, psychology, as well as philosophy. Some scholars anticipate the expansion of this crisis to other disciplines.This course explores the state of reproducibility. After giving a brief historical perspective, case studies from different disciplines (biomedicine, psychology, and philosophy) are examined to understand the issues concretely. Subsequently, problems that lead to non-reproducibility are discussed as well as possible solutions and paths forward.

Subject:: Applied Science; Life Science; Physical Science; Social Science
Material Type:: Activity/Lab
Provider:: New York University
Author:: Vicky Steeves
Date Added:: 06/01/2018

Reproducible Research

Read the Fine Print

Reproducible Research

Rating

Modern scientific research takes advantage of programs such as Python and R that are open source. As such, they can be modified and shared by the wider community. Additionally, there is added functionality through additional programs and packages, such as IPython, Sweave, and Shiny. These packages can be used to not only execute data analyses, but also to present data and results consistently across platforms (e.g., blogs, websites, repositories and traditional publishing venues).

The goal of the course is to show how to implement analyses and share them using IPython for Python, Sweave and knitr for RStudio to create documents that are shareable and analyses that are reproducible.

Course outline is as follows:
1) Use of IPython notebooks to demonstrate and explain code, visualize data, and display analysis results
2) Applications of Python modules such as SymPy, NumPy, pandas, and SciPy
3) Use of Sweave to demonstrate and explain code, visualize data, display analysis results, and create documents and presentations
4) Integration and execution of IPython and R code and analyses using the IPython notebook

Subject:: Applied Science; Information Science
Material Type:: Full Course
Author:: Christopher Ahern
Date Added:: 08/07/2020

Reproducible Research Methods

Read the Fine Print

Reproducible Research Methods

Rating

This is the website for the Autumn 2014 course “Reproducible Research Methods” taught by Eric C. Anderson at NOAA’s Southwest Fisheries Science Center. The course meets on Tuesdays and Thursdays from 3:30 to 4:30 PM in Room 188 of the Fisheries Ecology Division.
It runs from Oct 7 to December 18.

The goal of this course is for scientists, researchers, and students to learn:

to write programs in the R language to manipulate and analyze data,
to integrate data analysis with report generation and article preparation using knitr,
to work fluently within the Rstudio integrated development environment for R,
to use git version control software and GitHub to effectively manage source code, collaborate efficiently with other researchers, and neatly package their research.

Subject:: Applied Science; Information Science
Material Type:: Full Course
Author:: Eric C. Anderson
Date Added:: 08/07/2020

Reproducible Research: Walking the Walk

Read the Fine Print

Reproducible Research: Walking the Walk

Rating

Description

This hands-on tutorial will train reproducible research warriors on the practices and tools that make experimental verification possible with an end-to-end data analysis workflow. The tutorial will expose attendees to open science methods during data gathering, storage, analysis, up to publication into a reproducible article.

Attendees are expected to have basic familiarity with scientific Python and Git.

Subject:: Applied Science; Information Science
Material Type:: Module
Author:: Matt McCormick
Date Added:: 08/07/2020

Reproducible Science Curriculum Lesson for Automation

Read the Fine Print

Reproducible Science Curriculum Lesson for Automation

Rating

Workshop goals
- Why are we teaching this
- Why is this important
- For future and current you
- For research as a whole
- Lack of reproducibility in research is a real problem

Materials and how we'll use them
- Workshop landing page, with

- links to the Materials
- schedule

Structure oriented along the Four Facets of Reproducibility:

- Documentation
- Organization
- Automation
- Dissemination

Will be available after the Workshop

How this workshop is run
- This is a Carpentries Workshop
- that means friendly learning environment
- Code of Conduct
- active learning
- work with the people next to you
- ask for help

Subject:: Applied Science; Information Science
Material Type:: Module
Author:: François Michonneau; Kim Gilbert; Matt Pennell
Date Added:: 08/07/2020

Research Project Management Using the Open Science Framework

Conditional Remix & Share Permitted

CC BY-NC

Research Project Management Using the Open Science Framework

Rating

An introduction to managing, annotating, organizing, archiving, and publishing research data using the Open Science Framework.

Subject:: Applied Science; Life Science; Physical Science; Social Science
Material Type:: Activity/Lab
Provider:: New York University
Author:: Nick Wolf; Vicky Steeves
Date Added:: 01/06/2020

R for Reproducible Scientific Analysis

Unrestricted Use

CC BY

R for Reproducible Scientific Analysis

Rating

This lesson in part of Software Carpentry workshop and teach novice programmers to write modular code and best practices for using R for data analysis. an introduction to R for non-programmers using gapminder data The goal of this lesson is to teach novice programmers to write modular code and best practices for using R for data analysis. R is commonly used in many scientific disciplines for statistical analysis and its array of third-party packages. We find that many scientists who come to Software Carpentry workshops use R and want to learn more. The emphasis of these materials is to give attendees a strong foundation in the fundamentals of R, and to teach best practices for scientific computing: breaking down analyses into modular units, task automation, and encapsulation. Note that this workshop will focus on teaching the fundamentals of the programming language R, and will not teach statistical analysis. The lesson contains more material than can be taught in a day. The instructor notes page has some suggested lesson plans suitable for a one or half day workshop. A variety of third party packages are used throughout this workshop. These are not necessarily the best, nor are they comprehensive, but they are packages we find useful, and have been chosen primarily for their usability.

Subject:: Applied Science; Computer Science; Information Science; Mathematics; Measurement and Data
Material Type:: Module
Provider:: The Carpentries
Author:: Adam H. Sparks; Ahsan Ali Khoja; Amy Lee; Ana Costa Conrado; Andrew Boughton; Andrew Lonsdale; Andrew MacDonald; Andris Jankevics; Andy Teucher; Antonio Berlanga-Taylor; Ashwin Srinath; Ben Bolker; Bill Mills; Bret Beheim; Clare Sloggett; Daniel; Dave Bridges; David J. Harris; David Mawdsley; Dean Attali; Diego Rabatone Oliveira; Drew Tyre; Elise Morrison; Erin Alison Becker; Fernando Mayer; François Michonneau; Giulio Valentino Dalla Riva; Gordon McDonald; Greg Wilson; Harriet Dashnow; Ido Bar; Jaime Ashander; James Balamuta; James Mickley; Jamie McDevitt-Irwin; Jeffrey Arnold; Jeffrey Oliver; John Blischak; Jonah Duckles; Josh Quan; Julia Piaskowski; Kara Woo; Kate Hertweck; Katherine Koziar; Katrin Leinweber; Kellie Ottoboni; Kevin Weitemier; Kiana Ashley West; Kieran Samuk; Kunal Marwaha; Kyriakos Chatzidimitriou; Lachlan Deer; Lex Nederbragt; Liz Ing-Simmons; Lucy Chang; Luke W Johnston; Luke Zappia; Marc Sze; Marie-Helene Burle; Marieke Frassl; Mark Dunning; Martin John Hadley; Mary Donovan; Matt Clark; Melissa Kardish; Mike Jackson; Murray Cadzow; Narayanan Raghupathy; Naupaka Zimmerman; Nelly Sélem; Nicholas Lesniak; Nicholas Potter; Nima Hejazi; Nora Mitchell; Olivia Rata Burge; Paula Andrea Martinez; Pete Bachant; Phil Bouchet; Philipp Boersch-Supan; Piotr Banaszkiewicz; Raniere Silva; Rayna Michelle Harris; Remi Daigle; Research Bazaar; Richard Barnes; Robert Bagchi; Rémi Emonet; Sam Penrose; Sandra Brosda; Sarah Munro; Sasha Lavrentovich; Scott Allen Funkhouser; Scott Ritchie; Sebastien Renaut; Thea Van Rossum; Timothy Eoin Moore; Timothy Rice; Tobin Magle; Trevor Bekolay; Tyler Crawford Kelly; Vicken Hillis; Yuka Takemon; bippuspm; butterflyskip; waiteb5
Date Added:: 03/20/2017

R for Social Scientists

Unrestricted Use

CC BY

R for Social Scientists

Rating

From Data Carpentry: Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. The lessons below were designed for those interested in working with social sciences data in R.This is an introduction to R designed for participants with no programming experience. These lessons can be taught in a day (~ 6 hours). They start with some basic information about R syntax, the RStudio interface, and move through how to import CSV files, the structure of data frames, how to deal with factors, how to add/remove rows and columns, how to calculate summary statistics from a data frame, and a brief introduction to plotting.

Subject:: Social Science
Material Type:: Activity/Lab
Provider:: New York University
Author:: Vicky Steeves
Date Added:: 01/15/2020

R for Social Scientists

Unrestricted Use

CC BY

R for Social Scientists

Rating

Data Carpentry lesson part of the Social Sciences curriculum. This lesson teaches how to analyse and visualise data used by social scientists. Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. The lessons below were designed for those interested in working with social sciences data in R. This is an introduction to R designed for participants with no programming experience. These lessons can be taught in a day (~ 6 hours). They start with some basic information about R syntax, the RStudio interface, and move through how to import CSV files, the structure of data frames, how to deal with factors, how to add/remove rows and columns, how to calculate summary statistics from a data frame, and a brief introduction to plotting.

Subject:: Applied Science; Information Science; Mathematics; Measurement and Data; Social Science
Material Type:: Module
Provider:: The Carpentries
Author:: Angela Li; Ben Marwick; Christina Maimone; Danielle Quinn; Erin Alison Becker; Francois Michonneau; Geoffrey LaFlair; Hao Ye; Jake Kaupp; Juan Fung; Katrin Leinweber; Martin Olmos; Murray Cadzow
Date Added:: 08/07/2020

R para Análisis Científicos Reproducibles

Unrestricted Use

CC BY

R para Análisis Científicos Reproducibles

Rating

Una introducción a R utilizando los datos de Gapminder. El objetivo de esta lección es enseñar a las programadoras principiantes a escribir códigos modulares y adoptar buenas prácticas en el uso de R para el análisis de datos. R nos provee un conjunto de paquetes desarrollados por terceros que se usan comúnmente en diversas disciplinas científicas para el análisis estadístico. Encontramos que muchos científicos que asisten a los talleres de Software Carpentry utilizan R y quieren aprender más. Nuestros materiales son relevantes ya que proporcionan a los asistentes una base sólida en los fundamentos de R y enseñan las mejores prácticas del cómputo científico: desglose del análisis en módulos, automatización tareas y encapsulamiento. Ten en cuenta que este taller se enfoca en los fundamentos del lenguaje de programación R y no en el análisis estadístico. A lo largo de este taller se utilizan una variedad de paquetes desarrolados por terceros, los cuales no son necesariamente los mejores ni se encuentran explicadas todas sus funcionalidades, pero son paquetes que consideramos útiles y han sido elegidos principalmente por su facilidad de uso.

Subject:: Applied Science; Computer Science; Information Science; Mathematics; Measurement and Data
Material Type:: Module
Provider:: The Carpentries
Author:: A. s; Alejandra Gonzalez-Beltran; Ana Beatriz Villaseñor Altamirano; Antonio; AntonioJBT; Belinda Weaver; Claudia Engel; Cynthia Monastirsky; Daniel Beiter; David Mawdsley; David Pérez-Suárez; Erin Becker; EuniceML; François Michonneau; Gordon McDonald; Guillermina Actis; Guillermo Movia; Hely Salgado; Ido Bar; Ivan Ogasawara; Ivonne Lujano; James J Balamuta; Jamie McDevitt-Irwin; Jeff Oliver; Jonah Duckles; Juan M. Barrios; Katrin Leinweber; Kevin Alquicira; Kevin Martínez-Folgar; Laura Angelone; Laura-Gomez; Leticia Vega; Marcela Alfaro Córdoba; Marceline Abadeer; Maria Florencia D'Andrea; Marie-Helene Burle; Marieke Frassl; Matias Andina; Murray Cadzow; Narayanan Raghupathy; Naupaka Zimmerman; Paola Prieto; Paula Andrea Martinez; Raniere Silva; Rayna M Harris; Richard Barnes; Richard McCosh; Romualdo Zayas-Lagunas; Sandra Brosda; Sasha Lavrentovich; Shirley Alquicira Hernandez; Silvana Pereyra; Tobin Magle; Veronica Jimenez; juli arancio; raynamharris; saynomoregrl
Date Added:: 08/07/2020

Social Science Workshop Overview

Unrestricted Use

CC BY

Social Science Workshop Overview

Rating

Workshop overview for the Data Carpentry Social Sciences curriculum. Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. This workshop teaches data management and analysis for social science research including best practices for data organization in spreadsheets, reproducible data cleaning with OpenRefine, and data analysis and visualization in R. This curriculum is designed to be taught over two full days of instruction. Materials for teaching data analysis and visualization in Python and extraction of information from relational databases using SQL are in development. Interested in teaching these materials? We have an onboarding video and accompanying slides available to prepare Instructors to teach these lessons. After watching this video, please contact team@carpentries.org so that we can record your status as an onboarded Instructor. Instructors who have completed onboarding will be given priority status for teaching at centrally-organized Data Carpentry Social Sciences workshops.

Subject:: Applied Science; Information Science; Mathematics; Measurement and Data; Social Science
Material Type:: Module
Provider:: The Carpentries
Author:: Angela Li; Erin Alison Becker; Francois Michonneau; Maneesha Sane; Sarah Brown; Tracy Teal
Date Added:: 08/07/2020

Software Carpentry

Unrestricted Use

CC BY

Software Carpentry

Rating

Since 1998, Software Carpentry has been teaching researchers the computing skills they need to get more done in less time and with less pain. Our volunteer instructors have run hundreds of events for more than 34,000 researchers since 2012. All of our lesson materials are freely reusable under the Creative Commons - Attribution license.

Subject:: Applied Science; Life Science; Physical Science; Social Science
Material Type:: Full Course
Provider:: Software Carpentry Community
Author:: Software Carpentry Community
Date Added:: 06/18/2020

Statistics with JASP and the Open Science Framework

Unrestricted Use

CC BY

Statistics with JASP and the Open Science Framework

Rating

This webinar will introduce the integration of JASP Statistical Software (https://jasp-stats.org/) with the Open Science Framework (OSF; https://osf.io). The OSF is a free, open source web application built to help researchers manage their workflows. The OSF is part collaboration tool, part version control software, and part data archive. The OSF connects to popular tools researchers already use, like Dropbox, Box, Github, Mendeley, and now is integrated with JASP, to streamline workflows and increase efficiency.

Subject:: Applied Science; Computer Science; Information Science
Material Type:: Lecture
Provider:: Center for Open Science
Author:: Center for Open Science
Date Added:: 08/07/2020

Two Years Later: Journals Are Not Yet Enforcing the ARRIVE Guidelines on Reporting Standards for Pre-Clinical Animal Studies

Unrestricted Use

CC BY

Two Years Later: Journals Are Not Yet Enforcing the ARRIVE Guidelines on Reporting Standards for Pre-Clinical Animal Studies

Rating

A study by David Baker and colleagues reveals poor quality of reporting in pre-clinical animal research and a failure of journals to implement the ARRIVE guidelines. There is growing concern that poor experimental design and lack of transparent reporting contribute to the frequent failure of pre-clinical animal studies to translate into treatments for human disease. In 2010, the Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines were introduced to help improve reporting standards. They were published in PLOS Biology and endorsed by funding agencies and publishers and their journals, including PLOS, Nature research journals, and other top-tier journals. Yet our analysis of papers published in PLOS and Nature journals indicates that there has been very little improvement in reporting standards since then. This suggests that authors, referees, and editors generally are ignoring guidelines, and the editorial endorsement is yet to be effectively implemented.

Subject:: Applied Science; Health, Medicine and Nursing; Life Science
Material Type:: Reading
Provider:: PLOS Biology
Author:: Ana Sottomayor; David Baker; Katie Lidster; Sandra Amor
Date Added:: 08/07/2020