Collection

Information Science

1343 affiliated resources

Open filters Close filters

Data Management

Conditional Remix & Share Permitted

CC BY-NC-SA

Data Management

Rating

The MIT Libraries Data Management Group hosts a set of workshops during IAP and throughout the year to assist MIT faculty and researchers with data set control, maintenance, and sharing. This resource contains a selection of presentations from those workshops. Topics include an introduction to data management, details on data sharing and storage, data management using the DMPTool, file organization, version control, and an overview of the open data requirements of various funding sources.

Subject:: Applied Science; Computer Science; Information Science
Material Type:: Full Course
Provider Set:: MIT OpenCourseWare
Author:: None, MIT Libraries Data Management
Date Added:: 02/01/2016

Data Management Planning

Unrestricted Use

Public Domain

Data Management Planning

Rating

Data management planning is the starting point in the data life cycle. Creating a formal document that outlines what you will do with the data during and after the completion of research helps to ensure that the data is safe for current and future use. This lesson describes the benefits of a data management plan (DMP), outlines the components of a DMP, details tools for creating a DMP, provides NSF DMP information, and demonstrates the use of an example DMP.

Subject:: Applied Science; Education; Higher Education; Information Science
Material Type:: Lesson
Provider:: DataONE
Author:: DataONE Community Engagement & Outreach Working Group
Date Added:: 11/21/2020

Data Management Short Course for Scientists

Read the Fine Print

Educational Use

Data Management Short Course for Scientists

Rating

The ESIP Federation, in cooperation with NOAA and the Data Conservancy, seeks to share the community's knowledge with scientists who increasingly need to be better data managers, as well as to support workforce development for new data management professionals. Over the next several years, the ESIP Federation expects to evolve training courses which seeks to improve the understanding of scientific data management among scientists, emerging scientists, and data professionals of all sorts.

All courses are available under a Creative Commons Attribution 3.0 license that allows you to share and adapt the work as long as you cite the work according to the citation provided. Please send feedback upon the courses to shortcourseeditors@esipfed.org.

Subject:: Applied Science; Information Science
Material Type:: Lecture; Module; Primary Source
Author:: Earth Science Information Partners
Date Added:: 03/21/2022

Data Management Skillbuilding Hub - DataOne

Unrestricted Use

Public Domain

Data Management Skillbuilding Hub - DataOne

Rating

The Data Management Skillbuilding Hub is a repository for open educational resources regarding data management, meaning that it is a collection of learning resources freely contributed by anyone willing to share them. Materials such as lessons, best practices, and videos, are stored in the DataONEorg GitHub repository as well as searchable through the Data Management Training Clearinghouse. We invite you submit your own educational resources so that the Data Management Skillbuilding Hub can remain an up-to-date and sustainable educational tool for all to benefit from. You can easily contribute learning materials to the Skillbuilding Hub via GitHub online.

Subject:: Applied Science; Information Science
Material Type:: Lesson; Primary Source
Provider:: DataONE
Date Added:: 03/21/2022

Data Management and Governance Glossary

Conditional Remix & Share Permitted

CC BY-NC

Data Management and Governance Glossary

Rating

A Claremont Graduate University EDUC 448 Fall 2021 Course Publication

Short Description:
This glossary is intended to support professionals who are seeking to understand Data Management and Governance in the context of K-12 and higher education. The definitions included in this ebook provide a fundamental understanding of common Data Management and Governance terms. This glossary was co-created by education professionals and graduate students enrolled in Claremont Graduate University’s EDUC 448: Data Management & Governance course taught by Dr. Gwen Garrison, PhD during the Fall 2021 semester.

Word Count: 2578

(Note: This resource's metadata has been created automatically by reformatting and/or combining the information that the author initially provided as part of a bulk import process.)

Subject:: Applied Science; Computer Science; Information Science
Material Type:: Textbook
Provider:: Claremont Colleges
Date Added:: 01/11/2021

Data Management with SQL for Ecologists

Unrestricted Use

CC BY

Data Management with SQL for Ecologists

Rating

Databases are useful for both storing and using data effectively. Using a relational database serves several purposes. It keeps your data separate from your analysis. This means there’s no risk of accidentally changing data when you analyze it. If we get new data we can rerun a query to find all the data that meets certain criteria. It’s fast, even for large amounts of data. It improves quality control of data entry (type constraints and use of forms in Access, Filemaker, etc.) The concepts of relational database querying are core to understanding how to do similar things using programming languages such as R or Python. This lesson will teach you what relational databases are, how you can load data into them and how you can query databases to extract just the information that you need.

Subject:: Applied Science; Computer Science; Information Science; Mathematics; Measurement and Data
Material Type:: Module
Provider:: The Carpentries
Author:: Christina Koch; Donal Heidenblad; Katy Felkner; Rémi Rampin; Timothée Poisot
Date Added:: 03/20/2017

Data Management with SQL for Social Scientists

Unrestricted Use

CC BY

Data Management with SQL for Social Scientists

Rating

This is an alpha lesson to teach Data Management with SQL for Social Scientists, We welcome and criticism, or error; and will take your feedback into account to improve both the presentation and the content. Databases are useful for both storing and using data effectively. Using a relational database serves several purposes. It keeps your data separate from your analysis. This means there’s no risk of accidentally changing data when you analyze it. If we get new data we can rerun a query to find all the data that meets certain criteria. It’s fast, even for large amounts of data. It improves quality control of data entry (type constraints and use of forms in Access, Filemaker, etc.) The concepts of relational database querying are core to understanding how to do similar things using programming languages such as R or Python. This lesson will teach you what relational databases are, how you can load data into them and how you can query databases to extract just the information that you need.

Subject:: Applied Science; Computer Science; Information Science; Mathematics; Measurement and Data; Social Science
Material Type:: Module
Provider:: The Carpentries
Author:: Peter Smyth
Date Added:: 08/07/2020

Data Organization in Spreadsheets for Ecologists

Unrestricted Use

CC BY

Data Organization in Spreadsheets for Ecologists

Rating

Good data organization is the foundation of any research project. Most researchers have data in spreadsheets, so it’s the place that many research projects start. We organize data in spreadsheets in the ways that we as humans want to work with the data, but computers require that data be organized in particular ways. In order to use tools that make computation more efficient, such as programming languages like R or Python, we need to structure our data the way that computers need the data. Since this is where most research projects start, this is where we want to start too! In this lesson, you will learn: Good data entry practices - formatting data tables in spreadsheets How to avoid common formatting mistakes Approaches for handling dates in spreadsheets Basic quality control and data manipulation in spreadsheets Exporting data from spreadsheets In this lesson, however, you will not learn about data analysis with spreadsheets. Much of your time as a researcher will be spent in the initial ‘data wrangling’ stage, where you need to organize the data to perform a proper analysis later. It’s not the most fun, but it is necessary. In this lesson you will learn how to think about data organization and some practices for more effective data wrangling. With this approach you can better format current data and plan new data collection so less data wrangling is needed.

Subject:: Applied Science; Computer Science; Information Science; Mathematics; Measurement and Data
Material Type:: Module
Provider:: The Carpentries
Author:: Christie Bahlai; Peter R. Hoyt; Tracy Teal
Date Added:: 03/20/2017

Data Organization in Spreadsheets for Social Scientists

Unrestricted Use

CC BY

Data Organization in Spreadsheets for Social Scientists

Rating

Lesson on spreadsheets for social scientists. Good data organization is the foundation of any research project. Most researchers have data in spreadsheets, so it’s the place that many research projects start. Typically we organize data in spreadsheets in ways that we as humans want to work with the data. However computers require data to be organized in particular ways. In order to use tools that make computation more efficient, such as programming languages like R or Python, we need to structure our data the way that computers need the data. Since this is where most research projects start, this is where we want to start too! In this lesson, you will learn: Good data entry practices - formatting data tables in spreadsheets How to avoid common formatting mistakes Approaches for handling dates in spreadsheets Basic quality control and data manipulation in spreadsheets Exporting data from spreadsheets In this lesson, however, you will not learn about data analysis with spreadsheets. Much of your time as a researcher will be spent in the initial ‘data wrangling’ stage, where you need to organize the data to perform a proper analysis later. It’s not the most fun, but it is necessary. In this lesson you will learn how to think about data organization and some practices for more effective data wrangling. With this approach you can better format current data and plan new data collection so less data wrangling is needed.

Subject:: Applied Science; Information Science; Mathematics; Measurement and Data; Social Science
Material Type:: Module
Provider:: The Carpentries
Author:: David Mawdsley; Erin Becker; François Michonneau; Karen Word; Lachlan Deer; Peter Smyth
Date Added:: 08/07/2020

Data Quality Control and Assurance

Unrestricted Use

Public Domain

Data Quality Control and Assurance

Rating

Quality assurance and quality control are phrases used to describe activities that prevent errors from entering or staying in a data set. These activities ensure the quality of the data before it is collected, entered, or analyzed, as well as actively monitoring and maintaining the quality of data throughout the study. In this lesson, we define and provide examples of quality assurance, quality control, data contamination and types of errors that may be found in data sets. After completing this lesson, participants will be able to describe best practices in quality assurance and quality control and relate them to different phases of data collection and entry.

Subject:: Applied Science; Education; Higher Education; Information Science
Material Type:: Lesson
Provider:: DataONE
Author:: DataONE Community Engagement & Outreach Working Group
Date Added:: 11/21/2020

Unrestricted Use

Public Domain

Data Sharing

Rating

When first sharing research data, researchers often raise questions about the value, benefits, and mechanisms for sharing. Many stakeholders and interested parties, such as funding agencies, communities, other researchers, or members of the public may be interested in research, results and related data. This lesson addresses data sharing in the context of the data life cycle, the value of sharing data, concerns about sharing data, and methods and best practices for sharing data.

Subject:: Applied Science; Education; Higher Education; Information Science
Material Type:: Lesson
Provider:: DataONE
Author:: DataONE Community Engagement & Outreach Working Group
Date Added:: 11/21/2020

Data Sharing, Mandates, and Repositories

Conditional Remix & Share Permitted

CC BY-NC

Data Sharing, Mandates, and Repositories

Rating

Some research funders have a mandate for data resulting from their funded research to be shared. This presentation provides a general definition of data sharing and how scholars can identify and follow data sharing mandates.

Subject:: Applied Science; Education; Higher Education; Information Science
Material Type:: Lecture
Author:: Kristy Padron
Date Added:: 11/22/2020

Data Training Engaging End-users

Conditional Remix & Share Permitted

CC BY-SA

Data Training Engaging End-users

Rating

Data Tree is a free online course with all you need to know for research data management, along with ways to engage and share data with business, policymakers, media and the wider public. The self-paced training course will take 15 to 20 hours to complete in eight structured modules. The course is packed with video, quizzes and real-life examples of data management, along with valuable tips from experts in data management, data sharing and science communication. The training course materials will be available for structured learning, but also to dip into for immediate problem solving.

Data Tree is funded by the Natural Environment Research Council (NERC) through the National Productivity Investment Fund (NPIF), delivered by the Institute for Environmental Analytics and Stats4SD and supported by the Institute of Physics.

Subject:: Applied Science; Information Science
Material Type:: Module; Primary Source
Date Added:: 05/16/2022

Data Wrangling and Processing for Genomics

Unrestricted Use

CC BY

Data Wrangling and Processing for Genomics

Rating

Data Carpentry lesson to learn how to use command-line tools to perform quality control, align reads to a reference genome, and identify and visualize between-sample variation. A lot of genomics analysis is done using command-line tools for three reasons: 1) you will often be working with a large number of files, and working through the command-line rather than through a graphical user interface (GUI) allows you to automate repetitive tasks, 2) you will often need more compute power than is available on your personal computer, and connecting to and interacting with remote computers requires a command-line interface, and 3) you will often need to customize your analyses, and command-line tools often enable more customization than the corresponding GUI tools (if in fact a GUI tool even exists). In a previous lesson, you learned how to use the bash shell to interact with your computer through a command line interface. In this lesson, you will be applying this new knowledge to carry out a common genomics workflow - identifying variants among sequencing samples taken from multiple individuals within a population. We will be starting with a set of sequenced reads (.fastq files), performing some quality control steps, aligning those reads to a reference genome, and ending by identifying and visualizing variations among these samples. As you progress through this lesson, keep in mind that, even if you aren’t going to be doing this same workflow in your research, you will be learning some very important lessons about using command-line bioinformatic tools. What you learn here will enable you to use a variety of bioinformatic tools with confidence and greatly enhance your research efficiency and productivity.

Subject:: Applied Science; Computer Science; Genetics; Information Science; Life Science; Mathematics; Measurement and Data
Material Type:: Module
Provider:: The Carpentries
Author:: Adam Thomas; Ahmed R. Hasan; Aniello Infante; Anita Schürch; Dev Paudel; Erin Alison Becker; Fotis Psomopoulos; François Michonneau; Gaius Augustus; Gregg TeHennepe; Jason Williams; Jessica Elizabeth Mizzi; Karen Cranston; Kari L Jordan; Kate Crosby; Kevin Weitemier; Lex Nederbragt; Luis Avila; Peter R. Hoyt; Rayna Michelle Harris; Ryan Peek; Sheldon John McKay; Sheldon McKay; Taylor Reiter; Tessa Pierce; Toby Hodges; Tracy Teal; Vasilis Lenis; Winni Kretzschmar; dbmarchant
Date Added:: 08/07/2020

Data Wrangling with R

Conditional Remix & Share Permitted

CC BY-NC-SA

Data Wrangling with R

Rating

Cleaning, reshaping, and transforming data for analysis and visualization, with R and the Tidyverse

Word Count: 3515

(Note: This resource's metadata has been created automatically by reformatting and/or combining the information that the author initially provided as part of a bulk import process.)

Subject:: Applied Science; Computer Science; Information Science; Mathematics; Social Science; Sociology; Statistics and Probability
Material Type:: Textbook
Provider:: College of DuPage Press, 2022
Author:: Christine Monnier
Date Added:: 07/13/2022

Data availability, reusability, and analytic reproducibility: evaluating the impact of a mandatory open data policy at the journal Cognition

Unrestricted Use

CC BY

Data availability, reusability, and analytic reproducibility: evaluating the impact of a mandatory open data policy at the journal Cognition

Rating

Access to data is a critical feature of an efficient, progressive and ultimately self-correcting scientific ecosystem. But the extent to which in-principle benefits of data sharing are realized in practice is unclear. Crucially, it is largely unknown whether published findings can be reproduced by repeating reported analyses upon shared data (‘analytic reproducibility’). To investigate this, we conducted an observational evaluation of a mandatory open data policy introduced at the journal Cognition. Interrupted time-series analyses indicated a substantial post-policy increase in data available statements (104/417, 25% pre-policy to 136/174, 78% post-policy), although not all data appeared reusable (23/104, 22% pre-policy to 85/136, 62%, post-policy). For 35 of the articles determined to have reusable data, we attempted to reproduce 1324 target values. Ultimately, 64 values could not be reproduced within a 10% margin of error. For 22 articles all target values were reproduced, but 11 of these required author assistance. For 13 articles at least one value could not be reproduced despite author assistance. Importantly, there were no clear indications that original conclusions were seriously impacted. Mandatory open data policies can increase the frequency and quality of data sharing. However, suboptimal data curation, unclear analysis specification and reporting errors can impede analytic reproducibility, undermining the utility of data sharing and the credibility of scientific findings.

Subject:: Applied Science; Information Science
Material Type:: Reading
Provider:: Royal Society Open Science
Author:: Alicia Hofelich Mohr; Bria Long; Elizabeth Clayton; Erica J. Yoon; George C. Banks; Gustav Nilsonne; Kyle MacDonald; Mallory C. Kidwell; Maya B. Mathur; Michael C. Frank; Michael Henry Tessler; Richie L. Lenne; Sara Altman; Tom E. Hardwicke
Date Added:: 08/07/2020

Database (08:01): Database Fundamentals

Only Sharing Permitted

CC BY-ND

Database (08:01): Database Fundamentals

Rating

The first video in our database lesson, part of the Introduction to Computer series.
This video looks at the basics of databases. We define database, as well as key terms to know.

Subject:: Applied Science; Business and Communication; Information Science
Material Type:: Lecture
Provider:: Mr. Ford's Class
Author:: Scott Ford
Date Added:: 09/26/2014

Database (08:02): Database Management Systems

Only Sharing Permitted

CC BY-ND

Database (08:02): Database Management Systems

Rating

Database Management Systems is the software that allows us to create and use a database. This video looks at the DBMS, their functions, some examples of popular software solutions and a quick look at Structured Query Language (SQL)

Subject:: Applied Science; Business and Communication; Information Science
Material Type:: Lecture
Provider:: Mr. Ford's Class
Author:: Scott Ford
Date Added:: 09/26/2014

Database (08:03): Database Models

Only Sharing Permitted

CC BY-ND

Database (08:03): Database Models

Rating

The database management software is the program used to create and mange the database. The database model is the architecture the DBMS used to store objects within that database.

Subject:: Applied Science; Business and Communication; Information Science
Material Type:: Lecture
Provider:: Mr. Ford's Class
Author:: Scott Ford
Date Added:: 09/26/2014

Database (08:04): Some Final Bits

Only Sharing Permitted

CC BY-ND

Database (08:04): Some Final Bits

Rating

Our final database video. This one looks at some odds and ends. We examine: Data Warehouse, Data Mining, Big Data. I also talk about the ethics of data mining from the NSA and CDC, and how they are different.

We also give out top picks for the lesson.

Links from Video:
•http://www.w3schools.com/sql/
•What is Database & SQL by Guru99 http://youtu.be/FR4QIeZaPeM
•What is a database http://youtu.be/t8jgX1f8kc4
•MySQL Database For Beginners https://www.udemy.com/mysql-database-for-beginners2/

Subject:: Applied Science; Business and Communication; Information Science
Material Type:: Lecture
Provider:: Mr. Ford's Class
Author:: Scott Ford
Date Added:: 09/26/2014