OER Commons

Data Organization in Spreadsheets for Ecologists

Unrestricted Use

CC BY

Good data organization is the foundation of any research project. Most researchers have data in spreadsheets, so it’s the place that many research projects start. We organize data in spreadsheets in the ways that we as humans want to work with the data, but computers require that data be organized in particular ways. In order to use tools that make computation more efficient, such as programming languages like R or Python, we need to structure our data the way that computers need the data. Since this is where most research projects start, this is where we want to start too!

In this lesson, you will learn:
Good data entry practices - formatting data tables in spreadsheets
How to avoid common formatting mistakes
Approaches for handling dates in spreadsheets
Basic quality control and data manipulation in spreadsheets
Exporting data from spreadsheets

In this lesson, however, you will not learn about data analysis with spreadsheets. Much of your time as a researcher will be spent in the initial ‘data wrangling’ stage, where you need to organize the data to perform a proper analysis later. It’s not the most fun, but it is necessary. In this lesson you will learn how to think about data organization and some practices for more effective data wrangling. With this approach you can better format current data and plan new data collection so less data wrangling is needed.

Subject:: Applied Science; Computer Science; Information Science; Mathematics; Measurement and Data
Material Type:: Module
Provider:: The Carpentries
Author:: Christie Bahlai; Peter R. Hoyt; Tracy Teal
Date Added:: 03/20/2017

Data Wrangling and Processing for Genomics

Unrestricted Use

CC BY

Data Carpentry lesson to learn how to use command-line tools to perform quality control, align reads to a reference genome, and identify and visualize between-sample variation. A lot of genomics analysis is done using command-line tools for three reasons: 1) you will often be working with a large number of files, and working through the command-line rather than through a graphical user interface (GUI) allows you to automate repetitive tasks, 2) you will often need more compute power than is available on your personal computer, and connecting to and interacting with remote computers requires a command-line interface, and 3) you will often need to customize your analyses, and command-line tools often enable more customization than the corresponding GUI tools (if in fact a GUI tool even exists). In a previous lesson, you learned how to use the bash shell to interact with your computer through a command line interface. In this lesson, you will be applying this new knowledge to carry out a common genomics workflow - identifying variants among sequencing samples taken from multiple individuals within a population. We will be starting with a set of sequenced reads (.fastq files), performing some quality control steps, aligning those reads to a reference genome, and ending by identifying and visualizing variations among these samples. As you progress through this lesson, keep in mind that, even if you aren’t going to be doing this same workflow in your research, you will be learning some very important lessons about using command-line bioinformatic tools. What you learn here will enable you to use a variety of bioinformatic tools with confidence and greatly enhance your research efficiency and productivity.

Subject:: Applied Science; Computer Science; Genetics; Information Science; Life Science; Mathematics; Measurement and Data
Material Type:: Module
Provider:: The Carpentries
Author:: Adam Thomas; Ahmed R. Hasan; Aniello Infante; Anita Schürch; Dev Paudel; Erin Alison Becker; Fotis Psomopoulos; François Michonneau; Gaius Augustus; Gregg TeHennepe; Jason Williams; Jessica Elizabeth Mizzi; Karen Cranston; Kari L Jordan; Kate Crosby; Kevin Weitemier; Lex Nederbragt; Luis Avila; Peter R. Hoyt; Rayna Michelle Harris; Ryan Peek; Sheldon John McKay; Sheldon McKay; Taylor Reiter; Tessa Pierce; Toby Hodges; Tracy Teal; Vasilis Lenis; Winni Kretzschmar; dbmarchant
Date Added:: 08/07/2020

Project Organization and Management for Genomics

Unrestricted Use

CC BY

Data Carpentry Genomics workshop lesson to learn how to structure your metadata, organize and document your genomics data and bioinformatics workflow, and access data on the NCBI sequence read archive (SRA) database. Good data organization is the foundation of any research project. It not only sets you up well for an analysis, but it also makes it easier to come back to the project later and share with collaborators, including your most important collaborator - future you. Organizing a project that includes sequencing involves many components. There’s the experimental setup and conditions metadata, measurements of experimental parameters, sequencing preparation and sample information, the sequences themselves and the files and workflow of any bioinformatics analysis. So much of the information of a sequencing project is digital, and we need to keep track of our digital records in the same way we have a lab notebook and sample freezer. In this lesson, we’ll go through the project organization and documentation that will make an efficient bioinformatics workflow possible. Not only will this make you a more effective bioinformatics researcher, it also prepares your data and project for publication, as grant agencies and publishers increasingly require this information.

In this lesson, we’ll be using data from a study of experimental evolution using E. coli. More information about this dataset is available here. In this study there are several types of files:
Spreadsheet data from the experiment that tracks the strains and their phenotype over time
Spreadsheet data with information on the samples that were sequenced - the names of the samples, how they were prepared and the sequencing conditions
The sequence data

Throughout the analysis, we’ll also generate files from the steps in the bioinformatics pipeline and documentation on the tools and parameters that we used.

In this lesson you will learn:
How to structure your metadata, tabular data and information about the experiment. The metadata is the information about the experiment and the samples you’re sequencing.
How to prepare for, understand, organize and store the sequencing data that comes back from the sequencing center
How to access and download publicly available data that may need to be used in your bioinformatics analysis
The concepts of organizing the files and documenting the workflow of your bioinformatics analysis

Subject:: Business and Communication; Genetics; Life Science; Management
Material Type:: Module
Provider:: The Carpentries
Author:: Amanda Charbonneau; Bérénice Batut; Daniel O. S. Ouso; Deborah Paul; Erin Alison Becker; François Michonneau; Jason Williams; Juan A. Ugalde; Kevin Weitemier; Laura Williams; Paula Andrea Martinez; Peter R. Hoyt; Rayna Michelle Harris; Taylor Reiter; Toby Hodges; Tracy Teal
Date Added:: 08/07/2020

Unrestricted Use

CC BY

The Unix Shell

Software Carpentry lesson on how to use the shell to navigate the filesystem and write simple loops and scripts. The Unix shell has been around longer than most of its users have been alive. It has survived so long because it’s a power tool that allows people to do complex things with just a few keystrokes. More importantly, it helps them combine existing programs in new ways and automate repetitive tasks so they aren’t typing the same things over and over again. Use of the shell is fundamental to using a wide range of other powerful tools and computing resources (including “high-performance computing” supercomputers). These lessons will start you on a path towards using these resources effectively.

Subject:: Applied Science; Computer Science; Mathematics; Measurement and Data
Material Type:: Module
Provider:: The Carpentries
Author:: Adam Huffman; Adam James Orr; Adam Richie-Halford; AidaMirsalehi; Alex Kassil; Alex Mac; Alexander Konovalov; Alexander Morley; Alix Keener; Amy Brown; Andrea Bedini; Andrew Boughton; Andrew Reid; Andrew T. T. McRae; Andrew Walker; Ariel Rokem; Armin Sobhani; Ashwin Srinath; Bagus Tris Atmaja; Bartosz Telenczuk; Ben Bolker; Benjamin Gabriel; Bertie Seyffert; Bill Mills; Brian Ballsun-Stanton; BrianBill; Camille Marini; Chris Mentzel; Christina Koch; Colin Morris; Colin Sauze; Damien Irving; Dan Jones; Dana Brunson; Daniel Baird; Daniel McCloy; Daniel Standage; Danielle M. Nielsen; Dave Bridges; David Eyers; David McKain; David Vollmer; Dean Attali; Devinsuit; Dmytro Lituiev; Donny Winston; Doug Latornell; Dustin Lang; Elena Denisenko; Emily Dolson; Emily Jane McTavish; Eric Jankowski; Erin Alison Becker; Ethan P White; Evgenij Belikov; Farah Shamma; Fatma Deniz; Filipe Fernandes; Francis Gacenga; François Michonneau; Gabriel A. Devenyi; Gerard Capes; Giuseppe Profiti; Greg Wilson; Halle Burns; Hannah Burkhardt; Harriet Alexander; Hugues Fontenelle; Ian van der Linde; Inigo Aldazabal Mensa; Jackie Milhans; Jake Cowper Szamosi; James Guelfi; Jan T. Kim; Jarek Bryk; Jarno Rantaharju; Jason Macklin; Jay van Schyndel; Jens vdL; John Blischak; John Pellman; John Simpson; Jonah Duckles; Jonny Williams; Joshua Madin; Kai Blin; Kathy Chung; Katrin Leinweber; Kevin M. Buckley; Kirill Palamartchouk; Klemens Noga; Kristopher Keipert; Kunal Marwaha; Laurence; Lee Zamparo; Lex Nederbragt; M Carlise; Mahdi Sadjadi; Marc Rajeev Gouw; Marcel Stimberg; Maria Doyle; Marie-Helene Burle; Marisa Lim; Mark Mandel; Martha Robinson; Martin Feller; Matthew Gidden; Matthew Peterson; Megan Fritz; Michael Zingale; Mike Henry; Mike Jackson; Morgan Oneka; Murray Hoggett; Nicola Soranzo; Nicolas Barral; Noah D Brenowitz; Noam Ross; Norman Gray; Orion Buske; Owen Kaluza; Patrick McCann; Paul Gardner; Pauline Barmby; Peter R. Hoyt; Peter Steinbach; Philip Lijnzaad; Phillip Doehle; Piotr Banaszkiewicz; Rafi Ullah; Raniere Silva; Robert A Beagrie; Ruud Steltenpool; Ry4an Brase; Rémi Emonet; Sarah Mount; Sarah Simpkin; Scott Ritchie; Stephan Schmeing; Stephen Jones; Stephen Turner; Steve Leak; Stéphane Guillou; Susan Miller; Thomas Mellan; Tim Keighley; Tobin Magle; Tom Dowrick; Trevor Bekolay; Varda F. Hagh; Victor Koppejan; Vikram Chhatre; Yee Mey; csqrs; earkpr; ekaterinailin; nther; reshama shaikh; s-boardman; sjnair
Date Added:: 03/20/2017
