Search Resources

4 Results

Data Organization in Spreadsheets for Ecologists
Unrestricted Use
CC BY

Good data organization is the foundation of any research project. Most researchers have data in spreadsheets, so it’s the place that many research projects start. We organize data in spreadsheets in the ways that we as humans want to work with the data, but computers require that data be organized in particular ways. In order to use tools that make computation more efficient, such as programming languages like R or Python, we need to structure our data the way that computers need the data. Since this is where most research projects start, this is where we want to start too!

In this lesson, you will learn:
- Good data entry practices - formatting data tables in spreadsheets
- How to avoid common formatting mistakes
- Approaches for handling dates in spreadsheets
- Basic quality control and data manipulation in spreadsheets
- Exporting data from spreadsheets

In this lesson, however, you will not learn about data analysis with spreadsheets.

Much of your time as a researcher will be spent in the initial ‘data wrangling’ stage, where you need to organize the data to perform a proper analysis later. It’s not the most fun, but it is necessary. In this lesson you will learn how to think about data organization and some practices for more effective data wrangling. With this approach you can better format current data and plan new data collection so less data wrangling is needed.
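
As a rough illustration of two of the topics listed above (handling dates and working with exported data), here is a minimal Python sketch. It assumes a sheet has been exported to CSV with an ISO-formatted date column, and splits that column into separate year, month and day columns. The column names and the sample rows are hypothetical, and this is only one possible approach, not necessarily the one the lesson teaches.

```python
import csv
import io
from datetime import date

# Hypothetical CSV text, standing in for a sheet exported from a spreadsheet.
EXPORTED = """plot_id,date_collected,species,weight_g
2,2014-07-16,NL,218
3,2014-07-16,DM,40
"""

def split_dates(csv_text, date_column="date_collected"):
    """Return rows with the ISO date column replaced by year, month, day."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Parse the YYYY-MM-DD string and remove the original column.
        parsed = date.fromisoformat(row.pop(date_column))
        row.update(year=parsed.year, month=parsed.month, day=parsed.day)
        rows.append(row)
    return rows

rows = split_dates(EXPORTED)
for row in rows:
    print(row)
```

Storing the date parts as plain numbers avoids depending on how a particular spreadsheet program interprets or displays a date cell.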

Subject:
Applied Science
Computer Science
Information Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Christie Bahlai
Peter R. Hoyt
Tracy Teal
Date Added:
03/20/2017
Data Wrangling and Processing for Genomics
Unrestricted Use
CC BY

Data Carpentry lesson to learn how to use command-line tools to perform quality control, align reads to a reference genome, and identify and visualize between-sample variation.

A lot of genomics analysis is done using command-line tools for three reasons: 1) you will often be working with a large number of files, and working through the command-line rather than through a graphical user interface (GUI) allows you to automate repetitive tasks, 2) you will often need more compute power than is available on your personal computer, and connecting to and interacting with remote computers requires a command-line interface, and 3) you will often need to customize your analyses, and command-line tools often enable more customization than the corresponding GUI tools (if in fact a GUI tool even exists).

In a previous lesson, you learned how to use the bash shell to interact with your computer through a command line interface. In this lesson, you will be applying this new knowledge to carry out a common genomics workflow - identifying variants among sequencing samples taken from multiple individuals within a population. We will be starting with a set of sequenced reads (.fastq files), performing some quality control steps, aligning those reads to a reference genome, and ending by identifying and visualizing variations among these samples.

As you progress through this lesson, keep in mind that, even if you aren’t going to be doing this same workflow in your research, you will be learning some very important lessons about using command-line bioinformatic tools. What you learn here will enable you to use a variety of bioinformatic tools with confidence and greatly enhance your research efficiency and productivity.
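
To make the quality control step on .fastq files more concrete, here is a minimal Python sketch. The lesson itself uses command-line tools, so this is an illustration of the idea only. It assumes the common four-line FASTQ record layout and Phred+33 quality encoding; the two reads are made up, and the cutoff of 20 is an arbitrary value chosen for the example.

```python
import io

# Two made-up reads in FASTQ format (4 lines per record); not real data.
FASTQ_TEXT = """@read1
ACGT
+
IIII
@read2
ACGT
+
!!!!
"""

def read_fastq(handle):
    """Yield (name, sequence, quality_string) for each 4-line record."""
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of input
        sequence = handle.readline().strip()
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().strip()
        yield header[1:], sequence, quality

def mean_quality(quality, offset=33):
    """Mean Phred score, assuming Phred+33 encoded quality characters."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)

scores = {name: mean_quality(q) for name, _, q in read_fastq(io.StringIO(FASTQ_TEXT))}
# Keep only reads whose mean quality meets the (arbitrary) cutoff of 20.
passing = [name for name, score in scores.items() if score >= 20]
print(scores)
print(passing)
```

Dedicated quality control tools report far more than a per-read mean, but the sketch shows what the quality line of each record encodes.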

Subject:
Applied Science
Computer Science
Genetics
Information Science
Life Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Adam Thomas
Ahmed R. Hasan
Aniello Infante
Anita Schürch
Dev Paudel
Erin Alison Becker
Fotis Psomopoulos
François Michonneau
Gaius Augustus
Gregg TeHennepe
Jason Williams
Jessica Elizabeth Mizzi
Karen Cranston
Kari L Jordan
Kate Crosby
Kevin Weitemier
Lex Nederbragt
Luis Avila
Peter R. Hoyt
Rayna Michelle Harris
Ryan Peek
Sheldon John McKay
Sheldon McKay
Taylor Reiter
Tessa Pierce
Toby Hodges
Tracy Teal
Vasilis Lenis
Winni Kretzschmar
dbmarchant
Date Added:
08/07/2020
Project Organization and Management for Genomics
Unrestricted Use
CC BY

Data Carpentry Genomics workshop lesson to learn how to structure your metadata, organize and document your genomics data and bioinformatics workflow, and access data on the NCBI Sequence Read Archive (SRA) database.

Good data organization is the foundation of any research project. It not only sets you up well for an analysis, but it also makes it easier to come back to the project later and share with collaborators, including your most important collaborator - future you.

Organizing a project that includes sequencing involves many components. There’s the experimental setup and conditions metadata, measurements of experimental parameters, sequencing preparation and sample information, the sequences themselves and the files and workflow of any bioinformatics analysis. So much of the information of a sequencing project is digital, and we need to keep track of our digital records in the same way we have a lab notebook and sample freezer. In this lesson, we’ll go through the project organization and documentation that will make an efficient bioinformatics workflow possible. Not only will this make you a more effective bioinformatics researcher, it also prepares your data and project for publication, as grant agencies and publishers increasingly require this information.

In this lesson, we’ll be using data from a study of experimental evolution using E. coli. More information about this dataset is available here. In this study there are several types of files:
- Spreadsheet data from the experiment that tracks the strains and their phenotype over time
- Spreadsheet data with information on the samples that were sequenced - the names of the samples, how they were prepared and the sequencing conditions
- The sequence data

Throughout the analysis, we’ll also generate files from the steps in the bioinformatics pipeline and documentation on the tools and parameters that we used.

In this lesson you will learn:
- How to structure your metadata, tabular data and information about the experiment. The metadata is the information about the experiment and the samples you’re sequencing.
- How to prepare for, understand, organize and store the sequencing data that comes back from the sequencing center
- How to access and download publicly available data that may need to be used in your bioinformatics analysis
- The concepts of organizing the files and documenting the workflow of your bioinformatics analysis

Subject:
Business and Communication
Genetics
Life Science
Management
Material Type:
Module
Provider:
The Carpentries
Author:
Amanda Charbonneau
Bérénice Batut
Daniel O. S. Ouso
Deborah Paul
Erin Alison Becker
François Michonneau
Jason Williams
Juan A. Ugalde
Kevin Weitemier
Laura Williams
Paula Andrea Martinez
Peter R. Hoyt
Rayna Michelle Harris
Taylor Reiter
Toby Hodges
Tracy Teal
Date Added:
08/07/2020
The Unix Shell
Unrestricted Use
CC BY

Software Carpentry lesson on how to use the shell to navigate the filesystem and write simple loops and scripts. The Unix shell has been around longer than most of its users have been alive. It has survived so long because it’s a power tool that allows people to do complex things with just a few keystrokes. More importantly, it helps them combine existing programs in new ways and automate repetitive tasks so they aren’t typing the same things over and over again. Use of the shell is fundamental to using a wide range of other powerful tools and computing resources (including “high-performance computing” supercomputers). These lessons will start you on a path towards using these resources effectively.

Subject:
Applied Science
Computer Science
Mathematics
Measurement and Data
Material Type:
Module
Provider:
The Carpentries
Author:
Adam Huffman
Adam James Orr
Adam Richie-Halford
AidaMirsalehi
Alex Kassil
Alex Mac
Alexander Konovalov
Alexander Morley
Alix Keener
Amy Brown
Andrea Bedini
Andrew Boughton
Andrew Reid
Andrew T. T. McRae
Andrew Walker
Ariel Rokem
Armin Sobhani
Ashwin Srinath
Bagus Tris Atmaja
Bartosz Telenczuk
Ben Bolker
Benjamin Gabriel
Bertie Seyffert
Bill Mills
Brian Ballsun-Stanton
BrianBill
Camille Marini
Chris Mentzel
Christina Koch
Colin Morris
Colin Sauze
Damien Irving
Dan Jones
Dana Brunson
Daniel Baird
Daniel McCloy
Daniel Standage
Danielle M. Nielsen
Dave Bridges
David Eyers
David McKain
David Vollmer
Dean Attali
Devinsuit
Dmytro Lituiev
Donny Winston
Doug Latornell
Dustin Lang
Elena Denisenko
Emily Dolson
Emily Jane McTavish
Eric Jankowski
Erin Alison Becker
Ethan P White
Evgenij Belikov
Farah Shamma
Fatma Deniz
Filipe Fernandes
Francis Gacenga
François Michonneau
Gabriel A. Devenyi
Gerard Capes
Giuseppe Profiti
Greg Wilson
Halle Burns
Hannah Burkhardt
Harriet Alexander
Hugues Fontenelle
Ian van der Linde
Inigo Aldazabal Mensa
Jackie Milhans
Jake Cowper Szamosi
James Guelfi
Jan T. Kim
Jarek Bryk
Jarno Rantaharju
Jason Macklin
Jay van Schyndel
Jens vdL
John Blischak
John Pellman
John Simpson
Jonah Duckles
Jonny Williams
Joshua Madin
Kai Blin
Kathy Chung
Katrin Leinweber
Kevin M. Buckley
Kirill Palamartchouk
Klemens Noga
Kristopher Keipert
Kunal Marwaha
Laurence
Lee Zamparo
Lex Nederbragt
M Carlise
Mahdi Sadjadi
Marc Rajeev Gouw
Marcel Stimberg
Maria Doyle
Marie-Helene Burle
Marisa Lim
Mark Mandel
Martha Robinson
Martin Feller
Matthew Gidden
Matthew Peterson
Megan Fritz
Michael Zingale
Mike Henry
Mike Jackson
Morgan Oneka
Murray Hoggett
Nicola Soranzo
Nicolas Barral
Noah D Brenowitz
Noam Ross
Norman Gray
Orion Buske
Owen Kaluza
Patrick McCann
Paul Gardner
Pauline Barmby
Peter R. Hoyt
Peter Steinbach
Philip Lijnzaad
Phillip Doehle
Piotr Banaszkiewicz
Rafi Ullah
Raniere Silva
Robert A Beagrie
Ruud Steltenpool
Ry4an Brase
Rémi Emonet
Sarah Mount
Sarah Simpkin
Scott Ritchie
Stephan Schmeing
Stephen Jones
Stephen Turner
Steve Leak
Stéphane Guillou
Susan Miller
Thomas Mellan
Tim Keighley
Tobin Magle
Tom Dowrick
Trevor Bekolay
Varda F. Hagh
Victor Koppejan
Vikram Chhatre
Yee Mey
csqrs
earkpr
ekaterinailin
nther
reshama shaikh
s-boardman
sjnair
Date Added:
03/20/2017