Data pattern recognition exercise where students compare the two daily datasets (for one month) to find relationships.
- Material Type:
- Activity/Lab
- Lesson Plan
- Provider:
- Lowell High School
- Author:
- Mark Wenning
- Date Added:
- 06/15/2011
STUDENT ACTIVITY - 4th - NC. This is a distance-learning lesson students can complete at home. Students will explore and record data about different plants that they observe, then create a bar graph to reflect the data they collected. This activity was created by Out Teach (out-teach.org), a nonprofit providing outdoor experiential learning to transform science education for students in under-served communities.
In this lesson, students will explore and record data about different plants that they observe. Students will then create a bar graph to reflect the data that they collected. For example, a student might find 7 tomato plants, 5 cabbages, and 4 squash plants; their graph would reflect these numbers.
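A minimal sketch of the kind of bar graph the lesson asks for, using the example counts from the description above (7 tomato plants, 5 cabbages, 4 squash plants); the chart labels are our own choices for illustration:

```python
# Bar graph of plant counts, using the example numbers from the lesson description.
import matplotlib.pyplot as plt

plants = ["Tomato", "Cabbage", "Squash"]   # plant types a student observed
counts = [7, 5, 4]                         # number of each plant counted

plt.bar(plants, counts, color="seagreen")
plt.xlabel("Plant type")
plt.ylabel("Number of plants observed")
plt.title("Plants observed in the garden")
plt.show()
```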
This lesson, part of Software Carpentry workshops, is an introduction to programming in Python for people with little or no previous programming experience. It uses plotting as its motivating example, and is designed to be used in both Data Carpentry and Software Carpentry workshops. This lesson references JupyterLab, but can be taught using a regular Python interpreter as well. Please note that this lesson uses Python 3 rather than Python 2.
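To give a flavor of the kind of plotting the lesson introduces (this is not the lesson's own example; the data values below are made up for illustration):

```python
# A first plot in Python 3 with matplotlib; the values are placeholders.
import matplotlib.pyplot as plt

years = [2000, 2005, 2010, 2015, 2020]       # hypothetical x values
measurements = [2.1, 2.6, 3.4, 4.0, 4.8]     # hypothetical y values

plt.plot(years, measurements, marker="o")
plt.xlabel("Year")
plt.ylabel("Measurement")
plt.title("A simple line plot")
plt.show()
```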
Objective: To investigate the replication validity of biomedical association studies covered by newspapers.
Methods: We used a database of 4723 primary studies included in 306 meta-analysis articles. These studies associated a risk factor with a disease in three biomedical domains: psychiatry, neurology, and four somatic diseases. They were classified into a lifestyle category (e.g. smoking) and a non-lifestyle category (e.g. genetic risk). Using the database Dow Jones Factiva, we investigated the newspaper coverage of each study. Their replication validity was assessed by comparison with their corresponding meta-analyses.
Results: Among the 5029 articles of our database, 156 primary studies (of which 63 were lifestyle studies) and 5 meta-analysis articles were reported in 1561 newspaper articles. The percentage of covered studies and the number of newspaper articles per study strongly increased with the impact factor of the journal that published each scientific study. Newspapers almost equally covered initial (5/39, 12.8%) and subsequent (58/600, 9.7%) lifestyle studies. In contrast, initial non-lifestyle studies were covered more often (48/366, 13.1%) than subsequent ones (45/3718, 1.2%). Newspapers never covered initial studies reporting null findings and rarely reported subsequent null observations. Only 48.7% of the 156 studies reported by newspapers were confirmed by the corresponding meta-analyses. Initial non-lifestyle studies were less often confirmed (16/48) than subsequent ones (29/45) and than lifestyle studies (31/63). Psychiatric studies covered by newspapers were less often confirmed (10/38) than the neurological (26/41) or somatic (40/77) ones. This correlates with an even larger coverage of initial studies in psychiatry. Whereas 234 newspaper articles covered the 35 initial studies that were later disconfirmed, only four press articles covered a subsequent null finding and mentioned the refutation of an initial claim.
Conclusion: Journalists preferentially cover initial findings although these are often contradicted by meta-analyses, and rarely inform the public when they are disconfirmed.
The Journal of Physiology and British Journal of Pharmacology jointly published an editorial series in 2011 to improve standards in statistical reporting and data analysis. It is not known whether reporting practices changed in response to the editorial advice. We conducted a cross-sectional analysis of reporting practices in a random sample of research papers published in these journals before (n = 202) and after (n = 199) publication of the editorial advice. Descriptive data are presented. There was no evidence that reporting practices improved following publication of the editorial advice. Overall, 76-84% of papers with written measures that summarized data variability used standard errors of the mean, and 90-96% of papers did not report exact p-values for primary analyses and post-hoc tests. 76-84% of papers that plotted measures to summarize data variability used standard errors of the mean, and only 2-4% of papers plotted raw data used to calculate variability. Of papers that reported p-values between 0.05 and 0.1, 56-63% interpreted these as trends or statistically significant. Implied or gross spin was noted incidentally in papers before (n = 10) and after (n = 9) the editorial advice was published. Overall, poor statistical reporting, inadequate data presentation and spin were present before and after the editorial advice was published. While the scientific community continues to implement strategies for improving reporting practices, our results indicate stronger incentives or enforcements are needed.
The Portage Network offers a range of training materials – everything from one-page guides to online training modules and videos – that span the research data life cycle.
With the assistance of the Portage National Training Expert Group, the Portage Network of Experts continues to develop new bilingual training aids and online modules to support a community of practice for research data management in Canada.
These materials are intended for researchers, library data specialists, research data managers, and discipline and functional experts across the research data landscape. All training resources created by Portage are licensed under CC BY-NC 4.0 and are free to share and adapt for your own needs.
If you have questions about developing RDM training at your institution or would like assistance with creating in-person or online training resources or opportunities, please contact RDM-GDR@alliancecan.ca.
This activity focuses on applying analytic tools such as pie charts and bar graphs to gain a better understanding of practical energy use issues. It also provides experience with how the type of data collected affects the results produced by statistical visualization tools.
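A minimal sketch of the kind of charts the activity uses, with hypothetical household energy-use figures (kWh per month) invented purely for illustration:

```python
# Pie chart and bar graph of hypothetical monthly household energy use.
import matplotlib.pyplot as plt

categories = ["Heating", "Cooling", "Lighting", "Appliances", "Other"]
kwh_per_month = [320, 180, 60, 240, 90]   # made-up values for illustration

fig, (ax_pie, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Pie chart: share of total energy use by category
ax_pie.pie(kwh_per_month, labels=categories, autopct="%1.0f%%")
ax_pie.set_title("Share of monthly energy use")

# Bar graph: absolute energy use by category
ax_bar.bar(categories, kwh_per_month)
ax_bar.set_ylabel("kWh per month")
ax_bar.set_title("Monthly energy use by category")

plt.tight_layout()
plt.show()
```

The same underlying data drives both charts, which is the point of the activity: the pie chart emphasizes proportions while the bar graph emphasizes absolute amounts.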
The goals of this course are to give students a greater knowledge and understanding of mathematics and to help students develop new skills and concepts and enhance their problem-solving ability, all of which are necessary for the study of a science- and engineering-oriented calculus. MAT 196 is also designed to help students further develop and extend their critical thinking skills in a contextualized environment. You will achieve this goal by applying strategies presented by the instructor, which are designed to help you interpret, analyze, evaluate, infer, and synthesize concepts studied in preparation for Calculus.
Computational analyses are playing an increasingly central role in research. Journals, funders, and researchers are calling for published research to include associated data and code. However, many involved in research have not received training in best practices and tools for sharing code and data. This course aims to address this gap in training while also providing those who support researchers with curated best-practices guidance and tools. This course is unique compared to other reproducibility courses due to its practical, step-by-step design. It consists of hands-on exercises to prepare research code and data for computationally reproducible publication. Although the course starts with some brief introductory information about computational reproducibility, the bulk of the course is guided work with data and code. Participants move through preparing research for reuse, organization, documentation, automation, and submitting their code and data to share. Tools that support reproducibility will be introduced (Code Ocean), but all lessons will be platform agnostic.
Level: Intermediate
Intended audience: The course is targeted at researchers and research support staff who are involved in the preparation and publication of research materials. Anyone with an interest in reproducible publication is welcome. The course is especially useful for those looking to learn practical steps for improving the computational reproducibility of their own research.
This video shows interested researchers how to get started on their own preregistration as part of the Preregistration Challenge. Learn how to create a new draft, find example preregistrations from different fields, respond to comments from the preregistration review team, and turn your final draft into a formal preregistration. For more information, check out https://www.cos.io/initiatives/prereg-more-information.
In this webinar Professor Brian Nosek, Executive Director of the Center for Open Science (https://cos.io), outlines the practice of Preregistration and how it can aid in increasing the rigor and reproducibility of research. The webinar is co-hosted by the Health Research Alliance, a collaborative member organization of nonprofit research funders. Slides available at: https://osf.io/9m6tx/
Hear from Andrew Foster, editor at the Journal of Development Economics, and Irenaeus Wolff, a guest editor for Experimental Economics, as they discuss their experiences with implementing the Registered Reports format, how it was received by authors, and the trends they noticed after adoption. Aleksandar Bogdanoski of BITSS also joins us to explore pre-results review, how to facilitate the process at journals, and best practices for supporting authors and reviewers.
A collection of slides for virtually all presentations given by Center for Open Science staff since its founding in 2013.
How researchers structure their data varies by disciplines and research questions. Still, there are general guidelines for structuring data that make it more likely to be usable in the future. The following questions should be considered for any project that gathers data. These questions should be considered first at the planning stage, again as data is being gathered and stored, and once more prior to final deposit into a digital archive or repository.
1. What are the data organization standards for your field? For example, there are often standards for labeling data fields that will make your data machine-readable. There may also be specific variables and coding guidelines that you can use that will make your work interoperable with other datasets. Lastly, there may be accepted hierarchies and directory structures in your discipline that you can build upon.
2. What are the data export options in the software you are using? If using proprietary and/or highly specialized software to analyze large data sets, export the data in a format that is likely to be supported in the future, and that will be accessible from other software programs. This usually means choosing an open, non-proprietary format. Remember that you may not have access to the same software in the future, and not all software upgrades can read old file types; a short export sketch follows this list.
3. What forms of the data will be needed for future access? Consider the various forms the data may take, and the scale of the data involved. You may need to preserve not only the underlying raw data, but also the resulting analyses you have created from it.
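To illustrate question 2, here is a minimal sketch of exporting a table of results to open formats, assuming Python with pandas is available; the file names and columns are placeholders, not from any real project:

```python
# Export tabular data to open formats (CSV and JSON) so it stays readable
# without the original analysis software. Names and values are placeholders.
import pandas as pd

# A small example table standing in for exported analysis results
results = pd.DataFrame({
    "site_id": ["A1", "A2", "A3"],
    "measurement": [12.4, 15.1, 9.8],
    "units": ["mg/L", "mg/L", "mg/L"],
})

# Plain-text CSV: widely supported and human-readable
results.to_csv("results_export.csv", index=False)

# JSON: also open and self-describing, useful for nested records
results.to_json("results_export.json", orient="records", indent=2)
```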
14. Brave New World: Privacy, Data Sharing and Evidence Based Policy Making
The trifecta of globalization, urbanization, and digitization has created new opportunities and challenges across our nation, cities, boroughs, and urban centers. Cities in particular are in a unique position at the center of commerce and technology, becoming hubs for innovation and practical application of emerging technology. In this rapidly changing, 24/7 digitized world, governments are leveraging innovation and technology to become more effective, efficient, and transparent, and to better plan for and anticipate the needs of their citizens, businesses, and community organizations. This class will provide a framework for how cities and communities can use technology to become smarter, more accessible, and more connected.
A work in progress, this FlexBook is an introduction to theoretical probability and data organization. Students learn about events, conditions, random variables, and graphs and tables that allow them to manage data.
The best way to learn how to program is to do something useful, so this introduction to MATLAB is built around a common scientific task: data analysis. Our real goal isn’t to teach you MATLAB, but to teach you the basic concepts that all programming depends on. We use MATLAB in our lessons because: we have to use something for examples; it’s well-documented; it has a large (and growing) user base among scientists in academia and industry; and it has a large library of packages available for performing diverse tasks. But the two most important things are to use whatever language your colleagues are using, so that you can share your work with them easily, and to use that language well.
The best way to learn how to program is to do something useful, so this introduction to Python is built around a common scientific task: data analysis.
Arthritis Inflammation
We are studying inflammation in patients who have been given a new treatment for arthritis, and need to analyze the first dozen data sets of their daily inflammation. The data sets are stored in comma-separated values (CSV) format: each row holds information for a single patient, and the columns represent successive days. The first three rows of our first file look like this:
0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
Each number represents the number of inflammation bouts that a particular patient experienced on a given day. For example, the value "6" at row 3, column 7 of the data set above means that the third patient was experiencing inflammation six times on the seventh day of the clinical study. So, we want to:
- calculate the average inflammation per day across all patients, and
- plot the result to discuss and share with colleagues.
To do all that, we'll have to learn a little bit about programming.
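A compact sketch of the analysis described above, assuming one data file has been saved locally (the file name inflammation-01.csv is an assumption for illustration):

```python
# Load one inflammation data file, average over patients for each day,
# and plot the result. The file name is assumed for illustration.
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt("inflammation-01.csv", delimiter=",")  # rows: patients, columns: days

daily_mean = data.mean(axis=0)   # average inflammation per day across all patients

plt.plot(daily_mean)
plt.xlabel("Day of clinical study")
plt.ylabel("Mean inflammation bouts")
plt.title("Average daily inflammation across patients")
plt.show()
```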
The best way to learn how to program is to do something useful, so this introduction to R is built around a common scientific task: data analysis. Our real goal isn't to teach you R, but to teach you the basic concepts that all programming depends on. We use R in our lessons because: we have to use something for examples; it's free, well-documented, and runs almost everywhere; it has a large (and growing) user base among scientists; and it has a large library of external packages available for performing diverse tasks. But the two most important things are to use whatever language your colleagues are using, so you can share your work with them easily, and to use that language well.
We are studying inflammation in patients who have been given a new treatment for arthritis, and need to analyze the first dozen data sets of their daily inflammation. The data sets are stored in CSV format (comma-separated values): each row holds information for a single patient, and the columns represent successive days. The first few rows of our first file look like this:
0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
0,0,2,0,4,2,2,1,6,7,10,7,9,13,8,8,15,10,10,7,17,4,4,7,6,15,6,4,9,11,3,5,6,3,3,4,2,3,2,1
0,1,1,3,3,1,3,5,2,4,4,7,6,5,3,10,8,10,6,17,9,14,9,7,13,9,12,6,7,7,9,6,3,2,2,4,2,0,1,1
We want to:
- load that data into memory,
- calculate the average inflammation per day across all patients, and
- plot the result.
To do all that, we'll have to learn a little bit about programming.