Access to data is a critical feature of an efficient, progressive and …
Access to data is a critical feature of an efficient, progressive and ultimately self-correcting scientific ecosystem. But the extent to which in-principle benefits of data sharing are realized in practice is unclear. Crucially, it is largely unknown whether published findings can be reproduced by repeating reported analyses upon shared data (‘analytic reproducibility’). To investigate this, we conducted an observational evaluation of a mandatory open data policy introduced at the journal Cognition. Interrupted time-series analyses indicated a substantial post-policy increase in data available statements (104/417, 25% pre-policy to 136/174, 78% post-policy), although not all data appeared reusable (23/104, 22% pre-policy to 85/136, 62%, post-policy). For 35 of the articles determined to have reusable data, we attempted to reproduce 1324 target values. Ultimately, 64 values could not be reproduced within a 10% margin of error. For 22 articles all target values were reproduced, but 11 of these required author assistance. For 13 articles at least one value could not be reproduced despite author assistance. Importantly, there were no clear indications that original conclusions were seriously impacted. Mandatory open data policies can increase the frequency and quality of data sharing. However, suboptimal data curation, unclear analysis specification and reporting errors can impede analytic reproducibility, undermining the utility of data sharing and the credibility of scientific findings.
This unit discusses the purposes of databases, a relational database, and the …
This unit discusses the purposes of databases, a relational database, and the querying language SQL. Students will design a simple database using data modeling and normalization. This unit will define basic data operations, provide instruction on how to create common query statements, and discuss SQL implementation.
Software Carpentry lesson that teaches how to use databases and SQL In …
Software Carpentry lesson that teaches how to use databases and SQL In the late 1920s and early 1930s, William Dyer, Frank Pabodie, and Valentina Roerich led expeditions to the Pole of Inaccessibility in the South Pacific, and then onward to Antarctica. Two years ago, their expeditions were found in a storage locker at Miskatonic University. We have scanned and OCR the data they contain, and we now want to store that information in a way that will make search and analysis easy. Three common options for storage are text files, spreadsheets, and databases. Text files are easiest to create, and work well with version control, but then we would have to build search and analysis tools ourselves. Spreadsheets are good for doing simple analyses, but they don’t handle large or complex data sets well. Databases, however, include powerful tools for search and analysis, and can handle large, complex data sets. These lessons will show how to use a database to explore the expeditions’ data.
All epidemiological investigations require some form of data description. A number of …
All epidemiological investigations require some form of data description. A number of methods are available for describing data, and the most appropriate one will depend upon both the type of data available and the aims of the investigation. If these issues are not considered, useful information may be lost, or more seriously, a misleading estimate may be made.
The home of the U.S. Government’s open data. Here you will find …
The home of the U.S. Government’s open data. Here you will find data, tools, and resources to conduct research, develop web and mobile applications, design data visualizations, and more. Topics include Agriculture, Business, Climate, Education, Energy, Ecosystems, Manufacturing and more.
Those workshops help to gain new skills in research data management. Created …
Those workshops help to gain new skills in research data management. Created by MIT Libraries, under CC-BY license, others can adapt and utilize this resources to develop thier own slides in teaching data management.
By encouraging and requiring that authors share their data in order to …
By encouraging and requiring that authors share their data in order to publish articles, scholarly journals have become an important actor in the movement to improve the openness of data and the reproducibility of research. But how many social science journals encourage or mandate that authors share the data supporting their research findings? How does the share of journal data policies vary by discipline? What influences these journals’ decisions to adopt such policies and instructions? And what do those policies and instructions look like? We discuss the results of our analysis of the instructions and policies of 291 highly-ranked journals publishing social science research, where we studied the contents of journal data policies and instructions across 14 variables, such as when and how authors are asked to share their data, and what role journal ranking and age play in the existence and quality of data policies and instructions. We also compare our results to the results of other studies that have analyzed the policies of social science journals, although differences in the journals chosen and how each study defines what constitutes a data policy limit this comparison.We conclude that a little more than half of the journals in our study have data policies. A greater share of the economics journals have data policies and mandate sharing, followed by political science/international relations and psychology journals. Finally, we use our findings to make several recommendations: Policies should include the terms “data,� “dataset� or more specific terms that make it clear what to make available; policies should include the benefits of data sharing; journals, publishers, and associations need to collaborate more to clarify data policies; and policies should explicitly ask for qualitative data.
Background. Attribution to the original contributor upon reuse of published data is …
Background. Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the “citation benefit”. Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results. Here, we look at citation rates while controlling for many known citation predictors and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: a citation benefit was most clear for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers. The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties. Conclusion. After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
A number of publishers and funders, including PLOS, have recently adopted policies …
A number of publishers and funders, including PLOS, have recently adopted policies requiring researchers to share the data underlying their results and publications. Such policies help increase the reproducibility of the published literature, as well as make a larger body of data available for reuse and re-analysis. In this study, we evaluate the extent to which authors have complied with this policy by analyzing Data Availability Statements from 47,593 papers published in PLOS ONE between March 2014 (when the policy went into effect) and May 2016. Our analysis shows that compliance with the policy has increased, with a significant decline over time in papers that did not include a Data Availability Statement. However, only about 20% of statements indicate that data are deposited in a repository, which the PLOS policy states is the preferred method. More commonly, authors state that their data are in the paper itself or in the supplemental information, though it is unclear whether these data meet the level of sharing required in the PLOS policy. These findings suggest that additional review of Data Availability Statements or more stringent policies may be needed to increase data sharing.
Epidemiological investigation requires a good understanding of different data types, as this …
Epidemiological investigation requires a good understanding of different data types, as this will strongly influence data analysis and interpretation. Data can broadly be classified as qualitative and quantitative, and within each of these groups, data can be further categorised as shown below. Although different grouping systems are available, it is important to consider the type of data being dealt with prior to any analysis. If desired, data can often be changed into different types through manipulation (for example, the quantitative variable weight can be converted to qualitative variables such as low/medium/high or low/not low).
This course explores the reciprocal relationships among design, science, and technology by …
This course explores the reciprocal relationships among design, science, and technology by covering a wide range of topics including industrial design, architecture, visualization and perception, design computation, material ecology, and environmental design and sustainability. Students will examine how transformations in science and technology have influenced design thinking and vice versa, as well as develop methodologies for design research and collaborate on design solutions to interdisciplinary problems.
Sharing data and code are important components of reproducible research. Data sharing …
Sharing data and code are important components of reproducible research. Data sharing in research is widely discussed in the literature; however, there are no well-established evidence-based incentives that reward data sharing, nor randomized studies that demonstrate the effectiveness of data sharing policies at increasing data sharing. A simple incentive, such as an Open Data Badge, might provide the change needed to increase data sharing in health and medical research. This study was a parallel group randomized controlled trial (protocol registration: doi:10.17605/OSF.IO/PXWZQ) with two groups, control and intervention, with 80 research articles published in BMJ Open per group, with a total of 160 research articles. The intervention group received an email offer for an Open Data Badge if they shared their data along with their final publication and the control group received an email with no offer of a badge if they shared their data with their final publication. The primary outcome was the data sharing rate. Badges did not noticeably motivate researchers who published in BMJ Open to share their data; the odds of awarding badges were nearly equal in the intervention and control groups (odds ratio = 0.9, 95% CI [0.1, 9.0]). Data sharing rates were low in both groups, with just two datasets shared in each of the intervention and control groups. The global movement towards open science has made significant gains with the development of numerous data sharing policies and tools. What remains to be established is an effective incentive that motivates researchers to take up such tools to share their data.
In a multi-week experiment, student teams gather biogas data from the mini-anaerobic …
In a multi-week experiment, student teams gather biogas data from the mini-anaerobic digesters that they build to break down different types of food waste with microbes. Using plastic soda bottles for the mini-anaerobic digesters and gas measurement devices, they compare methane gas production from decomposing hot dogs, diced vs. whole. They monitor and measure the gas production, then graph and analyze the collected data. Students learn how anaerobic digestion can be used to biorecycle waste (food, poop or yard waste) into valuable resources (nutrients, biogas, energy).
Digitize Me, Visualize Me, Search Me takes as its starting point the …
Digitize Me, Visualize Me, Search Me takes as its starting point the so-called ‘computational turn’ to data-intensive scholarship in the humanities. What Digitize Me, Visualize Me, Search Me endeavours to show is that such data-focused transformations in research can be seen as part of a major alteration in the status and nature of knowledge. It is an alteration that, according to the philosopher Jean François Lyotard, has been taking place since at least the 1950s, and involves nothing less than a shift away from a concern with questions of what is right and just, and toward a concern with legitimating power by optimizing the social system’s performance in instrumental, functional terms. This shift has significant consequences for our idea of knowledge.
Objectives To identify and appraise empirical studies on publication and related biases …
Objectives To identify and appraise empirical studies on publication and related biases published since 1998; to assess methods to deal with publication and related biases; and to examine, in a random sample of published systematic reviews, measures taken to prevent, reduce and detect dissemination bias. Data sources The main literature search, in August 2008, covered the Cochrane Methodology Register Database, MEDLINE, EMBASE, AMED and CINAHL. In May 2009, PubMed, PsycINFO and OpenSIGLE were also searched. Reference lists of retrieved studies were also examined. Review methods In Part I, studies were classified as evidence or method studies and data were extracted according to types of dissemination bias or methods for dealing with it. Evidence from empirical studies was summarised narratively. In Part II, 300 systematic reviews were randomly selected from MEDLINE and the methods used to deal with publication and related biases were assessed. Results Studies with significant or positive results were more likely to be published than those with non-significant or negative results, thereby confirming findings from a previous HTA report. There was convincing evidence that outcome reporting bias exists and has an impact on the pooled summary in systematic reviews. Studies with significant results tended to be published earlier than studies with non-significant results, and empirical evidence suggests that published studies tended to report a greater treatment effect than those from the grey literature. Exclusion of non-English-language studies appeared to result in a high risk of bias in some areas of research such as complementary and alternative medicine. In a few cases, publication and related biases had a potentially detrimental impact on patients or resource use. Publication bias can be prevented before a literature review (e.g. by prospective registration of trials), or detected during a literature review (e.g. by locating unpublished studies, funnel plot and related tests, sensitivity analysis modelling), or its impact can be minimised after a literature review (e.g. by confirmatory large-scale trials, updating the systematic review). The interpretation of funnel plot and related statistical tests, often used to assess publication bias, was often too simplistic and likely misleading. More sophisticated modelling methods have not been widely used. Compared with systematic reviews published in 1996, recent reviews of health-care interventions were more likely to locate and include non-English-language studies and grey literature or unpublished studies, and to test for publication bias. Conclusions Dissemination of research findings is likely to be a biased process, although the actual impact of such bias depends on specific circumstances. The prospective registration of clinical trials and the endorsement of reporting guidelines may reduce research dissemination bias in clinical research. In systematic reviews, measures can be taken to minimise the impact of dissemination bias by systematically searching for and including relevant studies that are difficult to access. Statistical methods can be useful for sensitivity analyses. Further research is needed to develop methods for qualitatively assessing the risk of publication bias in systematic reviews, and to evaluate the effect of prospective registration of studies, open access policy and improved publication guidelines.
Students pass around and distort messages written on index cards to learn …
Students pass around and distort messages written on index cards to learn how we use signals from GPS occultations to study the atmosphere. The cards represent information sent from GPS satellites being distorted as they pass through different locations in the Earth's atmosphere and reach other satellites. Analyzing GPS occultations enables better global weather forecasting, storm tracking and climate change monitoring.
Using the same method for measuring friction that was used in the …
Using the same method for measuring friction that was used in the previous lesson (Discovering Friction), students design and conduct an experiment to determine if weight added incrementally to an object affects the amount of friction encountered when it slides across a flat surface. After graphing the data from their experiments, students can calculate the coefficients of friction between the object and the surface it moved upon, for both static and kinetic friction.
No restrictions on your remixing, redistributing, or making derivative works. Give credit to the author, as required.
Your remixing, redistributing, or making derivatives works comes with some restrictions, including how it is shared.
Your redistributing comes with some restrictions. Do not remix or make derivative works.
Most restrictive license type. Prohibits most uses, sharing, and any changes.
Copyrighted materials, available under Fair Use and the TEACH Act for US-based educators, or other custom arrangements. Go to the resource provider to see their individual restrictions.