Master statistical and econometric tools, along with coding in R, to analyze data and answer economic questions.
- Subject:
- Economics
- Social Science
- Material Type:
- Full Course
- Author:
- Div Bhagia
- Date Added:
- 11/09/2024
A Data Carpentry curriculum for Economics is being developed by Dr. Miklos Koren at Central European University. These materials are being piloted locally. Development for these lessons has been supported by a grant from the Sloan Foundation.
These are slides for a master's-level course with the following course description: An intensive course in which students will create sophisticated work in multiple modalities (e.g., text, images, audio) that develops and expresses ideas focused on the needs of the audience to increase its knowledge, foster understanding, or promote a change in its attitudes. The audience for this course was primarily working professionals with a wide range of prior knowledge of these practices, so this material could easily be (and was) adapted for undergraduate students. I think it would also be possible to adapt it for high schoolers, if desired.
Software Carpentry lesson on version control with Git. To illustrate the power of Git and GitHub, we will use the following story as a motivating example throughout this lesson. Wolfman and Dracula have been hired by Universal Missions to investigate whether it is possible to send their next planetary explorer to Mars. They want to be able to work on the plans at the same time, but they have run into problems before when doing something similar. If they take turns, each will spend a lot of time waiting for the other to finish; but if they work on their own copies and email changes back and forth, things will get lost, overwritten, or duplicated. A colleague suggests using version control to manage the work.

Version control is better than exchanging files by email: Nothing that is placed under version control is ever lost, unless substantial effort is made to remove it. Since all previous versions of the files are saved, it is always possible to go back in time and see exactly who wrote what on a particular day, or which version of a program was used to generate a particular set of results. Because there is a record of who did what and when, it is possible to know whom to ask if a question comes up later and, if necessary, to revert the content to a previous version, much as the "undo" command in a text editor works. When several people collaborate on the same project, it is possible to accidentally overlook or overwrite someone else's changes. A version control system automatically notifies users whenever there is a conflict between one person's work and another's. Teams are not the only ones who benefit from version control: independent researchers can benefit greatly as well.
Keeping a record of what changed, when, and why is extremely useful for all researchers if they ever need to return to a project at a later point (e.g., a year later, when memory of the details has faded).
The ETD+ Virtual Workshop Series, taught by Dr. Katherine Skinner, is a set of free introductory training resources on crucial data curation and digital longevity techniques. Focusing on the Electronic Thesis and Dissertation (ETD) as a mile-marker in a student’s research trajectory, it provides in-time advice to students and faculty about avoiding common digital loss scenarios for the ETD and all of its affiliated files.
About the ETDplus Project
The ETDplus project is helping institutions ensure the longevity and availability of ETD research data and complex digital objects (e.g., software, multimedia files) that comprise an integral component of student theses and dissertations. The project was generously funded by the Institute of Museum and Library Services (IMLS) and led by the Educopia Institute, in collaboration with the NDLTD, HBCU Alliance, bepress, ProQuest, and the libraries of Carnegie Mellon, Colorado State, Indiana State, Morehouse, Oregon State, Penn State, Purdue, University of Louisville, University of Tennessee, the University of North Texas, and Virginia Tech.
Acknowledgements
This project was made possible in part by the Institute of Museum and Library Services.
In this course, students will learn the theoretical and practical aspects of algorithms and data structures. They will also learn to implement data structures and algorithms in C/C++, analyze those algorithms, and consider both their worst-case complexity and practical efficiency. Upon successful completion of this course, students will be able to: identify elementary data structures using the C/C++ programming languages; analyze the importance and use of abstract data types (ADTs); design and implement elementary data structures such as arrays, trees, stacks, queues, and hash tables; explain the best, average, and worst cases of an algorithm using Big-O notation; and describe the differences between sequential and binary search algorithms. (Computer Science 201)
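The contrast between sequential and binary search named in the last learning outcome can be sketched briefly. The course itself uses C/C++; this Python version (function names my own) is only a language-neutral illustration of the O(n) versus O(log n) difference:

```python
from bisect import bisect_left

def sequential_search(items, target):
    """O(n): examine each element in turn until the target is found."""
    for i, x in enumerate(items):
        if x == target:
            return i
    return -1

def binary_search(sorted_items, target):
    """O(log n): repeatedly halve the search range; requires sorted input."""
    i = bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1

data = [3, 7, 11, 19, 23, 42]
print(sequential_search(data, 23))  # 4
print(binary_search(data, 23))      # 4
```

Both return the same index on sorted data, but binary search touches only about log2(n) elements, which is why the distinction matters for the Big-O analysis the course covers.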
Background: Many journals now require that authors share their data with other investigators, either by depositing the data in a public repository or by making it freely available upon request. These policies are explicit but remain largely untested. We sought to determine how well authors comply with such policies by requesting data from authors who had published in one of two journals with clear data sharing policies. Methods and Findings: We requested data from ten investigators who had published in either PLoS Medicine or PLoS Clinical Trials. All responses were carefully documented. When we were refused data, we reminded authors of the journal's data sharing guidelines. If we did not receive a response to our initial request, a second request was made. Of the ten requests for raw data, three investigators did not respond, four responded but refused to share their data, two email addresses were no longer valid, and one author requested further details. A reminder of PLoS's explicit requirement that authors share data did not change the reply from the four authors who initially refused. Only one author sent an original data set. Conclusions: We received only one of the ten raw data sets requested. This suggests that journal policies requiring data sharing do not lead to authors making their data sets available to independent investigators.
We have empirically assessed the distribution of published effect sizes and estimated power by analyzing 26,841 statistical records from 3,801 recently published cognitive neuroscience and psychology papers. The reported median effect size was D = 0.93 (interquartile range: 0.64–1.46) for nominally statistically significant results and D = 0.24 (0.11–0.42) for nonsignificant results. Median power to detect small, medium, and large effects was 0.12, 0.44, and 0.73, reflecting no improvement over the past half-century, largely because sample sizes have remained small. Assuming similar true effect sizes in both disciplines, power was lower in cognitive neuroscience than in psychology. Journal impact factors correlated negatively with power. Assuming a realistic range of prior probabilities for null hypotheses, the false report probability is likely to exceed 50% for the whole literature. In light of our findings, the recently reported low replication success in psychology is realistic, and worse performance may be expected for cognitive neuroscience.
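Power figures like those reported here depend on the standardized effect size and the per-group sample size. The sketch below approximates power for a two-sided, two-sample comparison using a normal approximation; this is my own simplification for illustration, not the paper's exact estimation method:

```python
from math import sqrt
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test
    for a standardized effect size d (Cohen's d)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)     # e.g. ~1.96 for alpha = 0.05
    delta = d * sqrt(n_per_group / 2.0)   # noncentrality of the test statistic
    return 1 - z.cdf(z_crit - delta)      # upper-tail rejection probability

# A medium effect (d = 0.5) with 20 per group is badly underpowered;
# roughly 64 per group is needed to reach the conventional 0.8.
print(round(approx_power(0.5, 20), 2))
print(round(approx_power(0.5, 64), 2))
```

The calculation makes the paper's point concrete: with the small samples that have remained typical, power for small and medium effects stays well below conventional targets.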
Effective Research Data Management (RDM) is a key component of research integrity and reproducible research, and its importance is increasingly emphasised by funding bodies, governments, and research institutions around the world. However, many researchers are unfamiliar with RDM best practices, and research support staff are faced with the difficult task of delivering support to researchers across different disciplines and career stages. What strategies can institutions use to solve these problems?
Engaging Researchers with Data Management is an invaluable collection of 24 case studies, drawn from institutions across the globe, that demonstrate clearly and practically how to engage the research community with RDM. These case studies together illustrate the variety of innovative strategies research institutions have developed to engage with their researchers about managing research data. Each study is presented concisely and clearly, highlighting the essential ingredients that led to its success and challenges encountered along the way. By interviewing key staff about their experiences and the organisational context, the authors of this book have created an essential resource for organisations looking to increase engagement with their research communities.
This handbook is a collaboration by research institutions, for research institutions. It aims not only to inspire and engage, but also to help drive cultural change towards better data management. It has been written for anyone interested in RDM, or simply, good research practice.
This paper presents real-world data, a problem statement, and discussion of a common approach to modeling that data, including student responses. In particular, we provide time-series data on the number of boys bedridden due to an outbreak of influenza at an English boarding school and ask students to build a mathematical model, either discrete or continuous, of this epidemic, and to estimate the parameters in their model and validate it against the data. Students will need access to a computer or computer lab with spreadsheet software, a computer algebra system, or a sufficient statistical analysis system such as R.
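A standard choice for the discrete modeling task described above is a difference-equation SIR (susceptible-infected-recovered) model. The sketch below is a minimal version with hypothetical parameter values (beta and gamma are placeholders students would estimate from the data, not fitted values from the paper); the total of 763 boys is the commonly cited size of this classic boarding-school dataset:

```python
def sir_step(s, i, r, beta, gamma, n):
    """One day of a discrete SIR model:
    new infections = beta * S * I / N, new recoveries = gamma * I."""
    new_inf = beta * s * i / n
    new_rec = gamma * i
    return s - new_inf, i + new_inf - new_rec, r + new_rec

def simulate(n=763, i0=1, beta=1.7, gamma=0.45, days=14):
    """Run the epidemic forward; returns the daily infected counts.
    beta and gamma here are illustrative, not estimated parameters."""
    s, i, r = n - i0, float(i0), 0.0
    infected = [i]
    for _ in range(days):
        s, i, r = sir_step(s, i, r, beta, gamma, n)
        infected.append(i)
    return infected
```

Students would then tune `beta` and `gamma` (by least squares, for example) so the simulated infected counts track the bedridden-boys time series, which is the validation step the problem statement asks for.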
Psychological science is navigating an unprecedented period of introspection about the credibility and utility of its research. A number of reform initiatives aimed at increasing adoption of transparency and reproducibility-related research practices appear to have been effective in specific contexts; however, their broader, collective impact amidst a wider discussion about research credibility and reproducibility is largely unknown. In the present study, we estimated the prevalence of several transparency and reproducibility-related indicators in the psychology literature published between 2014-2017 by manually assessing these indicators in a random sample of 250 articles. Over half of the articles we examined were publicly available (154/237, 65% [95% confidence interval, 59% to 71%]). However, sharing of important research resources such as materials (26/183, 14% [10% to 19%]), study protocols (0/188, 0% [0% to 1%]), raw data (4/188, 2% [1% to 4%]), and analysis scripts (1/188, 1% [0% to 1%]) was rare. Pre-registration was also uncommon (5/188, 3% [1% to 5%]). Although many articles included a funding disclosure statement (142/228, 62% [56% to 69%]), conflict of interest disclosure statements were less common (88/228, 39% [32% to 45%]). Replication studies were rare (10/188, 5% [3% to 8%]) and few studies were included in systematic reviews (21/183, 11% [8% to 16%]) or meta-analyses (12/183, 7% [4% to 10%]). Overall, the findings suggest that transparent and reproducibility-related research practices are far from routine in psychological science. Future studies can use the present findings as a baseline to assess progress towards increasing the credibility and utility of psychology research.
Students learn how to determine the authority of an information source. They examine different sources of information that all use the same dataset, define each source's type of authority, and recognize the context in which the data are being used. In doing so, they come to understand how information sources with different levels of authority can base their credibility on the same dataset.
This lesson introduces undergraduates to personal digital archiving (PDA) as an instructional bridge to research data management.
PDA is the study of how people organize, maintain, use and share personal digital information in their daily lives. PDA skills closely parallel research data management skills, with the added benefit of being directly relevant to undergraduate students, most of whom manage complex personal digital content on a daily basis.
By teaching PDA, librarians encourage authentic learning experiences that immediately resonate with students' day-to-day activities. Teaching PDA builds a foundation of knowledge that not only helps students manage their personal digital materials, but can be translated into research data management skills that will enhance students' academic and professional careers.
Students will use a stopwatch to time themselves performing in various events, record data, and then compare and order decimals to determine bronze, silver and gold medal winners.
This course provides business students an alternative to the mechanistic view of strategy execution that reframes an organization as a complex network of teams continuously adjusting to market conditions and to other teams. The Flexible Execution Model is introduced consisting of seven elements that together shape how well an organization executes its strategy. Practical tools that help leaders achieve their organizations’ strategic priorities are discussed. The course also explores novel ways to use data including surveys, Glassdoor reviews, and other sources to measure strategy execution and identify what is and is not working.
List of exercises, presentation slides, and poster on research data management and scholarly communication topics by Chealsye Bowley.
This unit will explore the concepts of bias and confirmation bias and how they affect people's presentation and interpretation of data. It includes 5 days of lessons and independent work that culminate in students being able to show what bias and confirmation bias are and how they affect the way we interpret data.
These resources were created to complement our undergraduate statistics lab manual, Applied Data Analysis in Psychology: Exploring Diversity with Statistics, published by Kendall Hunt Publishing Company. Like our lab manual, these JASP walk-through guides meaningfully and purposefully integrate and highlight diversity research to teach students how to analyze data in an open-source statistical program. The data sets utilized in these guides are from open-access databases (e.g., Pew Research Center, PLoS One, ICPSR, and more). Guides with step-by-step instructions, including annotated images and examples of how to report findings in APA format, are included for the following statistical tests: independent samples t test, paired samples t test, one-way ANOVA, two-factor ANOVA, chi-square test, Pearson correlation, simple regression, and multiple regression.
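The first test in the list, the independent samples t test, reduces to a few lines of arithmetic. This pure-Python sketch (my own, using the pooled-variance formula) computes the same statistic the JASP guides walk students through:

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group1, group2):
    """Pooled-variance independent samples t statistic."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = variance(group1), variance(group2)  # sample variances (n - 1)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se
```

Identical groups give t = 0, and the statistic grows as the group means separate relative to the pooled spread, which is the intuition the guides build before students read the result off JASP's output.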
In this activity, students use real water chemistry data and descriptive statistics in Excel to examine primary productivity in an urban estuary of the Salish Sea. They will consider how actual data do or do not support expected annual trends.
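The descriptive statistics used in the Excel portion of this activity (mean, median, standard deviation) can equally be computed with Python's standard library. The readings below are hypothetical placeholder values, not the activity's actual water chemistry data:

```python
from statistics import mean, median, stdev

# Hypothetical monthly chlorophyll readings from one station (values are made up):
chlorophyll = [2.1, 3.4, 8.9, 12.7, 15.2, 11.0, 9.8, 7.5, 4.2, 2.8, 1.9, 1.6]

print(f"mean    = {mean(chlorophyll):.2f}")
print(f"median  = {median(chlorophyll):.2f}")
print(f"std dev = {stdev(chlorophyll):.2f}")
```

Comparing the mean against the median, as students do here, is a quick check for skew in seasonal data before deciding whether the expected annual trend is supported.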