Applied regression in R - ASGV01002

Title:	Aplikovaná regrese v R
Guaranteed by:	Department of Sociology (21-KSOC)
Faculty:	Faculty of Arts
Valid:	from 2023 to 2023
Semester:	summer
Points:	0
E-Credits:	3
Examination process:	summer s.:
Hours per week, examination:	summer s.:2/0, C [HT]
Capacity:	unlimited / unknown (unknown)
Min. number of students:	unlimited
4EU+:	no
Virtual mobility / capacity:	no
Key competences:
State of the course:	taught
Language:	Czech
Teaching methods:	full-time
Level:
Note:	course can be enrolled in outside the study plan; enabled for web enrollment

Guarantor:	Mgr. Aleš Vomáčka
Teacher(s):	Mgr. Aleš Vomáčka

Annotation

Last update: Mgr. Aleš Vomáčka (08.02.2024)

This course introduces students to linear regression analysis, with an emphasis on application in the R programming language. It focuses primarily on conceptual understanding of statistical modeling, intuitive interpretation and visualization of results, and evaluation of the quality of the analysis. The first half of the course introduces tools for building and interpreting regression models. The second half discusses the assumptions of linear regression: what they are, what they are for, and what to do when a model does not meet them. In addition to good practice, the course reviews common mistakes and how to avoid them.

Graduates of the course will be able to carry out a statistical analysis using linear regression from start to finish: from selecting the variables to analyze, through building and checking the model, to interpreting and visualizing it. Above all, they will gain the knowledge needed to defend the decisions they make in statistical data analysis. Not only will they be able to defend the conclusions of their analyses to an audience, but they will (hopefully) also gain confidence in their own analytical abilities.

The course assumes a basic understanding of statistics (at the level of the Statistics 2 course) and the R programming language (at the level of Introduction to Data Analysis in R).

Course completion requirements

Last update: Mgr. Petra Poncarová (18.05.2023)

To successfully complete this course, students are required to do the following:

Pick a dataset featured on the TidyTuesday project (any year).

Formulate a research problem related to the data. This research problem can be either predictive or inferential in nature (e.g. Can we predict the popularity of a song on Spotify based on its characteristics? Does the gender wage gap in the US depend on the proportion of women in the field? Are more expensive video games rated better?).

Analyze the data using a linear regression model and write a report on your findings. This report should include a clear definition of your research problem, a description of your data (including descriptive statistics), a description of your regression model (with both tables and graphs where appropriate), diagnostics of your regression model, and an overall conclusion. You can transform and filter the data as necessary, but clearly describe all data transformations.

Prepare two documents for submission: (1) a script, which must be fully operational: it has to run without error from start (including downloading the data from the TidyTuesday website) to finish, without any outside intervention, and produce all analytic outputs (models, charts) used for the assignment; (2) a final report (e.g. Word or PDF) as described above. If you get stuck, don’t be afraid to ask for a consultation.
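
The workflow described above can be sketched roughly as follows. This is only a minimal illustration in base R, not a template required by the course. To keep it runnable offline, it uses the built-in mtcars data frame as a stand-in (an assumption for illustration); the variables mpg, wt and hp and the output file name diagnostics.pdf are likewise illustrative choices. An actual submission would read a TidyTuesday file instead, for example by passing its URL to read.csv().

```r
# Minimal sketch of an end-to-end regression script (base R only).
# ASSUMPTION: mtcars is a stand-in dataset used for illustration.
# The assignment requires TidyTuesday data, which could be loaded
# with read.csv() given the URL of the chosen CSV file.
data <- mtcars

# 1. Descriptive statistics for the variables used in the model
print(summary(data[, c("mpg", "wt", "hp")]))

# 2. Fit a linear regression: fuel economy explained by weight and horsepower
model <- lm(mpg ~ wt + hp, data = data)

# 3. Coefficient table and 95% confidence intervals
print(summary(model))
print(confint(model))

# 4. Diagnostics: the standard residual plots, written to a file so that
#    the script runs from start to finish without user interaction
pdf("diagnostics.pdf")
par(mfrow = c(2, 2))
plot(model)
invisible(dev.off())

# 5. Potentially influential observations, ranked by Cook's distance
cooks <- cooks.distance(model)
print(head(sort(cooks, decreasing = TRUE), 3))
```

Writing the plots to a file rather than to the screen is one way to satisfy the requirement that the script runs without outside intervention; the same outputs can then be reused in the report.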

Literature

Last update: Mgr. Petra Poncarová (18.05.2023)

Primary literature

Gelman, A., Hill, J., & Vehtari, A. (2020). Regression and Other Stories. Cambridge University Press. https://doi.org/10.1017/9781139161879
Harrell, F. (2001). Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer-Verlag. https://doi.org/10.1007/978-1-4757-3462-1

Secondary literature

Cole, S. R., Platt, R. W., Schisterman, E. F., Chu, H., Westreich, D., Richardson, D., & Poole, C. (2010). Illustrating bias due to conditioning on a collider. International Journal of Epidemiology, 39(2), 417–420. https://doi.org/10.1093/ije/dyp334
Cook, R. D. (1977). Detection of Influential Observation in Linear Regression. Technometrics, 19(1), 15–18. https://doi.org/10.2307/1268249
Fox, J. (2015). Applied Regression Analysis and Generalized Linear Models (Third edition). SAGE Publications, Inc.
Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31(4), 337–350. https://doi.org/10.1007/s10654-016-0149-3
King, G., & Roberts, M. E. (2015). How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About It. Political Analysis, 23(2), 159–179.
Shmueli, G. (2010). To Explain or To Predict? (SSRN Scholarly Paper ID 1351252). Social Science Research Network. https://doi.org/10.2139/ssrn.1351252