Model Validation for Applied Data Science

Nov. 19, 2021, noon to 2 p.m.

Organizer -

DataLab: Data Science and Informatics

Contact -

datalab-training@ucdavis.edu

Location -

Zoom

Description

In this workshop, we will discuss the basics of creating, comparing, and validating predictive models using a case study from the health sciences. We will demonstrate categorical prediction with logistic regression and numerical prediction with a regression-tree approach. We will calculate accuracy measures suited to each type of model, and use cross-validation to find the tuning parameters that produce the best predictions. Finally, we will interpret the results for insights into the real-world process being modeled. Although this workshop uses health data, the conceptual framework and principles discussed should generalize to research in other domains.

Learning Objectives

- Fit a logistic regression model

- Fit a random forest model

- Use cross-validation to tune model parameters

- Estimate the accuracy of predictions for future data

- Interpret model parameters

Prerequisites

This workshop is open to learners at all levels, but prior experience with R is required to fully participate in this interactive, hands-on workshop.

Software

Please follow the DataLab install guides (https://datalab.ucdavis.edu/install-guide/) to install R and RStudio before the workshop. DataLab office hours are held via Zoom and in person on Wednesdays from 1:30pm–3:00pm. If you need help troubleshooting the installations, drop by office hours before the workshop. See the office hours page (https://datalab.ucdavis.edu/office-hours/) for details.

Instructors: Wesley Brooks, Vladimir Filkov

Instructors’ Biographies

Wesley Brooks holds a Ph.D. in Statistics from the University of Wisconsin. He works at the DataLab as a Data Scientist.

Vladimir Filkov is a Professor of Computer Science and DataLab's director for translational data science and leads the Health Data Science and Systems research and learning cluster.