Intermediate R: Cleaning Data and Automating Tasks (2-Part Series)
Jan. 24, 2023, 10 a.m. - Jan. 26, 2023, noon
Organizer: DataLab: Data Science and Informatics
Contact: datalab-training@ucdavis.edu
Location: Zoom
Description: This 2-part workshop series for intermediate R programmers focuses on how to load and prepare data for analysis. First, you'll learn how to screen a data set for potential problems in its structure and data types, and how to correct them. Data sets loaded from text files or scraped from the web often have features in text format that need additional processing before they can be used, so the series will include a deep dive into R's "stringr" package for text processing. Dates and times are another kind of data that can be difficult to handle, so the series will explore the basics of using the "lubridate" package for processing temporal data. You'll also learn how to use loops to automate repetitive tasks such as loading and combining similar data sets from many different files.
This series is NOT an introduction to R. The target audience is motivated intermediate to advanced learners from all domains at UC Davis who want to hone their R skills. Please make sure you meet the prerequisites before registering.
Learning Objectives
After completing this workshop series, learners should be able to:
- Inspect data files to determine how best to load them into R
- Identify and convert features to appropriate data types
- Use the "stringr" package to clean and extract data from text
- Use regular expressions to describe patterns in text
- Use the "lubridate" package to parse dates and times
- Use loops to automate repetitive tasks
Prerequisites: These workshops are not an introduction to R. Participants are expected to have prior experience using R, to be comfortable with basic R syntax, and to have R installed and running on their laptops. This series is appropriate for motivated intermediate to advanced users who want a better understanding of base R. While not required, we recommend learners also attend the prior 2-part "Intermediate R: Thinking in R" workshop series.
Software: Latest version of the R programming language.
Instructor: Nick Ulle
Instructor Bio: Nick Ulle is a statistician and computer scientist. Before joining DataLab, he was a visiting assistant professor of Statistics at UC Berkeley, where he designed and taught courses in data science. During his PhD in Statistics at UC Davis, he developed source code analysis techniques for the R programming language. His research interests include statistical computing, programming languages, data visualization, and pedagogy.