Feb. 14, 2022, noon - Feb. 18, 2022, 2 p.m.

DataLab: Data Science and Informatics

This three-part workshop series covers the basics of text mining with Python. We will focus primarily on unstructured text data, discussing how to format and clean text to enable the discovery of significant patterns in collections of documents. Sessions will introduce participants to core terminology in text mining/natural language processing and will walk through different methods of ranking terms and documents. We will conclude by using these methods to classify texts and to build models of "topics." Basic familiarity with Python is required. We welcome students, postdocs, faculty, and staff from a variety of research domains, ranging from health informatics to the humanities. This workshop occurs during UC Love Data Week, and all members of the University of California system are welcome to register.

Sessions meet on February 14, 16, and 18, 2022, from 12:00 PM to 2:00 PM each day.

Learning Objectives

By the end of this series, you will be able to:


Instructors will distribute a zipped directory of notebooks and files the week prior to the workshop. Participants are required to upload these files to their Google Drive account before our first session. We also ask that participants read the first two sections of the workshop reader in advance to prepare for the series.

In addition to this prep work, a basic knowledge of Python is required. Specifically, participants should be able to:

Instructors: Tyler Shoemaker, Carl Stahmer

Instructors’ Biographies

Tyler Shoemaker is a Postdoctoral Scholar at the DataLab, where he develops and implements methods for text analysis and natural language processing across a variety of research projects, ranging from the digital humanities to environmental and health sciences.

Carl Stahmer is a digital humanist. He is the Executive Director of UC Davis DataLab and a Professor of English. He leverages his expertise as a computer programmer and system architect to tackle complex problems in the humanities and beyond.


Python; Google Colab (instructors will provide notebooks and data)

