Natural Language Processing with Python (3-part series)
May 23, 2022, 10 a.m. - May 27, 2022, noon
Organizer: DataLab: Data Science and Informatics
Contact: datalab-training@ucdavis.edu
Location: Zoom
Description
This three-part workshop series introduces participants to natural language processing (NLP) with Python. It builds on our text mining series, "Getting Started with Textual Data," by extending the scope of data-inflected text analysis to include various methods of modeling meaning. Sessions will cover NLP topics ranging from segmentation and dependency parsing to sentiment analysis and context-sensitive modeling. We will also discuss how to implement such methods for tasks like classification. Basic familiarity with analyzing textual data in Python is required. We welcome students, postdocs, faculty, and staff from a variety of research domains, ranging from health informatics to the humanities.
Sessions meet on May 23, May 25, and May 27, 2022, from 10:00 AM to 12:00 PM each day.
Learning Objectives
By the end of this series, you will be able to:
- Use popular NLP frameworks in Python, including Gensim and spaCy
- Explain key concepts and terminology in NLP, including dependency parsing, named entity recognition, and word embedding
- Process texts to glean information about sentiment, subject, and style
- Classify texts on the basis of their features
- Produce models of word meanings from a corpus
- Perform a few core NLP tasks, including keyword analysis, relation extraction, document similarity analysis, and text summarization
Prerequisites
Instructors will distribute a zipped directory of notebooks and files the week prior to the workshop. Participants are required to upload these materials to their Google Drive accounts before our first session.
In addition to this prep work, a basic knowledge of working with textual data in Python is required. Specifically, participants should have attended DataLab's 3-part "Getting Started Working with Textual Data" workshop series, or have equivalent experience, and be able to do the following with Python:
- Load text data into Python
- Perform basic text cleaning actions
- Generate data structures like document-term matrices
- Conduct preliminary counting processes on corpora
Software: Python; Google Colab (instructors will provide notebooks and data).
Instructors: Tyler Shoemaker, Carl Stahmer
Instructors’ Biographies
Tyler Shoemaker is a Postdoctoral Scholar at DataLab, where he develops and implements methods for text analysis and natural language processing across a variety of research projects, ranging from the digital humanities to environmental and health sciences.
Carl Stahmer is a digital humanist. He is the Executive Director of UC Davis DataLab and Professor of English. He leverages his expertise as a computer programmer and system architect to tackle complex problems in the humanities and beyond.