Thesis Oral Defense - Lucio Mwinmaarong Dery

— 6:00pm

In Person and Virtual - ET - Reddy Conference Room, Gates Hillman 4405 and Zoom

LUCIO MWINMAARONG DERY, Ph.D. Candidate, Computer Science Department, Carnegie Mellon University

On Resource Efficient Transfer Learning via End Task Aware Training

In transfer learning, performance on a desired end task (or tasks) is improved by exploiting "knowledge" from other tasks. The technique has become a critical workhorse driving many of the advances in machine learning. The current formula is relatively simple: train a large model on large amounts of data from the transfer task(s), then apply the learned model to the desired downstream task(s), either zero-shot or after adaptation.

This thesis recognizes that these powerful models are not developed in a vacuum but rather require non-trivial resources to train and deploy. As such, there is a wide range of salient problems and communities of researchers that the status quo leaves behind. In the first part of this thesis, we will focus on the training-time problem of data-efficient transfer learning. We will begin by making a case for exploiting advance knowledge of the desired downstream task(s), which is commonly available in many ML settings, to inform different dimensions of transfer learning. We dub this end task aware transfer learning. Next, we will present a set of novel end task aware optimization algorithms that bias the learning trajectory towards data-efficient solutions with strong generalization on the end task. We will close this part by providing an automated approach to constructing and searching over task-relevant transfer objectives when only a limited amount of end task data is available.

We will proceed to develop algorithms for compute- and memory-efficient transfer learning. Our goal will be to deliver a small, efficient, yet performant task-specific model for deployment, seeded from a large, generalist model that has already been pre-trained on a transfer task (or set of tasks). Focusing on structured pruning for making models smaller, we will investigate pruning under two resource-constrained settings:

  1. limited task data, where we will exploit extra transfer tasks to learn pruning structures that, at the same task performance, lead to more compute- and memory-efficient models;
  2. limited memory, where many of the classical pruning techniques break down because they require gradient-based optimization, which can have prohibitive memory overhead.

Thesis Committee: 

Graham Neubig (Co-Chair)
Ameet Talwalkar (Co-Chair)
Zico Kolter
Luke Zettlemoyer (University of Washington / Meta)
Marc'Aurelio Ranzato (Google DeepMind)

In Person and Zoom Participation. See announcement.

Meeting ID:  797 670 0891
Passcode:  ldm-thesis
