Computer Science Thesis Oral

— 5:00pm

Location:
In Person and Virtual - ET - ASA Conference Room, Gates Hillman 6115 and Zoom

Speaker:
JIELIN QIU, Ph.D. Candidate, Computer Science Department, Carnegie Mellon University
https://www.cs.cmu.edu/~jielinq/

On the Alignment, Robustness, and Generalizability of Multimodal Learning

Multimodal intelligence, where AI systems exhibit intelligent behaviors by leveraging data from multiple modalities, has emerged as a key concept in today's data-driven era. This cross-modal approach has diverse applications and transformative potential across industries. By fusing heterogeneous data streams, multimodal AI generates representations closer to human-like intelligence than traditional unimodal techniques.

In this thesis, we aim to advance the field of multimodal intelligence by focusing on three crucial dimensions: multimodal alignment, robustness, and generalizability. By introducing new approaches and methods, we aim to improve the performance, robustness, and interpretability of multimodal models in practical applications. Specifically, we address these critical questions:

(1) How do we explore the inner semantic alignment between different domains? How can the learned alignment help advance multimodal applications?

(2) How robust are multimodal models? How can we improve the models' robustness in real-world applications?

(3) How do we generalize knowledge from a learned domain to an unseen domain?

In essence, this thesis seeks to propel the field of multimodal AI forward by enhancing alignment, robustness, and generalizability, thus paving the way for more sophisticated and efficient multimodal AI systems.

Thesis Committee: 

Christos Faloutsos (Co-chair)
Lei Li (Co-chair)
Yonatan Bisk
William Wang (University of California, Santa Barbara)


In Person and Zoom Participation. See announcement.
