Artificial Intelligence Seminar

Time:
1:00pm

Location:
ASA Conference Room 6115 - Gates and Hillman Centers

Speaker:
SACHIN GOYAL, Ph.D. Student, Machine Learning Department, Carnegie Mellon University
https://saching007.github.io/

Think before you speak: Training Language Models With Pause Tokens

Transformer-based language models generate responses by producing a series of tokens in immediate succession: the (K + 1)th token is an outcome of manipulating K hidden vectors per layer, one vector per preceding token. What if instead we were to let the model manipulate, say, K + 10 hidden vectors before it outputs the (K + 1)th token? In this talk, we will discuss how we can teach language models to use additional tokens (say, pause tokens) to their advantage. Can the language model use these extra tokens to perform extra computation before committing to an answer? We will specifically explore whether this can be done just by finetuning an off-the-shelf language model, or whether it is necessary to pretrain from scratch to elicit such new behaviours. Finally, we will discuss a range of conceptual and practical future research questions raised by our work, spanning new notions of representation capacity beyond parameter count and making delayed next-token prediction a widely applicable paradigm.
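
To make the delayed-decoding idea concrete, below is a minimal, hypothetical Python sketch of pause-token inference using Hugging Face Transformers. The model name, the "<pause>" token string, and the pause count are illustrative assumptions rather than details from the talk; in the actual method, the pause embedding would be learned during finetuning or pretraining, not left freshly initialized as here.

# Hypothetical sketch of inference with pause tokens (illustrative
# names, not the speaker's code). Assumes a causal LM that has been
# adapted to expect "<pause>" tokens in its input.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"   # stand-in model; the talk does not specify one
NUM_PAUSES = 10       # extra hidden vectors per layer before answering

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Register a dedicated pause token and grow the embedding table so the
# model has a vector for it (learned during training in practice).
tokenizer.add_special_tokens({"additional_special_tokens": ["<pause>"]})
model.resize_token_embeddings(len(tokenizer))

prompt = "Q: What is 17 * 23? A:"
# Delay the answer: appending NUM_PAUSES pause tokens lets the model
# manipulate K + NUM_PAUSES hidden vectors per layer before committing
# to the (K + 1)th content token.
delayed_prompt = prompt + "<pause>" * NUM_PAUSES

inputs = tokenizer(delayed_prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)

# Outputs at pause positions are ignored; only tokens generated after
# the final pause are read as the answer.
answer = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:])
print(answer)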

— 

Sachin Goyal is a PhD student in the Machine Learning Department at CMU. He works on improving pretraining and robust finetuning for foundation models. 

The AI Seminar is sponsored by SambaNova Systems

In Person and Zoom Participation. See announcement.

Event Website:
http://www.cs.cmu.edu/~aiseminar/


