Special Topic: Generative AI for Music and Audio

Course ID 15798

Description In this seminar, we will discuss state-of-the-art methods in generative AI for music and general audio (everyday sounds, speech, bioacoustics, etc.), with applications to both generation and understanding. We will examine and compare the two primary families of methods used in modern audio generation research: large language models applied to discrete audio tokens, and diffusion models applied to continuous audio representations. With an eye towards offering intuitive controls for music generation, we will also examine classic methods and tasks in music information retrieval such as spectral analysis, synchronization, beat detection, and transcription. Moreover, we will explore emerging topics in generative AI for music and audio such as new architectures, training data attribution, interaction, compression, multimodality, and evaluation. Finally, we will discuss the ethical and societal implications of music generation in particular, including its potential economic and cultural effects on music. Much of the course activity will center around (1) in-class lectures and demonstrations on small-scale datasets, (2) student-led discussions of research papers, and (3) an open-ended research project.

Key Topics
Generative AI, music generation, audio generation, language models, diffusion models, music information retrieval, multimodal learning

Required Background Knowledge
Solid math skills ("just okay" is fine), strong programming skills, understanding of probability, at least some musical background, some previous exposure to AI / ML

Course Relevance
- Upper-division CS undergrads, ideally those who have taken 15322 Intro to Computer Music
- Master's students across departments (music technology, CS, ECE)
- SCS PhD students

Course Goals
Students should emerge from the course with a better understanding of the following aspects of generative AI for music and audio: (1) the modern research landscape, (2) domain-specific considerations, including basic audio signal processing, (3) classical techniques in music information retrieval, and (4) research frontiers in generative AI for music and audio.

Learning Resources
Textbooks: Meinard Müller, Fundamentals of Music Processing; research papers
Software: Python, Google Colab, NumPy

Assessment Structure
Homework: 25%
Reading reflections: 25%
In-class reading presentations: 15%
In-class participation: 10%
Final project: 25%

Extra Time Commitment
n/a