Database Seminar

Wednesday, October 12, 2016 - 12:00pm to 1:00pm

Location:

Gates & Hillman Centers

Speaker:

SHAI BEN-DAVID, Professor https://cs.uwaterloo.ca/~shai/

For More Information, Contact:

crwhite@cs.cmu.edu

Unsupervised learning is widely recognized as one of the most important challenges facing machine learning nowadays. However, unlike supervised learning, our current theoretical understanding of those tasks, and in particular of clustering, is very rudimentary. Although hundreds of clustering papers are being published every year, there is hardly any work reasoning about clustering independently of any particular algorithm, objective function, or generative data model. My talk will focus on such clustering research. I will discuss two aspects in which theory could play a significant role in guiding the use of clustering tools. The first is model selection - how should a user pick an appropriate clustering tool for a given clustering problem, and how should the parameters of such an algorithmic tool be tuned? In contrast with other common computational tasks, in clustering, different algorithms often yield drastically different outcomes. Therefore, the choice of a clustering algorithm may play a crucial role in the usefulness of an output clustering solution. However, there currently exist no methodical guidance for clustering tool selection for a given clustering task. I will describe some recent proposals aiming to address this crucial lacuna. The second aspect of clustering that I will address is the complexity of computing a cost minimizing clustering (given some clustering objective function). Once a clustering model (or objective) has been picked, the task becomes an optimization problem. While most of the clustering objective optimization problems are computationally infeasible, they are being carried out routinely in practice. This theory-practice gap has attracted significant research attention recently. I will describe some of the theoretical attempts to address this gap and discuss how close they bring us to a satisfactory understanding of the computational resources needed for achieving good clustering solutions. — Shai Ben-David is a professor in the School of Computer Science at the University of Waterloo. His research interests span a wide spectrum of topics in the foundations of computer science and its applications, with a particular emphasis on statistical and computational machine learning. Sponsored  by Yahoo! Labs and the Simons Foundation.

Keywords:

Seminar Series