Computer Science Thesis Oral

Wednesday, June 17, 2015 - 2:00pm to 4:00pm


8102 Gates & Hillman Centers



This thesis explores the power of interactivity in unsupervised machine learning problems. Interactive algorithms employ feedback-driven measurements to reduce data acquisition costs and consequently enable statistical analysis in otherwise intractable settings. Unsupervised learning methods are fundamental tools across a variety of domains, and interactive procedures promise to broaden the scope of statistical analysis. We develop interactive learning algorithms for three unsupervised problems: subspace learning, clustering, and tree metric learning. Our theoretical and empirical analysis shows that interactivity can bring both statistical and computational improvements over non-interactive approaches. An over-arching thread of this thesis is that interactive learning is particularly powerful for non-uniform datasets, where non-uniformity is quantified differently in each setting. In this talk, I will focus on interactive similarity-based hierarchical clustering. We propose an interactive procedure for recovering a clustering from a small number of carefully selected similarity measurements. The algorithm exploits non-uniformity of cluster size, using few measurements to recover larger clusters and focusing measurements on the smaller structures. In addition to coming with strong statistical and computational guarantees, this algorithm performs well in practice. I will also describe new results on a classical structure discover problem, where we focus on show fundamental limits for non-interactive approaches. Our main result is a precise characterization of the performance of non-interactive approaches, which shows that, on particular problems, all non-interactive approaches are statistically weaker than a simple interactive one. In particular, these results show that, on the hierarchical clustering problem, no non-interactive algorithm can achieve the statistical performance our interactive approach. Thesis Committee: Aarti Singh (Chair) Nina Balcan Barnabas Poczos Larry Wasserman Sanjoy Dasgupta (University of California, San Diego) John Langford (Microsoft Research)

For More Information, Contact:


Thesis Oral