Computer Science Thesis Oral

Wednesday, September 2, 2015 - 1:00pm

Location:

8102 Gates & Hillman Centers

Speaker:

YAIR MOVSHOVITZ-ATTIAS, Ph.D. Student http://www.cs.cmu.edu/~ymovshov/

For More Information, Contact:

deb@cs.cmu.edu

In this thesis we demonstrate the benefits of automated labeled dataset creation for fine-grained visual learning tasks. Specifically, we show that utilizing real-world, non-image information is a convenient way to reduce the human effort needed for building large-scale datasets.

Computer vision has seen great advances in recent years in a number of complex tasks, such as scene classification, object detection, and image segmentation. A key ingredient in such success stories is the use of large amounts of labeled data. In many cases, the limiting factor is the ability to create these training sets. Issues arise in three forms: (1) the act of labeling the data can be hard for human annotators, (2) in some cases it is hard to get a representative sample of the feature space, and (3) data for infrequent (yet potentially important) instances can be completely absent from the training set.

Business storefront classification is an example of (1). The number of possible labels is large, and assigning all relevant labels to an image is a time-consuming task for annotators. Moreover, when the image contains a business from a country other than their own, annotators can get confused by the foreign language and produce erroneous labels. Annotators are also inconsistent in how they assign businesses to categories.

In vehicle viewpoint estimation, the images themselves are hard to come by. Getting sample images of all viewpoints is hard due to bias in the way people photograph cars, and current datasets for this task lack data for many viewpoints. In addition, the labeling task is hard for annotators.

We address these issues by adding automation to the dataset creation process. Our approach is to utilize external information by matching the images to real-world concepts. In the case of businesses, when images are mapped to an ontology of geographical entities, we are able to extract multiple relevant labels per image.
For the viewpoint estimation problem, by using 3D CAD models we can render images in the desired viewpoint resolution, and assign precise labels to them. We provide a systematic examination of the rendering process, and conclude that render quality is key for training accurate models.

Thesis Committee:

Yaser Sheikh (Co-Chair)
Takeo Kanade (Co-Chair)
Abhinav Gupta
Leonid Sigal (Disney Research)
Trevor Darrell (University of California, Berkeley)

Keywords:

Thesis Oral