Computer Science Thesis Oral

Wednesday, June 24, 2015 - 10:00am


8102 Gates & Hillman Centers




More than one million Internet search engine queries are made every minute, and the number keeps growing. Researchers have developed Information Extraction (IE) systems that can address some of these queries. Most of these IE systems, however, are designed for batch processing and favor high precision (i.e., few false positives) over high recall (i.e., few false negatives). They have also been built to readily evaluate only factoid queries. By contrast, many real-world applications, such as servicing knowledge requests from humans or automated agents, require broad coverage and fast, yet customizable, response times for non-factoid and complex queries. In this thesis, we investigate anytime applications in the form of information extraction tasks initiated as queries from either automated agents or humans. The thesis will introduce new models and approaches for learning to assess the truth of facts using unstructured web information. In addition to processing unstructured information on the Web, our approaches can determine the response to a new query by integrating opinions from multiple knowledge harvesting systems. If a response is desired within a specific time budget (e.g., in less than 2 seconds), then only a subset of these resources can be queried. We propose a new method that learns a policy for choosing which queries to send to which resources, while accommodating varying budget constraints that are known only at query (test) time. We further extend our information validation approaches to automatically measure the credibility of different web information sources and incorporate it into claim validation. We present a novel, integrated approach which, given a set of claims to validate, extracts a set of pro and con arguments from the Web and jointly estimates the credibility of sources and the correctness of claims. Our approach uses Probabilistic Soft Logic (PSL), resulting in a flexible and principled framework that makes it easy to state and incorporate different forms of prior knowledge. Finally, we show how our information extraction techniques can be used to provide knowledge to anytime intelligent agents, in particular for a find-deliver task on a real mobile robot (CoBot) and for a trip planner agent.

Thesis Committee:
Manuel Blum (Co-Chair)
Manuela Veloso (Co-Chair)
Tom Mitchell
Craig Knoblock (USC/ISI)

