Computer Science Thesis Oral

Wednesday, May 25, 2016 - 1:00pm to 2:30pm

Location:

Traffic 21 Classroom, 6501 Gates & Hillman Centers

Speaker:

EVANGELOS E. PAPALEXAKIS, Ph.D. Student http://www.cs.cmu.edu/~epapalex/

For More Information, Contact:

deb@cs.cmu.edu

What does a person’s brain activity look like when they read the word apple? How does it differ from the activity of the same person (or even a different person) when reading about an airplane? How can we identify parts of the human brain that are active for different semantic concepts? In a seemingly unrelated setting, how can we model and mine the knowledge on the web (e.g., subject-verb-object triplets) in order to find hidden emerging patterns? Our proposed answer to both problems (and many more) is to bridge signal processing and large-scale multi-aspect data mining. Specifically, language in the brain, along with many other real-world processes and phenomena, has different aspects, such as the various semantic stimuli of the brain activity (apple or airplane), the particular person whose activity we analyze, and the measurement technique. In the above example, the brain regions with high activation for “apple” will likely differ from the ones for “airplane”. Nevertheless, each aspect of the activity is a signal of the same underlying physical phenomenon: language understanding in the human brain. Taking into account all aspects of brain activity results in more accurate models that can drive scientific discovery (e.g., identifying semantically coherent brain regions). In addition to the above Neurosemantics application, multi-aspect data appear in numerous other applications. One is mining knowledge on the web, where the different aspects of the data include entities in a knowledge base and the links between them, or search engine results for those entities. Another is multi-aspect graph mining, for example on multi-view social networks, where we observe social interactions of people under different means of communication and use all views/aspects of the communication to extract communities more accurately.
The main thesis of our work is that many real-world problems, such as those mentioned above, benefit from jointly modeling and analyzing the multi-aspect data associated with the underlying phenomenon we seek to uncover. In this thesis we develop scalable and interpretable algorithms for mining big multi-aspect data, with emphasis on tensor decomposition. We present algorithmic advances in scaling up and parallelizing tensor decomposition and in assessing the quality of the results in multi-aspect data applications. These advances have enabled the tensor-based analysis of multi-aspect data that the state of the art was unable to operate on. Furthermore, we present results on multi-aspect data applications focusing on Neurosemantics and on Social Networks and the Web, demonstrating the effectiveness of modeling and mining multiple aspects of the data. We conclude with our future vision for bridging Signal Processing and Data Science for real-world applications, and with concrete future directions for multi-aspect data mining algorithms and applications.

Thesis Committee:

Christos Faloutsos (Chair)
Tom Mitchell
Jeff Schneider
Nikos Sidiropoulos (University of Minnesota)

Keywords:

Thesis Oral