Computer Science Thesis Proposal

Friday, December 9, 2016 -
3:00pm to 4:00pm


Reddy Conference Room 4405 Gates & Hillman Centers


NEIL SHAH, Ph.D. Student

For More Information, Contact:

Given the ever-growing prevalence of online social services, usefully leveraging massive datasets has become an increasingly important challenge for businesses and end-users alike. Online services capture a wealth of information about user behavior and platform interactions, such as who-follows-whom relationships in social networks and who-rates-what-and-when relationships in e-commerce networks. Since many of these services rely on data-driven algorithms to recommend relevant content to their users, authenticity of user behavior is paramount to success. But given anonymity on the internet, how do we know which users and actions are real and which are fake? My thesis focuses on this problem and introduces new techniques to discern anomalous and fraudulent behavior in online social graphs. Specifically, I propose to work on three thrusts: plain, dynamic and rich graphs. Firstly, we focus on plain graphs, in which only static connectivity information is known. We detail several proposed algorithms spanning the topics of spectral-based fraud detection in a single graph, blame attribution between graph snapshots, and evaluation of the descriptive power of various graph clustering and partitioning methods in identifying anomalous graph structures. Next, we broaden our scope to dynamic graphs, in which we are able to leverage connectivity information over time. We describe multiple relevant works which describe how to identify and summarize anomalous temporal graph structures, model interarrival time patterns in user queries to find anomalous search behavior, and identify “group” anomalies comprising of users acting in lockstep. Lastly, we expand our reach to rich graphs, in which connectivity information is supplemented by other attributes, such as time, rating, counts, etc. To this end, we propose a principled means for ranking abnormal users by  leveraging statistical patterns in edge attributes. The techniques described in this thesis span various disciplines including machine learning, graph theory, information theory and social science and are practically applicable to a number of real-world fraud and anomaly detection scenarios. They are carefully designed and scale to massive datasets, including social networks, telecommunication networks, e-commerce and collaboration networks with millions of nodes and billions of edges. Thesis Committee: Christos Faloutsos (Chair) Roni Rosenfeld Jaime Carbonell Jiawei Han (University of Illinois, Urbana-Champaign) Copy of Proposal Summary


Thesis Proposal