Database Seminar

Monday, April 4, 2016 - 4:45pm

Location:

8102 Gates & Hillman Centers

Speaker:

SRIJAN KUMAR, Ph.D. Student http://cs.umd.edu/~srijan/

Event Website:

http://db.cs.cmu.edu/events/db-seminar-spring-2016-srijan-kumar

For More Information, Contact:

epapalex@cs.cmu.edu

The web enables the transmission of knowledge at a speed and breadth unprecedented in human history, which has had a tremendous positive impact on the lives of billions of people. While benign users try to keep the web safe and usable, malicious users add and spread harmful content, manipulate information, and twist things in their favor. The presence of malicious users and their content calls into question the usefulness, credibility, and safety of web platforms. In this talk, we will discuss general graph mining and user behavior modeling techniques to detect malicious acts and actors on the web. First, we will discuss an unsupervised decluttering algorithm that iteratively removes suspicious edges from the network, which trolls use to masquerade as legitimate users. This algorithm is faster than existing techniques and twice as accurate. Second, we develop behavior-modeling techniques to identify malicious editors on Wikipedia, called vandals, who add unconstructive information. We build the first vandal early warning system, which models the editing behavior patterns of benign editors and vandals based on the relation between the edited pages, the temporal dependency between edits, and other editing attributes. Finally, we combine graph and behavior modeling techniques to develop the first system to identify hoax articles on Wikipedia: fabricated articles that are purposefully created to mislead readers. It leverages content, hyperlink network, and editor attributes, and achieves an accuracy of 92%, while humans perform at 66% accuracy. We find that the added advantage is due to the network and editor features, meaning that faking the content of an article is easy, while faking its importance on Wikipedia is not.

Keywords:

Seminar Series