Database Seminar

Monday, September 21, 2015 - 5:00pm


8102 Gates & Hillman Centers



Websites tend to be topically cohesive: the webpages in a website often discuss a few related topics, even though the web at large may have a completely different topical distribution. The research question that then arises is: can a model of a website's topic distribution help improve the classification of its webpages? In this talk, I will describe an approach that models topic cohesion and uses it in a way that (i) scales well with the size of the web and (ii) integrates easily with any underlying content-only classifier. Though this paper is set in the context of webpage classification, cohesion itself is a very generic phenomenon: for example, a product introduced by a seller known for selling good products is itself highly likely to be good. I'd like to compare this idea to belief propagation and get opinions on how it could feed into our approach to fraud detection in online reviews.


