Catalina Vajiac

Detection and Visualization of Human Sex Trafficking in Online Escort Advertisements

Abstract

Human trafficking (HT) for forced sexual exploitation is pervasive, affecting an estimated 6.3 million people at any given time. The majority of victims are advertised online, mainly through escort websites. Practitioners who want to help these victims, including criminologists, social workers, and law enforcement agencies, often manually scroll through these websites to find leads. This process is inefficient, as it takes a great deal of time, and ineffective, as the chance of finding HT while scrolling through ads at random is small. While a few industry tools exist to analyze ads, they use basic techniques such as connecting ads through metadata and, to our knowledge, do not analyze the ad text at all. Furthermore, these tools often have extremely limited functionality, requiring practitioners to export data from one interface and import it into another to analyze a possible lead.

In recent years, some HT cases have been found by chance when a practitioner noticed that multiple ads had similar text, since many HT cases involve organized crime groups. These cases are characterized by large numbers of similar-looking ads posted in multiple locations and advertising multiple people, making it unlikely that a single individual posted all the content. These insights can be leveraged to automate lead generation using publicly available data so that practitioners can help HT victims more quickly and effectively.

In this thesis, I propose to assist practitioners in identifying potential human trafficking cases by: (1) developing scalable and explainable clustering algorithms based on text and shared locations for finding large organized crime groups in escort ad data, and (2) creating intuitive visualization techniques for presenting the results of these algorithms to practitioners. These visualizations include an interface for quick label generation so that we can effectively evaluate our algorithms, as well as novel techniques for exploring connections in metadata over time. We also evaluate our algorithms to ensure they are fair across advertised race and ethnicity.
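
To make the idea of grouping ads by similar text and shared locations concrete, the following is a minimal sketch and not the algorithm developed in the thesis. It assumes each ad is a dictionary with hypothetical "text" and "location" fields, links two ads when the Jaccard similarity of their word trigrams meets an assumed threshold, and flags groups that are large and span several locations. The function names, thresholds, and the brute-force pairwise comparison are all illustrative assumptions; the pairwise loop is quadratic and would not scale to real escort ad data.

```python
from itertools import combinations


def shingles(text, n=3):
    """Return the set of word n-grams in an ad's text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_ads(ads, threshold=0.5):
    """Group ads whose text similarity meets the threshold (union-find)."""
    parent = list(range(len(ads)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    sets = [shingles(ad["text"]) for ad in ads]
    # Illustrative brute-force comparison of every pair of ads.
    for i, j in combinations(range(len(ads)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(ads)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def flag_leads(ads, clusters, min_ads=3, min_locations=2):
    """Keep clusters that are large and spread over multiple locations."""
    leads = []
    for members in clusters:
        locations = {ads[i]["location"] for i in members}
        if len(members) >= min_ads and len(locations) >= min_locations:
            leads.append(sorted(members))
    return leads


# Made-up example ads, for illustration only.
ads = [
    {"text": "new in town sweet and fun call now", "location": "City A"},
    {"text": "new in town sweet and fun call today", "location": "City B"},
    {"text": "new in town sweet and fun text now", "location": "City C"},
    {"text": "independent provider available evenings only", "location": "City A"},
]
leads = flag_leads(ads, cluster_ads(ads))
```

In this toy input, the first three ads share most of their trigrams and appear in three different locations, so they form one flagged group, while the fourth ad stays on its own. Listing the shared trigrams for each flagged group would be one simple way to show a practitioner why the ads were linked.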

The share of individuals affected by human sex trafficking has only increased over time, and the volume of escort ad data has grown exponentially. Through the creation of better algorithms and effective visualizations, we hope to make the laborious task of lead generation much easier for practitioners, so that they can focus on building cases using private data and getting HT victims the help that they need.

Thesis Committee

Keywords