5th Year Master's Thesis Presentation - Mihir Khare

— 1:30pm

Location:
In Person - Reddy Conference Room, Gates Hillman 4405

Speaker:
MIHIR KHARE , Master's Student, Computer Science Department, Carnegie Mellon University
https://www.linkedin.com/in/mihirkhare

Filters are not One-Size-Fits-All: Survey and Evaluation of Join Pre-Filtering in Database Management Systems

The current era of data storage is defined by the widespread adoption of data lakes, and the disaggregation of storage and compute hardware. Modern database management systems (DBMSs) are often operating on large volumes of data stored in object stores (like Amazon's S3), unmanaged file formats (like Apache's Parquet), or otherwise have outdated or nonexistent statistics. In join-heavy analytical workloads, the traditional approach of optimizing query plans to minimize the cost of joins breaks down if the available information to estimate cardinalities and costs is inaccurate. In recent years, a class of techniques known as ``join pre-filtering'' has gained focus as an attempt to reduce the reliance on a good optimizer for minimizing join costs by reducing the inputs of joins to the minimum set of tuples needed to produce the output. This thesis explores the current state-of-the-art in pre-filtering, and concludes that more work must be done to create a strategy that is applicable across a wide range of workloads. First, we provide an overview of the fundamental concepts of pre-filtering, and describe the key design decisions of and differences between currently studied techniques. We then evaluate two pre-filtering methods, min-max filters and Bloom filters, using a modified implementation of Dynamic Predicate Transfer (RPT+), a leading contemporary technique for join pre-filtering. Our analysis focuses on the performance interactions between these filter types. Finally, we discuss the implications of the results for modern DBMSs, and future directions of study in this area.

Thesis Committee
Jignesh Patel (Chair)
Andy Pavlo

Additional Information 
 

For More Information:
amalloy@cs.cmu.edu


Add event to Google
Add event to iCal