Master of Science in Information Networking Thesis Presentation - Shyamal Vaderia
— 11:30am
Location:
In Person and Virtual - ET
-
Wean Hall 5316 and Remote Call
Speaker:
SHYMAL VADERIA
,
Master's Student, Information Networking Institute, Carnegie Mellon University
https://svaderia.github.io/
Bloom filters are widely used in modern data systems to accelerate membership tests in databases, storage engines, and networking stacks. On modern CPUs, Bloom filter performance is typically limited by memory latency rather than computation: each query issues K seemingly random probes over a bit array that is often much larger than the last-level cache (LLC). Most prior work on making Bloom filters faster has focused on changing the data structure itself. We take a complementary approach and ask how far one can go by changing only the query strategy.
First, we show that reordering the K probes for each query--either by fully sorting them (cache-oblivious) or by partitioning them around a cache-sized pivot (cache-aware)--transforms the aggregate access distribution from uniform to strongly skewed toward low indices, greatly improving cache reuse without modifying the Bloom filter representation. Second, we refine the hash generation process to behave more like a low-discrepancy sequence over the filter, reducing the fraction of queries whose probes all fall outside the LLC-sized region.
Across a range of filter sizes for negative-query workloads, our designs reduce LLC-load misses by up to 8.3x and improve end-to-end runtime by up to 1.45x relative to a conventional Bloom filter, while remaining competitive with specialized cache-aware structures such as split-block Bloom filters.
Thesis Committee
Dave Andersen (Chair)
Michael Kaminsky
In Person and Video Call
For More Information:
svaderia@andrew.cmu.edu