Computer Science Thesis Proposal

— 2:30pm

Location:
In Person - Reddy Conference Room, Gates Hillman 4405

Speaker:
SARA McALLISTER, Ph.D. Student, Computer Science Department, Carnegie Mellon University
https://saramcallister.github.io/

Efficient and Sustainable Data Retrieval at Scale

Datacenters are projected to account for 33% of global carbon emissions by 2050. As datacenters increasingly rely on renewable energy for power, the majority of datacenter emissions will be embodied emissions, which come from lifecycle stages including acquiring raw materials, manufacturing, transportation, and disposal. To reach the ambitious emission reduction goals set by both companies and governments, datacenters need to reduce emissions throughout their operations, including (and particularly relevant for this thesis) the storage system. Unfortunately, while data storage and retrieval systems are large contributors to embodied emissions, reducing their embodied emissions has largely been overlooked.

This thesis aims to reduce emissions in data retrieval for large-scale storage systems. These systems can reduce their carbon footprint by enabling storage devices to have longer lifetimes and use denser media. However, storage hardware’s IO limits combined with software’s unnecessary additional IO often severely restrict emission reductions, or at worst cause increased emissions. Thus, this thesis focuses on reducing IO in several parts of the storage stack to enable efficient and sustainable data retrieval.

First, this proposal addresses the efficiency and sustainability of flash caching, a critical layer in datacenter storage systems that is limited by flash write endurance. These improvements will be realized in two caching systems: Kangaroo and FairyWREN. Together, these caches reduce writes by over 28x, allowing flash devices to use denser flash for longer lifetimes, ultimately reducing emissions. Then, this thesis will discuss proposed work to enable more sustainable bulk storage, where bandwidth limitations prevent deployment of denser HDDs. I propose a new IO interface, Declarative IO, that empowers the storage system to eliminate duplicate IO accesses by exposing the time- and order-flexibility in maintenance tasks. This work will enable using larger HDDs, further reducing emissions from storage systems.

Thesis Committee:
Nathan Beckmann (Co-Chair)
Gregory R. Ganger (Co-Chair)
George Amvrosiadis
Daniel S. Berger (Microsoft Azure/University of Washington)
Margo Seltzer (University of British Columbia)

Additional Information

Event Website:
https://www.cs.cmu.edu/afs/cs.cmu.edu/Web/Posters/CSProposal-SaraMcAllister24.pdf

