Saurabh Kadekodi

Senior Research Scientist (Google)

Publications

Projects

FastCommit: resource-efficient, performant and cost-effective file system journaling

FastCommit is a hybrid journaling approach for Ext4 that performs logical journaling for simple and frequent file system modifications, while relying on JBD2 for more complex and rare ones. Its key design elements are compact logging, selective flushing, and inline journaling.
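
The sketch below illustrates the hybrid idea only, not the actual Ext4/JBD2 implementation: simple, frequent operations are recorded as compact logical entries, while anything else falls back to block-level journaling. The operation names and record format are assumptions for illustration.

    import json

    FAST_COMMIT_OPS = {"create", "link", "unlink", "write_extent"}  # assumed set of "simple" ops

    def journal(op, fast_log, block_journal):
        """Append a compact logical record for simple ops, else journal full metadata blocks."""
        if op["type"] in FAST_COMMIT_OPS:
            # Compact logging: record only the logical change (a few tens of bytes).
            fast_log.append(json.dumps({k: op[k] for k in ("type", "inode", "args")}))
        else:
            # Complex or rare operations fall back to journaling every dirty metadata block.
            block_journal.extend(op["dirty_blocks"])  # each entry is a full 4 KiB block

    fast_log, block_journal = [], []
    journal({"type": "create", "inode": 12, "args": {"parent": 2, "name": "a.txt"}},
            fast_log, block_journal)
    journal({"type": "rename_dir_tree", "inode": 7, "args": {},
             "dirty_blocks": [b"\0" * 4096]}, fast_log, block_journal)
    print(len(fast_log), "logical records,", len(block_journal), "journaled blocks")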

Thesios: Synthesizing Accurate Counterfactual I/O Traces from I/O Samples

Thesios synthesizes accurate and representative I/O traces by combining down-sampled I/O traces (which are routinely collected in Google's data centers) from multiple disks across multiple storage servers.
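
Conceptually, part of the synthesis can be pictured as merging sampled records from many disks and rescaling by the sampling rate; the sketch below shows just that idea, with a made-up (timestamp, bytes) record format, and omits the reconstruction work the real pipeline performs.

    import heapq

    def synthesize(sampled_traces, sample_rate):
        """sampled_traces: per-disk lists of (timestamp, bytes) records, each collected
        by sampling I/Os with probability `sample_rate`."""
        merged = list(heapq.merge(*sampled_traces))      # merge in timestamp order
        # Each sampled record stands in for roughly 1/sample_rate real I/Os.
        est_iops = len(merged) / sample_rate
        est_bytes = sum(b for _, b in merged) / sample_rate
        return merged, est_iops, est_bytes

    disks = [[(0.1, 4096), (0.9, 8192)], [(0.4, 4096)]]
    _, iops, byts = synthesize(disks, sample_rate=0.01)
    print(f"estimated I/O count: {iops:.0f}, estimated bytes: {byts:.0f}")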

Practical Design Considerations for Wide Locally Recoverable Codes (LRCs)

A practically-minded analysis of several popular and novel LRCs. Wide LRC reliability is a subtle phenomenon that is sensitive to several design choices, some of which are overlooked by theoreticians, and others by practitioners. Based on these insights, we construct novel LRCs called Uniform Cauchy LRCs, which show excellent performance in simulations, and a 33% improvement in reliability on unavailability events observed by a wide LRC deployed in a Google storage cluster. We also show that these codes are easy to deploy in a manner that improves their robustness to common maintenance events.
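
As a toy illustration of the local-repair property that makes LRCs attractive (this is not the Uniform Cauchy construction, and global parities are omitted): data blocks are split into small local groups, each protected by an XOR parity, so a single failure is repaired by reading only its local group.

    from functools import reduce

    def xor(blocks):
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

    def encode(data_blocks, group_size):
        """Split data blocks into local groups and compute one XOR parity per group."""
        groups = [data_blocks[i:i + group_size]
                  for i in range(0, len(data_blocks), group_size)]
        return [(g, xor(g)) for g in groups]

    def repair(group, parity, lost_index):
        """Recover one lost block by reading only its local group (group_size blocks)."""
        survivors = [b for i, b in enumerate(group) if i != lost_index]
        return xor(survivors + [parity])

    data = [bytes([i]) * 8 for i in range(6)]        # 6 data blocks
    (g0, p0), _ = encode(data, group_size=3)         # two local groups of 3
    assert repair(g0, p0, lost_index=1) == data[1]   # local repair of a single failure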

DARE: Disk-Adaptive Redundancy

Established the case for performing disk-adaptive redundancy, where data redundancy is tailored to observed disk failure rate heterogeneity. Based on an analysis of over 5.3 million disks spanning 65 makes/models from the production environments of Google, NetApp, and Backblaze, we designed the first two DARE systems, HeART and Pacemaker, which provided approximately 15–20% space savings in large-scale storage clusters while never compromising reliability.
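
A toy illustration of the underlying idea (not HeART or Pacemaker): given the observed annual failure rate (AFR) of a disk make/model group, pick the most space-efficient erasure code that still meets a reliability target. The candidate schemes, the target, and the crude no-repair failure model below are all arbitrary assumptions for illustration.

    from math import comb

    SCHEMES = [(9, 8), (14, 12), (9, 6), (6, 3)]   # hypothetical (n, k) codes, leanest first

    def loss_probability(n, k, afr):
        """Probability that more than n-k of n disks fail in a year (independent-failure
        approximation that ignores repair, which real analyses model carefully)."""
        tolerable = n - k
        return sum(comb(n, f) * afr**f * (1 - afr)**(n - f)
                   for f in range(tolerable + 1, n + 1))

    def choose_scheme(afr, target=1e-4):           # target is illustrative only
        for n, k in SCHEMES:                       # ordered by increasing storage overhead
            if loss_probability(n, k, afr) <= target:
                return (n, k)
        return SCHEMES[-1]                         # fall back to the most redundant scheme

    for afr in (0.003, 0.02, 0.08):                # young, typical, and failing-fast disks
        print(afr, "->", choose_scheme(afr))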

Geriatrix: File system aging suite

Designed and developed an efficient and reproducible file system aging tool to encourage realistic, fair, and responsible benchmarking. Geriatrix takes as input the file age, file size, and directory depth distributions from already-aged file system images, and performs a controlled sequence of file creations and deletions to age the target file system so that it mimics the characteristics of the reference image. Geriatrix is open source and ships with 8 built-in aging profiles.
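
A minimal sketch of distribution-driven aging via create/delete churn is shown below; the file-size distribution and churn parameters are placeholders, and the actual tool replays age, size, and depth distributions far more faithfully.

    import os, random, tempfile

    def age(root, size_dist, n_ops=200, delete_prob=0.4, rng=random.Random(42)):
        """Create files with sizes drawn from a reference distribution and delete
        random existing files, fragmenting free space the way long-lived churn would."""
        files = []
        for i in range(n_ops):
            if files and rng.random() < delete_prob:
                os.remove(files.pop(rng.randrange(len(files))))   # churn: delete
            else:
                path = os.path.join(root, f"f{i}")
                with open(path, "wb") as f:
                    f.write(b"\0" * rng.choice(size_dist))        # churn: create
                files.append(path)

    # Hypothetical reference size distribution (bytes), skewed toward small files.
    with tempfile.TemporaryDirectory() as d:
        age(d, size_dist=[4096] * 8 + [65536, 1 << 20])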

Packing in cloud file systems

Augmented a cloud file system’s write-back cache with a packing and indexing layer that coalesces small files, transforming arbitrary user workloads into a write pattern better suited to cloud storage in terms of transfer sizes, number of objects, and price. The result is a >60x improvement in performance and a >25,000x reduction in cloud storage cost.
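
A rough sketch of the packing-and-indexing idea follows (not the actual system): small files are coalesced into large pack objects before upload, and an index maps each file to (pack, offset, length), so cloud PUTs are few and large. The 8 MiB target size is an assumption.

    PACK_TARGET = 8 << 20   # assumed target object size

    def pack(files):
        """files: dict name -> bytes. Returns (pack objects, index)."""
        packs, index, buf = [], {}, bytearray()
        for name, data in files.items():
            index[name] = (len(packs), len(buf), len(data))   # (pack_id, offset, length)
            buf += data
            if len(buf) >= PACK_TARGET:                       # seal the pack object
                packs.append(bytes(buf)); buf = bytearray()
        if buf:
            packs.append(bytes(buf))
        return packs, index

    def read(packs, index, name):
        pack_id, off, length = index[name]
        return packs[pack_id][off:off + length]

    packs, idx = pack({"a": b"hello", "b": b"world" * 100})
    assert read(packs, idx, "a") == b"hello"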

SMRfs

Conducted under the guidance of Prof. Garth Gibson, this research explored ways to minimize the size of unshingled partitions (typically used for frequently updated data such as metadata and small files) on shingled disks. It also involved the analysis and partial implementation on SMRfs of two cleaning algorithms originating from log-structured file systems.

myFTL

Designed and implemented a flash translation layer (FTL) with block mapping, garbage collection (with four policies), and wear-leveling in FlashSim, an FTL simulation framework. This project was enhanced and released as a course project for a 70+ student graduate-level storage systems course (15-746) at CMU.
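
As a highly simplified sketch of what an FTL does (page-level mapping with a single greedy GC policy, whereas the project above used block mapping, four policies, and wear-leveling): writes go out-of-place, the old copy is invalidated, and GC reclaims the full block with the fewest valid pages, migrating its live pages first.

    PAGES_PER_BLOCK = 4
    NUM_BLOCKS = 8           # includes over-provisioned space never exposed to the user

    class ToyFTL:
        def __init__(self):
            self.l2p = {}                                   # logical page -> (block, page)
            self.blocks = [[None] * PAGES_PER_BLOCK for _ in range(NUM_BLOCKS)]
            self.free = list(range(NUM_BLOCKS))             # erased blocks
            self.active, self.wp = self.free.pop(), 0       # current write block / pointer

        def _append(self, lpn):
            """Place lpn at the write pointer, opening a fresh block (or GCing) when full."""
            while self.wp == PAGES_PER_BLOCK:
                if self.free:
                    self.active, self.wp = self.free.pop(), 0
                else:
                    self._gc()
            self.blocks[self.active][self.wp] = lpn
            self.l2p[lpn] = (self.active, self.wp)
            self.wp += 1

        def write(self, lpn):
            if lpn in self.l2p:                             # out-of-place update:
                b, p = self.l2p[lpn]                        # invalidate the old copy
                self.blocks[b][p] = None
            self._append(lpn)

        def _gc(self):
            """Greedy policy: reclaim the full block holding the fewest valid pages."""
            candidates = [b for b in range(NUM_BLOCKS)
                          if b != self.active and b not in self.free]
            victim = min(candidates,
                         key=lambda b: sum(p is not None for p in self.blocks[b]))
            live = [p for p in self.blocks[victim] if p is not None]
            self.blocks[victim] = [None] * PAGES_PER_BLOCK  # erase the victim
            self.free.append(victim)
            for lpn in live:                                # migrate still-valid pages
                self._append(lpn)

    ftl = ToyFTL()
    for lpn in list(range(12)) + [0, 1, 2, 3] * 6:          # overwrites eventually force GC
        ftl.write(lpn)
    print("live logical pages:", len(ftl.l2p))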

Space Maps in Ext4

Designed and developed an extent-based free-space management technique for the Ext4 filesystem, called Space Maps, along with an allocator that uses Space Maps for disk-space allocation. Consisting of a red-black tree and a log, Space Maps improved allocation speed by 30% and deallocation speed by 80%, and helped reduce file and free-space fragmentation.
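
The sketch below conveys the extent-based idea only: free space is a set of (start, length) extents, allocation carves from an extent, and deallocation merges neighbours back. A sorted Python list stands in for the kernel red-black tree, and the log component is omitted.

    import bisect

    class SpaceMap:
        def __init__(self, total_blocks):
            self.extents = [(0, total_blocks)]            # sorted, disjoint free extents

        def alloc(self, n):
            """First-fit: take n blocks from the first extent large enough."""
            for i, (start, length) in enumerate(self.extents):
                if length >= n:
                    if length == n:
                        self.extents.pop(i)
                    else:
                        self.extents[i] = (start + n, length - n)
                    return start
            raise MemoryError("no free extent of the requested size")

        def free(self, start, n):
            """Insert the freed extent and coalesce with adjacent free extents."""
            bisect.insort(self.extents, (start, n))
            merged = []
            for s, l in self.extents:
                if merged and merged[-1][0] + merged[-1][1] == s:
                    merged[-1] = (merged[-1][0], merged[-1][1] + l)
                else:
                    merged.append((s, l))
            self.extents = merged

    sm = SpaceMap(1000)
    a = sm.alloc(100); b = sm.alloc(50)
    sm.free(a, 100)
    print(sm.extents)        # [(0, 100), (150, 850)]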

SSD over Infiniband

This study compared the performance of a locally connected SSD with that of a remotely connected SSD (over InfiniBand). Using the lightweight SCSI RDMA Protocol (SRP) for communication, we analyzed the costs of accessing remote SSDs and gained insights into enhancing the software architectures of next-generation data centers from the storage perspective.

Price of Ext4

Under the guidance of Prof. Remzi Arpaci-Dusseau, this study measured the software overhead of the Ext4 file system with the advent of storage devices with microsecond latencies. We shed light on how bottlenecks shift across the various submodules of Ext4 and suggested optimizations to make it future-proof.

Database Garbage Collection

Designed, developed, and evaluated a cooperative (i.e., not stop-the-world), multi-threaded, lock-free, epoch-based garbage collection mechanism for Peloton, a hybrid in-memory database system. Explored tradeoffs between optimizing for average latency versus tail latency in the absence of a dedicated garbage-collection thread.
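
A minimal sketch of cooperative epoch-based reclamation follows; it uses a lock purely for brevity, whereas the actual mechanism is lock-free. Deleted tuple versions are tagged with the epoch in which they were unlinked, and any worker reclaims garbage older than the oldest epoch still pinned by an active transaction.

    import itertools, threading

    class EpochGC:
        def __init__(self):
            self.epoch = itertools.count(1)       # global epoch counter
            self.current = next(self.epoch)
            self.active = {}                      # thread id -> pinned epoch
            self.garbage = []                     # (epoch, object) pairs
            self.lock = threading.Lock()          # sketch only; the real version is lock-free

        def begin(self):
            with self.lock:
                self.active[threading.get_ident()] = self.current

        def end(self):
            with self.lock:
                self.active.pop(threading.get_ident(), None)
                self.current = next(self.epoch)
                self._reclaim()                   # cooperative: done by the worker itself

        def unlink(self, obj):
            with self.lock:
                self.garbage.append((self.current, obj))

        def _reclaim(self):
            oldest = min(self.active.values(), default=self.current)
            self.garbage = [(e, o) for e, o in self.garbage if e >= oldest]

    gc = EpochGC()
    gc.begin(); gc.unlink("old tuple version"); gc.end()
    print(len(gc.garbage), "items still awaiting reclamation")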

Checkpoint Compression

Studied the poorly understood behavior of compression algorithms used in checkpoint/restore systems, and suggested possible enhancements and future directions in library-level checkpoint compression for faster and more efficient checkpointing with a reduced disk footprint.

Active Databases

Implemented a proof-of-concept of decentralized active databases on top of Kademlia, a distributed hash table running over a decentralized peer-to-peer network. Active databases are event-driven databases that follow event-condition-action (ECA) rules.
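
A tiny illustration of an ECA rule firing on a write is shown below; a local dict stands in for the Kademlia DHT, and the rule itself is hypothetical.

    rules = []      # (event, condition, action) triples
    store = {}      # stand-in for the DHT key-value store

    def on(event, condition, action):
        rules.append((event, condition, action))

    def put(key, value):
        store[key] = value                            # event: a key was written
        for event, cond, act in rules:
            if event == "put" and cond(key, value):   # condition
                act(key, value)                       # action

    on("put", lambda k, v: v > 100, lambda k, v: print(f"alert: {k}={v}"))
    put("temperature", 120)    # triggers the rule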

VM Co-Migration

Designed and developed a UDP-based VM migration module in Palacios, an OS-independent embeddable virtual machine monitor. It supported multiple-source, multiple-destination migrations aimed at distributed applications in HPC environments (e.g., supercomputers), exploiting page sharing among participating nodes for increased migration parallelism.

DNA Compression

Explored a run-length-based preprocessing scheme that exploits the power-law behavior of genomic data, revealing opportunities for Markovian compression and variable-length encoding algorithms to achieve higher compression ratios than existing dictionary-based compression algorithms.
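
The run-length preprocessing step, shown in isolation below for illustration (the study layered Markovian and variable-length coding on top of it):

    from itertools import groupby

    def rle(seq):
        """Encode runs of repeated bases as (base, run_length) pairs."""
        return [(base, len(list(run))) for base, run in groupby(seq)]

    print(rle("AAAACCGTTTT"))   # [('A', 4), ('C', 2), ('G', 1), ('T', 4)]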

NIC of Time

Designed and developed a tool for exploring the state space of all possible combinations of functionality offloaded to the NIC versus kept in the kernel. The tool performs extensive analysis of throughput and CPU utilization to suggest a feature, or group of features, to offload to the NIC for the particular workload under consideration.
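
The conceptual skeleton of such a search is sketched below; the feature list and the measure callback are placeholders, and the final line uses a stand-in measurement purely to show the interface, not real data.

    from itertools import combinations

    FEATURES = ["checksum", "segmentation", "rss", "timestamping"]   # assumed feature set

    def best_offload_set(measure):
        """measure(offloaded) -> (throughput, cpu_utilization), supplied by a benchmarking
        harness; this function only performs the exhaustive search over combinations."""
        best, best_score = frozenset(), float("-inf")
        for r in range(len(FEATURES) + 1):
            for combo in combinations(FEATURES, r):
                tput, cpu = measure(frozenset(combo))
                score = tput / max(cpu, 1e-9)      # throughput per unit of CPU
                if score > best_score:
                    best, best_score = frozenset(combo), score
        return best

    print(best_offload_set(lambda offloaded: (1.0 + len(offloaded), 1.0)))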

Education

  • Ph.D. in Computer Science
    Carnegie Mellon University
    2014 - 2020
  • M.S. in Computer Science
    Northwestern University
    2012 - 2013
  • B.E. in Computer Science
    Pune Institute of Computer Technology
    2005 - 2009

Resume

Art Circle