Optimizing Distributed Protocols with Query Rewrites
David Chu, Rithvik Panchapakesan, Shadaj Laddad, Lucky E. Katanahas, Chris Liu, Kaushik Shivakumar, Natacha Crooks, Joseph M. Hellerstein, Heidi Howard
Distributed protocols such as 2PC and Paxos lie at the core of many systems in the cloud, but standard implementations do not scale. This paper presents an approach for scaling any distributed protocol by applying rule-driven rewrites, borrowing from query optimization. Manual rule-driven applications of decoupling and partitioning improve the throughput of 2PC by 5x and Paxos by 3x, and match state-of-the-art throughput in recent work.
Shadaj Laddad*, Conor Power*, Mae Milano, Alvin Cheung, Natacha Crooks, Joseph M. Hellerstein
CRDT guarantees extend only to data updates; observations of CRDT state are unconstrained and unsafe. We propose an agenda that embraces the simplicity of CRDTs, but provides richer, more uniform guarantees. We extend CRDTs with a query model that reasons about which queries are safe without coordination by applying monotonicity results from the CALM Theorem.
Towards Auto-Generated Data Systems
Alvin Cheung, Maaz Bin Safeer Ahmad, Brandon Haynes, Chanwut Kittivorawong, Shadaj Laddad, Xiaoxuan Liu, Chenglong Wang, Cong Yan
After decades of progress, database management systems (DBMSs) are the backbones of applications that we interact with on a daily basis. Yet, with the emergence of new data types and hardware, building and optimizing new data systems remain as difficult as the heyday of relational databases. In this paper, we summarize our work towards automating the building and optimization of data systems. Drawing from our own experience, we argue that automation techniques must address three aspects: user specification, code generation, and result validation.
Optimizing Stateful Dataflow with Local Rewrites
Shadaj Laddad, Conor Power, Tyler Hou, Alvin Cheung, Joseph M. Hellerstein
Optimizing a stateful dataflow language is a challenging task. There are strict correctness constraints for preserving properties expected by downstream consumers, a large space of possible optimizations, and complex analyses that must reason about the behavior of the program over time. But with e-graphs, we can dramatically simplify the process of building a correct optimizer while yielding more consistent results! In this short paper, we discuss our early work using e-graphs to develop an optimizer for the Hydroflow dataflow language.
Initial Steps Toward a Compiler for Distributed Programs
Joseph M. Hellerstein, Shadaj Laddad, Mae Milano, Conor Power, Mingwei Samuel
In the Hydro project we are designing a compiler toolkit that can optimize for the concerns of distributed systems, including scale-up and scale-down, availability, and consistency of outcomes across replicas. This invited paper overviews the project, and provides an early walk-through of the kind of optimization that is possible. We illustrate how type transformations as well as local program transformations can combine, step by step, to convert a singlenode program into a variety of distributed design points that offer the same semantics with different performance and deployment characteristics.
Code Transpilation for Hardware Accelerators
Yuto Nishida, Sahil Bhatia, Shadaj Laddad, Hasan Genc, Yakun Sophia Shao, Alvin Cheung
Hardware accelerators have proven to be very effective in optimizing computationally expensive workloads. In this paper, we propose a solution to the challenge of manually rewriting legacy or unoptimized code in domain-specific languages and hardware accelerators. We introduce an approach that integrates two open-source tools: Metalift, a program synthesis framework, and Gemmini, a DNN accelerator. The integration of these two tools enables developers to run legacy code on Gemmini accelerators reduces the effort to add new instructions.
Katara: Synthesizing CRDTs with Verified Lifting
Shadaj Laddad, Conor Power, Mae Milano, Alvin Cheung, Joseph M. Hellerstein
In this paper, we present Katara, a program synthesis-based system that takes sequential data type implementations and automatically synthesizes verified CRDT designs from them. Katara is able to automatically synthesize CRDTs for a wide variety of scenarios, from reproducing classic CRDTs to synthesizing novel designs based on specifications in existing literature. Crucially, our synthesized CRDTs are fully, automatically verified, eliminating entire classes of common errors and reducing the process of producing a new CRDT from a painstaking paper proof of correctness to a lightweight specification.
VizSmith: Automated Visualization Synthesis by Mining Data-Science Notebooks
Rohan Bavishi, Shadaj Laddad, Hiroaki Yoshida, Mukul R. Prasad, Koushik Sen
We built a tool called VizSmith which enables code reuse for visualizations by mining visualization code from Kaggle notebooks and creating a database of 7176 reusable Python functions. Given a dataset, columns to visualize and a text query from the user, VizSmith searches this database for appropriate functions, runs them and displays the generated visualizations to the user.
Fluid Quotes: Metaprogramming across Abstraction Boundaries with Dependent Types
Shadaj Laddad and Koushik Sen
We introduce Fluid Quotes, a new technique that uses dependent types to let users pass quotes through abstraction boundaries in runtime code while splicing them ahead-of-time. This technique enables new metaprogramming capabilities by eliminating the traditional requirement of co-locating parameter expressions with call-sites. Fluid Quotes capture not only source code but also associated runtime context to ensure correctness.
ScalaPy: Seamless Python Interoperability for Cross-Platform Scala Programs
Shadaj Laddad and Koushik Sen
In this paper, we introduce ScalaPy, a system for interoperability between Scala and Python. With ScalaPy, developers can use Python libraries in Scala by treating Python values as Scala objects and exposing Scala values to Python. ScalaPy supports both Scala on the JVM and Scala Native, enabling its usage from data experiments in interactive notebook environments to performance-critical production systems.