Detail

Publication date: June 1, 2021

Space Complexity of Replicated Data Types

Data replication is a ubiquitous technique that improves fault-tolerance,
availability and responsiveness of a system. Many recent applications opt for
eventually consistent replication, trading strong consistency for low
latency and high availability. However, the implementation of eventually
consistent systems can be challenging, due to concurrent updates and
conflict resolution, unpredictable network conditions, and failures. Addressing
these problems requires complex metadata and logic. An emerging solution is
to encapsulate this complexity behind reusable Replicated Data Type (RDT)
abstractions, such as counters, sets or registers.
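As a small, hedged illustration of such an abstraction (our own sketch, not taken from the talk), here is a state-based replicated counter, one of the simplest RDT designs: each replica increments only its own entry in a per-replica map, and replicas merge states with a pointwise maximum, so concurrent updates commute and every replica converges to the same value.

```python
class GCounter:
    """A grow-only, state-based replicated counter (illustrative sketch)."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments performed at that replica

    def increment(self, n=1):
        # A replica only ever updates its own entry.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        # The observable value is the sum over all replicas.
        return sum(self.counts.values())

    def merge(self, other):
        # Pointwise maximum: idempotent, commutative, and associative,
        # so replicas converge regardless of message order or duplication.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

# Two replicas update concurrently, then exchange states and converge.
a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5
```

Note that the state stores one integer per replica even though the value is a single integer; this kind of gap between state size and value size is exactly what the metadata-overhead question below is about.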

A number of different RDT implementations have been proposed recently,
leading to natural questions: Which one should we choose? Do they implement
the same semantics? How much space overhead do they introduce? Do more
efficient implementations exist? What are the inherent limits, and what can
be improved?

To answer these questions, we introduce several new concepts for reasoning
about correctness and performance of RDTs:
1) We formalize the specifications of several data types.
2) We introduce a metric that characterizes the asymptotic metadata
overhead of an implementation.
3) We derive lower bounds for four of these data types, that is, we
establish the minimum space overhead required by ANY correct implementation.
4) We present optimal implementations of these four data types, selected
from the literature.
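To see why the metadata-overhead metric and the lower bounds matter, consider a toy example (our own, not from the paper): a naive replicated set that resolves concurrent adds and removes by tagging every addition with a unique identifier and recording removals as tombstones. The stored state then grows with the number of operations, even when the observable set stays small, which is precisely the kind of overhead optimal implementations avoid.

```python
import itertools

class NaiveORSet:
    """Illustrative tombstone-based replicated set with unbounded metadata."""

    _tag = itertools.count()  # source of globally unique tags (simplified)

    def __init__(self):
        self.adds = {}           # unique tag -> element added under that tag
        self.tombstones = set()  # tags whose additions have been removed

    def add(self, element):
        self.adds[next(self._tag)] = element

    def remove(self, element):
        # Removing covers all additions of the element observed so far.
        for tag, e in self.adds.items():
            if e == element:
                self.tombstones.add(tag)

    def value(self):
        return {e for tag, e in self.adds.items() if tag not in self.tombstones}

    def state_size(self):
        # Stored entries: every tagged addition plus every tombstone.
        return len(self.adds) + len(self.tombstones)

s = NaiveORSet()
for _ in range(100):
    s.add("x")
    s.remove("x")
print(len(s.value()), s.state_size())  # prints "0 200"
```

After 100 add/remove cycles the observable set is empty, yet the replica stores 200 metadata entries; the overhead is unbounded in the number of operations, far above the lower bounds that optimal set implementations achieve.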

This talk will overview our framework for reasoning about RDTs with a
walk-through example study of a selected data type.

This is joint work with Sebastian Burckhardt (MSR Redmond), Alexey
Gotsman (IMDEA Software), and Hongseok Yang (Oxford), which appeared
as “Replicated Data Types: Specification, Verification, Optimality” at POPL’14.

Presenter

Marek Zawirski

Date 09/04/2014
State Concluded