Projects details

  • SwiftComp - SwiftComp - Fast and Efficient Incremental Computation for Cloud Computing Environments
  • Jul 2013 - Jun 2015
  • Information has become a key commodity for most services provided over the Internet, including e-commerce, social networking or news sites. Analyzing personal and business data is now common practice; doing so efficiently, in real time, is networking or news sites. Analyzing personal and business data is now common practice; doing so efficiently, in real time, is increasingly more important for supporting new products and applications.

    Data processing solutions for cloud computing infrastructures, such as Map-Reduce and DryadLINQ, have been designed for batch processing of bulk data. Effective for their primary goal, they proved inadequate for real-time processing. Latency can be improved by extending batch processing systems, by exposing partial computations and early results, as in (e.g. Map-Reduce Online) or pursuing incremental models. However, pitfalls remain that stem from operating over persisted bulk data and the penalty inherent to secondary storage. Performing computations over soft state offsets such problem, and is one of the advantages associated with stream processing systems (e.g., Yahoo’s S4), but resilience to faults becomes a major concern, as faults impact consistency and determinism of ongoing computations. Ideally, one would like to combine the low latency benefits of soft state based decentralized computations with the fault tolerance guarantees of a backing store. This project will pursue such avenue.

    SwiftComp will address real-time, incremental data processing data by tightly integrating computations into a decentralized storage system, striving for both performance and resilience, to ensure deterministic computations in the presence of faults. Moreover, it will pursue a programming model for combining computation and storage abstractions is a seamless way, including provisions for computations that process live streams and globally consistent snapshots of partial or whole datasets.

    The system we envision will rely on the Conflict-free Replicated Data Type (CRDT) abstraction, proposed by the members of this project. CRDTs is a potential path to address the CAP theorem, by providing strong eventual consistency, availability and partition-tolerance. As such, CRDTs allow multiple replicas of the same data (structure) to be modified without coordination, guaranteeing that replicas can be merged later without need for any conflict resolution policy. These fundamentals led the team to begin the design of a decentralized storage system around the CRDT concept, following a model that leverages aggressive in memory replica caching to achieve high throughput and low latency, and secondary storage as a fallback for long term persistence and fault tolerance.

    In SwiftComp, one key challenge is to design specialized computation CRDTs, such as min/max registers, counters, accumulators, products, sorted sets/maps, that preform computations as new data is added to the CRDT, thus performing incremental computations. By relying on the properties of CRDTs, these computations are executed in a decentralized way with partial results being propagated during replica synchronization.

    Given our system design, since updates to CRDTs happen primarily in memory replicas, computations will benefit from the low latency and high throughput of soft state. For resilience to faults, we want to replicate computations or resume from a consistent state retrieved from persistent storage; either way, eventual consistency of CRDTs should provide the basis for deterministic computation results. As for efficient processing, we will focus on tailored design of processing oriented CRDTs: (1) that only trigger updates when a significant change, according to the particular invariant, occurs in a given replica, thus avoiding unnecessary chained computations (and communication); (2) that allow new states to be computed incrementally from previous values. Finally, using CRDTs as a shared abstraction for both storage and computations, we expect to provide a homogenous high-level programming model that allows to express complex computations as workflows of more elementary computations.

    SwiftComp will leverage past experience in stream and incremental data processing and use Participatory Sensing (P/S) as a reference scenario. P/S explores the ubiquity of sensor capable mobile phones to capture information about the environment, users’ routines, etc. Good examples of P/S applications combine accelerometer and GPS data to monitor road conservation and traffic congestion. Some P/S applications, such as traffic monitoring, have clear real time requirements. Large volumes of data may also need to be indexed and archived for historical value and detection of anomalies. Often, processing involves spatial and temporal partitioning, followed by aggregation and statistical analysis. Overall, the problems involved in processing P/S data are varied and should provide valuable insights in the pursue of the project goals.

  • PN
  • CITI - FCT/UNL - Centro de Informática e Tecnologias de Informação, FCT/UNL
  • FCT-MCTES - Fundação para a Ciência e a Tecnologia (MEC)
  • 93242
  • 93242
  • 1 Jul 2013
  • 30 Jun 2015
  • Nuno Preguiça [Coordinator], Rodrigo Rodrigues [Researcher], João Lourenço [Researcher], Margarida Mamede [Researcher], Sérgio Duarte [Researcher], David Navalho [Researcher]