Article details

  • PIXIDA: Optimizing Data Parallel Jobs in Wide-Area Data Analytics
  • 01 Oct 2015
  • In the era of global-scale services, big data analytical queries are often required to process datasets that span multiple data centers (DCs). In this setting, cross-DC bandwidth is often the scarcest, most volatile, and/or most expensive resource. However, current widely deployed big data analytics frameworks make no attempt to minimize the traffic traversing these links. In this paper, we present PIXIDA, a scheduler that aims to minimize data movement across resource-constrained links. To achieve this, we introduce a new abstraction called SILO, which is key to modeling PIXIDA's scheduling goals as a graph partitioning problem. Furthermore, we show that existing graph partitioning problem formulations do not match how big data jobs actually execute, so their solutions miss opportunities to avoid data movement. To address this, we formulate a new graph partitioning problem and propose a novel algorithm to solve it. We integrated PIXIDA into Spark, and our experiments show that, compared to existing schedulers, PIXIDA reduces traffic on these links by up to ∼9×.
  • Proceedings of the VLDB Endowment
  • VLDB
  • Konstantinos Kloudas, Margarida Mamede, Nuno Preguiça, Rodrigo Rodrigues
  • Volume: 9
  • Issue: 2
  • ISSN: 2150-8097
  • http://www.vldb.org/pvldb/vol9/p72-kloudas.pdf
  • Pages: 72 to 83