The goal of this project is to develop an efficient middleware system to support byzantine fault-tolerant database replication.
Relational database systems are employed for supporting a wide range of uses, from simple applications used by members of a small organization to complex applications used by thousands of users. For example, most web applications built upon the popular three-tier model rely on a database system to store all manipulated data.
In many applications, performance and availability are crucial, which implies that the underlying database engine must also offer good performance and high availability.
Replication is a technique that can be used to improve availability by maintaining multiple data replicas and relying on the replicas of the surviving nodes to continue operation in the presence of failures. Replication algorithms can be divided broadly into eager and lazy algorithms. Lazy replication algorithms [SS05] asynchronously propagate replica updates to other replicas after the updating transaction commits. This approach allows good performance by minimizing the coordination overhead among replicas when transactions are executed. However, in the presence of concurrent transactions, this approach may lead to temporary replica staleness, and even replica divergence (depending on the protocol used). In this case, maintaining consistency is usually left to the the application, requiring application-specific solutions. Thus, when consistency is important, as it is the case in many database-based applications, it is simpler to rely on eager replication schemes.
Another important choice replication algorithms make are the assumptions about the behavior of faulty replicas. Most earlier research has focused on techniques that tolerate benign faults: these algorithms assume components fail by stopping or by omitting some steps and may not provide correct service if a faulty component violates this assumption. This assumption can be problematic as malicious attacks, operator mistakes, and software errors can cause faulty nodes to exhibit arbitrary behavior. Techniques that tolerate Byzantine faults [LSP82] address this problem since they make no assumptions about the behavior of faulty components.
The central goal of this project is to build a replicated database system that tolerates Byzantine faults. The main challenge resides in the fact that Byzantine fault tolerance replication introduces a non-negligible performance overhead. This fact, combined with the necessity of the use of eager replication, has led to a low adoption of Byzantine fault tolerance in database replication. This project aims at developing novel techniques for improving the performance of Byzantine fault tolerant replicated databases.
The first technique this project will explore is to relax the isolation level [BBG+95] for transaction execution. This approach has the potential to improve scalability by allowing a higher degree of concurrency among transactions, as it is shown in recent papers [LKPMJP05, EZP05]. Moreover, it does not pose any major problem for applications that require strict serializability, as it has already been shown how to transform an application program so that its execution under snapshot isolation is equivalent to strict serializability [FLO+05]. In this project we will design novel database replication algorithms that combine eager replication, Byzantine fault tolerance, and provide snapshot-isolation semantics.
The second mechanism that we intend to integrate in our solution is support for speculative execution [NCF05] in the client. In such a scheme, the client program continues its execution before the final result of an operation is established, based on a tentative result. This approach will allow to partially mask delay times required for coordination in Byzantine eager replication algorithms, leading to better perceived performance from clients. This speculative execution will tend to use computing resources that would be otherwise wasted, as it is executed in the client when the client is waiting for the operation result, thus imposing no (or minimal) overhead in the clients. Additionally, as this approach reduces the importance of providing an immediate reply to clients, it also opens the opportunity to improve the performance of Byzantine replication by optimizing the replication protocol . e.g., the distinct primary replica, usually responsible for ordering the requests, will now be able to group more operations into a single request and execute them in batches.
Our solution will be built as a middleware layer. We intend to reduce to the minimum the dependencies on the underlying database engine, thus improving the portability of our solution and making it easily usable with different database engines.