Distributed Systems
Distributed systems consist of multiple independent components that are connected by a network and together provide a common service. Depending on the use case, this ranges from deployments in which a collection of small data sets is distributed among a few nodes to architectures for the massively parallel processing of large workloads in cloud applications. On the one hand, distribution offers new opportunities, for example improving fault tolerance by replicating data and computations. On the other hand, it creates additional challenges, such as implementing services efficiently when several geographic sites are involved. A key goal of the group's research is to develop concepts and techniques that enable systems to exploit the advantages of distribution while using the available resources as efficiently as possible.
Projects:
E³: Energy-aware Execution Environments
The processing of large amounts of data on distributed execution platforms such as MapReduce or Heron contributes significantly to the energy consumption of today's data centers. The E³ project aims to minimize the power consumption of such execution environments without sacrificing performance. For this purpose, the project develops means to make execution environments and data-processing platforms energy-aware and to enable them to exploit knowledge about applications to dynamically adapt …
REFIT: Resource-Efficient Fault and Intrusion Tolerance
Internet-based services play a central role in today's society. As such services increasingly take over from traditional infrastructures, their complexity grows steadily, and with it the number of faults that occur. Since better software-engineering techniques alone cannot solve this problem, systems must be prepared to tolerate faults and intrusions.
REFIT investigates how systems can provide fault and intrusion tolerance in a resource-efficient manner. The key technology…
EDC: Efficient Distributed Coordination
Coordination services such as ZooKeeper are essential building blocks of today's data-center infrastructures, as they provide the processes of distributed applications with means to exchange messages, perform leader election, detect machine or process crashes, and reliably store configuration data. Because they serve as an anchor of trust for their client applications, coordination services must meet strict requirements regarding stability and performance. Only in this way is it possible to ensure that…
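Leader election, one of the primitives listed above, is commonly built on ZooKeeper with ephemeral sequential znodes: each candidate registers a node, the lowest sequence number leads, and a crashed leader's node disappears together with its session. The following is a minimal in-memory sketch of only that selection logic; `ElectionNamespace`, `join`, `crash`, and `leader` are hypothetical names, and the code does not use the ZooKeeper API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ElectionNamespace:
    """Hypothetical in-memory stand-in for a ZooKeeper election znode."""

    next_seq: int = 0
    # Maps sequence number -> process id; mimics ephemeral sequential children.
    members: dict[int, str] = field(default_factory=dict)

    def join(self, process_id: str) -> int:
        # Like creating an ephemeral sequential znode: every join receives a
        # fresh, strictly increasing sequence number.
        seq = self.next_seq
        self.next_seq += 1
        self.members[seq] = process_id
        return seq

    def crash(self, process_id: str) -> None:
        # Session loss: the crashed process's ephemeral node(s) vanish.
        self.members = {s: p for s, p in self.members.items() if p != process_id}

    def leader(self) -> str | None:
        # The candidate holding the lowest sequence number is the leader.
        if not self.members:
            return None
        return self.members[min(self.members)]


ns = ElectionNamespace()
for pid in ("p1", "p2", "p3"):
    ns.join(pid)
print(ns.leader())  # p1
ns.crash("p1")
print(ns.leader())  # p2 takes over
ns.join("p1")       # a restarted p1 rejoins at the back of the queue
print(ns.leader())  # still p2
```

The real recipe additionally has each candidate watch only its immediate predecessor, so that a leader crash does not wake every waiting process at once; the sketch omits this notification step.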
BFT2Chain: Design and validation of scalable, Byzantine fault tolerant consensus algorithms for blockchains
Distributed Ledger Technologies (DLTs), often referred to as blockchains, enable the realisation of reliable and attack-resilient services without a central infrastructure. However, the widely used proof-of-work mechanisms for DLTs suffer from high operation latencies and enormous energy costs. Byzantine fault-tolerant (BFT) consensus protocols are a potentially energy-efficient alternative to proof-of-work, but current BFT protocols also present challenges that still limit their practical use in production systems. This research project addresses these challenges by (1) improving the scalability of BFT consensus protocols without reducing their resilience, (2) applying modelling approaches that make the expected performance and timing behaviour of these protocols more predictable, even under attack and taking environmental conditions into account, and (3) supporting the design process for valid, automatically testable BFT systems from specification to deployment in a blockchain infrastructure.
The work on scalability aims at practical solutions that take into account challenges such as recovery from major outages or upgrades, as well as reconfiguration at runtime. We also want to design a resilient communication layer that decouples the choice of a suitable communication topology from the actual BFT consensus protocol and thus reduces the protocol's complexity. This should be supported by the use of trusted hardware components. In addition, we want to investigate combinations of these concepts with suitable cryptographic primitives to further improve scalability.
Using systematic modelling techniques, we want to be able to analyse the efficiency of scalable, complex BFT protocols (for example, in terms of throughput and latency of operations) before deploying them in a real environment, based on knowledge of the system size, the computational power of the nodes, and basic characteristics of the communication links.
We also want to investigate robust countermeasures that help defend against targeted attacks in large-scale blockchain systems. The third objective, supporting the systematic and valid implementation of BFT protocols in a practical system, comprises three parts: a constructive, modular approach in which a validatable BFT protocol is assembled from smaller, validatable building blocks; automated test procedures based on a heuristic algorithm that makes the complex search space of misbehaviour in BFT systems more manageable; and a tool for automated deployment with accompanying benchmarking and stress testing in large-scale DLTs.
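The tension between scalability and resilience described for BFT2Chain rests on standard BFT arithmetic: tolerating f Byzantine replicas requires n ≥ 3f + 1 replicas, and quorums must be large enough that any two of them overlap in at least one correct replica. The sketch below computes these sizes and, assuming a PBFT-style phase in which every replica sends a message to every other replica, the per-phase message count. The function names are hypothetical, and the project's own protocols may use different quorum rules or communication topologies.

```python
def max_faults(n: int) -> int:
    # Classic BFT bound: n >= 3f + 1, so f = floor((n - 1) / 3).
    return (n - 1) // 3


def quorum_size(n: int, f: int) -> int:
    # Smallest q with 2q - n >= f + 1, i.e. q = ceil((n + f + 1) / 2).
    return (n + f + 2) // 2


def all_to_all_messages(n: int) -> int:
    # One all-to-all phase: each of n replicas sends to the n - 1 others.
    return n * (n - 1)


for n in (4, 7, 10, 100):
    f = max_faults(n)
    q = quorum_size(n, f)
    overlap = 2 * q - n      # minimum size of the intersection of two quorums
    assert overlap >= f + 1  # every overlap contains at least one correct replica
    assert q <= n - f        # a quorum still forms if f replicas stay silent
    print(f"n={n:>3} f={f:>2} quorum={q:>2} msgs/phase={all_to_all_messages(n)}")
```

Running it shows quorum sizes of 3, 5, 7, and 67 for n = 4, 7, 10, and 100, while the all-to-all message count grows quadratically (12, 42, 90, 9900). This quadratic growth is one plausible motivation for decoupling the communication topology from the consensus protocol in large deployments.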
Publications:
SplitBFT: Improving Byzantine Fault Tolerance Safety Using Trusted Compartments
23rd ACM/IFIP International Middleware Conference, Middleware 2022 (Quebec, QC, 7-11 November 2022)
In: Middleware 2022 - Proceedings of the 23rd ACM/IFIP International Middleware Conference 2022
DOI: 10.1145/3528535.3531516
Byzantine Fault-Tolerant State-machine Replication from a Systems Perspective
In: ACM Computing Surveys 54 (2021), Article No.: 24
ISSN: 0360-0300
DOI: 10.1145/3436728
Stream-based State Machine Replication
In: Proceedings of the 17th European Dependable Computing Conference (EDCC '21) 2021
DOI: 10.1109/edcc53658.2021.00024
Resilient Cloud-based Replication with Low Latency
21st International Middleware Conference, Middleware 2020 (7-11 December 2020)
In: Middleware 2020 - Proceedings of the 2020 21st International Middleware Conference 2020
DOI: 10.1145/3423211.3425689
URL: https://www4.cs.fau.de/Publications/2020/eischer_20_middleware.pdf
Strome: Energy-Aware Data-Stream Processing
Distributed Applications and Interoperable Systems (Madrid, 18-21 June 2018)
In: Proceedings of the 18th International Conference on Distributed Applications and Interoperable Systems (DAIS '18) 2018
DOI: 10.1007/978-3-319-93767-0_4
Resource-efficient Byzantine Fault Tolerance
In: IEEE Transactions on Computers, Washington, DC, USA: IEEE Computer Society, 2016, p. 2807-2819 (IEEE Transactions on Computers, Vol.65(9))
DOI: 10.1109/TC.2015.2495213
URL: https://www4.cs.fau.de/Publications/2015/distler_15_ieeetc.pdf