Distributed Systems
Distributed systems consist of multiple independent components that are connected by a network and provide a common service. Depending on the use case, this ranges from deployments in which a collection of small data sets is distributed among a few nodes to architectures for the massively parallel processing of large workloads in cloud applications. Distribution offers new opportunities, for example improved fault tolerance through the replication of data and computations, but it also creates additional challenges, such as the need to implement services efficiently when several geographic sites are involved. A key goal of this group's research activities is the development of concepts and techniques that enable systems to leverage the advantages of distribution while using the available resources as efficiently as possible.
Projects:
Funding source: Bundesministerium für Wirtschaft und Klimaschutz (BMWK)
Project leader:
The concept of distributed ledger systems (blockchain) is a fundamentally new base technology that is currently receiving increased public attention and that promises considerable potential for solving problems in a wide range of application areas.
At the same time, the air traffic landscape is foreseeably changing, with a massive increase in the number of air traffic participants and additional types of air traffic such as small autonomous systems. Furthermore, there is a…
Funding source: Deutsche Forschungsgemeinschaft (DFG)
Project leader:
Distributed Ledger Technologies (DLTs), often referred to as blockchains, enable the realisation of reliable and attack-resilient services without a central infrastructure. However, the widely used proof-of-work mechanisms for DLTs suffer from high operation latencies and enormous energy costs. Byzantine fault-tolerant (BFT) consensus protocols are a potentially energy-efficient alternative to proof-of-work. However, current BFT protocols also present challenges that still limit their practical use in production systems. This research project addresses these challenges by (1) improving the scalability of BFT consensus protocols without reducing their resilience, (2) applying modelling approaches to make the expected performance and timing behaviour of these protocols more predictable, even under attacks, taking environmental conditions into consideration, and (3) supporting the design process for valid, automatically testable BFT systems from specification to deployment in a blockchain infrastructure. The work on scalability aims at finding practical solutions that take into account challenges such as recovery from major outages or upgrades, as well as reconfigurations at runtime. We also want to design a resilient communication layer that decouples the choice of a suitable communication topology from the actual BFT consensus protocol and thus reduces its complexity. This should be supported by the use of trusted hardware components. In addition, we want to investigate combinations of these concepts with suitable cryptographic primitives to further improve scalability. Using systematic modelling techniques, we want to be able to analyse the efficiency of scalable, complex BFT protocols (for example, in terms of throughput and latency of operations) even before deploying them in a real environment, based on knowledge of system size, computational power of nodes, and basic characteristics of the communication links.
We also want to investigate robust countermeasures that help defend against targeted attacks in large-scale blockchain systems. The third objective is to support the systematic and valid implementation in a practical system. It is structured into three parts: a constructive, modular approach in which a validatable BFT protocol is assembled from smaller, validatable building blocks; the incorporation of automated test procedures based on a heuristic algorithm that makes the complex search space of misbehaviour in BFT systems more manageable; and a tool for automated deployment with accompanying benchmarking and stress testing in large-scale DLTs.
Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)
Project leader:
Network-based services such as distributed databases, file systems, or blockchains are essential parts of today's computing infrastructures and therefore must be able to withstand a wide spectrum of fault scenarios, including hardware crashes, software failures, and attacks. Although a variety of state-machine replication protocols exist that provide fault and intrusion tolerance, it is inherently difficult to build dependable systems based on their complex and often incomplete specifications.…
Project leader:
The processing of large amounts of data on distributed execution platforms such as MapReduce or Heron contributes significantly to the energy consumption of today's data centers. The E³ project aims at minimizing the power consumption of such execution environments without sacrificing performance. For this purpose, the project develops means to make execution environments and data-processing platforms energy aware and to enable them to exploit knowledge about applications to dynamically adapt …
Project leader:
Coordination services such as ZooKeeper are essential building blocks of today's data-center infrastructures as they provide processes of distributed applications with means to exchange messages, to perform leader election, to detect machine or process crashes, or to reliably store configuration data. Providing an anchor of trust for their client applications, coordination services have to meet strong requirements regarding stability and performance. Only in this way is it possible to ensure that…
Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)
Project leader:
Internet-based services play a central role in today's society. With such services progressively taking over from traditional infrastructures, their complexity steadily increases. On the downside, this leads to more and more faults occurring. As improving software-engineering techniques alone will not do the job, systems have to be prepared to tolerate faults and intrusions.
REFIT investigates how systems can provide fault and intrusion tolerance in a resource-efficient manner. The key technology…
Contact Persons:
Participating Scientists:
Publications:
Probabilistic Byzantine Fault Tolerance
43rd Symposium on Principles of Distributed Computing (PODC 2024) (Nantes, 17. June 2024 - 21. June 2024)
In: Proceedings of the 43rd Symposium on Principles of Distributed Computing (PODC 2024) 2024
TinyBFT: Byzantine Fault-Tolerant Replication for Highly Resource-Constrained Embedded Systems
30th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2024) (Hong Kong, China, 13. May 2024 - 16. May 2024)
In: Proceedings of the 30th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2024) 2024
DOI: 10.1109/RTAS61025.2024.00026
URL: https://sys.cs.fau.de/publications/2024/boehm_24_rtas.pdf
Spider: A BFT Architecture for Geo-Replicated Cloud Services
(2024)
DOI: 10.48550/arXiv.2407.07899
Tough on the Outside, Reliable on the Inside: Utilizing System Composition for Improved Resilience
1st Workshop on Resilient Operations – Byzantine Fault Tolerance and State-Machine Replication (ROBUST '24) (Erlangen, 13. March 2024 - 14. March 2024)
Open Access: https://robust2024.github.io/robust24/assets/abstracts/Utilizing_System_Composition_for_Improved_Resilience.pdf
Memory-Efficient Byzantine Fault-Tolerant Replication for Highly Resource-Constrained Systems
1st Workshop on Resilient Operations – Byzantine Fault Tolerance and State-Machine Replication (ROBUST '24) (Erlangen, 13. March 2024 - 14. March 2024)
Open Access: https://robust2024.github.io/robust24/assets/abstracts/memory-efficient-bft.pdf
Geo-Replicated Byzantine Fault-Tolerant State-Machine Replication with Low Latency (Dissertation, 2024)
DOI: 10.25593/open-fau-545
Micro Replication
53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN '23) (Porto, Portugal, 27. June 2023 - 30. June 2023)
In: Proceedings of the 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN '23) 2023
DOI: 10.1109/DSN58367.2023.00024
SoK: Scalability Techniques for BFT Consensus
IEEE International Conference on Blockchain and Cryptocurrency (Dubai, United Arab Emirates, 1. May 2023 - 5. May 2023)
In: Proceedings of the 5th IEEE International Conference on Blockchain and Cryptocurrency 2023
DOI: 10.48550/arXiv.2303.11045
URL: https://arxiv.org/pdf/2303.11045.pdf
Vivisecting the Dissection: On the Role of Trusted Components in BFT Protocols
(2023)
DOI: 10.48550/arXiv.2312.05714
Generic Checkpointing Support for Stream-based State-Machine Replication
Proceedings of the 10th Workshop on Principles and Practice of Consistency for Distributed Data (PaPoC '23) (Rome, 8. May 2023)
DOI: 10.1145/3578358.3591329
URL: https://sys.cs.fau.de/publications/2023/lawniczak_23_papoc.pdf
ZugChain: Blockchain-Based Juridical Data Recording in Railway Systems
Conference on Dependable Systems and Networks (Baltimore, Maryland, USA, 27. June 2022 - 30. June 2022)
In: Proceedings of the 52nd International Conference on Dependable Systems and Networks 2022
DOI: 10.1109/DSN53405.2022.00019
URL: https://www.ibr.cs.tu-bs.de/users/ruesch/papers/ruesch-dsn22.pdf
EventChain: A Blockchain Framework for Secure, Privacy-Preserving Event Verification
23rd ACM/IFIP International Middleware Conference (Quebec, QC, 7. November 2022 - 11. November 2022)
DOI: 10.1145/3528535.3565243
URL: https://dl.acm.org/doi/10.1145/3528535.3565243
SplitBFT: Improving Byzantine Fault Tolerance Safety Using Trusted Compartments
23rd ACM/IFIP International Middleware Conference, Middleware 2022 (Quebec, QC, 7. November 2022 - 11. November 2022)
In: Middleware 2022 - Proceedings of the 23rd ACM/IFIP International Middleware Conference 2022
DOI: 10.1145/3528535.3531516
Byzantine Fault-Tolerant State-machine Replication from a Systems Perspective
In: ACM Computing Surveys 54 (2021), Article No.: 24
ISSN: 0360-0300
DOI: 10.1145/3436728
Egalitarian Byzantine Fault Tolerance
2021 IEEE 26th Pacific Rim International Symposium on Dependable Computing (PRDC) (Perth, 1. December 2021 - 3. December 2021)
DOI: 10.1109/PRDC53464.2021.00019
URL: https://www4.cs.fau.de/Publications/2021/eischer_21_prdc.pdf
Stream-based State Machine Replication
In: Proceedings of the 17th European Dependable Computing Conference (EDCC '21) 2021
DOI: 10.1109/edcc53658.2021.00024
URL: https://arxiv.org/pdf/2106.13019
Resilient Cloud-based Replication with Low Latency
21st International Middleware Conference, Middleware 2020 (7. December 2020 - 11. December 2020)
In: Middleware 2020 - Proceedings of the 2020 21st International Middleware Conference 2020
DOI: 10.1145/3423211.3425689
URL: https://www4.cs.fau.de/Publications/2020/eischer_20_middleware.pdf
Low-Latency Geo-Replicated State Machines with Guaranteed Writes
7th Workshop on Principles and Practice of Consistency for Distributed Data, PaPoC 2020 (Heraklion, 27. April 2020)
In: Proceedings of the 7th Workshop on Principles and Practice of Consistency for Distributed Data, PaPoC 2020 2020
DOI: 10.1145/3380787.3393686
URL: https://www4.cs.fau.de/Publications/2020/eischer_20_papoc.pdf
Deterministic Fuzzy Checkpoints
International Symposium on Reliable Distributed Systems (SRDS '19) (Lyon, 1. October 2019 - 4. October 2019)
In: Proceedings of the 38th International Symposium on Reliable Distributed Systems (SRDS '19) 2019
DOI: 10.1109/SRDS47363.2019.00026
URL: https://www4.cs.fau.de/Publications/2019/eischer_19_srds.pdf
In Search of a Scalable Raft-based Replication Architecture
6th Workshop on Principles and Practice of Consistency for Distributed Data, PaPoC 2019 (Dresden, 25. March 2019)
In: Proceedings of the 6th Workshop on Principles and Practice of Consistency for Distributed Data, PaPoC 2019 2019
DOI: 10.1145/3301419.3323968
Troxy: Transparent Access to Byzantine Fault-Tolerant Systems
48th International Conference on Dependable Systems and Networks (DSN '18) (Luxembourg City, Luxembourg, 25. June 2018 - 28. June 2018)
In: Proceedings of the 48th International Conference on Dependable Systems and Networks (DSN '18) 2018
DOI: 10.1109/DSN.2018.00019
URL: https://www4.cs.fau.de/Publications/2018/li_18_dsn.pdf
Strome: Energy-Aware Data-Stream Processing
Distributed Applications and Interoperable Systems (Madrid, 18. June 2018 - 21. June 2018)
In: Proceedings of the 18th International Conference on Distributed Applications and Interoperable Systems (DAIS '18) 2018
DOI: 10.1007/978-3-319-93767-0_4
Scalable Byzantine Fault-tolerant State-Machine Replication on Heterogeneous Servers
In: Computing (2018), p. 1-22
ISSN: 0010-485X
DOI: 10.1007/s00607-018-0652-3
URL: https://www4.cs.fau.de/Publications/2018/eischer_18_computing.pdf
Latency-Aware Leader Selection for Geo-Replicated Byzantine Fault-Tolerant Systems
1st Workshop on Byzantine Consensus and Resilient Blockchains (BCRB '18) (Luxembourg City, 25. June 2018 - 28. June 2018)
In: Proceedings of the 48th International Conference on Dependable Systems and Networks Workshops (DSN-W '18) 2018
DOI: 10.1109/DSN-W.2018.00053
URL: https://www4.cs.fau.de/Publications/2018/eischer_18_bcrb.pdf
Hybrids on Steroids: SGX-based High Performance BFT
EuroSys 2017 (Belgrade)
In: Proceedings of the 12th European Conference on Computer Systems (EuroSys '17) 2017
URL: https://www4.cs.fau.de/Publications/2017/behl_17_eurosys.pdf
Hybster - A Highly Parallelizable Protocol for Hybrid Fault-Tolerant Service Replication
(2017)
DOI: 10.24355/dbbs.084-201703031341
Resource-efficient Byzantine Fault Tolerance
In: IEEE Transactions on Computers, Washington, DC, USA: IEEE Computer Society, 2016, p. 2807-2819 (IEEE Transactions on Computers, Vol.65(9))
DOI: 10.1109/TC.2015.2495213
URL: https://www4.cs.fau.de/Publications/2015/distler_15_ieeetc.pdf
SAREK: Optimistic Parallel Ordering in Byzantine Fault Tolerance
EDCC 2016 (Gothenburg)
In: Proceedings of the 12th European Dependable Computing Conference (EDCC '16) 2016
Consensus-Oriented Parallelization: How to Earn Your First Million
Middleware 2015 (Vancouver)
In: Proceedings of the 16th Middleware Conference (Middleware '15) 2015
DOI: 10.1145/2814576.2814800
URL: https://www4.cs.fau.de/Publications/2015/behl_15_mw.pdf
Towards Energy-Proportional State-Machine Replication
14th Workshop on Adaptive and Reflective Middleware (Vancouver)
In: Proceedings of the 14th Workshop on Adaptive and Reflective Middleware (ARM '15) 2015
DOI: 10.1145/2834965.2834969
URL: https://www4.cs.fau.de/Publications/2015/eibel_15_arm.pdf
Scalable BFT for Multi-Cores: Actor-based Decomposition and Consensus-oriented Parallelization
HotDep 2014 (Broomfield)
In: Proceedings of the 10th Workshop on Hot Topics in System Dependability (HotDep '14) 2014
CheapBFT: Resource-efficient Byzantine Fault Tolerance
EuroSys 2012 (Bern, 10. April 2012 - 13. April 2012)
In: Proceedings of the EuroSys 2012 Conference (EuroSys '12) 2012
DOI: 10.1145/2168836.2168866
URL: http://www4.cs.fau.de/Publications/2012/kapitza_12_eurosys.pdf
Increasing Performance in Byzantine Fault-Tolerant Systems with On-Demand Replica Consistency
EuroSys 2011 (Salzburg, 10. April 2011 - 13. April 2011)
In: Proceedings of the EuroSys 2011 Conference (EuroSys '11) 2011
DOI: 10.1145/1966445.1966455
URL: http://eurosys2011.cs.uni-salzburg.at/pdf/eurosys2011-distler.pdf
SPARE: Replicas on Hold
18th Network and Distributed System Security Symposium (NDSS '11) (San Diego)
In: Proceedings of the 18th Network and Distributed System Security Symposium (NDSS '11) 2011
URL: http://www.isoc.org/isoc/conferences/ndss/11/pdf/8_1.pdf
State Transfer for Hypervisor-Based Proactive Recovery of Heterogeneous Replicated Services
SICHERHEIT '10 (Berlin, 5. October 2010 - 7. October 2010)
In: Proceedings of the 5th "Sicherheit, Schutz und Zuverlässigkeit" Conference (SICHERHEIT '10) 2010
URL: http://www4.informatik.uni-erlangen.de/~distler/publications/distler10state.pdf
Functional Decomposition and Interactions in Hybrid Intrusion-tolerant Systems
In: Proceedings of the 3rd Workshop on Middleware-Application Interaction (MAI '09) 2009
Efficient State Transfer for Hypervisor-Based Proactive Recovery
In: Proceedings of the 2nd Workshop on Recent Advances on Intrusion-Tolerant Systems (WRAITS '08) 2008
Bessani Alysson, Reiser Hans P., Sousa Paulo, Gashi Ilir, Stankovic Vladimir, Distler Tobias, Kapitza Rüdiger, Daidone Alessandro, Obelheiro Rafael:
FOREVER: Fault/intrusiOn REmoVal through Evolution & Recovery
ACM/IFIP/USENIX Middleware '08 (Leuven, Belgium)
In: Companion '08: Proceedings of the ACM/IFIP/USENIX Middleware '08 Conference Companion 2008
DOI: 10.1145/1462735.1462763
Targeting Tail Latency in Replicated Systems with Proactive Rejection
25th Middleware Conference (Middleware '24) (Hong Kong, 2. December 2024 - 6. December 2024)
In: Proceedings of the 25th Middleware Conference (Middleware '24) 2024
DOI: 10.1145/3652892.3700775
URL: https://sys.cs.fau.de/publications/2024/lawniczak_24_middleware.pdf