Mirador – Lehrstuhl für Informatik 4

Micro Replication for Dependable Network-based Services

(Third Party Funds Single)

Project leader: Tobias Distler
Project members: Laura Lawniczak, Harald Böhm
Start date: 1 November 2024
End date: 31 October 2027
Acronym: Mirador
Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)

Abstract:

Network-based services such as distributed databases, file systems, or blockchains are essential parts of today's computing infrastructures and therefore must be able to withstand a wide spectrum of fault scenarios, including hardware crashes, software failures, and attacks. Although a variety of state-machine replication protocols exist that provide fault and intrusion tolerance, it is inherently difficult to build dependable systems based on their complex and often incomplete specifications. Unfortunately, this commonly leaves systems vulnerable to correlated failures or attacks, for example, in cases where, to save development and maintenance costs, all replicas in a system share the same homogeneous implementation.
In the Mirador project, we seek to eliminate this gap between theory and practice by proposing a novel paradigm for the specification and implementation of dependable systems: micro replication. In contrast to existing systems, micro-replication architectures do not consist of a few monolithic and complex replicas, but instead are organized as dedicated, loosely coupled micro-replica clusters that are each responsible for a different protocol step or mechanism. As a key benefit of providing only a small subset of the overall protocol functionality, micro replicas make it significantly easier to reason about the completeness and correctness of both specifications and implementations. To further reduce complexity, all micro replicas follow a standardized internal workflow, thereby greatly facilitating the task of introducing heterogeneity at the replica, communication, and authentication level.
Starting from this basic concept, in the Mirador project we explore micro replication as a means to build dependable replicated systems and examine its flexibility by developing micro-replication architectures for different fault models (i.e., crashes and Byzantine faults).
In particular, our research focuses on two problem areas: First, we aim to increase the resilience of micro-replicated systems by enabling them to recover from replica failures. Among other things, this requires mechanisms for rejuvenating micro replicas from a clean state and integrating replacement replicas at runtime. Second, our goal is to improve the performance and efficiency of micro-replicated systems and the applications running on top of them. Specifically, this includes the design of techniques that reduce overhead by exploiting optimistic approaches, which save processor and network resources in the absence of faults. Furthermore, we investigate ways to restructure the service logic and, for example, outsource preprocessing steps to upstream micro-replica clusters. To evaluate the concepts, protocols, and mechanisms developed in the Mirador project, we build a heterogeneous micro-replicated platform that allows us to conduct experiments in a wide range of settings and with a variety of applications.
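
To make the idea of loosely coupled micro-replica clusters more concrete, the following is a toy sketch and not the Mirador protocol or its platform. It assumes a hypothetical pipeline in which each cluster is responsible for one step, every replica in a cluster runs that step independently, and an output is forwarded downstream only if at least f+1 of the cluster's 2f+1 replicas agree on it. The class name `MicroReplicaCluster`, the function `run_pipeline`, the step functions, and the message format are all invented for illustration.

```python
from collections import Counter
from typing import Callable, List, Optional

# A "step" is the small piece of protocol logic one cluster is responsible for.
Step = Callable[[str], str]


class MicroReplicaCluster:
    """Hypothetical cluster whose replicas all implement the same single step."""

    def __init__(self, name: str, replicas: List[Step], f: int):
        # Assumption: 2f+1 replicas per cluster so that f faulty ones can be outvoted.
        if len(replicas) < 2 * f + 1:
            raise ValueError("need at least 2f+1 replicas to tolerate f faults")
        self.name = name
        self.replicas = replicas
        self.f = f

    def process(self, message: str) -> Optional[str]:
        # Each replica handles the message independently.
        outputs = [replica(message) for replica in self.replicas]
        # Forward a value only if at least f+1 replicas produced it.
        value, count = Counter(outputs).most_common(1)[0]
        return value if count >= self.f + 1 else None


def run_pipeline(clusters: List[MicroReplicaCluster], request: str) -> Optional[str]:
    """Pass a request through the clusters in order; stop if a vote fails."""
    message: Optional[str] = request
    for cluster in clusters:
        message = cluster.process(message)
        if message is None:
            return None
    return message


# Illustrative step implementations (all hypothetical).
def preprocess(m: str) -> str:
    return m.strip().lower()

def faulty(m: str) -> str:
    return "garbage"  # stands in for a crashed or compromised replica

def order(m: str) -> str:
    return "seq=1|" + m

def execute(m: str) -> str:
    return "done:" + m


pipeline = [
    MicroReplicaCluster("preprocess", [preprocess, faulty, preprocess], f=1),
    MicroReplicaCluster("order", [order, order, order], f=1),
    MicroReplicaCluster("execute", [execute, execute, execute], f=1),
]
result = run_pipeline(pipeline, "  PUT x=1 ")
print(result)
```

In this sketch the single faulty replica in the first cluster is outvoted by its two correct peers, so the request still reaches the later clusters. A real system would additionally need networking, authentication, and stateful ordering, none of which are modeled here.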

Publications:

Distler T., Eischer M., Lawniczak L.:
Micro Replication
53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN '23) (Porto, Portugal, 27 June 2023 - 30 June 2023)
In: Proceedings of the 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN '23) 2023
DOI: 10.1109/DSN58367.2023.00024
Lawniczak L., Distler T.:
Targeting Tail Latency in Replicated Systems with Proactive Rejection
25th Middleware Conference (Middleware '24) (Hong Kong, 2 December 2024 - 6 December 2024)
In: Proceedings of the 25th Middleware Conference (Middleware '24) 2024
DOI: 10.1145/3652892.3700775
URL: https://sys.cs.fau.de/publications/2024/lawniczak_24_middleware.pdf
Messadi I., Gerber ME., Distler T., Kapitza R.:
TEE-Assisted Recovery and Upgrades for Long-Running BFT Services
20th International Conference on Availability, Reliability and Security (ARES '25) (Ghent, 11 August 2025 - 14 August 2025)
In: Proceedings of the 20th International Conference on Availability, Reliability and Security (ARES '25) 2025
URL: https://sys.cs.fau.de/publications/2025/messadi_25_ares.pdf
Lawniczak L., Distler T.:
Hard Shell, Reliable Core: Improving Resilience in Replicated Systems with Selective Hybridization
44th International Symposium on Reliable Distributed Systems (SRDS '25) (Porto, 29 September 2025 - 2 October 2025)
In: Proceedings of the 44th International Symposium on Reliable Distributed Systems (SRDS '25) 2025
URL: https://sys.cs.fau.de/publications/2025/lawniczak_25_srds.pdf