Dr.-Ing. Jürgen Kleinöder

Department of Computer Science
Chair of Computer Science 4 (Distributed Systems and Operating Systems)

Room: 0.043
Martensstr. 1
91058 Erlangen

CIO of Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Senior Academic Director at the Chair of Computer Science 4 (Distributed Systems and Operating Systems)

Lectures

Summer 2022: Systemprogrammierung 1 (System Programming 1)

Publications (CRIS)

Research areas (CRIS)

Research projects (CRIS)

  • Non-volatility in energy-aware operating systems

    (Third Party Funds Single)

    Term: since 1. January 2022
    Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)

    The current trend toward fast, byte-addressable non-volatile memory (NVM), whose latencies and write endurance are closer to SRAM and DRAM than to flash, positions NVM as a possible replacement for established volatile technologies. While the non-volatility and low leakage power of NVM, among other advantageous features, make it an attractive candidate for new system designs, it also raises major challenges, especially for programming such systems. For example, using NVM to preserve the computation state across power failures results in control flows that can unexpectedly turn a sequential process into a non-sequential one: a program has to deal with its own state from earlier, interrupted runs.

    If programs can be executed directly in NVM, ordinary volatile main memory becomes functionally superfluous. Volatile state then remains only in the caches and in device/processor registers ("NVM-pure"). An operating system designed along these lines can dispense with many, if not all, of the persistence measures it would otherwise have to implement, and thereby reduce its level of background noise. In detail, this makes it possible to lower energy demand, increase computing power, and reduce latencies. Moreover, by eliminating these persistence measures, an "NVM-pure" operating system is leaner than its functionally identical twin of conventional design. This contributes to better analysability of the operating system's non-functional properties on the one hand, and results in a smaller attack surface and trusted computing base on the other.

    The project follows an "NVM-pure" approach. An imminent power failure triggers an interrupt request (power-failure interrupt, PFI), whereupon a checkpoint of the unavoidably volatile system state is created. In addition, to tolerate possible PFI losses, sensitive operating-system data structures are secured transactionally, analogous to methods of non-blocking synchronisation. Furthermore, methods of static program analysis are applied to (1) cleanse the operating system of superfluous persistence measures, which otherwise only generate background noise, (2) break up uninterruptible instruction sequences with excessive interrupt latencies, which could cause the PFI-based checkpointing to fail, and (3) delimit the scope of the dynamic energy-demand analysis. To demonstrate that an "NVM-pure" operating system can operate more efficiently than its functionally identical conventional twin, both in time and in energy, the work is carried out using Linux as an example.
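    The PFI-triggered checkpoint described above can be sketched as a two-phase commit into NVM: write the payload first, then publish it with an atomically updated sequence number, so that a failure during the write cannot corrupt the last valid checkpoint. This is a minimal illustrative sketch, not the project's implementation; all names (`pfi_handler`, `nvm_area`) are made up, and an array stands in for real NVM.

    ```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stdint.h>

    struct cpu_state { uint64_t regs[16]; uint64_t pc; };

    struct checkpoint {
        struct cpu_state state;  /* payload: the volatile state to preserve   */
        uint64_t         seq;    /* sequence number, doubles as commit marker */
    };

    /* Two slots so a PFI arriving mid-write cannot destroy the last valid
     * checkpoint (ping-pong atomic-commit scheme); pretend this is NVM. */
    static struct checkpoint nvm_area[2];
    static uint64_t          next_seq = 1;

    static void pfi_handler(const struct cpu_state *live)
    {
        struct checkpoint *slot = &nvm_area[next_seq & 1];
        slot->state = *live;                           /* 1: write payload  */
        atomic_thread_fence(memory_order_release);     /* 2: order it first */
        slot->seq = next_seq++;                        /* 3: commit         */
    }

    /* After power returns: pick the slot with the highest committed seq. */
    static const struct cpu_state *recover(void)
    {
        if (nvm_area[0].seq == 0 && nvm_area[1].seq == 0)
            return 0;                                  /* no checkpoint yet */
        return nvm_area[0].seq > nvm_area[1].seq ? &nvm_area[0].state
                                                 : &nvm_area[1].state;
    }

    int main(void)
    {
        struct cpu_state live = { .pc = 0x1000 };
        live.regs[0] = 42;
        pfi_handler(&live);                   /* simulated power failure */
        const struct cpu_state *r = recover();
        assert(r && r->pc == 0x1000 && r->regs[0] == 42);
        return 0;
    }
    ```

    The release fence before the sequence-number update is what makes an interrupted checkpoint harmlessly invisible to recovery.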

  • Power-fail aware byte-addressable virtual non-volatile memory

    (Own Funds)

    Term: since 5. April 2021

    Virtual memory (VM) subsystems blur the distinction between storage and memory such that both volatile and non-volatile data can be accessed transparently via CPU instructions. Every VM subsystem tries hard to keep highly contended data in fast volatile main memory to mitigate the high access latency of secondary storage, irrespective of whether the data is considered volatile or not. The recent advent of byte-addressable NVRAM does not change this scheme in principle: the current technology can neither replace DRAM as fast main memory, due to its significantly higher access latencies, nor secondary storage, due to its significantly higher price and lower capacity. Ideally, therefore, VM subsystems should be NVRAM-aware and extended in such a way that all available byte-addressable memory technologies can be employed to their respective advantages. By means of an abstraction anchored in the VM management of the operating system, legacy software should then be able to benefit, unchanged and efficiently, from byte-addressable non-volatile main memory.

    Because most VM subsystems are complex, highly tuned software systems that have evolved over decades of development, we follow a minimally invasive approach and integrate NVRAM-awareness into an existing VM subsystem instead of developing an entirely new system from scratch. NVRAM will serve as an immediate DRAM substitute in case of main-memory shortage and inherently support processes with large working sets. However, due to the high access latencies of NVRAM, non-volatile data also needs to be kept at least temporarily in fast volatile main memory and the volatile CPU caches. Our new VM subsystem (we want to adapt FreeBSD accordingly) then takes care of migrating pages between DRAM and NVRAM when the available resources allow. The DRAM is thus effectively managed as a large software-controlled volatile page cache for NVRAM.

    Consequently, this raises the question of data losses caused by power outages. The VM subsystem therefore needs to keep its own metadata in a consistent and recoverable state, and modified pages in volatile memory need to be copied to NVRAM to avoid losses. The former requires an extremely efficient transactional mechanism for modifying complex, highly contended VM metadata, while the latter must cope with potentially large amounts of modified pages under limited energy reserves.
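    The "DRAM as a software-controlled page cache for NVRAM" policy can be illustrated with a toy promotion/demotion routine: when DRAM is full, the coldest resident page is demoted to NVRAM to make room. This is a hypothetical sketch, not the FreeBSD code; the names (`struct page`, `promote`) and the hit-counter policy are illustrative assumptions.

    ```c
    #include <assert.h>

    #define NPAGES     4
    #define DRAM_SLOTS 2

    struct page { int id; unsigned hits; char in_dram; };

    static struct page pages[NPAGES];
    static int dram_used;

    static void touch(struct page *p) { p->hits++; }

    /* Promote p into DRAM, demoting the coldest resident page if full. */
    static void promote(struct page *p)
    {
        if (p->in_dram)
            return;
        if (dram_used == DRAM_SLOTS) {      /* DRAM full: demote coldest */
            struct page *cold = 0;
            for (int i = 0; i < NPAGES; i++)
                if (pages[i].in_dram && (!cold || pages[i].hits < cold->hits))
                    cold = &pages[i];
            cold->in_dram = 0;              /* content would be copied
                                               DRAM -> NVRAM here */
            dram_used--;
        }
        p->in_dram = 1;                     /* copy NVRAM -> DRAM here */
        dram_used++;
    }

    int main(void)
    {
        for (int i = 0; i < NPAGES; i++)
            pages[i] = (struct page){ .id = i };
        promote(&pages[0]); touch(&pages[0]); touch(&pages[0]);
        promote(&pages[1]); touch(&pages[1]);
        promote(&pages[2]);                 /* evicts page 1, the coldest */
        assert(pages[0].in_dram && pages[2].in_dram && !pages[1].in_dram);
        return 0;
    }
    ```

    A power-fail-aware variant would additionally have to flush dirty DRAM-resident pages and commit the mapping metadata transactionally, which is exactly the consistency problem the paragraph above raises.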

  • Migration-Aware Multi-Core Real-Time Executive

    (Own Funds)

    Term: since 11. August 2020

    This research proposal investigates the predictability of task migration in multi-core real-time systems. We propose a migration-aware real-time executive in which migration decisions are no longer based on generic performance parameters but are systematically derived from application-specific knowledge of the real-time tasks. These so-called migration hints relate to temporal and spatial aspects of real-time tasks; they mark potential migration points in their non-sequential (multi-threaded) machine programs. Migration hints enable the operating system to reach decisions that have the most favourable impact on overall predictability and system performance. The proposal assumes that application-specific hints on admissible and particularly favourable program points for migration represent a cost-effective way to leverage multi-core platforms with existing real-time systems and scheduling techniques. The object of investigation is multi-core platforms with heterogeneous memory architectures. The focus is on the worst-case overhead caused by migration in such systems, which mainly depends on the current size and location of a real-time task's resident core-local data. This data set, which varies in size over execution time, is determined using tool-based static analysis techniques that derive usable migration hints at design time. In addition, the proposal develops migration-aware variants of standard real-time operating systems, which provide specialised interfaces and mechanisms to exploit these migration hints as favourable migration points at runtime, in order to make migrations predictable and to optimise the overall schedulability and performance of the system.
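    The interplay of hints and executive can be sketched as a tiny API: the (analysis-derived) hint records the core-local resident-data size at a program point, and the scheduler migrates only at a hint whose worst-case copy cost fits a budget. All names and the byte budget are illustrative assumptions, not the project's interface.

    ```c
    #include <assert.h>

    /* Worst-case bytes of core-local data we accept to move (made up). */
    #define MIGRATION_BUDGET 4096

    struct task { int core; unsigned resident_bytes; int migratable; };

    /* Emitted at analysis-determined program points (a migration hint). */
    static void migration_hint(struct task *t, unsigned resident_bytes)
    {
        t->resident_bytes = resident_bytes;
        t->migratable = resident_bytes <= MIGRATION_BUDGET;
    }

    /* Executive side: migrate only at a hint whose cost bound fits. */
    static int try_migrate(struct task *t, int target_core)
    {
        if (!t->migratable)
            return 0;          /* would exceed the worst-case overhead bound */
        t->core = target_core; /* resident data would be copied here */
        t->migratable = 0;     /* hint is consumed */
        return 1;
    }

    int main(void)
    {
        struct task t = { .core = 0 };
        migration_hint(&t, 64 * 1024);   /* large working set: stay put */
        assert(!try_migrate(&t, 1) && t.core == 0);
        migration_hint(&t, 512);         /* small working set: may move */
        assert(try_migrate(&t, 1) && t.core == 1);
        return 0;
    }
    ```

    The point of the design is that the migration cost bound is known at design time, so the schedulability analysis can account for it.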

  • Energy-, Latency- and Resilience-aware Networking

    (Third Party Funds Group – Sub project)

    Overall project: SPP 1914 "Cyber-Physical Networking (CPN)"
    Term: since 1. January 2020
    Funding source: DFG / Schwerpunktprogramm (SPP)
    URL: https://www.nt.uni-saarland.de/project/latency-and-resilience-aware-networking-larn/
  • Aspect-Oriented Real-Time Architectures (Phase 2)

    (Third Party Funds Single)

    Term: 1. August 2017 - 30. September 2020
    Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)
    URL: https://www4.cs.fau.de/Research/AORTA/
    The goal of the AORTA project is to enhance the predictability of dynamic mixed-criticality real-time systems by extracting critical paths. These paths are transformed into static equivalents and executed in a time-triggered fashion at run-time. Compared to event-triggered processing, time-triggered execution tends to underuse resources; the optimistic execution model of mixed-criticality real-time systems is therefore retained, and only in case of an emergency is the real-time system executed according to the static schedule. At the same time, the results of the first funding phase will be generalised to dynamic real-time architectures, focusing in particular on mixed-criticality systems with complex dependency patterns. The project investigates several variants of real-time Linux as well as applications from the domain of control engineering.

    The main focus of the second funding phase is on dependencies between critical and non-critical execution paths. These dependencies are potentially problematic and occur at all levels of the system: application software may combine non-critical comfort functions with critical control functionality, leading to coupled components, and in the operating system buffers may be shared between communication stacks. Such coupling is often desirable; in dynamic systems, however, the host of possible execution paths at run-time may lead to dramatically overprovisioned system designs with respect to WCET and WCRT. Guaranteed execution times thus often forfeit the efficiency gained from the dynamic real-time system design. Three key activities of this project provide hard guarantees at run-time for the critical application core: analysis, tailoring, and mechanisms.

    The basis for this project is existing techniques for designing mixed-criticality systems under hard real-time constraints. For AORTA, it is assumed that critical paths generally have a deterministic structure and that their coupling with non-critical paths can therefore be mapped to static equivalents. In the course of the project, the applicability of the simple communication patterns provided by different variants of real-time Linux will be scrutinised to determine whether they can guarantee the hard deadlines of safety-critical control applications, and whether the concepts and techniques for static analysis, tailoring, and scheduling developed in the first funding phase are suitable for this purpose. In addition, the coupling of real-time architecture, scheduling, and dependencies will be investigated in the context of mixed-criticality real-time systems, to determine the general fitness of real-time Linux's design concepts for switching real-time paradigms at run-time.
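    The emergency mode switch described above can be pictured as follows: tasks normally run event-triggered, but once an overload emergency is detected, dispatching follows a precomputed time-triggered table for the critical path. This is a hedged toy illustration; the table contents, period, and names are invented for the sketch.

    ```c
    #include <assert.h>

    struct slot { int start_ms; int task; };

    /* Precomputed static schedule for the critical path, period 10 ms. */
    static const struct slot table[] = { { 0, 1 }, { 4, 2 }, { 7, 1 } };
    #define SLOTS  3
    #define PERIOD 10

    static int emergency = 0;

    /* Returns the task to dispatch at time t (ms), or -1 meaning
     * "stay in the optimistic, event-triggered mode". */
    static int dispatch(int t)
    {
        if (!emergency)
            return -1;                      /* normal: event-triggered */
        int off = t % PERIOD;
        int task = table[0].task;
        for (int i = 0; i < SLOTS; i++)
            if (table[i].start_ms <= off)
                task = table[i].task;       /* last slot started <= off */
        return task;
    }

    int main(void)
    {
        assert(dispatch(5) == -1);          /* optimistic mode retained */
        emergency = 1;                      /* deadline threat detected */
        assert(dispatch(0)  == 1);
        assert(dispatch(15) == 2);          /* 15 % 10 = 5 -> slot at 4 */
        assert(dispatch(8)  == 1);
        return 0;
    }
    ```

    The design retains the efficiency of the event-triggered model in the common case and falls back to the static equivalent only when guarantees are at risk.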
  • Latency and Resilience-aware Networking

    (Third Party Funds Group – Sub project)

    Overall project: Cyber-Physical Networking (CPN)
    Term: 1. January 2016 - 31. December 2019
    Funding source: DFG / Schwerpunktprogramm (SPP)
    The project develops transport channels for cyber-physical networks. Such channels need to be latency- and resilience-aware; i.e., the latency as seen by the application must be predictable and, within certain limits (e.g. by balancing latency and resilience), guaranteed. This is only possible with an innovative transport protocol stack built on an appropriate foundation of operating-system and low-level networking support. To this end, the proposal unites the disciplines of operating systems / real-time processing and telecommunications / information theory.

    The project target is the evolution of the PRRT (predictably reliable real-time transport) protocol stack into a highly efficient multi-hop protocol with loss-domain separation. This is enabled by interdisciplinary co-development with a latency-aware operating-system kernel, including wait-free synchronisation, and the corresponding low-level networking components (POSE, "predictable operating system executive"). The statistical properties of the entire system (RNA, "reliable networking atom") shall be optimised and documented.

    A software-defined networking testbed is available for validating the system in a real-world wide-area network scenario. The developed components will be introduced in the workshops organised by the priority programme Cyber-Physical Networking and made available to other projects throughout the run-time of the priority programme.

  • Energy-aware Execution Environments

    (Own Funds)

    Term: since 1. January 2016

    The processing of large amounts of data on distributed execution platforms such as MapReduce or Heron contributes significantly to the energy consumption of today's data centers. The E³ project aims at minimizing the power consumption of such execution environments without sacrificing performance. For this purpose, the project develops means to make execution environments and data-processing platforms energy aware and to enable them to exploit knowledge about applications to dynamically adapt the power consumption of the underlying hardware. To measure and control the energy consumption of hardware units, E³'s energy-aware platforms rely on hardware features provided by modern processors that allow the system software of a server to regulate the server's power usage at runtime by enforcing configurable upper limits. As a key benefit, this approach makes it possible to build data-processing and execution platforms that, on the one hand, save energy during phases in which only low and medium workloads need to be handled and, on the other hand, are still able to offer full processing power during periods of high workloads.
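    The adaptation loop described above amounts to choosing a power cap from the observed load and enforcing it through the processor's power-limiting interface (e.g. Intel RAPL, typically exposed on Linux under /sys/class/powercap/). The following sketch shows only the policy half; the function name, thresholds, and wattages are illustrative assumptions, not E³'s actual parameters.

    ```c
    #include <assert.h>

    #define CAP_MIN_W  40   /* lowest cap that keeps the service responsive */
    #define CAP_MAX_W 140   /* full package power */

    /* load: utilization in percent (0..100) -> package power cap in watts.
     * Near-peak load gets full power; otherwise the cap scales linearly. */
    static int choose_cap_watts(int load)
    {
        if (load >= 90)
            return CAP_MAX_W;                       /* peak: full power */
        return CAP_MIN_W + (CAP_MAX_W - CAP_MIN_W) * load / 100;
    }

    int main(void)
    {
        assert(choose_cap_watts(0)   == CAP_MIN_W);
        assert(choose_cap_watts(100) == CAP_MAX_W);
        assert(choose_cap_watts(50)  == 90);        /* 40 + 100*50/100 */
        int prev = 0;
        for (int l = 0; l <= 100; l += 10) {        /* cap grows with load */
            int c = choose_cap_watts(l);
            assert(c >= prev);
            prev = c;
        }
        return 0;
    }
    ```

    In a real deployment the chosen cap would be written to the platform's power-limit control; the key property is that the cap is raised quickly enough that peak workloads still see full processing power.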

  • Quality-aware Co-Design of Responsive Real-Time Control Systems

    (Own Funds)

    Term: 1. September 2015 - 30. September 2021
    URL: https://www4.cs.fau.de/Research/qronOS/

    A key design goal of safety-critical control systems is verifiable compliance with a specific quality objective in the sense of quality of control. To meet these requirements, the underlying real-time operating system has to provide resources and a certain quality of service. However, the relationship between real-time performance and quality of control is nontrivial: execution load varies considerably with the environmental situation and disturbance, and, vice versa, the actual execution conditions qualitatively influence the control performance. Typically, substantial overestimations, in particular of the worst-case execution times, have to be made to ensure compliance with the desired quality of control. This ultimately leads to a significant over-dimensioning of resources, with the degree of over-dimensioning increasing disproportionately with the complexity and dynamics of the control system under consideration. Consequently, the pessimistic design patterns and analysis techniques commonly used to date can be expected to become non-viable in the future. Examples are complex, adaptive, and mixed-criticality assistance and autopilot functions in vehicles, where universal guarantees for all driving and environmental conditions are neither useful nor realistic.

    The issues outlined above can only be solved by an interdisciplinary approach to real-time control systems. This research project builds on existing knowledge about the design of real-time control systems with soft, firm, and hard timing guarantees. The basic assumption is that the control application's performance requirement varies significantly between typical and maximum disturbance and correspondingly leads to situation-dependent reserves. Consequently, the commonly used pessimistic design and analysis of real-time systems, which disregards quality-of-control dynamics, is scrutinised. The research objective is to avoid pessimism in the design of hard real-time systems for control applications with strict guarantees, and thus to resolve the trade-off between quality-of-control guarantees and good average performance. The project pursues a co-design of control application and real-time executive and consists of three key aspects: model-based quality-of-control assessment, adaptive and predictive scheduling of control activities, and a hybrid execution model to regain guarantees.
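    One way to picture a hybrid execution model of the kind named above: a high-quality control law runs by default, and when its predicted cost would exceed the guaranteed time budget for an activation, a cheap fallback law with hard guarantees is used instead. This is a speculative toy sketch under invented assumptions (budget, names); it is not the qronOS design.

    ```c
    #include <assert.h>

    #define BUDGET_US 500    /* guaranteed time budget per activation */

    static int ran_fallback;

    /* Returns the execution time spent on this activation (us). */
    static int control_step(int predicted_cost_us)
    {
        if (predicted_cost_us <= BUDGET_US) {
            ran_fallback = 0;   /* optimistic path: full-quality control law */
            return predicted_cost_us;
        }
        ran_fallback = 1;       /* guaranteed path: simple, verified law */
        return BUDGET_US;
    }

    int main(void)
    {
        int t1 = control_step(200);   /* typical disturbance: optimistic */
        assert(!ran_fallback && t1 == 200);
        int t2 = control_step(900);   /* rare worst case: fall back */
        assert(ran_fallback && t2 <= BUDGET_US);
        return 0;
    }
    ```

    The guarantee comes from the fallback path alone, so the budget only needs to cover the simple law; the average case keeps the quality of the full law.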

  • Software Infrastructure for Resource-Constrained Networked Systems (Phase 2)

    (Third Party Funds Group – Sub project)

    Overall project: FOR 1508: Dynamically Adaptable Applications for Bat Localisation using Embedded Communicating Sensor Systems
    Term: 1. August 2015 - 31. July 2018
    Funding source: DFG / Forschergruppe (FOR)

    Within the overall vision of the BATS research unit, the goal of the subproject ARTE (adaptive run-time environment, TP 2) is to develop flexible system-software support. It shall make it possible to establish distributed data-stream queries (TP 3) for the behavioural observation of bats (TP 1) on a heterogeneous sensor network (TP 4) consisting of stationary (TP 5) and mobile (TP 7) sensor nodes. Particular challenges are the scarce resources, especially memory and energy, and the intermittent connectivity of the mobile nodes, which weigh only 2 g. In view of these manifold and partly conflicting requirements, ARTE is to be realised as a highly configurable software product line. The goal is to support both the differing functional requirements of mobile and stationary nodes and important non-functional properties such as low memory consumption and energy efficiency. Accordingly, already during the development of ARTE, the configuration space is to be explored with tool support and with a specific focus on non-functional properties, so as to offer an optimised selection of implementation artefacts in later deployment according to the project's requirements. The dynamic adaptability of application as well as system functions must be taken into account explicitly. On the functional level, ARTE will provide system services in the form of a middleware that supports adaptation and extension at run-time and is tailored to data-stream processing, enabling resource-efficient and flexible execution of data-stream queries.

  • Power-Aware Critical Sections

    (Third Party Funds Single)

    Term: 1. January 2015 - 30. September 2022
    Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)
    Race conditions between concurrent processes within a computing system may cause partly inexplicable phenomena or even defective run-time behaviour. The reason is critical sections in non-sequential programs. Solutions for the protection of critical sections generally face a multi-dimensional problem space: (1) processor-local interrupts, (2) shared-memory multi-/many-core multiprocessors with (2a) coherent or (2b) incoherent caches, (3) distributed-memory systems with a global address space, and (4) interference with the process management of the operating system. Moreover, a protection method makes either pessimistic or optimistic assumptions about the occurrence of access contention. The number of contending processes depends on the use case and has a large impact on the effectiveness of their coordination at all levels of a computing system. Overhead, scalability, and dedication of the protective function thereby constitute decisive performance-affecting factors. This influencing quantity accounts not only for varying process run-times but also for differing energy use. The former results in noise or jitter in the program flow, non-functional properties that are especially problematic for highly parallel or real-time-dependent processes. The latter has economic as well as ecological consequences on the one hand, and touches the scalability limit of many-core processors (dark silicon) on the other.

    Depending on the structural complexity of a critical section and its sensitivity to contention, a trade-off becomes apparent that the project tackles by means of analytical and constructive measures. Objects of investigation are our own special-purpose operating systems, designed primarily to support parallel and partly also real-time-dependent data processing, and Linux. The goal is to provide (a) a software infrastructure for the load-dependent and, by the program sections themselves, self-organised change of protection against critical race conditions of concurrent processes, as well as (b) tools for the preparation, characterisation, and capture of those sections. Hotspots that are caused by increased process activity and manifest in energy use and temperature rise shall be avoided or attenuated, on demand or anticipatorily, by a section-specific dispatch policy. The overhead induced by a particular dispatch policy enters the weighting for dynamic reconfiguration of a critical section, so that a change is undertaken only if a real practical gain over the original solution can be expected. Before-after comparisons based on the investigated operating systems shall demonstrate the effectiveness of the developed approach.
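    The load-dependent, self-organised change of protection can be sketched as a critical section that monitors its own contention over a window of acquisitions and switches between an optimistic (spinning) and a pessimistic (blocking/queueing) policy, with hysteresis so that reconfiguration happens only when a real gain is likely. Thresholds and names below are illustrative assumptions.

    ```c
    #include <assert.h>

    enum policy { SPIN, BLOCK };

    struct section {
        enum policy policy;
        unsigned    contended;   /* acquisitions that met a busy lock */
        unsigned    total;       /* all acquisitions in this window   */
    };

    #define WINDOW 100           /* reevaluate after this many acquisitions */

    static void account(struct section *s, int was_contended)
    {
        s->total++;
        s->contended += was_contended != 0;
        if (s->total < WINDOW)
            return;
        /* Hysteresis: >25% contended -> BLOCK, <5% -> SPIN,
         * otherwise keep the current policy (change must pay off). */
        if (s->contended * 4 > s->total)
            s->policy = BLOCK;
        else if (s->contended * 20 < s->total)
            s->policy = SPIN;
        s->total = s->contended = 0;          /* start next window */
    }

    int main(void)
    {
        struct section s = { .policy = SPIN };
        for (int i = 0; i < 100; i++)
            account(&s, i % 2);               /* 50% contended window */
        assert(s.policy == BLOCK);
        for (int i = 0; i < 100; i++)
            account(&s, 0);                   /* contention vanished  */
        assert(s.policy == SPIN);
        return 0;
    }
    ```

    The dead band between the two thresholds models the project's requirement that reconfiguration overhead be weighed against the expected practical gain.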
  • Configurability Aware Development of Operating Systems

    (Third Party Funds Single)

    Term: since 1. May 2014
    Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)

    Today's operating systems (as well as other system software) offer a great deal of static configurability to tailor them to a specific application or hardware platform. Linux 4.2, for instance, provides (via its Kconfig models and tools) more than fifteen thousand configurable features for this purpose. Technically, the implementation of all these features is spread over multiple levels of the software-generation process, including the configuration system, build system, C preprocessor, compiler, linker, and more. This enormous variability has become unmanageable in practice; in the case of Linux, it has already led to thousands of variability defects within the lifetime of the system. By this term, we denote bugs and other quality issues related to the implementation of variable features. Variability defects manifest as configuration-consistency and configuration-coverage issues.

    In the CADOS project, we investigate scalable methods and tools to grasp the variability on every layer within the configuration and implementation space, visualize and analyze it and, if possible, adjust it while maintaining a holistic view on variability.
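    A minimal example of the configuration-consistency defects mentioned above, with made-up symbol names: the configuration model defines one feature symbol, but the code still tests a stale one, so the guarded block is dead in every derivable configuration even though the feature is enabled.

    ```c
    #include <assert.h>

    /* Pretend the Kconfig model defines CONFIG_NET_FILTER and "make
     * config" generated this definition (names are invented): */
    #define CONFIG_NET_FILTER 1

    static int filter_initialised = 0;

    static void driver_init(void)
    {
    #ifdef CONFIG_NETFILTER_CORE   /* defect: stale symbol, defined nowhere */
        filter_initialised = 1;    /* dead in every configuration */
    #endif
    }

    int main(void)
    {
        driver_init();
        /* The feature is enabled in the model, yet its code never runs: */
        assert(CONFIG_NET_FILTER == 1 && filter_initialised == 0);
        return 0;
    }
    ```

    Finding such mismatches requires correlating the configuration space (Kconfig) with the implementation space (#ifdef blocks, build rules), which is exactly the cross-layer analysis CADOS investigates.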

  • Software-controlled consistency and coherence for many-core processor architectures

    (Third Party Funds Single)

    Term: 1. September 2012 - 31. March 2021
    Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)

    The achievable computing capacity of individual processors has currently reached its technological limits. Further performance improvements are attainable only through the use of many computing cores. While processor architectures with up to 16 cores mark the current state of the art in commercially available products, systems with 100 computing cores are already obtainable in places. Architectures exceeding a thousand computing cores are to be expected in the future. For better scalability, extremely powerful communication networks are integrated into these processors (network on chip, NoC), so that they de facto combine properties of a distributed system with those of a NUMA system. The extremely low latency and high bandwidth of those networks open up the possibility of migrating methods of replication and consistency preservation from hardware into operating and run-time systems and thus of flexibly counteracting notorious problems such as false sharing of memory cells, cache-line thrashing, and bottlenecks in memory bandwidth.

    Therefore, the goal of the project is first to design a minimal, event-driven consistency kernel (COKE) for such many-core processors, providing the elementary operations for software-controlled consistency-preservation protocols at higher levels. On the basis of this kernel, diverse "consistency machines" will be designed that facilitate different memory semantics for software- and page-based shared memory.

  • Software Infrastructure for Resource-Constrained Networked Systems (Phase 1)

    (Third Party Funds Group – Sub project)

    Overall project: FOR 1508: Dynamically Adaptable Applications for Bat Localisation using Embedded Communicating Sensor Systems
    Term: 1. August 2012 - 31. July 2015
    Funding source: DFG / Forschergruppe (FOR)
  • Efficient Distributed Coordination

    (Own Funds)

    Term: since 1. January 2012
    URL: https://www4.cs.fau.de/Research/EDC/

    Coordination services such as ZooKeeper are essential building blocks of today's data-center infrastructures as they provide processes of distributed applications with means to exchange messages, to perform leader election, to detect machine or process crashes, or to reliably store configuration data. Providing an anchor of trust for their client applications, coordination services have to meet strong requirements regarding stability and performance. Only this way, it is possible to ensure that a coordination service neither is a single point of failure nor becomes the bottleneck of the entire system.

    To address drawbacks of state-of-the-art systems, the EDC project develops approaches that enable coordination services to meet the stability and performance demands. Amongst other things, this includes making these services resilient against both benign and malicious faults, integrating mechanisms for extending the service functionality at runtime in order to minimize communication and synchronization overhead, as well as designing system architectures for effectively and efficiently exploiting the potential of multi-core servers. Although focusing on coordination services, the developed concepts and techniques are expected to also be applicable to other domains, for example, replicated data stores.

  • Aspect-Oriented Real-Time Architecture (Phase 1)

    (Third Party Funds Single)

    Term: 1. August 2011 - 31. August 2016
    Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)
    URL: https://www4.cs.fau.de/Research/AORTA/

    A central role in the development of real-time systems is played by the real-time system architecture, as it reflects the mechanisms used to implement causal and temporal dependencies between the different, concurrent tasks of a real-time system. Two opposite poles of such architectures are time-triggered and event-triggered systems. In the former, dependencies are preferably mapped to temporal mechanisms: task fragments are ordered in time such that, for example, mutual exclusion or producer-consumer dependencies are maintained. In the latter, such dependencies are coordinated explicitly with synchronisation constructs such as semaphores or locks. The real-time system architecture thus influences the development of a real-time system at the application level, where it can be regarded as a strongly crosscutting, non-functional property. Beyond that, this property influences the implementation of further important non-functional properties of real-time systems, such as redundancy or memory consumption. Based on a suitable representation of the causal and temporal dependencies at the application level, the proposed project will develop mechanisms to deliberately influence the real-time system architecture and, with it, further non-functional properties of real-time systems.

  • Latency Awareness in Operating Systems

    (Third Party Funds Single)

    Term: 1. May 2011 - 30. April 2020
    Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)

    The goal of the LAOS project is to investigate the efficient use of modern many-core processors at the operating-system level, providing low-latency operating-system services even under high contention.
    Purpose-built minimal kernels providing thread and interrupt management as well as synchronisation primitives are analysed with respect to performance and scaling characteristics. These kernels comprise different architectural designs and alternative implementations. A strong focus lies on non-blocking implementations of parts or, if possible, of the whole operating-system kernel. Standard Intel x86-64-compatible processors are the main target hardware because of their popularity in high-performance parallel computing, server, and desktop systems. After careful analysis, modifications of existing kernels, e.g. Linux, may be possible that increase performance in highly parallel systems.
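    The non-blocking style named above can be illustrated with a classic Treiber stack: push and pop never block, retrying a compare-and-swap instead of taking a lock. A kernel might use such a structure for, e.g., a free list in thread management. This is a textbook sketch, not LAOS code; note that a production version would also have to address the ABA problem (e.g. with tagged pointers or hazard pointers).

    ```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stddef.h>

    struct node { struct node *next; int val; };

    static _Atomic(struct node *) top = NULL;

    static void push(struct node *n)
    {
        struct node *old = atomic_load(&top);
        do {
            n->next = old;             /* link on top of the current head */
        } while (!atomic_compare_exchange_weak(&top, &old, n));
    }

    static struct node *pop(void)
    {
        struct node *old = atomic_load(&top);
        while (old && !atomic_compare_exchange_weak(&top, &old, old->next))
            ;                          /* old is refreshed on CAS failure */
        return old;                    /* NULL if the stack was empty */
    }

    int main(void)
    {
        struct node a = { .val = 1 }, b = { .val = 2 };
        push(&a);
        push(&b);
        assert(pop()->val == 2);       /* LIFO order */
        assert(pop()->val == 1);
        assert(pop() == NULL);
        return 0;
    }
    ```

    Because no thread ever holds a lock, a preempted or interrupted thread cannot delay others, which is precisely the latency property the project is after.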

  • Dependability Aspects in Configurable Embedded Operating Systems

    (Third Party Funds Group – Sub project)

    Overall project: SPP 1500: Design and Architectures of Dependable Embedded Systems
    Term: 1. October 2010 - 30. September 2017
    Funding source: DFG / Schwerpunktprogramm (SPP)

    Future hardware designs for embedded systems will exhibit more parallelism at the price of being less reliable. This bears new challenges for system software, especially the operating system, which has to use and provide software measures to compensate for unreliable hardware. However, dependability in this respect is a nonfunctional concern that affects and depends on all parts of the system. Tackling it in a problem-oriented way by the operating system is an open challenge: (1) It is still unclear, which combination of software measures is most beneficial to compensate certain hardware failures – ideally these measures should be understood as a matter of configuration and adaptation. (2) To achieve overall dependability, the implementation of these measures, even though provided by the operating system, cannot be scoped just to the operating-system layer – it inherently crosscuts the whole software stack. (3) To achieve cost-efficiency with respect to hardware and energy, the measures have, furthermore, to be tailored with respect to the actual hardware properties and reliability requirements of the application. We address these challenges for operating-system design by a novel combination of (1) speculative and resource-efficient fault-tolerance techniques, which can (2) flexibly be applied to the operating system and the application by means of aspect-oriented programming, driven by (3) a tool-based (semi-)automatic analysis of the application and operating-system code, resulting in a strictly problem-oriented tailoring of the latter with respect to hardware-fault tolerance.
  • Trustworthy Clouds - Privacy and Resilience for Internet-scale Critical Infrastructure

    (Third Party Funds Group – Sub project)

    Overall project: Trustworthy Clouds - Privacy and Resilience for Internet-scale Critical Infrastructure
    Term: 1. October 2010 - 1. October 2013
    Funding source: EU - 7. RP / Cooperation / Verbundprojekt (CP)
  • Invasive Run-Time Support System (iRTSS) (C01)

    (Third Party Funds Group – Sub project)

    Overall project: TRR 89: Invasive Computing
    Term: 1. July 2010 - 30. June 2022
    Funding source: DFG / Sonderforschungsbereich / Transregio (SFB / TRR)

    Subproject C1 investigates system software for invasively parallel applications. It provides methods, principles, and abstractions for the application-aware extension, configuration, and adaptation of invasive computing systems through a novel, highly flexible operating-system infrastructure, which is integrated into a Unix host system for practical use. Investigated are (1) new design and implementation approaches for concurrency-aware operating systems, (2) novel AOP-like methods for the static and dynamic (re)configuration of operating systems, and (3) agent-based approaches for scalable and flexible resource management.

  • Security in Invasive Computing Systems (C05)

    (Third Party Funds Group – Sub project)

    Overall project: TRR 89: Invasive Computing
    Term: 1. July 2010 - 30. June 2022
    Funding source: DFG / Sonderforschungsbereich / Transregio (SFB / TRR)

    This subproject investigates requirements and mechanisms for protecting resource-aware, reconfigurable hardware/software architectures against malicious attackers. The focus is on enforcing information-flow control by means of isolation mechanisms at the application, operating-system, and hardware levels. The goal is to gain insight into the interactions between security and the predictability of critical properties of an invasive computing system.
  • Adaptive Responsive Embedded Systems (ESI 2)

    (Third Party Funds Group – Sub project)

    Overall project: ESI-Anwendungszentrum für die digitale Automatisierung, den digitalen Sport und die Automobilsensorik der Zukunft
    Term: 1. January 2010 - 31. December 2018
    Funding source: Bayerisches Staatsministerium für Wirtschaft und Medien, Energie und Technologie (StMWIVT) (ab 10/2013)
    URL: https://www4.cs.fau.de/Research/ARES/
  • Resource-Efficient Fault and Intrusion Tolerance

    (Third Party Funds Single)

    Term: 1. October 2009 - 30. September 2022
    Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)
    URL: https://www4.cs.fau.de/Research/REFIT/

    Internet-based services play a central role in today's society. As such services progressively take over from traditional infrastructures, their complexity steadily increases, and with it the number of faults that occur. Since improved software-engineering techniques alone cannot solve this problem, systems have to be prepared to tolerate faults and intrusions.

    REFIT investigates how systems can provide fault and intrusion tolerance in a resource-efficient manner. The key technology to achieve this goal is virtualization, as it enables multiple service instances to run in isolation on the same physical host. Server consolidation through virtualization not only saves resources in comparison to traditional replication, but also opens up new possibilities to apply optimizations (e.g., deterministic multi-threading).

    Resource efficiency and performance of the REFIT prototype are evaluated using a web-based multi-tier architecture, and the results are compared to non-replicated and traditionally-replicated scenarios. Furthermore, REFIT develops an infrastructure that supports the practical integration and operation of fault and intrusion-tolerant services; for example, in the context of cloud computing.

  • Platform for evaluation and education of embedded and safety-critical system software

    (Third Party Funds Single)

    Term: 1. October 2007 - 31. July 2014
    Funding source: Siemens AG

    The project originally started in the context of the CoSa project, where it was intended to serve as a credible demonstrator for safety-critical mission scenarios. During the development of the I4Copter prototype, it turned out to be more of a challenge than initially expected, both in terms of complexity and applicability. The software required for flight control, navigation, and communication is a comprehensive and demanding application for the underlying system software. It has therefore emerged as a demonstrative showcase addressing various aspects of system-software research; other research projects, such as CiAO, also benefit from this platform. Beyond the domain of computer science, the development of a quadrotor helicopter also poses challenges in the areas of engineering, manufacturing, and automatic control. I4Copter is thus now an interdisciplinary project with partners in other sciences, and an ideal platform for students, especially those of combined study programs (e.g. Mechatronics or Computational Engineering), demonstrating the need for cross-domain education.

CV

Jürgen Kleinöder is CIO of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany and Senior Academic Director at the Department of Computer Science 4 (Chair for Distributed Systems and Operating Systems). He is Deputy Managing Director of the Department of Computer Science and Managing Director of the Transregional Collaborative Research Center “Invasive Computing” (SFB/TRR 89).

He completed his Master’s degree (Diplom-Informatiker) in 1987 and his Ph.D. (Dr.-Ing.) in 1992 at the University of Erlangen. Between 1986 and 1989 he worked on UNIX operating-system support for multiprocessor architectures. From 1988 to 1991 he was a member of the project groups for the foundation of a Bavarian university network and the German IP network.

He is currently interested in all aspects of distributed object-oriented operating-system architectures; particularly in concepts for application-specific adaptable operating-system and run-time-system software (middleware).

He is a member of the ACM, Eurosys, and the German GI. From 2001 to 2008 he chaired the GI special interest group on operating systems.
