REDOS Note
Residual Energy Dependent NVRAM-based Operation Shutdown
Problem statement
NVRAM-based software depends on a reliable system state backup to be able to tolerate sudden, unpredictable exceptions that possibly cause the loss of highly sensitive volatile information kept in hardware registers and caches. Thus, fail-safe guarantees are now required from the system, since power failures when writing to the NVRAM, for example, can lead to control flows that unexpectedly convert a sequential process into a non-sequential one. The consequences for processes that happen under the assumption of sequential execution can be severe, as illustrated by the following program example:
typedef struct chain {
    struct chain *link;
} chain_t;

typedef struct queue {
    chain_t head;
    chain_t *tail;
} queue_t; /* init: head.link = NULL; tail = &head */

void enqueue(queue_t *this, chain_t *item) {
    item->link = 0;          /* new element terminates the chain */
    this->tail->link = item; /* append to the current last element */
    this->tail = item;       /* make the new element the tail */
}
This simple C code, which is quite typical of system software, assumes sequential execution of the enqueue procedure.
Assume further that after the element has been appended to the end of the queue (this->tail->link = item) but before the tail pointer has been updated (this->tail = item), a power failure occurs that aborts the process in question. If the queue data structure resides in NVRAM, this exception leaves the data in an inconsistent state: the new element has been appended, but it has not yet been made persistent as the last entry in the queue.
The next enqueue operation after restart will overwrite the link to this element, because the tail pointer was not updated during the previous (interrupted) enqueue and still refers to the old last entry. The appended element is thereby lost from the queue.
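The scenario can be replayed in ordinary C as a rough illustration. In this sketch, static variables merely stand in for NVRAM, and the names queue_init, enqueue_interrupted, queue_length and demo_lost_element are hypothetical helpers that do not appear in the original code. The power failure is modelled by simply omitting the tail update.

```c
#include <assert.h>
#include <stddef.h>

typedef struct chain { struct chain *link; } chain_t;
typedef struct queue { chain_t head; chain_t *tail; } queue_t;

void queue_init(queue_t *q) {
    q->head.link = NULL;
    q->tail = &q->head;
}

void enqueue(queue_t *q, chain_t *item) {
    assert(q != NULL && item != NULL);
    item->link = NULL;
    q->tail->link = item;
    q->tail = item;
}

/* Hypothetical: an enqueue cut short by a power failure after the
 * append but before the tail update. */
void enqueue_interrupted(queue_t *q, chain_t *item) {
    item->link = NULL;
    q->tail->link = item;
    /* power failure here: q->tail = item is never executed */
}

size_t queue_length(const queue_t *q) {
    size_t n = 0;
    for (const chain_t *p = q->head.link; p != NULL; p = p->link)
        n++;
    return n;
}

/* Enqueues a, then b (interrupted), then c after the "restart".
 * Returns the number of elements still reachable from the head. */
size_t demo_lost_element(void) {
    static queue_t q;        /* stands in for NVRAM */
    static chain_t a, b, c;
    queue_init(&q);
    enqueue(&q, &a);
    enqueue_interrupted(&q, &b);  /* a.link == &b, tail still at a */
    enqueue(&q, &c);              /* a.link is overwritten with &c */
    return queue_length(&q);
}
```

Only a and c remain reachable here; b was appended but is silently dropped by the following enqueue.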
Solution approach
Such consistency problems can be prevented, in a way that is functionally transparent to the program parts affected by this kind of race condition, by an operating system that builds an event-based, sporadically triggered checkpointing mechanism into its exception handling subsystem. The idea is to preempt the interrupted process from its volatile environment to NVRAM at the moment of a power failure and to resume it at exactly this preemption point when the system is restarted:
- The exception handler takes care of saving the processor state to NVRAM and then shuts down the system.
- The bootstrap loader then restores the processor state saved in NVRAM at the appropriate time (i.e. when the system is restarted) and thus continues the interrupted process.
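The two steps can be sketched as a small simulation; this is not the REDOS implementation. It assumes a toy "processor state" that consists only of a program counter over the three statements of enqueue, and a struct that stands in for NVRAM. All names (cpu_t, nvram_t, pfi_handler, bootstrap, run_enqueue, demo_resume) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct chain { struct chain *link; } chain_t;
typedef struct queue { chain_t head; chain_t *tail; } queue_t;

/* Simulated processor state: only a program counter over the three
 * statements of enqueue (0, 1, 2). A real state has many registers. */
typedef struct cpu { int pc; } cpu_t;

/* Everything in this struct is assumed to reside in NVRAM. */
typedef struct nvram {
    queue_t queue;
    chain_t items[2];
    cpu_t   saved;   /* checkpointed processor state */
    bool    valid;   /* true while 'saved' holds a checkpoint */
} nvram_t;

/* Exception handler: save the processor state, then shut down. */
void pfi_handler(nvram_t *nv, const cpu_t *cpu) {
    nv->saved = *cpu;
    nv->valid = true;   /* marked valid only after the copy */
}

/* Bootstrap loader: resume from the checkpoint if there is one. */
void bootstrap(nvram_t *nv, cpu_t *cpu) {
    if (nv->valid) {
        *cpu = nv->saved;
        nv->valid = false;
    } else {
        cpu->pc = 0;    /* cold start */
    }
}

/* Executes enqueue from cpu->pc onward. Power fails just before the
 * statement numbered fail_at runs (-1: never). Returns true when done. */
bool run_enqueue(nvram_t *nv, cpu_t *cpu, chain_t *item, int fail_at) {
    assert(cpu->pc >= 0);
    while (cpu->pc < 3) {
        if (cpu->pc == fail_at) {
            pfi_handler(nv, cpu);
            return false;   /* the system is now off */
        }
        switch (cpu->pc) {
        case 0: item->link = NULL;           break;
        case 1: nv->queue.tail->link = item; break;
        case 2: nv->queue.tail = item;       break;
        }
        cpu->pc++;
    }
    return true;
}

bool demo_resume(void) {
    static nvram_t nv;
    cpu_t cpu = { 0 };
    nv.queue.head.link = NULL;
    nv.queue.tail = &nv.queue.head;
    nv.valid = false;

    /* Power fails after the append but before the tail update. */
    if (run_enqueue(&nv, &cpu, &nv.items[0], 2))
        return false;

    cpu.pc = 0;             /* volatile state is lost at power-off */
    bootstrap(&nv, &cpu);   /* restart: pc is restored to 2 */
    bool done = run_enqueue(&nv, &cpu, &nv.items[0], -1);

    cpu.pc = 0;             /* a fresh, uninterrupted enqueue */
    done = done && run_enqueue(&nv, &cpu, &nv.items[1], -1);

    return done
        && nv.queue.head.link == &nv.items[0]
        && nv.items[0].link == &nv.items[1]
        && nv.queue.tail == &nv.items[1];
}
```

Because the interrupted enqueue continues at its preemption point, the tail update still takes place and the following enqueue finds a consistent queue.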
NVM-only operation
REDOS assumes that all programs are executed from NVRAM only. This applies to the machine programs as well as to the operating system that runs them. Consequently, the volatile environment of the interrupted process consists of the processor data held in the registers and the cache. Under this precondition, a trap or power failure interrupt (PFI) results in a micro-checkpoint request to save the processor state to NVRAM, which the operating system handles with a strict time guarantee. The residual energy window, a specified characteristic of the power supply unit (PSU), sets the upper time limit for this procedure: its worst-case execution time (WCET) must never exceed that window. In program areas where this mechanism cannot be used, particularly in the backup procedure itself, transactional programming comes into play.
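The timing constraint amounts to a simple inequality: the WCET of the checkpoint must not exceed the residual energy window. The sketch below assumes, purely for illustration, that the WCET can be bounded by the sum of an interrupt latency, a register save time and a per-line cache write-back cost. This decomposition, the type checkpoint_cost_t, and both function names are hypothetical, and real figures would have to come from the PSU specification and a WCET analysis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical worst-case cost model of one micro-checkpoint. */
typedef struct checkpoint_cost {
    uint32_t irq_latency_ns;     /* delay until the PFI handler starts */
    uint32_t register_save_ns;   /* time to store the register file */
    uint32_t dirty_lines;        /* dirty cache lines to write back */
    uint32_t flush_ns_per_line;  /* write-back cost per cache line */
} checkpoint_cost_t;

/* Upper bound on the checkpoint duration under the model above. */
uint64_t checkpoint_wcet_ns(const checkpoint_cost_t *c) {
    assert(c != NULL);
    return (uint64_t)c->irq_latency_ns
         + (uint64_t)c->register_save_ns
         + (uint64_t)c->dirty_lines * c->flush_ns_per_line;
}

/* True if the checkpoint is guaranteed to finish within the window. */
bool checkpoint_fits(const checkpoint_cost_t *c, uint64_t window_ns) {
    return checkpoint_wcet_ns(c) <= window_ns;
}
```

With made-up numbers of 50 µs latency, 5 µs register save and 8192 dirty lines at 200 ns each, the bound is about 1.69 ms and fits a 2 ms window. Doubling the number of dirty lines would exceed it.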
Power failure
The hardware requirements for a power failure exception are anything but new: they were implemented in computers as early as the 1970s. Here is an excerpt from a processor manual of that era (PDP-11):
power-failure trap; if the AC power falls below 95 volts or outside 47 to 63 hertz, about two milliseconds are still allowed for power-failure handling.
This standard technology, which seems to have gone out of fashion, is now available in microcontrollers, for example, with an integrated processor companion or, especially for servers, indirectly given with a USB-based PSU. However, in these cases the detection of an impending power failure during program execution is no longer an inherent processor feature: it does not occur synchronously (trap) in the CPU, but asynchronously (interrupt) in the periphery. Moreover, the interrupt request sent as PFI from the periphery is usually maskable (IRQ), in contrast to the synchronous case. Even in the case of a non-maskable interrupt (NMI), a loss of the hardware signal sent by the periphery cannot be ruled out. Due to electrical or electromagnetic effects, an interfering signal (glitch) on the interrupt line can not only cause a spurious interrupt, but also make a real interrupt imperceptible to the PIC (programmable interrupt controller) or the CPU. A spurious PFI is tolerable, but a lost PFI that never takes effect is problematic.
Given this, the main problems are NMI nesting and IRQ-blocked critical sections: both endanger the timely handling of a possible checkpoint request. The respective sections need to be located (e.g. using static program analysis) and then rearranged (with program transformation tools, if necessary) so that excessive interrupt latencies are prevented, that is, critical NMI interleavings are resolved, and IRQ locks are either eliminated or at least released in good time.
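One way an IRQ lock can be "released in good time" is to split a long critical section into bounded chunks, which is only valid where the protected data is consistent at every chunk boundary. The following sketch simulates this. The functions irq_disable and irq_enable are stand-ins for the real masking operations, and the work units and chunk size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated interrupt lock; real code would mask and unmask IRQs. */
static int    irq_locked;
static size_t current_run;   /* work units done under the current lock */
static size_t longest_run;   /* longest locked section observed */

void irq_disable(void) {
    assert(!irq_locked);     /* nested locking is not modelled */
    irq_locked = 1;
    current_run = 0;
}

void irq_enable(void) { irq_locked = 0; }

/* One unit of work; counts towards the locked section if IRQs are off. */
void work_unit(void) {
    if (irq_locked && ++current_run > longest_run)
        longest_run = current_run;
}

size_t longest_locked_run(void) { return longest_run; }
void   reset_measurement(void)  { longest_run = 0; }

/* Original form: the whole loop runs with interrupts blocked. */
void process_locked(size_t n) {
    irq_disable();
    for (size_t i = 0; i < n; i++)
        work_unit();
    irq_enable();
}

/* Rearranged form: the lock is released after at most 'chunk' units,
 * so a pending PFI is delayed by one chunk at most. */
void process_chunked(size_t n, size_t chunk) {
    if (chunk == 0)
        chunk = 1;           /* guarantee progress */
    size_t done = 0;
    while (done < n) {
        size_t end = (n - done < chunk) ? n : done + chunk;
        irq_disable();
        for (; done < end; done++)
            work_unit();
        irq_enable();        /* a pending PFI could be delivered here */
    }
}
```

In the chunked variant, the longest interval with blocked interrupts depends on the chunk size rather than on the total amount of work, which is what makes the interrupt latency boundable.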
If a failure of the PFI-dependent checkpoint mechanism must be assumed, provisions must be made in the operating system to either detect and clean up an inconsistently left global system state at restart or to execute the entire software (i.e., operating system and machine programs) transactionally. The latter requires a transactional form of programming throughout, very similar to non-blocking synchronization. The associated software redesign effort generally rules out such a disruptive approach for common general-purpose operating systems, but not necessarily for particularly sensitive locations in individual system functions (e.g., checkpoint protection) or for (small) special-purpose operating systems.
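For the queue example, restart-time cleanup could look like the sketch below. It is one possible repair, not a mechanism described by the project, and it assumes that stores reach NVRAM in program order. The helper queue_recover is hypothetical: it advances a lagging tail pointer to the real end of the chain, much as non-blocking queue algorithms let other threads advance a tail that has fallen behind.

```c
#include <assert.h>
#include <stddef.h>

typedef struct chain { struct chain *link; } chain_t;
typedef struct queue { chain_t head; chain_t *tail; } queue_t;

void queue_init(queue_t *q) {
    q->head.link = NULL;
    q->tail = &q->head;
}

void enqueue(queue_t *q, chain_t *item) {
    assert(q != NULL && item != NULL);
    item->link = NULL;
    q->tail->link = item;
    q->tail = item;
}

/* Hypothetical restart-time repair: if an enqueue was interrupted
 * between append and tail update, the tail entry has a successor.
 * Follow the links until the tail is the last entry again. */
void queue_recover(queue_t *q) {
    while (q->tail->link != NULL)
        q->tail = q->tail->link;
}

/* Enqueues a, then b (interrupted), repairs the queue, enqueues c.
 * Returns the number of reachable elements, or 0 if the tail is wrong. */
size_t demo_recover(void) {
    static queue_t q;        /* stands in for NVRAM */
    static chain_t a, b, c;
    queue_init(&q);
    enqueue(&q, &a);

    b.link = NULL;           /* interrupted enqueue of b: */
    q.tail->link = &b;       /* appended, tail not yet updated */

    queue_recover(&q);       /* at restart: tail now refers to b */
    enqueue(&q, &c);

    size_t n = 0;
    for (const chain_t *p = q.head.link; p != NULL; p = p->link)
        n++;
    return (q.tail == &c) ? n : 0;
}
```

After the repair, all three elements are reachable and the tail refers to the last one, so the element from the interrupted enqueue is no longer lost.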
Preliminary work
Neverlast: Towards the design and implementation of the NVM-based everlasting operating system
54th Annual Hawaii International Conference on System Sciences, HICSS 2021 (Virtual, 4-8 January 2021)
In: Tung X. Bui (ed.): Proceedings of the Annual Hawaii International Conference on System Sciences 2021
URL: http://hdl.handle.net/10125/71491
Neverlast: An NVM-centric Operating System for Persistent Edge Systems
12th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys)
In: APSys '21: Proceedings of the 12th ACM SIGOPS Asia-Pacific Workshop on Systems, New York: 2021
DOI: 10.1145/3476886.3477513