Evaluation of Level of Confidence and Optimization of Roll-back Recovery with Checkpointing for Real-Time Systems

DSpace at IIT Bombay

Field Value
 
Title Evaluation of Level of Confidence and Optimization of Roll-back Recovery with Checkpointing for Real-Time Systems
 
Creator NIKOLOV, D
INGELSSON, U
SINGH, V
LARSSON, E
 
Subject RELIABILITY
CHALLENGES
ALGORITHMS
PLACEMENT
SCHEMES
 
Description Increasing soft error rates in semiconductor devices manufactured in later technologies necessitate the use of fault-tolerant techniques such as Roll-back Recovery with Checkpointing (RRC). Because RRC introduces time overhead that increases the completion (execution) time, time constraints (deadlines) might be violated. This is a drawback for a class of computer systems in which correct operation is defined not only by producing the correct outcome of an operation but also by ensuring that deadlines are met. These computer systems are referred to as real-time systems (RTSs). In general, RTSs are classified as soft or hard depending on the consequences of violating deadlines. For soft RTSs, where the consequences of violating deadlines are not very severe, research has focused on optimizing RRC and has shown that it is possible to find the optimal number of checkpoints such that the average execution time (AET) is minimal. While minimal AET is important for soft RTSs, for hard RTSs, where the consequences of violating deadlines may be catastrophic, it is more important to provide a high probability that deadlines are met. Hence, there is a need for probabilistic guarantees that jobs employing RRC complete before a given deadline. Traditionally, AET analysis has been used for soft RTSs, and worst-case execution time (WCET) analysis along with schedule feasibility has been used for hard RTSs. In this paper we introduce a reliability metric, Level of Confidence (LoC), which is equally applicable to both soft and hard RTSs. LoC is used as a metric to evaluate to what extent a deadline is met. The main contributions of this paper are as follows. First, we present a mathematical framework for the evaluation of LoC when RRC is employed. Second, we provide a proof to verify the correctness of the proposed expression. Third, in the context of hard RTSs, we provide a method to obtain the optimal number of checkpoints that maximizes the LoC. Fourth, in the context of soft RTSs, where the maximal LoC may not be needed but some LoC requirement must be met, we present an optimization method for RRC that finds the number of checkpoints resulting in the minimal completion time that still satisfies the given LoC requirement. Fifth, we use the proposed framework to evaluate and compare probabilistic guarantees when RRC is optimized towards soft RTSs. (C) 2014 Elsevier Ltd. All rights reserved.
 
Publisher PERGAMON-ELSEVIER SCIENCE LTD
 
Date 2014-12-29T05:07:31Z
2014
 
Type Article
 
Identifier MICROELECTRONICS RELIABILITY, 54(5), 1022-1049
0026-2714
http://dx.doi.org/10.1016/j.microrel.2014.02.004
http://dspace.library.iitb.ac.in/jspui/handle/100/17153
 
Language English