Record Details

Reliability Modelling Of Whole RAID Storage Subsystems

Electronic Theses of Indian Institute of Science

 
 
Field Value
 
Title Reliability Modelling Of Whole RAID Storage Subsystems
 
Creator Karmakar, Prasenjit
 
Subject Computer Storage System
Redundant Array of Independent Disks (RAID) Storage Systems
Disk Reliability Modelling
PRISM Model
Disk Subsystems - Modelling
RAID Storage Systems - Modelling
Disk Failure Model
Disk Failure
Computer Science
 
Description Reliability modelling of RAID storage systems, including their various components such as RAID controllers, enclosures, expanders, interconnects and disks, is important from a storage system designer's point of view. A model that can express all the failure characteristics of the whole RAID storage system can be used to evaluate design choices, perform cost-reliability trade-offs and conduct sensitivity analyses.

We present a reliability model for RAID storage systems in which we try to model all the components as accurately as possible. We use several state-space reduction techniques, such as aggregating all in-series components and hierarchical decomposition, to reduce the size of our model. To automate the computation of reliability, we use the PRISM model checker as a CTMC (continuous-time Markov chain) solver where appropriate.
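
The aggregation of in-series components can be illustrated with a minimal sketch. It assumes independent components with exponential lifetimes, in which case a series chain behaves like a single component whose failure rate is the sum of the individual rates. The component names and rate values are hypothetical and are not taken from the thesis.

```python
import math

# Hypothetical failure rates (failures per hour); illustrative values only.
COMPONENT_RATES = {"controller": 2.0e-6, "expander": 5.0e-7, "enclosure": 1.0e-6}


def series_rate(rates):
    """Failure rate of independent exponential components connected in series."""
    return sum(rates)


def reliability(rate, hours):
    """Probability that an exponential component survives `hours`."""
    return math.exp(-rate * hours)


aggregated_rate = series_rate(COMPONENT_RATES.values())
one_year = 8760.0  # hours

# Reliability of the chain computed component by component ...
product_of_parts = math.prod(
    reliability(rate, one_year) for rate in COMPONENT_RATES.values()
)
# ... and computed from the single aggregated component.
aggregated = reliability(aggregated_rate, one_year)

print(aggregated_rate, product_of_parts, aggregated)
```

Both computations agree, so the chain can be represented by one state variable instead of three, which is the kind of saving a state-space reduction aims for.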

Initially, we assume a simple 3-state disk reliability model with independent disk failures. Later, we assume a Weibull model for the disks; we also consider a correlated disk failure model to check correspondence with the available field data. For all other components in the system, we assume exponential failure distributions. To use the CTMC solver, we approximate the Weibull distribution for a disk using a sum of exponentials. We first confirm that this model gives results in reasonably good agreement with those from sequential Monte Carlo simulation methods for RAID disk subsystems.
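
The abstract does not say how the sum-of-exponentials approximation is constructed. One possible reading, sketched here purely as an assumption, is to fit the Weibull survival function with a non-negative weighted sum of exponentials at fixed candidate rates. The shape, scale, rate grid and the coordinate-descent fitting routine are all hypothetical choices made for illustration; the weights are not forced to sum to one, and the quality of the fit depends on the chosen rates and number of sweeps.

```python
import math


def weibull_survival(t, shape, scale):
    """Weibull survival function R(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def fit_exponential_sum(times, target, rates, sweeps=200):
    """Fit target(t) ~ sum_j w_j * exp(-rates[j] * t) with every w_j >= 0.

    Uses cyclic coordinate descent on the squared error. Each update is the
    exact one-dimensional minimiser clipped at zero, so the error never grows.
    """
    columns = [[math.exp(-rate * t) for t in times] for rate in rates]
    norms = [sum(v * v for v in col) for col in columns]
    weights = [0.0] * len(rates)
    residual = list(target)  # target minus the current approximation
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            step = sum(v * r for v, r in zip(col, residual)) / norms[j]
            new_weight = max(0.0, weights[j] + step)
            delta = new_weight - weights[j]
            if delta:
                residual = [r - delta * v for r, v in zip(residual, col)]
                weights[j] = new_weight
    return weights, sum(r * r for r in residual)


# Hypothetical disk parameters: shape below 1, scale in hours.
SHAPE, SCALE = 0.8, 1.0e6
times = [i * 2000.0 for i in range(101)]  # 0 to 200,000 hours
target = [weibull_survival(t, SHAPE, SCALE) for t in times]
rates = [1.0e-7, 1.0e-6, 1.0e-5, 1.0e-4]  # fixed candidate rates (assumed)

weights, squared_error = fit_exponential_sum(times, target, rates)
print(weights, squared_error)
```

Each fitted term corresponds to an exponential phase, which is the form a CTMC solver can represent directly.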

Next, our model for whole RAID storage systems (which include, for example, disks, expanders and enclosures) uses Weibull distributions and, where appropriate, correlated failure modes for disks, and exponential distributions with independent failure modes for all other components. Since the CTMC solver cannot handle the size of the resulting models, we solve them using a hierarchical decomposition technique. With this approach we are able to model fairly large configurations with up to 600 disks.
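
The idea behind hierarchical decomposition can be sketched as follows: solve a small sub-model once, summarise it by an equivalent failure rate, and reuse that rate in a much smaller top-level model. This sketch is not the thesis's model. It assumes exponential disk failures, the textbook RAID-5 Markov chain (all disks good, one disk failed and rebuilding, data loss) as the sub-model, and a simple series combination at the top level. All parameter values are hypothetical.

```python
def raid5_group_mttdl(n_disks, disk_fail_rate, rebuild_rate):
    """Mean time to data loss of the textbook RAID-5 chain (assumed sub-model).

    States: all disks good -> one disk failed (rebuilding) -> data loss.
    """
    lam, mu = disk_fail_rate, rebuild_rate
    return ((2 * n_disks - 1) * lam + mu) / (n_disks * (n_disks - 1) * lam ** 2)


def system_failure_rate(n_groups, group_mttdl, shared_component_rates):
    """Top-level series model: each group is replaced by an equivalent
    exponential component with rate 1 / MTTDL (an approximation)."""
    return n_groups / group_mttdl + sum(shared_component_rates)


# Hypothetical parameters, chosen only for illustration.
DISK_FAIL_RATE = 1.0e-6          # failures per hour
REBUILD_RATE = 0.1               # rebuilds per hour (about a 10-hour rebuild)
GROUP_SIZE = 8                   # disks per RAID-5 group
N_GROUPS = 75                    # 75 groups of 8 disks = 600 disks
SHARED_RATES = [2.0e-6, 1.0e-6]  # e.g. a controller and an enclosure

group_mttdl = raid5_group_mttdl(GROUP_SIZE, DISK_FAIL_RATE, REBUILD_RATE)
system_rate = system_failure_rate(N_GROUPS, group_mttdl, SHARED_RATES)
system_mttf = 1.0 / system_rate

print(group_mttdl, system_mttf)
```

The sub-model has three states regardless of how many groups the system contains, so the top level only has to combine a handful of rates rather than track the state of every disk.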

We can use such reasonably complete models to conduct several "what-if" analyses for many RAID storage systems of interest. Our results show that, depending on the configuration, spanning a RAID group across enclosures may increase or decrease reliability. Another key finding from our model results is that redundancy mechanisms such as multipathing are beneficial only if a single failure of some other component does not cause data inaccessibility of a whole RAID group.
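
The multipathing observation can be illustrated with a simple reliability block diagram. This is a static sketch under assumed, hypothetical availability figures and is not the thesis's model: two redundant access paths are placed in parallel, followed in series by one shared component whose failure makes the whole RAID group inaccessible.

```python
import math


def series(*availabilities):
    """All blocks must work."""
    return math.prod(availabilities)


def parallel(*availabilities):
    """At least one block must work."""
    return 1.0 - math.prod(1.0 - a for a in availabilities)


PATH = 0.99        # hypothetical availability of one access path
ENCLOSURE = 0.999  # hypothetical availability of a single shared enclosure

single_path = series(PATH, ENCLOSURE)
dual_path = series(parallel(PATH, PATH), ENCLOSURE)

print(single_path, dual_path)
```

In this sketch the second path raises availability, but the result can never exceed that of the shared enclosure, so the gain from multipathing is capped by any remaining single point of failure.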
 
Contributor Gopinath, K
 
Date 2014-06-05T10:18:31Z
2014-06-05
2012-04
 
Type Thesis
 
Identifier http://etd.iisc.ernet.in/handle/2005/2323
http://etd.ncsi.iisc.ernet.in/abstracts/2987/G25277-Abs.pdf
 
Language en_US
 
Relation G25277