Administering virtual disks

Disk arrays and data striping

A disk array organizes multiple independent disks into one large, high-performance disk. Data blocks are split up and written in parallel to the disks, which speeds access. In addition, some array configurations write backup parity information along with the data. This makes them fault-tolerant: they can continue working even if a disk fails while in use.
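The parity idea can be illustrated with a minimal sketch. It assumes bytewise XOR parity, which is a common scheme but is not specified in the text above; the function names `xor_parity` and `rebuild_missing` are hypothetical and chosen only for this example.

```python
from functools import reduce

def xor_parity(blocks):
    """Return the bytewise XOR of a list of equal-length data blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing block from the survivors plus the parity block."""
    return xor_parity(list(surviving_blocks) + [parity])

# Three data blocks, each imagined as living on a different disk.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(data)

# Pretend the second disk failed: rebuild its block from the rest.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered == data[1])
```

Because XOR is its own inverse, combining the surviving blocks with the parity block yields the lost block. This is why an array of this kind can keep serving data after one disk fails.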

The time taken to execute a read or write on a disk is determined largely by the time taken for the data area on the disk surface to pass under the read/write heads of the drive. Reading or writing an 8KB block therefore takes about eight times as long as reading or writing a 1KB block. However, if the 8KB block is written to a disk array with eight disks, the data is split into eight stripes of 1KB, which are written to the individual disks in parallel. In this way, a disk array achieves a higher data transfer rate than a single drive.
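The 8KB example can be sketched as follows. This is an illustration only: it assumes a simple round-robin layout, and the names `stripe` and `unstripe` are hypothetical. The "disks" are plain in-memory lists, and no real parallel I/O takes place.

```python
def stripe(block, num_disks, stripe_size):
    """Distribute a block round-robin across num_disks in stripe_size chunks."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(block), stripe_size):
        disk_index = (offset // stripe_size) % num_disks
        disks[disk_index].append(block[offset:offset + stripe_size])
    return disks

def unstripe(disks):
    """Reassemble the original block by reading stripes in round-robin order."""
    chunks = []
    depth = max(len(d) for d in disks)
    for row in range(depth):
        for d in disks:
            if row < len(d):
                chunks.append(d[row])
    return b"".join(chunks)

# An 8KB block split across eight disks in 1KB stripes.
block = bytes(range(256)) * 32          # 8192 bytes
disks = stripe(block, num_disks=8, stripe_size=1024)

print([len(d[0]) for d in disks])       # one 1KB stripe per disk
print(unstripe(disks) == block)
```

Each disk receives only one eighth of the data, so if all eight transfers proceed at once, the transfer time approaches that of a single 1KB write.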

In practice, throughput does not scale linearly with the number of disks. This is because of the effects of seek time, on-board disk caches, and parity generation.

Data striping also balances the load evenly across all the disks in the array, eliminating disk hot spots. These arise when one disk is saturated with I/O requests while the rest lie idle.

However, when multiple disks are organized into an array, the potential for data loss from disk failures is higher, because the probability that some disk fails in a given period grows with the number of disks. For example, a disk drive with a rated mean time between failures (MTBF) of 100,000 hours has a substantial probability of failing within that length of time (about 63%, if failures occur at a constant rate). An array of 10 such drives has the same probability that at least one of them will fail within 10,000 hours (just over a year of 24-hour operation).
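The arithmetic behind this comparison can be sketched as follows. The sketch assumes that drives fail independently and at a constant rate (an exponential model), which the text does not specify; the function names are hypothetical. Under that assumption, a single drive over 100,000 hours and a ten-drive array over 10,000 hours have the same probability of a failure. The value this model gives is about 63%, and other failure models give other figures.

```python
import math

def prob_any_failure(mtbf_hours, num_drives, period_hours):
    """Probability that at least one of num_drives fails within period_hours.

    Assumes independent drives with a constant failure rate of 1/mtbf_hours.
    """
    return 1.0 - math.exp(-num_drives * period_hours / mtbf_hours)

def array_mean_time_to_first_failure(mtbf_hours, num_drives):
    """Expected time until the first drive in the array fails (same assumptions)."""
    return mtbf_hours / num_drives

single = prob_any_failure(100_000, num_drives=1, period_hours=100_000)
array = prob_any_failure(100_000, num_drives=10, period_hours=10_000)

print(f"single drive, 100,000 h: {single:.2f}")
print(f"10-drive array, 10,000 h: {array:.2f}")
print(array_mean_time_to_first_failure(100_000, 10))
```

The key point is independent of the exact percentage: with ten drives, the time scale on which a first failure is expected shrinks by a factor of ten.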