Tuesday, February 12, 2008

RAID Striping Algorithms: The Missing Link

In the previous post we learned about RAID configurations and how parity works. The last piece of the puzzle is striping algorithms. Striping algorithms, or schemes, refer to how stripes are mapped onto the disk array. Being a hands-on person, I like to drill down to the bit level to see how things work, and that has helped me solve many problems with RAID systems. Striping schemes are complex layouts involving 2 layers: the physical stripes with parity, and the logical striping where the data is mapped.

Our first step in working on a RAID recovery is to review each disk's raw data to determine stripe size and orientation.

Stripe size was mentioned previously, but to review: stripes are collections of sectors, and each stripe contains a number of sectors equal to a power of 2. The smallest stripe size we have seen is 16 sectors (2^4). The most common size is 128 sectors (2^7). In my real-life example below, the stripe size was 2048 sectors (2^11).

Stripe mapping can be very confusing. In an effort to lessen the confusion, I will talk about 3 different views of the same stripe: the disk view, the physical volume view, and the logical volume view. The physical volume and logical volume correspond to the 2 layers mentioned above.

The physical stripes reside on the disks in order. I will use a simplified notation scheme based on stripe number and disk number, for example: S1D0 = first stripe on disk 0 (128 sectors from 0 - 127), S2D0 = next stripe on disk 0 (128 sectors from 128 - 255), etc. For the physical RAID volume representation I will use the letters 'PV'. So, the first stripe will be PV1, the second will be PV2, and so on. The last view of the data is the logical view. This is what the operating system sees. I will refer to the first stripe of the logical volume as LV1, the second as LV2, etc. For clarity, I will use the term 'stripe set' to refer to a parity stripe and its contiguous stripes from each drive (S1D0, S1D1, S1D2, S1D3, etc.).
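
To make the notation concrete, here is a small Python sketch (my own illustration, not code from any RAID controller) that computes the sector range a physical stripe occupies on a disk, using the common 128-sector stripe size:

    # Sector range (inclusive) occupied by physical stripe S<n> on any one
    # disk, assuming a 128-sector stripe size. Stripe numbers are 1-based
    # to match the SxDy notation above.
    def stripe_sectors(stripe_number, stripe_size=128):
        first = (stripe_number - 1) * stripe_size
        return first, first + stripe_size - 1

    print(stripe_sectors(1))  # (0, 127)   -> S1 on any disk
    print(stripe_sectors(2))  # (128, 255) -> S2 on any disk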

In RAID 0 and RAID 0+1, stripes are mapped contiguously in order of the drives. The mapping for these arrays is simple. In the RAID 0 example at the right, A1, A2, etc. are the same as my volume representation (PV) above; just substitute PV for A. A1 is the first stripe and resides on Disk 0, A2 is the second stripe and resides on the first stripe of Disk 1, A3 (the second stripe on Disk 0) is the third stripe, etc. There are no parity stripes to worry about.
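
For those who like to see the mapping as code, here is a minimal sketch of the RAID 0 layout, with 1-based stripe numbers and 0-based disk numbers to match my notation (the function name is mine, for illustration only):

    # Map physical-volume stripe PV<pv> to its disk and stripe (SxDy).
    # Stripes are simply dealt across the disks in order, with no parity.
    def raid0_map(pv, num_disks):
        disk = (pv - 1) % num_disks
        stripe = (pv - 1) // num_disks + 1
        return stripe, disk

    # With 2 disks: PV1 -> S1D0, PV2 -> S1D1, PV3 -> S2D0, as in the example.
    for pv in (1, 2, 3):
        s, d = raid0_map(pv, 2)
        print("PV%d -> S%dD%d" % (pv, s, d))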

The most interesting stripe map is found in RAID 5 arrays. RAID 5 was originally defined in the paper "A Case for Redundant Arrays of Inexpensive Disks (RAID)", presented in June 1988 at the SIGMOD conference. The new idea in this paper was for one stripe out of each stripe set to be used for parity in a group of striped drives, while distributing the parity stripes over all the drives to minimize the impact on performance. This concept of data storage and redundancy revolutionized the way people looked at data storage. The concept of distributed data with redundancy has moved beyond simple disk management and is now used as the basis for distributed data in global networks. Newer schemes for redundancy in large arrays, such as RAID 6, designate 2 stripes per stripe set for parity to address the vulnerability of a RAID array to data loss during the rebuild of a failed drive. We will focus on RAID 5 for the purposes of this post.

RAID 5 sets aside one physical stripe for parity in each stripe set. The parity stripe alternates from drive to drive. Where the parity stripe resides is determined by one of 2 standard schemes: reverse parity or forward parity. The image at the left shows an example of reverse parity. In reverse parity, the first parity stripe is on the last drive (Disk 3) and is designated A(p) in the image. In my notation this would be represented as S1D3, the first stripe of Disk 3. It is the 4th stripe of the RAID volume (PV4) and isn't included in the logical volume at all, since it is a parity stripe. In reverse parity, the first stripe of the first drive (S1D0) is also the first stripe of the RAID volume (PV1) and the first stripe of the logical volume (LV1). This scheme makes the logical volume conceptually easy to picture, since this information sits exactly where it would on a single drive.

To continue, the second parity stripe is on Disk 2 and is designated B(p) in the example (in my notation it would be S2D2). The mapping of data stripes is simply in order from Disk 0 to Disk 3, skipping the parity stripe. I will discuss different data mapping schemes in a future post. In this scheme, the first drive holds the first stripe of the volume (LV1). This is conceptually easy, because the first drive really does look like the first part of the volume: it holds the master boot record (MBR) and often the BIOS parameter block (BPB). The second stripe (LV2) is on the second drive (S1D1), etc. Note that the number of stripes in the logical volume is less than in the physical volume: the logical volume holds (n - 1) * the number of stripes per drive, where n = the total number of drives in the array.
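
Here is the reverse parity layout as a short Python sketch (again my own illustration): given a stripe set number and the number of disks, it returns the disk holding parity and the data disks in logical order:

    # Reverse parity: the first stripe set has parity on the last disk,
    # the second set on the next-to-last disk, and so on, wrapping around.
    def reverse_parity_disk(stripe_set, num_disks):
        return (num_disks - stripe_set) % num_disks

    # Data disks for one stripe set, in logical (LV) order: drive order,
    # skipping the parity disk.
    def data_disks_reverse(stripe_set, num_disks):
        p = reverse_parity_disk(stripe_set, num_disks)
        return [d for d in range(num_disks) if d != p]

    # 4 disks: set 1 -> parity on disk 3 (S1D3), data from S1D0, S1D1, S1D2.
    for s in (1, 2, 3):
        print(s, reverse_parity_disk(s, 4), data_disks_reverse(s, 4))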

In the forward parity scheme, the first parity stripe resides on the first stripe of the first drive (S1D0) in the array. The second parity stripe resides on the second stripe of the second drive (S2D1). The third parity stripe resides on the third stripe of the third drive (S3D2). This continues until the last drive in the array; then the scheme starts over on the first drive. The first data stripe of the volume is the first stripe on the second hard disk (A2 in the image, S1D1 in my notation). It contains the MBR and, most likely, the BPB, depending on the size of the stripe.
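
The forward parity calculation is the mirror image of the sketch above:

    # Forward parity: the first stripe set has parity on the first disk
    # (S1D0), the second on the second disk (S2D1), etc., wrapping around.
    def forward_parity_disk(stripe_set, num_disks):
        return (stripe_set - 1) % num_disks

    # 4 disks: parity lands on S1D0, S2D1, S3D2, S4D3, then wraps to S5D0.
    # The first data stripe (LV1, holding the MBR) is therefore S1D1.
    for s in (1, 2, 3, 4, 5):
        print("S%dD%d" % (s, forward_parity_disk(s, 4)))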

I use the term logical volume to describe the actual look of the volume to the operating system. The RAID array is usually completely hidden from the operating system, which sees it as a regular disk volume, like a single hard drive. I wanted to return to this representation to show how the size of the resulting volume is calculated. The formula above for determining the number of stripes works equally well for determining the size. The size of the RAID's logical volume is equal to the number of drives minus 1, times the size of the smallest drive, or in mathematical terms:

LV = (N - 1) * S

Where LV is the size of the logical volume, N is the number of hard drives and S is the size of the smallest drive.
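
As a quick worked example (the drive size here is made up for illustration): four 500 GB drives in a RAID 5 array yield a 1500 GB logical volume, with one drive's worth of capacity consumed by the distributed parity:

    # Size of the RAID 5 logical volume: LV = (N - 1) * S.
    def raid5_logical_size(num_drives, smallest_drive):
        return (num_drives - 1) * smallest_drive

    print(raid5_logical_size(4, 500))  # 1500 (GB, in this example)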

There are many other parity striping schemes, including proprietary ones. Compaq created one that had me stumped for a while. Compaq, when they took over HP's Surestore division, created a parity mapping scheme that spanned several data stripes. Instead of having one parity stripe for each physical stripe (PV), they used 16 contiguous stripes on a drive for parity. The data stripes remained 128 sectors. Thus, across each run of 16 stripes the layout looks like RAID 4.

Using a standard algorithm to unstripe this set yielded a strange result: it resembled a reverse parity misconfiguration when unstriped with a forward parity algorithm, and a forward parity misconfiguration when unstriped with a reverse parity algorithm. When I looked at the file system with a data recovery tool, many files were visible but more than half wouldn't open. On inspection of the unstriped volume, parity stripes were evident in the places that data should be. After careful review of each drive I could see that the data was striped at 128 sectors but the parity was striped at 2048 sectors (16 x 128). I was able to rewrite my unstripe routine to account for this difference and get the data back.
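
For the curious, here is a hedged sketch of how I ended up modeling that layout; the grouping matches what I saw on the drives, but the rotation direction is my reconstruction, not anything from Compaq documentation:

    # Parity rotates once every 16 stripe sets (a 2048-sector parity stride
    # over 128-sector data stripes) instead of once per set; within each
    # group of 16 the array behaves like RAID 4. Rotation direction assumed.
    PARITY_GROUP = 16

    def grouped_parity_disk(stripe_set, num_disks, group=PARITY_GROUP):
        return ((stripe_set - 1) // group) % num_disks

    # 4 disks: sets 1-16 keep parity on disk 0, sets 17-32 on disk 1, etc.
    for s in (1, 16, 17, 33):
        print(s, grouped_parity_disk(s, 4))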

In my next post I will address data mapping schemes.
