Friday, May 2, 2008

Reviving a Maxattach NAS 4300: Part II

When I started writing about the NAS 4300, I thought it would be relatively short and would fit in one post. It looks like this topic will expand to at least 3 articles and may become 4. Presenting the issues requires more space than I like to devote to a single article. In addition, there were several issues with the free NAS software. I will address them as best I can.

Selecting the Replacement Drives and Other Considerations

To remind the reader, this box has the unusual feature of an onboard SCSI controller. This creates some unique opportunities to improve on the original setup. There is even an external SCSI connector that can be connected to a drive rack like the one shown below it in my image.

The possibilities are:

  • 1. Replace some or all of the drives with SCSI drives to improve performance

  • 2. Replace all the drives with ATAPI hard disk drives

  • 3. Install a 4-port SATA controller and 4 SATA hard disk drives

  • 4. Install a hardware RAID controller and matching drives (SCSI or SATA)

    It is interesting that these options are all available inside what is really a single-purpose device. My choice was governed by what was available in my lab: I found 3 WD2000JB 200 GB drives which, along with the previously mentioned spare 18 GB SCSI drive, fill out the box.

    If I wanted to fill out the box maximizing cost/benefit, I would choose 4 large ATAPI drives, 500 GB to 750 GB. An interesting point of this box is that you could conceivably add a drive rack like the one in the first image of Part 1 and connect it to the external SCSI port. I believe in recycling old equipment.

    Choosing the NAS software and Operating System

    This became a real challenge. CentOS, which is my favorite Linux distro for servers, didn't install on the NAS. After attempting several installs of CentOS and other NAS systems, it turned out that one of the onboard memory modules was bad. The NAS comes with (3) 128 MB memory modules. I removed the bad one and was left with 256 MB, which is quite sufficient for a non-Windows NAS.

    CentOS is a full rebuild of Red Hat Enterprise Linux and is my first choice for servers. It is well maintained and easy to install. My only gripe is that with the release of version 5, they removed the option to install everything. As a data recovery shop we need everything, including file system support for NTFS and Mac HFS. The good news is that CentOS 4 installed without a problem.

    I spent way too much time testing installs for this post. I tested 3 open-source NAS distros: Openfiler, FreeNAS and SME Server. Full reviews of each are beyond the scope of this post, which was never meant to be a full analysis of NAS software. However, I will provide what insight I can into each.

    Openfiler

    This is a full-featured NAS setup and management system. It is geared toward large companies running many servers and allows you to manage all the servers from a single location. It is complicated and the documentation is limited. They sell a version of the "Administrator Guide" but I found it online as a PDF here.

    It installed without a problem. However, it needs a primary domain controller to complete the setup. I abandoned this one after getting frustrated with it.

    FreeNAS

    FreeNAS is based on the FreeBSD OS. It has a tiny footprint of 32 MB but is a fully functional file server managed through a web interface. It was easy to install, taking just a few minutes. However, it has one big drawback: marginal support for multiple drives. I was dissuaded from using it by the number of problems others were having with the software RAID support. If you don't need software RAID, it is a good choice for a simple and easy-to-manage NAS.

    SME Server

    I came across this implementation several years ago. It was recommended by a computer consultant I had done some work for, who gave me a copy to test. It is a modified Linux distro, and it was impressive at that time for its simplicity. In its current incarnation, it is based on an older release of CentOS (4.6) than I am currently using. Its major drawback is its preset configuration: it will automatically set up your drives in RAID 5 if you have 4 similarly sized drives. Otherwise, you have to build the RAID setup yourself using LVM.

    Setting up the NAS

    So, as it happens, you walk up to the NAS with your newly created CD and remember that the NAS doesn't come with a CD drive and that a standard ATAPI CD drive isn't bootable on this box. (Did I forget to mention this?) I searched for an old SCSI CD drive and found several in my collection. Unfortunately, only one worked. You can find them available on the web, but beware that you need one that uses either the old 50-pin ribbon cable connector or the newer 68-pin connector, and you will also need the cable. I used a 68-pin cable with an adapter to a 50-pin external connector and then to a 50-pin internal connector; it took 3 different connectors. I found this internal HP drive sold on a Mac site. Please note that I don't have any connection with any site recommended here; however, for $13 it seems an excellent buy. You will also need an additional 'Y' power splitter for the CD drive. I have them lying around, but they can be obtained from many computer parts suppliers and online.

    My 4300 NAS boots up with an option to load a boot menu by pressing F1. This brings up a fairly complicated menu with options for booting and resetting the NAS to default settings. It also allows you to fail a particular drive. You are required to set the boot option and then exit, forcing a reboot. Please make sure that both the SCSI CD drive and the SCSI hard disk are displayed during the SCSI adapter startup; the Adaptec BIOS will display each drive and its SCSI ID.

    Installation of FreeNAS

    Installation was relatively simple although there were a few moments that could have been clearer. First download the ISO image from the FreeNAS website. Burn the ISO onto a CD using your favorite burning software. I used Nero as it comes free with my burner.

    Now you are ready to insert the FreeNAS boot disk. To review: you have the SCSI CD drive, the SCSI hard disk and the (3) IDE drives properly configured. I set the IDE drives to primary master, primary slave and secondary master.

    When the system reboots after setting the boot option to the CD drive, the FreeNAS CD runs without supervision and boots up a fully functional NAS system. However, our goal is to load the FreeNAS OS/software onto the SCSI hard disk. Here is one of the confusing parts: FreeNAS displays just its splash screen after booting. Hitting the space bar gets you to its menu management system, and the last option is the install to hard disk or flash memory. (Note that the console also allows you to set up the network interfaces and some other options. I will try to cover NAS setup in a future post.) The FreeNAS wiki is an excellent source for installation and configuration.

    During the installation you are given several options. I chose "Full install with data partition", followed by choosing the SCSI drive (da0) for the installation. It will complete the installation in just a few minutes. You will have to reset the BIOS to boot from the SCSI disk. (Note that you can only boot from a hard disk configured as SCSI ID 0.)

    After completing the installation, I discovered that FreeNAS doesn't deal well with software RAID. I found after researching this issue on the web that many users were dissatisfied with FreeNAS because of the software RAID problems. I ended up abandoning FreeNAS as a result.

    I returned to using the CentOS distro. I was able to create a complete NAS with many additional features using CentOS; it just isn't as simple as FreeNAS. I will discuss this further in my next post.


    Thursday, March 13, 2008

    Reviving a MaxAttach NAS 4300: Part 1

    Residual Value

    For this next entry I wanted to veer away from theory and practice to the experimental. From time to time broken equipment gets abandoned in my lab, and I have a collection of very old and outdated servers. I always wonder whether any of them are worth repairing for their residual value. I have repaired several old external USB and FireWire drives, but usually as an adjunct to facilitate the return of data to clients.

    A Fortuitous Find


    While cleaning up the lab I came across an old MaxAttach 4300. It wasn't particularly old; it came to me with 3 failed drives and a shutdown problem, it was out of warranty, and the client didn't want it back. This model came equipped with (4) Maxtor 160 GB drives. Please excuse the mess around the unit in the image above; I was conducting tests and put it in the middle of my work area.

    The MaxAttach series is self-healing. It uses RAID 1 on the system partition and RAID 5 on the data partition, so if any single drive fails it continues to operate and serve data as needed. It will send a warning to an alert IT manager and flash warnings on an attached monitor. However, if a second drive fails, it drops offline and refuses to boot.

    The NAS had the unique problem of having had 3 drives fail, along with an issue of randomly crashing. For the data recovery I was able to force all 4 drives to read and generate reasonable mirrors with only a minimal number of failed sector reads. Each of the 3 failed drives had suffered a minor head crash which left marks on the platters. By judiciously combining the drives and selecting stripes to eliminate the bad areas, I was able to recover all the data.

    During my testing for this project, I discovered the cause of the shutdown problem: a faulty fan that worked intermittently. Replacing the fan solved the random crashing problem. Luckily, the rest of the motherboard was fine. The operating system, however, was completely corrupted, so a fresh install of an OS would be required.

    Inside the box


    The MaxAttach 4300 series was a big upgrade from the 4100 series. The motherboard included 2 network ports, a video port, a serial port, a keyboard port, a mouse port, and, most surprisingly, an onboard Adaptec 7899W SCSI chip.

    This was no ordinary board. There are 3 additional PCI slots available; in the factory configuration one PCI slot was taken by a third network port. The processor was a standard Pentium 4 with a supporting Intel chipset.

    Picking the OS

    I was worried about support for different OSes on this system because of the proprietary chipset; automatic installs would have a difficult time dealing with this system. I debated trying to repair the Win2K OS that came with the NAS, but Windows is a pain to repair; it has too many interconnected processes. There were other issues I had to address, such as how to install the OS without a CD/DVD reader. My criteria for choosing the OS were simplicity and price. I wanted something that would provide the same basic functions that this NAS came with, but without the cost of buying a new server OS. So, I narrowed the choices to Linux or FreeBSD.

    SCSI! A Great Interface.

    I attached a monitor and keyboard to the back of the NAS. Then I attached an ATAPI CD drive to one of the drive connectors and inserted a bootable CD. On power up it showed in the CMOS; however, on the boot screen it was conspicuously missing. On the other hand, I was surprised to see that the modified BIOS included boot entries for both a SCSI CD drive and a SCSI hard disk. This changed my whole perception of how to refurbish the NAS.

    It was now clear to me that I needed to take advantage of the SCSI interface. I located all my adapters, converters and SCSI tools, along with an old SCSI CD drive and an 18 GB SCSI drive. The SCSI CD drive used a 50-pin connector, so I had to find the right setup to convert the 68-pin connection to 50-pin. The SCSI hard drive was easy, as I had a 68-pin version. Now I could select the SCSI CD drive as the boot device. I did make some modifications to the original design, which included 4 200 GB Western Digital drives; I chose this drive model based on availability and price. Since completing the repair of the NAS, I can now source 500 GB drives for under $100. That would give 1 terabyte of capacity. Wow! In my final setup I included 1 designated boot drive, the 18 GB SCSI disk, and 3 data drives of 200 GB each, for a 400 GB server with RAID 5 redundancy.

    Free NAS Operating Systems

    In my search on the web I came across several options. The first was CentOS Linux, which is actually my favorite among the Linux enterprise distros. The advantage of CentOS was that it included all the features I needed. The disadvantage was that it would need a web-based NAS manager; while one is available, it would have taken some work to install and configure. I found 2 other interesting possibilities, Openfiler and FreeNAS.

    I thought that I would be able to complete this article in one post, but I had to break it up into 3 parts or it would have been too long. The next part will address the remaining hardware issues and the final part will address installation of the NAS software.

    Monday, February 25, 2008

    RAID Striping Algorithms: The Missing Link II

    In the prior post I explained how physical stripes including parity are laid out in a RAID 5 configuration. I introduced data mapping as a separate layer from striping.

    To continue... while the physical striping is ordered and easy to understand, the data mapping layer, or the logical stripes, isn't always laid out in a simple contiguous pattern. When I first started writing my unstriping tools, this confounded me no end.


    To understand the differences we need to return to my simplified annotation of the 3 points of view of the stripes. We can use the model at the right to help. The first stripe set represented has reverse parity with the data laid out contiguously from Disk 0 to Disk 2; the parity stripe is on Disk 3. A1 corresponds to S1D0 and PV1 as well as LV1. The second level is well ordered, simply eliminating the parity stripes from the logical volume.

    So, you ask, "Where is all the complication?" The difficulty comes when you reach the parity stripe. In a simple schema you expect the next logical stripe (LV4) to be B1 (S2D0: sectors 128 - 255), which would also correspond to PV5. However, this isn't always what happens. A common striping algorithm puts the next stripe (LV4) on the second stripe of the last drive (S2D3), as if it jumps up to the next level. Then the second stripe on the first drive (S2D0) becomes LV5 and the second stripe on the second drive (S2D1) becomes LV6. At first this seems like an unnecessary complication, but it keeps consecutive data stripes spread evenly across all the drives.
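
    To make the difference concrete, here is a small sketch (my own model of the two mappings described above; controller vendors use their own names for these layouts) that lists the logical stripes of a 4-drive reverse-parity array under each mapping:

# Two data mappings for a reverse-parity RAID 5, in the S#D# notation above.
# "simple":  data fills Disk 0 .. Disk n-1 in order, skipping the parity disk.
# "rotated": data starts on the disk just after the parity disk and wraps,
#            which is the behavior described in the text (LV4 lands on S2D3).
def data_stripes(stripe_set, num_disks, mapping):
    parity = (num_disks - 1 - (stripe_set - 1)) % num_disks   # reverse parity
    if mapping == "simple":
        order = [d for d in range(num_disks) if d != parity]
    else:
        order = [(parity + 1 + i) % num_disks for i in range(num_disks - 1)]
    return ["S%dD%d" % (stripe_set, d) for d in order]

lv = 1
for stripe_set in (1, 2):                        # first two stripe sets, 4 drives
    for location in data_stripes(stripe_set, 4, "rotated"):
        print("LV%d -> %s" % (lv, location))     # LV4 comes out as S2D3
        lv += 1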

    In the Compaq schema from my previous post it was even more complex. Compaq created an array in which parity was grouped 16 stripes at a time: for each group of 16 stripe rows, one drive was designated as the parity drive. Then the parity would switch, in reverse-parity fashion, to the next drive for the following 16 rows. To make things even more complicated, the data stripes still rotated across the other drives. It wasn't just a simple big stripe of 2048 (16 * 128) sectors, which was my first attempt.

    There is no substitute for spending the time and doing a good analysis of the stripes. After making several unstripe attempts on that Compaq recovery, adjusting parameters and testing the outcome, I came close to recovering the data; I would get some files back but not others. On inspection of the rebuilt volume there were "holes" in the file system. The array contained 14 drives and each needed to be inspected. The failed drives had both been successfully recovered, so there was no excuse for missing data. To check continuity, I look for easy-to-read files, usually text files; long MS Word documents also contain sufficient text data. I want to find files that cross stripe boundaries, because when an easily readable file crosses a stripe boundary, I can connect the stripes. This helps me create a map of the array laying out all the stripes in both logical and physical order. Using the map I am able to construct the algorithm for unstriping.

    It would be too long to go into further detail. Unstriping tools are a book unto themselves and go beyond the scope of this blog.

    Next I will take a look at some of the NAS devices we have recovered and their issues.

    Tuesday, February 12, 2008

    RAID Striping Algorithms: The Missing Link

    In the previous post we learned about RAID configurations and how parity works. The last piece of the puzzle is striping algorithms. Striping algorithms, or schemes, refer to how stripes are mapped on the disk array. Being a hands-on person, I like to drill down to the bit level to see how things work; this has helped me solve many problems with RAID systems. Striping schemes are complex layouts involving 2 layers: the physical stripes with parity, and the logical striping where the data is mapped.

    Our first step in working on a RAID recovery is to review each disk's raw data to determine stripe size and orientation.

    Stripe size was mentioned previously, but to review: stripes are collections of sectors, and each stripe contains a number of sectors equal to a power of 2. The smallest stripe size we have seen is 16 sectors (2^4). The most common size is 128 sectors (2^7). In my real-life example below, the stripe size was 2048 sectors (2^11).

    Stripe mapping can be very confusing. In an effort to lessen the confusion, I will talk about 3 different views of the same stripe, the disk view, the physical volume view and the logical volume view. The physical volume and logical volume correspond to the 2 layers mentioned above.

    The physical stripes reside on the disks in order. I will use a simplified annotation scheme based on stripe number and disk number, for example: S1D0 = the first stripe on disk 0 (128 sectors, from sector 0 to 127), S2D0 = the next stripe on disk 0 (sectors 128 to 255), etc. For the physical RAID volume representation I will use the letters 'PV', so the first stripe will be PV1, the second will be PV2 and so on. The last view of the data is the logical view; this is what the operating system sees. I will refer to the stripes of the logical volume as LV1, LV2, etc. For clarity, I will use the term "stripe set" to refer to a parity stripe and its contiguous stripes from each drive (S1D0, S1D1, S1D2, S1D3, etc.).

    In RAID 0 and RAID 0+1, stripes are mapped contiguously in order of the drives, so the mapping for these arrays is simple. In the RAID 0 example at the right, A1, A2, etc. are the same as my volume representation (PV) above; just substitute PV for A. A1 is the first stripe and resides on Disk 0, A2 is the second stripe and resides on the first stripe of Disk 1, A3 (the second stripe on Disk 0) is the third stripe, etc. There are no parity stripes to worry about.
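
    To make the notation concrete, here is a small sketch (my own illustration, not code from any RAID controller) that maps a RAID 0 physical-volume stripe number (PV) to the disk and disk-stripe it lives on in the S#D# notation:

# Map a RAID 0 physical-volume stripe (PV, 1-based) to its disk and
# disk-stripe in the S#D# notation used above. RAID 0 has no parity,
# so the stripes simply rotate across the disks in order.
def raid0_location(pv, num_disks):
    index = pv - 1                     # convert to a 0-based stripe index
    disk = index % num_disks           # which disk holds this stripe
    stripe_on_disk = index // num_disks + 1
    return "S%dD%d" % (stripe_on_disk, disk)

# With 2 disks, as in the image: PV1 -> S1D0, PV2 -> S1D1, PV3 -> S2D0, ...
for pv in range(1, 7):
    print("PV%d -> %s" % (pv, raid0_location(pv, 2)))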

    The most interesting stripe map is found in RAID 5 arrays. RAID 5 was originally defined in the paper "A Case for Redundant Arrays of Inexpensive Disks (RAID)", presented in June 1988 at the SIGMOD conference. The new idea in this paper was for one stripe out of each stripe set to be used for parity in a group of striped drives, while distributing the parity stripes over all the drives to minimize the impact on performance. This concept of data storage and redundancy revolutionized the way people looked at data storage; the concept of distributed data with redundancy has moved out of simple disk management and is used as the basis for distributed data in global networks. Newer schemes for redundancy in large arrays, such as RAID 6, designate 2 stripes per stripe set for holding parity, to address the vulnerability of a RAID array to data loss during the rebuild of a failed drive. We will focus on RAID 5 for the purposes of this post.

    RAID 5 sets aside one physical stripe for parity in each stripe set, and the parity stripe alternates from drive to drive. Where the parity stripe resides is determined by one of 2 standard schemes: reverse parity and forward parity. The image at the left is an example of reverse parity. In reverse parity, the first parity stripe is on the last drive (Disk 3) and is designated A(p) in the image; in my notation this would be represented as S1D3, the first stripe of Disk 3. It is the 4th stripe of the RAID volume (PV4) and isn't included in the logical volume at all, since it is a parity stripe. In reverse parity the first stripe of the first drive (S1D0) is also the first stripe of the RAID volume (PV1) and the first stripe of the logical volume (LV1). This scheme makes it conceptually easy to see the logical volume, as this information is right where it would be on a single drive.

    To continue, the second parity stripe is on Disk 2 and is designated B(p) in the example (in my notation it would be S2D2). The data stripes are simply mapped in order from Disk 0 to Disk 3, skipping the parity stripe. I will discuss different data mapping schemes in a future post. In this scheme, the first drive holds the first stripe of the volume (LV1). It is conceptually easy, as the first drive looks like it really is the first part of the volume: it holds the master boot record (MBR) and often the BIOS parameter block (BPB). The second stripe (LV2) is on the second drive (S1D1), etc. Note that the number of stripes in the logical volume is less than in the physical volume: the logical volume holds only (n - 1) stripes out of every n physical stripes, where n is the number of drives in the array.

    In the forward parity scheme, the first parity stripe resides on the first stripe of the first drive (S1D0) in the array. The second parity stripe resides on the second stripe of the second drive (S2D1), the third parity stripe resides on the third stripe of the third drive (S3D2), and so on until the last drive in the array is reached; then the scheme starts over on the first drive. The first data stripe of the volume is the first stripe on the second hard disk (A2 in the image, S1D1 in my notation). It contains the MBR and, most likely, the BPB, depending on the size of the stripe.
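
    Here is a small sketch of how the parity position can be computed under both schemes (my own illustration of the layouts described above, not any controller's actual code): for stripe set k, reverse parity starts on the last drive and walks backward one drive per set, while forward parity starts on the first drive and walks forward.

# Which disk holds the parity stripe for stripe set k (1-based)?
def parity_disk(k, num_disks, scheme="reverse"):
    if scheme == "reverse":
        # first parity stripe on the last drive, then one drive earlier per set
        return (num_disks - 1 - (k - 1)) % num_disks
    # forward parity: first parity stripe on the first drive, then one later per set
    return (k - 1) % num_disks

# 4-drive example: reverse parity puts set 1's parity on Disk 3 (S1D3) and
# set 2's on Disk 2 (S2D2); forward parity puts set 1's on Disk 0 (S1D0).
for k in range(1, 5):
    print(k, parity_disk(k, 4, "reverse"), parity_disk(k, 4, "forward"))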

    I use the term logical volume to describe the actual look of the volume to the operating system. The RAID array is usually completely hidden from the operating system and is seen by the OS as a regular disk volume, like a single hard drive. I wanted to return to this representation to show how the size of the resulting volume is calculated. The same reasoning used above for the number of stripes works equally well for the size: the size of the RAID's logical volume is equal to the number of drives minus 1, times the size of the smallest drive, or in mathematical terms:

    LV = (N - 1) * S

    Where LV is the size of the logical volume, N is the number of hard drives and S is the size of the smallest drive.
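
    As a quick sanity check, the formula is easy to apply (a throwaway sketch; the drive sizes are just examples):

# Usable size of a RAID 5 logical volume: (number of drives - 1) * smallest drive.
def raid5_volume_size(drive_sizes_gb):
    return (len(drive_sizes_gb) - 1) * min(drive_sizes_gb)

print(raid5_volume_size([200, 200, 200]))        # 400 GB, as in the NAS above
print(raid5_volume_size([36] * 7))               # 216 GB for seven 36 GB drives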

    There are many other parity striping schemes, including proprietary ones. Compaq created one that had me stumped for a while. Compaq, when they took over HP's Surestore division, created a parity mapping scheme that spanned several data stripes. Instead of having one parity stripe for each stripe set, they used 16 contiguous stripes on a drive for parity, while the data stripes remained 128 sectors. Thus, for each of those 16 stripe rows the layout looks like RAID 4.

    Using a standard algorithm to unstripe this set yielded a strange result. It resembled a reverse parity misconfiguration when unstriped with a forward parity algorithm, and a forward parity misconfiguration when unstriped with a reverse parity algorithm. When I looked at the file system with a data recovery tool, many files would be visible but more than half wouldn't open. On inspection of the unstriped volume, parity stripes were evident in the places where data should be. After careful review of each drive I could see that the data was striped at 128 sectors but the parity was striped at 2048 sectors (16 x 128). I was able to rewrite my unstripe routine to account for this difference and get the data back.
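
    Based on that analysis, the parity placement can be modeled roughly like this (a sketch of my interpretation only, not Compaq's actual firmware; the starting drive and rotation direction are assumptions):

# Sketch of the Compaq-style layout described above: data stripes are 128 sectors,
# but parity occupies 2048 sectors (16 stripes) on one drive before moving on, so
# the parity drive only changes every 16 stripe rows.
STRIPE_SECTORS = 128
PARITY_GROUP = 16            # 16 * 128 = 2048 sectors of parity per drive per group

def compaq_parity_disk(row, num_disks):
    group = row // PARITY_GROUP                   # which 16-row group this row is in
    return (num_disks - 1 - group) % num_disks    # rotate in reverse-parity fashion

# Within each 16-row group the parity looks like RAID 4 (one fixed parity drive).
# The data stripes still rotate across the remaining drives, which isn't modeled here.
for row in (0, 15, 16, 31, 32):
    print(row, compaq_parity_disk(row, 14))       # 14-drive array from the recovery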

    In my next post I will address data mapping schemes.

    Thursday, January 31, 2008

    Understanding RAID Systems: RAID 0, 1, 5, 0+1 Arrays

    RAID implemented on motherboards is becoming more common. The SATA standard includes many of the features of the SCSI command set used in most high-end RAID systems. This has permitted simplified RAID to be put on low-priced motherboards selling for under $200; RAID 0, RAID 1 and RAID 5 are on the motherboard in my workstation, which I bought for $98. Due to the ubiquity of RAID systems today, I thought it would be interesting to explain how RAID systems work, highlighting their advantages and disadvantages.

    RAID started as a search for a way to create large storage systems from inexpensive small drives. There were 2 goals: first, to increase the storage available, and second, to increase speed of response by taking advantage of the asynchronous nature of separate drives. I don't want to go too deep into the history in this post. However, I wanted to point out that the first attempt, an array of drives now called RAID 0, wasn't really redundant. This is a trap for many users. RAID 0 is dangerous: it creates a single volume from all the drives, so that if any one of the drives fails, the whole volume is lost. It was quickly set aside as unsuitable for the redundancy goal, yet RAID 0 is much more common than it should be. Users believe that the increase in speed is worth the increased risk of failure. We see many failed RAID 0 arrays.

    RAID is an acronym for Redundant Array of Independent Drives. While there are many different configurations, most are combinations of RAID 0, RAID 1 and RAID 5. All RAID systems use a technique called "striping," which writes data in contiguous blocks in drive order. So, stripe 0 is written to drive 0, stripe 1 is written to drive 1, etc. When the last drive in the array is reached, the write starts over at drive 0. For example, in a simple array with 3 drives, stripe 3 (the 4th stripe) would be written to drive 0, stripe 4 would be written to drive 1, etc. (Note: the actual placement of the data varies according to the type of RAID configuration implemented.)

    RAID 1 is called mirroring. In this configuration there are 2 drives that are written to at the same time. They are exact copies of each other. The first drive in the set is the one that is read but all write operations are performed on both drives. If for any reason the first drive fails to read, the second drive is read and an error condition on the array is asserted. This drops the failed drive from the array. All subsequent write operations are only performed on the good drive. RAID 1 is used where storage space and speed are much less important than redundancy. Often we see RAID 1 on the system drives in servers where the first 2 drives in big arrays are set up as RAID 1 (the system boot) and the rest are RAID 5.

    The second attempt to create a large storage system incorporated features of both RAID 0 and RAID 1. It is called, appropriately, RAID 0+1. In this configuration RAID 0 and RAID 1 are combined such that each drive has a mirror, thus eliminating the risk of catastrophic failure when one drive fails. An even number of drives is required, as each member of the striped set has a mirror drive. This is very wasteful of storage capacity, but it does have the advantages of speed and redundancy. As in RAID 1, only the first set is read but both sets are written to at the same time. There was still felt to be much room for improvement, which led to several more attempts to design a better system.

    Understanding RAID 5 and Striping with Parity

    After testing several different configurations and methods, a configuration was found that would protect against single drive failure while providing significant increases in both speed and capacity at very low cost. This was the birth of RAID 5. The secret to RAID 5 is actually quite simple: it uses a technique called "parity". Parity looks at each bit stored on each of the drives and puts them in columns and rows. To visualize this yourself, think of each drive as occupying a column and each "bit" as occupying a row. A bit is the smallest amount of data stored on digital media, representing a binary number, either 0 or 1. Parity is simply the binary sum of those bits, retaining only the ones column and stripping any carry (in other words, an exclusive OR, or XOR). Below is an example:


    Parity Example:

    Drive: 0 1 2 3 Parity

    Bit 0: 0 1 0 1 0

    Bit 1: 1 1 1 0 1

    Bit 2: 0 0 1 1 0

    Table 1

    Note in the example above that where there is an even number of 1's the parity is 0, and where there is an odd number of 1's the parity is 1. Thus, in computer jargon we say the parity is even when it adds up to 0 and odd when it adds up to 1. This parity rule holds true for any number of drives or bits. In case it wasn't clear, and to refresh our memory from prior posts, all data is stored on drives as little areas of magnetic polarity which, depending on their orientation, represent a binary '0' or '1'. These are grouped together into bytes (8 bits) and sectors (512 bytes) for ease of control and integrity testing. Each byte can be thought of in our table above as bits 1 - 8, and each sector as 512 collections of those 8 bits. On RAID systems sectors are collected into "stripes," usually a power of 2 such as 128 sectors per stripe (the most common size).
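
    Since a binary sum with the carry stripped is just an exclusive OR (XOR), the parity column of Table 1 can be reproduced in a few lines (a small sketch for illustration):

# Reproduce the parity column of Table 1: parity is the XOR of the bits in each
# row, i.e. the binary sum with any carry discarded.
from functools import reduce

rows = [            # bits on drives 0..3 for Bit 0, Bit 1, Bit 2
    [0, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]

for i, bits in enumerate(rows):
    parity = reduce(lambda a, b: a ^ b, bits)
    print("Bit %d: parity = %d" % (i, parity))    # prints 0, 1, 0 as in Table 1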

    I probably digressed into a little too much detail. To return to understanding RAID 5: several drives are grouped together such that one stripe out of each stripe set is defined as the parity stripe, and for each bit on each of the data stripes there is a corresponding parity bit on that stripe. This means that if there are 'n' drives, the real data capacity is equal to (n - 1) * the capacity of each drive. So, if there are seven 36 GB drives in the RAID 5 array, you multiply the capacity (36 GB) by (7 - 1) = 6... (6 * 36) to get 216 GB as the size of the RAID volume. As a side note, the parity stripes are actually spread out over all the drives; it turned out to be much slower to keep parity on a single designated parity drive.

    So the big question is, "How does it continue to work when one drive fails?" It turns out to be a simple mathematical problem that computers are able to perform extremely quickly; the parity checking and result are easily computed in memory within the time it takes to assemble the data packet for use by the system. Just by adding the remaining bits back together along with the parity bit, you reproduce the missing bit. This is the whole crux of the redundancy. To return to our example above...

    Parity Example with Recovery Bit:

    Drive: 0 1 2 3 Parity Recovered Bit

    Bit 0: 0 1 X 1 0 0

    Bit 1: 1 1 X 0 1 1

    Bit 2: 0 0 X 1 0 1

    Table 2


    If you compare the recovered bits to the missing column in Table 1 you will see that they match. As a mental exercise blank out any of the columns in Table 1 and see what the results are.

    This shows how a single parity stripe on a 14 drive array can reproduce the missing data from a failed drive quickly and accurately.
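
    The same XOR trick reproduces the recovered column of Table 2 (again just a sketch; here I pretend drive 2 is the one that failed):

# Recover the missing bits from Table 2: XOR the surviving bits together with the
# parity bit to reproduce what was stored on the failed drive (drive 2 here).
rows = [                        # drives 0..3 plus the parity bit, as in Table 1
    ([0, 1, 0, 1], 0),
    ([1, 1, 1, 0], 1),
    ([0, 0, 1, 1], 0),
]
FAILED = 2                      # the column marked 'X' in Table 2

for i, (bits, parity) in enumerate(rows):
    recovered = parity
    for disk, bit in enumerate(bits):
        if disk != FAILED:      # XOR everything we can still read
            recovered ^= bit
    print("Bit %d: recovered = %d (original was %d)" % (i, recovered, bits[FAILED]))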

    We will continue with our discussion of RAID arrays in the next post.

    Peter

    Friday, January 11, 2008

    Part 3: How Disks Fail

    Now, we have to imagine how all the parts work together. The platters spin at reasonably high rates, anywhere from 3400 rpm (old drives) to 15000 rpm. The heads move at high speeds as well. The controller card maintains the speed of the platters and manages the movements of the heads. All the rest of the components are there to refine the signals and manage the communications.

    The first and most obvious failure is the head crash. A head crash occurs when a head touches the platter and damages the media below. If you recall from the prior post, there are several layers to the media; the last 2 layers are a hardening layer and a lubricating layer. Once the head has abraded through these 2 layers, the data layer is easy to damage. Those 2 layers are there for exactly that reason: to protect the magnetic layer. Heads can touch the media without damaging it. However, the heads are made out of the same materials that ICs are made from, glass and silicon. Glass is one of the hardest substances and can easily scratch most other materials; sandpaper is made from glass. Considering how fast those heads are moving in relation to the platters, it doesn't take much to scratch the media. A weird consequence of the head crash is "stiction". Stiction is just as it sounds: it occurs when the head sticks to the platter. Stiction occurs due to several factors, including magnetic attraction, smoothness, electrostatic attraction and the stickiness of silicon.

    The second common type of failure is electronic. Electronic failure is damage to any of the electrical components. The chain of components includes the control circuits, the reading and writing circuits, and the communication circuits. These circuits can fail as a result of heat, cold solder joints, manufacturing defects, and externally generated failures (surges, physical forces, etc.). It is interesting to note that an electronic failure can easily mimic any of the other failures, even a head crash.

    The last type of failure I will talk about is firmware failure. On each hard disk is an EPROM that holds information and software that manages the functions of the drive. In addition, some manufacturers put part of the information and programming for the hard drive on the platters. This is commonly called the "firmware". We separate this type of failure from electronic failure because it is addressed differently when we perform data recovery.

    Here is how we describe the failures that can occur:

    Head Crash with:

    damaged heads
    media damage
    stiction

    Electronic Failure with:

    arc damage to the media
    damage to the firmware

    Next Post I will talk about software failures.


    Peter

    Tuesday, December 11, 2007

    Part 2: Disk Media Technology

    Image of Future Disk Media
    Media really means material. The inside of a hard drive contains one or more platters or plates that are used to hold the magnetic information. These platters consist of a substrate layer and several thin layers of material designed to hold the data and protect it.

    Each platter has its own set of read/write heads. Sometimes, there is only one read/write head for a given platter but usually there are 2. Each head sits on a metallic arm that functions as a spring and extension. The spring acts to keep the heads as close to the platters as possible. Head technology may be addressed in a future post.

    The platters start with a "substrate" layer such as aluminum or glass. The substrate layer is actually the support for the other materials. The substrate is polished to keep the surface as flat and smooth as possible; the fewer the defects here, the more data you can pack in. Additional smoothing may be accomplished by adding a "thin film" of substrate. Once the surface is prepared, additional "thin film" layers are applied using either electroplating methods (cheaper but almost phased out) or "sputtering" methods. These layers vary in thickness from an amazing 1 nanometer (1 billionth of a meter) to 30 nanometers.

    The first layer is the magnetic layer. It was originally iron oxide paint; rumor has it that the first platters used the same paint used on the Golden Gate Bridge. Modern drives use a cobalt iron mix to create a harder layer with better magnetic characteristics, so it has better durability and is able to hold much more data. There are actually 3 "sub-layers" that create the magnetic layer: 2 layers of magnetic material with a layer of the element ruthenium in between, creating a super magnetic sandwich. The 2 magnetic layers act to reinforce each other's magnetism.

    The second layer is a hardening layer, usually a sputtered carbon layer. While we mainly think of carbon as soft, such as the graphite in lead pencils or coal, diamonds are also made of carbon. When sputtered on in a thin film, most of the carbon is deposited as an amorphous solid (like coal), but a portion of the carbon crystallizes and acquires the hardness of diamond. This layer is only a few nanometers thick.

    The last layer is the lubricating layer. The lubricants used are chemicals with names like Z-dol and Z-tetraol. They form a regular, smooth and very thin layer on the surface of the carbon; these layers can be as little as 1 nanometer thick. Unlike standard oil-based lubricants, these synthetics have unique properties: besides thinness, they are more durable and don't evaporate like oils. However, they are sensitive to heat, which can cause the lubricant to break down or evaporate. Loss of this layer is a major factor in media damage.

    Looking ahead to the future of disks, the latest news is that there may still be as much as 5 times more density available for data. A new technology called exchange coupled composite media is made from alternating layers of fast-changing and slow-changing magnetic materials; the 2 layers act to dampen each other, reducing errors at such small scales. Another interesting approach being investigated is "bit patterned media," which uses etching of the media to create discrete pockets to hold the data.

    Next I will try to address some of the causes and characteristics of hard disk failure.

    Peter