Introduction

The first magnetic disk was introduced by IBM in the 305 RAMAC computer on September 13th, 1956. That first disk drive was the size of two large refrigerators, could hold 4.4 MB, and cost $10,000 per MB. Although hard disk capacity has exploded and the price per GB has dropped spectacularly, the price of a complete enterprise storage solution can still quickly run to tens of thousands of dollars and more.

While building a complete server solution for our own server lab, we quickly found out that choosing the best storage solution for our needs is pretty hard when you are on a tight budget. As usual, the companies active in this market are not helping out. Minor evolutions are called "Breakthrough Architectures", "Affordability" means "not too expensive unless you need more than two drive bays filled", and "Business Intelligence" or "Investment Protection" just means that the marketing people were running out of buzzwords and inspiration. In fact, the storage companies do their best to confuse people by calling both a simple SCSI DAS and a very expensive Fibre Channel SAN "scalable", "flexible", "affordable" and "serviceable".

The seasoned storage veteran quickly weeds out all the fluffy buzzwords, but what if you are relatively new to this market? What if your own experience with storage has been limited to adding disks to your trusty old tower server or the workstations of your colleagues? Welcome to the second part of our server guide! Just like our first guide, our goal is to offer you a no-nonsense introduction to the server room, and in this particular guide we focus on storage performance and the different disk interfaces.


Disk performance?

Before we start discussing the different topologies and technologies in the storage world, it is good to get back to basics. The basic component of 99.9% of the storage technology out there is still the hard disk.

To understand the basic performance of a disk, take a look at what happens when a request is sent to the disk:
  1. The disk controller translates the logical address into a physical address (cylinder, track, and sector). Receiving the request itself is a matter of a few tens of nanoseconds; the command decoding and translation can take up to 1 ms.
  2. The head is moved by the actuator to the correct track. This is called seek time; the average seek time is somewhere between 3.5 and 10 ms.
  3. The rotational motor makes sure that the correct sector is located under the head. This is called rotational latency, and on average it ranges from 5.6 ms (5400 rpm) down to 2 ms (15000 rpm). Rotational latency is thus determined by how fast the spindle motor spins.
  4. The data is then read or written. The time it takes is dependent on how many sectors the disk has to write or read. The rate at which data is accessed is called the media transfer rate (MTR).
  5. If data is read, the data goes into the disk buffer and is transferred over the disk interface to the system.
Media transfer rate (MTR) depends on the rotation speed and on the density with which data is stored. The higher the density, the more data moves under the head in the same amount of time.
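
To make this arithmetic concrete, here is a minimal sketch of how the five steps above add up to a per-request service time. The drive figures (seek time, spindle speed, media transfer rate, controller overhead) are assumptions chosen for the sake of the example, not values from the article.

```python
# A minimal sketch (illustrative assumptions, not measured values) of the
# per-request service time, summing the steps described above.

def request_time_ms(seek_ms, rpm, transfer_kb, media_rate_mb_s, overhead_ms=0.5):
    """Estimate the time in ms to service one request of transfer_kb kilobytes."""
    rotational_latency_ms = 0.5 * 60000.0 / rpm      # on average, half a rotation
    transfer_ms = transfer_kb / 1024.0 / media_rate_mb_s * 1000.0
    return overhead_ms + seek_ms + rotational_latency_ms + transfer_ms

# A hypothetical 15,000 rpm SCSI disk: 3.5 ms seek, ~90 MB/s media transfer rate
print(request_time_ms(seek_ms=3.5, rpm=15000, transfer_kb=4, media_rate_mb_s=90))  # ~6 ms
```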

Which operation will be the most important? That depends on the amount of data you read or write. If you need many small pieces of data scattered all over the disk, seek time and rotational latency are the most important. On the other hand, if you transfer larger, contiguous pieces of data (i.e. data that is located in close proximity on the drive surface), the MTR will be the most important parameter.

To illustrate this, take a look at the table below. It calculates how much time it would take to transfer one block of 4 MB, similar to opening an MP3 song on a desktop PC. We also calculate the time it takes to get 100 different blocks of 4 KB, similar to what would happen if 100 users sent a very simple query to a database server simultaneously. At the end of the table we calculate the total time it takes to perform the requested actions, as well as the sustained transfer rate (STR): the amount of data divided by the total time.



[Table: the fastest SATA and SCSI disks performing a database and a typical desktop workload]
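
A rough version of the calculation behind such a table is sketched below; the drive parameters are assumptions chosen to resemble a fast 15,000 rpm disk rather than the exact figures used in the table.

```python
# Sketch of the table's calculation: one sequential 4 MB read (desktop/MP3)
# versus 100 random 4 KB reads (database). Drive parameters are assumptions.

SEEK_MS, RPM, MTR_MB_S, OVERHEAD_MS = 3.5, 15000, 90.0, 0.5
LATENCY_MS = 0.5 * 60000.0 / RPM                 # average rotational latency

def total_time_ms(requests, kb_per_request):
    transfer_ms = kb_per_request / 1024.0 / MTR_MB_S * 1000.0
    return requests * (OVERHEAD_MS + SEEK_MS + LATENCY_MS + transfer_ms)

for label, requests, kb in [("desktop, 1 x 4 MB", 1, 4096),
                            ("database, 100 x 4 KB", 100, 4)]:
    t_ms = total_time_ms(requests, kb)
    str_mb_s = (requests * kb / 1024.0) / (t_ms / 1000.0)
    print(f"{label}: {t_ms:6.1f} ms total, STR = {str_mb_s:5.2f} MB/s")
```

With these assumed numbers the sequential read finishes in about 50 ms at roughly 80 MB/s, while the 100 random reads take around 600 ms and sustain well under 1 MB/s, which is the pattern the table illustrates.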

Although it's transferring only one tenth the amount of data, the database access takes almost 15 times as long. In the case of our database access, seek time and rotational latency determine 90-95% of the disk performance, while transfer time accounts for only about 1%. If we increased the size of the blocks we need to 16 KB, little would change: the transfer time would quadruple, but the total time would hardly increase. However, if we increase the number of blocks, or more generally the number of "I/O operations" that we access, the total time necessary to complete the action scales almost linearly: twice as many I/O operations will take twice as long.

In our "desktop MP3" example, transfer time is good for 85% of the time: MB/s is the most important metric. File and FTP servers are somewhere between the desktop and database server examples: on average the number of KB per I/O operation is much higher than a transactional database, but I/O operations are also requested simultaneously.

So basically, there are two ways to measure storage performance:
  1. In MB/s
  2. In I/O operations per second
Notice that in the worst case, database storage server performance can be less than 1 MB/s. Of course, smart techniques such as Native Command Queuing, read ahead buffers, Out of Order Data delivery, and smart caches can lower the impact of concurrent accesses. However, it is not uncommon for database applications to lower the STR (Sustained Transfer Rate) of very fast drives to a few MB per second.
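
As a back-of-the-envelope conversion between the two metrics (the numbers below are assumptions, not benchmark results), multiplying the achievable I/O operations per second by the block size gives the equivalent throughput:

```python
# Converting random I/O performance into throughput (illustrative numbers).
# A drive that needs ~6 ms per random request manages roughly 165 IOPS.
service_time_ms = 6.0                     # assumed per-request service time
iops = 1000.0 / service_time_ms           # ~165 operations per second
block_kb = 4                              # typical small database I/O
throughput_mb_s = iops * block_kb / 1024.0
print(f"{iops:.0f} IOPS x {block_kb} KB = {throughput_mb_s:.2f} MB/s")  # ~0.65 MB/s
```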

Enterprise Disks: all about SCSI
Comments

  • dickrpm - Saturday, October 21, 2006 - link

    I have a big pile of "Storage Servers" in my basement that function as an audio, video, and data server. I have used PATA, SATA and SCSI 320 (in that order) to achieve the necessary reliability. Put another way, when I started using enterprise-class hardware, I quit having to worry (as much) about data loss.
  • ATWindsor - Friday, October 20, 2006 - link

    What happens if you encounter an unrecoverable read error when you rebuild a RAID 5 array (after a disk has failed)? Is the whole array unusable, or do you only lose the file using the sector which can't be read?

    AtW
  • nah - Friday, October 20, 2006 - link

    actually the cost of the original RAMAC was USD 35,000 per year to lease---IBM did not sell them outright in those days, and the size was roughly 4.9 MB.
  • yyrkoon - Friday, October 20, 2006 - link

    It's nice to see that someone finally did an article with information about SATA port multipliers (these devices have been around for about two years, and no one seems to know about them), but since I have no direct hands-on experience, I feel the part of the article covering them was a bit skimpy.

    Also, while I see you're talking about iSCSI (I think some call it SCSI over IP?) in the comments section here, I'm a bit interested as to why I didn't see it mentioned in the article.

    I plan on getting my own SATA port multiplier eventually, and I have a pretty good idea how well they would work under the right circumstances, with the right hardware, but since I do not work in a data center (or some such profession), the likelihood of me getting my hands on a SAS, iSCSI, FC, etc. rack/system is low. What I'm trying to say here is that I think you guys could go a good bit deeper into detail with each technology, and let each reader decide if the cost of product X is worth it for whatever they want to do. In the last several months (close to two years) I've been doing a lot of research in this area, and I still find some of these technologies a bit confusing. iSCSI, for example: the only documentation I could find on the subject (around 6 months ago) was some sort of technical document, written by Microsoft, that I had a very hard time digesting. Since then, I've only seen (going from memory) white papers from companies like HP pushing their own specific products, and I don't care about their product in particular, I care about the technology, and COULD be interested in building my own 'system' some day.

    What I am about to say next I do not mean as an insult in ANY shape or form; however, I think that when you guys write articles on such subjects, you NEED to go into more detail. Motherboards are one thing, hard drives, whatever, but when you get into technology that isn't very common (outside of enterprise solutions) such as SAS, iSCSI, etc., I think you're actually doing your readers a disservice by showing a flow chart or two and briefly describing the technology. NAS, SAN, etc. have all been done to death, but I think if you look around, you will find that a good article on AT LEAST iSCSI, how it works, and how to implement it, would be very hard to find (without buying a prebuilt solution from a company). Anyhow (again), I think I've beaten this horse to death, you get my drift by now I'm sure ;)
  • photoguy99 - Thursday, October 19, 2006 - link

    Great article, well worth it for AT to have this content.

    Can't wait for part 2 -
  • ceefka - Thursday, October 19, 2006 - link

    Can we also expect a breakdown and benchmarking on network storage solutions for the home and small office?
  • LoneWolf15 - Thursday, October 19, 2006 - link

    Great article. It addressed points that I not only didn't think of, but that were far more useful to me than just baseline performance.

    It seems to me that for the moderately-sized business (or "enterprise-on-a-budget" role, such as K-12 education), enterprise-level SATA such as Caviar RE drives in RAID-5, plus solid server backups (which should be done anyway), makes more sense cost-wise than SAS. Sure, the risk of error is a bit higher, but that is why no systems/network administrator in their right mind would rely on RAID-5 alone to keep data secure.

    I hope that Anandtech will do a similarly comprehensive article about backup for large storage someday, including multiple methods and software options. All this storage is great, but once you have it, data integrity (especially now that server backups can be hundreds of gigabytes or more) cannot be stressed enough.

    P.S. It's one of the reasons I can't wait until we have enough storage that I can enable Shadow Copy on our Win2k3 boxes. Just one more method on top of the existing structure.
  • Olaf van der Spek - Thursday, October 19, 2006 - link

    quote: "the command decoding and translating can take up to 1 ms."

    Why does this simple (non-mechanical) operation take so long?
  • Fantec - Thursday, October 19, 2006 - link

    Working for an ISP, we started to use PATA/SATA a few years ago. We still use SCSI, FC and PATA/SATA depending on our needs. SATA is the first choice when we may have redundant data (and, in this case, the disks are set up in JBOD (standalone) for performance reasons). At the other extreme, FC is only used for NFS filers (mostly used for mail storage, where the average file size is a few KB).
    Between the two, we look at the needed storage size and IO load to make up our mind. Even for huge IO loads, SATA behaves quite well as long as the requested block size is big enough.

    Nonetheless, something bugs me in your article about the Seagate figures. I manage a cluster of servers whose total throughput is around 110 TB a day (using around 2400 SATA disks). With Seagate's figure (an Unrecoverable Error every 12.5 terabytes written or read), I would get 10 Unrecoverable Errors every day. Which, as far as I know, is far from what I actually see (a very few per week/month).
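
For reference, the expected error rate implied by the two figures quoted in the comment above works out as follows (a quick check of the arithmetic, using only the numbers the commenter cites):

```python
# Expected unrecoverable read errors per day from the figures quoted above.
daily_volume_tb = 110.0                # cluster throughput reported in the comment
tb_per_error = 12.5                    # Seagate figure cited in the article
print(daily_volume_tb / tb_per_error)  # ~8.8 expected errors per day
```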
