The Impact of NCQ on Multitasking Performance

Just under a year ago, we reviewed Maxtor's MaXLine III, a SATA hard drive that boasted two very important features: a 16MB buffer and support for Native Command Queuing (NCQ).  The 16MB buffer was interesting as it was the first time that we had seen a desktop SATA drive with such a large buffer, but what truly intrigued us was the drive's support for NCQ.  The explanation of NCQ below was from our MaXLine III review from June of 2004:

Hard drives are the slowest components in your PC, mostly because they are the only parts that still rely heavily on mechanics for normal operation. That being said, there are definite ways of improving disk performance by optimizing the electronics that augment the mechanical functions of a hard drive.

Hard drives work like this: they receive read/write requests from the chipset's I/O controller (e.g. Intel's ICH6), which are buffered by the disk's on-board memory and carried out by the disk's on-board controller, which moves the heads to the correct platter and to the right place on that platter to read or write the necessary data. The hard drive is, in fact, a very obedient device; it does exactly what it's told to do, which is a bit unfortunate. Here's why:

It is the hard drive - not the chipset's controller, not the CPU and not the OS - that knows where all of the data is laid out across its various platters. So, when it receives requests for data, the requests are not always organized in the best manner for the hard disk to read them; they are organized in the order in which they are dispatched by the chipset's I/O controller.

Native Command Queuing is a technology that allows the hard drive to dynamically reorder its pending requests according to where the requested data sits on the platters. It's like this - say you had to go to the grocery store, the drug store next to it, the mall, and then back to the grocery store for something else. Doing the errands in that order wouldn't make sense; you'd be wasting time and money. You would naturally reorder them to grocery store, grocery store, drug store and then the mall in order to improve efficiency. Native Command Queuing does just that for disk accesses.
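To make the reordering idea concrete, here is a minimal sketch - not actual drive firmware, just an illustration with made-up logical block addresses (LBAs) and a simple seek-distance cost model - comparing first-come, first-served servicing with picking the nearest pending request each time:

    # Minimal illustration of NCQ-style command reordering (hypothetical, not drive firmware).
    # The drive services the nearest pending request instead of strictly following arrival order.

    def service_in_arrival_order(head, requests):
        """Total head travel (in LBA units) when requests are handled first-come, first-served."""
        total = 0
        for lba in requests:
            total += abs(lba - head)
            head = lba
        return total

    def service_with_reordering(head, requests):
        """Total head travel when the drive always picks the closest pending request -
        roughly what a command queue allows it to do."""
        pending = list(requests)
        total = 0
        while pending:
            nearest = min(pending, key=lambda lba: abs(lba - head))
            total += abs(nearest - head)
            head = nearest
            pending.remove(nearest)
        return total

    if __name__ == "__main__":
        # Two applications issuing interleaved, mostly-sequential requests - the kind of
        # access pattern that multitasking creates (LBAs are made up for illustration).
        queue = [5000, 120000, 5200, 120400, 5400, 120800]
        print("FIFO head travel:     ", service_in_arrival_order(0, queue))
        print("Reordered head travel:", service_with_reordering(0, queue))

On that toy queue, servicing in arrival order bounces the head back and forth between the two regions, while the reordered version finishes one region before moving to the other. Real NCQ firmware also accounts for rotational position and can queue up to 32 outstanding commands, but the principle is the same: the drive, which knows the physical layout, gets to choose the most efficient service order.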

For most desktop applications, NCQ isn't necessary. Desktop applications are mostly sequential in nature and exhibit a high degree of spatial locality. What this means is that most disk accesses on desktop systems occur around the same basic areas of a platter. Applications store all of their data around the same location on your disk, as do games, so loading either one doesn't require many random accesses across the platter - reducing the need for NCQ. Instead, most desktop applications benefit much more from higher platter densities (more data stored in the same physical area on a platter) and larger buffers that improve sequential read/write performance. This is why Western Digital's 10,000 RPM Raptor can barely outperform the best 7200 RPM drives today.

Times are changing, however, and while a single desktop application may be sequential in nature, running two different desktop applications simultaneously changes the dynamics considerably. With Hyper-Threading and multi-core processors being the way of the future, we can expect desktop hard disk access patterns to begin to resemble those of servers - with more random accesses. It is in these true multitasking and multithreading environments that technologies such as NCQ can improve performance.

In the Maxtor MaXLine III review, we looked at NCQ as a feature that truly came to life when working in multitasking scenarios. Unfortunately, finding a benchmark to support this theory was difficult. In fact, only one benchmark (the first Multitasking Business Winstone 2004 test) actually showed a significant performance improvement due to NCQ.

After recovering from Part I and realizing that my nForce4 Intel Edition platform had died, I was hard at work on Part II of the dual core story. For the most part, when someone like AMD, Intel, ATI or NVIDIA launches a new part, they just send that particular product. In the event that the new product requires another one (such as a new motherboard/chipset) to work properly, they will sometimes send both and maybe even throw in some memory if that's also a rarer item. Every now and then, one of these companies will decide to actually build a complete system and ship that for review. For us, that usually means that we get a much larger box and have to spend a little more time pulling the motherboard out of the case so that we can test it on one of our test benches instead - obviously, we never test a pre-configured system supplied by any manufacturer. This time around, both Intel and NVIDIA sent out fully configured systems for their separate reviews - leaving two huge boxes blocking our front door.

When dissecting the Intel system, I noticed something - it used a SATA Seagate Barracuda 7200.7 with NCQ support. Our normal testbed hard drive is the 7200.7 Plus, basically the same drive without NCQ support. I decided to make Part I's system configuration as real-world as possible, so I used that one 7200.7 NCQ drive for all of the tests in Monday's review. Normally, only being able to run one system at a time would be a limitation, but given how much work I had to put into creating the tests, I wasn't going to be able to run multiple systems at the same time while actually using each machine anyway, so this wasn't a major issue. The results turned out as you saw in the first article and I went on with working on Part II.

For Part II, I was planning to create a couple more benchmarks, so I wasn't expecting to be able to compare things directly to Part I, and I switched back to our normal testbed HDD, the 7200.7 Plus. Using our normal testbed HDD, I was able to set up more systems in parallel (since I had more of those drives) and thus, testing went a lot quicker. I finished all of the normal single-threaded application benchmarks around 3AM (yes, including the gaming tests) and started installing all of the programs for my multitasking scenarios.

When I went to run the first multitasking scenario, I noticed something was very off - the DVD Shrink times were almost twice what they were in Monday's review. I spent more time working with the systems and uncovered that Firefox and iTunes weren't configured identically to the systems in Monday's review, so I fixed those problems and re-ran. Even after re-running, something still wasn't right - the performance was still a lot slower. It was fine in all other applications and tests, just not this one. I even ran the second multitasking scenario from Monday's review and the performance was dead on - something was definitely up. Then it hit me...NCQ.

I ghosted my non-NCQ drive to the NCQ drive and re-ran the test. Yep, same results as Monday - the difference was NCQ. Johan had been pushing me to use a Raptor in the tests to see how much of an impact disk performance had on them; the Raptor sped things up a bit, but not nearly as much as the NCQ-enabled 7200.7 did. How much of a performance difference? The following numbers use the same configuration as Monday's article, with the only variable being the HDD. I tested on the Athlon 64 FX-55 system:

Seagate Barracuda 7200.7 NCQ - 25.2 minutes
Seagate Barracuda 7200.7 no NCQ - 33.6 minutes
Western Digital Raptor WD740 - 30.9 minutes

The performance impact of NCQ is huge - going from 33.6 minutes down to 25.2 minutes is a 25% reduction in completion time. But once again, just like in the first NCQ article, this is the only test that I can get to be impacted by NCQ - the other Multitasking Scenarios remain unchanged. Even though these numbers were run on the AMD system, I managed to get similar results out of the Intel platform, although, for whatever reason, the Intel benchmarks weren't nearly as consistent as the AMD benchmarks. Given that we're dealing with different drive controllers and vastly different platforms, there may be many explanations for that.

At first, I thought that this multitasking scenario was the only one where NCQ made an impact, but as you'll find out later on in this article, that's not exactly true.

Comments

  • segagenesis - Wednesday, April 6, 2005 - link

    Since I mostly play games, I'll stick to buying the AMD64 3500+. Thanks. My definition of multi-tasking is using a whole other computer ;)

    The Pentium D seems pretty decent at multitasking as you would define it - running two things at once - but I rarely do that sort of thing, since it's kind of dumb to encode a DVD in the background while playing a game. Or does encoding a DVD really interfere with browsing the web? I don't know... that, and the heat output - as if it wasn't bad enough already - is now worse.
  • Woodchuck2000 - Wednesday, April 6, 2005 - link

    #23 - I'm assuming that a dual core A64 at 2.2GHz will blow a Pentium D out of the water at any of the launch frequencies! The Prescott core isn't really designed for multi-core operation, and needs some kind of arbitration logic and some funky-memory-controlling-thingy to work. As a result, the performance improvements in multi-threaded applications aren't anything like the theoretical extra 100% another core could bring. With A64 being designed for multi-core operation I'd expect the increase in performance to be nearer 85%.

    As regards the performance gap between the P4 630 and A64 3500+, the majority of the benchmarks shown here are designed specifically to show performance improvements in multi-core processors. The 630 is hyper-threaded and therefore logically multi-cored, if not physically so. As such the 630 will have artificially high performance compared with the 3500+ - in most single-application benchmarks, the AMD chip would thrash both Intel chips.

    Is there any chance of adding benches for the 630 with HT disabled (or at least giving us an idea of the performance)? We've got a vanilla A64 versus an HT 630 and a dual-core system. It'd be good to see how a single core performs for reference.
  • Anand Lal Shimpi - Wednesday, April 6, 2005 - link

    Jeff7181

    I really can't say more, but you are barking up the wrong tree with those assumptions :)

    AMD's dual core will be quite impressive, even more so than Intel's. Don't look at performance as the only vector to measure though...

    marcusgarcia

    We did look at HT performance when it came out, but the problem is that HT doesn't improve performance in all cases. Look at the Gaming Multitasking Scenario 2 tests: HT reduced performance significantly - most likely because DVD Shrink and Doom 3/Splinter Cell were both contending for the same floating point resources. Dual core solves this problem by having two complete sets of execution units, so there's no worry about contention between threads for shared resources.

    As far as Half Life 2 goes, it is still single threaded so its performance characteristics would be no different than what you see here.

    mlittl3

    I've been looking into running VoIP or some sort of voice chat program in the background; the problem surfaces in trying to put together a reliable, repeatable workload. Dual core will most definitely help there, but how much - I do not know.

    I haven't given up yet :)

    BruceDickenson

    Glad to have you on-board and thanks for the kind words :)

    Woodchuck2000

    The new dual core chips are still LGA-775, but they do require a new motherboard (unlike AMD's solution which just requires a new BIOS). Currently Intel's 945 and 955 chipsets will support dual core, and tomorrow I should have a nForce4 SLI Intel Edition board that will support dual core as well. The new NVIDIA chipset does support dual core, but it's up to motherboard makers to implement support for it in their designs.

    Check with the motherboard maker to make sure that dual core is explicitly supported by the board; it should say so somewhere in the manual or on the box.

    Take care,
    Anand
  • Jeff7181 - Wednesday, April 6, 2005 - link

    #22... Anand said a 2.2 GHz Athlon 64 won't compete with the 2.8 GHz Pentium D. That either means a 2.2 GHz dual core Athlon 64 will have lackluster performance, or it will be AMD's new enthusiast line like the FX is right now, which means it would be competing with the Extreme Edition chips, not the regular line.

    I guess there's a 3rd possibility. He was referring to dual core Opterons which obviously won't compete with the Pentium D any more than the Opteron competes with the Pentium 4 right now.
  • Woodchuck2000 - Wednesday, April 6, 2005 - link

    Just out of interest, does anyone know what socket the new cores use? Will the new nVidia chipset support the new cores (it was hinted at briefly, but not stated explicitly...)?

    #19 - What's your source for those assertions? I've heard reports that AMD have got samples running at well over 2GHz, and since the K8 architecture is naturally better suited to multiple cores, I'd have expected blistering performance. BTW, does anyone know if the AMD cores will be based on the new Venice rev? Is SSE3 a given?
  • BruceDickenson - Wednesday, April 6, 2005 - link

    Hey all, long time reader, this is my first "post"/comment...

    Just had to say this is one of the most interesting articles I've read in a long time. I loved the NCQ tangent; reading how Lal Shimpi analyzed the anomaly in his testing almost felt like being part of a conversation.

    Loved it! Thanks AT!
  • marcusgarcia - Wednesday, April 6, 2005 - link

    14# again.

    Forgot to say in the last post, my rant is about HT, not dual cores.

    I know two cores won't make THAT much of a difference on these trivial things (who needs another 2.8GHz for simple stuff?)... but HT is benefiting GREATLY from it, yet no one mentioned it and no one even tried this sort of test when HT was launched.

    When you see the 3.0 HT doing better than an AMD 3500+ (supposedly 500 points faster), you gotta ask how badly it would beat the AMD 64 3000+, which happens to cost almost the same as the P4 3.0GHz... which happened to destroy the much faster AMD in the test.

    That pretty much sucks and leaves us with the impression that people either:

    a - wanted to benefit AMD
    or
    b - were too ingenuous to think of these tests when doing the HT tests (which can't be true because I always wanted them)
  • Jeff7181 - Wednesday, April 6, 2005 - link

    Errr... correction...

    1.) The dual core 2.2 GHz Athlon-64's will be less than impressive and won't even perform in the same class as a Pentium D @ 2.8 GHz.

    2.) The mainstream Athlon-64 dual core chips will be running at much less than 2.2 GHz, and the 2.2 GHz dual cores will be the FX line, which compete with the Extreme Edition Pentiums.
  • marcusgarcia - Wednesday, April 6, 2005 - link

    14#

    Completely wrong.

    1º: Outlook checks 8 POP accounts for mail and applies its rules to them every minute or so.
    2º: MSN with webcam can eat quite some CPU, especially because I play in the dark with the "low light filter" turned on, which happens to eat quite some CPU.
    3º: For every file opened/closed, both AVG and the MS anti-spyware are going to check whether it's malicious and whether the action is allowed.

    When I close everything and run 3DMark01, I get around 300-600 more points than my 12,200-point score.

    PS: don't forget IE, which is usually open here or on Tom's Hardware (or both, and some more), which happen to have a lot of those huge Flash banners.

    I think that DOES make *a lot* of difference.

    Add to that the fact that many people use Skype while gaming, mainly FPS and RTS games, which can make all the difference.
  • Jeff7181 - Wednesday, April 6, 2005 - link

    "Let's just say that the dual core Athlon 64 running at 2.2GHz won't be compared to a dual core Pentium D running at 2.8GHz."

    So you leave two possibilities.

    1.) The dual core 2.2 GHz Athlon-64's will be less than impressive and won't even perform in the same class as a Pentium D @ 2.8 GHz, but rather dual core Extreme Edition chips.

    2.) The desktop dual core Athlon-64's will be running at much less than 2.2 GHz.
