The Impact of NCQ on Multitasking Performance

Just under a year ago, we reviewed Maxtor's MaXLine III, a SATA hard drive that boasted two very important features: a 16MB buffer and support for Native Command Queuing (NCQ). The 16MB buffer was interesting, as it was the first time that we had seen a desktop SATA drive with such a large buffer, but what truly intrigued us was the drive's support for NCQ. The explanation of NCQ below is taken from our MaXLine III review from June of 2004:

Hard drives are the slowest things in your PC, mostly because they are the only component that still relies heavily on mechanics for its normal operation. That being said, there are definite ways of improving disk performance by optimizing the electronics that augment the mechanical functions of a hard drive.

Hard drives work like this: they receive read/write requests from the chipset's I/O controller (e.g. Intel's ICH6) that are then buffered by the disk's on-board memory and carried out by the disk's on-board controller, making the heads move to the correct platter and the right place on the platter to read or write the necessary data. The hard drive is, in fact, a very obedient device; it does exactly what it's told to do, which is a bit unfortunate. Here's why:

It is the hard drive, not the chipset's controller, not the CPU and not the OS that knows where all of the data is laid out across its various platters. So, when it receives requests for data, the requests are not always organized in the best manner for the hard disk to read them. They are organized in the order in which they are dispatched by the chipset's I/O controller.

Native Command Queuing is a technology that allows the hard drive to dynamically reorder its requests according to the location of the requested data on a platter. It's like this: say you had to go to the grocery store and the drug store next to it, the mall and then back to the grocery store for something else. Doing it in that order would not make sense; you'd be wasting time and money. You would naturally re-order your errands to grocery store, grocery store, drug store and then the mall in order to improve efficiency. Native Command Queuing does just that for disk accesses.
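To make the errand analogy concrete, here is a minimal sketch of request reordering. It is not how any drive's firmware actually works: real NCQ scheduling also accounts for rotational position, and the one-dimensional positions and the greedy nearest-first rule below are assumptions chosen purely for illustration.

```python
def total_seek_distance(start, requests):
    """Sum of head movements when servicing requests in the given order."""
    position = start
    distance = 0
    for target in requests:
        distance += abs(target - position)
        position = target
    return distance


def reorder_nearest_first(start, requests):
    """Greedy reordering: always service the pending request closest to the head."""
    pending = list(requests)
    position = start
    ordered = []
    while pending:
        nearest = min(pending, key=lambda target: abs(target - position))
        pending.remove(nearest)
        ordered.append(nearest)
        position = nearest
    return ordered


# Hypothetical positions: grocery store, drug store, mall, grocery store again.
arrival_order = [100, 110, 900, 105]
reordered = reorder_nearest_first(0, arrival_order)

print(reordered)                                 # [100, 105, 110, 900]
print(total_seek_distance(0, arrival_order))     # 1695
print(total_seek_distance(0, reordered))         # 900
```

With these made-up positions, servicing the requests in arrival order moves the head almost twice as far as the reordered sequence, which is the same saving as doing both grocery stops before heading to the mall.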

For most desktop applications, NCQ isn't necessary. Desktop applications are mostly sequential in nature and exhibit a high degree of spatial locality. What this means is that most disk accesses for desktop systems occur around the same basic areas on a platter. Applications store all of their data around the same location on your disk as do games, so loading either one doesn't require many random accesses across the platter - reducing the need for NCQ. Instead, we see that most desktop applications benefit much more from higher platter densities (more data stored in the same physical area on a platter) and larger buffers to improve sequential read/write performance. This is the reason why Western Digital's 10,000 RPM Raptor can barely outperform the best 7200 RPM drives today.

Times are changing, however, and while a single desktop application may be sequential in nature, running two different desktop applications simultaneously changes the dynamics considerably. With Hyper Threading and multi-core processors being the things of the future, we can expect desktop hard disk access patterns to begin to slightly resemble those of servers - with more random accesses. It is with these true multitasking and multithreading environments that technologies such as NCQ can improve performance.
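The point about two sequential applications producing a random-looking access pattern can be sketched in a few lines. The block numbers below are hypothetical, and head travel is modeled as a simple distance between block numbers, which ignores rotational latency and everything else a real drive has to deal with.

```python
def seek_distance(blocks, start=0):
    """Total head travel, measured in blocks, for servicing reads in order."""
    position = start
    total = 0
    for block in blocks:
        total += abs(block - position)
        position = block
    return total


# Two hypothetical applications, each reading 8 sequential blocks
# from its own region of the disk.
app_a = list(range(1000, 1008))
app_b = list(range(50000, 50008))

# Run one after the other: two short sequential bursts.
one_after_another = app_a + app_b

# Run simultaneously: requests arrive interleaved (A, B, A, B, ...).
interleaved = [block for pair in zip(app_a, app_b) for block in pair]

print(seek_distance(one_after_another))  # 50007
print(seek_distance(interleaved))        # 735993
```

Each application on its own is perfectly sequential, yet interleaving them forces the head to swing between the two regions on nearly every request. That is the kind of pattern where a drive that can reorder its queue has something to gain.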

In the Maxtor MaXLine III review, we looked at NCQ as a feature that truly came to life when working in multitasking scenarios. Unfortunately, finding a benchmark to support this theory was difficult. In fact, only one benchmark (the first Multitasking Business Winstone 2004 test) actually showed a significant performance improvement due to NCQ.

After recovering from Part I and realizing that my nForce4 Intel Edition platform had died, I was hard at work on Part II of the dual core story. For the most part, when someone like AMD, Intel, ATI or NVIDIA launches a new part, they just send that particular product. In the event that the new product requires another one (such as a new motherboard/chipset) to work properly, they will sometimes send both and maybe even throw in some memory if that's also a rarer item. Every now and then, one of these companies will decide to actually build a complete system and ship that for review. For us, that usually means that we get a much larger box and we have to spend a little more time pulling the motherboard out of the case so we can test it out on one of our test benches instead - obviously, we never test a pre-configured system supplied by any manufacturer. This time around, both Intel and NVIDIA sent out fully configured systems for their separate reviews - two great huge boxes blocking our front door now.

When dissecting the Intel system, I noticed something - it used a SATA Seagate Barracuda 7200.7 with NCQ support. Our normal testbed hard drive is a 7200.7 Plus, basically the same drive without NCQ support. I decided to make Part I's system configuration as real world as possible and I used the 7200.7 with NCQ support. So, I used that one 7200.7 NCQ drive for all of the tests for Monday's review. Normally, only being able to run one system at a time would be a limitation. But given how much work I had to put into creating the tests, I wasn't going to be able to run multiple things at the same time while actually using each machine, so this wasn't a major issue. The results turned out as you saw in the first article and I went on with working on Part II.

For Part II, I was planning to create a couple more benchmarks, so I wasn't expecting to be able to compare things directly to Part I. I switched back to our normal testbed HDD, the 7200.7 Plus. Using our normal testbed HDD, I was able to set up more systems in parallel (since I had more HDDs) and thus, testing went a lot quicker. I finished all of the normal single threaded application benchmarks around 3AM (yes, including gaming tests) and I started installing all of the programs for my multitasking scenarios.

When I went to run the first multitasking scenario, I noticed something was very off - the DVD Shrink times were almost twice what they were in Monday's review. I spent more time working with the systems and uncovered that Firefox and iTunes weren't configured identically to the systems in Monday's review, so I fixed those problems and re-ran. Even after re-running, something still wasn't right - the performance was still a lot slower. It was fine in all other applications and tests, just not this one. I even ran the second multitasking scenario from Monday's review and the performance was dead on - something was definitely up. Then it hit me...NCQ.

I ghosted my non-NCQ drive to the NCQ drive and re-ran the test. Yep, same results as Monday. The difference was NCQ! Johan had been pushing me to use a Raptor in the tests to see how much of an impact disk performance had on them, and the Raptor sped things up a bit, but not nearly as much as using the NCQ-enabled 7200.7 did. How much of a performance difference? The following numbers use the same configuration from Monday's article, with the only variable being the HDD. I tested on the Athlon 64 FX-55 system:

Seagate Barracuda 7200.7 NCQ - 25.2 minutes
Seagate Barracuda 7200.7 no NCQ - 33.6 minutes
Western Digital Raptor WD740 - 30.9 minutes
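As a quick sanity check on those three numbers, the relative differences can be worked out as follows. This is only arithmetic on the times listed above; the dictionary keys and the function name are our own labels.

```python
# Times in minutes, copied from the results listed above.
times = {
    "7200.7 NCQ": 25.2,
    "7200.7 no NCQ": 33.6,
    "Raptor WD740": 30.9,
}


def percent_less_time(baseline, candidate):
    """How much less time the candidate needs, as a percentage of the baseline."""
    return (baseline - candidate) / baseline * 100


baseline = times["7200.7 no NCQ"]
ncq_saving = percent_less_time(baseline, times["7200.7 NCQ"])
raptor_saving = percent_less_time(baseline, times["Raptor WD740"])

print(f"NCQ drive: {ncq_saving:.1f}% less time than the non-NCQ 7200.7")
print(f"Raptor:    {raptor_saving:.1f}% less time than the non-NCQ 7200.7")
```

By these figures, the NCQ-enabled 7200.7 finishes in about 25% less time than its non-NCQ sibling, while the Raptor saves roughly 8%.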

The performance impact of NCQ is huge. But once again, just like in the first NCQ article, this is the only test that I can get to be impacted by NCQ - the other Multitasking Scenarios remain unchanged. Even though these numbers were run on the AMD system, I managed to get similar results out of the Intel platform, although, for whatever reason, the Intel benchmarks weren't nearly as consistent as the AMD benchmarks. Given that we're dealing with different drive controllers and vastly different platforms, there may be many explanations for that.

At first, I thought that this multitasking scenario was the only one where NCQ made an impact, but as you'll find out later on in this article, that's not exactly true.


106 Comments

  • mlittl3 - Wednesday, April 6, 2005

    Hey Anand,

    A really cool multitasking scenario for gaming would be running a game with something like Skype in the background. Everyone saying that a respectable gamer (whatever that means) would not run multi applications in the background is not thinking about VoIP.

    I am in Louisiana and I like to game with my friend in Georgia. We talk to each other using Skype while playing Halo on the same server. I know the overhead necessary for VoIP must slow things down some.

    Won't dual core help in this case?
  • Aikouka - Wednesday, April 6, 2005

    I know I personally have a lot of things open when gaming, especially if I'm playing World of Warcraft. I'll typically alt-tab out of the game to check IRC or Firefox (with a bunch of tabs open) to look something up or if I'm bored, just browse the net a little bit.

    The only problem I ever have with slowdowns is if the game is highly CPU-bound and uses up 100% of my CPU, which WoW does almost all the time.
  • Rand - Wednesday, April 6, 2005

    #10- I'm inclined to agree, but people did request it so presumably some people are interested in doing so for whatever reason.

    "I don´t close AVG and MS Antispyware and MSN and outlook and IExplorer everytime i open warcraft or half life 2, so...WHO MADE ME BELIEVE AMD WAS FASTER?"

    Merely running applications in the background isn't going to do much to benefit DualCore/SMP unless those applications are actually utilizing the processor. Odds are MSN/Outlook/Spyware/Anti-virus probably aren't doing a thing but sitting idle when you're gaming.

  • marcusgarcia - Wednesday, April 6, 2005

    #10: There surely is.

    I play warcraft III online, which is a RTS game.
    Being so, not all actions are dependant on my reflex (in fact i can many times minimize the game for around 20 seconds which is the time my char takes to walk to a certain place on a given map).

    That being said, i am ALWAYS with a few instances of internet explorer open, MSN open, outlook express open and of course AVG and MS Anti spyware loaded on memory with real time protection.
    Add the fact that sometimes i am viewing and being viewed on MSN webcam.

    I'm sure MANY more people do that.

    Remember not all players are FPS gamers...in fact, FPS is far behind MMORPG in sales, which doesn't require near as much attention and reflex.
  • marcusgarcia - Wednesday, April 6, 2005

    OK.
    Something is very wrong here.

    I mean, WHY DIDN'T ALL SITES DO THESE TESTS WHEN HT WAS LAUNCHED?

    It clearly shows here what is MUCH better when it comes under regular usage.
    A Pentium 4 3.0 ghz is beating AMD's trash on 3500+....i mean, WTF?

    Almost noone (does anyone at all?) goes closing all applications before gaming or doing any other activity and HT is clearly giving AMD a serious beating on the multi-tasking scenario (read: EVERYONE's usage).

    I don´t close AVG and MS Antispyware and MSN and outlook and IExplorer everytime i open warcraft or half life 2, so...WHO MADE ME BELIEVE AMD WAS FASTER?

    I mean...dude...are we talking about servers here to compare single threaded performance?
    Are we still on Windows 3.11?


    By the way, how in the hell aren't they including Half Life's 2 performance?

    Surely the physics engine plays quite a bit on processing and even more surely it is done on separate threads, which would show the dual core being strong even on a single application, let alone on a multi-tasking one.

    I'm quite repented for having an Athlon 64 3000+ as my CPU right now when the Pentium 4 3.0 HT would be clearly outpacing the Athlon in every respect as long as i was multi-tasking/opening/closing/minimizing things (e.g.: ALWAYS).

    Damn at all these sites.
  • boban10 - Wednesday, April 6, 2005

    i think that this real-world multitasking testing done by Anandtech is 1000 times better than one synthetic benchmark, that is most time optimized for one or another cpu....
    someone agree ?


    ronaldo
  • mbhame - Wednesday, April 6, 2005

    I'm sorry but I find the premise of Page 11 borderline absurd. I *cannot* fathom there's a respectable amount of gamers that actually do that on a regular basis.
  • Anand Lal Shimpi - Wednesday, April 6, 2005

    WooDaddy

    Multi-core multitasking is already quite difficult, you have no idea how frustrating last weekend was. The issue is that I can sit with you on a computer and show you all the areas that dual core will improve performance, but quantifying it so I can stick a bunch of bars in a graph is far more difficult. AMD and Intel are actively working with BAPCo on SYSMark 2006 that should be much more multi-core friendly, but until then we're left with a lot of hard work. We're trying to write our own benchmarks as well, it's just that they take quite a bit of time to put together.

    Take care,
    Anand
  • Anand Lal Shimpi - Wednesday, April 6, 2005

    Thanks for pointing out the graph error, the labels just got messed up it looks like; should be fixed now.

    Remember AMD is talking about a 2H 2005 launch for dual core Athlon 64 on the desktop, don't expect to see reviews of desktop parts anytime soon.

    As far as the encoding comment goes, it's tough for me to actually elaborate without stepping into areas I can't get into just yet. Let's just say that the dual core Athlon 64 running at 2.2GHz won't be compared to a dual core Pentium D running at 2.8GHz.

    Take care,
    Anand
  • WooDaddy - Wednesday, April 6, 2005

    Ok Anand, either you're slick or you're slipping. History shows you're slick..

    You said a dual-core A64 won't help in encoding apps. I know you're not one to say stuff just because you THINK it's true, but because you KNOW so. I'm not at T0M H4rdware..

    So.. Since you're alluding to it, WHEN'S THE DUAL CORE A64 TEST COMING OUT!?!?! *pant**pant*

    Seriously though, I see that this multicore, multi-tasking benchmarking is going to get quite difficult. How do you know just how fast it really is considering all the combinations of different apps you will have running in the background? Is Madonion or those other benchmarking guys going to be coming out with a synthetic benchmarking tool to gauge the max performance of these new multi-core processors?
