Multitasking Performance

As we discovered in the first article, multitasking performance requires a slightly different approach to benchmarking methodology.  For single-application tests, we use a system in a very clean state, with nothing but the benchmark and drivers loaded; for our multitasking tests, we configure the system the way a real machine would be: tons of programs installed and lots of tasks running in the background.  If you missed Part I, here's a quick recap of our multitasking test configuration; the following applications were installed:

Daemon Tools
Norton AntiVirus 2004 (with latest updates)
Firefox 1.02
DVD Shrink 3.2
Microsoft AntiSpyware Beta 1.0
Newsleecher 2.0
Visual Studio .NET 2003
Macromedia Flash Player 7
Adobe Photoshop CS
Microsoft Office 2003
3ds max 7
iTunes 4.7.1
Trillian 3.1
DivX 5.2.1
AutoGK 1.60
Norton Ghost 2003
Adobe Reader 7

What's important about that list is that a handful of those programs were running in the background at all times, primarily Microsoft's AntiSpyware Beta and Norton AntiVirus 2004.  Both were running with their real-time protection modes enabled, to make things even more real-world.
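To give a sense of why background load matters for timed tests, here's a minimal sketch (in Python, with a hypothetical busy-loop standing in for our actual applications) of timing the same task alone and under contention:

```python
import threading
import time

def workload(n=2_000_000):
    # Stand-in for a foreground task such as a DVD encode (hypothetical)
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed_run(background_threads=0):
    """Time the workload alone or while competing busy-loop threads run."""
    stop = threading.Event()

    def busy():
        # Stand-in for background tasks like a real-time virus scanner
        while not stop.is_set():
            sum(range(1000))

    threads = [threading.Thread(target=busy) for _ in range(background_threads)]
    for t in threads:
        t.start()
    start = time.perf_counter()
    workload()
    elapsed = time.perf_counter() - start
    stop.set()
    for t in threads:
        t.join()
    return elapsed

solo = timed_run(0)         # clean-state style measurement
contended = timed_run(4)    # "real system" style measurement
```

The same workload takes measurably longer under contention, which is exactly the effect a clean-state benchmark hides.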


Multitasking Scenario 1: DVD Shrink

For this test, we used DVD Shrink, one of the simplest applications available for compressing and re-encoding a DVD to fit on a single 4.7GB disc.  We ran DVD Decrypter on the Star Wars Episode VI DVD so that we had a local copy of the DVD on our test bed's hard drive (in a future version of the test, we may try to include DVD Decrypter performance in our benchmark as well).  All of the DVD Shrink settings were left at their defaults, including telling the program to assume a low priority, a setting that many users enable in order to be able to do other things while DVD Shrink is working.
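The compression DVD Shrink performs boils down to simple bitrate arithmetic. As a rough sketch (the capacity split and runtime below are illustrative assumptions, not figures from our actual test disc):

```python
# Average video bitrate needed to fit a title on a single-layer disc.
# All figures here are illustrative examples, not measurements.
DISC_CAPACITY_GB = 4.7      # single-layer DVD, decimal gigabytes
AUDIO_OVERHEAD_GB = 0.4     # hypothetical audio + subtitle allowance
RUNTIME_MINUTES = 134       # approximate feature runtime

video_bytes = (DISC_CAPACITY_GB - AUDIO_OVERHEAD_GB) * 1_000_000_000
seconds = RUNTIME_MINUTES * 60
avg_kbps = video_bytes * 8 / seconds / 1000
print(f"Target average video bitrate: {avg_kbps:.0f} kbps")
```

DVD Shrink does this math for you and then re-encodes the video stream down to that budget, which is where all the CPU time goes.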

Next, we did the following:

1) Open Firefox and load our set of test web pages in tabs (we used local copies of all of the pages), leaving the browser on the AT front page.

2) Open iTunes and start playing the latest album of avid AnandTech reader 50 Cent on repeat all.
3) Open Newsleecher.
4) Open DVD Shrink.
5) Login to our news server and start downloading headers for our subscribed news groups.
6) Start backup of Star Wars Episode VI - Return of the Jedi.  All default settings, including low priority.

DVD Shrink was the application in focus. This matters because by default, Windows gives special scheduling priority to the application currently in the foreground (we will test what happens when it's not in the foreground later in this article).  We waited until the DVD Shrink operation was complete and recorded its completion time. Below are the results:
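For the curious, launching a task at reduced priority the way DVD Shrink's low-priority option does can be sketched as follows; the particular nice value and priority class here are our own picks, not anything DVD Shrink specifies:

```python
import os
import subprocess
import sys

def run_low_priority(cmd, **kwargs):
    """Launch cmd at reduced scheduler priority (a sketch, not DVD Shrink's code)."""
    if os.name == "nt":
        # Windows: the 'below normal' priority class, roughly what a
        # low-priority encode would request
        return subprocess.run(
            cmd, creationflags=subprocess.BELOW_NORMAL_PRIORITY_CLASS, **kwargs)
    # POSIX: raise the child's nice value before it execs
    return subprocess.run(cmd, preexec_fn=lambda: os.nice(10), **kwargs)

# The child reports its own nice value; os.nice(0) just reads it.
proc = run_low_priority(
    [sys.executable, "-c", "import os; print(os.nice(0))"],
    capture_output=True, text=True)
```

On Windows, the same effect is available from the command line via `start /low`. Note that lowering a process's priority this way does not override the foreground boost Windows gives the focused application; the two mechanisms stack.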

Multitasking Performance - Scenario 1

The results here aren't too surprising. With dual core, you can get a lot more done at once, so the Pentium D 2.8 cuts the DVD Shrink encode time by about half when compared to the Athlon 64 3500+. 

There was one element that caught us off guard, however. Looking at these numbers, we noticed that they were unusually high compared to those from our first article, yet we ran and re-ran the tests and got fairly consistent results.  Even running the CPUs at the same speeds as in our first article yielded lower performance than we saw in that piece.  Relative to one another, the processors all performed the same, but the DVD Shrink times were all noticeably higher.  So, we started digging, and what we uncovered was truly interesting.

106 Comments

  • BoBOh - Monday, April 11, 2005 - link

    Where are the code compile tests? We're not all gamers; some of us are software developers! :)

    BoB
  • NightCrawler - Saturday, April 9, 2005 - link

    Dual core Athlon 64s in June?
  • fitten - Saturday, April 9, 2005 - link

    - also, there should be (SMT) after simultaneous multi-threading in the quote from the paper on the IBM site.
  • fitten - Saturday, April 9, 2005 - link

    - quote should be in front of "Scalable not after.
  • fitten - Saturday, April 9, 2005 - link

    a) By definition, Intel's implementation must be different from IBM's or anyone else's because the CPUs aren't implemented the same. Not only do they implement different ISAs, but the entire architectures are different... different numbers of registers, different ISAs, different designs.

    2) Intel's definition of HyperThreading: http://www.intel.com/technology/hyperthread/

    D) This paper http://domino.watson.ibm.com/acas/w3www_acas.nsf/i...$FILE/heinrich.pdf , found on IBM's site by searching, is entitled Scalable "Multi-threaded Multiprocessor Architectures". The first paragraph states: "The former [hardware multi-threading], in the form of hyper-threading (HT) or simultaneous multi-threading, appears in the Intel Xeon and Pentium 4, and the IBM POWER5."
  • Reflex - Friday, April 8, 2005 - link

    Well, first off, I am not going to do everyone's homework on this; the info is out there, and you all have Google. If you ask an IBM engineer whether what Intel is doing is the same as what they are doing, or even whether it is really SMT, they would tell you flat out that it is not: the two fulfill completely different needs in their products and are implemented completely differently. Your definition seems to be that the hardware can accept two threads, therefore it is SMT. That is a VERY simplistic definition of what SMT is, when there are actually many variations on the concept (HT is a variation, but it is not what most CPU engineers consider actual SMT).

    One of the primary issues here is that HT does not actually allow two simultaneous threads; it is more of an enhanced thread scheduler that attempts to fill unused units with jobs that are pending. A true SMT CPU is architecturally able to execute two simultaneous threads; it's not just filling in idle parts of the pipeline with something to do (highly parallel designs). There is a ton of info on this; if you care, I suggest you do the research yourself. I don't have the time (and in some ways the expertise) to write a lengthy article on the topic.

    Alternately, you can just buy into the marketing, I suppose; it's no skin off my teeth.
  • fitten - Friday, April 8, 2005 - link

    I was going to comment on the phrase "true SMT" above. I'm wondering if this comes from the same lines of thought as the "true dual-core" arguments.

    Anyway, "HyperThreading" (HT) is just Intel marketing terminology for Simultaneous Multi-Threading (SMT). They are one and the same, with the same design goals: to utilize core resources more effectively by keeping them busy instead of sitting idle, particularly at the time granularity of cache misses and/or memory latencies.
  • defter - Friday, April 8, 2005 - link

    #93 "Intel has labeled it as SMT, however there is another name for what they are doing(that I cannot remember at the moment). What they are calling SMT is nowhere even close to solutions like Power."

    Well, please tell us the exact definition of SMT and the difference between the multithreading in POWER and the P4.


    "That aside, the implementation Intel has chosen is designed to make up for inefficiencies in the Prescott pipeline"

    In the Prescott pipeline? Why did HT exist in Northwood-based Xeons, then? Of course SMT is designed to reduce inefficiencies in the pipeline. If the CPU can utilize most of its resources when running a single thread, there isn't much point in implementing SMT.
  • saratoga - Friday, April 8, 2005 - link

    #93: Intel labeled SMT Hyperthreading. It is effectively the same as what the newer Power processors do (make one core two threads wide).

    It also was not designed for Prescott; rather, it was included in the P7 core from the beginning. For this reason, it was available on P4s prior to Prescott.
  • saratoga - Friday, April 8, 2005 - link

    #80:

    HT improves the utilization of execution resources. It's not a band-aid; it's a design choice. In some cases it can be used to compensate for some other weakness; in others, it can simply increase throughput on multithreaded workloads.

    Sun and IBM use it because they build server systems and SMT makes a large difference in traditional server loads.

    Intel uses it because they realized it would work well with the P4. I don't know why AMD does not use it. Probably because they don't think the Athlon has enough unused hardware on typical loads to justify the extra transistors. Or maybe just because the Athlon was not designed with it in mind and they can't justify redoing the whole thing to add a single feature. Or maybe a combination of the two.
