Dual Core and Hyper Threading: Detriment or Not?

A question that we've always had is whether or not the inclusion of Hyper Threading support on Intel's dual-core Extreme Edition processors actually improves performance.  To answer that question, we have to look at two separate situations: multithreaded application performance and multitasking performance. 

For multithreaded application performance, we can now turn to a number of benchmarks.  We'll start off with 3dsmax 7 (higher numbers are better for the composite score; lower numbers are better for the timed tests):

3dsmax 7    | Composite Score | 3dsmax 5 rays | CBALLS2 | SinglePipe2 | UnderWater
HT Enabled  | 3.0             | 12.922s       | 17.297s | 83.515s     | 119.641s
HT Disabled | 2.51            | 14.937s       | 21.141s | 102.734s    | 141.641s

Here, the performance advantage is clear - enabling Hyper Threading gives Presler another 14-19% over the same dual core chip with HT disabled.  The same applies to almost all of the media encoding tests (if minutes or seconds are specified, lower numbers mean better performance):

Media Encoding | DVD Shrink | WME9    | H.264 | iTunes
HT Enabled     | 7.1m       | 46.5fps | 9.96m | 38s
HT Disabled    | 8.0m       | 38.6fps | 8.53m | 40s

Our Quicktime 7 H.264 encoding test, which took longer with HT enabled (9.96 vs. 8.53 minutes), is generally speaking an outlier from what we've seen of the impact of HT on multithreaded applications.  The rest of the applications show a clear benefit from being able to execute four threads simultaneously, even though each core's execution resources are shared between its two threads.
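
The percentages quoted above can be reproduced from the two tables. The sketch below is only one way to do the arithmetic: it assumes that for timed tests the gain is the share of time saved relative to the HT-disabled run, and for scores or frame rates it is the relative increase. The names `RESULTS`, `ht_gain_percent` and `all_gains` are made up for this illustration.

```python
# Figures copied from the 3dsmax 7 and media encoding tables above.
# Each entry: (HT enabled, HT disabled, higher_is_better)
RESULTS = {
    "3dsmax composite": (3.0, 2.51, True),
    "3dsmax 5 rays (s)": (12.922, 14.937, False),
    "CBALLS2 (s)": (17.297, 21.141, False),
    "SinglePipe2 (s)": (83.515, 102.734, False),
    "UnderWater (s)": (119.641, 141.641, False),
    "DVD Shrink (min)": (7.1, 8.0, False),
    "WME9 (fps)": (46.5, 38.6, True),
    "H.264 (min)": (9.96, 8.53, False),
    "iTunes (s)": (38.0, 40.0, False),
}


def ht_gain_percent(ht_on, ht_off, higher_is_better):
    """Percent improvement from enabling HT; negative means HT was slower."""
    if higher_is_better:
        # Scores and frame rates: relative increase over the HT-disabled result.
        return (ht_on - ht_off) / ht_off * 100.0
    # Timed tests: share of the HT-disabled time that was saved.
    return (ht_off - ht_on) / ht_off * 100.0


def all_gains(results=RESULTS):
    """Map each benchmark name to its HT gain in percent."""
    return {name: ht_gain_percent(*row) for name, row in results.items()}


if __name__ == "__main__":
    for name, gain in all_gains().items():
        print(f"{name:20s} {gain:+6.1f}%")
```

Under those assumptions the 3dsmax rows come out between roughly 13% and 20%, in line with the range quoted in the text, while the H.264 row is the only one with a negative gain.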

Armed with the latest SMP patches for Call of Duty 2 and Quake 4 (SMP was enabled in both games), we can also take a look at the impact of HT on Presler:

Gaming      | Call of Duty 2 | Quake 4
HT Enabled  | 68.4           | 142.3
HT Disabled | 69.3           | 142.3

Call of Duty 2 is another example where HT actually reduces performance, but given that enabling SMP itself reduces performance in that game, we'd venture a guess that you shouldn't really be drawing any conclusions based on its data.  Quake 4, on the other hand, shows no difference in performance with HT enabled or disabled.

From what we've seen, with most individual multithreaded applications, enabling HT will improve performance even if you have a dual core processor.  The degree of improvement will vary from application to application, but generally speaking, if there is any change at all, it's going to be positive.

The more interesting situation is what happens when you're multitasking - does Hyper Threading really help on top of the inherent benefits of a dual core processor?  To find out, we put together a couple of multitasking scenarios aided by a tool that Intel provided us to help all of the applications start at the exact same time.  We're not necessarily concerned with the actual performance of these applications, but rather with how running more applications simultaneously affects each of them, and how that changes with HT enabled or disabled.

We took five applications (Grisoft AVG Anti-Virus 7, Lame MP3 Encoder 3.97a, Windows Media Encoder 9, Info-ZIP extraction utility and Splinter Cell: Chaos Theory) and used various combinations of them to try to figure out if there are multitasking benefits to a dual core processor with Hyper Threading enabled.  Note that some of these applications are multithreaded themselves, so just because we chose five applications doesn't mean that there are only five threads of execution; in reality, there are many more. 
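
The launcher tool from Intel isn't described here, so the following is not that tool. It is only a rough sketch of the general idea of starting several workloads at nearly the same moment and timing each one. The function name `run_simultaneously` and the placeholder commands are assumptions made for illustration; the real test used the applications listed above.

```python
import subprocess
import sys
import threading
import time


def run_simultaneously(commands):
    """Start every command at (nearly) the same moment and time each one.

    `commands` maps a label to an argument list for subprocess.run.
    Returns a dict of label -> elapsed wall-clock seconds.
    """
    barrier = threading.Barrier(len(commands))
    timings = {}
    lock = threading.Lock()

    def worker(label, argv):
        # Every worker blocks here until all of them are ready,
        # then they are released together.
        barrier.wait()
        start = time.perf_counter()
        subprocess.run(argv, check=True)
        elapsed = time.perf_counter() - start
        with lock:
            timings[label] = elapsed

    threads = [
        threading.Thread(target=worker, args=(label, argv))
        for label, argv in commands.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return timings


if __name__ == "__main__":
    # Placeholder workloads: short Python one-liners stand in for real applications.
    demo = {
        "task_a": [sys.executable, "-c", "sum(range(10**6))"],
        "task_b": [sys.executable, "-c", "sum(range(10**6))"],
    }
    for label, seconds in run_simultaneously(demo).items():
        print(f"{label}: {seconds:.2f}s")
```

A barrier only lines up the moment each process is requested; the operating system still decides when each one actually begins executing, so the start times are close rather than identical.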

We tested four different scenarios:
  1. A virus scan + MP3 encode
  2. The first scenario + a Windows Media encode
  3. The second scenario + unzipping files, and
  4. The third scenario + our Splinter Cell: CT benchmark.
The tables below list the time in seconds that each timed task (everything but Splinter Cell) took to complete, along with the total for each test:

AMD Athlon 64 X2 4800+        | AVG   | LAME  | WME   | ZIP   | Total
AVG + LAME                    | 22.9s | 13.8s | -     | -     | 36.7s
AVG + LAME + WME              | 35.5s | 24.9s | 29.5s | -     | 90.0s
AVG + LAME + WME + ZIP        | 41.6s | 38.2s | 40.9s | 56.6s | 177.3s
AVG + LAME + WME + ZIP + SCCT | 42.8s | 42.2s | 46.6s | 65.9s | 197.5s

Intel Pentium EE 955 (no HT)  | AVG   | LAME  | WME   | ZIP   | Total
AVG + LAME                    | 24.8s | 13.7s | -     | -     | 38.5s
AVG + LAME + WME              | 39.2s | 22.5s | 32.0s | -     | 93.7s
AVG + LAME + WME + ZIP        | 47.1s | 37.3s | 45.0s | 62.0s | 191.4s
AVG + LAME + WME + ZIP + SCCT | 40.3s | 47.7s | 58.6s | 83.3s | 229.9s

Intel Pentium EE 955 (HT Enabled) | AVG   | LAME  | WME   | ZIP   | Total
AVG + LAME                        | 25.0s | 13.3s | -     | -     | 38.3s
AVG + LAME + WME                  | 34.4s | 21.6s | 30.2s | -     | 86.2s
AVG + LAME + WME + ZIP            | 41.5s | 28.1s | 37.7s | 54.2s | 161.5s
AVG + LAME + WME + ZIP + SCCT     | 51.4s | 33.0s | 45.3s | 71.1s | 200.8s

As you can see, as soon as you get beyond two simultaneous applications, the Presler setup with HT enabled takes less time to complete the tasks than the Presler system without HT.  However, including the Athlon 64 X2 4800+ in the picture, we see that despite only being able to execute two threads at the same time, it does nearly as good a job as the Presler HT system that can execute twice as many threads: it trails by about 16 seconds with four applications but finishes slightly ahead with five.  But to get the full picture, we have to measure one last data point: Splinter Cell performance.
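
To make the comparison above easier to check, here is a small sketch that re-adds the per-application times from the three tables and reports how many seconds HT saves on Presler in each scenario. The dictionary layout and the names `TIMES`, `totals` and `ht_time_saved` are assumptions made for this illustration; scenarios are keyed by the number of simultaneous applications (the fifth, Splinter Cell, is not timed).

```python
# Per-application completion times in seconds, copied from the three tables above.
# Order within each row: AVG, LAME, WME, ZIP (only the apps timed in that scenario).
TIMES = {
    "Athlon 64 X2 4800+": {
        2: [22.9, 13.8],
        3: [35.5, 24.9, 29.5],
        4: [41.6, 38.2, 40.9, 56.6],
        5: [42.8, 42.2, 46.6, 65.9],
    },
    "Pentium EE 955 (no HT)": {
        2: [24.8, 13.7],
        3: [39.2, 22.5, 32.0],
        4: [47.1, 37.3, 45.0, 62.0],
        5: [40.3, 47.7, 58.6, 83.3],
    },
    "Pentium EE 955 (HT)": {
        2: [25.0, 13.3],
        3: [34.4, 21.6, 30.2],
        4: [41.5, 28.1, 37.7, 54.2],
        5: [51.4, 33.0, 45.3, 71.1],
    },
}


def totals(times=TIMES):
    """Sum the timed tasks for every system and scenario, rounded to 0.1s."""
    return {
        system: {apps: round(sum(row), 1) for apps, row in scenarios.items()}
        for system, scenarios in times.items()
    }


def ht_time_saved(times=TIMES):
    """Seconds saved on Presler by enabling HT, per scenario."""
    summed = totals(times)
    no_ht = summed["Pentium EE 955 (no HT)"]
    ht = summed["Pentium EE 955 (HT)"]
    return {apps: round(no_ht[apps] - ht[apps], 1) for apps in no_ht}


if __name__ == "__main__":
    for system, row in totals().items():
        print(system, row)
    print("HT time saved (s):", ht_time_saved())
```

The recomputed totals match the tables except for the X2's three-application run, where the rounded parts add up to 89.9s against the listed 90.0s, presumably a rounding difference in the original figures.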

In the fourth scenario, we ran a total of five applications: AVG, LAME, WME, Info-ZIP and Splinter Cell.  The first four applications took a total of 197.5 seconds to complete on the Athlon 64 X2 4800+ system, ever so slightly quicker than the 200.8 seconds of the Presler HT system.  However, that does not take into account Splinter Cell performance - now let's see how our fifth application fared:

Splinter Cell: CT                 | Average  | Min      | Max
Intel Pentium EE 955 (no HT)      | 71.0 fps | 27.8 fps | 128.1 fps
Intel Pentium EE 955 (HT enabled) | 77.2 fps | 32.5 fps | 139.6 fps
AMD Athlon 64 X2 4800+            | 66.9 fps | 10.5 fps | 185.0 fps

The Athlon 64 X2 4800+ is actually faster in the Splinter Cell: CT benchmark when nothing else is running, but here we see a very different story.  Although its 66.9 fps average frame rate is reasonably competitive with the Presler HT system, its minimum frame rate is barely over 10 fps - roughly one-third that of the Presler HT.

While the regular Presler setup without HT managed to pull in higher frame rates than the AMD system, it did so while performing significantly worse in the remaining four applications.  The Presler HT vs. Athlon 64 X2 comparison is important because the two are virtually tied in the performance of the first four applications - but juggling all five of the applications is better done on the Presler HT system. 
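
The trade-off described above can be put side by side by expressing each system's fifth-scenario figures relative to the Presler HT system. This is just a sketch of that arithmetic; `SCENARIO_5` and `relative_to` are names invented for the illustration, and the numbers are copied from the tables above.

```python
# Fifth-scenario figures copied from the tables above: total seconds for the four
# timed applications, plus Splinter Cell: CT average and minimum frame rates.
SCENARIO_5 = {
    "Pentium EE 955 (no HT)": {"total_s": 229.9, "avg_fps": 71.0, "min_fps": 27.8},
    "Pentium EE 955 (HT)": {"total_s": 200.8, "avg_fps": 77.2, "min_fps": 32.5},
    "Athlon 64 X2 4800+": {"total_s": 197.5, "avg_fps": 66.9, "min_fps": 10.5},
}


def relative_to(baseline, data=SCENARIO_5):
    """Express each system's figures as a ratio of the baseline system's figures."""
    base = data[baseline]
    return {
        system: {metric: value / base[metric] for metric, value in row.items()}
        for system, row in data.items()
    }


if __name__ == "__main__":
    for system, row in relative_to("Pentium EE 955 (HT)").items():
        print(system, {metric: round(ratio, 2) for metric, ratio in row.items()})
```

Relative to the Presler HT system, the X2's total time is within about 2%, its average frame rate is about 87%, and its minimum frame rate is about 32%.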

We would say that, if implemented properly, an SMT system like Hyper Threading is definitely a good companion to a dual core desktop processor.  Even for today's applications and usage models, the usable limit is well beyond just two threads.

84 Comments

  • JarredWalton - Friday, December 30, 2005 - link

    See above post. The 3800+ OC article has the BF2 benchmarks/tools in it.
  • bob4432 - Friday, December 30, 2005 - link

    thanks, i had just found that. excellent tool ;). what is the difference between average fps and actual fps?
  • Spacecomber - Friday, December 30, 2005 - link

    If you need more direction on how to go about creating and running a timedemo in BF2, take a look at this article over at overclockers.com.au (http://www.overclockers.com.au/article.php?id=3841...).

    The timedemo records the time it takes for each frame to be rendered over the course of the demo being run. It sums these times and divides by the number of frames to come up with an average. You end up with just one number standing in for a rather large collection of data. Some sites, such as hardocp, try to show more than just an average, usually by presenting a graph of the framerates over the length of the timedemo. This can be helpful, because when you are trying to evaluate how well a particular hardware setup will work with your favorite game, you really are looking to see whether it will maintain playable minimum framerates at the resolution and graphics settings that you want to use. An average alone only gives you a rough idea about this, though it does give you a quick and dirty way to compare different video cards in the same game setting.

    If you create and run a Battlefield 2 timedemo and look at the complete results, you'll see how very wide the range of framerates is. For example, running the timedemo, I have gotten an average of 50 fps, but the range is from 2 to 105 fps, with a standard deviation of 12.3. Graphing out the individual frame rates will let you see how often the frame rates drop below 20 fps, for example, which many would consider too low for online gaming.

    Here is a graph of a BF2 timedemo (http://www.sequoyahcomputer.com/Analysis/BF2memory...). It's for the data that gave me an average of 50 fps that I mentioned previously. Although 50 fps sounds like an ok average, looking at the graph, you can see that many might consider these settings on this hardware to be barely playable.

    Space
  • bob4432 - Saturday, December 31, 2005 - link

    thanks, what program did you use to graph the data?
  • Spacecomber - Saturday, December 31, 2005 - link

    The full results of the time demo are saved in a csv file, timedemo_framerates.csv, which can be opened with a spreadsheet program. I used the spreadsheet program in OpenOffice to view the data and eliminate the framerates that are erroneously recorded before the actual gameplay demo has begun (they are easy to recognize, since they are at the beginning of the data and unnaturally high), and I also used the spreadsheet program to graph the data.

    Space
  • JarredWalton - Friday, December 30, 2005 - link

    I believe Anand is using the same benchmark that I linked in my Overclocking article (http://www.anandtech.com/cpuchipsets/showdoc.aspx?...). He's probably running the 1.12 version now, which would account for the slightly lower scores than what I got with the 1.03 version and demo files. BF2 is VERY GPU limited, so even at 1024x768 you will start to hit FPS limits on high-end systems. You can see in the above page how FPS scaled with CPU speed on an X2 3800+ chip, and I only improved average frame rates by 18% with a 35% overclock at 1024x768. That dropped to 8% at 1280x1024 and less than 4% at 1600x1200 and above.
  • danidentity - Friday, December 30, 2005 - link

    Has there been any official word on whether or not 975X will support Conroe?
  • coldpower27 - Friday, December 30, 2005 - link

    a 975X Rev 2.0 is probably needed. However the i965 Chipset series will for sure, as they are rumored to be launched simultaneously.
  • Shintai - Friday, December 30, 2005 - link

    You gonna need i965 I bet for sure, specially if Conroe gonna use a 1333Mhz bus.

    However, Merom should fit in Yonah Socket (Conroe mobile part)
  • Beenthere - Friday, December 30, 2005 - link

    Every hardware site that has tested the power consumption and operating temps of Presler knows full well this is a 65 nano FLAME THROWER almost making the P4 FLAME THROWER look good by comparison. "Normal" operating temps of 80 C are OUTRAGEOUS as is equal or higher power consumption than the FLAME THROWING P4 series. And as the benches show -this is a Hail Mary approach by Intel to baffle the naive with B.S. No one with a clue would touch this inferior CPU design. And to add insult to injury, after the Paper Launch -- when they are actually available for purchase in Feb. or later, the asking price is $999. Yeah, I'll run right out and buy a truckload of Preslers to use for space heaters in my house...