Multi-core support in Games?

Both Quake 4 and Call of Duty 2 now have SMP support, supposedly offering performance improvements on dual-core and/or Hyper Threading enabled processors.

For Call of Duty 2, you simply install the new patch and off you go; SMP support is enabled.  To verify, we ran our CoD 2 benchmark and kept a log of the total processor utilization over time.  Below is a shot of perfmon with a fresh install of CoD2 (sans SMP patch):

Note how the total CPU utilization for our dual-core testbed hovers right around 50%, with the maximum just under 52% (the extra ~2% can be attributed to driver and other overhead eating up CPU cycles).

Now, let's look at CoD2 CPU utilization with the SMP patch installed:

While the average CPU utilization only goes up by around 9%, the maximum CPU utilization increases tremendously, now up to 83%, showing us that the second core is being used. 
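This kind of logging doesn't require perfmon; a rough equivalent can be scripted. Below is a minimal sketch, assuming the third-party psutil package (the sample count and interval are arbitrary choices, not part of our methodology):

    import psutil

    # Sample per-core CPU utilization once per second for ~2 minutes,
    # then report the average and peak of the total, similar to what we
    # read off the perfmon graphs above.
    samples = []
    for _ in range(120):
        samples.append(psutil.cpu_percent(interval=1.0, percpu=True))

    totals = [sum(cores) / len(cores) for cores in samples]
    print(f"average total CPU: {sum(totals) / len(totals):.1f}%")
    print(f"peak total CPU:    {max(totals):.1f}%")
    # Without the SMP patch, a dual-core system should hover near 50% total
    # (one core pegged); with the patch, the peak should climb well above that.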

We looked at performance at 1024x768; the higher the resolution, the smaller the impact of a faster CPU, since the game becomes increasingly GPU limited (and conversely, the lower the resolution, the greater the CPU's impact).

To ensure a fair comparison, we tested using the SMP patch and simply disabled SMP manually by setting the r_smp_backend variable to "0".  We confirmed that SMP support was actually disabled by running perfmon and measuring CPU utilization. 

Call of Duty 2 (average fps at 1024x768)     SMP Disabled   SMP Enabled
AMD Athlon 64 FX-57 (2.8GHz)                     80.6           N/A
AMD Athlon 64 X2 4800+ (2.4GHz)                  79.8           70.3
AMD Athlon 64 X2 3800+ (2.0GHz)                  78.7           68.1
Intel Pentium Extreme Edition 955 (3.46GHz)      79.8           68.4
Intel Pentium Extreme Edition 840 (3.2GHz)       78.1           68.0
Intel Pentium D 820 (2.8GHz)                     75.6           67.1

Surprisingly enough, we actually saw fairly large performance drops in CoD2 with SMP enabled, across both AMD and Intel platforms.  This is unfortunate, but the eventually withdrawn SMP support in Quake 3 makes it less than shocking. We do expect things to improve as time goes on.
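Running the numbers from the table above makes the size of the regression concrete (a quick sketch; the figures are copied straight from our results):

    # Percentage change in CoD2 performance when enabling SMP.
    cod2 = {
        "Athlon 64 X2 4800+": (79.8, 70.3),
        "Athlon 64 X2 3800+": (78.7, 68.1),
        "Pentium EE 955":     (79.8, 68.4),
        "Pentium EE 840":     (78.1, 68.0),
        "Pentium D 820":      (75.6, 67.1),
    }
    for cpu, (smp_off, smp_on) in cod2.items():
        print(f"{cpu}: {100 * (smp_on - smp_off) / smp_off:+.1f}%")
    # Output ranges from -11.2% to -14.3%: every CPU loses ground with SMP on.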

Quake 4 was a different story; with r_useSMP enabled, we saw some extremely large performance gains with the move to dual core:

Quake 4 (average fps)                        SMP Disabled   SMP Enabled
AMD Athlon 64 FX-57 (2.8GHz)                    115.4           N/A
AMD Athlon 64 X2 4800+ (2.4GHz)                 114.9          147.4
AMD Athlon 64 X2 3800+ (2.0GHz)                 100.9          143.2
Intel Pentium Extreme Edition 955 (3.46GHz)      98.9          142.3
Intel Pentium Extreme Edition 840 (3.2GHz)       89.0          133.6
Intel Pentium D 820 (2.8GHz)                     80.6          125.5
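Expressed as scaling factors, the same arithmetic applied to the Quake 4 numbers above tells the opposite story:

    # Quake 4 speedup from enabling r_useSMP.
    quake4 = {
        "Athlon 64 X2 4800+": (114.9, 147.4),
        "Athlon 64 X2 3800+": (100.9, 143.2),
        "Pentium EE 955":     (98.9, 142.3),
        "Pentium EE 840":     (89.0, 133.6),
        "Pentium D 820":      (80.6, 125.5),
    }
    for cpu, (smp_off, smp_on) in quake4.items():
        print(f"{cpu}: {smp_on / smp_off:.2f}x")
    # Gains run from 1.28x on the fastest chip to 1.56x on the slowest; the
    # slower the CPU, the more the second core helps.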

Either the SMP patch only spawns two threads, or Quake 4's instruction mix under the patch doesn't mesh well with Intel's Pentium EE 955; either way, Hyper Threading on top of dual core did nothing at all for performance.
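If the patch does follow the classic two-thread renderer split (a front-end game thread feeding a render back-end), it's easy to see why a third logical processor would sit idle. Here's a toy sketch of that structure; the names and queue-based hand-off are our own illustration, not id's actual code:

    import queue
    import threading

    # Two busy threads can occupy two cores, but they leave nothing for a
    # third logical CPU (e.g. the extra Hyper Threading thread on the EE 955).
    commands = queue.Queue(maxsize=2)  # front end may run slightly ahead

    def front_end(frames):
        for frame in range(frames):
            # game logic + building the render command list for this frame
            commands.put(f"draw list for frame {frame}")
        commands.put(None)  # sentinel: no more frames

    def back_end():
        while True:
            cmd = commands.get()
            if cmd is None:
                break
            # here the real engine would submit cmd to the GPU

    game = threading.Thread(target=front_end, args=(1000,))
    render = threading.Thread(target=back_end)
    game.start(); render.start()
    game.join(); render.join()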

While we're only looking at two games, this is a start for multithreaded game development.  You can expect to see a lot of cases where a second core does absolutely nothing for gaming, but as time goes on, the situation will change.

Comments

  • Anand Lal Shimpi - Friday, December 30, 2005 - link

    I had some serious power/overclocking issues with the pre-production board Intel sent for this review. I could overclock the chip and the frequency would go up, but the performance would go down significantly - and the chip wasn't throttling. Intel has a new board on the way to me now, and I'm hoping to be able to do a quick overclocking and power consumption piece before I leave for CES next week.

    Take care,
    Anand
  • Betwon - Friday, December 30, 2005 - link

    quote:


    We tested four different scenarios:

    1. A virus scan + MP3 encode
    2. The first scenario + a Windows Media encode
    3. The second scenario + unzipping files, and
    4. The third scenario + our Splinter Cell: CT benchmark.

    The graph below compares the total time in seconds for all of the timed tasks (everything but Splinter Cell) to complete during the tests:

    AMD Athlon 64 X2 4800+           AVG      LAME     WME      ZIP      Total
    AVG + LAME                       22.9s    13.8s    -        -        36.7s
    AVG + LAME + WME                 35.5s    24.9s    29.5s    -        90.0s
    AVG + LAME + WME + ZIP           41.6s    38.2s    40.9s    56.6s    177.3s
    AVG + LAME + WME + ZIP + SCCT    42.8s    42.2s    46.6s    65.9s    197.5s

    Intel Pentium EE 955 (no HT)     AVG      LAME     WME      ZIP      Total
    AVG + LAME                       24.8s    13.7s    -        -        38.5s
    AVG + LAME + WME                 39.2s    22.5s    32.0s    -        93.7s
    AVG + LAME + WME + ZIP           47.1s    37.3s    45.0s    62.0s    191.4s
    AVG + LAME + WME + ZIP + SCCT    40.3s    47.7s    58.6s    83.3s    229.9s


    We find that this isn't scientific; AnandTech is wrong.
    You should report the end time of the last task to complete, not the sum of each task's individual time.

    For example: task1 and task2 run at the same time.

    System A takes only 51s to complete task1 and task2:
    task1 -- 50s
    task2 -- 51s

    System B takes 61s to complete task1 and task2:
    task1 -- 20s
    task2 -- 61s

    Correct: System A (51s) is faster than System B (61s).
    Wrong: System A (51s + 50s = 101s) is slower than System B (20s + 61s = 81s).
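    To make the distinction concrete in code (a minimal sketch using the example's numbers):

        # Wall-clock completion vs. sum-of-task-times for concurrent tasks.
        system_a = {"task1": 50, "task2": 51}
        system_b = {"task1": 20, "task2": 61}

        def wall_clock(tasks):
            # All tasks start together; the last to finish sets the elapsed time.
            return max(tasks.values())

        def sum_of_times(tasks):
            # What the "Total" column does: add each task's individual time.
            return sum(tasks.values())

        print(wall_clock(system_a), wall_clock(system_b))      # 51 vs 61  -> A faster
        print(sum_of_times(system_a), sum_of_times(system_b))  # 101 vs 81 -> B "faster" (misleading)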
  • tygrus - Tuesday, January 3, 2006 - link

    The problem is that they don't all finish at the same time, plus there's the ambiguous workload of an FPS task running alongside.

    You could start them all and measure the time taken for all tasks to finish. That's a workload, but it can be susceptible to the slowest task being limited by its single-threaded performance (once all the other tasks are finished, SMP goes underutilised).

    Another way is to use tasks that take longer and run at a measurable, consistent speed.
    Is it possible to:
    * loop the tests with a big enough working set (that ensures repeatable runs);
    * Determine average speed of each sub-test (or runs per hour) while other tasks are running and being monitored;
    * Specify a workload based on how many runs, MB, Frames etc. processed by each;
    * Calculate the equivalent time to do a theoretical workload (be careful of the method).

    Sub-tasks time/speed can be compared to when they were run by themselves (single thread, single active task). This is complicated by HyperThreading and also multi-threaded apps under test. You can work out the efficiency/scaling of running multiple tasks versus one task at a time.
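    A minimal sketch of that rate-based idea (every rate and workload figure below is invented purely for illustration):

        # Measure each task's throughput while everything runs together, then
        # convert a fixed theoretical workload into an equivalent time.
        rates = {"AVG": 12.0, "LAME": 9.0, "WME": 6.0, "ZIP": 15.0}   # MB/s, measured
        workload = {"AVG": 600, "LAME": 400, "WME": 360, "ZIP": 600}  # MB, specified
        times = {task: workload[task] / rates[task] for task in rates}
        # The tasks run concurrently, so the slowest one determines the score.
        print(times, "-> equivalent time:", max(times.values()), "s")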

    You could probably rejig the process priorities to get better 'Splinter Cell' performance.
  • Viditor - Saturday, December 31, 2005 - link

    Scoring needs to be done on a focused window...
    By doing multiple runs with all of the programs running simultaneously, it's possible to extract a speed value for each of the programs in turn, under those conditions. The cumulative number isn't representative of how long it actually took; it's more of a "score" of performance under a given set of conditions.
  • Betwon - Saturday, December 31, 2005 - link

    No! It is time (elapsed time), not a speed value.
    You see:
    24.8s + 13.7s = 38.5s
    42.8s + 42.2s + 46.6s + 65.9s = 197.5s

    AnandTech's method is wrong.
  • Viditor - Saturday, December 31, 2005 - link

    quote:

    It is time (elapsed time), not a speed value

    It's a score value... whether it's stated in time or on an arbitrary number scale matters very little. The values are still justified...
  • Betwon - Saturday, December 31, 2005 - link

    You don't know how to test.
    But you still say it's correct.

    We all need an explanation from AnandTech.
  • Viditor - Saturday, December 31, 2005 - link

    quote:

    You don't know how to test


    Then I better get rid of these pesky Diplomas, eh?
    I'll go tear them up right now...:)
  • Betwon - Saturday, December 31, 2005 - link

    I mean: you don't know how AnandTech ran the tests.
    The test method.
    What the data is.

    We only need the explanation from AnandTech, not your guesses.

    Because you don't know it!
    You are not AnandTech!
  • Viditor - Saturday, December 31, 2005 - link

    Thank you for the clarification (does anyone have any sticky tape I could borrow? :)
    What we do know is:
    1. All of the tests were started simultaneously..."To find out, we put together a couple of multitasking scenarios aided by a tool that Intel provided us to help all of the applications start at the exact same time"
    2. The two ways to measure are: finding out individual times in a multitasking environment (what I think they have done), or producing a batch job (which is what I think you're asking for) and getting a completion time.

    Personally, I think that the former gives us far more useful information...
    However, neither scenario is more scientifically correct than the other.
