Intel's move to their 65nm process has gone extremely well.  We've had 65nm Presler, Cedar Mill and Yonah samples for the past couple of months now, and they have been just as good as final, shipping silicon.  Just a couple of months ago, we previewed Intel's 65nm Pentium 4, showcased its reduction in power consumption, and took an early look at the overclocking potential of the chips. 

Intel's 65nm Pentium 4s will be the last Pentium 4s to come out of Santa Clara, and while we'd strongly suggest waiting to upgrade until we've seen what Conroe will bring us, there are those who can't wait another six months.  For those who are building or buying systems today, we need to find out whether Intel's 65nm Pentium 4 processors are any more worthwhile than the rather disappointing chips that we had at 90nm. 

The move to 90nm for Intel was highly anticipated, but it could not have been any more disappointing from a performance standpoint.  In a since-abandoned quest for higher clock speeds, Intel brought us Prescott at 90nm with its 31-stage pipeline - up from 20 stages in the previous generation of Pentium 4s.  Through some extremely clever and effective engineering, Prescott actually wasn't any slower than its predecessors, despite the increase in pipeline stages.  What Prescott did leave us with, however, was a much higher power bill.  Deeply pipelined processors generally consume a lot more power, and Prescott did just that. 

Intel tried to minimize the negative effects of Prescott as much as possible through technologies like their Enhanced Intel SpeedStep (EIST).  However, at the end of the day, the fastest Athlon 64 consumed less power under full load than the slowest Prescott at idle.  Considering that most PCs actually spend the majority of their time idling, this was truly a letdown from Intel. 

With 65nm, the architecture of the chips won't change at all - in fact, the single-core 65nm Pentium 4s based on the Cedar Mill core will be identical to the current Pentium 4 600 series that we have today (with the inclusion of Intel's Virtualization Technology).  So with no architectural changes, the power consumption at 65nm should be lower than at 90nm.  As we found in our first article on Intel's 65nm chips, power consumption did indeed go down quite a bit; however, it's still not low enough to be better than AMD.  It will take Conroe before Intel can offer a desktop processor with lower power consumption than AMD's 90nm Athlon 64 line. 

In an odd move, just before the end of 2005, Intel is introducing their first 65nm processor.  Not the Cedar Mill based Pentium 4 and not even the Presler based Pentium D, but rather the Presler based Pentium Extreme Edition 955. 

The Presler core is Intel's dual-core 65nm successor to Smithfield, which, as you will remember, was Intel's first dual-core processor.  Presler does offer one architectural improvement over Smithfield, and that is the use of a 2MB L2 cache per core, up from 1MB per core in Smithfield.  Other than that, Presler is pretty much a die-shrunk version of Smithfield. 

With 2MB of cache on each core, the transistor count of Presler has gone up a bit.  While Smithfield weighed in at a whopping 230M transistors, Presler is now up to 376M.  The move to 65nm has actually made the chip smaller at 162 mm2, down from 206 mm2.  With a smaller die size, Presler is actually cheaper for Intel to make than Smithfield, despite having twice the cache.  Equally impressive is that Cedar Mill, the single-core version, measures in at a meager 81 mm2. 

The Extreme Edition incarnation of Presler brings back support for the 1066MHz FSB, which you may remember was lost with the original move to dual-core.  Given that both cores on the chip have to share the same bus, more FSB bandwidth will always help performance.

The Pentium Extreme Edition 955 runs at 3.46GHz (1066MHz FSB), giving it a clock speed advantage over all of Intel's other dual-core processors.  And as always, the EE chip offers Hyper-Threading support on each of its two cores, allowing the chip to handle a maximum of four threads at the same time.  Since it's an Extreme Edition chip, the 955 will be priced at $999.  If you're curious about the cheaper, non-Extreme versions of Presler, here is Intel's 65nm dual-core roadmap for 2006:

Intel Dual-Core Desktop Roadmap (2006)
CPU   Core      Clock    FSB      L2 Cache
???   Conroe    ???      ???      4MB
???   Conroe    ???      ???      2MB
950   Presler   3.4GHz   800MHz   2x2MB
940   Presler   3.2GHz   800MHz   2x2MB
930   Presler   3.0GHz   800MHz   2x2MB
920   Presler   2.8GHz   800MHz   2x2MB

As you can see, the Extreme Edition 955 will be the first, but definitely not the only dual-core 65nm processor out in the near future, so don't let the high price tag worry you. The remaining 900 series Pentium D chips should come with prices much closer to the equivalent 800 series.

Power Consumption and The Test


Comments

  • Anand Lal Shimpi - Friday, December 30, 2005 - link

    I had some serious power/overclocking issues with the pre-production board Intel sent for this review. I could overclock the chip and the frequency would go up, but the performance would go down significantly - and the chip wasn't throttling. Intel has a new board on the way to me now, and I'm hoping to be able to do a quick overclocking and power consumption piece before I leave for CES next week.

    Take care,
  • Betwon - Friday, December 30, 2005 - link


    We tested four different scenarios:

    1. A virus scan + MP3 encode
    2. The first scenario + a Windows Media encode
    3. The second scenario + unzipping files, and
    4. The third scenario + our Splinter Cell: CT benchmark.

    The graph below compares the total time in seconds for all of the timed tasks (everything but Splinter Cell) to complete during the tests:

    AMD Athlon 64 X2 4800+           AVG     LAME    WME     ZIP     Total
    AVG + LAME                       22.9s   13.8s   -       -       36.7s
    AVG + LAME + WME                 35.5s   24.9s   29.5s   -       90.0s
    AVG + LAME + WME + ZIP           41.6s   38.2s   40.9s   56.6s   177.3s
    AVG + LAME + WME + ZIP + SCCT    42.8s   42.2s   46.6s   65.9s   197.5s

    Intel Pentium EE 955 (no HT)     AVG     LAME    WME     ZIP     Total
    AVG + LAME                       24.8s   13.7s   -       -       38.5s
    AVG + LAME + WME                 39.2s   22.5s   32.0s   -       93.7s
    AVG + LAME + WME + ZIP           47.1s   37.3s   45.0s   62.0s   191.4s
    AVG + LAME + WME + ZIP + SCCT    40.3s   47.7s   58.6s   83.3s   229.9s

    We find that this isn't scientific. AnandTech is wrong.
    You should report the end time of the last completed task, not the sum of each task's time.

    For example: task1 and task2 run at the same time.

    System A spends only 51s to complete task1 and task2:
    task1 -- 50s
    task2 -- 51s

    System B spends 61s to complete task1 and task2:
    task1 -- 20s
    task2 -- 61s

    Correct: System A (51s) is faster than System B (61s).
    Wrong: System A (51s + 50s = 101s) is slower than System B (20s + 61s = 81s).
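    Betwon's point can be checked with a few lines of arithmetic (a minimal sketch; the task names and times are the hypothetical ones from the example above, not measured data):

    ```python
    # Hypothetical per-task completion times (seconds) for two systems,
    # each running task1 and task2 concurrently from the same start point.
    system_a = {"task1": 50, "task2": 51}
    system_b = {"task1": 20, "task2": 61}

    def wall_clock(times):
        # All tasks start simultaneously, so the workload is finished
        # when the slowest task finishes.
        return max(times.values())

    def summed(times):
        # Summing per-task times double-counts the overlapping work.
        return sum(times.values())

    print(wall_clock(system_a), wall_clock(system_b))  # 51 61: A is faster
    print(summed(system_a), summed(system_b))          # 101 81: misleadingly ranks B first
    ```

    The two metrics rank the systems in opposite orders, which is exactly the objection being raised.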
  • tygrus - Tuesday, January 3, 2006 - link

    The problem is that the tasks don't all finish at the same time, plus the ambiguity of an FPS task running alongside them.

    You could start them all and measure the time taken for all tasks to finish. That's a workload, but it can be susceptible to the slowest task being limited by its single-thread performance (once all the other tasks have finished, SMP is underutilised).

    Another way is to use tasks that run longer at a measurable, consistent speed.
    Is it possible to:
    * loop the tests with a big enough working set (one that ensures repeatable runs);
    * determine the average speed of each sub-test (or runs per hour) while the other tasks are running and being monitored;
    * specify a workload based on how many runs, MB, frames etc. are processed by each;
    * calculate the equivalent time to do a theoretical workload (be careful of the method)?

    Each sub-task's time/speed can then be compared to a run by itself (single thread, single active task). This is complicated by Hyper-Threading and by multi-threaded apps under test. You can work out the efficiency/scaling of running multiple tasks versus one task at a time.
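    A sketch of the rate-based scoring described above (all task names, rates and workload sizes here are illustrative assumptions, not measured values):

    ```python
    # Per-task throughput measured while everything runs concurrently
    # (units of work per second), and a fixed reference workload per task.
    # All numbers are made up for illustration.
    measured_rates = {"lame": 2.5, "wme": 1.5, "zip": 4.0}       # units/s under load
    reference_work = {"lame": 100.0, "wme": 90.0, "zip": 200.0}  # units per task

    # Equivalent time each task would need at its measured rate.
    per_task_time = {t: reference_work[t] / measured_rates[t] for t in reference_work}

    # Since the tasks run concurrently, the equivalent completion time
    # for the whole workload is set by the slowest task.
    equivalent_time = max(per_task_time.values())
    # per_task_time: lame 40.0s, wme 60.0s, zip 50.0s -> equivalent_time 60.0s
    ```

    The key property is that the score is derived from sustained rates under load, so it doesn't depend on which task happens to finish last in any particular run.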

    You could probably rejig the process priorities to get better 'Splinter Cell' performance.
  • Viditor - Saturday, December 31, 2005 - link

    Scoring needs to be done on a focused window...
    By doing multiple runs with all of the programs running simultaneously, it's possible to extract a speed value for each of the programs in turn, under those conditions. The cumulative number isn't representative of how long it actually took, but it's more of a "score" on the performance under a given set of conditions.
  • Betwon - Saturday, December 31, 2005 - link

    No! It is time (elapsed time), not a speed value.
    Look:
    24.8s + 13.7s = 38.5s
    42.8s + 42.2s + 46.6s + 65.9s = 197.5s

    AnandTech's method is wrong.
  • Viditor - Saturday, December 31, 2005 - link


    It is time (elapsed time), not a speed value

    It's a score value... whether it's stated in time or on an arbitrary number scale matters very little. The values are still justified...
  • Betwon - Saturday, December 31, 2005 - link

    You don't know how to test,
    but you still say it's correct.

    We all need an explanation from AnandTech.
  • Viditor - Saturday, December 31, 2005 - link


    You don't know how to test

    Then I better get rid of these pesky Diplomas, eh?
    I'll go tear them up right now...:)
  • Betwon - Saturday, December 31, 2005 - link

    I mean: you don't know how AnandTech ran the tests.
    The test methodology.
    What the data is.

    We only need the explanation from AnandTech, not your guesses.

    Because you do not know it!
    You are not AnandTech!
  • Viditor - Saturday, December 31, 2005 - link

    Thank you for the clarification (does anyone have any sticky tape I could borrow? :)
    What we do know is:
    1. All of the tests were started simultaneously..."To find out, we put together a couple of multitasking scenarios aided by a tool that Intel provided us to help all of the applications start at the exact same time"
    2. The two ways to measure are: finding the individual times in a multitasking environment (what I think they have done), or producing a batch job (which is what I think you're asking for) and getting a completion time.

    Personally, I think that the former gives us far more useful information...
    However, neither scenario is more scientifically correct than the other.
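    The two approaches can both be captured in one small harness (a minimal sketch; the sleep-based dummy tasks stand in for the real encode/scan workloads):

    ```python
    import threading
    import time

    # Run two dummy tasks concurrently, recording each task's individual
    # duration (the per-task measurement) and the wall-clock time of the
    # whole batch (the batch-job measurement).
    results = {}

    def task(name, duration):
        start = time.perf_counter()
        time.sleep(duration)  # stand-in for real work
        results[name] = time.perf_counter() - start

    batch_start = time.perf_counter()
    threads = [threading.Thread(target=task, args=("encode", 0.2)),
               threading.Thread(target=task, args=("scan", 0.1))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    batch_time = time.perf_counter() - batch_start

    # batch_time is close to the slowest task's duration, while the sum
    # of the individual times overstates the actual elapsed time.
    ```

    Both numbers come out of the same run; the disagreement in this thread is only about which one to report.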
