Intel's move to their 65nm process has gone extremely well.  We've had 65nm Presler, Cedar Mill and Yonah samples for the past couple of months now and they have been just as good as final, shipping silicon.  Just a couple of months ago, we previewed Intel's 65nm Pentium 4, showcasing its reduction in power consumption and taking an early look at the chips' overclocking potential.

Intel's 65nm Pentium 4s will be the last Pentium 4s to come out of Santa Clara.  While we'd strongly suggest waiting to upgrade until we've seen what Conroe will bring us, there are those who can't wait another six months.  For those who are building or buying systems today, we need to find out if Intel's 65nm Pentium 4 processors are any more worthwhile than the rather disappointing chips that we had at 90nm.

The move to 90nm for Intel was highly anticipated, but it could not have been any more disappointing from a performance standpoint.  In a since-abandoned quest for higher clock speeds, Intel brought us Prescott at 90nm with its 31-stage pipeline - up from 20 stages in the previous generation Pentium 4s.  Through some extremely clever and effective engineering, Prescott actually wasn't any slower than its predecessors, despite the increase in pipeline stages.  What Prescott did leave us with, however, was a much higher power bill.  Deeply pipelined processors generally consume a lot more power, and Prescott did just that.

Intel tried to minimize the negative effects of Prescott as much as possible through technologies like their Enhanced Intel SpeedStep (EIST).  However, at the end of the day, the fastest Athlon 64 consumed less power under full load than the slowest Prescott at idle.  Considering that most PCs actually spend the majority of their time idling, this was truly a letdown from Intel. 

With 65nm, the architecture of the chips won't change at all - in fact, the single-core 65nm Pentium 4s based on the Cedar Mill core will be identical to the current Pentium 4 600 series that we have today (with the inclusion of Intel's Virtualization Technology).  So with no architectural changes, the power consumption at 65nm should be lower than at 90nm.  As we found in our first article on Intel's 65nm chips, power consumption did indeed go down quite a bit; however, it's still not low enough to be better than AMD.  It will take Conroe before Intel can offer a desktop processor with lower power consumption than AMD's 90nm Athlon 64 line. 

In an odd move, just before the end of 2005, Intel is introducing their first 65nm processor.  Not the Cedar Mill based Pentium 4 and not even the Presler based Pentium D, but rather the Presler based Pentium Extreme Edition 955. 

The Presler core is Intel's dual-core 65nm successor to Smithfield, which as you will remember was Intel's first dual-core processor.  Presler does actually offer one architectural improvement over Smithfield and that is the use of a 2MB L2 cache per core, up from 1MB per core in Smithfield.  Other than that, Presler is pretty much a die-shrunk version of Smithfield. 

With 2MB cache on each core, the transistor count of Presler has gone up a bit.  While Smithfield weighed in at a whopping 230M transistors, Presler is now up to 376M.  The move to 65nm has actually made the chip smaller at 162 mm², down from 206 mm².  With a smaller die size, Presler is actually cheaper for Intel to make than Smithfield, despite having twice the cache.  Equally impressive is that Cedar Mill, the single core version, measures in at a meager 81 mm².
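To put the transistor and die-size figures in perspective, here is a minimal Python sketch that works out the implied transistor density.  It uses only the counts and areas quoted above; the helper name `density` is ours, purely for illustration, and the result says nothing about how Intel itself accounts for density.

```python
# Figures quoted in the article: (transistors in millions, die area in mm^2).
chips = {
    "Smithfield (90nm)": (230, 206),
    "Presler (65nm)": (376, 162),
}

def density(transistors_m, area_mm2):
    """Millions of transistors per square millimetre (illustrative helper)."""
    return transistors_m / area_mm2

for name, (transistors_m, area_mm2) in chips.items():
    print(f"{name}: {density(transistors_m, area_mm2):.2f}M transistors/mm^2")

# How much denser is Presler than Smithfield?
ratio = density(*chips["Presler (65nm)"]) / density(*chips["Smithfield (90nm)"])
print(f"Presler vs. Smithfield density: {ratio:.2f}x")
```

By these numbers, Presler packs roughly twice as many transistors into each square millimetre as Smithfield, which is how the die can shrink even though the cache doubled.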

The Extreme Edition incarnation of Presler brings back support for the 1066MHz FSB, which you may remember was lost with the original move to dual-core.  Given that both cores on the chip have to share the same bus, more FSB bandwidth will always help performance.

The Pentium Extreme Edition 955 runs at 3.46GHz (1066MHz FSB), thus giving it a clock speed advantage over all of Intel's other dual-core processors.  And as always, the EE chip offers Hyper-Threading support on each of its two cores, allowing the chip to handle a maximum of four threads at the same time.  Since it's an Extreme Edition chip, the 955 will be priced at $999.  If you're curious about the cheaper, non-Extreme versions of Presler, here is Intel's 65nm dual-core roadmap for 2006:

Intel Dual Core Desktop

CPU    Core       Clock     FSB       L2 Cache
???    Conroe     ???       ???       4MB
???    Conroe     ???       ???       2MB
950    Presler    3.4GHz    800MHz    2x2MB
940    Presler    3.2GHz    800MHz    2x2MB
930    Presler    3.0GHz    800MHz    2x2MB
920    Presler    2.8GHz    800MHz    2x2MB

As you can see, the Extreme Edition 955 will be the first, but definitely not the only dual-core 65nm processor out in the near future, so don't let the high price tag worry you. The remaining 900 series Pentium D chips should come with prices much closer to the equivalent 800 series.

84 Comments

  • Betwon - Saturday, December 31, 2005

    NO, 2. is wrong.

    We need to know the end time of all tasks.

    The sum of each task's time will mislead.

    Because it can not show the real time spend to complete those tasks. (Time is overlayed)
  • Viditor - Saturday, December 31, 2005

    quote:

    The sum of each task's time will mislead

    That's what I thought you meant...it's not misleading to me (nor to most of the other readers I gather, since nobody else has come forward). If you want to know the time to complete all tasks, then just take the largest time number of whatever test you wish.
    The reason that the setup they used appeals to me is that it helps me understand how an individual application is affected under those conditions, and the totals give me a relative picture of each of the apps as a whole. They haven't said that the time listed in the "Total" is actually how long things took in reality, they said it was the total of the times.
    I understand that the difference in those two phrases is perhaps a difficulty that many have when understanding a foreign language...

    In the future, you might want to be less confrontational about your questions...
    Phrases like "There are still many knowledge about CPU that anandtech need to learn" are considered quite inhospitable...
  • fitten - Saturday, December 31, 2005

    No. What is being mentioned here is "Wall Clock Time" vs. summation of execution times. You start a stopwatch at the instant you start your task bundle and when the last task in the bundle is finished, you stop your stopwatch. That's the wall clock time. Summing CPU utilization times is quite easily seen to be misleading. With two CPUs, two tasks may take 20s each to finish, but they may start and finish at the same time after 20s of wall clock time... not 20s + 20s = 40s (each task will see 20s of CPU utilization time, but those sets of 20s are simultaneously used... 20s on one CPU and 20s on the other CPU at the same time - for a wall clock finish time of 20s, not 40s).

    And, you cannot simply take the largest time number. For example, suppose a task that runs for 1s is blocked by a second task which takes 10s, and then the first task takes another 1s to finish. While 10s is larger than 2s, the wall clock time for this bundle is actually 12s (1s + 10s + 1s), not 10s or 2s.
  • bldckstark - Monday, January 2, 2006

    Ummmmm, all of the times you are screaming about are listed. You can work it out for yourself. Although, when you look at the concurrent timing for each app, you will find that the AMD posted a better score. Concurrent timing results -
    AMD 4800+ - 65.9s
    955EE No HT - 83.3s
    955EE With HT - 71.1s

    Consecutive times of course show a different picture, and most of all, SPCC is a wreck during all of this for AMD.

    I have to say, I can't remember when I last opened 4 huge memory and CPU hogging programs at exactly the same time that I tried to play a game. These CPUs may be great at doing this many activities at once, but I can only do one thing at a time. Each of these programs would be started separately, and when they are on their way, I might start gaming. This is a great test, but not realistic.

  • Betwon - Friday, December 30, 2005

    Your test of the SMP game --Quake4
    Your result is different from the result of the more detailed test from FiringSquad.
    http://www.firingsquad.com/hardware/qua...-core_pe...

    We find that both HT and multi-core will improve the fps. P4 540 HT is about 1x % improvement.

    We need your explanation. Why do you say that HT will not help in the SMP game --Quake4?

    And we do not find that AthlonX2 have the more excellent improvement than PD, when they work (change from single-core-work to multi-core-work).

    Where are the benefits of on-die communication? 101ns latency? Why is it slower than the latency of the memory? Is your cache2cache test software wrong?

    The test shows that
    SMPon/SMPoff PD840 102.9 fps/74.8 fps --> 37.6% improvement
    SMPon/SMPoff X2 3800+ 101.1 fps/74.4 fps --> 35.9% improvement
    SMPon/SMPoff X2 4800+ 103.2 fps/87.7 fps --> 17.7% improvement
    AMD test:
    http://www.firingsquad.com/hardware/qua...al-core_...
    Intel test:
    http://www.firingsquad.com/hardware/qua...-core_pe...

    The improvement ratio of PD is better than that of athlonX2.
  • psychobriggsy - Saturday, December 31, 2005

    > SMPon/SMPoff PD840 102.9 fps/74.8 fps --> 37.6% improvement
    > SMPon/SMPoff X2 3800+ 101.1 fps/74.4 fps --> 35.9% improvement
    > SMPon/SMPoff X2 4800+ 103.2 fps/87.7 fps --> 17.7% improvement

    Looks like the issue is an upper performance limit around the 103 fps mark that probably isn't caused by the CPU - e.g., GPU or something else.

    If it is a memory bandwidth issue (which should be easy to test for by using faster memory and running the tests again) then there isn't much that can be done. Then again, the Intel processor uses DDR2 so ...

    If the 4800+ improved by 36% like the 3800+ then it would achieve around 120fps.

    In the end it just shows that the lower-priced dual-cores are still a better deal ... especially as they can be overclocked quite nicely.
  • Viditor - Friday, December 30, 2005

    quote:

    The improvement ratio of PD is better than that of athlonX2.

    I would hope so, since the patch was partially written by Intel...
    quote:

    the 1.0.5 patch mentions Intel by name as a collaborator with no word on AMD...While it isn’t optimized for AMD64, frame rates on a dual-core Athlon 64 X2 3800+ are 63 percent faster at 800x600 with threading enabled. The 4800+ also feeds back good gains

    http://firingsquad.com/hardware/quake_4_dual-core_...
  • Betwon - Saturday, December 31, 2005

    PD840 139.1fps/83fps --> 67.6%
    PD840 are 67.6 percent faster at 800x600 with threading enabled.

    67.6% > 63%

    Patch was partially written by Intel...?
    But the patch is very excellent!

    This patch is the biggest improvement of any game patch for SMP CPUs.
    We cannot find another SMP game patch that improves game performance so much.

    Good quality of the codes!
  • Viditor - Saturday, December 31, 2005

    quote:

    But the patch is very excellent!

    Possibly, but Intel is well known for creating an imbalance in performance for their processors using software (e.g. the Intel Compiler). Most likely, future versions of the patch will correct for this. Either way, it really says less about the CPU than it does the patch...
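
The timing disagreement in the thread above comes down to three different numbers: the sum of per-task times, the longest single task, and the wall clock time for the whole bundle.  The Python sketch below models the two scenarios described in the comments (two 20s tasks running side by side, and a 1s + 10s + 1s blocked sequence).  The interval representation and the function names `cpu_time` and `wall_clock` are assumptions made for illustration; this is not how the review's benchmark was actually timed.

```python
# Each task is a list of (start, end) intervals, in seconds, during which it runs.

def cpu_time(intervals):
    """Total time a single task spends executing."""
    return sum(end - start for start, end in intervals)

def wall_clock(tasks):
    """Stopwatch time: from the first task starting to the last one finishing."""
    starts = [start for intervals in tasks.values() for start, _ in intervals]
    ends = [end for intervals in tasks.values() for _, end in intervals]
    return max(ends) - min(starts)

# Scenario 1: two 20s tasks on two CPUs, running simultaneously.
parallel = {"task1": [(0, 20)], "task2": [(0, 20)]}

# Scenario 2: a short task runs 1s, is blocked for 10s by a long task,
# then runs another 1s to finish.
blocked = {"short": [(0, 1), (11, 12)], "long": [(1, 11)]}

for name, tasks in (("parallel", parallel), ("blocked", blocked)):
    times = [cpu_time(intervals) for intervals in tasks.values()]
    print(f"{name}: sum={sum(times)}s, longest={max(times)}s, "
          f"wall clock={wall_clock(tasks)}s")
```

In the parallel case the sum (40s) overstates the 20s wall clock time, while in the blocked case the longest task (10s) understates the 12s wall clock time, so neither shortcut is reliable on its own.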
