Literally Dual Core

One of the major changes with Presler is that, unlike Smithfield, the two cores are not part of the same piece of silicon. Instead, you actually have a single package with two separate dies on it. By splitting the die in two, Intel can reduce total failure rates and be far more flexible with its manufacturing (since one Presler chip is nothing more than two Cedar Mill cores on a single package).


The chip at the bottom of the image is Presler; note the two individual cores.

Intel's architecture, featuring no on-die memory controller, allows for such a split to be made without any major changes.  Even on Smithfield, all traffic between the cores actually had to travel out one core, off the chip and onto the external FSB and then back into the other core.  With Presler, the same type of communication can take place without any disruptions. The only difference is that the data from core to core has a slightly longer distance to travel. 

To find out whether there was an appreciable increase in core-to-core communication latency, we used a tool called Cache2Cache, which Johan first used in his series on multi-core processors. Johan's description of the utility follows:
"Michael S. started this extremely interesting thread at the Ace's Hardware Technical forum. The result was a little program coded by Michael S. himself, which could measure the latency of cache-to-cache data transfer between two cores or CPUs. In his own words: 'it is a tool for comparison of the relative merits of different dual-cores.'

"Cache2Cache measures the propagation time from a store by one processor to a load by the other processor. The results that we publish are approximately twice the propagation time. For those interested, the source code is available here."
Armed with Cache2Cache, we looked at the added latency seen by Presler over Smithfield:

Cache2Cache Latency in ns (Lower is Better)

Processor                   Latency (ns)
AMD Athlon 64 X2 4800+      101
Intel Smithfield 2.8GHz     253.1
Intel Presler 2.8GHz        244.2

Not only did we not find an increase in latency between the two cores on Presler, but communication actually occurs faster than on Smithfield. We made sure that it had nothing to do with the faster FSB by clocking the chip at 2.8GHz with an 800MHz FSB and repeating the tests, only to find consistent results.

We're not sure why core-to-core communication is faster on Presler than on Smithfield. That being said, a difference of less than 9ns just isn't going to be noticeable in the real world - given that we've already seen that the Athlon 64 X2's roughly 100ns latency doesn't really help it scale better when going from one to two cores.
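To put the numbers from the table above in perspective, here is a minimal Python sketch of the arithmetic. It assumes the published figures are exactly twice the one-way propagation time (the utility's description only says "approximately"), and the names reported_ns, one_way_ns and ns_to_cycles are ours, used purely for illustration.

```python
# Latencies reported in the table above, in nanoseconds.
reported_ns = {
    "AMD Athlon 64 X2 4800+": 101.0,
    "Intel Smithfield 2.8GHz": 253.1,
    "Intel Presler 2.8GHz": 244.2,
}

CLOCK_GHZ = 2.8  # both Intel chips were tested at 2.8GHz


def one_way_ns(reported):
    # Assumption: "approximately twice the propagation time" is
    # treated as exactly twice, so the one-way time is half.
    return reported / 2.0


def ns_to_cycles(ns, clock_ghz):
    # A clock of N GHz completes N cycles per nanosecond.
    return ns * clock_ghz


gap_ns = reported_ns["Intel Smithfield 2.8GHz"] - reported_ns["Intel Presler 2.8GHz"]
gap_cycles = ns_to_cycles(gap_ns, CLOCK_GHZ)

for name, ns in reported_ns.items():
    print(f"{name}: reported {ns:.1f} ns, one-way about {one_way_ns(ns):.1f} ns")
print(f"Smithfield minus Presler: {gap_ns:.1f} ns, about {gap_cycles:.0f} cycles at {CLOCK_GHZ}GHz")
```

Under those assumptions, the gap between the two Intel chips works out to a couple of dozen core clock cycles, which is small next to the gap between either Intel chip and the Athlon 64 X2.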

84 Comments


  • yacoub - Tuesday, January 3, 2006 - link

    quote:

    The Athlon 64 X2 4800+ actually is faster in the Splinter Cell: CT benchmark without anything else running, but here we see a very different story. Although its 66 fps average frame rate is reasonably competitive with the Presler HT system, its minimum frame rate is barely over 10 fps - approximately 1/3 that of the Presler HT.


    Yet no mention of the Max, where the 4800+ utterly trounces the two Intel chips. Does Max not matter (in which case why bother listing it), or does it matter but you just neglected to mention that (whether on purpose or by accident)?
  • jjunk - Tuesday, January 3, 2006 - link

    quote:

    Yet no mention of the Max, where the 4800+ utterly trounces the two Intel chips. Does Max not matter (in which case why bother listing it), or does it matter but you just neglected to mention that (whether on purpose or by accident)?


    It's right there in the chart. As for further discussion, it's not really necessary. Screaming frame rates might look good on the chart but they don't help game play. A 10 fps min will definitely be noticeable.
  • IntelUser2000 - Sunday, January 1, 2006 - link

    quote:

    When we do receive the new motherboard, we will take a look at power consumption once more to get an idea of the final state of Intel's 65nm power consumption, but until then, we don't want to draw any conclusions based on what we've seen.

    I don't like that paragraph. It makes it sound like 65nm will be all that makes Presler in power consumption. It will also make people judge 65nm based on Presler, since that's the first CPU on the 65nm.

    In fact it's not that simple. Taking a CPU that's on a certain process like the Smithfield and putting it on a smaller process won't mean an instant 40-50% decrease in power consumption. That's called the dumb shrink. The reason Northwood had significantly lower power than Willamette was because Northwood was optimized to lower power consumption.

    A CPU that runs well at 130nm may do badly at 90nm and even worse at 65nm for example. Presler was said to be not Intel's main focus and Intel moved their design teams to Conroe, so the people who were supposed to be optimizing Presler for 65nm all went away and Presler just got a dumb shrink.

    Sleep transistor was an optional feature on 65nm, not required. So Presler may not have it.
  • IntelUser2000 - Monday, January 2, 2006 - link

    Why use DDR2-667 with 5-5-5-15 timings?? Most DDR2-667 can do 4-4-4-8(around there). This is gonna skew the results in AMD's favor as DDR400 used is the lowest latency possible.

    In reality nobody is gonna use DDR400 at 2-2-2-7 latency or DDR2-667 at 4-4-4-8 latency. Nobody I have ever heard of outside the internet uses RAM at those timings.

    Anandtech should either benchmark them all at JEDEC timings or use them all with low latency. I understand they want to be sure the new test system to work properly, but using low latency RAM for the comparison system is just not fair.

    JEDEC timings for DDR400 is 3-3-3-8. Where are your DDR400 advantage over DDR2 now??
  • hans007 - Sunday, January 1, 2006 - link

    i think that the 9xx series is a big improvement over the 8xx.

    i have an 8xx myself the 820 which is the lowest power. the leakage is exponential so the 955 is going to draw a much higher amount than say a 920 will.

    i bet the 920 will be a half decent cpu drawing maybe only 70 watts. which isnt TOO terrible in the grand scheme of power. the 920 would only run at 2.8 ghz and have not as high leakage percentage so i think it will be the one to get.

    true intel is not better yet, but they are getting there. and their dual cores still cost less.

    i also think that intel should be commended for writing the smp code for q4. that is the doom3 engine which will go into a LOT of games. and since it speeds up the amd chips as well, it is a free upgrade for everyone. sure it makes up for a large deficiency in the intel chips, but it is FREE.

    and it makes the really cheap 920/820 chips very price competitive. as the 820 chips are very very cheap about $150 on ebay (which is probably near what oems get them for in bulk, this the rampant dell 820 deals going on)
  • jjmcwill - Saturday, December 31, 2005 - link

    I do professional software development for a living, using Visual Studio 2003 to build the code for a product I work on. We have over 1000 .cpp files and over 1500 header files.

    On my work box: An HP xw6200 workstation with a single 3.0GHz Xeon CPU, 2MB L2 cache, 1G RAM, compilation takes 10:45 for a single project in our solution. On my home system: Socket 754 Athlon 64 3000+, 1.5G RAM, compilation takes 7:30. Both systems build the code off of the exact same, external ide hard drive in a Firewire enclosure. I use it to carry all my work back and forth between work and home.

    At some point we'll be investigating Make to launch parallel compiles, and I would be VERY interested in seeing dual-core CPU comparisons which include compilation benchmarks, using Visual Studio 2003 under Windows, using Make -j2 or Make -j3 under windows, and using gcc/make under Linux.

    Based on what I've seen with the Xeon, I'm leaning toward an AMD X2 or dual core Opteron for my next upgrade.


    Thanks.

  • Calin - Tuesday, January 3, 2006 - link

    I think that an Extreme Edition CPU (while much more expensive) would give better results with hyperthreading enabled than a simple Pentium D and maybe even than an Athlon64 X2 while doing several threads of compile.
  • Brian23 - Saturday, December 31, 2005 - link

    The second valuable post in this thread.

    I own a X2 3800 and I'm pleased with the results anand posted. I won't need to upgrade for a while.

    I'm looking forward to AMD implementing something similar to Sun's design: multiple threads running simultaneously. It shouldn't be that hard to do. It's just adding GPRs and a little logic that controls the thread contexts.
  • Missing Ghost - Saturday, December 31, 2005 - link

    Some other web sites report that the cpu becomes too hot with the stock heatsink.
  • Gary Key - Saturday, December 31, 2005 - link

    quote:

    Some other web sites report that the cpu becomes too hot with the stock heatsink.


    The initial press release kits that contained the Intel D975XBX motherboard had an issue that created higher than normal idle/load temperatures. We have new boards on the way from Intel. I can promise you that the first results shown in other 955EE reviews do not occur on the 975x boards from Gigabyte and Asus, nor will it occur on the production release Intel D975XBX. I highly recommend a different air cooling system than the stock heatsink but most of the reported results at this time are incorrect.
