Architecture Summary


Woodcrest's home is a newer revision of the Bensley platform than the one Dempsey launched with, which makes it a drop-in part for newer Bensley-based systems. If all goes to plan, Clovertown (Quad-Core Xeon) should be a drop-in upgrade as well (depending on the system vendor). As we discussed in our Dempsey article, the Bensley platform features FB-DIMM with a peak bandwidth of 21GB/sec, SAS/SATA support and a 1066/1333MHz FSB.

Woodcrest Highlights:

Shared 4MB L2 "Smart Cache"
Dempsey-based processors had a separate 2MB L2 cache for each core, but Woodcrest has 4MB of L2 cache shared between both cores. Because the cores share a single cache, data does not have to be replicated as it is with separate L2 caches, which makes data sharing between cores more efficient. The shared cache also helps with mismatched loads: when one core consistently uses more cache than the other, the CPU can allocate more of the L2 cache to that core. Both of these techniques are illustrated below.
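To put rough numbers on those two benefits, here is a minimal toy model. It is a sketch, not a description of how the hardware manages its cache: the function names and the 1MB of shared data are hypothetical, and the model only counts how much unique data fits in L2.

```python
def unique_data_capacity_mb(total_l2_mb, shared_data_mb, shared_cache):
    """Unique data (in MB) that fits in L2 across two cores (toy model).

    shared_data_mb is data that both cores use. With split caches each core
    keeps its own copy; with a shared cache a single copy serves both.
    """
    if shared_cache:
        return total_l2_mb
    per_core_mb = total_l2_mb / 2
    if shared_data_mb > per_core_mb:
        raise ValueError("shared data does not fit in one split cache")
    private_mb = per_core_mb - shared_data_mb   # left over in each split cache
    return shared_data_mb + 2 * private_mb      # one unique copy + private data


def max_single_core_share_mb(total_l2_mb, shared_cache):
    """Largest slice of L2 that one busy core can occupy (toy model)."""
    return total_l2_mb if shared_cache else total_l2_mb / 2


# 4MB of L2 in total, with a hypothetical 1MB of data used by both cores.
split = unique_data_capacity_mb(4, 1, shared_cache=False)
shared = unique_data_capacity_mb(4, 1, shared_cache=True)
print(f"split 2 x 2MB : {split} MB of unique data")
print(f"shared 4MB    : {shared} MB of unique data")
```

In this model the split design loses 1MB to the duplicate copy, and a single busy core can never use more than its own 2MB, while the shared design lets it grow toward the full 4MB.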



Wide Dynamic Execution Enhancements
With the Intel Core micro-architecture, every execution core is 33% wider than previous generations, allowing each core to fetch, dispatch, execute and retire up to four full instructions simultaneously. The Opteron - as well as all previous NetBurst Xeon processors - can only handle three at a time.
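The 33% figure follows directly from going from three instructions per cycle to four. The short sketch below works through that arithmetic; the helper names are ours, and the cycle count is a best-case lower bound that ignores dependencies, cache misses and branch mispredictions.

```python
def relative_width_gain(new_width, old_width):
    """Fractional increase in peak instructions per cycle."""
    return new_width / old_width - 1


def min_cycles(n_instructions, width):
    """Best-case cycles to retire n instructions at a given width."""
    return -(-n_instructions // width)  # integer ceiling division


print(f"width gain: {relative_width_gain(4, 3):.0%}")
print("1200 instructions, 3-wide:", min_cycles(1200, 3), "cycles")
print("1200 instructions, 4-wide:", min_cycles(1200, 4), "cycles")
```

Real code rarely sustains the peak width, so this is an upper bound on the benefit rather than an expected speedup.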

Macro Fusion
Macro-fusion combines certain common pairs of x86 instructions into a single instruction for execution. Without macro-fusion, four instructions at a time are fetched from the queue and each instruction is decoded into separate micro-ops. With macro-fusion, five instructions can be fetched at a time, and if a fusible pair is present it can be sent to a single decoder. A single micro-op can then represent two regular x86 instructions.
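The idea can be sketched with a toy decoder. This is an illustration under stated assumptions, not Intel's decode logic: we assume the fusible pair is a compare (cmp or test) immediately followed by a conditional jump, and we pretend every other instruction decodes to exactly one micro-op. The names `FUSIBLE_FIRST`, `is_conditional_jump` and `decode` are hypothetical.

```python
# Assumption: a compare followed by a conditional jump is the fusible pair.
FUSIBLE_FIRST = {"cmp", "test"}


def is_conditional_jump(mnemonic):
    """True for jcc-style mnemonics such as jne or jl, but not plain jmp."""
    return mnemonic.startswith("j") and mnemonic != "jmp"


def decode(instructions, macro_fusion=True):
    """Turn a list of mnemonics into micro-ops (toy: one micro-op each)."""
    micro_ops = []
    i = 0
    while i < len(instructions):
        current = instructions[i]
        following = instructions[i + 1] if i + 1 < len(instructions) else None
        if (macro_fusion and current in FUSIBLE_FIRST
                and following is not None and is_conditional_jump(following)):
            micro_ops.append(f"{current}+{following}")  # one fused micro-op
            i += 2
        else:
            micro_ops.append(current)
            i += 1
    return micro_ops


window = ["mov", "add", "cmp", "jne", "mov"]   # five fetched instructions
print(decode(window, macro_fusion=False))
print(decode(window, macro_fusion=True))
```

With fusion enabled, the five-instruction window produces four micro-ops instead of five, which is how a four-wide core can consume five x86 instructions in one pass.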



Beyond 2 Sockets, is Intel's FSB still an Achilles Heel?

As we've seen in past benchmarks, the front side bus has been a thorn in Intel's side, especially in quad socket systems. Whether the architectural changes that Intel has made with Woodcrest will relieve enough of that pressure to overcome the scalability of Opteron in four socket configurations is unknown at this point. Intel is quite confident that, with the shared cache and its dual independent FSBs running at 1333MHz, bus bandwidth is not a concern; at some point, however, the bus bottleneck will be a problem. One of Intel's architects has stated that an integrated memory controller is possible, and Intel has already shown us a demo of one.
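A back-of-the-envelope calculation shows why core count matters here. The sketch below assumes a 64-bit (8-byte) data bus and treats "1333MHz" as 1333 million transfers per second; neither detail is stated above, and the function names are ours. It gives peak figures only, not sustained throughput.

```python
def fsb_peak_gb_per_s(mega_transfers_per_s, bus_width_bytes=8):
    """Peak FSB bandwidth in GB/s (assumes an 8-byte-wide data bus)."""
    return mega_transfers_per_s * 1e6 * bus_width_bytes / 1e9


def peak_per_core_gb_per_s(mega_transfers_per_s, cores_on_bus):
    """Peak bandwidth per core if all cores on one bus share it evenly."""
    return fsb_peak_gb_per_s(mega_transfers_per_s) / cores_on_bus


one_bus = fsb_peak_gb_per_s(1333)
print(f"one 1333 FSB         : {one_bus:.2f} GB/s")
print(f"two independent FSBs : {2 * one_bus:.2f} GB/s")
print(f"per core, 2 cores/bus: {peak_per_core_gb_per_s(1333, 2):.2f} GB/s")
print(f"per core, 4 cores/bus: {peak_per_core_gb_per_s(1333, 4):.2f} GB/s")
```

Under these assumptions, doubling the cores behind each bus halves the peak bandwidth available to each core, which is the pressure a shared cache is meant to relieve.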


59 Comments


  • peternelson - Thursday, July 13, 2006 - link

    Agreed!

    I'm not interested in 32 bit performance.

    If you're gonna be spending this much money on an upmarket system you better be running it in 64 bit mode. I know I will.

    So if Opteron benches better than Woodcrest in 64 bit mode that changes the equation for me.

    Also, isn't the Opteron 290 out any time now? That would close the % gap a little because of the higher clock speed.

    Also S1207 Opterons will be here 1st August. The new nforce5 based pro/server chipsets might give a little boost over existing ones too, as could the bandwidth boost and lower power of DDR2.
  • defter - Thursday, July 13, 2006 - link

    quote:

    I'm not interested in 32 bit performance.


    Check the review, this (and previous Linux review) uses only 64bit software.
  • Kiijibari - Thursday, July 13, 2006 - link

    Oh indeed, x64 Windows was used.

    Good to see, then, that Woodcrest doesn't take a 64-bit penalty. Maybe the Linux application from my source uses FP calculations and no SSE, or it is due to the compiler, which may favour Intel on Windows. Anyway, 64-bit is a big "playing" field for benchmarks; looking forward to reading more ;-)

    cheers

    Kiijibari
  • Calin - Thursday, July 13, 2006 - link

    Slightly better performance and slightly lower power consumption. Looks like you have a winner for new servers.
    However, for a Fortune 500 company, there are other things much more important than slightly better performance and slightly lower power consumption.
  • JarredWalton - Thursday, July 13, 2006 - link

    After the poor showing of NetBurst Xeons against Opteron, I'd think any Dell shops would be thrilled to regain the performance crown. Also, frankly, a 5-10% lead is about all most things get you these days, especially when I/O and everything else comes into play. The Woodcrest systems have better overall CPU performance, but it often isn't that important when working on massive databases.

    Incidentally, from what we've seen of Conroe, it seems like Intel could release Core chips at up to 3.4-3.6 GHz without difficulty right now. Rather surprising, given the 14 stage pipeline vs. 39 for Prescott.
  • FesterOZ - Thursday, July 13, 2006 - link

    Actually it's not a big thrill at all. One of the major pushes at the firm is to consolidate onto VMware-based servers or larger raw servers, and in all cases to stop the traditional one-server-per-application approach that seems to affect most firms. Therefore we are more focused on 4 socket, 8 core style servers (i.e. HP BL45 blades) than on 2 socket blades. We had all the top level Dell executives coming in trying to convince us to stay with Dell, because at this time they have no answer for the larger server (the Oct/Nov timeframe for the Dell 4 socket AMD server is too far out). So in the short term we will be a hybrid Dell/HP shop. Maybe we will shift back if Dell's commitment to AMD indeed ramps as expected.
  • Dubb - Thursday, July 13, 2006 - link

    I doubt this has much practical use, but I am nonetheless curious...could you "pinmod" a Clovertown to run 1333 FSB?

    might make for some speedy rendering if it was stable.
  • Kiijibari - Thursday, July 13, 2006 - link

    quote:

    I doubt this has much practical use, but I am nonetheless curious...could you "pinmod" a Clovertown to run 1333 FSB?
    No, that is not possible. Don't you think that Intel would release it if it were possible? Just think about it: lowering the FSB bandwidth on the 4 core part doesn't make sense, does it? 4 cores are much more bandwidth hungry than 2 ..

    The reason is the "Intel bolt-together" architecture. The 4 core part is just 2 dies in one package, thus it will have twice the bus load of a single (dual core) CPU. Intel already did the same with the NetBurst dual cores, hence you have the same FSB limitations there.

    All in all it is a little bit odd. Clovertown/Kentsfield performance increases will be much less than linear, but Intel has the advantage of time to market vs. the AMD K8L quad core. Though AMD's QC design looks much more sound, I expect Intel to be first to release a quad core CPU.

    cheers

    Kiijibari
  • Dubb - Thursday, July 13, 2006 - link

    okaaaayyy...

    Clovertown's platform supports 1333, and the Kentsfield ESs run 1333 easily. most Clovertowns probably CAN, it's just a question of whether the 1066>1333 pinmod some have suggested for Dempsey or 1066 Woodcrests actually works, and if so, Clovertowns might be an interesting application of it.

    I'm just curious, is'all.
  • Kiijibari - Thursday, July 13, 2006 - link

    It may work; however, that kind of overclocking is more dangerous than normal overclocking. It is easy to overclock a chip that runs at, let's say, 2 GHz when there is e.g. a 3 GHz top model. Chances are good that yields are good, so your 2 GHz model may be able to run faster: if most chips pass the 2.6 GHz tests, yours was simply binned down to 2 GHz.

    However, with FSB1066 vs. FSB1333 I assume that you are playing around at the absolute maximum. Intel would do everything to raise FSB speeds, especially with quad cores. It is nonsense, from the performance point of view, to decrease the available bandwidth while the number of bandwidth consumers (i.e. cores) increases.

    It might boot & work with FSB1333 though, but Intel can't and won't guarantee that. It may be good enough for Super Pi or other "fun stuff", but if you run I/O intensive applications, cross your fingers and be prepared for data corruption.

    bb

    Alex
