The Secret Boost of the Opteron 2224

Socket F Opterons have a small secret weapon: a speed bump offers more than just a faster CPU. To understand this, take a look at the table below. We measured the L2 cache's bandwidth with Lavalys Everest 3.51.

Lavalys Everest 3.51 L2 Bandwidth
CPU                        Read (MB/s)  Write (MB/s)  Copy (MB/s)
Dual Xeon 5160 3.0 GHz     22019        17751         23628
Xeon E5345 2.33 GHz        17610        14878         18291
Opteron 2224 SE 3.2 GHz    14636        12636         14630
Opteron 8218HE 2.6 GHz     11891        10266         11891

The L2 cache of the Opteron 8218 at 2.6GHz is slower than the Core 2's L2 cache at 2.33GHz. At about 10-12 GB/s it barely matches the theoretical peak bandwidth that dual-channel DDR2 at 667MHz can deliver (10.6 GB/s), while its exclusive nature also forces it to exchange quite a bit of data with the L1 cache. Now combine this table with the following one, where we measured memory bandwidth.
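The 10.6 GB/s figure can be reproduced with a little arithmetic. The sketch below is only an illustration: it assumes a dual-channel configuration with 64-bit channels and treats Everest's MB/s as 10^6 bytes per second, neither of which the article states explicitly. The function name is ours.

```python
def ddr_peak_bandwidth_gbs(transfers_per_sec, bus_width_bits=64, channels=2):
    """Theoretical peak bandwidth in GB/s (1 GB = 10^9 bytes).

    Assumes 64-bit channels and dual-channel operation by default.
    """
    bytes_per_transfer = bus_width_bits // 8
    return transfers_per_sec * bytes_per_transfer * channels / 1e9


# DDR2-667 performs roughly 667 million transfers per second per channel.
peak = ddr_peak_bandwidth_gbs(667e6)

# Measured L2 read bandwidth of the Opteron 8218HE from the table above,
# converted from MB/s to GB/s (assuming 1 GB = 1000 MB).
l2_read_8218 = 11891 / 1000

print(f"DDR2-667 dual-channel peak: {peak:.2f} GB/s")
print(f"Opteron 8218HE L2 read:     {l2_read_8218:.2f} GB/s")
print(f"L2 read relative to memory peak: {l2_read_8218 / peak:.2f}x")
```

Under these assumptions the L2 read figure sits only slightly above the theoretical memory peak, which is the point the paragraph makes.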

Lavalys Everest 3.51 Memory Bandwidth
CPU                        Read (MB/s)  Write (MB/s)  Copy (MB/s)  Latency (ns)
Dual Xeon 5160 3.0 GHz     3656         2771          3800         112.2
Xeon E5345 2.33 GHz        3578         2793          3665         114.9
Opteron 2224 SE 3.2 GHz    7466         6980          6863         58.9
Opteron 8218HE 2.6 GHz     6944         6186          5895         64

It is no secret that a higher-clocked integrated memory controller can increase the bandwidth actually delivered by the same DDR2 modules. But it also helps that the faster L2 cache is able to swallow the bandwidth that the memory is capable of delivering. Also notice that without the use of SSE2 instructions, the memory subsystem of the 5000P chipset delivers relatively disappointing bandwidth: roughly half of what the Opterons manage. As most applications do not use carefully tuned SSE2 code to get data from memory, this should reflect the real-world situation most of the time. And of course, until Intel introduces the Nehalem family, memory latency will continue to be one of AMD's strong points.
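To see how much of the speed bump reaches each level of the hierarchy, the read figures of the two Opterons can be compared directly. This is a rough sketch using only the numbers from the two Everest tables. The two chips are different models in different platforms, so the ratios are indicative at best, and the dictionary and function names are ours.

```python
# Figures copied from the two Everest tables (MB/s); clocks in GHz.
opteron_2224 = {"clock": 3.2, "l2_read": 14636, "mem_read": 7466}
opteron_8218 = {"clock": 2.6, "l2_read": 11891, "mem_read": 6944}


def scaling(key):
    """Ratio of the Opteron 2224 SE figure to the Opteron 8218HE figure."""
    return opteron_2224[key] / opteron_8218[key]


for key in ("clock", "l2_read", "mem_read"):
    print(f"{key:>8}: {scaling(key):.3f}x")
```

With these numbers the L2 read bandwidth scales almost exactly with the core clock, while memory read bandwidth improves by a smaller but still measurable margin.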

Processor Latency Comparison (cache and memory latencies in clock cycles, absolute latency in ns)
CPU                                L1  L2  L3   min mem  max mem  Absolute latency (ns)
Xeon 5160 3.0 - DDR2 533           3   14  -    69       380      127
Xeon 5160 3.0 - DDR2 667           3   14  -    67       338      113
Core 2 Duo 2.933 - DDR2 533        3   14  -    67       180      61
Quad Xeon E5345 2.33 - DDR2 533    3   14  -    80       280      120
Quad Xeon E5345 2.33 - DDR2 667    3   14  -    80       271      116
Xeon 7130M 3.2 - DDR2 400          4   29  109  245      624      195
Opteron 880 2.4 - DDR333           3   12  -    84       228      95
Opteron 2224 SE - DDR2 667         3   12  -    72       189      59
Opteron 2218 HE - DDR2 667         3   12  -    62       157      60

The latency penalty that FB-DIMM introduces is huge. To get an idea, we added the latency measured with a Core 2 Duo 2.933GHz using 2x 2GB of 533MHz DDR2. The staggering conclusion is that FB-DIMMs add, in the worst case, about 200 cycles or 66ns of latency (380 versus 180 cycles, 127 versus 61ns). Sure, some of that latency can be attributed to the buffering which is necessary for server memory. Buffered memory contains registers which will actually hold data for one full clock cycle before it's passed on. So this means that registered memory should add about 8ns (2 clock cycles at 266MHz base clock, DDR2-533), which is only a small part of the 66ns we measured.
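The cycle and nanosecond figures can be cross-checked with a simple conversion. The sketch below is illustrative only: it converts the cycle difference at the Xeon's 3.0GHz clock even though the Core 2 Duo runs at 2.933GHz, so the result is an approximation, and the helper name is ours.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a cycle count to nanoseconds at the given clock (GHz)."""
    return cycles / clock_ghz


# Worst-case ("max mem") latency from the table, in CPU cycles:
# Xeon 5160 with DDR2-533 FB-DIMMs versus Core 2 Duo with plain DDR2-533.
extra_cycles = 380 - 180
extra_ns_from_cycles = cycles_to_ns(extra_cycles, 3.0)

# The same gap taken from the "Absolute latency (ns)" column.
extra_ns_absolute = 127 - 61

# Expected cost of registering: 2 memory clock cycles at 266MHz.
register_ns = cycles_to_ns(2, 0.266)

print(f"Extra latency from cycle counts: {extra_ns_from_cycles:.1f} ns")
print(f"Extra latency from ns column:    {extra_ns_absolute} ns")
print(f"Expected registering overhead:   {register_ns:.1f} ns")
```

Both routes land near 66ns, several times the overhead that registering alone would explain.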

The secondary benefit of FB-DIMMs is that motherboards can use more DIMMs per channel, potentially increasing total memory capacity. However, AMD already gets around this quite easily with up to eight DIMM sockets per CPU socket, so this benefit doesn't really materialize in any meaningful form. The bottom line is that while FB-DIMMs were potentially a good idea from a purely theoretical point of view, it is rather obvious that in practice they have some pretty bad consequences.

30 Comments

  • 2ManyOptions - Monday, August 06, 2007

    ... for most of the benchmarks Intel chips performed better than the Opterons, don't know why Intel should get scared from these, they can safely wait for Barcelona. Didn't really understand why you have put it as AMD is still in game with these in the 4S space.
  • baby5121926 - Monday, August 06, 2007

    intel got scared because they dont want to see the real result from AMD + ATI.
    the longer intel lets AMD lives, the more dangerous intel will be.
    that's why you guys can see Intel is attacking AMD really really hard at this meantime... just to kick AMD out of the game.
  • Justin Case - Monday, August 06, 2007

    What are the units in the WinRAR results table?
  • coldpower27 - Monday, August 06, 2007

    Check Intel's own pricing lists, and you will see that Intel has already pre-empted some of these cuts with their Xeon X5355 at $744 or Xeon E5345 at $455 and the "official" Xeon X5365 should be out soon if not already...

    http://www.intel.com/intel/finance/pricelist/proce...
  • TheOtherRizzo - Monday, August 06, 2007

    I know nothing about 4S servers. But what's the essence of this article? Surely not that NetBurst is crap? We've known that for years. Is the real story here that Intel doesn't really give a s*** about 4S, otherwise they would have moved on to the core 2 architecture long ago? Just guessing.
  • coldpower27 - Monday, August 06, 2007

    Xeon 7300 Series based on the Tigerton core, which is a 4 Socket Capable Kentsfield/Clovertown derivative, is arriving in September this year, so Intel does care about becoming more competitive in the 4S space, but it is just taking some time.

    They decided to concentrate on the high volume 2S sector first is all, since Intel has massive capacity, going for the high volume sector first makes sense.
  • mino - Monday, August 13, 2007

    Yes and no, actually to have two intel quads running on a single FSB was a serious technical problem.

    Therefore they had to wait for 4-FSB chipset to be able to get them out the door. Not to mention the qualification times which are a bit longer for 4S platforms than 2S.

    AMD does not have these obstacles as 8xxx series are essentially 2xxx series from stability/reliability POV.
  • Calin - Monday, August 06, 2007

    The 5160 processor is a Core2 unit, not a NetBurst one. Also, the 5345 is a quad core based on Core2.
  • jay401 - Monday, August 06, 2007

    People built 3.0GHz - 3.33GHz E4300 & E4400 systems six months ago that cost roughly $135 for the CPU. Others went for an E6300 or more recently an E6320, both again under $200.
    They were all relatively easy overclocks.

    Why does anyone with any skill in building their own computer care about an $800+ CPU again?
  • Calin - Monday, August 06, 2007

    Why don't Ford Mustangs use a small engine, overclocked to hell? Like an inline 4 2.0l with turbo, and a high rpm instead of their huge 4+ liter engines?
    Why do trucks use those big engines, when they could get the same power from a smaller, gasoline, turbocharged engine?

    People pay $800+ for processors that work in multiprocessor systems (your run of the mill Athlon64 or E4300 won't run). Also, they use error checking (and usually error correcting) memory in their systems - again, Athlon64 doesn't do this. They also use registered DDR in order to access more memory banks - your Athlon64 again falls short. On the E4300 side, the chipset is responsible for those things, so you could use such a processor in a server chassis - if the socket fits.
