Although the 45 nm quad-core Opteron was the best server CPU at launch, a few months later AMD’s success was washed away by a tsunami called “Nehalem”. The Nehalem architecture combined subtle tweaks to an already superior integer engine with brute-force tactics such as a triple-channel integrated memory controller. The IMC delivered low latency and massive amounts of bandwidth thanks to the highest-clocked DDR-3 DIMMs. But that was not enough for Intel’s ambitious engineers. They added Simultaneous MultiThreading (SMT), and this was the final blow to any competition left standing in the server market. SMT, or Hyper-Threading as Intel calls it, boosted performance by 30% or more in key applications such as SAP, Oracle and MS SQL Server. The end result is that the current Xeon outperforms AMD’s best CPUs by 60 to 85%! That is historic: Intel has not had such a commanding lead since AMD entered the market with its Athlon MP.

One could start debating some of the details of these benchmarks, but that would mostly be splitting hairs. Yes, these scores were obtained with DDR3-1333, while the vast majority of X55xx servers are equipped with DDR3-1066. And yes, the fastest Xeons consume about 20W more per CPU than the “Shanghai” Opterons, so to compare within the same power range, you should look at the E5540 at 2.53 GHz. But even with DDR3-1066 and at 2.53 GHz, the latest Xeon would, by a rough estimate, outperform AMD’s best quad-cores by 40 to 70%. The lead is even larger in bandwidth-intensive applications. Only in the fairly rare dense matrix applications, of which Linpack is the best-known benchmark, can AMD still make a case: it can deliver the same number of gigaflops at lower power consumption and a lower price. Nice, but we are talking about roughly 1% of the applications on the market. The other ray of hope for AMD was the competitive performance that the Opteron 2389 at 2.9 GHz delivered on ESX 3.5 in our virtualization benchmark, vApus Mark I. But with ESX 4.0, the Xeon “Nehalem” should widen the gap again, thanks to better Hyper-Threading support and the fact that EPT is fully supported in the latest ESX hypervisor. AMD’s next-generation CPU is scheduled to appear in 2012, so it looks like AMD will have to leave the high-end and midrange server CPU market to Intel. Unless…

Ever since the introduction of its 45 nm CPUs, AMD has been executing very well. So well, in fact, that it reminds us of the K75 days. You might remember how AMD introduced the “K75” in 250 nm in October 1999 and sped up the “x86-Alpha” to 1 GHz in March 2000, only five months later. It has been 10 years since AMD executed this well. Only six months after the successful launch of its 45 nm quad-core, AMD is rolling out its hex-core “Istanbul” at 2.6 GHz, well ahead of schedule. It is basically a “Shanghai” Opteron with two extra cores and a slightly tweaked memory controller. What is more impressive, though, is that AMD can launch a hex-core at 2.6 GHz today that consumes only a few watts more than the six-month-older quad-core at 2.7 GHz. Well done, AMD. But should IT professionals care about AMD’s new six-core? In which applications does it make sense to consider an “Istanbul”-based server? Are two extra cores enough to bring AMD’s Opteron back onto the spec sheet of your next high-performance server?

Do Six Cores Make Sense?

The question is not theoretical. When Intel launched its hex-core “Dunnington”, quite a few applications did not make good use of it. Quad-socket “Istanbul”-based servers will face the same problems as “Dunnington”: some server applications prefer a power-of-two core count (“2^n cores”), a few will not scale beyond eight cores, and many will not get past 16 very successfully. Yes, even in the server world, quite a few applications do not scale well beyond 8 to 16 cores. Mail servers, web servers and even some databases may be in that situation. If your database takes a lot of locks on the same small set of data, lock contention will kill your performance once you go beyond a certain number of cores. Rendering applications are another group that starts to show diminishing returns beyond 8 cores. In those cases, clustering dual-socket quad-core servers is likely to make more sense than adding more cores to the same machine.
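The diminishing returns described above are commonly modeled with Amdahl’s law. The article does not name it, so treat this as a swapped-in textbook model, not the method behind any of our benchmarks. The sketch assumes a fixed serial fraction of the work (for example, time spent serialized on a contended lock), identical per-clock performance per core (reasonable here, since “Istanbul” reuses the “Shanghai” core), and throughput proportional to clock speed. Under those assumptions it compares a six-core at 2.6 GHz with a quad-core at 2.7 GHz.

```python
# Minimal Amdahl's-law sketch (a textbook model, not the article's method).
# Assumptions: a fixed serial fraction of the work, identical per-clock
# performance per core, and throughput proportional to clock speed.


def amdahl_speedup(cores: int, serial_fraction: float) -> float:
    """Speedup over one core when `serial_fraction` of the work cannot be
    parallelized (e.g. time spent serialized on a contended lock)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


def relative_throughput(cores_a: int, ghz_a: float,
                        cores_b: int, ghz_b: float,
                        serial_fraction: float) -> float:
    """Throughput of chip A divided by that of chip B under the model."""
    a = ghz_a * amdahl_speedup(cores_a, serial_fraction)
    b = ghz_b * amdahl_speedup(cores_b, serial_fraction)
    return a / b


if __name__ == "__main__":
    # Diminishing returns: each doubling of the core count helps less.
    for n in (1, 2, 4, 8, 12, 16):
        print(f"{n:2d} cores, 5% serial: {amdahl_speedup(n, 0.05):5.2f}x")

    # Six-core at 2.6 GHz vs quad-core at 2.7 GHz (clock speeds from the text).
    for s in (0.0, 0.05, 0.10, 0.20):
        ratio = relative_throughput(6, 2.6, 4, 2.7, s)
        print(f"serial fraction {s:.2f}: six-core / quad-core = {ratio:.2f}x")
```

With no serial work, the six-core’s ceiling is 15.6/10.8 ≈ 1.44x that of the quad-core; at a 10% serial fraction the model drops that to about 1.25x. The model ignores cache sizes, memory bandwidth and lock contention that gets worse as cores are added, so real scaling can be worse still.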

But the six-core “Istanbul” CPU has advantages too. The Nehalem Xeon offers 8 logical cores, but the two threads on each core have to share the 32 KB L1 and the tiny 256 KB L2. Istanbul can work with “only” 6 threads, but each thread gets a 64 KB L1 and a comparatively copious 512 KB of L2. In a nutshell, the new AMD “Istanbul” Opteron targets a specific market: a few compute-intensive HPC applications, large databases and, most importantly, “heavy” virtualized workloads. We say “heavy” because the six-core is a drop-in replacement for the current quad-core Opterons, which means that servers based on the new six-core will probably have the same memory capacity. If you are consolidating lots of light loads, you are likely to run into memory limits before you run into processing power limits.
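The consolidation argument can be made concrete with a back-of-the-envelope model. Everything in this sketch is an assumption for illustration: the 64 GB dual-socket hosts, the VM sizes and the fixed vCPU-per-core overcommit ratio are hypothetical and not taken from any of our benchmarks. It only shows which resource runs out first when the core count grows from 8 to 12 while the memory stays the same.

```python
from dataclasses import dataclass

# Back-of-the-envelope consolidation model. All numbers used with it are
# hypothetical; the fixed vCPU-per-core overcommit ratio is a simplifying
# assumption, not an ESX setting.


@dataclass
class Host:
    cores: int           # physical cores across all sockets
    memory_gb: int       # RAM available to VMs (no memory overcommit assumed)
    vcpus_per_core: int  # hypothetical vCPU overcommit ratio


@dataclass
class VmProfile:
    vcpus: int
    memory_gb: int


def consolidation_limit(host: Host, vm: VmProfile) -> tuple[int, str]:
    """Return how many identical VMs fit and which resource runs out first."""
    by_memory = host.memory_gb // vm.memory_gb
    by_cpu = (host.cores * host.vcpus_per_core) // vm.vcpus
    if by_memory <= by_cpu:
        return by_memory, "memory"
    return by_cpu, "cpu"


if __name__ == "__main__":
    # Same platform, same RAM: the six-core is a drop-in replacement.
    quad_core = Host(cores=8, memory_gb=64, vcpus_per_core=4)
    six_core = Host(cores=12, memory_gb=64, vcpus_per_core=4)
    light = VmProfile(vcpus=1, memory_gb=4)
    heavy = VmProfile(vcpus=4, memory_gb=4)
    for vm_name, vm in (("light", light), ("heavy", heavy)):
        for host_name, host in (("quad-core", quad_core), ("six-core", six_core)):
            count, limit = consolidation_limit(host, vm)
            print(f"{vm_name} VMs on {host_name}: {count} ({limit}-bound)")
```

With these made-up numbers, the light 1-vCPU/4 GB VMs hit the memory ceiling at 16 VMs on both servers, so the extra cores buy nothing. The heavier 4-vCPU VMs are CPU-bound, and the limit rises from 8 to 12 VMs.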


39 Comments


  • iocedmyself - Wednesday, June 17, 2009

    Well, something that was not mentioned is that the 2P Opteron machine costs about $6,700, whereas the Nehalem 2P machine is very near $16,000.

    As for power consumption, a straight-up comparison would be the HP 380 (Xeon) and the HP 385 (Opteron). At idle, both draw 140W. At 100% CPU/RAM load, the 385 draws around 300W and the 380 (Xeon) about 450W.

    Another thing not discussed here: 4P Istanbul is 70-80% faster than 2P Nehalem, and there is no 4P Nehalem. 8P Istanbul is over three times as fast as 2P Nehalem. So until the next-gen Nehalem, there is no competition in the high end, which probably has something to do with Istanbul orders being through the roof.

    I also have to wonder if these benchmarks were conducted using one of Intel's little helpful optimized compilers.
  • yasbane - Wednesday, June 10, 2009

    Would be nice to see some Unix or Linux benchmarks...
  • riskyburden - Thursday, June 04, 2009

    I might be naive here, but surely the majority of these applications favour clock speed and use no more than two cores. Shouldn't there be a benchmark for companies that run multiple apps, such as SQL, AD or IPFX, all from one server, with a comparison on that? I don't suggest it is good network practice, but that would interest me more.
  • mino - Friday, June 05, 2009

    For this part of SMB market pretty much any dual core CPU will do.

    Their bottleneck is almost always on the storage side, sometimes combined with insufficient memory.
    And most also run default installs, where basic software tweaks would improve performance by hundreds of percent.
  • befair - Wednesday, June 03, 2009

    Johan never proves me wrong. Even an article meant to talk about the AMD Opteron starts with a good deal of "Intel is the king!" stuff, as usual.
  • alpha754293 - Wednesday, June 03, 2009

    What happened to them?

    I would have loved to see what the new six-core AMDs could do in this arena, since they are (presumably) a much more competitive offering than the fastest Xeons all around.
  • lopri - Tuesday, June 02, 2009

    A question: is the 'snoop filter' hardware-based? I read that it can be enabled/disabled via the BIOS, and the cores are the same as the Shanghai cores. But my question is: whether it's hardware-based or software-based (BIOS), shouldn't this work for inter-core communication as well if AMD decides to implement it?
  • JohanAnandtech - Tuesday, June 02, 2009

    I have to check, but I am pretty sure it is both. The "uncore" part has changed somewhat on Istanbul.

    "shouldn't this work for inter-core communication as well if AMD decides to implement it"

    Since the L3-cache keeps copies of shared L2-cachelines, I don't think that will help. There is already a very fast way of communicating with little overhead.
  • tygrus - Monday, June 01, 2009

    I would like to know the performance difference when using a cell size of 3 not 6 on the 6-core units, or of 8 not 4 on the 4-core/8-thread Xeon?

    We will have to wait until later for more raw performance numbers (e.g. local/system memory, SPEC CPU, task switching, OS/IO task servicing).

    How long before they update the boards for DDR3 based memory and better IO onboard ?

    It's a pity the ESX 4.0 update hasn't helped AMD .. are the improvements only available for Intel or was it to correct a previous Intel only problem ? What can AMD/partners do to improve performance ?
  • JohanAnandtech - Tuesday, June 02, 2009

    "I would like to know the performance difference when using a cell size of 3 not 6 on the 6-core units?"

    A cell size of 3 will not do any good if your VMs are multiprocessor (MP) VMs. Even though ESX features "relaxed co-scheduling", there might be quite a few cases where the scheduler is not able to use all "slots", as some of the vCPUs of the VMs might be lagging behind. As soon as you use more than 2 vCPUs, you will get situations where only one VM with 2 vCPUs is scheduled on a cell of 3 CPUs. As for an 8-CPU cell: I have to try it.

    "How long before they update the boards for DDR3 based memory and better IO onboard ? "

    AMD's Fiorano platform, which will be available in a few weeks, should have better I/O (PCIe Gen 2) but will still be based on DDR-2.

    DDR-3 CPUs are scheduled for 2010.

    "It's a pity the ESX 4.0 update hasn't helped AMD .. are the improvements only available for Intel or was it to correct a previous Intel only problem ? "

    VMware's docs tell us that CPU locking is faster and that the scheduler is "cache aware", but the biggest improvements are EPT and better support for Hyper-Threading.

