Nehalem Xeon EP update: too good but trueby Johan De Gelas on February 11, 2009 12:00 AM EST
- Posted in
- IT Computing general
We were quite amazed, even slightly suspicious, when HP and Fujitsu-Siemens Published their SAP numbers. These numbers showed that the newest Xeon X5570 (Nehalem EP) series offer an enormous performance boost over the Xeon X5470 (Harpertown). After all, an almost 100% improvement at a slightly lower speed (2.93 GHz vs 3.3 GHz) is nothing short of amazing. Turns out that the real clockspeed is 3.2 GHz (2.93 GHz + 266 MHz turbo) but that does not alter the fact that these are truly incredible performance numbers.
I can now confirm that there are no tricks behind these numbers: they paint the right picture about the Xeon Nehalem EP. Talking to SAP benchmarking specialists, it became clear that few tuning tricks exist that are not know to the big OEM. The benchmark has been analyzed and tuned so well, that even the use of a different database (for example MS SQL instead of DB2) only makes a 2 to 3% difference most of the time. So you might even compare SAP numbers which are obtained on different databases. To resume, the SAP numbers can only be really boosted by better hardware (CPU-memory).
Now why I am talking so much about SAP benchmarking numbers? It is not like the expensive ERP software is run by everyone.
Well, the SAP numbers are showing a dual 2.93 GHz (or 3.2 GHz) Xeon beating the only quad AMD 8384 (Shanghai at 2.7 GHz) score of 22000 we have so far. Granted, a blade server is most of the time a bit slower. But four AMD 8384 2.7 GHz will be in the same league as a dual Xeon X5570, which will be out very soon now.
Even worse for AMD is that the SAP benchmark is not some exotic exceptional benchmarking case for the Xeon 55xx series. It shall be no surprise that the HPC numbers will be very impressive too.So it looks like AMD is in a tough spot.
As the SAP threads are sharing a lot of data (as is typical for these kind of database driven applications), hyperthreading can not be the only explanation why Nehalem is simply doubling performance and annihilating the competition. SAP benchmarking specialists expect hyperthreading to be good for about one third of the performance boost. We tend to believe these people who performed this benchmark for years now. The reason why it is not one of the "top cases" for hyperthreading on Nehalem is that this OLTP based benchmark spends a lot of time on shared data. Our own Nehalem OLTP benchmarking (Oracle and MySQL) points also in that direction.
As we have pointed out before the benchmark also
- responds very well to low latency cache and memory latency
- does not care too much about memory bandwith
- and is very sensitive to "syncing latency".
Since the AMD Shanghai CPU has the same fast way to sync between cores (via the L3-cache) as Nehalem, it can not explain why AMD falls behind. Another explanation is of course that these benchmarks are run on a CPU which uses turbo, which explains about a 5% advantage as the Nehalem CPU actually runs at 3.2 GHz.
Nehalem has faster access to the memory than AMD's latest quadcore (70 ns vs 110 ns), which is probably the second reason why Shanghai falls behind. But AMD will probably have to redesign it's integer execution pipeline significantly before it will catch up with Nehalem (think memory disambiguation for example). Basically, AMD's better NUMA - integrated memory controller platform was hiding this disadvantage. Now that the new Intel platform does not put "the brakes" on the integer execution engine anymore, the superiority of Intel's integer engine is showing.
The lack of any form of multi-threading is hurting AMD badly. It is well known that most of these business applications achieve very low IPC (0.2-0.6) and that modern superscalar CPUs have ample execution resources for running two threads in these applications. The results is Simultaneous Multi Threading offers a typical 20 tot 40% performance advantage. And that is huge, considering that you need 25 to 50% more clockspeed to counter that. It is basically a mission impossible for a modern CPU without SMT to outperform a similar superscalar CPU with SMT in OLTP, Java, webserver, rendering and ERP workloads. AMD really dropped the ball there, SMT should have been part of the K10 architecture.
Difficult times ahead for AMD
Even if AMD is able to speed up beyond 3 GHz, chances are slim that AMD will be able to compete with the new Nehalem Xeons. Add Turbo mode, hyperthreading, a lower latency memory controller and a better integer core together and you get a performance gap the size of the "Grand Canyon".
So does AMD have any chance at all beyond a new architecture in 2011? Is it over and out for AMD in 2009 and 2010? Adding 2 cores at the end of 2009 is a good step in the right direction. But even if AMD executes flawlessly the 32 nm Xeon Westmere will only give a window of a few months to the AMD hexacore "Istanbul". Istanbul should appear at the end of 2009, the Westmere Xeon is scheduled for very early 2010.
Westmere has few performance optimizations, it seems to be a pretty straight forward shrink. Slightly higher clockspeeds, about 20% lower power consumption, and yet another addition to the ridiculously long list of SSE-instructions in the form of seven new instructions (six instructions are for crypto/AES acceleration). Westmere is only an evolutionary step forward, but the "Grand Canyon" gap that Nehalem EP has made is probably large enough.
It is sure that we'll see better (lower) virtualization switching from virtual machine to hypervisor time and some small tweaks in AMD's Istanbul CPU, but it remains unclear if there are any significant performance boosters in the core. So it looks like Intel will own the dual socket space throughout 2009 and 2010, if we may believe the current roadmaps.
As the SAP numbers indicate, even the slowest Intel Xeons will show a large performance gap with the best AMD Opteron's. Is AMD doomed completely? In a large part of the market, yes. AMD's istanbul will make the gap a bit smaller but probably not small enough.
There are some unknown factors that together with one of the few remaining weaknesses (or rather less strong points) of Nehalem that might make it possible that AMD's opteron comes close enough in a particular area of the market. In my next post, I will clarify the one and only opportunity that I see for AMD in the next two years. Until then, don't shoot the messenger :-).