Finally, an EPIC battle!
Intel has executed well the past 2-3 years - Swiss clockwork well. The 45nm family has lowered power consumption significantly and raised performance by about 10 to 20%. That allows Intel to win almost every benchmark in the desktop and workstation market. But don't worry; things are a lot more interesting in the server market.
You might remember from our in-depth analysis that the 45nm Intel CPUs are at least as good as AMD's latest in raw floating point performance on a clock-for-clock basis. When it comes to pure integer power, the quad-core Opteron does not have a chance against the 45nm Intel CPUs: the latter are clock-for-clock an impressive 40-45% faster. Add to that the fact that the fastest Intel CPU runs at 3.2GHz while AMD is stuck at 2.5GHz for the moment, and it is clear that the AMD chips do not have a chance in single-threaded integer workloads. HP posted the SPEC CPU 2006 scores of two very similar servers:
|SPEC2006 Performance Comparison|
|CPU||Tested Server||SPECint2006 (base - peak)||SPECfp2006 (base - peak)|
|Opteron 2356 2.3 GHz||ProLiant BL465c G5||13.2 - 14.8||16.2 - 17.8|
|Xeon L5410 2.33 GHz||ProLiant BL460c||18.8 - 21.6||16.8 - 19.8|
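The clock-for-clock gap can be checked against the HP numbers in this table. The following is a minimal sketch, not the method used for the original analysis: it simply divides each score by the clock speed before comparing. The dictionary layout and the helper name `per_clock_advantage` are made up for illustration.

```python
# SPEC CPU2006 single-threaded scores (base, peak), copied from the HP table.
scores = {
    "Opteron 2356": {"ghz": 2.3, "int": (13.2, 14.8), "fp": (16.2, 17.8)},
    "Xeon L5410": {"ghz": 2.33, "int": (18.8, 21.6), "fp": (16.8, 19.8)},
}


def per_clock_advantage(fast, slow, suite):
    """Return the (base, peak) lead of `fast` over `slow` in percent,
    after dividing every score by the CPU clock in GHz."""
    f, s = scores[fast], scores[slow]
    return tuple(
        round(((fv / f["ghz"]) / (sv / s["ghz"]) - 1) * 100, 1)
        for fv, sv in zip(f[suite], s[suite])
    )


int_adv = per_clock_advantage("Xeon L5410", "Opteron 2356", "int")
fp_adv = per_clock_advantage("Xeon L5410", "Opteron 2356", "fp")
print("integer lead per clock (base, peak) in %:", int_adv)
print("floating point lead per clock (base, peak) in %:", fp_adv)
```

With these inputs the integer lead lands in the 40 to 45% range, while the floating point lead stays below 10%.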
The latest Opteron is left far behind in the integer benchmark, but is competitive in floating point when you compare clock-for-clock. It is not hard to see why Intel's 45nm CPUs are superior in single-threaded workloads. Luckily (for AMD), Intel took its time to introduce its impressive 45nm technology in the quad-socket market, and AMD only faces Intel's 65nm family for now.
|Extended SPEC2006 Performance Comparison|
|CPU||Tested Server||SPECint2006 (base - peak)||SPECfp2006 (base - peak)||SPECint2006 rate (base - peak)||SPECfp2006 rate (base - peak)|
|Opteron 8356 2.3 GHz||ProLiant BL685c G5||12.2 - 13.8||15.1 - 17.2||160 - 184||143 - 157|
|Xeon 7340 2.4 GHz||ProLiant BL680c G5||18 - 20.4||15.9 - 18.3||157 - 188||100 - 108|
|Xeon 7330 2.4 GHz||PRIMERGY RX600 S4||N/A||N/A||151 - 177||97.6 - 104|
While AMD's flagship processor is still no match in single-threaded integer code, it matches the slightly higher clocked Intel Xeon 7340 in multi-threaded integer performance. Single-threaded floating point performance is essentially the same (clock-for-clock), and when it comes to multi-threaded floating point performance on huge datasets, there is no stopping the best AMD chip: it is up to 50% faster than its competitor. It is interesting to note that the Xeon X7350 at 2.93GHz is not faster than its slower brother at 2.4GHz in SPECfp2006, clearly indicating a bottleneck.
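The size of the multi-threaded lead follows directly from the rate columns of the quad-socket table. As a rough sketch (the helper name `opteron_lead` and the data layout are illustrative only), the percentages can be derived like this:

```python
# SPEC CPU2006 rate scores (base, peak), copied from the quad-socket table.
rate = {
    "Opteron 8356": {"int": (160, 184), "fp": (143, 157)},
    "Xeon 7340": {"int": (157, 188), "fp": (100, 108)},
    "Xeon 7330": {"int": (151, 177), "fp": (97.6, 104)},
}


def opteron_lead(rival, suite):
    """Return the Opteron 8356 lead over `rival` as (base %, peak %).
    A negative value means the Xeon is ahead."""
    return tuple(
        round((amd / intel - 1) * 100, 1)
        for amd, intel in zip(rate["Opteron 8356"][suite], rate[rival][suite])
    )


for rival in ("Xeon 7340", "Xeon 7330"):
    for suite in ("int", "fp"):
        print(rival, suite, opteron_lead(rival, suite))
```

The integer rate results come out within a few percent of each other, whereas the floating point rate lead is well above 40% against both Xeons and reaches roughly 50% against the Xeon 7330 at peak.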
You probably guessed what the bottleneck in the Xeon system is: memory bandwidth. To measure it, we used our multi-threaded, 64-bit Linux Stream binary (courtesy of Alf Birger Rustad) based on v2.4 of PathScale's C compiler, compiled with the following switches:
-Ofast -lm -static -mp
We tested with 16 threads.
|Memory Performance Comparison|
|Quad Opteron 8356||20867||20860||20892||20945||20891|
|Quad Xeon 7330||9778||8973||9008||9008||9192|
No matter which Xeon 73xx you use, the best each core can hope for is roughly 600MB/s of memory bandwidth. That is slightly better than a PIII 1GHz with 133MHz SDRAM! Considering that a current 2.4GHz Xeon is at least four times faster than a PIII at 1GHz, it is clear that this is a severe bottleneck that won't be solved until a Xeon "Nehalem" MP with CSI is available. Until then, Intel's Xeon MP faces a very capable competitor.
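One way to arrive at a figure near 600MB/s is to divide the aggregate Stream results by the 16 threads used in the test. The sketch below does exactly that. It assumes the table values are in MB/s (the unit Stream normally reports) and that bandwidth is shared evenly between threads; the table does not label its columns, so the five values per system are treated simply as five measurements.

```python
THREADS = 16  # thread count used for the Stream run

# Aggregate Stream results per system, copied from the table.
# Assumption: the unit is MB/s; the table itself does not state it.
stream_mb_s = {
    "Quad Opteron 8356": [20867, 20860, 20892, 20945, 20891],
    "Quad Xeon 7330": [9778, 8973, 9008, 9008, 9192],
}


def per_thread(values, threads=THREADS):
    """Split each aggregate result evenly across the threads."""
    return [round(v / threads, 1) for v in values]


for system, values in stream_mb_s.items():
    print(system, per_thread(values))
```

Under these assumptions each Xeon thread gets between roughly 560 and 610MB/s, while each Opteron thread gets a little over 1300MB/s.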