The Best Server CPUs Compared, Part 1by Johan De Gelas on December 22, 2008 10:00 PM EST
- Posted in
- IT Computing
ERP and OLTP Benchmark 1: SAP S&D
The SAP S&D (sales and distribution, 2-tier internet configuration) benchmark is an extremely interesting benchmark as it is a real world client-server application. We decided to look at SAP's benchmark database. The results below are two tier benchmarks, so the database and the underlying OS can make a big difference. Unless we keep those parameters the same, we cannot compare the results. The results below are all run on Windows 2003 Enterprise Edition and MS SQL Server 2005 database (both 64-bit). Every "two tier Sales & Distribution" benchmark was performed on the SAP's "ERP release 2005".
In our previous server oriented article, we summed up a rough profile of SAP S&D:
- Very parallel resulting in excellent scaling
- Low to medium IPC, mostly due to "branchy" code
- Not really limited by memory bandwidth
- Likes large caches
- Sensitive to sync ("cache coherency") latency
There are no quad socket results for the latest 45nm AMD parts, but we can still get a pretty good idea where it would land. The "Barcelona" Opteron scales from 10520 SAPS (2 CPUs) to 17650 (4 CPUs), or an improvement of about 68%. The quad Opteron 8384 will probably scale a bit better, so we speculate it will probably attain a score of about 23000 to 24000 SAPS. It won't beat the best Intel score (Dunnington), but it will come close enough and offer an excellent performance/watt ratio. If you are wondering about the phenomenal Xeon X5570 scores, we discussed them here.
Also interesting is that the dual 2.7GHz "Shanghai" is about 31% faster than the dual 2.3GHz "Barcelona", while the clock speed advantage is only 17%. It clearly shows that the larger L3 cache pays off here. Now let's look at some more exotic setups: octal socket or similar systems.
This overview is hardly relevant if you are deciding which x86 server to buy, but it is a feast for those following the complete server market -- or those of us who are interested in different CPU architectures. There is a battle raging between three different philosophies: the thread machine-gun SUN UltraSPARC T2, the massive speed daemon IBM POWER6, and the cost effective x86 architectures. The UltraSPARC 2 machine only has four sockets, but each socket contains eight CPUs that have a fine grained multithreaded, in order, "Gatling gun" that cycles between eight threads. That means one quad socket machine keeps up to 256 threads alive. The POWER6 machine contains eight CPUs, but each CPU is only a dual-core CPU. However, each POWER6 CPU is a deeply pipelined, wide superscalar architecture running at 4.7GHz, backed up with massive caches (4MB L2, 32MB L3). The very wide superscalar architecture is used more efficiently thanks to Simultaneous Multi-Threading (SMT).
Each T2 with eight "mini cores" needs 95W compared to 130W for two massive POWER6 cores, so the SUN Server needs 4 x 95W for the CPUs, while the IBM server needs 8 x 130W. These differences could be smaller percentagewise when you look at how much power each server system will need, but when it comes to performance/watt, it will be hard to beat the T2 here. The latest octal Opteron server should come close (8 x 75W) as it does not use FB-DIMMs while the UltraSPARC T2 does. However, we are speculating here; let's get back to our own benchmarking.