The Opteron 6276: a closer look
by Johan De Gelas on February 9, 2012 6:00 AM EST
SAP S&D Benchmark
The SAP SD (Sales and Distribution, 2-tier internet configuration) benchmark is interesting because it is a real-world client-server application, unlike many server benchmarks (such as SpecJBB, SpecIntRate, etc.). We looked at SAP's benchmark database for these results. The results below were all obtained on Windows 2008 with an MS SQL Server 2008 database (both 64-bit).
Every 2-tier Sales & Distribution benchmark was performed with SAP's latest ERP 6 enhancement package 4. These results are NOT comparable with any benchmark performed before 2009. We analyzed the SAP benchmark in depth in one of our earlier articles. So far, our profile of the benchmark shows:
- Very parallel resulting in excellent scaling
- Likes large caches (memory latency)
- Very sensitive to sync ("cache coherency") latency
- Low IPC
- Branch and memory intensive code
We managed to get even better profiling of the benchmark. IPC is as low as 0.5 (!) on the most modern Intel CPU architectures. About 48% of the instructions are loads and stores and 18% are branches. Roughly one percent of all instructions are mispredicted branches, so the branch misprediction ratio (1% out of 18%) is slightly higher than 5% on modern Intel cores.
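The arithmetic behind these profiling figures can be sketched in a few lines of Python. This is only an illustration: it reads the one percent as a share of all instructions (an interpretation, since that is the reading that yields a ratio slightly above 5%), and the function and variable names are made up for this example rather than taken from any profiling tool.

```python
def branch_misprediction_rate(branch_fraction, mispredicted_fraction):
    """Share of branches that are mispredicted.

    Both arguments are fractions of ALL executed instructions
    (assumption: the 1% figure is relative to all instructions).
    """
    return mispredicted_fraction / branch_fraction


# Figures quoted in the text
branch_fraction = 0.18        # 18% of instructions are branches
mispredicted_fraction = 0.01  # about 1% of instructions are mispredicted branches
ipc = 0.5                     # instructions per cycle

rate = branch_misprediction_rate(branch_fraction, mispredicted_fraction)
mpki = mispredicted_fraction * 1000  # mispredictions per 1000 instructions
cpi = 1 / ipc                        # cycles per instruction

print(f"branch misprediction rate: {rate:.1%}")
print(f"mispredictions per 1000 instructions: {mpki:.0f}")
print(f"cycles per instruction: {cpi:.1f}")
```

Under that reading, about one in eighteen branches is mispredicted, and an IPC of 0.5 means each instruction takes two cycles on average.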
The instruction cache in particular is hit hard, and its hit rate is typically a lot lower than in other applications (probably 10% misses or lower). Even the large L3 caches are not capable of satisfying all requests. The SAP SD benchmark needs between 10 and 30GB/s of memory bandwidth, depending on how aggressive the prefetchers are.
SAP is one of the benchmarks that scale very well, and it shows: the server CPUs with the highest thread count are on top. We remember from older benchmarks that enabling Hyper-Threading (on Nehalem and later) boosts SAP's performance by 35%. As the IPC of a single SAP thread is relatively low (0.5 and lower), the decoding front end of the Bulldozer core should be able to handle this easily. Therefore, the extra integer cluster on the Opteron can really do its magic.
We don't have any Xeon X5650 benchmarks, but a quick calculation tells us that the new Opteron 6276 should be about 20% faster than the X5650. It is also about 18% faster, clock for clock, than the older Opteron 6176. The new Opteron does well here.