Original Link: http://www.anandtech.com/show/6430/amd-launches-opteron-6300-series-with-piledriver-cores

Taking Small Steps Forward

Today AMD unveiled its new Opteron 6300 series server processors, code named Abu Dhabi. The Opteron 6300 contains the new Piledriver cores, an evolutionary improvement of the Bulldozer cores.

We did an in-depth analysis of the Bulldozer core and concluded that three primary weak spots were responsible for its underwhelming performance:

  1. The L1 instruction cache: when running two threads simultaneously, the cache miss rate increased significantly because the associativity is too low.
  2. The branch misprediction penalty
  3. Lower than expected clock speed

Secondary bottlenecks were the high latency and low bandwidth of the L2 cache, and the very high latency of the L3 cache, which significantly increased the overall memory latency.

The lack of clock speed has been partially addressed in Piledriver with hard-edge flip-flops and a resonant clock mesh, which is especially useful at clock speeds beyond 3GHz. Vishera, the desktop chip with Piledriver cores, runs at up to 4GHz, 11% higher than Bulldozer, without any measurable increase in power consumption. As the SKU table further below shows, the clock speed increases are a lot smaller for the Opteron 6300: about 4-6%. The fastest but hottest (140W TDP) Opteron now clocks at 2.8GHz instead of 2.7GHz, and the "regular" Opteron 6380 now runs at 2.5GHz instead of 2.4GHz (Opteron 6278). In other words, the Opteron is still not able to fully leverage its deeply pipelined, high clock speed architecture: the 115W power envelope still limits the maximum clock speed. The more complex and less deeply pipelined Intel Xeon E5 runs at 2.7GHz with a 115W TDP.
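The base-clock gains for the two Opteron pairs named above are easy to check. A small Python sketch (the helper name is ours, purely for illustration) that computes the relative increase from the quoted base clocks:

```python
def pct_increase(old_ghz: float, new_ghz: float) -> float:
    """Relative clock speed increase, in percent."""
    return (new_ghz / old_ghz - 1.0) * 100.0


# Base clocks (GHz) quoted in the text: (previous model, new model).
pairs = {
    "6284 SE -> 6386 SE": (2.7, 2.8),
    "6278 -> 6380": (2.4, 2.5),
}

for label, (old, new) in pairs.items():
    print(f"{label}: +{pct_increase(old, new):.1f}%")
```

Both base-clock bumps land at roughly 4% (3.7% for the 140W part, 4.2% for the 6380), far from the 11% that Vishera gained on the desktop.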

Piledriver also comes with a few small improvements in the branch prediction unit, so two of the three worst bottlenecks have been somewhat relieved. The most important bottleneck, the L1 instruction cache, will only be fixed in the next iteration, Steamroller.

The L2 cache latency and bandwidth have not changed, but AMD did make quite a few optimizations. From AMD engineering:

"While the total bandwidth available between the L2 and the rest of the core did not change from Bulldozer to Piledriver, the existing bandwidth is now used more effectively. Some unnecessary instruction decode hint data writes to the L2 that were present in Bulldozer have been removed in Piledriver. Also, some misses sent to the L2 that would get canceled in Bulldozer are prevented from being sent to the L2 at all in Piledriver. This allows the L2’s existing resources to be applied toward more useful work.”

We talked about the whole list of other improvements when we looked at Trinity:

  • Smarter prefetching
  • A perceptron branch predictor that supplements the primary BPU
  • Larger L1 TLB
  • Schedulers that free up tokens more quickly
  • Faster FP and integer dividers and SYSCALL/RET (kernel/system call instructions)
  • Faster Store-to-Load forwarding
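The perceptron predictor in the list above is a known technique from the research literature (Jiménez and Lin). The Python sketch below is a minimal textbook version, not AMD's design: AMD has not published how Piledriver's perceptron supplements the primary BPU, and the table size, history length and the lack of weight saturation are our own illustrative choices.

```python
class PerceptronPredictor:
    """Textbook perceptron branch predictor (illustrative, not Piledriver's)."""

    def __init__(self, num_perceptrons: int = 64, history_len: int = 8):
        self.n = num_perceptrons
        self.h = history_len
        # Weight 0 is the bias; weights 1..h pair with global history bits.
        self.weights = [[0] * (history_len + 1) for _ in range(num_perceptrons)]
        # Global branch history stored as +1 (taken) / -1 (not taken).
        self.history = [-1] * history_len
        # Training threshold formula from the original perceptron predictor paper.
        self.theta = int(1.93 * history_len + 14)

    def _output(self, pc: int) -> int:
        # Dot product of the selected perceptron's weights with the history.
        w = self.weights[pc % self.n]
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc: int) -> bool:
        return self._output(pc) >= 0

    def update(self, pc: int, taken: bool) -> None:
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction, or when the output is not confident enough.
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self.weights[pc % self.n]
            w[0] += t
            for i, xi in enumerate(self.history, start=1):
                w[i] += t * xi
        # Shift the actual outcome into the global history.
        self.history = self.history[1:] + [t]


# Demo: a branch that alternates taken / not taken.
predictor = PerceptronPredictor()
pc, taken, hits = 0x400, True, 0
for step in range(250):
    if step >= 200:  # score only after a warm-up period
        hits += predictor.predict(pc) == taken
    predictor.update(pc, taken)
    taken = not taken
print(f"correct predictions after warm-up: {hits}/50")
```

The appeal of the perceptron is that it can learn correlations over long histories with a table that grows linearly, not exponentially, with history length, which is why it works well as a supplement to a conventional predictor.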

Lastly, the new Opteron 6300 now supports DDR3-1866 with one DIMM per channel (1 DPC). With two DIMMs per channel (2 DPC), memory runs at 1600MHz at 1.5V.
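What the faster DIMM support means on paper: a DDR3 channel is 64 bits wide, so its theoretical peak bandwidth is the transfer rate times 8 bytes. A minimal sketch (peak numbers only; sustained bandwidth is always lower, and DDR3-1866 is nominally 1866.67 MT/s, rounded here):

```python
BYTES_PER_TRANSFER = 8  # a DDR3 channel is 64 bits wide


def peak_channel_bandwidth_gbs(mega_transfers_per_s: float) -> float:
    """Theoretical peak bandwidth of one DDR3 channel, in GB/s (10^9 bytes)."""
    return mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9


for label, rate in [("1 DPC, DDR3-1866", 1866), ("2 DPC, DDR3-1600", 1600)]:
    print(f"{label}: {peak_channel_bandwidth_gbs(rate):.1f} GB/s per channel")
```

That is roughly 14.9GB/s versus 12.8GB/s per channel, about 17% more peak bandwidth for the 1 DPC configuration; multiply by the channel count for a per-socket figure.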

We're still working to get hardware in house for testing, but we wanted to provide some analysis of what to expect with Abu Dhabi in the meantime.

Performance According to AMD

For the first time in Opteron history, AMD was not able to provide us with samples before the launch. We are working with them to make sure we can show our independent benchmarks. Until then we have to work with what AMD has published.

The 6380 at 2.5GHz has a 4% clock speed advantage over the 6278 at 2.4GHz, so once that is factored out, AMD's SPEC CPU results suggest that Piledriver is about 4% more efficient per clock in high IPC, low MLP (memory-level parallelism) benchmarks. However, as we have shown in previous articles, server applications behave very differently from SPEC CPU benchmarks.
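Separating clock speed from per-clock gains is a simple multiplication, assuming performance scales linearly with clock (a reasonable approximation only for the high IPC, low MLP tests in question). A sketch with hypothetical helper names:

```python
def clock_ratio(old_ghz: float, new_ghz: float) -> float:
    return new_ghz / old_ghz


def total_speedup(old_ghz: float, new_ghz: float, per_clock_gain: float) -> float:
    """Overall speedup = clock ratio x per-clock (IPC) improvement."""
    return clock_ratio(old_ghz, new_ghz) * (1.0 + per_clock_gain)


def per_clock_gain(old_ghz: float, new_ghz: float, speedup: float) -> float:
    """Inverse: strip the clock ratio out of a measured speedup."""
    return speedup / clock_ratio(old_ghz, new_ghz) - 1.0


# 6278 (2.4GHz) -> 6380 (2.5GHz), with the ~4% per-clock gain from the text.
s = total_speedup(2.4, 2.5, 0.04)
print(f"implied overall gain: {(s - 1) * 100:.1f}%")
print(f"recovered per-clock gain: {per_clock_gain(2.4, 2.5, s) * 100:.1f}%")
```

Put differently, a 4% clock advantage plus a 4% per-clock gain corresponds to an overall SPEC CPU improvement of roughly 8%.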

AMD claims 24% higher performance in the Java benchmark SPECJBB, but that benchmark is easy to inflate. We do not want to dismiss the result immediately, but AMD does not disclose its JVM settings. The following settings were disclosed:

The Opteron 6380 based server:

77.9W at Active Idle, 308W and 1,636,298 ssj_ops at 100% of target load, and 4,040 overall ssj_ops/watt using 2 x AMD Opteron™ processors Model 6380 in Supermicro 1022G-NTF server, 64GB (8 x 8GB DDR3-1600) memory, Supermicro PWS-563-1H20 power supply, 240GB SATA disk drive, Microsoft® Windows Server® 2008 R2 x64 Enterprise Edition.

The Opteron 6278 based server:

82.6W at Active Idle, 320W and 1,233,423 ssj_ops at 100% of target load, and 2,892 overall ssj_ops/watt using 2 x AMD Opteron™ processors Model 6278 in Supermicro 1022G-NTF server, 64GB (8 x 8GB DDR3-1600) memory, Supermicro PWS-563-1H20 power supply, 240GB SATA disk drive, Microsoft® Windows Server® 2008 R2 x64 Enterprise Edition.

It is very likely that new JVM performance-boosting tricks contribute more to the performance increase than better processor performance. Most of those JVM tricks are unacceptable in a real-world Java application, so we fear that the SPECJBB results tell us very little. As just one example, AMD uses 16 JVMs on 32 integer cores to obtain its SPECJBB results. That means each JVM runs on one module, minimizing coherency traffic and maximizing cache hits. Of course everybody who posts these SPECJBB scores uses these kinds of very unrealistic settings, but it also means that we can deduce very little about the real performance increase that the Piledriver cores offer.
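To make the "one JVM per module" setup concrete, here is a sketch that splits 32 integer cores into 16 two-core groups and builds the kind of per-process CPU bitmask that OS affinity tools accept. It assumes logical CPUs 2m and 2m+1 belong to module m; that numbering is our assumption, and the real mapping depends on the OS and firmware.

```python
CORES_PER_MODULE = 2  # two integer cores share each module
TOTAL_CORES = 32      # 2 sockets x 8 modules x 2 integer cores


def module_cores(module: int) -> list[int]:
    # ASSUMPTION: logical CPUs 2m and 2m+1 belong to module m.
    # Real numbering depends on the OS and firmware; check before pinning.
    first = module * CORES_PER_MODULE
    return list(range(first, first + CORES_PER_MODULE))


def affinity_mask(cores: list[int]) -> int:
    """Bitmask with one bit set per logical CPU."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


num_jvms = TOTAL_CORES // CORES_PER_MODULE  # one JVM per module -> 16
plan = {jvm: module_cores(jvm) for jvm in range(num_jvms)}

for jvm, cores in plan.items():
    print(f"JVM {jvm:2d}: CPUs {cores}, mask {affinity_mask(cores):#010x}")
```

With every JVM confined to one module, its threads share that module's L2 and rarely touch data owned by another JVM, which is exactly why this setup flatters the benchmark.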

The power numbers of the SPECpower_ssj2008 benchmark make us somewhat optimistic, though. The 40% increase in performance/watt (4,040 versus 2,892 overall ssj_ops/watt) is clearly not the result of JVM performance tricks alone. The idle and full-load power numbers also confirm that the Opteron 6300 is quite a bit more efficient in server loads than the 6200. We estimate that the new Opteron offers a 20% (or better) higher performance/watt ratio in the real world. Let us wrap up with a look at the SKUs and prices.
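The relative changes can be computed directly from AMD's two disclosed configurations. A short sketch; note that SPEC's overall ssj_ops/watt aggregates all load levels including active idle, so it is not simply the 100% ops divided by full-load watts:

```python
# Figures from the two SPECpower_ssj2008 configurations AMD disclosed.
runs = {
    "Opteron 6380": {"idle_w": 77.9, "full_load_w": 308,
                     "ssj_ops_100pct": 1_636_298, "overall_ops_per_w": 4040},
    "Opteron 6278": {"idle_w": 82.6, "full_load_w": 320,
                     "ssj_ops_100pct": 1_233_423, "overall_ops_per_w": 2892},
}


def pct_change(new: float, old: float) -> float:
    return (new / old - 1.0) * 100.0


new, old = runs["Opteron 6380"], runs["Opteron 6278"]
for metric in ("ssj_ops_100pct", "overall_ops_per_w", "idle_w", "full_load_w"):
    print(f"{metric}: {pct_change(new[metric], old[metric]):+.1f}%")
```

That works out to roughly 33% more throughput at 100% target load and just under 40% better overall ssj_ops/watt, while idle and full-load power drop by about 6% and 4%.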

SKUs and Pricing

The AMD Opteron 6300 series has the same specifications as the 6200 series. The only changes are slightly higher clockspeeds and minor architectural improvements. So how much does AMD charge you for that?

AMD Opteron 6300 versus 6200 SKUs
(Mod/Cores = modules/integer cores; Clock = base/all-core turbo/max turbo in GHz; prices in USD)

Opteron 6300  Mod/Cores  TDP   Clock          Price  | Opteron 6200  Mod/Cores  TDP   Clock          Price
High Performance
6386SE        8/16       140W  2.8/3.2/3.5    $1392  |
                                                     | 6284 SE       8/16       140W  2.7/3.1/3.4    $1265
                                                     | 6282 SE       8/16       140W  2.6/3.0/3.3    $1019
Midrange
6380          8/16       115W  2.5/2.8/3.4    $1088  |
6378          8/16       115W  2.4/2.7/3.3    $867   | 6278          8/16       115W  2.4/2.7/3.3    $989
6376          8/16       115W  2.3/2.6/3.2    $703   | 6276          8/16       115W  2.3/2.6/3.2    $788
                                                     | 6274          8/16       115W  2.2/2.5/3.1    $639
                                                     | 6272          8/16       115W  2.0/2.4/3.0    $523
6348          6/12       115W  2.8/3.1/3.4    $575   | 6238          6/12       115W  2.6/2.9/3.2    $455
6344          6/12       115W  2.6/2.9/3.2    $415   | 6234          6/12       115W  2.4/2.7/3.0    $377
High clock / budget
6328          4/8        115W  3.2/3.5/3.8    $575   |
                                                     | 6220          4/8        115W  3.0/3.3/3.6    $455
6320          4/8        115W  2.8/3.1/3.3    $293   | 6212          4/8        115W  2.6/2.9/3.2    $266
6308          2/4        115W  3.5            $501   |
Power Optimized
6366HE        8/16       85W   1.8/2.3/3.1    $575   | 6262HE        8/16       85W   1.6/2.1/2.9    $523

The top models with slightly increased clock speeds (+100MHz) are also slightly more expensive than the previous models, so you're basically paying more for more performance, which should hopefully work out as a net positive in the long run. More interesting are the midrange chips: the Opteron 6378 and 6376 are slightly more powerful than the 6278 and 6276 (same clock speeds but with architectural improvements), yet they come with an 11-12% lower price.
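The price deltas follow directly from the list prices in the SKU table. A quick sketch:

```python
# (new model, price) vs (old model, price), USD list prices from the SKU table.
comparisons = [
    (("6386SE", 1392), ("6284 SE", 1265)),
    (("6378", 867), ("6278", 989)),
    (("6376", 703), ("6276", 788)),
]


def price_change_pct(new_price: float, old_price: float) -> float:
    return (new_price / old_price - 1.0) * 100.0


for (new_name, new_price), (old_name, old_price) in comparisons:
    change = price_change_pct(new_price, old_price)
    print(f"{new_name} vs {old_name}: {change:+.1f}%")
```

The flagship costs about 10% more for a roughly 4% higher base clock, while the 6378 and 6376 are about 12% and 11% cheaper than the chips they replace.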

Let's compare the AMD chips with Intel's offerings.

AMD vs. Intel 2-socket SKU Comparison
(Cores/Thr = Xeon cores/threads; Mod/Cores = Opteron modules/integer cores; Clock = base/turbo clocks in GHz; prices in USD)

Xeon E5       Cores/Thr  TDP   Clock          Price  | Opteron       Mod/Cores  TDP   Clock          Price
High Performance
2680          8/16       130W  2.7/3/3.5      $1723  |
2665          8/16       115W  2.4/2.8/3.1    $1440  | 6386 SE       8/16       140W  2.8/3.2/3.5    $1392
2650          8/16       95W   2/2.4/2.8      $1107  |
Midrange
                                                     | 6380          8/16       115W  2.5/2.8/3.4    $1088
2640          6/12       95W   2.5/2.5/3      $885   | 6378          8/16       115W  2.4/2.7/3.3    $867
                                                     | 6376          8/16       115W  2.3/2.6/3.2    $703
2630          6/12       95W   2.3/2.3/2.8    $639   |
                                                     | 6348          6/12       115W  2.8/3.1/3.4    $575
2620          6/12       -     2/2/2.5        $406   | 6344          6/12       115W  2.6/2.9/3.2    $415
High clock / budget
2643          4/8        130W  3.3/3.3/3.5    $885   |
2609          4/4        80W   2.4            $294   | 6320          4/8        115W  2.8/3.1/3.3    $293
2637          2/4        80W   3/3.5          $885   | 6308          2/4        115W  3.5            $501
Power Optimized
2630L         6/12       60W   2/2/2.5        $662   | 6366HE        8/16       85W   1.8/2.3/3.1    $575

Our Xeon E5-2600 review showed that the 8-core Xeon E5 was between 12% and 40% faster than the 8-module Opteron at more or less the same clocks (Xeon E5-2660 at 2.2GHz versus Opteron 6276 at 2.3GHz). AMD's benchmarks seem to indicate that the new Opteron is 5 to 15% faster at the same clocks, so a 6386SE at 2.8GHz might be able to stay close to the 2.4GHz Xeon 2665, but its higher TDP (140W versus 115W) does not make it very attractive. The 6386SE might still make sense for some HPC users, though: if you can recompile your code to use FMA, AMD claims that a 2.5GHz 6380 is just as fast as a 2.9GHz 2690.

AMD may offer pretty good value in the midrange of the server market. We measured a 7% to 18% advantage (in the most important applications) for the 12-thread Xeon over the Interlagos CPU with 16 integer cores. The 5% to 15% higher single-threaded performance of the Opteron 6378 (compared to the 6278, which runs at the same clocks) might be enough to beat the 2640 in some benchmarks. Of course, we still have to see how well the Opteron fares in our power consumption measurements.
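To see how the numbers in the two paragraphs above combine, here is a back-of-the-envelope sketch. It assumes throughput scales linearly with base clock (ignoring turbo, memory and power limits), and for the midrange case it assumes the 7-18% Xeon lead was measured against a 6278-class chip and carries over unchanged to the 2640, which the text does not state. The function name is ours.

```python
def projected_ratio(measured_xeon_lead: float,
                    piledriver_gain: float,
                    clock_scale: float = 1.0) -> float:
    """Naive projection of new-Opteron / Xeon throughput.

    measured_xeon_lead: Xeon / old-Opteron ratio from earlier tests (1.12 = 12% lead).
    piledriver_gain:    per-clock gain of Piledriver over Bulldozer (0.05 = 5%).
    clock_scale:        (new Opteron GHz / old Opteron GHz) / (new Xeon GHz / old Xeon GHz),
                        ASSUMING throughput scales linearly with base clock.
    """
    return clock_scale * (1.0 + piledriver_gain) / measured_xeon_lead


# Case 1: old test 6276 @ 2.3GHz vs E5-2660 @ 2.2GHz;
# new pairing 6386SE @ 2.8GHz vs 2665 @ 2.4GHz.
scale_hpc = (2.8 / 2.3) / (2.4 / 2.2)
worst_1 = projected_ratio(1.40, 0.05, scale_hpc)
best_1 = projected_ratio(1.12, 0.15, scale_hpc)

# Case 2: the 6378 has the same clocks as the 6278; ASSUME the 7-18% Xeon
# lead was measured against a 6278-class part and applies to the 2640.
worst_2 = projected_ratio(1.18, 0.05)
best_2 = projected_ratio(1.07, 0.15)

print(f"6386SE vs 2665: {worst_1:.2f} to {best_1:.2f}")
print(f"6378 vs 2640:   {worst_2:.2f} to {best_2:.2f}")
```

Under these assumptions the 6386SE lands between about 0.84x and 1.15x the 2665, and the 6378 between about 0.89x and 1.07x the 2640: close enough to compete, but the spread shows how much depends on the workload.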

AMD also has a few very nice budget offerings: a 2.8GHz to 3.3GHz 6320 with 8 integer cores compares well with the 4-core 2609 at 2.4GHz in a market where performance per dollar matters more than performance per watt.

AMD fails to convince in the low power market. An 8-module chip at 1.8GHz will not be able to beat a 2GHz Xeon 2630L, which also consumes less power. The performance per watt of the Intel chip will be significantly better, and its raw performance alone will be about 15 to 45% higher.
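How large "significantly better" could be: combining the 15-45% performance lead with the two chips' TDPs gives a rough performance-per-watt ratio. A sketch that uses TDP as a stand-in for real power draw, which is only a crude proxy:

```python
def perf_per_watt_advantage(perf_ratio: float, tdp_a_w: float, tdp_b_w: float) -> float:
    """(perf_A / power_A) / (perf_B / power_B), given perf_ratio = perf_A / perf_B.

    TDP is used as a stand-in for measured power, which is only a rough proxy.
    """
    return perf_ratio * tdp_b_w / tdp_a_w


# A = Xeon 2630L (60W TDP), B = Opteron 6366HE (85W TDP); 15-45% lead from the text.
low = perf_per_watt_advantage(1.15, 60, 85)
high = perf_per_watt_advantage(1.45, 60, 85)
print(f"Xeon 2630L perf/W advantage: {low:.2f}x to {high:.2f}x")
```

On TDP alone, that is roughly 1.6x to 2x the performance per watt in Intel's favor.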

So far...

Besides the low power offering, the Opteron 6300 series looks quite good. The specifications and pricing of the 6376 and 6378 in particular are attractive, and those chips cater to the bulk of the market. But the benchmarks AMD presents are hardly convincing. The SPECJBB2005 test is easy to inflate, while the recompiled HPC benchmarks are interesting to a small niche of the market but useless to the rest of us. The jury is thus still out on what Abu Dhabi will mean for AMD servers, but we hope to have a verdict in the coming weeks.
