AMD's K10: a "dead" product or not?
by Johan De Gelas on May 12, 2008 12:00 AM EST
Posted in IT Computing general
A few years ago it was fashionable to bash Intel's Pentium 4 as a braindead architecture. The fact that the Pentium 4 Northwood (533 MHz FSB) was the best performing processor from mid 2002 until late 2003 in many applications, and that the Pentium 4 Northwood remained competitive until early 2004 was conveniently forgotten: nuances do not make good headlines.
It is now trendy to bash AMD. One "PC doctor" at ZDNet goes as far as to say that:
"When I look at AMD’s current product line, all I see is a forest of
deadness. Intel has products trump every category of products
going. Server, desktop, mobile, low-end, high-end, dual-core,
quad-core. Intel has all these markets stitched up."
Nuances, who needs them when you can make a sensational headline? And indeed, the latest desktop CPU articles here at AnandTech show that Intel's midrange CPUs have a significant lead over the fastest Phenom processors.
Like any design, the K10 is a trade-off. And most trade-offs were made in favor of the applications in the server and HPC market, at the expense of games and other desktop applications.
First take a look at this page, which compares a Core 2 Duo E4400 (2 GHz, 2 MB L2 and 800 MHz FSB) with a slower-clocked 1.86 GHz Core 2 Duo E6320 (4 MB of L2 and a 1066 MHz FSB). One thing is for sure: games prefer the larger L2 cache. Some of the games were up to 10% faster on the CPU that was clocked 7% lower but had twice the L2 cache. The fact that games prefer a 4 MB L2 is not going to change when you run them on an AMD CPU with an integrated memory controller (IMC). An L2 can deliver the necessary data in 12-20 cycles, while an IMC needs about 100 cycles.
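The arithmetic behind that comparison can be made explicit. This is only a rough sketch: the clock speeds and the 10% best case come from the text above, the function name is ours, and it lumps the larger L2 and the faster FSB together as "everything except clock speed".

```python
def per_clock_ratio(speedup, clock_fast_ghz, clock_slow_ghz):
    """Per-clock performance of the slower-clocked chip relative to the faster one.

    speedup: measured performance of the slower-clocked chip relative to the
             faster-clocked chip (1.10 means 10% faster).
    """
    clock_ratio = clock_slow_ghz / clock_fast_ghz
    return speedup / clock_ratio


# Figures quoted in the text: 2 GHz vs 1.86 GHz, up to 10% faster in games.
clock_deficit = 1 - 1.86 / 2.0
ratio = per_clock_ratio(1.10, 2.0, 1.86)

print(f"clock deficit:  {clock_deficit:.1%}")   # 7.0%
print(f"per-clock gain: {ratio - 1:.1%}")       # 18.3%
```

In other words, in the best case the 4 MB chip does roughly 18% more work per clock cycle in those games.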
Now, take a look at the cache architecture of AMD's K10/Barcelona. If you run a single-threaded game on it, that thread gets a fast 512 KB L2 cache and, behind it, a relatively slow (44-48 cycles!) 2 MB L3. If you know that the same game can benefit from more than 2 MB of cache, it is pretty clear that the 512 KB L2 is not going to cope: you'll end up using the L3 a lot. A dual-threaded game might need a little less per thread, but the same problem will happen again: it needs to go to that slow L3 cache all too often. Run that same game on an Intel Core CPU and each thread of your dual-threaded game gets a low-latency 4 MB (or 6 MB) L2.
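To put rough numbers on this, here is a minimal sketch of an average-latency model. The L3 latency (midpoint of 44-48 cycles), the L2 latency (taken from the 12-20 cycle range) and the 100-cycle memory trip come from the text. Everything else is an assumption made up for illustration: the hit rates are guesses for a cache-hungry game, the 150-cycle figure for a memory trip over the FSB is a guess as well, and the function name is ours.

```python
def average_latency(levels, memory_cycles):
    """Average cycles for an access that has already missed L1.

    levels: list of (hit_rate, latency_cycles), ordered from the closest
            cache outwards. hit_rate is the fraction of the accesses
            reaching that level which it satisfies.
    memory_cycles: latency of a trip to main memory.
    """
    total = 0.0
    reaching = 1.0  # fraction of accesses that get as far as this level
    for hit_rate, latency in levels:
        total += reaching * hit_rate * latency
        reaching *= 1.0 - hit_rate
    return total + reaching * memory_cycles


# ASSUMED hit rates: a 512 KB L2 catches 60%, the 2 MB L3 catches 70% of the rest.
k10_game = average_latency([(0.60, 15), (0.70, 46)], memory_cycles=100)

# ASSUMED hit rate: a 4 MB L2 catches 90%; ASSUMED 150 cycles to memory over the FSB.
core2_game = average_latency([(0.90, 15)], memory_cycles=150)

print(f"K10:    {k10_game:.1f} cycles")
print(f"Core 2: {core2_game:.1f} cycles")
```

With these made-up hit rates the big L2 wins despite the slower path to memory. Change the hit rates and the picture changes, which is exactly the trade-off being discussed.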
Now let us imagine that we run 4 threads of an HPC workload on it. Each thread executes a very limited number of instructions, which fit perfectly in each of the L2 caches. The 4 threads together get 4x the bandwidth of a single L2. In the case of Intel, each pair of threads has to share the available bandwidth of one L2. The amount of data is huge, so caching the data is hardly possible, and the fast IMC does wonders for the K10 chip. Data that is shared between the 4 cores remains in the L3 cache, and all L2 caches are kept coherent through an incredibly fast SRI (System Request Interface). So your cache coherency overhead does not increase with the number of caches; it increases per socket. Going from 2 to 4 sockets means that you double the amount of cache coherency traffic. Compare that to the Intel platform, where all L2 caches need to be kept coherent.
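A very crude way to picture the coherency argument is to count how many agents have to be kept coherent across the platform. This sketch only restates the claims above; the function and parameter names are hypothetical, and the count of two shared L2 caches per quad-core Intel socket is our reading of "each pair of threads shares one L2".

```python
def coherency_participants(sockets, l2_per_socket, resolved_per_socket):
    """Count the agents that take part in platform-wide cache coherency.

    resolved_per_socket: True if coherency between the L2 caches of one
    socket is handled on-chip, so only the socket itself is visible to the
    rest of the system (the K10 case as described in the text).
    """
    if resolved_per_socket:
        return sockets
    return sockets * l2_per_socket


for sockets in (2, 4):
    k10 = coherency_participants(sockets, l2_per_socket=4, resolved_per_socket=True)
    intel = coherency_participants(sockets, l2_per_socket=2, resolved_per_socket=False)
    print(f"{sockets} sockets: K10 {k10} participants, Intel {intel} participants")
```

Both counts double when going from 2 to 4 sockets, but under these assumptions the Intel platform starts with twice as many participants on the shared bus.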
This is just one example of why we could never expect the K10 chip to be a super desktop chip. But how is Barcelona doing in the server world? Is it limited to an HPC niche market? Well, let us see what Intel thinks. First of all, where do most of the 45 nm chips go? Just a few weeks ago, Anand reported that Intel had no intention of flooding the desktop with 45 nm Core 2 chips quickly.
Those 45 nm chips are going to the server market. Why? Several reasons.
First of all, the server market might be only 20% of Intel's revenue. But look at this:
Profit margin (estimate) | Percentage of revenue
Intel Server CPU: >$400, >$300
AMD Server CPU
Intel Mobile/Desktop CPU
AMD Mobile/Desktop CPU
Secondly, Intel needs those 45 nm chips to be competitive in the HPC market. A 2 GHz Barcelona is capable of keeping up with the best 65 nm Xeons in those applications.
It is pretty clear why AMD focused on the server market. Without a complete redesign it is not possible to beat Intel's integer crunching power and its fast, big L2 cache, and that is exactly what a modern game needs. Barcelona built further on the K8 architecture and inherited the relatively inflexible integer pipeline. While Core 2 has sophisticated reordering of loads and stores, Barcelona does only a limited reordering of loads. While Core 2 offers a 32-entry queue to the integer units, Barcelona has 3 rather inflexible, separate 8-entry queues.
So the right way forward for AMD was to focus on HPC and server applications, where it could leverage its strong points. We can bash AMD for being so late and for coming up with relatively low-clocked CPUs, but even a 2.8 GHz Phenom would not have raised AMD's ASP significantly in the desktop market.
We are almost done with our first round of quad socket benchmarking and we can tell you that we are having a lot more fun than Anand: it is a good old exciting fight between AMD and Intel. Don't believe us? Let Intel do the talking again:
Yes, projecting the bad performance of the desktop chip to say that "AMD's products are a dead forest" is ... just silly. If you have missed the previous entries of our IT blog, just go to it.anandtech.com