A few years ago it was fashionable to bash Intel's Pentium 4 as a braindead architecture. The fact that the Pentium 4 Northwood (533 MHz FSB) was the best-performing processor in many applications from mid-2002 until late 2003, and that it remained competitive until early 2004, was conveniently forgotten: nuances do not make good headlines.
 
It is now trendy to bash AMD. One "PC doctor" at ZDNet goes as far as to say:
 
"When I look at AMD’s current product line, all I see is a forest of deadness. Intel has products [that] trump every category of products going. Server, desktop, mobile, low-end, high-end, dual-core, quad-core. Intel has all these markets stitched up."
 
Nuances, who needs them when you can make a sensational headline? And indeed, the latest desktop CPU articles here at Anandtech show that Intel's midrange CPUs have a significant lead over the fastest Phenom processors.
 
Like any design, the K10 is a trade-off. And most trade-offs were made in favor of the applications in the server and HPC market, at the expense of games and other desktop applications.
 
First take a look at this page, which compares a Core 2 Duo E4400 (2 GHz, 2 MB L2 and 800 MHz FSB) with a slower 1.86 GHz Core 2 Duo E6320 (4 MB of L2 and a 1066 MHz FSB). One thing is for sure: games prefer the larger L2 cache. Some of the games were up to 10% faster on the CPU that was clocked 7% lower but had twice the L2 cache. The fact that games prefer a 4 MB L2 is not going to change when you run them on an AMD CPU with an integrated memory controller (IMC). An L2 cache can deliver the necessary data in 12-20 cycles; an IMC needs about 100 cycles.
  
Now take a look at the cache architecture of AMD's K10/Barcelona. If you run a single-threaded game on it, the game gets a fast 512 KB L2 cache and, after that, a relatively slow (44-48 cycles!) 2 MB L3. If you know that the same game can benefit from more than 2 MB of cache, it is pretty clear that the 512 KB L2 is not going to cope: you'll end up using the L3 a lot. A dual-threaded game might need a little less per thread, but the same problem will happen again: it needs to go to that slow L3 cache all too often. Run that same game on an Intel Core 2 CPU and each thread of your dual-threaded game gets a low-latency 4 MB (or 6 MB) L2.
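To put rough numbers on that argument, here is a minimal sketch of an average access time calculation. The latencies come from the figures quoted in this post (an L2 hit at 15 cycles, picked from the 12-20 cycle range; an L3 hit at 46 cycles, the middle of 44-48; memory at about 100 cycles). The hit rates are purely hypothetical placeholders for a cache-hungry game, not measurements, and the "large L2" design is an imaginary chip with the same memory latency, not a model of any real CPU.

```python
def average_access_cycles(levels, memory_cycles):
    """Expected cycles per access for a simple cache hierarchy.

    levels: list of (hit_rate, latency_cycles), searched in order.
            hit_rate is the fraction of accesses reaching that level
            which hit there; latency is the total cost of such a hit.
    memory_cycles: cost of an access that misses every cache level.
    """
    expected = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for hit_rate, latency in levels:
        expected += reach * hit_rate * latency
        reach *= 1.0 - hit_rate
    return expected + reach * memory_cycles


# Hypothetical hit rates for a game whose working set exceeds 2 MB.
small_l2_plus_l3 = average_access_cycles(
    [(0.60, 15),   # 512 KB L2: assumed 60% hit rate
     (0.75, 46)],  # 2 MB L3: assumed to catch 75% of the L2 misses
    memory_cycles=100,
)
large_l2_only = average_access_cycles(
    [(0.90, 15)],  # 4 MB L2: assumed 90% hit rate
    memory_cycles=100,
)

print(f"512 KB L2 + 2 MB L3: {small_l2_plus_l3:.1f} cycles per access")
print(f"4 MB L2 only:        {large_l2_only:.1f} cycles per access")
```

With these made-up hit rates the small L2 plus slow L3 comes out noticeably worse per access, even with identical memory latency. Different hit rates will shift the gap, so treat the result as an illustration of the mechanism only.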
 
Now let us imagine that we run 4 threads of an HPC workload on the K10. Each thread has a very limited number of instructions, which fit perfectly in each of the L2 caches. You get 4 threads, which together get 4x the bandwidth of a single L2. In Intel's case, every two threads have to share the available bandwidth of one L2. The amount of data is huge, so caching the data is hardly possible, and the fast IMC does wonders for the K10 chip. Data that is shared between the 4 cores remains in the L3 cache, and all L2 caches are kept coherent through an incredibly fast SRI (System Request Interface). So your cache coherency overhead does not increase with the number of caches; it increases per socket. Going from 2 to 4 sockets means that you double the amount of cache coherency traffic. Compare that to the Intel platform, where all L2 caches need to be kept coherent.
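The per-socket versus per-cache point can be made concrete with a toy count. The sketch below only counts how many caches have to be kept coherent with each other across the platform; it is not a traffic model. The function name and the assumption of two shared L2 caches per Intel quad-core socket (which follows from two threads sharing each L2, as described above) are ours.

```python
def coherent_agents(sockets, l2_caches_per_socket, coherent_inside_socket):
    """Count the agents that must be kept coherent across the platform.

    If the L2 caches of a socket are kept coherent on-die (the K10 case
    described above), each socket appears as a single agent. Otherwise
    every L2 cache is an agent of its own.
    """
    if coherent_inside_socket:
        return sockets
    return sockets * l2_caches_per_socket


for sockets in (2, 4):
    k10 = coherent_agents(sockets, l2_caches_per_socket=4,
                          coherent_inside_socket=True)
    shared_l2 = coherent_agents(sockets, l2_caches_per_socket=2,
                                coherent_inside_socket=False)
    print(f"{sockets} sockets: K10-style agents = {k10}, "
          f"per-L2 agents = {shared_l2}")
```

In this count the K10-style platform always has as many agents as sockets, while a platform that treats every L2 as an agent starts from a higher number at the same socket count.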
  
This is just one example of why we could never expect the K10 chip to be a super desktop chip. But how is Barcelona doing in the server world? Is it limited to an HPC niche market? Well, let us see what Intel thinks. First of all, where do most of the 45 nm chips go? Just a few weeks ago, Anand reported that Intel had no intention of flooding the desktop with 45 nm Core 2 chips quickly.
 
 
 
Those 45 nm chips are going to the server market. Why? Several reasons.
 
First of all, the server market might be only 20% of Intel's revenue. But look at this:
 
| CPU | ASP | Estimated profit margin (per CPU) | Percentage of revenue |
|---|---|---|---|
| Intel Server CPU | >$400 | >$300 | ~20% |
| AMD Server CPU | $300-$400 | $220-$330 | ~16% |
| Intel Mobile/Desktop CPU | $100 | $40-$50 | ~80% |
| AMD Mobile/Desktop CPU | $50-$65 | $5-$30 | >80% |
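A quick back-of-the-envelope calculation shows why these estimates matter. The sketch below uses only the table's figures: it turns the per-CPU profit into a fraction of ASP and then estimates how much of Intel's profit the server segment would represent. Treating the ">$400" and ">$300" entries as exactly $400 and $300, using the midpoint of the $40-$50 range, and pairing the low end of one range with the low end of the other are all simplifying assumptions of ours.

```python
def margin_range(asp, profit):
    """Profit as a fraction of ASP for (low, high) ranges.

    Pairs low with low and high with high, which is an assumption:
    the table does not say how the two ranges line up.
    """
    return profit[0] / asp[0], profit[1] / asp[1]


def profit_share(revenue_share, margin):
    """Share of total profit per segment, given revenue share and margin."""
    profit = {seg: revenue_share[seg] * margin[seg] for seg in revenue_share}
    total = sum(profit.values())
    return {seg: value / total for seg, value in profit.items()}


# AMD mobile/desktop: ASP $50-$65, profit $5-$30 (from the table).
amd_desktop_margin = margin_range((50, 65), (5, 30))

# Intel: server ~20% of revenue at $300/$400; desktop ~80% at $45/$100.
intel = profit_share(
    revenue_share={"server": 0.20, "mobile/desktop": 0.80},
    margin={"server": 300 / 400, "mobile/desktop": 45 / 100},
)

print(f"AMD desktop margin: {amd_desktop_margin[0]:.0%} to "
      f"{amd_desktop_margin[1]:.0%} of ASP")
print(f"Intel server share of profit: {intel['server']:.0%}")
```

Under these assumptions, a segment that brings in about 20% of Intel's revenue accounts for roughly 29% of its profit, while AMD's desktop margin can be as thin as 10% of ASP.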
 
Secondly, Intel needs those 45 nm chips to be competitive in the HPC market. A 2 GHz Barcelona is capable of keeping up with the best 65 nm Xeons in those applications.
  
It is pretty clear why AMD focused on the server market. Without a complete redesign it is not possible to beat Intel's integer-crunching power and its fast, big L2 cache, and that is exactly what a modern game needs. Barcelona builds on the K8 architecture and inherited its relatively inflexible integer pipeline. While Core 2 has sophisticated reordering of loads and stores, Barcelona does only limited reordering of loads. While Core 2 offers a 32-entry queue to the integer units, Barcelona has three separate, rather inflexible 8-entry queues.
 
So the right way forward for AMD was to focus on HPC and server applications, where it could leverage its strong points. We can bash AMD for being so late and for coming up with relatively low-clocked CPUs, but even a 2.8 GHz Phenom would not have raised AMD's ASP significantly in the desktop market.
 
We are almost done with our first round of quad socket benchmarking and we can tell you that we are having a lot more fun than Anand: it is a good old exciting fight between AMD and Intel. Don't believe us? Let Intel do the talking again:

 

Yes, projecting the bad performance of the desktop chip to say that "AMD's products are a dead forest" is ... just silly. If you have missed the previous entries of our IT blog, just go to it.anandtech.com.


74 Comments

  • Locutus465 - Wednesday, May 14, 2008

    I bought AMD for all of the reasons stated, including stability and an interest in seeing where Spider goes. Frankly I'm very happy with my choice, I've been having a terrific experience.
  • Itany - Monday, May 12, 2008

    AMD would lose everything facing Nehalem. K10 core vs enhanced Penryn core, two channel DDR2 IMC vs three channel DDR3 IMC, HT 2.0 interconnect vs QPI, no SMT vs SMT...

    The crash of AMD is just a matter of time
  • INeedCache - Monday, May 19, 2008

    Isn't everything simply a matter of time? Do you realize how long folks have been saying just that about AMD? Trust me, you really don't want AMD to crash and disappear. If so, kiss those inexpensive CPUs goodbye.
  • K20 - Saturday, May 17, 2008

    Why do unqualified people feel compelled to comment? I don't. But I do feel compelled to correct:

    "K10 core vs enhanced penryn core"
    It's K10.5/Shanghai Vs. "enhanced penryn" (considering people refer to C2D as Conroe then C2.5D should be Wolfdale, should it not?) "core".

    "two channel DDR2 IMC vs three channel DDR3 IMC"
    Nehalem needs more RAM bandwidth due to the cache coherency protocol it uses.

    "HT 2.0 interconnect vs QPI"
    It's HyperTransport 3.0.

    Both HyperTransport and Intel's QuickPath Interconnect transfer the same amount of data per transfer, and each transfers at this speed:
    HyperTransport = 5.2 GT/s Vs. QPI's 6.4 GT/s.
    But it still depends on the efficiency of the protocol being used.

    "no SMT vs SMT..."
    Yes it's sharing a 256KB L2 between 2 threads and a <7 MB L3 cache between 8 threads (assuming 4 cores) whereas K10.5/Shanghai will have a 512 KB L2 cache for 1 thread and a 6 MB L3 for 4 threads and 1 MB more of >L1 cache.

    "The crash of AMD is just a matter of time"
    Erm... What if they start turning a profit next quarter and a bigger profit next quarter etc.
    But obviously that can't happen as you can predict the future.


    Anyway it's nice to see that Anandtech hasn't completely written off the K10, I might pop by here more often now.
  • Griswold - Wednesday, May 14, 2008

    I don't speak gibberish...
  • Ensoph42 - Monday, May 12, 2008

    Not to jump the gun or anything right? What about Shanghai?

    Anyway, this was an interesting blog entry to read, and sort of echoed what I had been thinking even before the phenoms were released when I was just looking over some of the architectural details that had been made public. It looks like a server chip.

    I think AMD made some engineering choices that didn't translate to good marketing for the hardcore/gaming/OC crowd. AMD isn't dumb and it must have crossed their minds that Intel wasn't going to take the ass whooping handed out to them by the Athlon architecture forever. Since AMD just isn't able to have two separate development lines for desktop and server due to their small size (conjecture), they may have decided to engineer a chip more heavily towards what is probably the more profitable segment (servers). It's also possible that AMD's biggest engineering mistake is being too farsighted. The K10 is good engineering, but doesn't have the benchmarking numbers to back it up. Simply put: modern desktop software isn't engineered to take advantage of the K10 architecture. Server software on the other hand just might.

    I look forward to the review. Keep up the good work guys.
  • Regs - Wednesday, May 14, 2008

    What do you mean, Shanghai? It's based off the Barcelona. The only way AMD is going to improve on applications for desktop is with a redesign. If they're going to come out with 2.2 - 2.6 GHz CPUs with 512 KB L2, with little to no improvement in integer performance, then all Shanghai will be is a cooler running Barcelona.
  • Ensoph42 - Wednesday, May 14, 2008

    I'm trying to do some digging around on some details about Shanghai, but it's more than a die shrink and some extra cache if I'm not mistaken.

    Now keep in mind that the Phenom X4 9850's main competitor is the C2Q 6600. Looking at a couple of sites' benchmarks, they spend most of their time within spitting distance of each other. The 6600 pulls away on a few, as does the 9850. Pound for pound, Phenom as it stands is a respectable chip; it just can't reach the frequencies it needs to compete with the higher end, but I don't think the architecture itself is bad.

    So if Shanghai can bring 20% due to architecture tweaks, and a boost in clock speed, that'll be great and keep AMD in the game. (Even assuming Nehalem will kick ass)

    Now your comment about integer performance and redesigning for desktop applications is sort of going backwards. The future is multicore and desktop software needs to be written to take advantage of it from here on out, but software engineers aren't there yet. Not that there aren't improvements to be made in the pipelines of the individual cores.

    Of course this is all speculation on both our parts till we actually see some benchmarks by third parties.
  • Locutus465 - Wednesday, May 14, 2008

    I would like to see AMD increase the IPC of the Phenoms. The architecture isn't bad (like you say) and there is plenty of room for improved IPC... Obviously I'm not in a position to say this absolutely, but I seriously doubt AMD needs to go back to the drawing board like Intel did.
  • mtdewcmu - Monday, May 12, 2008

    AMD isn't going to be standing still waiting for Nehalem. AMD is going to turn up the heat with 45 nm later this year.
