AMD Core Counts and Bulldozer: Preparing for an APU World
Date: November 30th, 2009
Topic: CPU & Chipset
Manufacturer: AMD
Author: Anand Lal Shimpi

Last week Johan posted his thoughts from a server/HPC standpoint on AMD's roadmap. Much of my analysis was limited to desktop/mobile, so if you're making million-dollar server decisions then his article is better suited for your needs.

He also unveiled a couple of details about AMD's Bulldozer architecture that I thought I'd call out in greater detail. Johan has been working on a CMP vs. SMT article so I'll try to not step on his toes too much here.

It all started about two weeks ago when I got a request from AMD to have a quick conference call about Bulldozer. I get these sorts of calls for one of two reasons. Either:

1) I did something wrong, or
2) Intel did something wrong.

This time it was the former. I hate when it's the former.

It's called a Module

This is the Bulldozer building block, what AMD is calling a Bulldozer Module:

AMD refers to the module as being two tightly coupled cores, which starts the path of confusing terminology. A few of you wondered how AMD was going to be counting cores in the Bulldozer era; I took your question to AMD via email:

Also, just to confirm, when your roadmap refers to 4 bulldozer cores that is four of these cores:

http://images.anandtech.com/reviews/cpu/amd/FAD2009/2/bulldozer.jpg

Or does each one of those cores count as two? I think it's the former but I just wanted to confirm.

AMD responded:

Anand,

Think of each twin Integer core Bulldozer module as a single unit, so correct.

I took that to mean that my assumption was correct and 4 Bulldozer cores meant 4 Bulldozer modules. It turns out there was a miscommunication and I was wrong. Sorry about that :)

Inside the Bulldozer Module

There are two independent integer cores on a single Bulldozer module. Each one has its own L1 instruction and data cache (thanks Johan), as well as scheduling/reordering logic. AMD is also careful to mention that the integer throughput of one of these integer cores is greater than that of the Phenom II's integer units.
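
The counting convention can be sketched in a few lines of Python. This assumes, per the correction above, that the "cores" on AMD's roadmap are integer cores and that every module holds exactly two of them; the function names are illustrative only and do not come from AMD.

```python
# Assumption: two integer cores per Bulldozer module, as described in the article.
INT_CORES_PER_MODULE = 2


def cores_from_modules(modules: int) -> int:
    """Return the advertised core count for a given number of modules."""
    if modules < 0:
        raise ValueError("module count cannot be negative")
    return modules * INT_CORES_PER_MODULE


def modules_from_cores(cores: int) -> int:
    """Return the number of modules behind an advertised core count."""
    if cores < 0 or cores % INT_CORES_PER_MODULE != 0:
        raise ValueError("core count must be a non-negative multiple of 2")
    return cores // INT_CORES_PER_MODULE


print(cores_from_modules(4))  # a four module part is sold as 8 cores
print(modules_from_cores(4))  # "4 Bulldozer cores" means 2 modules
```

Under this convention, a roadmap entry listing 4 Bulldozer cores describes two modules, not four.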

Intel's Core architecture uses a unified scheduler fielding all instructions, whether integer or floating point. AMD's architecture uses independent integer and floating point schedulers. While Bulldozer doubles up on the integer schedulers, there's only a single floating point scheduler in the design.

Behind the FP scheduler are two 128-bit wide FMACs. AMD says that each thread dispatched to the core can take one of the 128-bit FMACs or, if one thread is purely integer, the other can use all of the FP execution resources to itself.
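
A deliberately simplified sketch of that sharing rule follows. It models only the peak FMAC width each of a module's two threads could claim, using the article's figures (two 128-bit FMACs). The function name and the all-or-nothing split are assumptions made for illustration, not AMD's documented scheduling policy.

```python
from typing import Tuple

FMAC_COUNT = 2          # from the article: two FMACs behind the shared FP scheduler
FMAC_WIDTH_BITS = 128   # from the article: each FMAC is 128 bits wide


def fp_width_per_thread(a_uses_fp: bool, b_uses_fp: bool) -> Tuple[int, int]:
    """Peak FP width in bits available to thread A and thread B of one module.

    Assumed model: if both threads issue FP work, each gets one FMAC;
    if only one does, it gets both; a purely integer thread gets none.
    """
    total = FMAC_COUNT * FMAC_WIDTH_BITS
    if a_uses_fp and b_uses_fp:
        return (FMAC_WIDTH_BITS, FMAC_WIDTH_BITS)
    if a_uses_fp:
        return (total, 0)
    if b_uses_fp:
        return (0, total)
    return (0, 0)


print(fp_width_per_thread(True, True))   # (128, 128)
print(fp_width_per_thread(True, False))  # (256, 0)
```

The second case is the one AMD is betting on: with a purely integer neighbor, an FP-heavy thread has the whole FP unit to itself.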

AMD believes that 80%+ of all normal server workloads are purely integer operations. On top of that, the additional integer core on each Bulldozer module doesn't cost much die area. If you took a four module (eight core) Bulldozer CPU and stripped out the additional integer core from each module you would end up with a die that was 95% of the size of the original CPU. The combination of the two made AMD's design decision simple.

AMD has come back to us with a clarification: the 5% figure was incorrect. AMD is now stating that the additional core in Bulldozer requires approximately an additional 50% die area. That's less than a complete doubling of die size for two cores, but still much more than something like Hyper-Threading.
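
The two die area figures can be put side by side with some quick arithmetic. AMD does not say what the 50% is measured against, so this sketch assumes it is relative to a hypothetical single-core version of the same design, normalized to 100. That baseline and all variable names are assumptions for illustration.

```python
# Original claim: stripping the second integer core from every module
# leaves a die that is 95% of the original, so the extra cores cost 5%.
stripped_die_percent = 95
originally_claimed_extra_percent = 100 - stripped_die_percent

# Corrected claim: the second core adds roughly 50% die area.
# Assumption: measured against a single-core baseline of 100.
single_core_percent = 100
corrected_module_percent = single_core_percent + 50

# Reference point: two fully independent cores would double the baseline.
two_full_cores_percent = 2 * single_core_percent

print(originally_claimed_extra_percent)                   # 5
print(corrected_module_percent)                           # 150
print(corrected_module_percent / two_full_cores_percent)  # 0.75
```

Under that assumption, a two-core module needs about three quarters of the area of two separate cores, which is a real saving but nowhere near the near-free 5% originally quoted.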


93 Comments - Last by mino, 23 days ago
Die size? by gost80, 71 days ago
Judging apparent benefit of this architecture over Intel's can be done only if the die size per _module_ is also made available. So, how about it?

RE: Die size? by swindelljd, 69 days ago
I bet Oracle is salivating over the new core count technique since it is sure to create a huge surge in their revenue because they charge per core on the x86 platform.

RE: Die size? by Zool, 68 days ago
The thing is that the picture in this article contains shared L2 cache and L3 cache too, and it's quite unclear from the picture whether the L2 is shared within one module or across all modules. (Sharing both L2 and L3 across all modules would be quite useless.)
The Bulldozer picture in the other AnandTech article, http://it.anandtech.com/IT/showdoc.aspx?i=3681&p=3, shows clearly that the L2 cache belongs to the module.
So clearly adding 50% to the core (which is everything up to the L1) is much less than two whole cores, each with its own same-size L2 cache (Nehalem has only a tiny 256KB cache per core for die area reasons).
If we take the whole die size with 8MB L3 cache and 1MB L2 cache per module/core (plus things like the memory controller, HyperTransport and core/module interconnects), the final die increase could end up at 10-15% or even less.

RE: Die size? by Zool, 68 days ago
So a 4 module Bulldozer core with 512KB L2 cache and 6MB L3 cache could be something like 10-15% bigger than a 4 core Phenom II with 512KB L2 cache per core and the same 6MB L3 cache. For 80% more integer performance that wouldn't be bad.
And about Oracle: the server CPUs from both Intel and AMD run from a few hundred dollars to over 2k dollars, with minimal performance increase and just more sockets supported, and everyone is buying them. So I wouldn't care about them any more than about a fly on my window. It will come down to the final price per core for the CPU, not the core/module license price.

I miss you AMD. by Nocturnal, 71 days ago
Very interesting. I hope that AMD will one day regain the edge they once held. Oh how I miss those days. I embrace Intel nonetheless.

RE: I miss you AMD. by blyndy, 71 days ago
I'm very excited about AMD's brand-new design and how its new ideas translate into performance, however:

"The quad-core Zambezi should have roughly 10 - 35% better integer performance than a similarly clocked quad-core Phenom II"

That sounds a bit low. I hope the final comparable CPUs can manage something more like 15 - 40% better integer performance over their PhII counterparts. Then again, perhaps that's just because Intel's large performance increases between its recent architectures are making us expect more. They are more the exception than the rule, so 10 - 35% shouldn't be sneezed at, although it may not be competitive when these chips are released in 2011.

RE: I miss you AMD. by nafhan, 71 days ago
Based on AMD's re-defining of the word core that's actually a HUGE improvement. A quad core Zambezi has a similar transistor budget as a dual core Phenom II, and a 10-35% performance improvement.
In other words, quad core integer performance for dual core price.

RE: I miss you AMD. by psychobriggsy, 70 days ago
A quad-core Bulldozer has the same transistor budget as a tri-core Phenom II (if they existed natively), yet performs around 20% better than a quad-core.

I think that SMT would have provided easier performance pickings (20% for 5% die space). I don't understand why AMD have been avoiding SMT so far. Sure, 80% more performance for 50% die space isn't to be sneezed at, but it's not so easy pickings.

In addition there are more integer resources than in a Phenom II core, and the FPU has two 128-bit FMAs, so each core could be reasonably bigger. In effect it could be that 1 Bulldozer module is the same size as two Phenom II cores - so all you have then is the 10-35% performance increase. I hope this is per-clock...

RE: I miss you AMD. by titan7, 65 days ago
Perhaps the k7/k8 didn't make sense to add SMT? The p4 was really designed for it and had it enabled in genII. Look how long it took Intel to get SMT into the Pentium Pro/Core/i7.

I suspect AMD is designing for SMT right now, but gen1 is just "get to market ASAP because Intel is faster right now" and genII will have SMT enabled.

RE: I miss you AMD. by Zool, 71 days ago
From the previous article "The extra integer core (schedulers, D-cache and pipelines) adds only 5% die space".
So the quad core Zambezi (2 modules, 4 integer pipelines) should have roughly 10-35% better integer performance than a similarly clocked quad-core Phenom II. That's a super boost per transistor count.

Copyright © 1997-2010 AnandTech, Inc. All rights reserved. Terms, Conditions and Privacy Information.