Original Link: http://www.anandtech.com/show/2287

AMD: Still in the Game

by Anand Lal Shimpi on July 26, 2007 2:00 PM EST


It always seems that the worse a company does, the more information it divulges. We saw this behavior with Intel at the end of the Pentium 4 era, and we're definitely seeing it now with AMD. Enjoy it while it lasts, because it sure makes the industry a lot more exciting to talk about.

Today's disclosures are many of the things we alluded to in our last article about AMD's future, what we called The Road Ahead. If you were waiting for us to fill in the blanks, this article will do just that.

Barcelona Update

Before getting into the new stuff, AMD gave us a brief update on Barcelona, whose launch is now hopefully less than a month away.



Barcelona is the second CPU to plug into what AMD is calling its 2nd generation Opteron platform; it will have one more socket-compatible successor before the platform is retired:



The first Barcelona processors available will be the HE (Energy Efficient) and standard performance CPUs, running at speeds of 2.0GHz or lower at launch:

In Q4 of this year AMD will introduce the SE (High Performance) Barcelona parts, running at 2.3GHz and above.

AMD is doing its best to sugarcoat the low clock speed launch by saying that it's addressing the majority of the market at these clocks, but the fact of the matter is that AMD would be singing a different tune if it were able to achieve higher clock speeds at launch.



Shanghai: What Immediately Follows Barcelona

In the second half of 2008, AMD will introduce its first 45nm processor under the codename Shanghai.



Shanghai will be an evolutionary step above Barcelona, adding a larger L3 cache and some IPC enhancements at both the core and North Bridge levels. Shanghai will keep the same 512KB L2 cache per core of Barcelona, but grow the shared L3 cache from 2MB to a full 6MB. Note that this is still less cache than Penryn will offer in its quad-core configurations, but with AMD's integrated memory controller, larger caches aren't as necessary.

Shanghai, like Barcelona, is targeted at the server market. There will be a desktop variant also introduced in the second half of 2008 with similar specs. Shanghai and its desktop equivalent will be socket-compatible with Barcelona/Phenom motherboards.

AMD has also indicated that Shanghai will begin its transition to DDR3 on the desktop, which suggests that the desktop version may be available in two different sockets: AM2 for DDR2 support, and AM3 for DDR3 support. In the first half of 2009, Sandtiger will follow Shanghai in the server space with a brand new architecture (we'll talk about this core shortly). Sandtiger will be exclusively DDR3 and will definitely require a new platform, and just like Shanghai there will be a desktop variant of Sandtiger as well.

Intel : tick-tock :: AMD : Pipe?

One thing we've been waiting to hear from AMD is how it would compete with Intel's tick-tock cadence of microprocessor releases. To recap, Intel's new model shows that every two years it will introduce a new microprocessor architecture, and on the alternate year in between it will introduce a new manufacturing process on the existing architecture.



Today, AMD confirmed that it, too, would be releasing a new microprocessor architecture every two years, and a new manufacturing process on the alternate years. AMD attempted to go one step further and tie platform technology to the cadence as well, confirming that every two years the platform would change, alongside the microprocessor architecture, to support new features as they become available. It's a minor distinction, but the main point is that AMD is committed to Intel's 2-year processor cycle.

Oh, and the acronym is horrible, guys.
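
To make the alternating cadence concrete, here is a minimal Python sketch. The function name `cadence` and its parameters are our own, purely for illustration. We start from 2008 as a process year (Shanghai, AMD's first 45nm part) followed by 2009 as an architecture year (Bulldozer); anything past 2009 is simply the pattern extrapolated, not something AMD has announced.

```python
def cadence(start_year, years, first_step="process"):
    """Return (year, step) pairs that alternate between a new
    manufacturing process and a new microarchitecture.

    All names here are hypothetical; this only models the
    "one or the other, every year" pattern described in the text.
    """
    steps = ["architecture", "process"]
    if first_step not in steps:
        raise ValueError("first_step must be 'architecture' or 'process'")
    if first_step == "process":
        steps.reverse()
    return [(start_year + i, steps[i % 2]) for i in range(years)]


# 2008 = Shanghai (45nm shrink), 2009 = Bulldozer (new architecture).
# Later years are an extrapolation of the pattern, not a roadmap.
for year, step in cadence(2008, 4, first_step="process"):
    print(year, "new", step)
```

The platform changes AMD described would line up with the "architecture" years in this sketch.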

mmmm-space

In our AMD: The Road Ahead article we looked at AMD's modular approach to CPU design going forward:

AMD has now attached a marketing name to its modular core approach: M-Space:



The principle is still the same; these discrete blocks within a microprocessor can be things like GPU cores, CPU cores, memory controllers, specialized hardware, etc. There's nothing really new here; we just wanted to make sure you were up to date with the latest and greatest in AMD marketing so you don't get confused.

The next part, however, we can't guarantee won't confuse you.



Codename Mania

We're about to get into a discussion of two new CPU cores, both due in the 2009 time frame, and we'd like to apologize in advance for the sheer number of code names that may be thrown at you. In the past two days we've heard AMD tell us about the following projects:

  • Bulldozer
  • Bobcat
  • Falcon
  • Sandtiger
  • Shanghai
  • Predator
  • Budapest
  • Spider
  • Hardcastle
  • Pinwheel
  • Cartwheel


Many of these codenames refer to platforms including other codenames; understanding the architecture is honestly easier than understanding the nomenclature, which I believe is bad. We've done our best to organize it all in an easy to understand fashion, so please bear with us.

New CPUs

AMD and Intel agree on a lot these days: the 64-bit debate is over, Intel has already committed to bringing an on-die memory controller to market with Nehalem, and both companies agree that to address the ultra low power market, you need a new architecture.

The rule of thumb is that a single microarchitecture can cover about an order of magnitude of thermal targets; go any higher or lower and you need to look at a different architecture for maximum performance-per-watt efficiency.



AMD divides the market into two spaces: devices that have a TDP of 10 - 100W and devices that are in the 1 - 10W range. AMD has two new CPU cores that it is announcing today: Bulldozer and Bobcat, both due in 2009. Bulldozer addresses the 10 - 100W segment (much like the current K8 and Core based processors do), while Bobcat is designed for the 1 - 10W portion of the market.
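
The split is easy to express in a few lines of Python. This is only a sketch of the segmentation as AMD described it; the names `SEGMENTS` and `candidate_cores` are ours, and treating the shared 10W boundary as belonging to both cores is our assumption rather than anything AMD specified.

```python
# TDP ranges in watts, as AMD described them. Each range spans
# exactly one order of magnitude, matching the rule of thumb above.
SEGMENTS = {
    "Bobcat": (1.0, 10.0),
    "Bulldozer": (10.0, 100.0),
}


def candidate_cores(tdp_watts):
    """Return the cores whose TDP range covers the given target.

    Assumption: both ranges are inclusive, so a 10W design could
    in principle be served by either core.
    """
    return [
        name
        for name, (low, high) in SEGMENTS.items()
        if low <= tdp_watts <= high
    ]


for tdp in (5, 10, 65, 150):
    print(f"{tdp}W -> {candidate_cores(tdp) or 'outside both ranges'}")
```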

Bulldozer

Due out in the first half of 2009, AMD's Bulldozer core is the true revolutionary successor to the K8 architecture. While Barcelona and Shanghai are both evolutionary improvements to the current core, Bulldozer is the first ground-up redesign since the K7.

Bulldozer will require a brand new socket for two reasons: it will support a new version of AMD's Direct Connect architecture (and Hyper Transport), and it will also support DDR3. Both of these changes dramatically alter the pinout of the CPU, thus making Bulldozer the next core to not be backwards compatible with current motherboards. Once again, AMD is giving us a nice roadmap for obsolescence, which its customers have always appreciated.



Details on Bulldozer are still limited, but here's a quick list of what we know about the architecture:

- Not VLIW, still OoO superscalar architecture
- Deeper pipeline than Barcelona/Shanghai
- New x86 instructions targeted at HPC and "media processing"
- Increased computational density
- Increased flow control capability
- Extended SIMD capability targeted specifically at media data types
- Hyper Transport 3 will be supported
- The chip will feature 4 HT3 links
- DDR3 support
- G3MX Memory Technology
- PCIe 2.0
- IOMMU (Hardware Accelerated I/O Virtualization)

A deeper pipeline than present-day architectures means we are looking at higher clock speeds, and AMD was quick to point out that there is no dramatic change in the approach to microprocessor design with Bulldozer - it's still the same type of out of order, superscalar architecture as its predecessors and not an Itanium-like design.

Extending the x86 instruction set once more only makes sense as the usage model for general purpose microprocessors becomes more demanding. AMD wouldn't be any more specific about the types of instructions we are likely to see in Bulldozer other than they would be HPC and media processing focused.



Bulldozer's connectivity will be improved, supporting up to four HT3 links per processor (up from 3 HT links in present day CPUs). With four HT3 links you can expect to see some pretty robust multi-socket configurations built around Bulldozer cores in the server/HPC markets.

Why is PCI Express 2.0 listed as a feature of the Bulldozer core? Well, some implementations of the core (think Fusion) will actually have on-die PCI Express 2.0 controllers. These CPUs will be particularly interesting for small form factor devices, because the only additional chip you will need is a South Bridge.



The AMD Memory Roadmap: DDR3, FBD and G3MX Examined

With today's announcement we can finally talk about AMD's memory roadmap; the two questions we often hear are: when is AMD planning on moving to DDR3 and what about Fully Buffered DIMM? Both are answered with today's disclosure.

AMD will begin the DDR3 transition on the desktop in the second half of 2008 with its Shanghai processor, but AMD won't fully move to DDR3 until 2009 with Bulldozer. The DDR3 transition beginning with Shanghai and completing with Bulldozer is very similar to the cautious approach AMD took to DDR2 adoption. By the time 2009 rolls around, DDR3 should be very cost competitive with DDR2 and the transition should be seamless.

Despite rumors to the contrary, Bulldozer won't support Intel's Fully Buffered DIMM standard, and instead will use what AMD is calling the "G3 Memory Extender" (G3MX).

FBD addresses the problem of not being able to maintain memory frequency while increasing the number of memory sockets on a motherboard, something that impacts the high end server market. The FBD solution is to serialize the memory bus by placing a buffer chip on each memory module that communicates with the memory controller and memory devices. The memory controller only needs to worry about driving data to these buffers, and the buffers deal with getting data in/out of the memory devices.

While FBD supports up to 8 DIMMs per memory channel, there are three major drawbacks: 1) higher cost per module, 2) higher latencies due to serialization and 3) higher power consumption. AMD has said that it evaluates new memory technologies at each generation, and although it won't rule out FBD for future products, it simply doesn't make sense today.

G3MX addresses the same issue of maintaining memory performance while driving up the number of slots per channel in a more economical manner. The technology simply calls for custom, ASIC-class, buffer logic to be placed on the motherboard itself between the memory controller (in this case the CPU) and the memory slots. There is no conversion of memory interface and thus performance/power shouldn't be impacted nearly as much as FBD, and the big upside is that you can use standard DDR3 memory in the sockets since G3MX is implemented at a motherboard level.

The downside is that with G3MX you are still dealing with a parallel memory interface, which becomes difficult to implement at higher speeds and loads. AMD insists that it can work around any issues related to motherboard design and trace routing, and that G3MX is presently a better solution than FBD for its needs.

The first G3MX implementation will arrive with Bulldozer in 2009; like FBD, it will be limited to high end server/workstation platforms.



Bulldozer Performance Expectations

When Intel announced its first Core microarchitecture CPUs, we saw a number of charts that looked like this:

Intel used performance-per-watt to compare the efficiency of its new architecture to its predecessors. AMD is unveiling Bulldozer's performance target in a similar way, by looking at performance-per-watt:

In the client desktop/notebook space, Bulldozer will have around 1.3x better performance per watt than Barcelona. AMD also indicated that Bulldozer will deliver as much as 1.5 - 2.0x better performance per watt in the server and HPC space when compared to Barcelona.
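
A performance-per-watt multiplier can be cashed in two ways: more performance at the same power, or the same performance at less power. The short Python sketch below works through both readings of AMD's targets. The 100W baseline is a hypothetical round number chosen for easy arithmetic, not a Barcelona specification, and the function names are ours.

```python
def performance_at_fixed_power(baseline_performance, ppw_gain):
    """Performance if the power budget stays the same."""
    return baseline_performance * ppw_gain


def power_at_fixed_performance(baseline_power_watts, ppw_gain):
    """Power needed to deliver the same performance as the baseline."""
    return baseline_power_watts / ppw_gain


# Hypothetical baseline, for illustration only.
BASELINE_POWER_WATTS = 100.0

# AMD's stated performance-per-watt targets relative to Barcelona.
AMD_TARGETS = {
    "client": (1.3,),
    "server/HPC": (1.5, 2.0),
}

for segment, gains in AMD_TARGETS.items():
    for gain in gains:
        perf = performance_at_fixed_power(1.0, gain)
        watts = power_at_fixed_performance(BASELINE_POWER_WATTS, gain)
        print(
            f"{segment}: {gain:.1f}x perf/W -> {perf:.1f}x performance "
            f"at {BASELINE_POWER_WATTS:.0f}W, or baseline performance "
            f"at about {watts:.1f}W"
        )
```

Real products will land somewhere between the two extremes, trading part of the gain for speed and part for lower power.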



Better efficiency is obviously important, but absolute performance can't be ignored which is why AMD tagged Bulldozer with the line: "designed to be the highest performing single and multi-threaded compute core in history."

Bulldozer will have to go up against Intel's Nehalem core, which to recap is a dramatically evolved member of the Core architecture.



Bobcat

If Bulldozer is the architecture that will compete with Nehalem, Bobcat is what will compete with Silverthorne. Bobcat is yet another ground up design from AMD, also due out in the 2009 timeframe, but it will address a more power constrained portion of the market. Systems that require a 1 - 10W TDP will use Bobcat, while Bulldozer is limited to the 10 - 100W range (obviously with some overlap between the two).

Bobcat is a far simpler core than Bulldozer, which allows AMD to place it in ultra low power devices (think TVs, set top boxes and smart phones), and it also means that costs will be low. Much like Intel's Silverthorne, Bobcat will be a part of a new class of extremely low priced x86 cores designed primarily for the consumer electronics market.

We asked AMD's CTO, Phil Hester, how simple of a core Bobcat would be - and the answer he gave us was quite telling. Two years ago Intel used the following chart to illustrate the need for multi-core CPUs, the driving factor being that you can no longer get good performance scaling by simply improving single core performance:



What isn't depicted on this chart is the relationship of power consumption to all of this, but as you can guess, the power consumption curve looks much like the multi-core curve. Incremental improvements in single core performance now require exponential increases in power consumption, which was a major driving factor behind the move to multi-core. By achieving higher performance through minor core improvements and adding more cores, we can maintain the sort of year-over-year performance increases we need while keeping power consumption in check.

Phil told us to imagine a graph of power consumption vs. instructions per clock over the history of microprocessor cores, which you can imagine would be linear for a while, before turning exponential.

We are presently in the very non-linear portion of the chart, where minor increases in IPC require significant power expenditures. Bobcat takes the non-linear portion of this graph and chops it off, going back to a much simpler x86 core that can be built extremely efficiently on today's manufacturing processes.

If you can imagine a Pentium or Pentium Pro class microprocessor, built on a 65nm or 45nm process, you can already guess that power consumption would be quite low. Now add in a few optimizations that AMD's designers have learned over the years and you may be able to picture what Bobcat's architecture might look like. It harks back to a much simpler time in x86 history, but then again that's exactly what's necessary for the type of low power, low cost devices that Bobcat will end up in.



Fusion

Remember Fusion? The whole reason for the ATI acquisition? Well, AMD gave us a little more information on its plans for the first Fusion CPUs.

The first Fusion CPUs belong to a family of chips codenamed Falcon; note that Falcon refers to the Fusion CPU family and not the CPU or GPU cores themselves. Contrary to popular belief, the first Fusion CPUs will be built on a single die. On this die you will find the following components: a shared memory controller, Bulldozer or Bobcat based CPU cores, a DirectX GPU core with UVD support, a shared cache (shared between the CPU and GPU), and a PCIe controller.



For a one-die solution, the feature list for Falcon is pretty impressive. Let's discuss what we know:

The shared memory controller will most likely support DDR3 given the 2009 - 2010 launch timeframe for Fusion, and obviously it will be used by both the CPU and GPU portions of the die. We've already discussed the Bulldozer and Bobcat cores; you can expect the desktop/notebook Falcon chips to use Bulldozer cores, while the smallest Ultra Mobile PCs, high performance smart phones and CE devices will use Bobcat based Falcon processors.



AMD just lists the graphics core as being a "Full DirectX GPU", but fails to attach any DX revision to the support sheet. AMD did mention that the GPU core would be a unified shader architecture, but we suspect that lower end Falcon CPUs may not support everything required by DX9/DX10.



The integrated UVD support will eliminate the need for an external graphics card just to decode high bitrate H.264 video. UVD only ends up being around 4.7 mm^2 of today's 65nm GPU die yet it is several orders of magnitude more efficient than the x86 CPU core at decoding H.264, highlighting the importance of its integration onto the CPU die itself. Given how powerful and efficient UVD is, we can't help but wonder how long it will take for AMD to include it in all of its CPUs. We may have to wait for a unified instruction set between the CPU/GPU before we get that sort of granular integration though.

The last item on the M-Space stack is the on-die PCIe 2.0 controller, which AMD said would support a minimum of 16 lanes externally. With an integrated PCIe controller, the only other chip needed is an external South Bridge that can connect via PCIe to the CPU itself.

The on-die PCIe controller won't kill the add-in GPU market, as you will be able to simply pop in an external graphics card if necessary. You can then either disable the on-die graphics or switch between the two as your usage demands change. In notebooks, AMD expects systems with discrete graphics to swap between it and the on-die GPU on the fly depending on usage.



Bobcat in your iPhone?

As much as everyone loves hearing about the iPhone every day, I must bother you with another reference - but I promise it has relevance.

Apple made its transition to Intel x86 processors in record time; everything from its notebooks to its desktops and even the Apple TV uses x86 processors. However, the iPhone uses an ARM based processor - which means Apple has to maintain compatibility with a completely different platform when adding OS X functionality to the iPhone. There's no doubt that Apple would prefer to have an x86 processor in its iPhone, if for no other reason than to simplify its software development, but neither Intel nor AMD makes an x86 core that could work in such a cost/power sensitive device.

Intel is working on a chip that could be used in a device like the iPhone down the road, called Silverthorne. Silverthorne is the processor, Poulsbo is the chipset and the entire platform is called Menlow. It's due out in the first half of 2008 and it, like the Bobcat core, will be extremely simplified in order to work in these very low power, low cost devices.



AMD has a different approach; ATI had a number of consumer electronics customers prior to the acquisition, and it was already selling microprocessors into that space - although they were based on MIPS cores. AMD's Imageon and Xilleon processors have been used in set top boxes and digital TVs for a while now but after Bobcat is introduced, there will be a new option.



AMD will offer Imageon and Xilleon processors into the same markets it always has, but customers will be given the option of having a low powered x86 core instead of the ARM/MIPS solution they are used to, if they wish. AMD has to offer the option because it is already entrenched in the market, but the end goal is the same: to transition to x86 from top to bottom.

AMD insists that these new Imageon and Xilleon processors would roughly be the same size as present day ones, regardless of whether the customer chooses an x86 or non-x86 core. Only approximately 20% of the SoC (System on Chip) core is taken up by the x86 CPU, so the impact is minimal.

There's no doubt in our minds that devices like the iPhone will move to x86 based solutions, either from AMD with Bobcat or Intel with Silverthorne and its successors, in the coming years. Such a move would greatly simplify software development, as well as increase the capabilities of these devices.



Eyeing NVIDIA's Lunch: AMD's New Chipsets

Details were scarce about the upcoming RS700 chipset other than the fact it will support DX10 graphics capabilities and include support for HT 3.0, PCI-E GenII, 45nm CPUs, and Avivo HD. This chipset will replace the somewhat successful AMD 690G/V in the low end market in the middle part of next year.



With new CPUs come new chipsets, and thankfully this part of the discussion will happen in the near term. By the end of this year, AMD will introduce its RD790 chipset, which AMD hopes will be competitive with NVIDIA's Socket-AM2 solutions.



The RD790 will obviously support Phenom, but it will also support what AMD is calling CrossFire 2.0. This enhanced multi-GPU spec will support up to four GPUs working in tandem, although we're not clear what GPUs will be supported in this mode or when, nor whether we'll run into the same performance problems we did with NVIDIA's Quad-SLI.

PCI Express 2.0 will also be supported by the RD790 chipset, which doubles bandwidth and dramatically reduces latency to PCIe 2.0 compliant devices. Backwards compatibility with PCIe 1.0 devices is maintained. The chipset will support 32 lanes for graphics (either as four x8 slots or two x16 slots), six x1 links for expansion and a single x4 link to connect to the South Bridge.
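
For a sense of scale, the Python sketch below tallies the RD790 lane budget and works out per-direction bandwidth. The signaling rates (2.5 GT/s for PCIe 1.x, 5 GT/s for PCIe 2.0) and the 8b/10b encoding overhead come from the PCI Express specifications rather than from AMD's briefing; the lane counts are the ones AMD gave us. All identifiers are our own.

```python
# Raw signaling rate per lane, in megatransfers per second.
TRANSFER_RATE_MT_S = {"1.x": 2500, "2.0": 5000}


def lane_bandwidth_mb_s(generation):
    """Usable bandwidth of one lane, per direction, in MB/s.

    8b/10b encoding carries 8 payload bits in every 10 bits on
    the wire; dividing by 8 then converts bits to bytes.
    """
    payload_mbit_s = TRANSFER_RATE_MT_S[generation] * 8 // 10
    return payload_mbit_s // 8


def link_bandwidth_mb_s(generation, lanes):
    """Usable bandwidth of a multi-lane link, per direction."""
    return lane_bandwidth_mb_s(generation) * lanes


# Lane allocation AMD quoted for RD790.
RD790_LANES = {"graphics": 32, "expansion": 6, "south_bridge": 4}

print("Total lanes:", sum(RD790_LANES.values()))
for generation in ("1.x", "2.0"):
    for width in (8, 16):
        rate = link_bandwidth_mb_s(generation, width)
        print(f"PCIe {generation} x{width}: {rate} MB/s per direction")
```

Note that an x8 PCIe 2.0 slot ends up with the same bandwidth as an x16 PCIe 1.x slot, which is why splitting the 32 graphics lanes four ways is less of a compromise than it sounds.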



The Demo: Phenom at 3.0GHz, Today

Clock speeds have been a major blemish for AMD's Barcelona thus far. At Computex, we couldn't even get our hands on a chip that ran faster than 1.6GHz, and AMD just recently announced it would launch in August at 2.0GHz and below. With Intel pushing 3GHz today on 65nm, AMD needs to do more to compete, as it isn't competing against a low IPC chip like the Pentium 4 any longer.



In a demonstration designed to prove that Phenom isn't broken, AMD featured a quad-core Phenom X4 processor, with standard cooling, running at 3.0GHz. While Phenom won't be anywhere near that clock speed when it launches at the end of this year, AMD expects to be at 3GHz within the first half of 2008.

It's a nice demo, and at least we know that the architecture isn't broken, but the time frame is still troubling. Luckily for AMD, Penryn will only be available as a high end Extreme part in 2007, buying a little more competitive time.



The 3GHz Phenom had a somewhat talented backup singer: AMD also showcased three Radeon HD 2900 XTs running in CrossFire mode.



The ATI R7XX

Most of today's announcements were CPU/platform related; however, AMD did drop a few nuggets about its 2008 GPU line: the R7xx series.

The R7xx GPU will be built on a 55nm process and it appears that, at least on the high-end, there won't be any UVD support. AMD's roadmaps clearly outline UVD as a part of the mainstream R7xx feature set, but the high end platforms are completely missing the checkbox. We'll find out next year for sure if the lack of UVD and Purevideo HD on high end parts will continue.

We can't say much more about R7xx, other than AMD is quite confident in its abilities despite the lackluster reception of the R600. AMD has its reasons...



AMD's Bob Drebin (CTO of the Graphics Products Group) reaffirmed the company's commitment to developing faster discrete GPUs, even in a post-Fusion era. Drebin stated that memory bandwidth and processing needs would keep discrete GPUs a part of the market; Fusion CPUs will simply bring more opportunities to the low cost and mainstream markets, while discrete GPUs will continue to flourish in the rest of the market.

What will continue to be the driving factors in discrete GPU development going forward? According to Drebin, memory bandwidth will still need to go up and at the same time, there's no shortage of a need for wider/faster GPUs.



Final Words

Concluding an article like this is very difficult, mainly because we're still waiting for Barcelona to launch and yet we have now talked about its next two successors. AMD insists that despite Barcelona's delays, its future roadmap will not change. If AMD is able to pull off a Shanghai launch in the second half of 2008, followed by a first half 2009 launch of Bulldozer and Bobcat, we will be quite impressed.



We're sticking to our original take on AMD: it definitely has the roadmap to compete, but we really need to start seeing some action soon in order to back it up. Roadmaps alone don't make money, as pretty as they may be. And while we hardly ever focus on the financial side of the companies we cover, we can't help but worry for AMD's future given the troubles it has been having recently.

We just want to see this roadmap executed, as the competition will do wonders for the industry.
