K8L Architecture

At the outset, we hoped we'd have a very large section on AMD's new architecture. After our whirlwind of a three hour briefing, we aren't that much further along on the K8L architecture front than we were before. We've had some things confirmed by a few slides, but AMD didn't spend much time on these details. Over the next few days we will be sitting down with AMD and gathering as much detailed information about K8L as we can, but for now we can offer an overview of what we already know and have gathered from the slides we've seen.

The first K8L chips will be fabbed on a 65nm SOI process jointly developed by IBM and AMD, and manufactured at AMD facilities in Dresden. AMD has implemented a more modular approach to designing their next gen CPUs this time around in order to more easily meet the demands of a market craving ever increasing support for multicore technology. While CPUs are traditionally very hand tuned and designed on a low level, it appears AMD has taken an extremely object oriented approach to CPU design. The interfaces between different parts of the CPU are very strongly defined and it is possible for AMD to mix and match components as necessary.



This type of approach makes a lot of sense in today's world. Designing processors without the need to reengineer the entire CPU from the ground up in order to add another compute core, HT link or (maybe) another memory controller is a stroke of brilliance. Dual and quad core systems don't need 2 or four of everything, but needs do change depending on the application targeted by the hardware. Hopefully AMD will use this technology to enable the delivery of changing CPU configurations much the same way we see clock speeds and cache sizes change over time today.

On a very slightly lower level architecture side, we have a slide showing the overview of AMD's next server class processor with 4 cores based on K8L. Features include a shared L3 cache, "enhanced IPC" cores, OoO (Out of Order) loads, wider data paths, HT-3 (the third version of HyperTransport), and support for DDR2 (and DDR3 or FBDIMMS in the future). Details on some of these enhancements were way too light, especially on the IPC (Instructions Per Clock) front.



Cache enhancements include the capability to support 2x128-bit loads per cycle from the 64k L1 cache (which is half the size of the K8 L1 cache), and a shared L3 cache which will scale up from its introduction at 2MB. The shared L3 cache will help with features like node interleaving on multiprocessor systems as well as multithreaded apps which make use of shared data. We are still waiting for more detailed data on the cache architecture. It isn't clear whether the caches are all exclusive, and we would like to know more about associativity as well.

At a lower level, we have a block diagram of the compute core for K8L CPUs. Again, this diagram is a bit oversimplified, but we can see a few key features of the architecture. On the FP side, the CPU is able to handle 2x128-bit floating point or SSE operations per clock. While this isn't quite as flexible as Intel's Core with its 3 SSE units, AMD's K8L will be able to handle 4 double precision floating point operations per clock. . (Current K8 chips can only do 1x128/2x64-bit SSE instructions per clock.)

As with K8, K8L will have 3 ALUs (arithmetic logic units) and 3 AGUs (address generation units). Combined with cache enhancements and the new ability to reorder loads, K8L has a shot at outpacing Core in integer performance. Of course, we do still need more detail in this area to understand fully what's going on. No doubt, if AMD is claiming the ability to reorder loads, they can absolutely move loads ahead of loads (as this is the easiest case to handle). Where things get interesting is in the ability to move loads ahead of stores. Intel's Core architecture features some very interesting prediction technology in determining whether or not to move a load before a store. We haven't received an answer from AMD on whether they will tackle moving loads ahead of stores at all, let alone how they will handle memory disambiguation and/or prediction. In the past, we've seen a kind of "simpler is better" approach from AMD, so it will be interesting to see which direction K8L has taken.



When it comes to processor interconnect technology, AMD has led Intel since the introduction of the Opteron. With K8L comes a very interesting enhancement to the interconnect architecture: each of the four 16-bit HyperTransport links can be split into two 8-bit HyperTransport links. Apparently, each of the resulting eight 8-bit HT links will be coherent and will allow a direct connection to another processor. In large systems, this means direct access from one core to seven others plus I/O, resulting in the possibility of fully connected 8-way systems. In a quad core world, that would be 32 cores on one platform. AMD also indicates that these HT connections can be used to easily scale blade implementations as well.



AMD mobile processors will also benefit from enhancements to HyperTransport with link power management. Not only will the new dual core 65nm K8L Turion processors be able to throttle cores independently, but even the HT links can be powered down when not in use. These enhancements will go a long way towards expanding AMD's mobile capabilities, especially if the K8L architecture can deliver better performance per Watt than the K8 before it. Compared to NetBurst architectures, K8 may as well have been an icebox, but that all changed with the introduction of Banias, Dothan, Yonah and now Core technology. Intel is bringing the fight to AMD, and K8L will need to deliver on the power front in order to remain competitive. The only market segment that really throws power to the wind is the extreme enthusiast (to which AMD's 4x4 initiative will certainly cater), but volume business will require an eye to the efficient.





To round out what we learned about K8L architecture, here are the roadmap slides of technology AMD plans to roll out over the next three years.





Platform Strategy: 4x4, Torrenza, Trinity, and Raiden Final Words
Comments Locked

40 Comments

View All Comments

  • Squidward - Friday, June 2, 2006 - link

    umm, that should be astronomical

    the message is clear... my typing has failed!
  • Hulk - Friday, June 2, 2006 - link

    :|
  • PrinceGaz - Saturday, June 3, 2006 - link

    Conroe is already obsolete because K8L will grind it into the dirt. Anyone who buys a Core 2 Duo this year is wasting their money because AMD's K8L is better in every way. There's no point upgrading now unless you are stuck with a rubbish last-generation netburst processor like a Northwood or Prescott, because it's clear that K8L will totally annhilate Core 2 Duo and its successors. But if Intel fanbois want to waste their money on Core 2 Duo, that's fine. A little bit of competition from Intel before AMD strike their devestating counter-attack next year will ensure AMD don't cut corners in the K8L design.

    The best way to sum up the next year in CPUs is: Intel manage to gain a slight lead in the second half of 2006 and early 2007, but after that it will be AMD r0><0rs and Intel is teh su><0rs again!!!111

    P.S. The above is not meant to be taken entirely seriously ;) though I do believe from what we've seen that K8L should be a bit ahead of what Intel have next year, if nothing else because of its integrated memory controller.
  • JumpingJack - Monday, February 5, 2007 - link

    quote:

    Conroe is already obsolete because K8L will grind it into the dirt.


    Really, where can I buy a K8L then??
  • MrKaz - Monday, June 5, 2006 - link

    Tell me what does conroe, woodcrest and meron bring new to the market?

    Even the 5 years old amd hammer architecture presentation looks better than conroe, ..., ...
    http://www.amd.com/us-en/assets/content_type/Downl...">http://www.amd.com/us-en/assets/content...ableAsse...
  • Darth Farter - Friday, June 2, 2006 - link

    :|
  • xTYBALTx - Friday, June 2, 2006 - link

    Great article, but like everyone else I am interested in how 4x4 will improve gaming performance. Guess we'll have to wait and see.
  • Calin - Monday, June 5, 2006 - link

    More than two graphic cards will improve performance (over just two) only in the most insane resolutions (like 3000 by 2000 pixels). As for the use of four cores, there certainly exists - just not in the games right now. Even two cores won't bring a big boost in game performance - as of now. Who knows, maybe games ported from PS3 and XBox 360 will use them (hopefully)
  • Regs - Friday, June 2, 2006 - link

    I can kind of figure it out for myself but I wanted to make sure - what is cache coherency? Either way, Torrenza looks very interesting and very promising. Not only will AMD be delivering good competitive performance, but has a chance to unlock a whole new path in bringing a new standard through integrated computing.

    Hope you find out the goods for us Derek. Keep up the good work!
  • Ryan Smith - Friday, June 2, 2006 - link

    To go with the shortest description, cache coherency is the catchall term for methods used to organize and inform multiple processors of cache changes in multiprocessor/multicore systems. Because both processors can work on the same data set at once, if one changes the data, the other needs to be intelligently informed about this, otherwise it will likely do something incorrectly.

Log in

Don't have an account? Sign up now