Final Words

While AMD touched on an incredibly vast amount of technology and data over the course of their three-hour webcast, the depth of coverage on each topic was not nearly enough to satisfy our tastes. We are in the process of scheduling briefings with as many AMD engineers as possible in order to get our questions answered, and we will certainly report on the details of our research as soon as we are able. Hopefully next week's Computex will be very fruitful on the AMD front.

We can't be too upset over the lack of detail, though. In fact, for a day designed around presenting technology to analysts, AMD was pretty heavy on technology and architecture. Now that they've officially confirmed some of the key features of their next-generation processor and platform technology, we certainly hope they will be able to back up their claims with real architectural data on the hardware.

In the meantime, we can all dream sweet dreams about the possibilities AMD's Torrenza presents. Giving expansion cards the bandwidth and low latency of an HTX connection, along with support for coherent HyperTransport, will enable hardware vendors to create a new class of expansion card. Though AMD likes to call these "accelerators," we'll try our best to steer clear of buzzwords and marketing speak. Suffice it to say that giving hardware vendors the ability to access any CPU or memory in the system directly, with cache coherency, should really shake things up. The advantages are probably most apparent in the HPC market, where HTX can offer an easy, standard way to add custom FPGAs or very specialized hardware to a massive system. However, there are certainly advantages for anyone who wants to build hardware that works in lockstep with the CPU.

This applies directly to companies like AGEIA, whose PhysX card, when used in a game, must communicate bi-directionally with the CPU before a frame can be sent to the GPU for rendering. Additionally, GPU makers could easily take advantage of this technology to tie the graphics card even more tightly to the CPU and system memory. In fact, this would serve to eliminate one of the largest differences between PCs and game consoles. The major advantage that still remains on consoles (aside from their limited need for backwards compatibility compared to the PC) is the short distance from the CPU to the GPU. There is huge bandwidth and low latency between these two subsystems in a console, and many games are written to take advantage of (or even depend on) the ability to actively share the rendering workload between the CPU and GPU at a very low level. Won't it be ironic if we start seeing high-performance Xbox 360 and PS3 emulators only a couple of years after their release? This is the kind of thing that could make it possible.
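To see why the number of CPU round trips matters so much, consider a minimal model of one frame in which every exchange with the physics card must finish before the frame goes to the GPU. This is a sketch under stated assumptions: the function name, the per-frame work times, the round-trip count, and both latency values are hypothetical placeholders, not measurements of PCI, PCI Express, or HTX.

```python
# A minimal, hypothetical model of one frame in which the CPU must finish all of
# its exchanges with a physics add-in card before handing the frame to the GPU.
# Every numeric value used below is a made-up placeholder, not a measurement.

FPS_TARGET = 60
BUDGET_MS = 1000.0 / FPS_TARGET  # about 16.7 ms per frame at 60 fps


def frame_time_ms(cpu_ms: float, card_ms: float, round_trips: int,
                  one_way_latency_us: float) -> float:
    """Time (ms) until the frame can be submitted to the GPU.

    cpu_ms             -- CPU-side game logic per frame (hypothetical)
    card_ms            -- compute time spent on the add-in card (hypothetical)
    round_trips        -- serialized CPU <-> card exchanges per frame
    one_way_latency_us -- one-way link latency in microseconds (hypothetical)
    """
    # Each round trip crosses the link twice: request out, result back.
    link_ms = round_trips * 2 * one_way_latency_us / 1000.0
    return cpu_ms + card_ms + link_ms


if __name__ == "__main__":
    for latency_us in (1.0, 10.0):  # hypothetical "close" vs "distant" link
        t = frame_time_ms(cpu_ms=8.0, card_ms=4.0, round_trips=200,
                          one_way_latency_us=latency_us)
        print(f"{latency_us:4.1f} us one-way: {t:5.2f} ms of a "
              f"{BUDGET_MS:.2f} ms frame budget")
```

With these placeholder numbers, only the link latency changes between the two runs, yet it moves the total from roughly three quarters of the 60 fps budget to nearly all of it. Real engines batch their physics work, so treat this as an illustration of why latency matters rather than a prediction.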

With Torrenza and the introduction of 4x4 in the consumer space, it seems clear that AMD will be offering consumer-level CPUs with multiple external coherent HyperTransport links. As the lack thereof has been the only limitation keeping us from building multiprocessor systems with consumer products, we have to wonder how AMD will really differentiate its server and workstation parts this time around. Out of the gate, the K8L Opteron will be a quad-core part while the desktop chip will have only two cores, but eventually the desktop will get four cores as well. Will we start to see more specialized hardware "accelerators" on Opteron chips, or will we see more I/O-oriented modules? Will HT-3's link unganging, which lets each 16-bit link operate as two 8-bit links, only be available on the high-end parts? AMD's performance leadership in the 2P and 4P workstation market has been very solid since the beginning of Opteron, and we are excited to see how AMD will attempt to continue this trend.
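The unganging question is easier to weigh with the bandwidth arithmetic written out. Below is a minimal sketch, assuming a 2.6 GHz link clock with double-data-rate signaling, which we understand to be the top clock in the HyperTransport 3.0 specification; AMD has not said what clocks K8L parts will actually run, and the constant and function names are ours. Peak per-direction bandwidth is simply the link width in bytes times the transfer rate.

```python
# Back-of-the-envelope peak bandwidth for a HyperTransport link, ganged vs. unganged.
# ASSUMPTION: 2.6 GHz link clock with double-data-rate signaling (5.2 GT/s).
# Real K8L clocks were not announced; change LINK_CLOCK_GHZ to explore others.

LINK_CLOCK_GHZ = 2.6        # assumed HT-3 link clock
TRANSFERS_PER_CLOCK = 2     # DDR: one transfer on each clock edge


def per_direction_gb_s(width_bits: int, clock_ghz: float = LINK_CLOCK_GHZ) -> float:
    """Peak GB/s in one direction for a link `width_bits` wide."""
    gt_per_s = clock_ghz * TRANSFERS_PER_CLOCK  # giga-transfers per second
    # bytes per transfer * giga-transfers per second = GB/s
    return (width_bits / 8) * gt_per_s


ganged = per_direction_gb_s(16)            # one 16-bit link
unganged = [per_direction_gb_s(8)] * 2     # the same pins as two 8-bit links

if __name__ == "__main__":
    print(f"1 x 16-bit: {ganged:.1f} GB/s each way, {2 * ganged:.1f} GB/s aggregate")
    print(f"2 x 8-bit:  {unganged[0]:.1f} GB/s each way per link, "
          f"{sum(unganged):.1f} GB/s each way combined")
```

Splitting the link doesn't add bandwidth: the two 8-bit links together carry exactly what the single 16-bit link did. What unganging buys is connectivity, since each half can go to a different CPU or device, which is why it could be a useful lever for telling server parts apart from desktop ones.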

The final word on AMD's Analyst Day? Performance. It's pure and simple, and AMD is all about it. On the high end it's 4x4 or 8 coherent HT links, and on the mobile side it's performance per watt. By 2008, AMD hopes to hold a third of the marketplace, and to let the world know that it still offers solid mainstream performance at good prices as well. The next-gen CPU market will certainly be exciting to watch.


40 Comments


  • MrKaz - Monday, June 5, 2006

    Did you ask anyone at AMD whether someone is interested in building (or is going to build) an SQL accelerator, a CAD calculation accelerator, or even a multimedia accelerator?

    It would be nice to boost SQL performance by 2X, or even cut media encoding from minutes to seconds...
  • DerekWilson - Tuesday, June 6, 2006

    They are certainly talking heavily about the possibility of hardware like that, but no hardware designers have committed to building anything yet.
  • IntelUser2000 - Sunday, June 4, 2006

    quote:

    As with K8, K8L will have 3 ALUs (arithmetic logic units) and 3 AGUs (address generation units). Combined with cache enhancements and the new ability to reorder loads, K8L has a shot at outpacing Core in integer performance.


    No. Core Duo (Yonah), despite an inferior decoder configuration, inferior memory bandwidth (which won't matter a lot but will make a slight difference), and an inferior platform, still manages to outperform the current K8s. The Pentium M, which is (slightly) worse than even Core Duo, still manages to outperform the K8s in integer. Now give Core an integrated memory controller, and the comparison will look like Core Duo against Athlon XP.

    The Core microarchitecture will exceed K8 in general integer performance and at least equal K8L. Integer superiority is still gonna be there. K8L will be faster than Core in FP and SSE performance because of its low-latency integrated memory controller with a lot more real-world bandwidth (well, that depends on how AMD implements SSE; Intel may still have an advantage if AMD puts in a poor implementation like they did with Athlon XP's SSE, or at least it looked poor).
  • JarredWalton - Sunday, June 4, 2006

    If ~33% of all instructions are Loads, and K8 pretty much totally lacks the ability to reorder Loads, adding that feature could substantially boost performance. It definitely "has a shot" at beating Core, but it may also fall short. Anyone making blanket statements one way or the other - i.e. it *will* beat Core, or it *won't* come close - needs to take a step back and check what they really know and what they are just assuming.

    At present, AMD is saying K8L is going to have the ability to reorder Loads. They might only do minor reordering, or they might go so far as to have something similar to Conroe's memory disambiguation. Given that AMD hasn't done a major update to K8 in over 3 years (no, a DDR2 controller and going dual core don't really count as major updates to the underlying architecture), K8L could be a lot of things. It might only match Core 2 Duo on a clock-for-clock basis; it might fall short; it might even come out ahead. Also, there has been no indication that Intel is seriously planning an on-die memory controller in the near future, probably to continue to protect their chipset market.

    Personally, I really hope AMD manages to basically match CD2 performance, because runaway performance leads don't help the consumer. In the end, theoretical integer, FP, SSE, etc. performance isn't as important as real-world application performance. Right now, it's just too soon to declare a victor in the Core 2 Duo vs. K8L match-up. CD2 vs. K8 is already pretty much a done deal, though, and there's no indication that AMD will be able to come out on top in that rivalry. K8L is their "counterattack", and that's the architecture that needs to compete with CD2.
  • IntelUser2000 - Sunday, June 4, 2006

    quote:

    If ~33% of all instructions are Loads, and K8 pretty much totally lacks the ability to reorder Loads, adding that feature could substantially boost performance. It definitely "has a shot" at beating Core, but it may also fall short. Anyone making blanket statements one way or the other - i.e. it *will* beat Core, or it *won't* come close - needs to take a step back and check what they really know and what they are just assuming.


    It's easy to predict integer performance against Core. Core has the ability to reorder loads, but Core Duo is in the same situation as K8; it doesn't really have the ability either. Other than that, on the basic block diagram, K8 is architecturally superior to Core Duo, yet integer performance is somewhat better on Core Duo. The difference probably goes deeper than that. One of the articles mentions that K7/K8 has a technique similar to Intel's micro-op fusion; it could be that Intel's is much better, etc. If a K8 with a substantially better microarchitecture (+ODMC) can't beat the integer performance of Core Duo, will K8L, with basically the same microarchitecture (or maybe worse), beat Core? It's simple to see it probably won't.
  • DerekWilson - Tuesday, June 6, 2006

    Core Duo can reorder loads, just as the Pentium M could:

    http://anandtech.com/cpuchipsets/showdoc.aspx?i=27...

  • MrKaz - Monday, June 5, 2006

    P3 on steroids may beat K7 on steroids in performance.
    But performance isn't everything, or Intel employees would have been out of a job ever since K7 came out and beat the P3 and P4. And Intel still hasn't recovered!

    I didn't see any presentation of Intel's new architecture, but I bet even Hammer looks better than anything Intel will release.
    http://www.amd.com/us-en/assets/content_type/Downl...

    4MB cache and 128-bit SSE tell me nothing, other than that the P3 started with PC100 SDRAM, 256KB cache, and SSE, and it's now at DDR2-667, 4MB cache, and SSE4.
  • Sceptor - Saturday, June 3, 2006

    Finally, a real interconnect that can be used for a serious co-processor... perhaps a physics co-pro not limited by the PCI bus would help smooth the transition to more realistic games.

    Or a dedicated video co-pro to use with CAD or 3D modeling programs...
  • od4hs - Friday, June 2, 2006

    http://images.anandtech.com/reviews/cpu/amd/analys...

    -> UK firm to unveil wall-socket PC

    The Jack PC thin client fits into a wall socket and is so energy-efficient it can get its power over Ethernet

    http://news.zdnet.co.uk/0,39020330,39272166,00.htm
  • lopri - Friday, June 2, 2006

    I totally agree that "direct connect" is the most desirable approach, but I can't help thinking AMD is somewhat daydreaming. That is, what's shown in the slides seems way ahead of today's "practicality".

    I mean, we've had PCI Express, which has been strongly pushed by core logic vendors, but so far all we practically have are video cards. I sometimes think motherboard makers pay more attention to "aesthetics" when they design PCI-E slots, so the boards look prettier. (lol)

    If my understanding is correct, AMD will introduce a new type of slot, HTX, on motherboards. Will other technologies/markets follow? Or will it just give graphics card manufacturers another chance to push us to buy new cards? On today's desktop boards, basically everything is "integrated", sans video. I know that a video card has its own core and frame buffer, and transfers data via HyperTransport, but if a physics card can utilize HTX, what stops a video card from connecting directly to the CPU, without passing through the core logic or system memory?

    I think this will also be closely related to the available bandwidth of HTX per CPU core (or cores), and I can't really think of any add-in board that would prioritize bandwidth other than video cards (OK, and physics cards), even though HTX will be an open standard. (Look at the lazy/lame Creative.)

    A very desirable case would be one where storage (hard disks) can take advantage of this "direct" connection, but then again there is such a thing called "memory", so my imagination stops there. (Maybe solid-state/i-RAM type storage could make use of HTX? Then what's the use of memory? Taking care of I/O?) Speaking of I/O, I just thought it'd be interesting to see keyboards/mice connect to the CPU via HTX. (Sorry, I couldn't resist.)

    All in all, like the article says, this roadmap seems just too broad/ambiguous/futuristic. I'm not a CPU engineer, so my thinking could be totally off, though. If so, please enlighten me.

    lop
