Along with Scott Wasson of The Tech Report and Kyle Bennett of HardOCP, we recently sat down with AMD's CTO, Fred Weber, to talk about his vision of the future of microprocessors. We took the opportunity to compare and contrast his vision with what we heard at this year's Spring IDF, which we have already written about.

Be sure to read our IDF Spring 2005 - Predicting Future CPU Architecture Trends article before continuing as it provides a lot of background information necessary for this piece. 

The ILP/TLP Debate in AMD's Shoes

When we talked to Intel at IDF, we had the distinct impression that the focus in improving microprocessor performance had shifted significantly from instruction level parallelism (ILP) to thread level parallelism (TLP). Put plainly, making individual cores faster was no longer the top priority; rather, getting multiple cores to work together was the new focus.

Weber's stance on ILP vs. TLP largely agreed with what we had heard from Intel: TLP is the future, and using ILP to increase performance has reached a point of sharply diminishing returns. That being said, we asked Fred where he thought further improvements in ILP would come from, and he responded with the following four areas:
  1. Frequency
  2. Reducing Memory Latency
  3. Instruction Combining
  4. Branch Prediction Latency
Fred's number one source of increased single core, single thread performance was clock frequency, so we should expect clock speeds to keep rising over time. It is quite possible that, combined with a reduction in branch prediction latency, future versions of the Athlon 64 will use a lengthened pipeline to reach higher operating frequencies. Paired with Prescott-caliber branch predictors, a somewhat deeper-pipelined K8 would gain frequency headroom without paying too much in misprediction penalties.
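To make the pipeline trade-off concrete, here is a rough back-of-the-envelope sketch. It uses a simple textbook-style model in which every mispredicted branch costs a fixed number of cycles. All of the figures in it (clock speeds, branch fraction, mispredict rates, penalty cycles) are our own illustrative assumptions, not numbers from AMD or from Weber.

```python
def time_per_instruction_ns(freq_ghz, base_cpi, branch_fraction,
                            mispredict_rate, penalty_cycles):
    """Average time per instruction for a simple in-order style model.

    Effective CPI = base CPI + (branches per instruction)
                    * (fraction mispredicted) * (cycles lost per mispredict).
    Dividing cycles by GHz gives nanoseconds.
    """
    cpi = base_cpi + branch_fraction * mispredict_rate * penalty_cycles
    return cpi / freq_ghz


# Hypothetical design points (assumed values, for illustration only).
# Shorter pipeline: lower clock, smaller mispredict penalty.
shallow = time_per_instruction_ns(2.4, 1.0, 0.20, 0.05, 12)
# Deeper pipeline: higher clock, larger penalty, same predictor accuracy.
deep_same_predictor = time_per_instruction_ns(3.0, 1.0, 0.20, 0.05, 20)
# Deeper pipeline paired with a more accurate predictor.
deep_better_predictor = time_per_instruction_ns(3.0, 1.0, 0.20, 0.03, 20)

for name, t in [("shallow", shallow),
                ("deep, same predictor", deep_same_predictor),
                ("deep, better predictor", deep_better_predictor)]:
    print(f"{name}: {t:.3f} ns per instruction")
```

Under these assumed numbers the deeper pipeline comes out ahead, and the better predictor wins back most of what the longer penalty costs. With a less favorable frequency gain or a worse predictor, the result could easily go the other way.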

Behind clock frequency, Weber saw reducing memory latency as the other major way of increasing single core performance.  Reducing memory latency in this sense basically means two things:
  • higher levels of cache hierarchy, and
  • better prefetching. 
More than once during our conversations with Weber, it became clear that future multi-core AMD processors will continue to have their L1 and L2 caches separate, but a shared L3 cache will eventually be introduced to help reduce memory latency and keep those cores fed. 
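The effect of adding a cache level can be estimated with the standard average memory access time calculation. The sketch below is only an illustration: the hit latencies, local miss rates and the 200-cycle memory latency are assumed values that we picked for the example, not figures for any AMD part.

```python
def amat_cycles(levels, memory_latency):
    """Average memory access time, in cycles.

    levels: list of (hit_latency, local_miss_rate) tuples, ordered from the
            cache closest to the core (L1) outward.
    memory_latency: cycles to reach main memory after the last cache misses.
    """
    total = memory_latency
    # Work from the outermost cache back toward the core.
    for hit_latency, miss_rate in reversed(levels):
        total = hit_latency + miss_rate * total
    return total


# Assumed example values only.
l1 = (3, 0.05)    # 3-cycle hit, 5% of accesses miss
l2 = (12, 0.30)   # 12-cycle hit, 30% of L1 misses also miss in L2
l3 = (40, 0.40)   # 40-cycle hit, 40% of L2 misses also miss in L3

two_level = amat_cycles([l1, l2], memory_latency=200)
three_level = amat_cycles([l1, l2, l3], memory_latency=200)

print(f"L1 + L2:      {two_level:.1f} cycles")
print(f"L1 + L2 + L3: {three_level:.1f} cycles")
```

The L3 only helps if it catches enough of the L2 misses to pay for its own hit latency, which is why its size and hit rate matter as much as its presence.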

To Weber's second point, the use of helper threads (compiler or application generated threads that go out and work on prefetching useful data into cache before it's requested) will also improve single core performance. Intel has been talking about using helper threads since before Hyper-Threading, but at this point there is no indication of when we can expect a real world implementation.
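The helper thread idea can be illustrated with a loose software analogy. In the sketch below, a second thread runs ahead of the main thread and fills a shared dictionary that stands in for the cache. This is not how a hardware or compiler generated helper thread would actually be implemented; `slow_load` and `run_with_helper` are hypothetical names that we made up for the example.

```python
import threading


def slow_load(key):
    # Stand-in for a long-latency memory access (hypothetical).
    return key * key


def run_with_helper(keys):
    """Sum slow_load(k) over keys while a helper thread prefetches ahead."""
    cache = {}
    lock = threading.Lock()

    def helper():
        # The helper does no "real" work; it only warms the cache.
        for k in keys:
            value = slow_load(k)
            with lock:
                cache.setdefault(k, value)

    t = threading.Thread(target=helper)
    t.start()

    total = 0
    hits = 0
    for k in keys:
        with lock:
            value = cache.get(k)
        if value is None:
            # The helper has not reached this key yet: pay the full cost.
            value = slow_load(k)
        else:
            hits += 1
        total += value

    t.join()
    return total, hits


total, hits = run_with_helper(list(range(1000)))
print(f"sum = {total}, served from prefetched data: {hits} of 1000")
```

The result is the same with or without the helper; only the number of accesses served from prefetched data changes, and that number varies from run to run depending on how far ahead the helper manages to get.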

The topic of instruction combining was also interesting because it is something that we have only seen used in the Pentium M (Micro-Ops Fusion).  Weber couldn't elaborate on an AMD implementation of some form of instruction combining, but we did get the distinct impression that it's something that's in the cards going forward.  It looks as if elements from both AMD's and Intel's present day architectures will shape tomorrow's designs. 

In the end, Fred left us with the following: if you were used to seeing single core performance improve at a rate of 40% every 12 - 18 months, expect it to improve at about half that rate for the foreseeable future.
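To see what halving that rate means once it compounds, here is a quick calculation. The six-year horizon and the choice of the 18-month end of Weber's range are our own assumptions for the example.

```python
def cumulative_speedup(rate_per_period, period_months, horizon_months):
    """Compound a per-period improvement rate over a time horizon."""
    periods = horizon_months / period_months
    return (1 + rate_per_period) ** periods


# Assumed: 18-month periods over a 6-year (72-month) horizon.
old_pace = cumulative_speedup(0.40, 18, 72)   # 40% per period
new_pace = cumulative_speedup(0.20, 18, 72)   # about half that rate

print(f"40% per 18 months over 6 years: {old_pace:.2f}x")
print(f"20% per 18 months over 6 years: {new_pace:.2f}x")
```

Four periods at 40% works out to roughly 3.8x, while four periods at 20% gives roughly 2.1x, so the gap between the two paces widens with every generation.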

35 Comments

  • ceefka - Sunday, April 03, 2005 - link

    I assume this wafer and die stacking will also be used for increasing the GB's per RAM-stick. What else when 64-bit OSs and apps have become the standard? Is there any word from memory manufacturers on that?
  • Athlex - Saturday, April 02, 2005 - link

    AMD seems to be missing the point of pitting Turion against Centrino. Intel's Centrino package requires a P-M, Intel chipset, and Intel wireless. Since most people don't know the diff between P-M and Centrino it's a brilliant way for Intel to move more silicon.

    Also confusing why AMD is using the same packaging for Turion CPUs as they do for normal A64 CPUs. The lowest-power XP-Ms use the smaller socket 563 (Sharp and Averatec systems for example). AMD already has a spec for a smaller 'socket 638' A64, seems like that should be the thin and light version.. C'mon AMD, let's see a real thin and light K8 notebook!
  • suryad - Friday, April 01, 2005 - link

    I agree...I can't wait for a dual core FX proc with each core clocked @ 3 GHz...think what a monster system that would be...yikes!!
  • ceefka - Friday, April 01, 2005 - link

    #23 What exactly is ILP/TLP ?

    ILP Instruction Level Parallelism
    TLP Thread Level Parallelism

    It is explained in one of the CPU articles here on AT.

    Happy surfing.
  • BlvdKing - Friday, April 01, 2005 - link

    #26 - I would be torn between an IBM notebook and Turion too. IBM notebooks are amazing - full of features and so durable.
  • cryptonomicon - Thursday, March 31, 2005 - link

    incredibly interesting article by anand.

    it seems like this is the kind of stuff you can only find at anandtech.. the info is so in depth right from the source.
  • Regs - Thursday, March 31, 2005 - link

    Thanks, Anand. With all this Intel news running about, it's good to see AMD isn't just planning to be a bench warmer.
  • Xunilla - Thursday, March 31, 2005 - link

    #25 -- I agree, that is making a generalization that doesn't necessarily apply across the board.
  • phaxmohdem - Thursday, March 31, 2005 - link

    I really want to see what kind of Turion notebooks spring forth. It will take a lot, though, to change my decision on the IBM T42 as my next notebook.
