The K8 is here to stay

One of the most interesting points we came away with from our discussion of future AMD architectures was Weber's stance that the K8 execution core is as wide as AMD is going to go for quite some time.  Remember that the K8 execution core was taken from the K7, so it looks like the execution core originally introduced in the first Athlon will be with us even after the Athlon 64. 

What's even more interesting is that Intel's strategy appears to confirm that AMD's decision was the right one. After all, it looks like the Pentium M architecture will eventually be adapted for the desktop in the coming years.  Based on the P6 execution core, the Pentium M is inherently quite similar (although also inferior) to the K7/K8 execution core that AMD has developed.  Given that Intel is slowly but surely implementing architectural features that AMD has introduced over the past few years, we wouldn't be too shocked to see an updated Pentium M execution core that is more competitive with the K7/K8 by the time the Pentium M hits the desktop. 

Fred went on to say that he's not sure the K8 core necessarily disappears; in the long run, future microprocessors could feature one or more K8 cores complemented by other cores.  Weber's comments outline a fundamental shift in the way that microprocessor generations are viewed.  In the past, the advent of a new microprocessor architecture meant that the outgoing architecture was retired - but now it looks as if outgoing architectures will be incorporated and complemented rather than put out to pasture.  The reason for this reuse-instead-of-retire approach is simple: with less of a focus on increasing ILP, the role of optimizing the individual core decreases, and the problems become things like how many cores you can stick on a die and what sort of resources they share. 

In the past, new microprocessor architectures were largely decoupled from new manufacturing processes.  You'd generally see a new architecture debut on whatever manufacturing process was current at the time and eventually scale down to smaller and smaller processes, allowing for more features (e.g. cache) and higher clock speeds.  In the era of multi-core, it's the manufacturing process that really determines how many cores you can fit on a die, and thus the introduction of "new architectures" is very tightly coupled with smaller manufacturing processes.  We put new architectures in quotes because oftentimes, the architectures won't be all that different on an individual core basis, but in aggregate, we may see significant changes. 

How about a Hyper Threaded Athlon?

When Intel announced Hyper Threading, AMD wasn't (publicly) paying any attention to TLP as a means of increasing overall performance.  But now that AMD is much more interested in and more public about its TLP direction, we wondered if there was any room for SMT a la Hyper Threading in future AMD processors, potentially working within multi-core designs. 

Fred's response to this question was thankfully straightforward; he isn't a fan of Intel's Hyper Threading in the sense that the entire pipeline is shared between multiple threads. In Fred's words, "it's a misuse of resources."  However, Weber did mention that there's interest in sharing parts of multiple cores, such as two cores sharing a FPU to improve efficiency and reduce design complexity.  But things like sharing simple units just didn't make sense in Weber's world, and given the architecture with which he's working, we tend to agree. 


  • Filibuster - Thursday, March 31, 2005 - link

    ...but...it's HYPER!

    #11 it can also decrease performance by 10-50% depending on the application. Clearly it matters what you're doing with your PC.

    http://www.digit-life.com/articles/pentium4xeonhyp...

    I think Fred is talking about the inconsistent gains/losses. It's not the best way to spend transistors.
  • fitten - Thursday, March 31, 2005 - link

    #13, HT is kind of like hardware allowing context switching at instruction speed levels. Typically, a thread that stalls on IO (like a hard drive) or something gets swapped out and another thread runs until the IO request completes. However, if a thread just can't use a cache well (streaming data, for example) all of those stalls due to memory loads just cause the CPU to sit and wait. These stalls are on the order of 10s of clock cycles. Other IO is on the order of 1000s of clock cycles (or more). A context switch is on the order of 100s of clock cycles. Obviously, you don't want to swap threads just because of an L2 cache miss. However, HT allows two thread contexts to be loaded so that when one thread stalls on an L2 cache miss, for example, the other thread can execute instructions with no delay. It's like shuffling cards. Basically, it allows the CPU to execute two contexts on the granularity of a clock cycle or two rather than on 100s of clock cycles.

    So, as an example, the worst case for a thread is that every piece of data it wants will generate an L2 cache miss. On a non-HT processor, this means that this thread will not be swapped out until its scheduling quantum is met. But, during that time, the CPU will in effect be idle for probably 90% of the time due to all the cache misses. Since the thread won't be swapped out, your CPU will effectively be used for only 10% of the time during that quantum, then the next thread is allowed to run. With HT, both threads are loaded and those 90% of the cycles that the "bad" thread would waste can actually be used by the other thread.
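    fitten's stall-hiding argument can be sketched with a toy cycle-accounting model. This is a hypothetical simulation with made-up cycle counts (a 90-cycle stall after every 10th instruction standing in for an L2 miss), not a measurement of any real CPU:

    ```python
    # Toy model of SMT stall-hiding. All cycle counts are illustrative
    # assumptions, not measurements of real hardware.

    def run(num_contexts, instructions_per_thread, miss_every=10, stall_cycles=90):
        """Simulate round-robin issue among hardware thread contexts.

        Each thread stalls for `stall_cycles` after every `miss_every`
        instructions (a crude stand-in for an L2 cache miss). A stalled
        thread cannot issue; one ready thread issues one instruction per
        cycle. Returns the total cycles to retire all instructions.
        """
        remaining = [instructions_per_thread] * num_contexts
        stalled_until = [0] * num_contexts
        issued = [0] * num_contexts
        cycle = 0
        while any(r > 0 for r in remaining):
            for t in range(num_contexts):
                if remaining[t] > 0 and cycle >= stalled_until[t]:
                    remaining[t] -= 1
                    issued[t] += 1
                    if issued[t] % miss_every == 0:
                        stalled_until[t] = cycle + stall_cycles
                    break  # only one issue slot per cycle
            cycle += 1
        return cycle

    single = run(1, 1000)   # one context: issue slot idles during every stall
    dual = run(2, 1000)     # two contexts: one thread's stall hides the other's

    print(f"1 context:  1000 instructions in {single} cycles")
    print(f"2 contexts: 2000 instructions in {dual} cycles")
    ```

    With these numbers a single context keeps the issue slot busy only about 10% of the time, so a second context retires roughly twice the instructions in about the same number of cycles - exactly the effect described above, without any OS-level context switch.
    
    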
  • xtknight - Thursday, March 31, 2005 - link

    #11-not sure what you mean by "processing efficiency". all HT does is virtually separate the processor into two threads. maybe I'm missing something, but I can't figure out why everyone associates HT with performance gain.
  • PeteRoy - Thursday, March 31, 2005 - link

    The future of processors is software that makes use of them.
  • hectorsm - Thursday, March 31, 2005 - link

    Does anyone know why Fred thinks that HT is a misuse of resources?

    Doesn't HT increase processing efficiency by 10-30%?

    Sounds to me like he got it backward.
  • xsilver - Thursday, March 31, 2005 - link

    Could it be that the reason for the slowdown in clock increases is not AMD/Intel R&D but rather software companies that are not keeping up.... As far back as I can remember, many programs were able to utilize the new speed increases effectively, whereas now a budget "3000" cpu is already kinda overkill for many office apps....
    gaming is the only arena where the software is pushing the hardware (maybe video editing too, but that market is much smaller?)

    there needs to be more innovation on the software front to utilize the added hardware benefits... it's a positive reinforcement loop....
    If there was that push, I have no doubt that the speed increases would happen at a much better rate
  • Calin - Thursday, March 31, 2005 - link

    An architecture with several cores, with one more powerful than the others, requires the programmer to specify what kind of performance each thread needs. While this could be accepted by console developers (who work very close to the hardware layers), you can say bye bye to easy porting to that platform.
    while the performance increase can be substantial, the trade off is very specific code even at the highest level
  • Jeff7181 - Thursday, March 31, 2005 - link

    Comment WAS made on HT...

    "Fred’s response to this question was thankfully straight forward; he isn’t a fan of Intel’s Hyper Threading in the sense that the entire pipeline is shared between multiple threads, in Fred’s words 'it’s a misuse of resources.'"
  • Zebo - Thursday, March 31, 2005 - link

    Wish some comment was made on HT, Intel's only real saving grace for the last couple of years. Guess with DC it becomes a non-issue though at that point.


    Hehe nice to see CPU world going full circle... AMD copied Intel like nobody's biz, now it's the other way around. Props to AMD for innovating despite their puny size.. they definitely should be rewarded by sales. I know I made the right choice with A64, the latency he mentions you can feel all the time, hard to "benchmark" it other than the system just feels snappy compared to any other CPU I've used, including a P4C OC'ed to 3.4, A-XP OC'ed to 2.7, and IBM chips from Apple at 2.5.
  • bupkus - Thursday, March 31, 2005 - link

    Reduced processor complexity is a step neither manufacturer is willing to take.
    OR
    Neither manufacturer appears willing to reduce processor complexity.
