AMD's Hammer Architecture - Making Sense of it All
by Anand Lal Shimpi on October 23, 2001 2:57 AM EST
Going from 'somewhat different' to 'drastic change'
Extending the pipeline by two stages will give AMD some additional frequency headroom, but at the start of this article we mentioned that an increase in IPC (instructions per clock), not in clock speed, would carry the Hammer. So where does this increase in IPC come from?
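As a rough back-of-the-envelope model, a CPU's instruction throughput is its IPC multiplied by its clock speed, so a gain in IPC is worth just as much as an equal percentage gain in clock speed. The sketch below compares a modest clock bump against a modest IPC gain; the figures are made up for illustration (they are not AMD or Intel numbers) and the `throughput` function is hypothetical.

```python
def throughput(ipc: float, clock_ghz: float) -> float:
    """Billions of instructions per second: instructions per clock x clock in GHz."""
    return ipc * clock_ghz


# Hypothetical figures for illustration only.
baseline = throughput(ipc=1.0, clock_ghz=1.4)
clock_bump = throughput(ipc=1.0, clock_ghz=1.6)  # ~14% higher clock, same IPC
ipc_gain = throughput(ipc=1.2, clock_ghz=1.4)    # same clock, 20% higher IPC

print(f"baseline:     {baseline:.2f} billion instructions/s")
print(f"clock bump:   {clock_bump:.2f} billion instructions/s")
print(f"IPC increase: {ipc_gain:.2f} billion instructions/s")
```

In this toy comparison, 1.2 x 1.4 = 1.68 beats 1.0 x 1.6 = 1.6: a chip running at the lower clock still comes out ahead if it does enough extra work per clock.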
AMD's Athlon (K7) execution path
One way of increasing IPC would be to increase the number of execution units. The K7 architecture provided the Athlon with three Arithmetic & Logic Units (ALUs - these handle integer math), three Address Generation Units (AGUs - these calculate the addresses for loads from and stores to cache) and three floating point units (FPUs - these handle floating point, or non-integer, math). AMD could have outfitted the Hammer with twice as many ALUs, AGUs and FPUs, but unfortunately it would not have seen a proportional increase in performance. Keeping the Athlon's execution units busy is a very difficult task; in fact, it's a difficult task for most of today's processors, including the Pentium 4. This is one of the reasons why there is such a large performance benefit to be had from increasing the FSB clock: your CPU's execution units can be fed data more quickly.
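To see why doubling the execution units doesn't double performance, consider a toy scheduler built on heavily simplified assumptions: every instruction takes one cycle, an instruction can issue only after everything it depends on has completed, and at most `units` instructions issue per cycle. The function `cycles_to_finish` and the example program are hypothetical; they do not model the K7's real scheduler, its caches or memory stalls.

```python
from typing import Dict, List, Set


def cycles_to_finish(deps: Dict[str, List[str]], units: int) -> int:
    """Cycles needed to run every instruction in `deps` (kept in program order).

    deps[name] lists the instructions whose results `name` needs. Each cycle,
    up to `units` ready instructions issue, oldest first; their results become
    usable only in the following cycle.
    """
    done: Set[str] = set()
    remaining = list(deps)
    cycle = 0
    while remaining:
        cycle += 1
        ready = [i for i in remaining if all(d in done for d in deps[i])]
        issued = ready[:units]
        if not issued:
            raise ValueError("dependency cycle or unknown dependency")
        done.update(issued)  # results visible from the next cycle on
        remaining = [i for i in remaining if i not in issued]
    return cycle


# Hypothetical program: two 4-long dependency chains plus two independent ops.
program = {
    "a1": [], "a2": ["a1"], "a3": ["a2"], "a4": ["a3"],
    "b1": [], "b2": ["b1"], "b3": ["b2"], "b4": ["b3"],
    "c1": [], "c2": [],
}

for n in (1, 3, 6):
    cycles = cycles_to_finish(program, n)
    print(f"{n} units: {cycles} cycles, IPC = {len(program) / cycles:.2f}")
```

With three units this 10-instruction program finishes in 4 cycles (an IPC of 2.5); with six units it still needs 4 cycles, because the two dependency chains, not the number of execution units, set the limit.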
Intel's solution to starved execution units is its Hyper-Threading technology, which allows an MP-aware (multiprocessor-aware) OS to treat a single Hyper-Threaded processor as two CPUs and send two threads to it simultaneously. The idea is that in most situations a CPU's execution units are far from fully utilized, so by sending twice as many threads to the CPU you make more efficient use of those execution units. Intel expects a 10-20% increase in performance on regular applications courtesy of Hyper-Threading, which is quite believable.
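The premise behind Hyper-Threading can be illustrated with an equally simplified model: two threads share one set of issue slots, and whenever one thread has nothing ready (for example, while it waits on memory), the other can use the idle slots. The per-cycle "ready" counts are invented for illustration and `issue_slots_used` is a hypothetical helper. The model counts filled slots only and ignores contention between the threads for caches and other shared resources.

```python
from typing import List


def issue_slots_used(ready_per_cycle: List[List[int]], width: int) -> int:
    """Count the issue slots filled over the run.

    ready_per_cycle[t][c] is how many new instructions thread t makes ready
    in cycle c; instructions that cannot issue wait in that thread's backlog.
    Simplifications: thread 0 always gets first pick of the slots each cycle,
    and anything still in a backlog after the last cycle is not counted.
    """
    backlog = [0] * len(ready_per_cycle)
    used = 0
    for c in range(len(ready_per_cycle[0])):
        free = width
        for t, stream in enumerate(ready_per_cycle):
            backlog[t] += stream[c]
            take = min(backlog[t], free)
            backlog[t] -= take
            free -= take
            used += take
    return used


WIDTH = 3
thread_a = [2, 1, 0, 0, 2, 1]  # zeros: thread stalled, e.g. waiting on memory
thread_b = [1, 0, 2, 3, 0, 1]
total_slots = WIDTH * len(thread_a)

alone = issue_slots_used([thread_a], WIDTH)
shared = issue_slots_used([thread_a, thread_b], WIDTH)
print(f"one thread:  {alone}/{total_slots} slots used")
print(f"two threads: {shared}/{total_slots} slots used")
```

Thread A alone fills 6 of the 18 slots; adding thread B fills 13. That is the "more efficient use of execution units" Intel is after, although the model says nothing about how much faster either thread actually runs.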
Like Intel, AMD realizes that throwing more execution units at the problem isn't going to solve the issue of increasing performance. Theoretically it may, but in the real world things just don't work that way.
The Hammer's Execution Units are no different than the Athlon's
In an interesting but definitely not poorly chosen move, AMD has decided to stick with the K7's three ALUs, three AGUs and three FPUs. Although the reasoning may seem far from technical, the justification honestly comes down to "if it ain't broke, don't fix it." We're sure that AMD has done much more extensive profiling of how the Athlon's execution units are used than we have, but it's safe to assume that the Athlon is in no danger of running out of hands to work with.
That's great, but we're still back to square one: how does AMD plan to increase IPC on the Hammer?
The answer comes from three of the major enhancements over the K7 architecture:
1) integrated memory controller & North Bridge
2) vastly improved branch prediction unit, and
3) what AMD likes to call "large workload TLBs"
There are no cute names for these benefits of the Hammer architecture, and we'd have it no other way; now it's time to dig into what really makes the Hammer special.