Intel's Atom Architecture: The Journey Beginsby Anand Lal Shimpi on April 2, 2008 12:05 AM EST
- Posted in
Fighting Power Consumption...with a Longer Pipeline?
Atom's pipeline is a fairly deep 16 stages, with a 13 stage mispredict penalty. Note that this is longer than the Core 2 Duo's 14 stage pipeline, which is surprising given the low power focus the design team had for Atom.
A 16-stage pipeline complete with 3 instruction fetch and 3 instruction decode stages, more than expected
Longer pipelines are generally associated with greater power consumption especially as of late due to the Pentium 4's tenure. Intel gave us three reasons for the long pipeline:
When faced with a decision between trading off latency for power, the Austin design team always favored keeping power low, even if it meant increasing latencies. Atom doesn't fire the large banks of its caches unless the cache controller knows there's a true hit in the cache, unfortunately this increases the access latency of the cache. In order to keep clock speeds high, these cache accesses have to be further pipelined. The benefit is that power is kept low; Atom keeps things as physically tagged caches to avoid the power burden of a virtually tagged cache.
The same sort of latency tradeoff is made in the decoding stages. Remember the slow vs. fast paths through the decoders? The slow path is higher latency but is guaranteed to properly decode an instruction, the added latency forces Atom to have three decoding stages instead of two.
Finally there are some algorithms in which SMT added a stage or two to the pipeline, the end result being a fairly lengthy pipeline for such a simple CPU. The reasoning however makes sense; there is no NetBurst nonsense here, all of these decisions were made to keep power consumption as low as possible while hitting the right frequency targets. As a fairly simple two-issue core, Atom needs clock speed in order to give us the sort of performance we are expecting of it.