Return of the CISC: Macro-op Execution

The Pentium Pro was Intel's first CPU that finally ended the RISC vs. CISC debates of the early 1990s. To the programmer it was still a x86 CISC machine like every previous Intel processor, but internally once it received its x86 instructions it decoded them into smaller micro-ops to run on a simpler, faster and more efficient RISC core.

By maintaining backwards compatibility with all previous x86 processors Intel was able to leverage one of the major strengths of its CISC architecture (mainly the installed x86 user base) while continuing to evolve by relying on a high performance RISC core.

It turns out that some x86 instructions shouldn't be broken up into smaller micro-ops because they tend to augment each other. With the Pentium M Intel began fusing certain micro-ops into single operations so that they would go down the processor's pipelines atomically, thus saving power and improving efficiency. Intel called this feature micro-op fusion. If two micro-ops were treated as one when going down the pipeline that effectively increased the "width" of the CPU, allowing more instructions to be operated on at once. The internal core was still very much a RISC machine, it was just able to do a little more in certain circumstances.

The Atom takes things one step further and most x86 instructions aren't even broken down into micro-ops internally. As Atom isn't an out-of-order core, it doesn't exactly make sense for it to have tons of micro-ops in flight since it can't reorder them for optimal execution. Furthermore, by keeping most instructions as single operations Intel is able to effectively increase the "width" of Atom.

Instructions that are of the format load-op-store or load-op execution are treated as a single micro-op by Atom's decoder. In other words, if you have an instruction that loads data, operates on it, and stores the result - that's now treated as a single micro-op instead of being broken up into three. The benefit being that there's only a single micro-op that's going down the pipeline, leaving room for another one. Atom may only be a 2-issue architecture, but in certain situations it can behave like a much wider machine.

Intel has spent much of the past decade perfecting its ability to break down x86 instructions into smaller, RISC-like operations and building very high performance cores to deal with these small atomic operations. What's most interesting is that we've now come full circle where in the quest for greater performance per watt Intel is now doing the opposite and not breaking down these x86 instructions in many cases.

Instructions Gone Wild: Safe Instruction Recognition It Does Multiple Threads Though: The Case for SMT
POST A COMMENT

46 Comments

View All Comments

  • adntaylor - Tuesday, April 08, 2008 - link

    On that chart with price / power, you need to be clearer...

    For price, you show the combined price for CPU + Chipset. For power, you say just the CPU... so 0.65W for the CPU... but you're conveniently ignoring the >2W figure for the chipset!!! This absolutely flatters Intel wherever possible.

    AMD are just as misleading - they describe the Geode LX as "1W" which excludes the non-CPU core parts of the chip (which is an integrated CPU + GMCH)

    Just please be honest - the figures are out there in the Intel datasheets... it takes 10 minutes to check.
    Reply
  • Clauzii - Friday, April 04, 2008 - link

    I still have a PowerVR 4MB addon card, runnung in tandem with a Rage128Pro. Quite a combination w. 15 FPS in Tombraider. Constant(!) 15FPS, that is..

    Amazing what they actually achieved back in 95!
    Reply
  • Clauzii - Friday, April 04, 2008 - link

    Ooops!

    Totally misplaced that. Sorry.
    Reply
  • wimaxltepro - Friday, April 04, 2008 - link

    The Atom represents a shift in processor architecture that is the most dramatic departure for Intel since introduction of x86 processors... the philosophy of how computing itself occurs from centralized processors to distributed processing based on an extension of the popular x86 instruction set.

    The Atom is not about the immediate prospects for the Atom or Nehalem products: we will likely see members of Intel's new product family be used in embedded applications in consumer products and in areas where specialized communications processors are more the rule. While not optimized for use in specific networking applications, the products capitalize on the wide range of support available in IT/Networking to develop common functions that leverage the low cost, low power/processing capability to be used as a common denominator for a wide range of applications.

    Intel has been built on the 'Wintel' architecture: massively integrated chips needed to handle the massively integrated operating systems and applications of Windows (and Apple) environments. The Atom allows migration and broadening out from that architectural motif to a very highly distributed architecture. So, the increased parallelism found in the internal chip architecture is enabling of changes in external system architectures and device applications that go well beyond the typical domain of Intel.. and right into the domain of 'personal wireless broadband' and SDWN, Smart Distributed Wireless broadband Network.

    The decisions about in-order vs. out of-order instruction streams, memory architecture, I/O architecture have been made in light of the broad vision for how computing, networking and, out of hand, how wireless enabled broadband networking including WiMAX will occur. This should be understood for what it represents as a shift in direction for Intel both in response to broad industry shifts and as a trend setting development.
    Reply
  • jtleon - Friday, April 04, 2008 - link

    Thanks to all the flash player ads, etc., a mobile web device will continuously avoid switching to low power states. Thus one could argue that advertising will be carbon footprint enemy of the internet's future. This is already becoming the case for desktop/laptop machines.

    Without such continuous (arguably wasted) consumption of CPU power, then Intel's engineered power management might have a significant impact on the value of the Atom.

    Regards,
    jtleon
    Reply
  • 0WaxMan0 - Friday, April 04, 2008 - link

    I am definatly much impressed and enthused by intels work here, the future looks interesting esp for those of us who like low power cross compatible computing products.

    However I have to point out that a low power modern x86 cpu has allready been done infact 4 years ago with AMD's Geode. While technically much weaker than the Atom and with out any where near the scalability (single core design etc.) the Geode has been available in the same TDP ranges for a good long while. Take a look here http://www.amdboard.com/geode.html">http://www.amdboard.com/geode.html for some old stuff.

    I do hope that the Intel name and hype makes more of an impact than AMD managed.
    Reply
  • whycode - Thursday, April 03, 2008 - link

    Does the TDP quoted include the chipset? Or is that CPU only? Reply
  • IntelUser2000 - Thursday, April 03, 2008 - link

    Anand, the Pentium M does not feature Macro Ops Fusion. Its Core 2 Duo that started Macro Ops Fusion. Reply
  • Anand Lal Shimpi - Thursday, April 03, 2008 - link

    You're correct, I was referencing micro-op fusion. I've made the appropriate correction :)

    Take care,
    Anand
    Reply
  • squito - Wednesday, April 02, 2008 - link

    Am I the only one shocked to see that Poulsbo is a 130nm part... Reply

Log in

Don't have an account? Sign up now