Return of the CISC: Macro-op Execution

The Pentium Pro was the Intel CPU that finally ended the RISC vs. CISC debates of the early 1990s. To the programmer it was still an x86 CISC machine like every previous Intel processor, but internally, once it received its x86 instructions, it decoded them into smaller micro-ops that ran on a simpler, faster, and more efficient RISC core.

By maintaining backwards compatibility with all previous x86 processors, Intel was able to leverage one of the major strengths of its CISC architecture (namely the installed x86 user base) while continuing to evolve, relying on a high-performance RISC core.

It turns out that some x86 instructions shouldn't be broken up into smaller micro-ops because they tend to complement each other. With the Pentium M, Intel began fusing certain micro-ops into single operations so that they would go down the processor's pipelines atomically, saving power and improving efficiency. Intel called this feature micro-op fusion. If two micro-ops were treated as one when going down the pipeline, that effectively increased the "width" of the CPU, allowing more instructions to be operated on at once. The internal core was still very much a RISC machine; it was just able to do a little more in certain circumstances.
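
To make the mechanism concrete, here is a minimal C sketch. The x86 instruction in the comments is one plausible compiler lowering of the loop body (an assumption for illustration, not disassembly of any particular build), and the micro-op behavior described follows the general scheme above; it isn't something this code can observe directly.

```c
#include <stddef.h>

/* Sums an array. A typical x86 compiler lowers the loop body to a
   single load-op instruction, something like:
       add eax, DWORD PTR [edx + ecx*4]
   Without fusion, the decoder splits this into two micro-ops (a load
   and an add) that each occupy a pipeline slot. With micro-op fusion,
   the pair travels down the pipeline as one fused operation. */
int sum_array(const int *a, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];   /* load-op: a micro-op fusion candidate */
    return total;
}
```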

The Atom takes things one step further: most x86 instructions aren't even broken down into micro-ops internally. Since Atom isn't an out-of-order core, it doesn't make much sense for it to have tons of micro-ops in flight, as it can't reorder them for optimal execution. Furthermore, by keeping most instructions as single operations, Intel is able to effectively increase the "width" of Atom.

Instructions of the load-op-store or load-op-execute format are treated as a single micro-op by Atom's decoder. In other words, an instruction that loads data, operates on it, and stores the result is treated as a single micro-op instead of being broken up into three. The benefit is that only a single micro-op goes down the pipeline, leaving room for another one. Atom may only be a 2-issue architecture, but in certain situations it can behave like a much wider machine.
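
As an illustration, here is a hedged C sketch of the kind of source code that produces a load-op-store instruction. The instruction shown in the comments is one plausible compiler output, not the only possibility.

```c
#include <stddef.h>

/* Builds a byte histogram. The increment typically compiles to a
   read-modify-write (load-op-store) instruction such as:
       add DWORD PTR [eax], 1
   A conventional Intel core decodes that into roughly three micro-ops
   (load, add, store); Atom's decoder keeps it as one micro-op, so it
   consumes only one of the two issue slots in a cycle. */
void histogram(unsigned counts[256], const unsigned char *data, size_t n) {
    for (size_t i = 0; i < n; i++)
        counts[data[i]]++;   /* load-op-store: a single micro-op on Atom */
}
```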

Intel has spent much of the past decade perfecting its ability to break x86 instructions down into smaller, RISC-like operations and building very high performance cores to deal with these small atomic operations. What's most interesting is that we've now come full circle: in the quest for greater performance per watt, Intel is doing the opposite and, in many cases, not breaking down x86 instructions at all.

Comments

  • AssBall - Thursday, April 3, 2008 - link

    I was a little surprised at first, but then I got to thinking... I don't really know if my P35 northbridge is even a 90nm chip.

    - Why would Intel not want to use its old rock-solid 130nm process, which it spent a ton of money on, to build simpler parts than it was designed for? As long as their materials and equipment are working fine for a cheap enough 130nm part and there is no dire market for lower-power chipsets that fit the ATX standard...

    - Intel's most recent strategy has been to manufacture its latest designs on solid existing tech before shrinking them.

    I'm also fairly certain that if this little CPU takes off, they will have the 90nm or 65nm version of it with some simple refinements out in two shakes of a lamb's tail. I'm really surprised myself that they set it up with DDR2 support... but again, it's so cheap now, why not?
  • rmlarsen - Wednesday, April 2, 2008 - link

    It is articles like this that make me come back to Anandtech. Well written and researched and with the right level of detail. Keep up the good work!
  • Woodchuck2000 - Wednesday, April 2, 2008 - link

    One of the best I've seen on Anandtech for quite a while. I've been following this one quite closely and it's great to have such a detailed exposition all in one place.

    Do we have any news on the availability of this as a desktop part? I've been looking to construct an always-on server sitting in a cupboard somewhere, just to act as a file/print server and to do a little light database/web hosting for testing and development. A 1.6GHz Atom would easily provide enough horsepower to accomplish that, and with a notebook HDD the whole thing should stay well under 10W load power consumption!

    I saw a photo of Silverthorne + Poulsbo on an Intel reference board built to a Mini-ITX form factor but couldn't find any details on whether they were planning to release it to the general public...

    Also any news on when we might expect benchmarks?

    Keep up the good work!
  • yyrkoon - Wednesday, April 2, 2008 - link

    You could probably do this now with the VIA pico ITX reference board.

    http://www.logicsupply.com/products/px10000g?refer...

    Sorry for the long link... but I think power usage is somewhere around 12W.
  • AnnonymousCoward - Friday, April 4, 2008 - link

    Try tinyurl.com

    Thanks for such a high quality article, Anand!
  • tfranzese - Wednesday, April 2, 2008 - link

    "These days, Intel manufacturers millions of Core 2 Duo processors each made up of 410 million transistors (over 130 times the transistor count of the original Pentium) in an area around 1/3 the size."

    ...is incorrect. You could nearly fit six Core 2 Duos at 45nm in the same area that the original Pentium occupied, or, even more impressively, four of them on the die of the original Pentium 4 with room to spare.
  • tfranzese - Wednesday, April 2, 2008 - link

    I'm the one with fuzzy logic today and misinterpreted :)
  • Magnus Dredd - Wednesday, April 2, 2008 - link

    It is completely true that there are benefits to having a single platform to support. However, the article is completely off the mark about where the benefits are most realized.

    It's not the HARDWARE.

    It's the API.

    It's all about the API. Unless you're writing drivers or an OS, you're not writing to the hardware, with VERY few exceptions. The exceptions are optimizations for seriously intense code like Photoshop filters and video game engines, and even there 90% or more of the code is written to the API. So basically, one way or another, you're writing to an API. That's Application Programming Interface.

    Since it was mentioned in the article...

    If I'm writing a program that's supposed to run on OS X, whose newest version supports TWO hardware platforms (PPC and x86, not just x86 as the article claimed), and I want to create a "window" using the built-in API (named Cocoa), I use the class NSWindow.
    http://developer.apple.com/documentation/Cocoa/Ref...
    It makes no difference when writing the program whether it will be running on a PPC or an Intel based machine, with the single exception that in a few places you have to use a small bit of code to make sure the program uses the byte order appropriate to the processor.
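
    For example, here's a minimal sketch of that kind of byte-order fix-up (the helper name from_le32 is made up just for illustration):

    ```c
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical helper: convert a 32-bit value read from a
       little-endian file into host byte order. On x86 this is a no-op;
       on a big-endian PPC it swaps the bytes. */
    static uint32_t from_le32(uint32_t v) {
        const union { uint16_t u16; uint8_t u8; } probe = { 1 };
        if (probe.u8 == 1)   /* first byte is the low byte: little-endian host */
            return v;
        return (v >> 24) | ((v >> 8) & 0xFF00u) |
               ((v & 0xFF00u) << 8) | (v << 24);
    }

    int main(void) {
        uint32_t on_disk = 0x12345678u;
        printf("host value: 0x%08x\n", (unsigned)from_le32(on_disk));
        return 0;
    }
    ```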

    While I have yet to read info on it, I'd bet that NSWindow is also used by the iPhone, which uses an ARM CPU (yet a third architecture).

    I've written code in ANSI C for Linux that runs without any changes on PPC, x86 (32-bit and 64-bit), and SPARC. If I wanted to go to the trouble, I could also compile it on my MIPS-based SGI, a Motorola 68000-series Mac, and an HP PA-RISC (if I can ever figure out how to get the damned thing running). That's because nearly all modern applications are compiled, not written to the hardware.
    ---
    Now if by PC, Anand meant Windows, we're talking a different story, but one with similar flaws.

    So I'm writing an application for Windows and I want to create a "window". I use the win32 command CreateWindow.
    http://msdn2.microsoft.com/en-us/library/ms632679(...

    So lets just say that I want it to run on an Itanium under Windows... I use the win32 command CreateWindow.

    So let's just say that I want to make it run on a WindowsCE based set-top box or internet tablet powered by a MIPS CPU...
    I use the win32 command CreateWindow.
    http://msdn2.microsoft.com/en-us/library/ms908192....
    Quoting Wikipedia: "It is supported on Intel x86 and compatibles, MIPS, ARM, and Hitachi SuperH processors."
    http://en.wikipedia.org/w/index.php?title=Windows_...
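
    To illustrate, here's a minimal, hedged Win32 sketch rather than production code (and note WinCE's entry point actually takes a wide-string command line, so a real CE build would need a minor tweak): the very same CreateWindow call compiles no matter which CPU the compiler targets.

    ```c
    #include <windows.h>

    LRESULT CALLBACK WndProc(HWND h, UINT m, WPARAM w, LPARAM l) {
        if (m == WM_DESTROY) { PostQuitMessage(0); return 0; }
        return DefWindowProc(h, m, w, l);
    }

    int WINAPI WinMain(HINSTANCE inst, HINSTANCE prev, LPSTR cmd, int show) {
        WNDCLASS wc = {0};
        wc.lpfnWndProc   = WndProc;
        wc.hInstance     = inst;
        wc.lpszClassName = TEXT("DemoClass");
        RegisterClass(&wc);

        /* The same call, whatever CPU the compiler is targeting. */
        HWND hwnd = CreateWindow(TEXT("DemoClass"), TEXT("Hello"),
                                 WS_OVERLAPPEDWINDOW, CW_USEDEFAULT,
                                 CW_USEDEFAULT, 320, 240,
                                 NULL, NULL, inst, NULL);
        ShowWindow(hwnd, show);

        MSG msg;
        while (GetMessage(&msg, NULL, 0, 0) > 0) {
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
        return 0;
    }
    ```
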
    ---
    I'm actually somewhat saddened to write this post, mostly due to the amount of respect I have for Anand and the many great articles he's written over the years. However, I suppose that it goes that way sometimes.

    I just want to make sure that people aren't misled about what makes it hard to port/move a program to another platform like a phone or a Blu-ray player. It's sure as hell not the CPU's instruction set. Either the CPU isn't fast enough, or the programs are written to APIs that may or may not support the hardware. And the fault for that lies with Microsoft, or Apple, or the GTK guys, or Trolltech, or whomever the API belongs to.

    Also, simply dropping an x86 CPU into a machine does not mean that it can run Windows. With sales of XP set to cease, your only option for the new batch of super-cheap x86 laptops like the ASUS Eee PC or the CloudBook may be Linux, regardless of the fact that they're x86 based.

    The bottom line is, if Microsoft doesn't care about your platform, they won't support it and you won't be able to get it with Windows regardless of what the CPU is.

    While I do personally agree that x86 moving "downwards" is a great thing, I see it taking over for completely different reasons, like Intel's manufacturing prowess.
  • yyrkoon - Wednesday, April 2, 2008 - link

    Don't forget Windows XP Embedded and the .NET Compact Framework while you're talking about APIs and platforms. XPe and Windows Server 2003 Embedded aren't going away any time soon, and have basically barely been available on a non-beta basis. CE Builder could probably do the same thing; as a matter of fact, I've seen some fairly nifty things done with it (e.g. a bootable image that fits on a floppy with all the functionality of your standard NAS, including user groups and permission policies).

    While having an OS on an embedded device can sometimes be a hindrance, there are times it can be quite handy. Bank kiosks and cash registers are just two examples, and I have worked on/with both running WinXPe.
  • Anand Lal Shimpi - Wednesday, April 2, 2008 - link

    Agreed - the API also plays a large part, but for a company like Apple the pain of maintaining both PPC and x86 code paths is significant. Perhaps the Firefox reference wasn't the best one, especially as I really see the strengths here for software companies like Apple (not to mention that other conventional hardware companies may start looking more like software companies as their devices get more complex).

    I think you've also hit on a major issue going forward: Microsoft is going to have to take these not-PCs a lot more seriously in the future. Instead of trying to scale Windows down, it needs an MCE-esque approach to these "fast enough" devices. Apple made the right first step with the iPhone OS; Microsoft can't stand idly by for too long without a good alternative. And MS does love x86... :)

    Take care,
    Anand
