OoOE

You’re going to come across the phrase out-of-order execution (OoOE) a lot here, so let’s go through a quick refresher on what that is and why it matters.

At a high level, the role of a CPU is to read instructions from whatever program it’s running, determine what they’re telling the machine to do, execute them and write the result back out to memory.

The program counter within a CPU points to the address in memory of the next instruction to be executed. The CPU’s fetch logic grabs instructions in order. Those instructions are decoded into an internally understood format (a single architectural instruction sometimes decodes into multiple smaller instructions). Once decoded, all necessary operands are fetched from memory (if they’re not already in local registers) and the combination of instruction + operands is issued for execution. The results are committed to memory (registers/cache/DRAM) and it’s on to the next one.
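
As a rough illustration of that flow (and only that — this is not a model of Atom or any real pipeline), the short Python sketch below pushes a tiny made-up instruction stream through the fetch/decode/operand-fetch/execute/commit steps in program order. The three-field instruction format, the register/memory dictionaries and the two opcodes are assumptions invented for the example.

```python
# Toy, single-issue walk through the canonical pipeline steps described above.
def run_in_order(program, registers, memory):
    for instr in program:                             # fetch: next instruction in program order
        op, dst, srcs = instr                         # decode: split into opcode and operands
        values = [registers.get(s, memory.get(s, 0))  # operand fetch: registers first,
                  for s in srcs]                      # then fall back to "memory"
        if op == "add":                               # execute
            result = values[0] + values[1]
        elif op == "mul":
            result = values[0] * values[1]
        else:
            raise ValueError(f"unknown opcode: {op}")
        registers[dst] = result                       # commit / write back
    return registers

regs = run_in_order([("add", "r2", ["r0", "r1"]),     # r2 = r0 + r1
                     ("mul", "r3", ["r2", "r2"])],    # r3 = r2 * r2
                    {"r0": 2, "r1": 3}, {})
print(regs)  # {'r0': 2, 'r1': 3, 'r2': 5, 'r3': 25}
```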

In-order architectures complete this pipeline in order, from start to finish. The obvious problem is that many steps within the pipeline are dependent on having the right operands immediately available. For a number of reasons, this isn’t always possible. Operands could depend on other earlier instructions that may not have finished executing, or they might be located in main memory - hundreds of cycles away from the CPU. In these cases, a bubble is inserted into the processor’s pipeline and the machine’s overall efficiency drops as no work is being done until those operands are available.

Out-of-order architectures attempt to fix this problem by allowing independent instructions to execute ahead of others that are stalled waiting for data. In both cases instructions are fetched and retired in-order, but in an OoO architecture instructions can be executed out-of-order to improve overall utilization of execution resources.
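
A minimal sketch of that difference, using an invented three-instruction program in which a load misses in cache: the in-order scheduler may only consider the oldest waiting instruction each cycle, while the out-of-order scheduler may issue any instruction whose operands are ready. Instruction names, dependencies and latencies are all assumptions for illustration, and the model is simplified to one issued instruction per cycle.

```python
# Toy issue scheduler: contrast in-order vs. out-of-order issue.
# Each instruction is (name, source_regs, dest_reg, latency_in_cycles).
def schedule(instrs, out_of_order):
    ready_at = {}          # cycle at which each destination register becomes available
    pending = list(instrs)
    log = []               # (issue_cycle, instruction_name)
    cycle = 0
    while pending:
        # In-order: only the oldest pending instruction may issue this cycle.
        # Out-of-order: any pending instruction with ready operands may issue.
        window = pending if out_of_order else pending[:1]
        for instr in window:
            name, srcs, dst, latency = instr
            if all(ready_at.get(s, 0) <= cycle for s in srcs):
                log.append((cycle, name))
                ready_at[dst] = cycle + latency
                pending.remove(instr)
                break      # simplification: at most one instruction issues per cycle
        cycle += 1
    return log

# The load "misses" (5-cycle latency); the add depends on it, the mul does not.
prog = [("load r1", [],     "r1", 5),
        ("add r2",  ["r1"], "r2", 1),
        ("mul r3",  ["r4"], "r3", 1)]
print(schedule(prog, out_of_order=False))  # [(0, 'load r1'), (5, 'add r2'), (6, 'mul r3')]
print(schedule(prog, out_of_order=True))   # [(0, 'load r1'), (1, 'mul r3'), (5, 'add r2')]
```

The independent multiply slots into cycles that would otherwise be a bubble, which is exactly the utilization win the OoO machinery buys.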

The move to an OoO paradigm generally comes with penalties to die area and power consumption, which is one reason the earliest mobile CPU architectures were in-order designs. The ARM11, ARM’s Cortex A8, Intel’s original Atom (Bonnell) and Qualcomm’s Scorpion core were all in-order. As performance demands continued to climb and smaller, lower-power transistors became available, all of the players here started introducing OoO variants of their architectures. Although often referred to as out-of-order designs, ARM’s Cortex A9 and Qualcomm’s Krait 200/300 are only mildly OoO compared to the Cortex A15. Intel’s Silvermont joins the ranks of the Cortex A15 as a fully out-of-order design by modern day standards. The move to OoO alone should be good for around a 30% increase in single threaded performance vs. Bonnell.

Pipeline

Silvermont changes the Atom pipeline slightly. Bonnell featured a 16 stage in-order pipeline. One side effect of the design was that all operations, including those that didn’t access the cache (e.g. operations whose operands were already in registers), had to go through three data cache access stages even though nothing happened during those stages. In going out-of-order, Silvermont allows instructions to bypass those stages if they don’t need data from memory, effectively shortening the mispredict penalty from 13 stages down to 10. The integer pipeline depth now varies depending on the type of instruction, but you’re looking at a range of 14 - 17 stages.

Branch prediction, a staple of any progressive microprocessor architecture, improves tremendously with Silvermont. Silvermont takes the gshare branch predictor of Bonnell and significantly increases the size of all associated data structures. Silvermont also adds an indirect branch predictor. The combination of the larger predictors and the new indirect predictor should increase branch prediction accuracy.
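
For reference, a gshare predictor hashes the global branch history with the branch address to pick a 2-bit saturating counter that tracks whether that branch tends to be taken. The sketch below shows the general textbook scheme only; the table size, counter width and PC hashing are generic assumptions, not Silvermont’s (or Bonnell’s) actual parameters.

```python
class GsharePredictor:
    """Minimal textbook gshare sketch: global history XOR branch PC indexes
    a table of 2-bit saturating counters."""

    def __init__(self, index_bits=12):
        self.index_bits = index_bits
        self.table = [1] * (1 << index_bits)  # counters start weakly not-taken
        self.history = 0                      # global history register

    def _index(self, pc):
        mask = (1 << self.index_bits) - 1
        return ((pc >> 2) ^ self.history) & mask

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2      # True = predict taken

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)  # strengthen toward taken
        else:
            self.table[i] = max(0, self.table[i] - 1)  # strengthen toward not-taken
        mask = (1 << self.index_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask
```

Growing the counter table and history reduces aliasing between unrelated branches, which is the usual way enlarged predictors improve accuracy; the separate indirect predictor handles branches whose target address, not just direction, must be predicted.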

Couple better branch prediction with a lower mispredict latency and you’re talking about another 5 - 10% increase in IPC over Bonnell.
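
To see how those two levers interact, here is some back-of-the-envelope cycles-per-instruction arithmetic. Every number below (base CPI, branch frequency, mispredict rates) is an invented, illustrative assumption rather than an Intel figure; the point is only the shape of the calculation.

```python
# Rough effective-CPI model: base CPI plus the average branch mispredict cost.
base_cpi        = 1.0    # assumed ideal cycles per instruction
branch_fraction = 0.20   # assume ~1 in 5 instructions is a branch

scenarios = {
    "Bonnell-like":    {"mispredict_rate": 0.06, "penalty_cycles": 13},
    "Silvermont-like": {"mispredict_rate": 0.05, "penalty_cycles": 10},
}

for name, s in scenarios.items():
    cpi = base_cpi + branch_fraction * s["mispredict_rate"] * s["penalty_cycles"]
    print(f"{name}: effective CPI ~ {cpi:.3f}")
# Bonnell-like:    effective CPI ~ 1.156
# Silvermont-like: effective CPI ~ 1.100  -> roughly a 5% IPC gain in this toy example
```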

Comments

  • Jaybus - Monday, May 13, 2013 - link

    In the full Win 8 tablet market, I don't think any low power SoC is going to be adequate to compete against 13 W Ivy Bridge.
  • 1d107 - Tuesday, May 7, 2013 - link

    Did I miss a memory bandwidth comparison with the A6X? Will it support hi-res displays with acceptable performance? And by performance I mean not playing Angry Birds on a so-so 1366x768 or even 1080p panel, but smooth scrolling and fast text rendering on a 3840x2400 screen. This would be cool for a decent Windows tablet with an external display attached.

    I'm afraid that by the time Silvermont is released and incorporated into actual products, Apple will have iPad 5 already shipping with A7X chip that will have twice the battery life, while maintaining better performance than A6X. They will need it for the iPad mini, but full-sized iPads will benefit also.
  • fteoath64 - Tuesday, May 7, 2013 - link

    One cannot know what the A7X can deliver, but one can take a couple of guesses. Here: 1) optimise Swift further with pipeline shortening while still staying on the A9 architecture, or 2) leap to a dual core A15 with minimal optimization. On the GPU side it becomes more tricky, as the PVR 554 being used is maxed out at 4 cores; they would have to either jack that up (6 cores?) or jack up the clock rate.
    Remember that S800 and T4 products are yet to be announced, so there is some time to watch the progression.
    Intel's key weakness here is STILL on the GPU side. Putting in 3 cores of PVR 554 would eat a lot of power while giving respectable performance. Going 1/4 HD 4000 is just a dumb idea as the drivers are very bad and will remain so. Again, too much power budget to slot 8 EUs into a Silvermont quad.
    One thing is for sure: Silvermont is going to make a wicked NAS CPU!
  • thunng8 - Wednesday, May 8, 2013 - link

    1) Swift is not A9 architecture.
    2) A7X will likely get the next generation PVR graphics chip (SGX Series 6 aka Rogue).
  • nunomoreira10 - Wednesday, May 8, 2013 - link

    Considering the power budget, 1/4 of HD 4000 is quite good.
    HD 4000 consumes around 10 W during games; at 1/4 size, with clocks cut down and power improvements, we should expect 1-2 W, which is the max they could allow.
    Drivers are good for the games normally played on tablets.
  • BSMonitor - Tuesday, May 7, 2013 - link

    Awesome review! This is the one we have been waiting for from Windows Phone / Windows Tablets!!

    Anand, is it the next Lumia that Intel has scored a design win?? x86 Windows 8 on a next gen Lumia??
  • warezme - Wednesday, May 8, 2013 - link

    Sounds like Intel is going hammer time on the mobile SOC arena. It's gonna get ugly but very interesting.
  • futbol4me - Wednesday, May 8, 2013 - link

    Can someone out there answer a few questions for me?

    (1) If an Intel Atom powered tablet were running Android, do apps available on Google Play need to be recompiled for the platform?
    (2) Will a Windows 8 Intel Atom powered tablet have enough horsepower to run Android effectively in a virtual machine?

    Do you think there is enough
  • biertourist - Wednesday, May 8, 2013 - link

    To answer question #2: yes. Current Intel Atom tablets can already run Android apps via the "BlueStacks" app.
  • rootheday - Thursday, May 9, 2013 - link

    Re #1: Android apps written in Dalvik/Java require no recompile because they are compiled against a virtual machine spec. For Android apps written as "native" code against the ARM instruction set, Intel has implemented a binary translation capability called Houdini that converts them to x86 on the fly and optimizes them in the background.
