OoOE

You’re going to come across the phrase out-of-order execution (OoOE) a lot here, so let’s go through a quick refresher on what that is and why it matters.

At a high level, the role of a CPU is to read instructions from whatever program it’s running, determine what they’re telling the machine to do, execute them and write the result back out to memory.

The program counter within a CPU points to the address in memory of the next instruction to be executed. The CPU's fetch logic grabs instructions in order. Those instructions are decoded into an internally understood format (a single architectural instruction sometimes decodes into multiple smaller instructions). Once decoded, all necessary operands are fetched from memory (if they're not already in local registers) and the combination of instruction + operands is issued for execution. The results are committed to memory (registers/cache/DRAM) and it's on to the next one.
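
To make the ordering of those steps concrete, here's a minimal sketch of that fetch/decode/operand-fetch/execute/commit loop in Python. The three-instruction toy program, register names and memory addresses are all invented for illustration; this is not a model of any real Atom pipeline.

```python
# Minimal sketch of the classic instruction flow described above.
# The toy "program", registers and addresses are illustrative only.

program = [
    {"op": "load",  "dest": "r1", "addr": 0x100},        # needs a memory operand
    {"op": "add",   "dest": "r2", "srcs": ["r1", "r1"]}, # operands already in registers
    {"op": "store", "src": "r2",  "addr": 0x104},
]

def run_in_order(program):
    pc = 0                              # program counter: next instruction to fetch
    regs, mem = {}, {0x100: 7}
    while pc < len(program):
        inst = program[pc]              # fetch: grab the instruction the PC points at
        pc += 1
        # decode: work out what the instruction asks for (already a dict here)
        # operand fetch: read sources from registers or memory, then execute + commit
        if inst["op"] == "load":
            regs[inst["dest"]] = mem[inst["addr"]]
        elif inst["op"] == "add":
            a, b = (regs[s] for s in inst["srcs"])
            regs[inst["dest"]] = a + b
        elif inst["op"] == "store":
            mem[inst["addr"]] = regs[inst["src"]]         # commit result to memory
    return regs, mem

print(run_in_order(program))   # ({'r1': 7, 'r2': 14}, {256: 7, 260: 14})
```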

In-order architectures complete this pipeline in order, from start to finish. The obvious problem is that many steps within the pipeline depend on having the right operands immediately available. For a number of reasons, this isn't always possible. Operands could depend on earlier instructions that may not have finished executing, or they might be located in main memory - hundreds of cycles away from the CPU. In these cases, a bubble is inserted into the processor's pipeline and the machine's overall efficiency drops as no work is done until those operands are available.

Out-of-order architectures attempt to fix this problem by allowing independent instructions to execute ahead of others that are stalled waiting for data. In both cases instructions are fetched and retired in-order, but in an OoO architecture instructions can be executed out-of-order to improve overall utilization of execution resources.
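
Here is a small, purely illustrative sketch of the difference: both schedulers below see the same four instructions, but only the out-of-order one is allowed to issue the independent work sitting behind a load that misses to memory. The 100-cycle load latency and the one-instruction-per-cycle issue model are made-up simplifications (and retirement ordering isn't modeled at all), not Silvermont's actual behavior.

```python
# Contrast in-order vs. out-of-order issue around a long-latency load.
# Latencies and the instruction list are invented for illustration.

insts = [
    {"name": "load r1",   "deps": [],          "latency": 100},  # assume a miss all the way to DRAM
    {"name": "add r2,r1", "deps": ["load r1"], "latency": 1},    # needs the load's result
    {"name": "mul r3",    "deps": [],          "latency": 3},    # independent work
    {"name": "sub r4",    "deps": [],          "latency": 1},    # independent work
]

def simulate(insts, out_of_order):
    """Issue at most one instruction per cycle; an instruction may only
    issue once everything it depends on has produced its result."""
    finish = {}                        # name -> cycle its result becomes available
    pending = list(insts)
    cycle = 0
    while pending:
        ready = [i for i in pending
                 if all(finish.get(d, float("inf")) <= cycle for d in i["deps"])]
        if out_of_order:
            window = ready                                      # oldest ready instruction, any position
        else:
            window = ready[:1] if pending[0] in ready else []   # strictly program order
        if window:
            inst = window[0]
            pending.remove(inst)
            finish[inst["name"]] = cycle + inst["latency"]
        # else: a bubble -- nothing could issue this cycle
        cycle += 1
    return max(finish.values())        # cycle when the last result is ready

print(simulate(insts, out_of_order=False))  # 104: mul/sub wait behind the stalled add
print(simulate(insts, out_of_order=True))   # 101: mul/sub execute during the load's stall
```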

The move to an OoO paradigm generally comes with penalties to die area and power consumption, which is one reason the earliest mobile CPU architectures were in-order designs. The ARM11, ARM's Cortex A8, Intel's original Atom (Bonnell) and Qualcomm's Scorpion core were all in-order. As performance demands continued to climb and smaller, lower-power transistors became available, all of the players here started introducing OoO variants of their architectures. Although often referred to as out-of-order designs, ARM's Cortex A9 and Qualcomm's Krait 200/300 are only mildly OoO compared to the Cortex A15. Intel's Silvermont joins the ranks of the Cortex A15 as a fully out-of-order design by modern standards. The move to OoO alone should be good for around a 30% increase in single-threaded performance vs. Bonnell.

Pipeline

Silvermont changes the Atom pipeline slightly. Bonnell featured a 16-stage in-order pipeline. One side effect of that design was that all operations, including those with no cache access (e.g. operations whose operands were already in registers), had to go through three data cache access stages even though nothing happened during those stages. In going out-of-order, Silvermont allows instructions to bypass those stages if they don't need data from memory, effectively shortening the mispredict penalty from 13 stages down to 10. The integer pipeline depth now varies depending on the type of instruction, but you're looking at a range of 14 to 17 stages.
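
As a rough illustration of what that variable depth means, the toy sketch below gives a register-only instruction a shorter path through the pipe by skipping the data-cache access stages entirely. The stage names and counts are simplified placeholders, not Silvermont's actual stage list.

```python
# Toy illustration of a variable-depth integer pipe: instructions that don't
# touch memory skip the data cache access stages instead of marching through
# them as no-ops the way Bonnell required. Stage names/counts are invented.

FRONT_END    = ["fetch1", "fetch2", "decode1", "decode2", "allocate", "schedule"]
CACHE_STAGES = ["agen", "dcache1", "dcache2"]       # only meaningful for loads/stores
BACK_END     = ["execute", "writeback"]

def stages_for(needs_memory_operand):
    middle = CACHE_STAGES if needs_memory_operand else []   # bypass when operands are in registers
    return FRONT_END + middle + BACK_END

print(len(stages_for(True)))    # 11 stages for an op with a memory operand (toy numbers)
print(len(stages_for(False)))   # 8 stages for a register-only op
```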

Branch prediction improves tremendously with Silvermont, a staple of any progressive microprocessor architecture. Silvermont takes the gshare branch predictor of Bonnell and significantly increases the size of all associated data structures. Silvermont also adds an indirect branch predictor. The combination of the larger predictors and the new indirect predictor should increase branch prediction accuracy.
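
For readers unfamiliar with gshare, here's a minimal sketch of how that style of predictor works: the branch address is XORed with a global history of recent branch outcomes to index a table of 2-bit saturating counters. The table size, history length and the always-taken example branch below are arbitrary toy values, not Bonnell's or Silvermont's actual predictor parameters.

```python
# Minimal gshare branch predictor sketch; sizes are toy values.

class GsharePredictor:
    def __init__(self, index_bits=10):
        self.index_bits = index_bits
        self.history = 0                          # global history of recent branch outcomes
        self.counters = [1] * (1 << index_bits)   # 2-bit saturating counters, start weakly not-taken

    def _index(self, pc):
        # XOR the branch address with global history to pick a counter
        return ((pc >> 2) ^ self.history) & ((1 << self.index_bits) - 1)

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2   # predict taken in the upper half

    def update(self, pc, taken):
        i = self._index(pc)
        self.counters[i] = min(3, self.counters[i] + 1) if taken else max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.index_bits) - 1)

bp = GsharePredictor(index_bits=4)       # tiny table so the example converges quickly
for _ in range(8):                       # a loop branch that is always taken
    print(bp.predict(0x4000), end=" ")   # False at first, then it learns "taken"
    bp.update(0x4000, taken=True)
```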

Couple better branch prediction with a lower mispredict latency and you’re talking about another 5 - 10% increase in IPC over Bonnell.
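
The rough arithmetic behind that kind of claim is worth spelling out: cycles lost to mispredicted branches scale with how often branches are mispredicted and how many stages get flushed each time. All of the rates below are invented placeholders purely to show the shape of the calculation; Intel hasn't published numbers at this granularity.

```python
# Back-of-the-envelope IPC impact of better prediction + a shorter flush.
# Branch frequency, mispredict rates and base CPI are invented placeholders.

def cpi_lost_to_mispredicts(branch_freq, mispredict_rate, penalty_cycles):
    """Average cycles per instruction thrown away on mispredicted branches."""
    return branch_freq * mispredict_rate * penalty_cycles

base_cpi = 1.0                                                   # assumed CPI excluding branch penalties

bonnell    = base_cpi + cpi_lost_to_mispredicts(0.20, 0.06, 13)  # 1.156 CPI with made-up rates
silvermont = base_cpi + cpi_lost_to_mispredicts(0.20, 0.04, 10)  # 1.080 CPI: better accuracy, shorter flush

print(f"IPC gain: {bonnell / silvermont - 1:.1%}")               # ~7.0% with these illustrative inputs
```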

Comments

  • PolarisOrbit - Monday, May 6, 2013 - link

    Re: FSB
    Intel tried to get rid of the FSB several years ago, but it was seen as anti-competitive because they simultaneously locked out 3rd parties like Nvidia Ion. One lawsuit later, Intel was bound to keep the FSB in their low power architectures until 2013 for 3rd party support. Basically Intel wasn't playing fair and Nvidia burned their ship.
  • DanNeely - Tuesday, May 7, 2013 - link

    There was no usable FSB in anything beyond the first series of atom chips. The rest still had it within the die to connect the CPU with the internal northbridge; but the only external interface it offered was 4 PCIe2(?) lanes. ION2 connected to them; not to FSB.
  • Kevin G - Tuesday, May 7, 2013 - link

    Actually Intel is to keep PCI-e on their chips until 2016 by that anti-trust suit. This allows 3rd party IP, like nVidia's ION, to work with Intel's SoC designs.
  • tipoo - Monday, May 6, 2013 - link

    This makes me wonder if companies that make in-house SoCs (I guess Apple in specific, since Samsung also sells them to others while Apple just does it for themselves) will ever switch mobile devices to Intel if they just can't match the performance per watt of this and future Atom cores.
  • tipoo - Monday, May 6, 2013 - link

    Also won't the much anticipated SGX 600 series/Rogue be out by around then? That's the GPU that's supposed to take these mobile SoCs to the 200Gflop territory which the 360/PS3 GPUs are around.
  • xTRICKYxx - Tuesday, May 7, 2013 - link

    I would think Apple (or any company) would want all of their software running on the same architecture/platform if they could.
  • R0H1T - Tuesday, May 7, 2013 - link

    And kill what, a billion or so iDevices sold, with incompatibility? Me thinks you dunno what you're talking about!
  • CajunArson - Monday, May 6, 2013 - link

    Did somebody pay you to post that reply? Because if so, they aren't getting their money's worth.

    Silvermont Atoms are targeted at smartphones in 2-core configurations and tablets in the 4-core Baytrail configurations. Their power consumption is in a completely different league than even the low-end Temash parts. Let me reiterate: a Temash with a 4 watt TDP is going to have substantially higher real-world power consumption than even a beefy Baytrail and will likely only compete with the microserver Atom parts where Intel intentionally targets a higher power envelope.

    I'm sure you can't wait to post benchmarks of a Kabini netbook with a higher power draw than Haswell managing to beat a smartphone Atom as proof that AMD has "won" something, but for those of us on planet earth, these Silvermont parts are very interesting and we appreciate hard technical information on the architecture.
  • nunomoreira10 - Tuesday, May 7, 2013 - link

    Jaguar will be available in fanless designs while Haswell won't, you can't really compare them.
    The fact is Intel still doesn't have a good enough CPU for a good experience in a legacy Windows 8 fanless design; there is a big hole in the market that AMD is trying to fill.
  • raghu78 - Monday, May 6, 2013 - link

    Intel Silvermont is the start of the Intelization of the mobile world. Within the next 2 - 3 years Intel should have bagged Apple, Google or Samsung. With the world's best manufacturing process, which is at least 2 - 3 years ahead of other foundries, and Intel's relentless tick-tock chip development cadence, the ARM crowd is going to be beaten to a pulp. Qualcomm might survive the Intel juggernaut but Nvidia will not.
