ISA

The original Atom processor enabled support for Merom/Conroe-class x86 instructions, it lacked SSE4 support due to die/power constraints; that was at 45nm, at 22nm there’s room for improvement. Silvermont brings ISA compatibility up to Westmere levels (Intel’s 2010 Core microprocessor architecture). There’s now support for SSE4.1, SSE4.2, POPCNT and AES-NI.

Silvermont is 64-bit capable, although it is up to Intel to enable 64-bit support on various SKUs similar to what we’ve seen with Atom thus far.

IPC and Frequency

The combination of everything Intel is doing on the IPC front give it, according to Intel, roughly the same single threaded performance as ARM’s Cortex A15. We’ve already established that the Cortex A15 is quite good, but here’s where Silvermont has a chance to pull ahead. We already established that Intel’s 22nm process can give it anywhere from a 18 - 37% performance uplift at the same power consumption. IPC scaling gives Silvermont stable footing, but the ability to run at considerably higher frequencies without drawing more power is what puts it over the top.

Intel isn’t talking about frequencies at this point, but I’ve heard numbers around 2 - 2.4GHz thrown around a lot. Compared to the 1.6 - 2GHz range we currently have with Bonnell based silicon, you can see how the performance story gets serious quickly. Intel is talking about a 50% improvement in IPC at the core, combine that with a 30% improvement in frequency without any power impact and you’re now at 83% better performance potentially with no power penalty. There are other advantages at the SoC level that once factored in drive things even further.

Real Turbo Modes & Power Management

Previous Atom based mobile SoCs had a very crude version of Intel’s Turbo Boost. The CPU would expose all of its available P-states to the OS and as it became thermally limited, Intel would clamp the max P-state it would expose to the OS. Everything was OS-driven and previous designs weren’t able to capitalize on unused thermal budget elsewhere in the SoC to drive up frequency in active parts of chip. This lack of flexibility even impacted the SoC at the CPU core level. When running a single threaded app, Medfield/Clover Trail/et al couldn’t take thermal budget freed up by the idle core and use it to drive the frequency of the active core. Previous Atom implementations were basically somewhere in the pre-Nehalem era of thermal/boost management. From what I’ve seen, this is also how a lot of the present day ARM architectures work as well. At best, they vary what operating states they expose to the OS and clamp max frequency depending on thermals. To the best of my knowledge, none of the SoC vendors today actively implement modern big-core-Intel-like frequency management. Silvermont fixes this.

Silvermont, like Nehalem and the architectures that followed, gets its own power control unit that monitors thermals and handles dynamic allocation of power budget to various blocks within the SoC. If I understand this correctly, Silvermont should expose a maximum base frequency to the OS but depending on instruction mix and available TDP it can turbo up beyond that maximum frequency as long as it doesn’t exceed TDP. Like Sandy Bridge, Silvermont will even be able to exceed TDP for a short period of time if the package temperature is low enough to allow it. Finally, Silvermont’s turbo can also work across IP blocks: power budget allocated to the GPU can be transferred to the CPU cores (and vice versa).

By big-core standards (especially compared to Haswell), Silvermont’s turbo isn’t all that impressive but compared to how things are currently handled in the mobile space this should be a huge step forward.

On the power management side, getting in and out of C6 should be a bit quicker. There's also a new C6 mode with cache state retention. 

Sensible Scaling: OoO Atom Remains Dual-Issue The Silvermont Module and Caches
Comments Locked

174 Comments

View All Comments

  • xTRICKYxx - Tuesday, May 7, 2013 - link

    You're right. Intel has nothing to show at all.... Its not like they have the most powerful mobile and desktop consumer processors available.
  • R0H1T - Tuesday, May 7, 2013 - link

    Yeah, now sit & watch that market(x86) die a slow death at the hands of mobile/tablets that are powered by "good enough" ARM which doesn't need teraflop level of performance to sell their stuff unlike Intel !
  • misaki - Monday, May 6, 2013 - link

    Wow, clearly you are a new reader. This is an architecture overview, not a performance article, which means the information HAS to come from Intel. They have done these type of articles with every architecture redesign since the 90's.

    When chips are available to test that is when the real world performance articles will come out.
  • Ortanon - Monday, May 6, 2013 - link

    SERIOUSLY.
  • kyuu - Monday, May 6, 2013 - link

    Yes, but a lot of performance claims are being made in the article, and Anand really seemed to just be taking Intel's marketing speak for gospel. That's how it read, at least.
  • xTRICKYxx - Tuesday, May 7, 2013 - link

    Not really. He clearly states to take the graphs cautiously. Also Intel may be slightly misleading, but nothing in the graphs are lies. They chose the best possible scenario for the greatest advantage.
  • R0H1T - Tuesday, May 7, 2013 - link

    Like how they(AT) claimed Intel's "SDP" was superior after stress testing an Exynos Octa, yup loved that fairytale !
  • Kevin G - Monday, May 6, 2013 - link

    The article mentions that the IDI is similar to internal bus found on the Nehalem and later desktop processor. IDI here is mentioned as a point-to-point interconnect where as everything is linked via a ring bus in recent Core processors. Of course you can loop multipe point-to-point interfaces into a loop but the article's wording allows for other topologies.

    For example, each Silvermont module could have its down dedicated point-to-point link to the system agent. In Nehalem, the system against logically appears as another hop in the internal ring bus.
  • Exophase - Monday, May 6, 2013 - link

    Small correction:

    "Remember that with the first version of Atom, Intel enabled the fusion of load-op-store and load-op-execute instructions. Instead of these instruction combinations decoding into three and two micro-ops respectively, they would be fused post-fetch and treated like single operations throughout the entire pipeline."

    Atom (the current one anyway) doesn't have instruction (macro-op) fusion. It does handle load + op and load + op + store are one issue down the pipeline but they still came from single instructions that are a natural part of the x86 ISA. While these may be considered fused micro-ops in Intel's other CPUs that terminology doesn't fit Atom.

    These operations do need multiple instructions on most more RISCy ISAs like ARM. But the same is true the other way around (notably, 3 address arithmetic). I very much doubt you'll find typical x86 programs only need 2 instructions for every 3 ARM instructions on average, or at least any papers I've seen that measure micro-ops vs instructions on high-end CPUs are nowhere close to 1.5 (and a uop isn't generally more powerful than an ARM instruction, sometimes less when you consider two are needed for a store). But there are lots of other places that caused stalls on Atom that weren't related to decode, that it's easy to see how you could still gain a lot of perf/MHz without increasing it - as Bobcat and Jaguar have shown. All the details here do seem to point to a Bobcat-like design only with a much lower L2 cache latency and branch mispredict penalty which can only help more.
  • Anand Lal Shimpi - Monday, May 6, 2013 - link

    Er you're very correct. Atom doesn't break these instructions down further, they're treated like single ops throughout the pipeline. I've updated the section. Thank you!

Log in

Don't have an account? Sign up now