Intel’s Silvermont Architecture Revealed: Getting Serious About Mobile

Name: Intel’s Silvermont Architecture Revealed: Getting Serious About Mobile
Item: Intel’s Silvermont Architecture Revealed: Getting Serious About Mobile
Author: Anand Lal Shimpi

by Anand Lal Shimpi on May 6, 2013 1:00 PM EST

Posted in
CPUs
Intel
Silvermont
SoCs

174 Comments | Add A Comment

174 Comments

ISA

The original Atom processor enabled support for Merom/Conroe-class x86 instructions, it lacked SSE4 support due to die/power constraints; that was at 45nm, at 22nm there’s room for improvement. Silvermont brings ISA compatibility up to Westmere levels (Intel’s 2010 Core microprocessor architecture). There’s now support for SSE4.1, SSE4.2, POPCNT and AES-NI.

Silvermont is 64-bit capable, although it is up to Intel to enable 64-bit support on various SKUs similar to what we’ve seen with Atom thus far.

IPC and Frequency

The combination of everything Intel is doing on the IPC front give it, according to Intel, roughly the same single threaded performance as ARM’s Cortex A15. We’ve already established that the Cortex A15 is quite good, but here’s where Silvermont has a chance to pull ahead. We already established that Intel’s 22nm process can give it anywhere from a 18 - 37% performance uplift at the same power consumption. IPC scaling gives Silvermont stable footing, but the ability to run at considerably higher frequencies without drawing more power is what puts it over the top.

Intel isn’t talking about frequencies at this point, but I’ve heard numbers around 2 - 2.4GHz thrown around a lot. Compared to the 1.6 - 2GHz range we currently have with Bonnell based silicon, you can see how the performance story gets serious quickly. Intel is talking about a 50% improvement in IPC at the core, combine that with a 30% improvement in frequency without any power impact and you’re now at 83% better performance potentially with no power penalty. There are other advantages at the SoC level that once factored in drive things even further.

Real Turbo Modes & Power Management

Previous Atom based mobile SoCs had a very crude version of Intel’s Turbo Boost. The CPU would expose all of its available P-states to the OS and as it became thermally limited, Intel would clamp the max P-state it would expose to the OS. Everything was OS-driven and previous designs weren’t able to capitalize on unused thermal budget elsewhere in the SoC to drive up frequency in active parts of chip. This lack of flexibility even impacted the SoC at the CPU core level. When running a single threaded app, Medfield/Clover Trail/et al couldn’t take thermal budget freed up by the idle core and use it to drive the frequency of the active core. Previous Atom implementations were basically somewhere in the pre-Nehalem era of thermal/boost management. From what I’ve seen, this is also how a lot of the present day ARM architectures work as well. At best, they vary what operating states they expose to the OS and clamp max frequency depending on thermals. To the best of my knowledge, none of the SoC vendors today actively implement modern big-core-Intel-like frequency management. Silvermont fixes this.

Silvermont, like Nehalem and the architectures that followed, gets its own power control unit that monitors thermals and handles dynamic allocation of power budget to various blocks within the SoC. If I understand this correctly, Silvermont should expose a maximum base frequency to the OS but depending on instruction mix and available TDP it can turbo up beyond that maximum frequency as long as it doesn’t exceed TDP. Like Sandy Bridge, Silvermont will even be able to exceed TDP for a short period of time if the package temperature is low enough to allow it. Finally, Silvermont’s turbo can also work across IP blocks: power budget allocated to the GPU can be transferred to the CPU cores (and vice versa).

By big-core standards (especially compared to Haswell), Silvermont’s turbo isn’t all that impressive but compared to how things are currently handled in the mobile space this should be a huge step forward.

On the power management side, getting in and out of C6 should be a bit quicker. There's also a new C6 mode with cache state retention.

Sensible Scaling: OoO Atom Remains Dual-Issue The Silvermont Module and Caches

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

174 Comments

View All Comments

Homeles - Monday, May 6, 2013 - link
Yay confirmation bias!
R0H1T - Tuesday, May 7, 2013 - link
Nay, you fanboi(Intel's) much ?
powerarmour - Monday, May 6, 2013 - link
Have to agree, starting to get tired of these almost Intel PR based previews. No mention to how poor Intel's graphics drivers have consistently been over many many years.
Homeles - Monday, May 6, 2013 - link
"You can't make the ridiculous claims of 1.6x performance."

Sure you can. It was already a close race between a 5 year old architecture and a brand new one. The floodgates have opened -- this is 5 years of pent up performance gains from the largest R&D spender in the industry, on top of being on a significantly superior process for mobile devices.
Wilco1 - Monday, May 6, 2013 - link
Absolute performance of Silvermont cannot be higher than A15 or Bobcat, it's just 2-way OoO, has a single-issue in-order memory pipeline (no speculative execution of memory operations or dual issue of load-store like A15/Bobcat) and fairly small buffers in general. All in all it is more like A9 than A15 or Bobcat/Jaguar.
althaz - Monday, May 6, 2013 - link
Except that it certainly can (dependent on a lot of other factors)...

That said, I suspect it will only be faster at the same power level, not at the same frequency.
beginner99 - Tuesday, May 7, 2013 - link
That's covered in the article but I must admit I don't fully understand it. Anyway Anand writes about macro-op fusion and clearly states that because of this the 2-wide is misleading when directly comparing to ARM. My interpretation being that ARM doesn't have this and if your 2-wide CPU is running macro-ops with 2 instructions in them it's actually like 4-wide (but I guess this naive viewpoint of mine is completely wrong.
Wilco1 - Tuesday, May 7, 2013 - link
No, macro-ops don't make your CPU magically wider. For example Silvermont cannot actually execute 2 load+op instructions every cycle, and cannot even execute 1 read-modify-write every cycle...

Also note that most ARM CPUs do have similar capabilities, for example Cortex-A9 can execute 2 shifts and 2 ALU instructions every cycle, and loads and stores can have base update for free. So Anand is quite wrong claiming this is an advantage to Atom.

As I mentioned, the big bottleneck of Silvermont is it's single load/store unit. Typical code contains many loads and stores, and Cortex-A15 can execute these twice as fast as Silvermont.
Jaybus - Wednesday, May 8, 2013 - link
It can, however, execute 1 load and 1 store simultaneously, and that is its saving grace. That fits very well with code being executed in OoO fashion and why I doubt very much A15 is twice as fast executing typical code.
Wilco1 - Thursday, May 9, 2013 - link
No Silvermont can only execute 1 load or 1 store per cycle. A15 won't be twice as fast on typical code, but it will beat Silvermont on memory intensive code due to its single memory pipeline bottleneck.

Intel’s Silvermont Architecture Revealed: Getting Serious About Mobile

ISA

IPC and Frequency

Real Turbo Modes & Power Management

Post Your Comment

174 Comments

View All Comments

Homeles - Monday, May 6, 2013 - link

R0H1T - Tuesday, May 7, 2013 - link

powerarmour - Monday, May 6, 2013 - link

Homeles - Monday, May 6, 2013 - link

Wilco1 - Monday, May 6, 2013 - link

althaz - Monday, May 6, 2013 - link

beginner99 - Tuesday, May 7, 2013 - link

Wilco1 - Tuesday, May 7, 2013 - link

Jaybus - Wednesday, May 8, 2013 - link

Wilco1 - Thursday, May 9, 2013 - link

Log in

Don't have an account? Sign up now