Intel’s Silvermont Architecture Revealed: Getting Serious About Mobile

Name: Intel’s Silvermont Architecture Revealed: Getting Serious About Mobile
Item: Intel’s Silvermont Architecture Revealed: Getting Serious About Mobile
Author: Anand Lal Shimpi

by Anand Lal Shimpi on May 6, 2013 1:00 PM EST

Posted in
CPUs
Intel
Silvermont
SoCs

174 Comments | Add A Comment

174 Comments

ISA

The original Atom processor enabled support for Merom/Conroe-class x86 instructions, it lacked SSE4 support due to die/power constraints; that was at 45nm, at 22nm there’s room for improvement. Silvermont brings ISA compatibility up to Westmere levels (Intel’s 2010 Core microprocessor architecture). There’s now support for SSE4.1, SSE4.2, POPCNT and AES-NI.

Silvermont is 64-bit capable, although it is up to Intel to enable 64-bit support on various SKUs similar to what we’ve seen with Atom thus far.

IPC and Frequency

The combination of everything Intel is doing on the IPC front give it, according to Intel, roughly the same single threaded performance as ARM’s Cortex A15. We’ve already established that the Cortex A15 is quite good, but here’s where Silvermont has a chance to pull ahead. We already established that Intel’s 22nm process can give it anywhere from a 18 - 37% performance uplift at the same power consumption. IPC scaling gives Silvermont stable footing, but the ability to run at considerably higher frequencies without drawing more power is what puts it over the top.

Intel isn’t talking about frequencies at this point, but I’ve heard numbers around 2 - 2.4GHz thrown around a lot. Compared to the 1.6 - 2GHz range we currently have with Bonnell based silicon, you can see how the performance story gets serious quickly. Intel is talking about a 50% improvement in IPC at the core, combine that with a 30% improvement in frequency without any power impact and you’re now at 83% better performance potentially with no power penalty. There are other advantages at the SoC level that once factored in drive things even further.

Real Turbo Modes & Power Management

Previous Atom based mobile SoCs had a very crude version of Intel’s Turbo Boost. The CPU would expose all of its available P-states to the OS and as it became thermally limited, Intel would clamp the max P-state it would expose to the OS. Everything was OS-driven and previous designs weren’t able to capitalize on unused thermal budget elsewhere in the SoC to drive up frequency in active parts of chip. This lack of flexibility even impacted the SoC at the CPU core level. When running a single threaded app, Medfield/Clover Trail/et al couldn’t take thermal budget freed up by the idle core and use it to drive the frequency of the active core. Previous Atom implementations were basically somewhere in the pre-Nehalem era of thermal/boost management. From what I’ve seen, this is also how a lot of the present day ARM architectures work as well. At best, they vary what operating states they expose to the OS and clamp max frequency depending on thermals. To the best of my knowledge, none of the SoC vendors today actively implement modern big-core-Intel-like frequency management. Silvermont fixes this.

Silvermont, like Nehalem and the architectures that followed, gets its own power control unit that monitors thermals and handles dynamic allocation of power budget to various blocks within the SoC. If I understand this correctly, Silvermont should expose a maximum base frequency to the OS but depending on instruction mix and available TDP it can turbo up beyond that maximum frequency as long as it doesn’t exceed TDP. Like Sandy Bridge, Silvermont will even be able to exceed TDP for a short period of time if the package temperature is low enough to allow it. Finally, Silvermont’s turbo can also work across IP blocks: power budget allocated to the GPU can be transferred to the CPU cores (and vice versa).

By big-core standards (especially compared to Haswell), Silvermont’s turbo isn’t all that impressive but compared to how things are currently handled in the mobile space this should be a huge step forward.

On the power management side, getting in and out of C6 should be a bit quicker. There's also a new C6 mode with cache state retention.

Sensible Scaling: OoO Atom Remains Dual-Issue The Silvermont Module and Caches

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

174 Comments

View All Comments

tech4real - Tuesday, May 7, 2013 - link
"Absolute performance"? Do we consider power constraint here at all? Atom is optimized for power-efficiency. All the current information I've seen so far suggest Silvermont will outperform A15 by a large margin in terms of power efficiency. If we throw away power constraint, Intel has Core to take care of that.
Wilco1 - Tuesday, May 7, 2013 - link
I was talking about peak performance, but yes, power consumption matters too. What we've seen so far is Intel marketing suggesting that in 6-9 months time Silvermont will be more efficient than A15 was 12 months earlier. However that's not what Silvermont will have to compete with. At the end of this year A15 will have had 2 process shrinks down to 20nm in addition to a lot of tuning, so it will be far more efficient than it was 12 months ago. And A15 is just one example, Apple, QC and ARM will have new cores as well. It's reasonable to say that Intel will finally be able to compete with Silvermont, but it is far from clear that it is the overall winner like their marketing claims.
tech4real - Wednesday, May 8, 2013 - link
TSMC's 20nm process is still in the works, your Q4'13 volumn production estimate seems way too optimistic, especially considering TSMC's pain in 28nm ramp. Also 28nm->20nm shrink without finfet significantly reduces its benefit.
Wilco1 - Wednesday, May 8, 2013 - link
TSMC have learnt from the 28nm problems. They appear very aggressive this time, and so far the reports are they are 2 months ahead of schedule. Even if it ends up delayed to Q2'14 it's still around the same time Intel is planning to come out with Silvermont phones. The gains are not as large as with FinFETs, but enough to reduce power significantly.
tech4real - Wednesday, May 8, 2013 - link
my understanding is Q2'14 volume production with high yield is almost TSMC 20nm best case scenario. Of course, the term "high yield" is such a subjective thing vendors love to manipulate with almost infinite freedom...
zeo - Wednesday, May 8, 2013 - link
TSMC 20nm isn't set up for such optimization, but rather focused on cost reductions... The number of nodes, variations supported, etc will be fewer than they did with 28nm as they want to avoid the problems that caused the 28nm delays and that has resulted in a much more streamlined setup.

While power leakage issues increase as FAB size is decreased... So without a solution like FinFET the power efficiency would be increasingly harder to keep it where it is, let alone reduce it...

It's one of the reasons why ARM is trying to push other options like Big.LITTLE to boost operational efficiencies and not rely as much on FAB improvements.

While it's also why not all ARM SoCs have moved to 28nm yet as for many the power leakage was still too much of a issue for their designs to make the switch right away, so there could be additional delays for 20nm releases.

Though ARM should get FinFET in time for for the 64bit release... but by that time Intel would be on its way to 14nm...
Jaybus - Wednesday, May 8, 2013 - link
Think of it as 2-wide x86 vs. 3-wide RISC. Rather than translating the x86 microcoded instruction into 2 or 3 RISC-like instructions, Intel's decode keeps it a single instruction down the pipeline. The RSIC architecture has to decode more instructions, so needs the 3-wide to keep up with the x86 2-wide.

The point about the frequency scaling is this. The tri-gate design has a gate on top of 2 vertical gates. This gives it 3x the surface area as compared to FinFET. The greater surface area allows more electrons to flow within a given area of the die, and that allows a greater range of voltages and/or frequencies for which it can operate efficiently.
Wilco1 - Thursday, May 9, 2013 - link
Eventhough macro-ops helps decode, they need to be expanded before they are executed. So in terms of execution, macro-ops don't help. Also as I mentioned in an earlier post, most ARMs also support macro-ops, allowing a 2-way ARM to behave like a 4-way. So macro-ops don't give x86 an advantage over RISC.
jemima puddle-duck - Monday, May 6, 2013 - link
Without wishing to be overly cynical, Anandtech has a history of 'NOW Intel will win the mobile war' articles, which get recycled then forgotten in time for the next launch. It's all very clever stuff, but curiously underwhelming also.
Roffles12 - Monday, May 6, 2013 - link
I don't remember reading any 'NOW Intel will win the mobile war' articles on Anandtech. Perhaps your perception is skewed. Intel articles are typically of a technical nature discussing the inner workings of the architecture and fab process or discussing benchmarks. Intel is really the only company so completely open about how their technology works, so why not make it a point of discussion on a website on a website dedicated to the subject? If your head is clouded by fud from competing companies and the constantly humming rumor mill, maybe you need to back off for a while. At the end of the day, it's up to you to digest this information and form an opinion.

Intel’s Silvermont Architecture Revealed: Getting Serious About Mobile

ISA

IPC and Frequency

Real Turbo Modes & Power Management

Post Your Comment

174 Comments

View All Comments

tech4real - Tuesday, May 7, 2013 - link

Wilco1 - Tuesday, May 7, 2013 - link

tech4real - Wednesday, May 8, 2013 - link

Wilco1 - Wednesday, May 8, 2013 - link

tech4real - Wednesday, May 8, 2013 - link

zeo - Wednesday, May 8, 2013 - link

Jaybus - Wednesday, May 8, 2013 - link

Wilco1 - Thursday, May 9, 2013 - link

jemima puddle-duck - Monday, May 6, 2013 - link

Roffles12 - Monday, May 6, 2013 - link

Log in

Don't have an account? Sign up now