The Silvermont Module and Caches

Like AMD’s Bobcat and Jaguar designs, Silvermont is modular. The default Silvermont building block is a two-core/two-thread design. Each core is equally capable and there’s no shared execution hardware. Silvermont supports up to 8-core configurations by placing multiple modules in an SoC.


Each module features a shared 1MB L2 cache, a 2x improvement in the cache-per-core ratio of existing Atom-based processors. Despite the larger L2, access latency is reduced by two clocks. The default module size gives a clear indication of where Intel saw Silvermont being most useful. At the time of its inception, I doubt Intel anticipated such a quick shift to quad-core smartphones; otherwise it might've considered a larger default module size.

L1 cache sizes/latencies haven’t changed. Each Silvermont core features a 32KB L1 data cache and 24KB L1 instruction cache.
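On Linux, the cache hierarchy described above can be read straight out of sysfs. A minimal sketch (the sysfs paths are the standard kernel interface, but the helper names here are mine):

```python
import glob
import os

def parse_size(text):
    """Convert a sysfs cache size string like '32K' or '1M' to bytes."""
    text = text.strip()
    units = {"K": 1024, "M": 1024 ** 2}
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)

def cache_topology(cpu=0):
    """Return (level, type, size_bytes) tuples for one CPU's caches."""
    out = []
    base = f"/sys/devices/system/cpu/cpu{cpu}/cache"
    for index in sorted(glob.glob(os.path.join(base, "index*"))):
        with open(os.path.join(index, "level")) as f:
            level = int(f.read())
        with open(os.path.join(index, "type")) as f:
            ctype = f.read().strip()
        with open(os.path.join(index, "size")) as f:
            size = parse_size(f.read())
        out.append((level, ctype, size))
    return out

# On a Silvermont core you would expect to see a 32K L1 Data cache,
# a 24K L1 Instruction cache, and the module's shared 1024K L2.
```

The `shared_cpu_list` file in the same directories would also show the two cores of a module sharing the L2 entry.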

Silvermont Supports Independent Core Frequencies: Vindication for Qualcomm?

In all of Intel's Core-based microprocessors, every core is tied to the same frequency - those that aren't in use are simply shut off (power gated) to save power. Qualcomm's multi-core architecture has always supported independent frequency planes for all CPUs in the SoC, something Intel has always insisted was a bad idea. In a strange turn of events, Intel joins Qualcomm in offering the ability to run each core in a Silvermont module at its own independent frequency. You could have one Silvermont core running at 2.4GHz and another running at 1.2GHz. Unlike Qualcomm's implementation, Silvermont's independent frequency planes are optional. In a split-frequency case, the shared L2 cache always runs at the higher of the two frequencies. Intel believes the flexibility might be useful in some low-cost Silvermont implementations where the OS actively uses core pinning to keep threads parked on specific cores. I doubt we'll see this on most tablet or smartphone implementations of the design.
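The core-pinning scenario Intel describes can be sketched on Linux with Python's standard affinity call (a minimal sketch; `pin_to_core` is my own helper name, and pinning only parks the process — frequency decisions remain with the OS governor):

```python
import os

def pin_to_core(core):
    """Pin the calling process to a single core (Linux-specific)."""
    os.sched_setaffinity(0, {core})  # pid 0 = the current process

# Illustrative: park this process on core 0. With per-core frequency
# planes, the other core in the module is then free to drop to a lower
# clock (or be power gated) while core 0 runs at full speed.
pin_to_core(0)
print(os.sched_getaffinity(0))  # the process is now restricted to {0}
```

In practice an OS would do this per thread via the scheduler rather than having applications pin themselves.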

From FSB to IDI

Atom and all of its derivatives have a nasty secret: they never really got any latency benefits from integrating a memory controller on die. The first implementation of Atom was a 3-chip solution, with the memory controller contained within the North Bridge. The CPU talked to the North Bridge via a low-power Front Side Bus (FSB) implementation. This setup should sound familiar to anyone who remembers Intel architectures from the late 90s up to the mid 2000s. In pursuit of integration, Intel eventually brought the memory controller and graphics onto a single die. Historically, bringing the memory controller onto the same die as the CPU came with a nice reduction in access latency - unfortunately, Atom never enjoyed this. The reasoning? Atom never ditched the FSB interface.

Even though Atom integrated a memory controller, the design logically looked like it did before. Integration only saved Intel space and power; it never granted any performance benefit. I suspect Intel did this to keep costs down. I noticed the problem years ago but had completely forgotten about it since it's been so long. Thankfully, with Silvermont the FSB interface is completely gone.

Silvermont instead integrates the same in-die interconnect (IDI) used in the big Core-based processors. Intel's IDI is a lightweight point-to-point interface with far lower overhead than the old FSB architecture. The move to IDI and the changes to the system fabric are enough to improve single-threaded performance by low double digits; the gains are even bigger in heavily threaded scenarios.

Another benefit of moving away from a very old FSB to IDI is increased flexibility in how Silvermont can clock up/down. Previously there were fixed FSB:CPU ratios that had to be maintained at all times, which meant the FSB had to be lowered significantly when the CPU was running at very low frequencies. In Silvermont, the IDI and CPU frequencies are largely decoupled - enabling good bandwidth out of the cores even at low frequency levels.
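The fixed-ratio constraint can be illustrated with a toy calculation (the ratio, frequencies, and bus width below are invented for illustration, not Intel's actual figures):

```python
# With a fixed FSB:CPU ratio, dropping the CPU clock drags the bus
# (and its available bandwidth) down with it. Hypothetical numbers.
FSB_CPU_RATIO = 4          # bus runs at CPU/4 in this toy model
BYTES_PER_BUS_CLOCK = 8    # 64-bit bus, one transfer per clock

def fsb_bandwidth(cpu_mhz):
    """Bandwidth (MB/s) when the bus frequency is locked to the CPU's."""
    return (cpu_mhz / FSB_CPU_RATIO) * BYTES_PER_BUS_CLOCK

def idi_bandwidth(fabric_mhz):
    """With a decoupled fabric, bandwidth is independent of CPU clock."""
    return fabric_mhz * BYTES_PER_BUS_CLOCK

print(fsb_bandwidth(2400))  # 4800.0 MB/s at full CPU clock...
print(fsb_bandwidth(600))   # 1200.0 MB/s once the CPU idles down
print(idi_bandwidth(800))   # 6400.0 MB/s regardless of CPU frequency
```

The point is only the shape of the relationship: decoupling removes the bandwidth penalty at low CPU frequencies.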

The System Agent

Silvermont gains an updated system agent (read: North Bridge) that is much smarter about arbitrating access to main memory. In all previous-generation Atom architectures, virtually all memory accesses had to happen in-order (Clover Trail had some minor OoO improvements here). Silvermont's system agent now allows reordering of memory requests coming in from all consumers/producers (e.g. CPU cores, GPU, etc.) to optimize for performance and quality of service (e.g. ensuring graphics demands on memory can regularly pre-empt CPU requests when necessary).
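A QoS-aware arbiter of this kind can be sketched as a priority queue (a toy model: the agent names, priority classes, and tie-breaking scheme are my own invention, not Intel's design):

```python
import heapq
import itertools

# Lower priority number = served first. Latency-critical display/GPU
# traffic pre-empts CPU requests, mirroring the QoS behavior described.
PRIORITY = {"display": 0, "gpu": 1, "cpu": 2}

class Arbiter:
    def __init__(self):
        self._queue = []
        self._order = itertools.count()  # tiebreak: preserve arrival order

    def submit(self, agent, addr):
        heapq.heappush(self._queue,
                       (PRIORITY[agent], next(self._order), agent, addr))

    def next_request(self):
        _, _, agent, addr = heapq.heappop(self._queue)
        return agent, addr

arb = Arbiter()
arb.submit("cpu", 0x1000)
arb.submit("cpu", 0x2000)
arb.submit("gpu", 0x9000)   # arrives last, but jumps the CPU queue
print(arb.next_request())   # -> ('gpu', 36864), i.e. address 0x9000
```

A real memory controller would also reorder within a class for DRAM page hits and enforce starvation limits, which this sketch omits.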


174 Comments


  • Hector2 - Friday, May 17, 2013 - link

    There are only 3 companies right now left in the world who have the muscle and volume to afford high tech fabs -- Intel, Samsung & TSMC. And Intel has about a 2 year lead. That means not just higher performance and lower power than before, but lower cost. Making the chips smaller multiplies the number of chips on a single, fixed-cost wafer and lowers costs. If the chip area is 1/2, the costs to make it are about 1/2 as well. 22nm tech gives Intel faster chips with less power than their competition. 14nm hits it out of the park.
  • BMNify - Wednesday, June 5, 2013 - link

    You're absolutely wrong about "lower cost". x86 requires more die area. The process is more volatile (more failed wafers).

    If we combine the two factors above with better performance and lower power consumption, and toss in a lack of experience, we get GT3e. A technological marvel that few (OEMs) want.
  • BMNify - Wednesday, June 5, 2013 - link

    Spot on, Krysto. It's Intel's process advantage that is shining through. Soon they'll hit the point of diminishing returns and/or the rest of the market will catch up/get close enough. When I see AMD at 32nm (Richland) having lower power draw at idle than Intel at 22nm (Ivy Bridge), I wonder how special their "secret sauce" actually is.

    How long can Intel loss-lead? Probably as long as Xeon continues to make up for it, but ARM is getting into the server market now too (looking forward to AMD and Calxeda ARM SoCs for the server market). Should be interesting in 3-5 years.
  • TheinsanegamerN - Monday, August 26, 2013 - link

    Only issue, though, is when you put that Richland chip under load. All of a sudden, Intel is using much less power.
  • t.s. - Monday, May 6, 2013 - link

    "The mobile market is far more competitive than the PC industry was back when Conroe hit. There isn’t just one AMD, but many competitors in the SoC space that are already very lean fast moving. There’s also the fact that Intel doesn’t have tremendous marketshare in ultra mobile."

    Well, with their 'strategy' back then when facing AMD (http://news.bbc.co.uk/2/hi/8047546.stm), they'll surely win. :p
  • nunomoreira10 - Monday, May 6, 2013 - link

    It's kinda suspicious that there are many comparisons against ARM but none against AMD Jaguar or even Bobcat.
    Jaguar will probably be a much better tablet CPU and GPU, while Intel competes in the phone market.
  • Khato - Monday, May 6, 2013 - link

    Which AMD Jaguar/Bobcat SKU runs at 1.5 watts? They aren't included in the comparison because they're at a markedly higher power level.
  • nunomoreira10 - Monday, May 6, 2013 - link

    they will both be used on fan-less tablet designs...
  • extide - Tuesday, May 7, 2013 - link

    Totally different markets. Jaguar/Bobcat will likely line up next to low-end Core/Haswell, not Atom/Silvermont.
  • Penti - Tuesday, May 7, 2013 - link

    Both will sadly be way too underpowered when it comes to the GPU, and that matters greatly on general OSes and applications, like running a desktop OS X or Windows (or GNU/Linux) machine. You won't really be able to game on them at all, as it's not smartphone games people want to run. GPGPU won't really be fast enough for anything, and we're talking about ~100-200 GFLOPS of GPU power on the AMD side for what is essentially a full-blown computer.

    Intel is clearly targeting the phone market. That's something AMD/ATI divested from years back, with their mobile GPU tech going to Qualcomm (Adreno, which isn't Radeon-based) and Broadcom. Before that, ATI's/AMD's mobile GPU tech was licensed to or used together with the likes of Intel (PXA/XScale - not integrated though), Samsung and Freescale, among others. Their technology is already a mainstay of the mobile business even though it left the company; their know-how succeeded in the market without their leadership, so why would they compete with that? Of course they wouldn't.

    AMD simply has not, and will not likely any time soon, invest in an alternate route to dominate their own part of the smartphone/ARM-tablet market, while Intel has, with integrated designs replacing the custom ARMv5TE design. AMD going after the ARM business is different, since they will license the core, and their manufacturer GloFo already manufactures ARM designs and even offers hard macros that it sells to plenty of other customers. It's also going after other embedded fields and the emerging ARM server/appliance space, all without designing custom cores.

    While PXA (Intel) was quite successful in the market, moving to x86 and doing away with stuff like ARM-based network processors and RAID processors allows Intel to focus on delivering great support for a modern ISA across all sorts of devices. While it didn't make it into phones (until lately) like PXA did - which continued to power BlackBerrys under Marvell and remained the main Windows Mobile platform for years after Intel's departure - it was able to become a multimedia platform and a widely adopted chip for embedded use, driving NAS devices and the like. Thanks to Intel's purchase of Infineon's wireless portfolio, including many popular 3G radios/modems, and the formation of a new wireless division, their actual business and sales in the mobile market are also much higher than when they still had their custom PXA/XScale lineup. Plus, they couldn't have competed with their XScale lineup without designing new ARM-ISA-compatible cores/designs to match Cortex A8, A9, A7, A15, Krait 600, etc. It also puts them in a much better place to be a wireless/terminal supplier when they can support customers who want advanced wireless modems/baseband, application processors, BT, WiFi, etc. While Nvidia will have Tegra 4i with an integrated modem, AMD couldn't offer anything similar, as they have no team capable of producing radio baseband.

    Having modern compilers and the x86 ISA sure makes it convenient now for Intel, as does integrating their own GPU; just licensing ARM Ltd designs wouldn't have put them in a better position to continue their presence in the mobile field. They have basically developed and scaled their desktop GNU/Linux drivers in the Linux kernel, adding mobile features years before they shipped the hardware, and can leverage that software in mobile platforms (Android). It makes sense, and they don't have to rely on IP cores and third-party drivers for graphics with the coming Bay Trail. They couldn't have shared that much tech if they were anything other than x86.

    Of course AMD won't be in the same place; scaling down a GPU designed for thousands of stream processors, with Windows/OS X drivers, to put it into phones is not the same. It would be awful if it were just scaled down to fit the power budget. Even if Nvidia has a somewhat custom mobile GPU, it's still worse than the competitors, which have no presence in desktop computing. Drivers for QNX, Android/Linux, iOS, etc. are not the same as with Windows either. It takes a long time to start over when they did away with an okay solution (Z460); they haven't, but others have, and that's fine: there is more competition here than elsewhere. x86 is no stopper for Intel.
