The Silvermont Module and Caches

Like AMD’s Bobcat and Jaguar designs, Silvermont is modular. The default Silvermont building block is a two-core/two-thread design. Each core is equally capable and there’s no shared execution hardware. Silvermont supports up to 8-core configurations by placing multiple modules in an SoC.

 

Each module features a shared 1MB L2 cache, a 2x increase over the core:cache ratio of existing Atom-based processors. Despite the larger L2, access latency is reduced by 2 clocks. The default module size gives a clear indication of where Intel saw Silvermont being most useful. At the time of its inception, I doubt Intel anticipated such a quick shift to quad-core smartphones; otherwise it might have considered a larger default module size.

L1 cache sizes/latencies haven’t changed. Each Silvermont core features a 32KB L1 data cache and 24KB L1 instruction cache.

Silvermont Supports Independent Core Frequencies: Vindication for Qualcomm?

In all of Intel's Core-based microprocessors, all cores are tied to the same frequency - those that aren't in use are simply shut off (power gated) to save power. Qualcomm's multi-core architecture has always supported independent frequency planes for all CPUs in the SoC, something Intel has always insisted was a bad idea. In a strange turn of events, Intel joins Qualcomm in offering the ability to run each core in a Silvermont module at its own independent frequency. You could have one Silvermont core running at 2.4GHz and another running at 1.2GHz. Unlike Qualcomm's implementation, Silvermont's independent frequency planes are optional. In a split-frequency case, the shared L2 cache always runs at the higher of the two frequencies. Intel believes the flexibility might be useful in some low-cost Silvermont implementations where the OS actively uses core pinning to keep threads parked on specific cores. I doubt we'll see this on most tablet or smartphone implementations of the design.
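
To illustrate the kind of core pinning Intel has in mind, here's a minimal sketch of how software on a Linux-based Silvermont design might park itself on a specific core (the use of sched_setaffinity and the choice of core 1 are illustrative assumptions on my part, not anything Intel specifies):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(1, &set);   /* hypothetical: park this process on core 1 */

    /* Pin the calling process to core 1; the scheduler will keep it there,
       so that core can run at its own frequency independent of core 0. */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    printf("now running on core %d\n", sched_getcpu());
    return 0;
}
```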

From FSB to IDI

Atom and all of its derivatives have a nasty secret: they never really got any latency benefits from integrating a memory controller on die. The first implementation of Atom was a 3-chip solution, with the memory controller contained within the North Bridge. The CPU talked to the North Bridge via a low power Front Side Bus implementation. This setup should sound familiar to anyone who remembers Intel architectures from the late 90s up to the mid 2000s. In pursuit of integration, Intel eventually brought the memory controller and graphics onto a single die. Historically, bringing the memory controller onto the same die as the CPU came with a nice reduction in access latency - unfortunately Atom never enjoyed this. The reasoning? Atom never ditched the FSB interface.

Even though Atom integrated a memory controller, the design logically looked like it did before. Integration only saved Intel space and power; it never delivered a performance benefit. I suspect Intel did this to keep costs down. I noticed the problem years ago but completely forgot about it since it's been so long. Thankfully, with Silvermont the FSB interface is completely gone.

Silvermont instead integrates the same in-die interconnect (IDI) that is used in the big Core-based processors. Intel's IDI is a lightweight point-to-point interface with far lower overhead than the old FSB architecture. The move to IDI and the changes to the system fabric are enough to improve single-threaded performance by a low double-digit percentage. The gains are even bigger in heavily threaded scenarios.

Another benefit of moving away from a very old FSB to IDI is increased flexibility in how Silvermont can clock up/down. Previously there were fixed FSB:CPU ratios that had to be maintained at all times, which meant the FSB had to be lowered significantly when the CPU was running at very low frequencies. In Silvermont, the IDI and CPU frequencies are largely decoupled - enabling good bandwidth out of the cores even at low frequency levels.

The System Agent

Silvermont gains an updated system agent (read: North Bridge) that does a much better job of managing access to main memory. In all previous-generation Atom architectures, virtually all memory accesses had to happen in-order (Clover Trail had some minor OoO improvements here). Silvermont's system agent now allows reordering of memory requests coming in from all consumers/producers (e.g. CPU cores, GPU, etc.) to optimize for performance and quality of service (e.g. ensuring graphics demands on memory can regularly pre-empt CPU requests when necessary).
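
As a rough conceptual sketch - not Intel's actual arbitration logic, and with entirely hypothetical names - the difference between strictly in-order servicing and priority-aware reordering of memory requests looks something like this:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request descriptor - real hardware arbitration is far more
   involved; this only illustrates the reordering/QoS idea. */
typedef struct {
    unsigned long addr;   /* target address of the memory request */
    int priority;         /* higher = more urgent (e.g. display refresh) */
    bool valid;           /* slot currently holds a pending request */
} mem_request;

/* Older Atom system agents effectively serviced requests in arrival order.
   A QoS-aware agent instead scans the pending queue and picks the highest
   priority entry, so a graphics request can jump ahead of earlier CPU
   requests when it needs to. */
static mem_request *select_next(mem_request *queue, size_t n)
{
    mem_request *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (!queue[i].valid)
            continue;
        if (best == NULL || queue[i].priority > best->priority)
            best = &queue[i];
    }
    return best;   /* NULL if nothing is pending */
}
```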

Comments

  • Spunjji - Wednesday, May 8, 2013 - link

    +1
  • chubbypanda - Monday, May 6, 2013 - link

    The article is about a yet-to-be-released platform. Obviously you could get better information if you work for Intel or its OEM partners. If you don't, Anand's writing is as good as it gets.
  • Thrill92 - Tuesday, May 7, 2013 - link

    But what's your point?
  • raptorious - Monday, May 6, 2013 - link

    It seems like every subsequent Anandtech article about Intel that I read sounds more and more like an Intel Marketing slide deck. I think I'd believe that the absolute performance of Silvermont is better than Cortex A15, but I'm very skeptical that the perf/watt will actually be better at the TDP that we care about for a tablet. I have a very hard time believing that a 2-wide OoO architecture will get better IPC than a 3-wide one. In order to achieve better performance, you'd have to very aggressively scale frequency, and as we all know, perf/watt usually decreases as you scale frequency up (C*V^2*F). It MIGHT be better perf/watt in a phone, simply because with a 2-wide architecture, you can scale dynamic power much lower, but of course, then you can't make the ridiculous claims of 1.6x performance.
  • JarredWalton - Monday, May 6, 2013 - link

    FWIW, Intel is willing to provide these detailed slide decks long in advance of the launch of their hardware. The other SoC vendors are far less willing to share information. If Apple, Qualcomm, or some other vendor put together a nice slide deck, I can guarantee we'd be writing about it.
  • B - Monday, May 6, 2013 - link

    @JarredWalton, I completely agree with your assessment. I have listened to every Anandtech Podcast and repeatedly hear Anand and Brian Klug lament the lack of transparency with the other SOC vendors. Those two go to great lengths to get any meaningful information on the roadmaps of Apple, Qualcomm, et al. The bottom line is that Intel is currently accustomed to sharing more information than its peers in the mobile industry. I suspect your readership wants to know what's coming long before a product is released, and that will always include a speculative component.
  • beginner99 - Monday, May 6, 2013 - link

    The Intel slides basically say Intel will have 8x better performance/watt. Now if you don't believe them, just halve the numbers and you're at 4x, which is still huge... I believe it.

    Medfield uses a basically 5-year-old design on an older process than current ARM offerings and is already competitive in performance/watt (actually better already). The only open questions are how efficient the GPU will be and, even more importantly, how expensive the whole SOC will be. So even if the performance and power data is correct, there's no guarantee it will succeed.

    I do see why some don't like the article, but I think Anand is just enthusiastic and, let's be honest, AMD hasn't delivered anything to be enthusiastic about in years and has a history of misinformation on slides. What Intel disclosed on slides was usually more or less true in the past, so they have more credibility than AMD.
  • raptorious - Monday, May 6, 2013 - link

    Showing 8x perf/watt or even 4x perf/watt from generation to generation might be possible by milking numbers, but across the board that is laughably impossible. You're talking about defying the laws of physics. This architecture isn't radically different from A15 or other designs, and the process improvements of 22 nm over 32 nm don't just magically give you 4x perf/watt. If you want to live in Intel's fairy tale land, go ahead.
  • JDG1980 - Monday, May 6, 2013 - link

    Intel has far better fabs than anyone else. That alone gives them a huge advantage. The reason they've been doing so poorly up until now is that (as the article mentions) they've basically been stagnating with an Atom design dating back to 2004. Now that they've updated to a modern design, they should be able to beat their competitors decisively on the hardware side. Whether that will lead to design wins or not, who can say... they're pretty late to this particular game. But they can give it a good shot.
  • t.s. - Tuesday, May 7, 2013 - link

    Yeah, right. Same with AMD: after they 'upgraded' their architecture from Stars to Bulldozer, they automagically had a huge advantage. Remember, changing architecture isn't necessarily a good thing, especially the first time you make the change.
