The Silvermont Module and Caches

Like AMD’s Bobcat and Jaguar designs, Silvermont is modular. The default Silvermont building block is a two-core/two-thread design. Each core is equally capable and there’s no shared execution hardware. Silvermont supports up to 8-core configurations by placing multiple modules in an SoC.

Each module features a shared 1MB L2 cache, a 2x increase over the core:cache ratio of existing Atom based processors. Despite the larger L2, access latency is reduced by 2 clocks. The default module size gives you a clear indication of where Intel saw Silvermont being most useful. At the time of its inception, I doubt Intel anticipated such a quick shift to quad-core smartphones; otherwise it might've considered a larger default module size.

L1 cache sizes/latencies haven’t changed. Each Silvermont core features a 32KB L1 data cache and 24KB L1 instruction cache.
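
If you want to see this arrangement from software, here's a minimal sketch assuming a Linux system that exposes the standard sysfs cache interface (the index numbering and cpu0 path are typical, not guaranteed): the per-core L1s and the module's shared 1MB L2 can be read straight out of sysfs, with both cores of a module showing up in the L2's shared_cpu_list.

```c
/* Sketch: inspect cache topology via Linux sysfs.
 * Paths follow the standard kernel cache interface; exact index
 * numbers can vary by CPU, so treat this as illustrative. */
#include <stdio.h>

static void show(const char *path)
{
    char buf[128];
    FILE *f = fopen(path, "r");
    if (!f)
        return;
    if (fgets(buf, sizeof(buf), f))
        printf("%s: %s", path, buf);
    fclose(f);
}

int main(void)
{
    /* L1D/L1I are typically index0/index1, the unified L2 index2. */
    show("/sys/devices/system/cpu/cpu0/cache/index0/size");
    show("/sys/devices/system/cpu/cpu0/cache/index1/size");
    show("/sys/devices/system/cpu/cpu0/cache/index2/size");

    /* On a Silvermont module both cores should be listed here,
     * since the L2 is shared between them. */
    show("/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list");
    return 0;
}
```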

Silvermont Supports Independent Core Frequencies: Vindication for Qualcomm?

In all Intel Core based microprocessors, all cores are tied to the same frequency - those that aren’t in use are simply shut off (power gated) to save power. Qualcomm’s multi-core architecture has always supported independent frequency planes for all CPUs in the SoC, something that Intel has always insisted was a bad idea. In a strange turn of events, Intel joins Qualcomm in offering the ability to run each core in a Silvermont module at its own independent frequency. You could have one Silvermont core running at 2.4GHz and another one running at 1.2GHz. Unlike Qualcomm’s implementation, Silvermont’s independent frequency planes are optional. In a split frequency case, the shared L2 cache always runs at the higher of the two frequencies. Intel believes the flexibility might be useful in some low cost Silvermont implementations where the OS actively uses core pinning to keep threads parked on specific cores. I doubt we’ll see this on most tablet or smartphone implementations of the design.
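
To illustrate the kind of core pinning Intel is referring to, here's a minimal sketch (not anything Intel-specific; the core number and the idea of "work" are placeholders) of parking a thread on a single core under Linux with sched_setaffinity, leaving the OS free to clock the other core in the module independently.

```c
/* Minimal sketch: pin the calling thread to core 0 on Linux. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);   /* park this thread on core 0 */

    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    /* Work done from here on stays on core 0; with independent
     * frequency planes, the other core can run at its own clock. */
    return 0;
}
```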

From FSB to IDI

Atom and all of its derivatives have a nasty secret: they never really got any latency benefits from integrating a memory controller on die. The first implementation of Atom was a 3-chip solution, with the memory controller contained within the North Bridge. The CPU talked to the North Bridge via a low power Front Side Bus implementation. This setup should sound familiar to anyone who remembers Intel architectures from the late 90s up to the mid 2000s. In pursuit of integration, Intel eventually brought the memory controller and graphics onto a single die. Historically, bringing the memory controller onto the same die as the CPU came with a nice reduction in access latency - unfortunately Atom never enjoyed this. The reasoning? Atom never ditched the FSB interface.

Even though Atom integrated a memory controller, the design logically looked like it did before. Integration only saved Intel space and power; it never granted any performance benefit. I suspect Intel did this to keep costs down. I noticed the problem years ago but completely forgot about it since it's been so long. Thankfully, with Silvermont the FSB interface is completely gone.

Silvermont instead integrates the same in-die interconnect (IDI) that is used in the big Core based processors. Intel's IDI is a lightweight point-to-point interface with far lower overhead than the old FSB architecture. The move to IDI and the changes to the system fabric are enough to improve single threaded performance by a low double-digit percentage. The gains are even bigger in heavily threaded scenarios.

Another benefit of moving away from a very old FSB to IDI is increased flexibility in how Silvermont can clock up/down. Previously there were fixed FSB:CPU ratios that had to be maintained at all times, which meant the FSB had to be lowered significantly when the CPU was running at very low frequencies. In Silvermont, the IDI and CPU frequencies are largely decoupled - enabling good bandwidth out of the cores even at low frequency levels.

The System Agent

Silvermont gains an updated system agent (read: North Bridge) that’s much better at allowing access to main memory. In all previous generation Atom architectures, virtually all memory accesses had to happen in-order (Clover Trail had some minor OoO improvements here). Silvermont’s system agent now allows reordering of memory requests coming in from all consumers/producers (e.g. CPU cores, GPU, etc...) to optimize for performance and quality of service (e.g. ensuring graphics demands on memory can regularly pre-empt CPU requests when necessary).
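
As a software analogy only (the real logic lives in hardware, and the requesters, priorities, and addresses below are invented for illustration), the reordering behaves like an arbiter that drains a pending-request queue by priority rather than strict arrival order, so a latency-critical client like the display engine isn't starved behind CPU traffic.

```c
/* Illustrative software analogy of QoS-aware memory request arbitration. */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    const char *requester;   /* e.g. "GPU" or "CPU0" (invented names) */
    int priority;            /* higher value = more urgent (QoS) */
    unsigned long address;   /* target memory address */
} mem_request_t;

/* Order by priority so urgent (e.g. display) requests are serviced first. */
static int by_priority(const void *a, const void *b)
{
    const mem_request_t *ra = a, *rb = b;
    return rb->priority - ra->priority;
}

int main(void)
{
    mem_request_t pending[] = {
        { "CPU0", 1, 0x1000 },
        { "GPU",  3, 0x2000 },   /* display refresh cannot be starved */
        { "CPU1", 1, 0x3000 },
    };
    size_t n = sizeof(pending) / sizeof(pending[0]);

    /* Reorder the queue instead of servicing strictly in arrival order. */
    qsort(pending, n, sizeof(pending[0]), by_priority);

    for (size_t i = 0; i < n; i++)
        printf("service %s @ 0x%lx\n", pending[i].requester, pending[i].address);

    return 0;
}
```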

ISA, IPC & Frequency SoCs and Graphics, Penryn-Class Performance
Comments

  • Jumangi - Monday, May 6, 2013 - link

    Let me know when Intel has something of actual substance to show and not just a bunch of marketing/hype focused PowerPoint slides. ARM continues to deliver solid performance gains year after year with low power usage...Intel says yeah, we'll get around to updating our 5 year old design...eventually, we promise...yawn...
  • Krysto - Monday, May 6, 2013 - link

    Great point. Intel keeps promising how awesome they will be when they launch their new "mobile" chip, and it's ALWAYS disappointing, because in the meantime ARM chips keep shipping on their merry way, and keep improving. Fast.
  • A5 - Monday, May 6, 2013 - link

    Eh. A15 wasn't exactly a home run. The performance is good for what it is, but they overshot their TDP targets big time.
  • saurabhr8here - Monday, May 6, 2013 - link

    A15 wasn't a home run because it was developed on an early, bleeding-edge process technology. As the process matures and the design is optimized for it, the power/performance numbers will improve.
  • DanNeely - Tuesday, May 7, 2013 - link

    A15's problem isn't overshooting TDP targets; it's that it was originally designed for use in entry-level NASes and similar embedded systems/micro servers. A few extra watts for better CPU performance isn't a big problem there.
  • xTRICKYxx - Tuesday, May 7, 2013 - link

    Exactly. A15 was not initially designed for smartphones.
  • Wilco1 - Tuesday, May 7, 2013 - link

    That's not correct; ARM said from the early announcements that it would go into mobiles at lower frequencies and core counts. Of course both core counts and frequencies turned out to be higher than originally expected, so power consumption is higher too. The Exynos 5250 appears to have been released quickly in order to be first to market. The Octa core is far more tuned and will do better. NVIDIA has stated Tegra 4 uses 40% less power than Tegra 3 at equivalent performance levels.
  • Krysto - Monday, May 6, 2013 - link

    Let's do a recap. Performance is as high as Cortex A15...a chip launched in 2012.

    GPU performance is where iPad 4 was...in 2012.

    They are doing their benchmarks against last-gen ARM chips...okay.

    Intel Silvermont is expected late 2013/early 2014.

    Yeah...it's obviously so competitive! NOT.

    By the time Intel Silvermont arrives in smartphones (Merrifield), we will see 20nm ARMv8 chips in smartphones, already shipping. Good luck, Intel, another hit and a miss.

    As for your point that Silvermont is conservative because they don't want to cannibalize Haswell - that's EXACTLY Intel's biggest problem right now: the conflict of interest between the low-end, unprofitable Atom division and the high-end, very profitable Core division.

    This is exactly what killed their XScale division, too. And it's what will kill Intel in the end, because Intel will have to make Atom compete *whether they want to or not*. ARM chips are going to reach higher and higher performance and become "good enough" for almost everything. What is Intel going to do then? They'll have to keep up, which will slowly eliminate their *profitable* Core chips from the market. And what then? Survive on $20 chips with a dozen competitors? This is going to be very interesting for Intel in the next few years - and not in a good way, especially with a brand new CEO.
  • Kjella - Monday, May 6, 2013 - link

    We're four months into 2013 - how many quad-core ARM processors have launched since 2012? They're comparing against what is out now (if they were able to compare against unreleased ARM processors, there'd be something very wrong) and beating them; not sure where your reading comprehension failed there. Looks to me like they're ready for a clash of the titans around year's end. Also, 1-5W chips don't compete much with 15-85W Haswells no matter what; AMD is dying fast and people need their x86 computers, so whatever. Reminds me of all the posts that say Windows is sooooooo dead.
  • xTRICKYxx - Tuesday, May 7, 2013 - link

    AMD is making a lot of money right now.
