SoCs and Graphics

Intel isn’t talking about implementations of Silvermont today other than to say that it will show up in smartphones (Merrifield), tablets (Baytrail), automotive (unannounced), communications infrastructure products (Rangeley) and microservers (Avoton). Baytrail, the tablet implementation of Silvermont, will be available by the end of this year running both Windows 8 (8.1/Blue?) and Android. Silvermont based Merrifield phones will show up early in 2014.

What we know about Baytrail is that it will be a quad-core implementation of Silvermont paired with Intel’s own Gen 7 graphics. Although we don’t know clock speeds, we do know that Baytrail’s GPU core will feature 4 EUs - 1/4 the number used in Ivy Bridge’s Gen7 implementation (Intel HD 4000). Ultimately we can’t know how fast the GPU will be until we know clock speeds, but I wouldn’t be too surprised to see something at or around where the iPad 4’s GPU is today. Given Intel’s recent announcements around Iris and Iris Pro, it’s clear that the mobile team hasn’t yet had the graphics wakeup call that the Core team just got - but I suspect the Atom group will get there sooner rather than later. Intel’s eDRAM approach to scaling Haswell graphics (and CPU) performance has huge implications in mobile. I wouldn’t expect eDRAM enabled mobile SoCs based on Silvermont, but I wouldn’t be too surprised to see something at 14nm.

Penryn-Class Performance

When Atom first came out, I put its CPU performance in perspective by comparing it to older Pentium M based notebooks. It turned out that a 1.6GHz Atom performed similarly to a 1.2GHz Pentium M. So how does Silvermont stack up in PC notebook terms?

On single threaded performance, you should expect a 2.4GHz Silvermont to perform like a 1.2GHz Penryn. To put it in perspective of actual systems, we’re talking about around the level of performance of an 11-inch Core 2 Duo MacBook Air from 2010. Keep in mind, I’m talking about single threaded performance here. In heavily threaded applications, a quad-core Silvermont should be able to bat even further up the Penryn line. Intel is able to do all of this with only a 2-wide machine (lower IPC, but much higher frequency thanks to 22nm).

There’s no doubt in my mind that a Baytrail Android tablet will deliver amazing performance, the real unknown is whether or not a Baytrail Windows 8 detachable/convertible will be fast enough to deliver a good enough legacy Windows experience. I suspect it’ll take Airmont before we really get there by my standards, but it’ll be close this round for sure.

What’ll really be interesting to see is how Silvermont fares in smartphones. Max clock speeds should be lower than what’s possible in a tablet, but not by all that much thanks to good power management. When viewed in that light, I don’t know that there’s a more exciting mobile architecture announced at this point. The ability to deliver 2010 11-inch MacBook Air performance in a phone is insane.

The Silvermont Module and Caches Tablet Expectations & Performance
Comments Locked

174 Comments

View All Comments

  • xTRICKYxx - Tuesday, May 7, 2013 - link

    You're right. Intel has nothing to show at all.... Its not like they have the most powerful mobile and desktop consumer processors available.
  • R0H1T - Tuesday, May 7, 2013 - link

    Yeah, now sit & watch that market(x86) die a slow death at the hands of mobile/tablets that are powered by "good enough" ARM which doesn't need teraflop level of performance to sell their stuff unlike Intel !
  • misaki - Monday, May 6, 2013 - link

    Wow, clearly you are a new reader. This is an architecture overview, not a performance article, which means the information HAS to come from Intel. They have done these type of articles with every architecture redesign since the 90's.

    When chips are available to test that is when the real world performance articles will come out.
  • Ortanon - Monday, May 6, 2013 - link

    SERIOUSLY.
  • kyuu - Monday, May 6, 2013 - link

    Yes, but a lot of performance claims are being made in the article, and Anand really seemed to just be taking Intel's marketing speak for gospel. That's how it read, at least.
  • xTRICKYxx - Tuesday, May 7, 2013 - link

    Not really. He clearly states to take the graphs cautiously. Also Intel may be slightly misleading, but nothing in the graphs are lies. They chose the best possible scenario for the greatest advantage.
  • R0H1T - Tuesday, May 7, 2013 - link

    Like how they(AT) claimed Intel's "SDP" was superior after stress testing an Exynos Octa, yup loved that fairytale !
  • Kevin G - Monday, May 6, 2013 - link

    The article mentions that the IDI is similar to internal bus found on the Nehalem and later desktop processor. IDI here is mentioned as a point-to-point interconnect where as everything is linked via a ring bus in recent Core processors. Of course you can loop multipe point-to-point interfaces into a loop but the article's wording allows for other topologies.

    For example, each Silvermont module could have its down dedicated point-to-point link to the system agent. In Nehalem, the system against logically appears as another hop in the internal ring bus.
  • Exophase - Monday, May 6, 2013 - link

    Small correction:

    "Remember that with the first version of Atom, Intel enabled the fusion of load-op-store and load-op-execute instructions. Instead of these instruction combinations decoding into three and two micro-ops respectively, they would be fused post-fetch and treated like single operations throughout the entire pipeline."

    Atom (the current one anyway) doesn't have instruction (macro-op) fusion. It does handle load + op and load + op + store are one issue down the pipeline but they still came from single instructions that are a natural part of the x86 ISA. While these may be considered fused micro-ops in Intel's other CPUs that terminology doesn't fit Atom.

    These operations do need multiple instructions on most more RISCy ISAs like ARM. But the same is true the other way around (notably, 3 address arithmetic). I very much doubt you'll find typical x86 programs only need 2 instructions for every 3 ARM instructions on average, or at least any papers I've seen that measure micro-ops vs instructions on high-end CPUs are nowhere close to 1.5 (and a uop isn't generally more powerful than an ARM instruction, sometimes less when you consider two are needed for a store). But there are lots of other places that caused stalls on Atom that weren't related to decode, that it's easy to see how you could still gain a lot of perf/MHz without increasing it - as Bobcat and Jaguar have shown. All the details here do seem to point to a Bobcat-like design only with a much lower L2 cache latency and branch mispredict penalty which can only help more.
  • Anand Lal Shimpi - Monday, May 6, 2013 - link

    Er you're very correct. Atom doesn't break these instructions down further, they're treated like single ops throughout the pipeline. I've updated the section. Thank you!

Log in

Don't have an account? Sign up now