The CPU

Medfield is the platform, Penwell is the SoC and the CPU inside Penwell is codenamed Saltwell. It's honestly not much different from the Bonnell core used in the original Atom, although it does have some tweaks for both power and performance.

Almost four years ago I wrote a piece on the architecture of Intel's Atom. Luckily (for me, not Intel), Atom's architecture hasn't really changed over the years, so you can still look back at that article and have a good idea of what is at the core of Medfield/Penwell. Atom is still a dual-issue, in-order architecture with Hyper Threading support. The integer pipeline is sixteen stages long, significantly deeper than the Cortex A9's. The pipeline was lengthened to help reduce Atom's power consumption: some of the decode stages were stretched out and cache latencies increased to avoid burning through the core's power budget. Atom's architects, like those who worked on Nehalem, operated under a 2:1 mandate: every new feature added to the processor's design had to deliver at least a 2% increase in performance for every 1% increase in power consumption.
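To make the mandate concrete, here's a minimal sketch of the accept/reject test it implies (the function and the sample numbers are mine, purely illustrative):

# Illustrative sketch of the 2:1 mandate (hypothetical, not Intel's tooling):
# a proposed feature is accepted only if it buys at least 2% more performance
# for every 1% of added power.

def passes_2_to_1_rule(perf_gain_pct: float, power_cost_pct: float) -> bool:
    """Return True if the feature delivers >= 2% performance per 1% power."""
    if power_cost_pct <= 0:  # features that add no power always pass
        return True
    return perf_gain_pct / power_cost_pct >= 2.0

# +3% performance for +2% power fails (1.5:1); +5% for +2% passes (2.5:1).
print(passes_2_to_1_rule(3.0, 2.0))  # False
print(passes_2_to_1_rule(5.0, 2.0))  # True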

Atom is a very narrow core, as the diagram below shows:

[Diagram: Intel Atom (Saltwell) execution core block diagram]

There are no dedicated integer multiply or divide units; that hardware is shared with the FP units. Intel duplicated some resources (e.g. register files, queues) to enable Hyper Threading support, but stopped short of adding execution hardware, relying on Hyper Threading to drive up the efficiency of the units already there. The tradeoff seems to have worked, because Intel is able to deliver better performance than a dual-core Cortex A9 from a single HT-enabled core. Intel also lucks out because, while Android is very well threaded, not all tasks will continually peg both cores in a dual-core A9 machine. At higher clock speeds (1.5GHz+) and with heavy multi-threaded workloads it's possible that a dual-core Cortex A9 could outperform (or at least equal) Medfield, but I don't believe that's a realistic scenario.

Architecturally the Cortex A9 doesn't look very different from Atom:

[Diagram: ARM Cortex A9 block diagram]

Here we see a dedicated integer multiply unit (shared with one of the ALU ports) but only a single port for FP/NEON. Clearly the difference between Atom and the Cortex A9 isn't all that obvious at a high level; it's the lower level architectural decisions that give Intel a performance advantage.

Where Intel runs into trouble is the Cortex A15:

[Diagram: ARM Cortex A15 block diagram]

The A15 is a far more modern design: also out-of-order, but much wider than the A9. I fully expect that something A15-class will be able to outperform Medfield, especially in a dual-core configuration. Krait falls under the A15-class umbrella, so I believe Medfield could lose its CPU performance advantage within a couple of quarters.

Enhancements in Saltwell

Although the CPU core is mated to a 512KB L2 cache, there's a separate 256KB low power SRAM that runs on its own voltage plane. This ULP SRAM holds CPU state and data from the L2 cache when the CPU is power gated in its deepest sleep state. The reasoning for the separate voltage plane is simple: Intel's architects found that the minimum voltage for the core was limited by Vmin for the SRAM, which is higher than Vmin for the CPU logic itself. Putting the two on separate voltage planes allowed Intel to bring the CPU core down to a lower minimum power state. The downside to multiple power islands is an increase in die area. Since Medfield is built on Intel's 32nm LP process while the company transitions to 22nm, spending a little extra die area to build a more power efficient SoC isn't a big deal. Furthermore, Intel is used to building much larger chips, making Medfield's size a relative nonissue for the company.
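A rough way to see why the split helps: whatever stays powered for retention leaks in proportion to the voltage it's held at, and SRAM retention needs a higher floor than CPU logic. The sketch below uses entirely made-up voltages, leakage figures, and a crude quadratic model, purely to illustrate the direction of the win:

# All numbers and the quadratic model here are hypothetical, just to show why
# separate voltage planes help: SRAM retention needs a higher Vmin than the
# CPU logic, so a single shared plane pins the whole island at the SRAM floor.

def leakage_mw(leakage_at_nominal_mw: float, v: float, v_nominal: float = 1.0) -> float:
    """Crude model: leakage scales roughly quadratically with voltage."""
    return leakage_at_nominal_mw * (v / v_nominal) ** 2

SRAM_VMIN = 0.75  # hypothetical retention floor for the ULP SRAM
CORE_VMIN = 0.55  # hypothetical floor for the CPU logic alone

# Shared plane: the entire island must idle at the SRAM's Vmin.
shared = leakage_mw(100.0, SRAM_VMIN)
# Split planes: only the small SRAM stays at 0.75V; the core drops to 0.55V.
split = leakage_mw(80.0, CORE_VMIN) + leakage_mw(20.0, SRAM_VMIN)

print(f"shared plane: {shared:.0f} mW, split planes: {split:.0f} mW")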

The die size is actually very telling: Medfield is a larger SoC than a Tegra 2 with two Cortex A9s, despite featuring only a single core. Granted, the rest of the blocks around the core are different, but it goes to show that the CPU core itself (or the number of cores) isn't the only determinant of an SoC's die size.

The performance tweaks come from the usual lessons learned over the course of any architecture's lifespan: some instruction scheduling restrictions have been lifted, memory copy performance is up, the branch predictor is larger, and some microcode flows now run faster on Saltwell.

Clock Speeds & Turbo

Medfield's CPU core supports several different operating frequencies and power modes. At the lowest level is its C6 state. Here the core and L2 cache are both power gated, with their state saved off in the lower power on-die SRAM. Total power consumption of the processor island in C6 is effectively zero. This isn't anything new; Intel has implemented similar technologies in desktops since 2008 (Nehalem) and notebooks since 2010 (Arrandale).

When the CPU is actually awake and doing something, however, it has a range of available frequencies: 100MHz all the way up to 1.6GHz, in 100MHz increments.

The 1.6GHz state is a burst state and isn't meant to be sustained for long periods of time, similar to how Turbo Boost works on Sandy Bridge desktop/notebook CPUs. The default maximum clock speed is 1.3GHz although, just as with Turbo-enabled desktop chips, you can expect to see frequencies above 1.3GHz on a fairly regular basis.
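To make the burst behavior concrete, here's a minimal sketch of the kind of policy described above. The frequency steps come from the article; the load-to-frequency mapping and the burst budget are my assumptions, not Intel's actual governor:

# The frequency steps are per the article; the selection policy and burst
# budget below are assumptions for illustration, not Intel's algorithm.

AVAILABLE_MHZ = list(range(100, 1700, 100))  # 100MHz .. 1.6GHz in 100MHz steps
SUSTAINED_MAX_MHZ = 1300                     # default maximum clock
BURST_MAX_MHZ = 1600                         # short-duration burst state

def pick_frequency(load_pct: float, burst_budget_ms: int) -> int:
    """Pick the lowest step that covers demand; burst only while budget lasts."""
    target = load_pct / 100.0 * BURST_MAX_MHZ
    for mhz in AVAILABLE_MHZ:
        if mhz >= target:
            if mhz > SUSTAINED_MAX_MHZ and burst_budget_ms <= 0:
                return SUSTAINED_MAX_MHZ  # burst headroom exhausted, clamp
            return mhz
    return BURST_MAX_MHZ if burst_budget_ms > 0 else SUSTAINED_MAX_MHZ

print(pick_frequency(30, 0))    # light load -> 500
print(pick_frequency(100, 50))  # heavy load with burst budget left -> 1600
print(pick_frequency(100, 0))   # heavy load, budget exhausted -> 1300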

Power consumption along the curve is all very reasonable:

Medfield CPU Frequency vs. Power

Frequency                100MHz    600MHz    1.3GHz    1.6GHz
SoC Power Consumption    ~50mW     ~175mW    ~500mW    ~750mW

Since most ARM based SoCs draw somewhere below 1W under full load, these numbers seem to put Medfield in line with its ARM competitors, at least on the CPU side.
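Some quick arithmetic on the table above also shows where the curve is most efficient per clock; the figures are the table's, the MHz-per-mW metric is just my own back-of-the-envelope measure:

# Back-of-the-envelope efficiency using the table's approximate figures.
# The sweet spot sits mid-curve: fixed overhead dominates at 100MHz, while
# the 1.6GHz burst state pays a voltage/frequency penalty.

points_mhz_to_mw = {100: 50, 600: 175, 1300: 500, 1600: 750}

for mhz, mw in points_mhz_to_mw.items():
    print(f"{mhz:>4} MHz: ~{mw} mW -> {mhz / mw:.2f} MHz per mW")
# 100 MHz: 2.00, 600 MHz: 3.43, 1300 MHz: 2.60, 1600 MHz: 2.13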

It's important to note that we're dealing with clock frequencies similar to what other Cortex A9 vendors are currently shipping. Any performance advantage will come from Medfield bursting up to 1.6GHz for short periods of time, inherently higher IPC, and/or a superior cache/memory interface.


Comments


  • mrtanner70 - Wednesday, January 11, 2012 - link

    I am a long time fan of the site and still think your actual product testing is the best. However, over the last year or so there has been a noticeable pattern of being a bit too enthusiastic toward Intel PR and spin. I would like to see a little more intelligent skepticism when dealing with Intel reference designs and slide decks.
  • Lucian Armasu - Thursday, January 12, 2012 - link

    I have to agree about Intel spinning stuff more and more lately. I've noticed it since they announced the "3D" chips. Everyone reacted as if the 3D chips were 10x better than SNB or whatever, when in fact that's not true. The 3D chips offer only +37% performance at the same power level, OR -50% power consumption at the same performance level. That's hardly as newsworthy as they made it sound. It puts them at most one extra year ahead of AMD, on top of the six month lead they already have now.

    So I don't know how everyone concluded from that that they'll now be able to compete with ARM, when ARM at the same performance level has much better power efficiency and also costs several times less.
  • Hector2 - Thursday, January 12, 2012 - link

    What you said is about right for "active" power. A couple of things not mentioned are that the 22nm chip will be quite a bit smaller than the 32nm version (meaning even lower cost) and that the FinFET transistors drop "standby" power 10X-20X below even the 32nm version.

    As was said somewhere, Medfield gets Intel's foot in the door with a competitive product, and I think most here will agree that it's at least competitive. Certainly, Intel's new partners must think so. The 22nm upgrade coming in 2013 is where it really gets interesting.

    It's taken Intel a while to carve out a viable single chip design that's product-worthy. Now that they have one, I expect them to march to the same tick-tock cadence they've driven the PC market with, and they already have working 3D FinFET 14nm chips in the lab.
  • Finally - Thursday, January 12, 2012 - link

    Mark my words:
    The day that AnandTech welcomes a revolutionary AMD product (think e.g. Brazos II) with the words "It's finally here" will be the very day hell freezes over.
  • Hector2 - Thursday, January 19, 2012 - link

    Hmmm. When do you think AMD might be entering the smartphone market ?
  • thunng8 - Wednesday, January 11, 2012 - link

    That Intel graph for graphics performance is way off and misleading.

    If one of the smartphones tested is the iPhone 4S, then that phone should have a >2x performance advantage in graphics over the Medfield reference platform.
  • solipsism - Wednesday, January 11, 2012 - link

    If Medfield is as good as it looks today and Intel can keep moving forward as the Cortex A15 arrives, tablets and smartphones could go Intel-based, and Apple could be in serious trouble by sticking with ARM. It could be the whole PPC issue all over again, where Apple lingers way too long before switching architectures. That's a worst-case perfect storm, of course, and a long way off, but it still can't be good for vertically integrated companies with heavy ARM investments.
  • markit - Wednesday, January 11, 2012 - link

    In fact, a two year old Cortex A9 (even single core) outperforms it per MHz.

    E.g. if I scale Medfield's results down linearly to 1GHz (which is IMHO quite accurate for these benches) I get:
    SunSpider 2130ms and BrowserMark 72766 points.

    Compare that against my old Tegra 2 based Toshiba AC100 (running Ubuntu and Chrome 13).

    With one core running @ 1GHz and the second core turned off I get:
    SunSpider: 1962ms (9% faster per MHz)
    BrowserMark: 111872 (54% faster per MHz than Medfield)

    So in BrowserMark the AC100 in fact nearly matches the 1.6GHz result of this Medfield reference platform while running on only one 1GHz core...

    BTW, with both cores turned on @ 1GHz:
    SunSpider: 1941ms
    BrowserMark: 114523

    So hmm, I think I can guess why Intel didn't choose dual-core friendly benchmarks...

    And remember, the AC100 hit stores in June 2010!
  • FunBunny2 - Wednesday, January 11, 2012 - link

    I still am puzzled by this. It's been at least a decade since an Intel chip executed x86 code in hardware; the chip is a RISC design fronted by an emulator. Or has Intel gone back to x86 instructions in hardware for these SoCs?
  • dealcorn - Thursday, January 12, 2012 - link

    Was I deceived? I read about FinFETs and 3D transistors and was impressed, because it's credible technology that will permit Intel to do good things before others. That colored my perspective on mobile Atom, leading to statements such as "Show time starts at 22nm." I felt that absent the 3D benefits, Intel would find it tough to go head to head against ARM. While Intel focused attention on 3D, the implication was that they were dead in the water until 22nm arrives.

    That is wrong. Mobile Atom is never going to be built using plain 22nm transistors; it will be built using 22nm LP transistors. Furthermore, it appears that 32nm LP transistors get the job done just fine (even if plain 32nm is worthless in this context). Is 32nm LP the secret sauce that lets it all work? Should the headline have been "Intel's process advantage works almost as well at 32nm LP as it was supposed to at 22nm"?
