The CPU

Medfield is the platform, Penwell is the SoC and the CPU inside Penwell is codenamed Saltwell. It's honestly not much different than the Bonnell core used in the original Atom, although it does have some tweaks for both power and performance.

Almost five years ago I wrote a piece on the architecture of Intel's Atom. Luckily (for me, not Intel), Atom's architecture hasn't really changed over the years so you can still look back at that article and have a good idea of what is at the core of Medfield/Penwell. Atom is still a dual-issue, in-order architecture with Hyper Threading support. The integer pipeline is sixteen stages long, significantly deeper than the Cortex A9's. The longer pipeline was introduced to help reduce Atom's power consumption by lengthening some of the decode stages and increasing cache latency to avoid burning through the core's power budget. Atom's architects, similar to those who worked on Nehalem, had the same 2:1 mandate: every new feature added to the processor's design had to deliver at least a 2% increase in performance for every 1% increase in power consumption.
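The 2:1 mandate can be read as a simple acceptance test for a proposed feature. The sketch below is only an illustration of that rule as stated above; the function name and the sample figures are hypothetical and are not Intel data.

```python
def passes_two_to_one(perf_gain_pct: float, power_cost_pct: float) -> bool:
    """Return True if a feature meets the 2:1 rule described in the text.

    A feature qualifies when it delivers at least 2% more performance
    for every 1% of additional power. A feature that costs no extra
    power qualifies as long as it does not reduce performance
    (this edge case is an assumption, the text does not cover it).
    """
    if power_cost_pct <= 0:
        return perf_gain_pct >= 0
    return perf_gain_pct >= 2.0 * power_cost_pct


# Hypothetical features: (performance gain %, power increase %)
candidates = {
    "feature_a": (4.0, 1.5),  # needs >= 3.0% gain, has 4.0%
    "feature_b": (3.0, 2.0),  # needs >= 4.0% gain, has 3.0%
}

for name, (perf, power) in candidates.items():
    verdict = "accept" if passes_two_to_one(perf, power) else "reject"
    print(f"{name}: {verdict}")
```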

Atom is a very narrow core as the diagram below will show:

 

There are no dedicated integer multiply or divide units; that's all shared with the FP hardware. Intel duplicated some resources (e.g. register files, queues) to enable Hyper Threading support, but stopped short of increasing execution hardware to drive up efficiency. The tradeoff seems to have worked because Intel is able to deliver performance better than a dual-core Cortex A9 from a single HT-enabled core. Intel also lucks out because while Android is very well threaded, not all tasks will continually peg both cores in a dual-core A9 machine. At higher clock speeds (1.5GHz+) and with heavy multi-threaded workloads, it's possible that a dual-core Cortex A9 could outperform (or at least equal) Medfield but I don't believe that's a realistic scenario.

Architecturally the Cortex A9 doesn't look very different from Atom:

 

Here we see a dedicated integer multiply unit (shared with one of the ALU ports) but only a single port for FP/NEON. It's clear that the difference between Atom and the Cortex A9 isn't obvious at the high level. Instead it's the lower level architectural decisions that give Intel a performance advantage.

Where Intel is in trouble is if you look at the Cortex A15:

 

The A15 is a far more modern design, also out of order but much wider than A9. I fully expect that something A15-class can outperform Medfield, especially if the former is in a dual-core configuration. Krait falls under the A15-class umbrella so I believe Medfield has the potential to lose its CPU performance advantage within a couple of quarters.

Enhancements in Saltwell

Although the CPU core is mated to a 512KB L2 cache, there's a separate 256KB low power SRAM that runs on its own voltage plane. This ULP SRAM holds CPU state and data from the L2 cache when the CPU is power gated in the deepest sleep state. The reasoning for the separate voltage plane is simple. Intel's architects found that the minimum voltage for the core was limited by Vmin for the ULP SRAM. Putting the two on separate voltage planes allowed Intel to bring the CPU core down to a lower minimum power state as Vmin for the L2 is higher than it is for the CPU core itself. The downside to multiple power islands is an increase in die area. Since Medfield is built on Intel's 32nm LP process while the company transitions to 22nm, spending a little more on die area to build more power efficient SoCs isn't such a big deal. Furthermore, Intel is used to building much larger chips, making Medfield's size a relative nonissue for the company.

The die size is actually very telling as it's a larger SoC than a Tegra 2 with two Cortex A9s despite only featuring a single core. Granted the rest of the blocks around the core are different, but it goes to show you that the CPU core itself (or number of cores) isn't the only determinant of the die size of an SoC.

The performance tweaks come from the usual learnings that take place over the course of any architecture's lifespan. Some instruction scheduling restrictions have been lifted, memory copy performance is up, the branch predictor is larger and some microcode flows now run faster on Saltwell.

Clock Speeds & Turbo

Medfield's CPU core supports several different operating frequencies and power modes. At the lowest level is its C6 state. Here the core and L2 cache are both power gated, with their state saved off in a lower power on-die SRAM. Total power consumption in C6 of the processor island is effectively zero. This isn't anything new; Intel has implemented similar technologies in desktops since 2008 (Nehalem) and notebooks since 2010 (Arrandale).

When the CPU is actually awake and doing something, however, it has a range of available frequencies: 100MHz all the way up to 1.6GHz in 100MHz increments.

The 1.6GHz state is a burst state and shouldn't be sustained for long periods of time, similar to how Turbo Boost works on Sandy Bridge desktop/notebook CPUs. The default maximum clock speed is 1.3GHz, although just as is the case with Turbo enabled desktop chips, you can expect to see frequencies greater than 1.3GHz on a fairly regular basis.
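As a rough sketch, the frequency range described above can be enumerated like this. The constant and function names are hypothetical, and labeling every step above the 1.3GHz default maximum as "burst" is an assumption; the text only names 1.6GHz explicitly as a burst state.

```python
STEP_MHZ = 100          # increment between states, per the text
BURST_MHZ = 1600        # highest (burst) state, per the text
DEFAULT_MAX_MHZ = 1300  # default maximum clock, per the text


def frequency_states():
    """Return (frequency in MHz, label) pairs from 100MHz up to 1.6GHz."""
    states = []
    for mhz in range(STEP_MHZ, BURST_MHZ + STEP_MHZ, STEP_MHZ):
        # Assumption: anything above the default maximum counts as burst.
        label = "burst" if mhz > DEFAULT_MAX_MHZ else "sustained"
        states.append((mhz, label))
    return states


for mhz, label in frequency_states():
    print(f"{mhz:>4} MHz  {label}")
```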

Power consumption along the curve is all very reasonable:

Medfield CPU Frequency vs. Power

CPU Frequency           100MHz   600MHz   1.3GHz   1.6GHz
SoC Power Consumption   ~50mW    ~175mW   ~500mW   ~750mW

Since most ARM based SoCs draw somewhere below 1W under full load, these numbers seem to put Medfield in line with its ARM competitors - at least on the CPU side.
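To put the table in perspective, the short sketch below divides each approximate power figure by its clock speed. The dictionary simply restates the table; the helper name is hypothetical, and the figures are the rounded values quoted above rather than measurements.

```python
# Approximate SoC power from the table above: frequency (MHz) -> power (mW).
POWER_MW = {100: 50, 600: 175, 1300: 500, 1600: 750}


def mw_per_mhz(freq_mhz: int) -> float:
    """Power cost of each MHz at the given operating point."""
    return POWER_MW[freq_mhz] / freq_mhz


for freq in sorted(POWER_MW):
    print(f"{freq:>4} MHz: {POWER_MW[freq]:>3} mW, {mw_per_mhz(freq):.3f} mW/MHz")

# Relative cost of stepping from the default maximum to the burst state.
clock_gain = 1600 / 1300 - 1                     # about 23% more clock
power_gain = POWER_MW[1600] / POWER_MW[1300] - 1  # 50% more power
print(f"1.3GHz -> 1.6GHz: +{clock_gain:.0%} clock for +{power_gain:.0%} power")
```

On these rounded figures the cost per MHz is lowest at 600MHz and climbs again toward the burst state, which fits the description of 1.6GHz as something not meant to be sustained.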

It's important to pay attention to the fact that we're dealing with similar clock frequencies to what other Cortex A9 vendors are currently shipping. Any performance advantages will either be due to Medfield boosting up to 1.6GHz for short periods of time, inherently higher IPC and/or a superior cache/memory interface.


164 Comments


  • Exophase - Tuesday, January 10, 2012 - link

    ... my point being that by the time others have 28nm ARM SoCs - you know, in a couple months - Intel's 22nm Atom will still be a good year off, quite contrary to your original claim.. Did I actually need to type that?

    It's not 1-2 quarters, and we know this because we're seeing Transformer Prime TF202 with Krait, not to mention the release of AMD's Southern Islands which is also on TSMC's 28nm process.

    Of course it'll be a few months before we see Medfield phones on the market too: this article says Q2 for Chinese phones, which means 3+ months, and "by the end of the year" for something released from a decent global player. Not really uplifting schedules.

    Of course Intel certainly may gain a much bigger process lead with Atom on 22nm and beyond, but your original comment was just wrong. That's all I'm saying.
  • Exophase - Tuesday, January 10, 2012 - link

    Oh, and Intel's first step into the smartphone market was Moorestown. Just because almost no one used it doesn't mean that it wasn't an attempt and that we should congratulate Medfield as a first try.
  • french toast - Wednesday, January 11, 2012 - link

    ARM vendors will be RELEASING chips on 32nm HK and 28nm 4/months before Medfield hits anywhere but China.

    The above tests are misleading, but despite that they are impressive, as I thought they would be a lot further away than that.

    Medfield = tegra 2 class, you have to remember that the above chips are clocked a lot lower than 1.66ghz and they are single thread benchmarks.
    Regarding power consumption, as stated they are 45nm designs; when 28 & 32nm big.LITTLE arrives in a couple of months, with on die 4g, we are in another territory altogether.

    When silvermont arrives it will fare against 28nm HK QUAD CORE KRAITS running at 2.5ghz and above....can't see Intel smashing that, but long term if they can be competitive then they may win due to resources.
  • Braumin - Tuesday, January 10, 2012 - link

    Yeah I always found the comments on x86 phones hilarious because of the ARM loving going on. I mean, it is an instruction set. Stop being an instruction set fanboy people.

    I've said from the beginning that Intel is not a company to expect to lie down. They just have way too many smart people, too much money, and by far the best fab in the industry to expect them not to be able to compete.

    I'm no Intel fanboy (and certainly not x86 I mean who can love an instruction set) but they have been killing it in all sectors lately. Even their GPUs don't suck as bad as before.

    I am impressed.
  • Hector2 - Wednesday, January 11, 2012 - link

    Even AMD doesn't whine about being at least a couple years behind Intel's process technology, and that fact has hurt them big time. In the end, the consumer doesn't care "why" Intel has a strategic advantage with their technology; it'll make them successful in smartphones
  • tecknurd - Wednesday, January 11, 2012 - link

    Dark_Archonis, it does not matter that the Medfield processor is based on a 5 year old hardware design. ARM processors use hardwired microcode while 80x86 processors use software microcode. There is a difference in how fast and how easy it is to design and optimize a processor in either of these ways. Intel can take a ten year old model and optimize the microcode while ARM will have to keep on designing new models in hardware to push out higher performing processors.

    The number one question: will the mobile industry take this processor even though software has to be rewritten? Sure, Intel said that about 75% of Android apps can be used. Sure, 90% of apps can be used if emulation is used. Mac users know all too well that Rosetta slows down PowerPC software on an x86 processor because of the emulation.

    Now, the reason Intel can fix their five year old Atom processor to be crammed into a smartphone is that most of Intel's money goes into R&D. ARM does not have that money to put into R&D. If ARM did, Intel would have a very, very serious competitor even though ARM is an engine that runs on a completely different fuel. So comparing the Medfield processor from Intel with the ARM Cortex A9 is like comparing apples to oranges.

    ARM processors suit niche markets better than x86 processors. x86 processors suit a general purpose setup. This means Intel will have to be ready to make dramatic changes to Medfield to suit many different configurations. Intel's track record in this type of business is poor, or it is simply not capable, because of its tight control of the license. ARM succeeds in a market that requires many different changes thanks to its lighter control of the license. ARM processors are like lego blocks and x86 processors are as-is.
  • stadisticado - Wednesday, January 11, 2012 - link

    I know what you're saying but I really feel you're wrong on one point and need clarification on another.

    First, assuming Intel or x86 processors are not built with a (lego) block architecture is a fallacy that needs to stop right now. There's no reason x86 processors and SOCs can't be built this way and Penwell is the first example of this. I think you'll see Intel reusing and sharing a whole lot more IP blocks internally up and down the vertical stack because of this move into phones/tablets.

    Second, it's starting to get a little old always referring to ARM, especially how you do with respect to R&D. It's not ARM spending those dollars, it's four or five discrete companies: Qualcomm, TI, nVidia, Samsung, etc. ARM sells licenses...it doesn't generally give a rip how many of its chips are sold unless the license fee is predicated on volume. It's up to the chip designers to compete with Intel, not ARM.
  • french toast - Wednesday, January 11, 2012 - link

    Partly true, ARM does spend R&D of course, else there wouldn't be a Cortex/Mali reference design to buy..let alone any interconnects/future ISAs
  • tecknurd - Wednesday, January 11, 2012 - link

    You did not read my comment correctly. I said that ARM processors are like legos while Intel or x86 is as-is. The problem with Intel's Medfield is Intel's business model. Their model is as-is, so companies that want to use this processor will have to add the components outside of this processor. In ARM's case, the components can be added in the chip, which does not take any space on the motherboard. When components are placed outside of the processor, the space that could have gone to a bigger battery is taken up by additional components. The companies that will use Medfield are smartphone and tablet brands like Lenovo and others.

    ARM makes the processor models while Qualcomm, TI, nVidia, Samsung, and others integrate them in their designs and with other technologies that these companies have created themselves. The companies that take these processors and put them in their smartphones and tablets are LG, HTC, Nokia, ASUS, Samsung and others.

    IMHO, ARM based smartphones and tablets will be cheaper than Intel based ones because there is less effort involved in designing ARM based versions. Sure you can keep on supporting and hoping that Intel looks good in smartphones and tablets, but the as-is business model is going to hurt Intel.
  • Exophase - Wednesday, January 11, 2012 - link

    Only a few percent of instructions executed on Atom even use microcode at all. It's not so much that ARM CPUs use "hardwired microcode", it's more that they don't have instructions that are complex enough to merit microcode at all. There are a few that have some sequencing in the decoder, mainly the load/store multiple instructions, but their mapping is really straightforward; there isn't any real element of being able to "optimize" them in the same uarch.

    Most x86 instructions that are implemented as microcode on Atom are legacy instructions that made sense 30 years ago but don't today, either due to changes in software requirements (see BCD numeric instructions for a good example of this..) or changes in what is efficient to implement directly (see loop instruction). For a lot of these instructions you're better off doing it using simpler non-microcoded instructions. So Intel optimizing their microcode is worth mentioning (some of the instructions looked WAY slower than they should have been, going by Agner Fog's timings) but not really some massive competitive edge against ARM. It'll probably barely register as a change on any benchmarks. Hopefully Intel will make the optimizations available for the older Atoms too (the microcode can be soft-patched) and then you can see for yourself.
