The CPU

Medfield is the platform, Penwell is the SoC, and the CPU inside Penwell is codenamed Saltwell. It's honestly not much different from the Bonnell core used in the original Atom, although it does have some tweaks for both power and performance.

Almost five years ago I wrote a piece on the architecture of Intel's Atom. Luckily (for me, not Intel), Atom's architecture hasn't really changed over the years so you can still look back at that article and have a good idea of what is at the core of Medfield/Penwell. Atom is still a dual-issue, in-order architecture with Hyper Threading support. The integer pipeline is sixteen stages long, significantly deeper than the Cortex A9's. The longer pipeline was introduced to help reduce Atom's power consumption by lengthening some of the decode stages and increasing cache latency to avoid burning through the core's power budget. Atom's architects, similar to those who worked on Nehalem, had the same 2:1 mandate: every new feature added to the processor's design had to deliver at least a 2% increase in performance for every 1% increase in power consumption.
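The 2:1 mandate is simple enough to express as a quick sketch. The rule itself comes from the article; the function and the example numbers below are hypothetical illustrations, not Intel's data:

```python
# Hypothetical sketch of the 2:1 design rule described above: a proposed
# feature is accepted only if it delivers at least 2% more performance for
# every 1% of added power. All numbers are illustrative.

def passes_2_to_1(perf_gain_pct: float, power_cost_pct: float) -> bool:
    """Return True if the feature meets the 2:1 performance/power mandate."""
    if power_cost_pct <= 0:  # free (or power-saving) features always pass
        return True
    return perf_gain_pct / power_cost_pct >= 2.0

print(passes_2_to_1(5.0, 2.0))   # 2.5:1 ratio -> accepted
print(passes_2_to_1(3.0, 2.0))   # 1.5:1 ratio -> rejected
```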

Atom is a very narrow core, as the diagram below shows:

[Diagram: Atom (Bonnell/Saltwell) core block diagram]

There are no dedicated integer multiply or divide units; that's all shared with the FP hardware. Intel duplicated some resources (e.g. register files, queues) to enable Hyper Threading support, but stopped short of increasing the execution hardware, in order to keep efficiency up. The tradeoff seems to have worked, because Intel is able to deliver better performance than a dual-core Cortex A9 from a single HT-enabled core. Intel also lucks out because while Android is very well threaded, not all tasks will continually peg both cores in a dual-core A9 machine. At higher clock speeds (1.5GHz+) and with heavy multithreaded workloads, it's possible that a dual-core Cortex A9 could outperform (or at least equal) Medfield, but I don't believe that's a realistic scenario.

Architecturally the Cortex A9 doesn't look very different from Atom:

[Diagram: ARM Cortex A9 core block diagram]

Here we see a dedicated integer multiply unit (shared with one of the ALU ports) but only a single port for FP/NEON. Clearly the difference between Atom and the Cortex A9 isn't obvious at this high level; instead it's the lower level architectural decisions that give Intel a performance advantage.

Where Intel runs into trouble is the Cortex A15:

[Diagram: ARM Cortex A15 core block diagram]

The A15 is a far more modern design, also out-of-order but much wider than the A9. I fully expect that something A15-class can outperform Medfield, especially if the former is in a dual-core configuration. Krait falls under the A15-class umbrella as well, so I believe Medfield could lose its CPU performance advantage within a couple of quarters.

Enhancements in Saltwell

Although the CPU core is mated to a 512KB L2 cache, there's a separate 256KB low power SRAM that runs on its own voltage plane. This ULP SRAM holds CPU state and data from the L2 cache when the CPU is power gated in the deepest sleep state. The reasoning for the separate voltage plane is simple. Intel's architects found that the minimum voltage for the core was limited by Vmin for the ULP SRAM. By putting the two on separate voltage planes it allowed Intel to bring the CPU core down to a lower minimum power state as Vmin for the L2 is higher than it is for the CPU core itself. The downside to multiple power islands is an increase in die area. Since Medfield is built on Intel's 32nm LP process while the company transitions to 22nm, spending a little more on die area to build more power efficient SoCs isn't such a big deal. Furthermore, Intel is used to building much larger chips, making Medfield's size a relative nonissue for the company.
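The benefit of decoupling the core from the SRAM's Vmin follows from the standard dynamic power relation P ≈ C·V²·f: power falls with the square of voltage. A sketch with made-up numbers (the capacitance and voltages below are illustrative assumptions, not Medfield's actual figures):

```python
# Illustrative sketch of why separate voltage planes help: dynamic switching
# power scales roughly as C * V^2 * f, so letting the CPU core drop below the
# SRAM's Vmin pays off quadratically. All numbers here are hypothetical.

def dynamic_power(c_eff_farads: float, voltage: float, freq_hz: float) -> float:
    """Approximate dynamic switching power in watts: P = C * V^2 * f."""
    return c_eff_farads * voltage ** 2 * freq_hz

C_EFF = 1e-9   # hypothetical effective switched capacitance (1 nF)
FREQ = 600e6   # 600 MHz

shared_plane = dynamic_power(C_EFF, 0.90, FREQ)  # core held at the SRAM's Vmin
split_plane = dynamic_power(C_EFF, 0.75, FREQ)   # core on its own, lower plane

print(f"shared: {shared_plane:.3f} W, split: {split_plane:.3f} W")
# Lowering the voltage alone cuts dynamic power by (0.75/0.90)^2, roughly 31%
```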

The die size is actually very telling: Medfield is a larger SoC than Tegra 2, which packs two Cortex A9s, despite featuring only a single core. Granted, the rest of the blocks around the core are different, but it goes to show that the CPU core itself (or the number of cores) isn't the only determinant of an SoC's die size.

The performance tweaks come from the usual learnings over the course of any architecture's lifespan: some instruction scheduling restrictions have been lifted, memory copy performance is up, the branch predictor has grown, and some microcode flows now run faster on Saltwell.

Clock Speeds & Turbo

Medfield's CPU core supports several different operating frequencies and power modes. At the lowest level is its C6 state. Here the core and L2 cache are both power gated, with their state saved off in the lower power on-die SRAM. Total power consumption of the processor island in C6 is effectively zero. This isn't anything new; Intel has implemented similar technologies in desktops since 2008 (Nehalem) and notebooks since 2010 (Arrandale).

When the CPU is actually awake and doing something, however, it has a range of available frequencies: 100MHz all the way up to 1.6GHz, in 100MHz increments.

The 1.6GHz state is a burst state and shouldn't be sustained for long periods of time, similar to how Turbo Boost works on Sandy Bridge desktop/notebook CPUs. The default maximum clock speed is 1.3GHz, although just as is the case with Turbo enabled desktop chips, you can expect to see frequencies greater than 1.3GHz on a fairly regular basis.
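The resulting frequency ladder can be enumerated in a few lines. The 100MHz step size and the 1.3GHz/1.6GHz limits come from the article; the data structure itself is just a hypothetical way to represent the P-states:

```python
# Sketch of Medfield's CPU frequency ladder as described above: 100MHz steps
# from 100MHz to 1.6GHz, with everything above the 1.3GHz default maximum
# treated as a short-duration burst state. The representation is hypothetical.

DEFAULT_MAX_MHZ = 1300
BURST_MAX_MHZ = 1600

p_states = [
    {"mhz": mhz, "burst": mhz > DEFAULT_MAX_MHZ}
    for mhz in range(100, BURST_MAX_MHZ + 1, 100)
]

print(len(p_states))                               # 16 states
print([s["mhz"] for s in p_states if s["burst"]])  # [1400, 1500, 1600]
```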

Power consumption along the curve is all very reasonable:

Medfield CPU Frequency vs. Power

Frequency               100MHz   600MHz   1.3GHz   1.6GHz
SoC Power Consumption   ~50mW    ~175mW   ~500mW   ~750mW
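A quick look at how those approximate figures scale with frequency (the ~mW values are the article's estimates; the rest is just arithmetic):

```python
# How the quoted SoC power figures scale with frequency. The approximate mW
# values come from the table above; everything else is simple arithmetic.

power_mw = {100: 50, 600: 175, 1300: 500, 1600: 750}  # MHz -> approx mW

for mhz, mw in power_mw.items():
    print(f"{mhz:>4} MHz: {mw:>3} mW  ({mw / mhz:.2f} mW per MHz)")

# Power grows faster than frequency at the top of the curve: the step from
# 1.3GHz to 1.6GHz (+23% clock) costs +50% power, consistent with the burst
# state being reserved for short periods of time.
```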

Since most ARM based SoCs draw somewhere below 1W under full load, these numbers seem to put Medfield in line with its ARM competitors - at least on the CPU side.

It's important to note that we're dealing with clock frequencies similar to what other Cortex A9 vendors are currently shipping. Any performance advantage will come from Medfield bursting up to 1.6GHz for short periods of time, inherently higher IPC, and/or a superior cache/memory interface.


164 Comments


  • hechacker1 - Tuesday, January 10, 2012 - link

    What's impressive to me is the fact that you have an Atom, which powers generations of netbooks, running as a SoC using only milliwatts of power most of the time.

    I'd love to see a tablet/netbook version with a huge battery that could run for the better part of a day.

    It would even do really well as a media server/HTPC if only it had I/O bandwidth for hard disks and HDMI outputs with surround sound.
  • aNYthing24 - Tuesday, January 10, 2012 - link

    Not really all that impressed. By the time it's out, you'll start seeing phones with ARM Cortex-A15 based SoCs, which are about 40-50% faster than Cortex-A9 based SoCs.
  • MonkeyPaw - Tuesday, January 10, 2012 - link

    Yeah, I'm still on the naysayer's side, too. The SOC market isn't as slow to develop as x86. We've seen rapid improvements over the course of the last 4 years. Where the next generation of x86 may get you 15% more performance (if you can even utilize it all), SOCs are gaining in relative leaps and bounds, and the improved performance is quite noticeable. I went from an iPhone 3G to an iPhone 4, and every aspect of performance was clearly better. Everything loads faster, you can "multitask" and take better pictures and HD video, and the battery life is still better. That was in 3 years' time. 3 years from now looks to be just as positive. That means Intel has the tougher road. It took them 5 long years to get Atom DOWN to sub-1W TDP and into a phone, while ARM SOCs have always worked inside this power envelope.

    That said, I sure as heck hope this doesn't foul up Android Market any further. I've said it before, app compatibility is already not a given due to OS version fragmentation. Adding x86 will just require more effort on the developer's part, which might doom Intel's early chances anyway.

    Rest assured, Intel will dump billions into this and not go away. They can't afford to miss out on this market. I wonder when AMD will figure that one out...
  • yvizel - Wednesday, January 11, 2012 - link

    Clearly shows how little you know... WOW.
  • Griswold - Thursday, January 12, 2012 - link

    He may know little, whereas you know NOTHING.

    Back under your rock, peanut.
  • stadisticado - Wednesday, January 11, 2012 - link

    I gotta say...I really disagree with your assessment. ARM SOCs have indeed improved a lot over the past few gens. However, that is largely due to moving to new nodes, going from single to double to quad core uarch and finally in moving up the ladder on ARM architectures. Now, Intel is jumping in here, let's assess along these vectors:

    I don't think anyone is arguing against Intel having process leadership - perhaps not in SOC specifically, but at least their 32nm node is competing directly with TSMC's 28nm.

    As for the core wars, I predict Intel goes the route they go in laptops: dual core with HT is the right balance of perf/power for most applications. That said, I guarantee we see quad cores with HT for tablets in 2013 on 22nm.

    The uarch question is yet to be answered. Ultimately we'll have to see how A15 stacks up against this single core Saltwell and the dual core that's coming in the Clovertrail platform.
  • name99 - Wednesday, January 11, 2012 - link

    What Intel naysayers (at least the ones that aren't twelve years old) say is that x86 complexity puts SEVERE constraints on how fast Intel can spin these chips around. It took something like 7 years for Nehalem to move from initial design to fully validated and ready to ship.

    Now, in the "traditional" x86 space, this has not been that much of a problem for Intel for a while because they run so much design in parallel, so that each team has its new CPU ready each year. Even so it does occasionally cause problems when Intel misjudges where things will be in the future and misses a feature for a few years. The most obvious example of this was x86-64, but we saw it again regarding moving the memory controller on-die, and regarding how powerful users expect on-chip GPUs to be.

    So, when it comes to Atom:
    (a) do they have multiple design teams in place, so that we can now expect a steady rate of serious improvement every year. Or is this the fruit of their one major design team, and we can expect it to stand still, apart from minor tweaks, for the next three years or so?
    (b) are they willing to modify the SOC as requested by different vendors, to improve power or reduce size? What if I want a custom flash controller on the SOC or a better than average camera processor? Maybe this doesn't matter --- but maybe it does. I think honestly none of us know. What we DO know is that the phone market is different from the old-school PC market. Unlike old school PCs, size (fewer parts) and power (better integrated parts) matter a huge amount, which gives an advantage to CPU vendors who are willing to tailor their SOCs to include anything extra the manufacturer wants, while tossing anything they don't want.

    Finally: "ARM SOCs have indeed improved a lot over the past few gens. However, that is largely due to moving to new nodes, going from single to double to quad core uarch and finally in moving up the ladder on ARM architectures. Now, Intel is jumping in here, let's assess along these vectors:"
    This process (for ARM) has not stopped. ARM-64 is a COMPLETELY new architecture, optimized for power and performance based on everything ARM has learned over the past few years, and tossing various components of the architecture that no longer make sense. ARM-64 devices plan (as far as I know) to handle compatibility, during the brief period it's necessary, with a separate ARM-32 core on die to run ARM-32 binaries --- no crippling of either the ARM-64 architecture or the core design in order to keep 32-bit binaries working.

    The point is that ARM is both improving their underlying architecture aggressively, AND that they are doing everything they can to ensure that they can remain nimble and easily able to spin out new cores with new ideas as the market evolves. Compare with Intel and their 7 year design cycles...
  • stadisticado - Wednesday, January 11, 2012 - link

    On the 7 year design cycle: How long ago did ARM Holdings start R&D on the A15? What's the cycle time for that product to go from scratch to inside a product? That 7 year cycle you're quoting is literally from the first time the name showed up in a design document to ship, which I don't feel is a good comparative metric, especially for SoCs, which share multiple common blocks across gens.

    Does Intel have more design teams on Atom? No idea - but looking at the roadmap they have Atom on they basically have to. One team is not capable of spinning Medfield and pushing the 22nm Atom out the door next year and then 14nm the year after.

    Basically you're stipulating that ARM is inherently faster than Intel in this space. It's yet to be demonstrated, but I'm stipulating that's going to turn out to be a bad assumption.
  • PubFiction - Thursday, January 12, 2012 - link

    Intel did not have a good dedication to Atom, but they are now getting there. So ARM obviously had the advantage before. Intel considered Atom second-rate stuff to put on old nodes.

    Also, while the design of a full desktop CPU is long from start to finish, I do not think it has anything to do with x86. It is more about just competing at the highest end. Itanium was no walk in the park. And so far no other architecture has beaten x86. At the very least it still takes anyone else a long time to design chips.

    The reason ARM moves faster is their chips are so much more simple. And atom is moving in that direction.

    Intel is a scary competitor for anyone in any processor space. Once they dedicate to something it will get competitive. That does not mean they will win, but if these chips have this performance and actually do ship this year, they may not beat the A15 but they will be good enough to stick in phones. And that is the start Intel needs.
  • zeo - Saturday, January 14, 2012 - link

    Yes, Intel had the ATOM on a slow 5 year product cycle, but now they are switching it to a 2 year cycle, for a rate of advancement similar to Intel's higher end chip offerings.

    The 22nm Silvermont update coming out in 2013 will be when the switch is official and we can start expecting the usual year-by-year tick-tock of advancement.

    For example, Silvermont is more than just another die shrink; it's also the first major architectural change for the ATOM, adding Out of Order Processing as well as new technology like Intel's Tri-Gate Transistors.

    Intel ATOMs also will be supporting a range of 1 to 8 cores. The 8 core version presently being focused for the server market, but it's likely Intel will leverage multiple cores for most of their intended product lines.

    Cedar Trail, for example, is dual core for both the high end and low end chip versions.

    Meanwhile, Clover Trail and Medfield show Intel is making progress on SoCs and improving power efficiency enough to at least start competing in the mobile market.

    Though we still have to see if Intel can be price competitive, and Intel will likely need to offer even better performance than ARM's next gen offerings to really expand into the mobile market. So Intel has a lot riding on coming through in 2013.
