More Detail on ARM11 vs. Cortex A8

We’ve gone through the basic architectural details of the ARM11 and Cortex A8 cores, and across the board the A8 is far ahead. It gets even better for the new design once we drill a little deeper.

The L1 cache in the A8 gets a significant improvement. The ARM11 core had a 2 cycle L1 cache, while the A8 has a single cycle L1. In-order cores depend heavily on fast memory access, so an even faster L1 will have a dramatic impact on performance.
ARM11 actually supported a L2 cache but it was rarely used; the Cortex A8 is designed with a tightly coupled L2 cache varying in size. Vendors can choose from cache sizes as small as 128KB all the way up to 1MB, with a minimal access latency of 8 cycles. The L2 access time is programmable, with slower access more desirable to save power.

The caches also include way prediction to minimize the number of cache ways active when doing a cache access, this sort of cache level power management was also used by Intel back on the first Pentium M processors and is still used today in modern x86 processors.

The ARM11 core supported a 64-bit bus that connected it to the rest of the SoC; Cortex A8 allows for either a 64-bit or 128-bit bus. It’s unclear what vendors like Samsung and T.I. have implemented on their A8 based SoCs.


The S is for speed. Powered by the ARM Cortex A8.

With a deeper pipeline, the Cortex A8 also has a much more sophisticated branch prediction unit. While the ARM11 core had a 88% accurate branch predictor, the Cortex A8 can correctly predict branches over 95% of the time. If you care about stats, the A8 has a 512 entry branch target buffer and a 4K entry global history buffer. The accuracy of the branch predictor in the Cortex A8 is actually as high as what AMD claimed with its first Athlon processor, and this is an in-order core in a smartphone. With a 13-stage pipe however, a very accurate predictor was necessary.

While ARM11 supported some rudimentary SIMDfp instructions, Cortex A8 adds a full SIMDfp instruction set with NEON. ARM expects a greater than 2x improvement on media processing applications thanks to the A8’s NEON instructions - of course you’ll need to compile directly for NEON in order to see those gains. If you’re looking for a modern day relation, NEON is like the A8’s SSE whereas ARM11 basically had a sophisticated MMX equivalent. Both are very important.

The Cortex A8 is a more power hungry core than the ARM11, but the design also has much more extensive clock gating (turning off the clock to idle parts of the chip) than the ARM11. Since the A8 is newer it’s also going to be manufactured on a smaller manufacturing process. The bulk of ARM11 based SoCs used 90nm transistors, while A8 based SoCs are shipping at 65nm. ARM11 has started to transition down to 65nm, while A8 will move down to 45nm.

At the same clock speed and with the same L2 cache sizes, ARM shows the Cortex A8 as being able to execute 40% more instructions per second than the ARM11. That’s a generational performance improvement, something that can’t be delivered by clock speed alone, but the comparison is conservative. Cortex A8 designs won’t ship at the same clock speed and cache configurations as ARM11 chips; as far as I can tell, none of the major ARM11 based smartphones even had a L2 cache while Cortex A8 designs are expected to have one.

Furthermore, the ARM11 based smartphones were much lower in the frequency curve than the early A8 platforms. While a 40% improvement in instruction throughput is reasonable at the same specs, I would expect far larger real world performance improvements from a Cortex A8 based SoC compared to a ARM11 SoC.

Overall the Cortex A8 is much more like a modern day microprocessor. It’s still an in-order core, but it adds superscalar execution, a deeper pipeline, larger caches and a broader instruction set among other things. For any current high end smartphone there doesn’t seem to be a reason to choose the ARM11 over it, companies that insist on using ARM11 based designs even in 2009 are either not agile enough to implement a better chip in a quick manner or have no concern for performance and are more focused on cost savings. Neither option is a particularly good one and it is telling that the two manufacturers who seem to have gotten how to properly design a smartphone, Apple and Palm, have both opted to go with a Cortex A8 before most of the more established players.

A Call to Action

This leads me to a further point: we need more transparency in specs from smartphone manufacturers. The mobile phone market is all too shielded from the performance metrics and accountability that we’ve had in the PC space. When Intel was shipping Pentium 4s that performed slower than the Pentium IIIs they were replacing, we called them out on it. To this day, Apple refuses to talk about the processor in the iPhone 3GS. We get to hear all about what’s in the Nehalem Mac Pro, but the hardware behind the 3GS is off limits - despite the fact that it’s very good. This policy of not delivering specifics and a general unwillingness to talk about specs is absurd at best. It doesn’t take much more than a teardown and some homebrew code to figure out what CPU at what frequency is in any modern day smartphone; manufacturers should show pride in their hardware, or refrain from putting something inside a phone that’s they can’t be proud of.

What we need are cache sizes, clock speeds, full architecture disclosures. They don’t have to be on the phone’s marketing materials but make them accessible and at least some of the focus. These SoCs are so incredibly cool, they pack more power than the desktops of 10 years ago into a single chip smaller than my thumbnail - boast about them! Palm had a tremendous leg up on the competition with its OMAP 3430 processor, yet there was hardly any attention paid to it by Palm. I get that the vast majority of consumers don’t get, but those who do, would help tremendously if given access to this information. It’s something to get excited about.

And if the manufacturers won’t devote time and energy to this stuff, then I will.

Putting it in Perspective The CPU and its Performance
Comments Locked

60 Comments

View All Comments

  • MrJim - Wednesday, July 8, 2009 - link

    Why no mention of the heat issues?
  • ViRGE - Wednesday, July 8, 2009 - link

    Anand, if you haven't already, jailbreak the 3GS and grab SysInfoPlus from Cydia. It may be able to tell you the clock speed of the 3GS's ARM, although to what extent I'm not sure since it hasn't been specifically programmed for the A8.
  • ltcommanderdata - Wednesday, July 8, 2009 - link

    I don't suppose that program can also tell the GPU clock speed too?

    I always thought that the MBX work at bus speed, ie. 103MHz for the iPhone/3G and 133MHz for the 2nd Gen iPod Touch instead of the 60Mhz that Anand has speculated. Assuming the iPhone 3G S has a 150MHz bus speed, the SGX could run at 150MHz which is a reasonable compromise between Anand's 100MHz and 200MHz estimates.
  • fyleow - Wednesday, July 8, 2009 - link

    How useful is the new GPU? The iPhone's performance has come a long way from the first generation but I don't see developers taking full advantage of the jump. If you bump up the graphics of your game it might run smoothly on the 3GS but end up lagging on the 1st gen iPhone.

    The increase in load times and battery life is much welcomed, but when do we get to see some apps that take advantage of the upgraded hardware in other more interesting ways? I can see a resolution increase as being one way to do that. The game would look better on a higher resolution screen but performance wouldn't suffer on the older models because the lower resolution would place less demand on the hardware.

    2010 will be an interesting year. There should be a bigger upgrade to the iPhone, most likely a resolution bump and a significantly modified OS that supports background tasks. Apple has been keeping all the devices on the iPhone platform on feature parity so far with the OS upgrades (minus obvious limitations due to hardware differences). It would be interesting to see how they handle the switch and the resulting two classes of phones that come from it (i.e. old "legacy" iPhones/Touch vs new iPhones/Touch).
  • ltcommanderdata - Wednesday, July 8, 2009 - link

    You're right that it's difficult to take full advantage of the SGX without writing a separate dedicated code path for it and one for the MBX. However, there are simpler ways to take advantage of the iPhone 3G S power without writing 2 separate code paths. For example, you can scale draw distance based on hardware. Firemint demonstrated the iPhone 3G S accelerating 40 cars in Real Racing compared to 6 in the iPhone 3G, so the potential for better scalable AI is there. For a RPG, perhaps having more NPCs walking around to make the environment more lifelike. This can all be done using existing OpenGL ES 1.1 code playable on all iPhones/Touches, optimizing for each device, without making older iPhone users feel like they are playing some Lite version of the game as implementing shaders and HDR using OpenGL ES 2.0 in the iPhone 3G S might do.

    I believe the reluctance of Apple to change the resolution is that it could break the interface layout for existing apps and/or make things ugly if apps haven't used vector graphics. It would have been nice if they had enforced resolution independence early on, but I don't believe they did. Resolution independence is also what is needed for Apple to introduce an iPhone nano with a smaller screen and presumably smaller resolution.
  • smallpot - Wednesday, July 8, 2009 - link

    Thanks for the article Anand. Your long-form articles are the reason Anandtech is my number one tech website. I'm thinking of articles such as this, your articles on SSD performance, and the long-form story behind the RV770. After reading such articles, I really feel like I've learned something, rather than just had performance metrics thrown at me without context.
  • Baron Fel - Tuesday, July 7, 2009 - link

    Interesting article.

    As far as portable gaming goes, the Ipod Touch/iPhone/Zune HD dont have a chance against the DS or even the PSP. The software support just isnt there.

    PSP hardware runs circles around the DS, so why is the DS killing it in sales? Good games.

    and are we getting more SSD articles anytime soon? I think thats what we want to see :D
  • ltcommanderdata - Tuesday, July 7, 2009 - link

    Given all the media attention about discoloration and possible heat issues with the iPhone 3G S, I was wondering if you could comment on your experience in this area. Do you think it's a real concern or just stories popularized to generate page hits as Apple related stories tend to do? The latest reports on discoloration indicate that it might actually be from a reaction with some third-party cases that may be reversed by cleaning with alcohol.

    Similarly, there have been lower-key reports of build quality issues with the Palm Pre having a wobbly screen from it's slide-out keyboard. Has this been a major issue for you and do you think it'll be an issue over time?
  • Anand Lal Shimpi - Tuesday, July 7, 2009 - link

    I haven't seen anything to indicate heat as being a bigger concern with the 3GS. It's a new processor so there's bound to be some bad chips out there, but I wouldn't be too concerned.

    The build quality on the Pre did bother me. It's something that I think bothered me more because of my experience with the iPhone. The screen was a bit wobbly and overall the device just didn't feel as well put together. Part of it is because of the slide out keyboard, but part of it has to be cost/experience related. I think you'd get used to it over time, but if you then held an iPhone you'd quickly grow tired of the build quality issues once again :)

    Take care,
    Anand
  • tomoyo - Tuesday, July 7, 2009 - link

    Btw Anand, the chart for number of stages in the cpus shows the Iphone 3GS as 8 stage instead of 13.

Log in

Don't have an account? Sign up now