More Detail on ARM11 vs. Cortex A8

We’ve gone through the basic architectural details of the ARM11 and Cortex A8 cores, and across the board the A8 is far ahead. It gets even better for the new design once we drill a little deeper.

The L1 cache in the A8 gets a significant improvement. The ARM11 core had a 2 cycle L1 cache, while the A8 has a single cycle L1. In-order cores depend heavily on fast memory access, so an even faster L1 will have a dramatic impact on performance.
ARM11 actually supported a L2 cache but it was rarely used; the Cortex A8 is designed with a tightly coupled L2 cache varying in size. Vendors can choose from cache sizes as small as 128KB all the way up to 1MB, with a minimal access latency of 8 cycles. The L2 access time is programmable, with slower access more desirable to save power.

The caches also include way prediction to minimize the number of cache ways active when doing a cache access, this sort of cache level power management was also used by Intel back on the first Pentium M processors and is still used today in modern x86 processors.

The ARM11 core supported a 64-bit bus that connected it to the rest of the SoC; Cortex A8 allows for either a 64-bit or 128-bit bus. It’s unclear what vendors like Samsung and T.I. have implemented on their A8 based SoCs.


The S is for speed. Powered by the ARM Cortex A8.

With a deeper pipeline, the Cortex A8 also has a much more sophisticated branch prediction unit. While the ARM11 core had a 88% accurate branch predictor, the Cortex A8 can correctly predict branches over 95% of the time. If you care about stats, the A8 has a 512 entry branch target buffer and a 4K entry global history buffer. The accuracy of the branch predictor in the Cortex A8 is actually as high as what AMD claimed with its first Athlon processor, and this is an in-order core in a smartphone. With a 13-stage pipe however, a very accurate predictor was necessary.

While ARM11 supported some rudimentary SIMDfp instructions, Cortex A8 adds a full SIMDfp instruction set with NEON. ARM expects a greater than 2x improvement on media processing applications thanks to the A8’s NEON instructions - of course you’ll need to compile directly for NEON in order to see those gains. If you’re looking for a modern day relation, NEON is like the A8’s SSE whereas ARM11 basically had a sophisticated MMX equivalent. Both are very important.

The Cortex A8 is a more power hungry core than the ARM11, but the design also has much more extensive clock gating (turning off the clock to idle parts of the chip) than the ARM11. Since the A8 is newer it’s also going to be manufactured on a smaller manufacturing process. The bulk of ARM11 based SoCs used 90nm transistors, while A8 based SoCs are shipping at 65nm. ARM11 has started to transition down to 65nm, while A8 will move down to 45nm.

At the same clock speed and with the same L2 cache sizes, ARM shows the Cortex A8 as being able to execute 40% more instructions per second than the ARM11. That’s a generational performance improvement, something that can’t be delivered by clock speed alone, but the comparison is conservative. Cortex A8 designs won’t ship at the same clock speed and cache configurations as ARM11 chips; as far as I can tell, none of the major ARM11 based smartphones even had a L2 cache while Cortex A8 designs are expected to have one.

Furthermore, the ARM11 based smartphones were much lower in the frequency curve than the early A8 platforms. While a 40% improvement in instruction throughput is reasonable at the same specs, I would expect far larger real world performance improvements from a Cortex A8 based SoC compared to a ARM11 SoC.

Overall the Cortex A8 is much more like a modern day microprocessor. It’s still an in-order core, but it adds superscalar execution, a deeper pipeline, larger caches and a broader instruction set among other things. For any current high end smartphone there doesn’t seem to be a reason to choose the ARM11 over it, companies that insist on using ARM11 based designs even in 2009 are either not agile enough to implement a better chip in a quick manner or have no concern for performance and are more focused on cost savings. Neither option is a particularly good one and it is telling that the two manufacturers who seem to have gotten how to properly design a smartphone, Apple and Palm, have both opted to go with a Cortex A8 before most of the more established players.

A Call to Action

This leads me to a further point: we need more transparency in specs from smartphone manufacturers. The mobile phone market is all too shielded from the performance metrics and accountability that we’ve had in the PC space. When Intel was shipping Pentium 4s that performed slower than the Pentium IIIs they were replacing, we called them out on it. To this day, Apple refuses to talk about the processor in the iPhone 3GS. We get to hear all about what’s in the Nehalem Mac Pro, but the hardware behind the 3GS is off limits - despite the fact that it’s very good. This policy of not delivering specifics and a general unwillingness to talk about specs is absurd at best. It doesn’t take much more than a teardown and some homebrew code to figure out what CPU at what frequency is in any modern day smartphone; manufacturers should show pride in their hardware, or refrain from putting something inside a phone that’s they can’t be proud of.

What we need are cache sizes, clock speeds, full architecture disclosures. They don’t have to be on the phone’s marketing materials but make them accessible and at least some of the focus. These SoCs are so incredibly cool, they pack more power than the desktops of 10 years ago into a single chip smaller than my thumbnail - boast about them! Palm had a tremendous leg up on the competition with its OMAP 3430 processor, yet there was hardly any attention paid to it by Palm. I get that the vast majority of consumers don’t get, but those who do, would help tremendously if given access to this information. It’s something to get excited about.

And if the manufacturers won’t devote time and energy to this stuff, then I will.

Putting it in Perspective The CPU and its Performance
Comments Locked

60 Comments

View All Comments

  • iwodo - Thursday, July 9, 2009 - link

    Would really love Anand digg deeper and give us some more info. The info i could find for Atom, has 47 Million transistors. Ars report 40% of it is cache, while others report the core is 13.7 million. The previous iPhone article Jarred Walton commented that x86 decoder no longer matters because 1.5 - 2 million transistors inside a billions transistor CPU is negligible. However in Mobile space, 2M inside a 13.7M is nearly 15%. Not to mention other transistor used that is needed for this decoding.

    The space required for Atom is 25mm2 on a 45NM ( Including All Cache) . Cortex A8 require 9mm2 ( dont know how many cache ) on 65nm.

    What is interesting is how Intel manage to squeeze the north bridge inside the Atom CPU ( more transistors ) while making the Die Smaller. ( i dont know if Intel slides were referring to the total package size or the die size itself ).
  • snookie - Thursday, July 9, 2009 - link

    The Pre hardware, as in case, screen, keyboard is terrible. Cheap, plasticky and breaking left and right on people. If Palm survives long enough to get to Verizon etc Here's hoping they come out with better hardware soon. I've used Blackberries for year but I see no need for a physical keyboard. With the new iPhone widescreen keyboard I type with both thumbs very quickly and I have big hands.
  • snookie - Thursday, July 9, 2009 - link

    Jason, Apple has in fact agreed to using mini-usb as a standard. As if that is really a reason to buy a phone or not.

    To say Apple never changes shows no knowledge of the history of Apple, even their recent history.
  • Itaintrite - Wednesday, July 8, 2009 - link

    Heh, it's funny how you say that you can't just look at clock speed, then followed with "the 528MHz processor in the iPod Touch is no where near as fast as the 600MHz processor in the iPhone 3GS." Heh.
  • Anonymous Freak - Wednesday, July 8, 2009 - link

    I want my punch and pie!

    Or a lollipop.

    Good review, I could feel your hunger pangs toward both Palm and Apple toward the end...
  • monomer - Wednesday, July 8, 2009 - link

    Regarding Anand's comments about Android phones needing an upgraded CPU, rumors are that the upcoming Sony Xperia Rachael will be sporting a 1GHz Qualcomm Snapdragon processor (ARM Cortex A8 derivative). Would love to find out the details of this phone when they become available.

    http://www.engadget.com/2009/07/04/sony-ericsson-r...">http://www.engadget.com/2009/07/04/sony...chael-an...
  • Affectionate-Bed-980 - Thursday, July 9, 2009 - link

    Well I'd like to see Anand's experience with Android phones. What is it, just G1? Look at the new Hero or even G2. What about the Samsung i7500? Sorry I'm afraid that the limited nature of cell phone selection in the US makes it VERY HARD to review cell phones well here. I haven't seen a good cell phone site that's by people in the US and from the US only. Phone Arena, Mobile Burn, Phonescoop, GSM Arena, It seems the international guys get a LOT more exposure, and this is why I feel like Anand's comments about phones in general makes him sound inexperienced which I can certainly bet is the case.

    If you limit yourself to only carrier offered phones, then I don't think you can make accurate assessments about manufacturers like Nokia or certain OS phones like WinMo or Symbian or even Android unless the US starts offering more of what the world considers top notch popular phones.
  • Affectionate-Bed-980 - Wednesday, July 8, 2009 - link

    N97 specs should be 434 MHz ARM11 not 424...
  • Affectionate-Bed-980 - Wednesday, July 8, 2009 - link

    BTW I don't believe you should be commenting about the N97. Gizmodo is heavily biased towards iPhones and unless you yourself Anand uses some Symbian S60 phones with detail, I don't really think you should join in the S60 bashing. I think a lot of us Symbian users AGREE that the platform needs to improve, but considering we were like ZOMG434MHZFAIL, the N97 is not bad in response time if you look at a few videos. The UI exceeded a lot of expectations amongst the Symbian crowd. If anything why don't you throw the Samsung i8910 Omnia HD in there instead? That has a Cortex A8 and uses Symbian S60v5 (not to mention has been out longer than the N97). The other S60v5 phone to come out is the Sony Satio which also uses a Cortex A8.

    You might as well comment on why Cortex A8 isn't being implemented in all new phones. WinMo phones are still on ARM11, and even HTC's newest announcements are ARMv7 chips. The iPhone doesn't define what high end is. Because if you want to point out that its unacceptable to have a ARMv7 chip in an N97, then it's just as unacceptable for the iPhone not to have a 5MP camera and multitasking.
  • straubs - Wednesday, July 8, 2009 - link

    1. It IS unacceptable for any flagship phone to use ARM11. The iPhone, Pre, and Omnia HD (as you pointed out) all use it, so why wouldn't Nokia put it in it's $700 N-series flagship? It doesn't make sense. I'm surprised he didn't mention the crappy screen on the N97.

    2. He did comment on how the iPhone needs multi-tasking and how much he missed the Pre's implementation of it.

    3. Doesn't everyone at this point agree that the number of megapixels in a phone camera is not a huge deal, considering the size of the sensor and optics? I would guess the N97 pictures are better than those from the the 3GS, but nothing like the jump from an ARM11 or A8.

Log in

Don't have an account? Sign up now