The CPU: A Dual-Core ARM Cortex A9

NVIDIA is a traditional ARM core licensee, which means it implements ARM’s own core design rather than licensing the instruction set and designing its own core around it (à la Qualcomm).

The Tegra 2 SoC has a pair of ARM Cortex A9s running at up to 1GHz. The cores are clock gated but not power gated. Clock speed is dynamic and can be adjusted at a very granular level depending on load. Both cores operate on the same power plane.

Architecturally, the Cortex A9 isn’t very different from the Cortex A8. The main change is the move from the A8’s dual-issue in-order architecture to a dual-issue out-of-order architecture in the A9.

With its OoO architecture, the A9 adds a re-order buffer and 16 more general purpose registers over the A8 for register renaming. Write-after-read (WAR) and write-after-write (WAW) hazards are conflicts over register names rather than over data, so renaming lets the Cortex A9 reorder around both.
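
To make the renaming idea concrete, here is a minimal Python sketch. It is not a model of the A9's actual hardware: the register names, the 16-entry architectural file and the unbounded pool of physical registers are all assumptions made for illustration. Every write gets a fresh physical register, so only true read-after-write dependencies remain.

```python
from itertools import count


def rename(instructions, num_arch_regs=16):
    """Rename (dest, src1, src2) instructions onto fresh physical registers.

    Toy model: the physical register pool is unbounded and nothing is ever
    freed, which real hardware cannot afford.
    """
    fresh = count(num_arch_regs)  # hypothetical physical registers p16, p17, ...
    mapping = {f"r{i}": f"p{i}" for i in range(num_arch_regs)}
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read the current mapping before the destination is remapped.
        s1, s2 = mapping[src1], mapping[src2]
        mapping[dest] = f"p{next(fresh)}"
        renamed.append((mapping[dest], s1, s2))
    return renamed


program = [
    ("r1", "r2", "r3"),  # r1 = r2 op r3
    ("r2", "r4", "r5"),  # WAR on r2 against the first instruction
    ("r1", "r6", "r7"),  # WAW on r1 against the first instruction
    ("r8", "r1", "r2"),  # true RAW dependencies on the two writes above
]

for before, after in zip(program, rename(program)):
    print(before, "->", after)
```

After renaming, the first three instructions no longer share any register, so in this toy model they could issue in any order; only the last one has to wait for its producers.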

ARM’s Cortex A9 MPCore architecture supports up to four cores behind a single shared L2 cache (up to 1MB in size). Tegra 2 implements a full 1MB shared L2 cache and two Cortex A9 cores. Each core has a 64KB L1 cache (32KB instruction + 32KB data cache).

Pipeline depth is another major change between A8 and A9. While the Cortex A8 had a 13-cycle branch mispredict penalty, the A9 shortens the pipeline to 8 cycles. The shallower pipeline improves IPC and reduces power consumption. Thanks to process technology, hitting 1GHz isn’t a problem at TSMC 40nm despite the shorter pipeline.
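
A rough way to see why a smaller mispredict penalty helps IPC is the textbook estimate: effective CPI = base CPI + branch fraction × mispredict rate × penalty. The Python sketch below plugs in the 13-cycle and 8-cycle figures from above. The base CPI, branch fraction and mispredict rate are made-up illustrative values, not measurements, and using 8 cycles as the A9's mispredict penalty is itself an assumption, since that figure is given above as the pipeline length.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Textbook estimate of cycles per instruction including mispredict stalls."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload parameters, chosen only for illustration.
BASE_CPI = 1.0
BRANCH_FRACTION = 0.20   # one instruction in five is a branch
MISPREDICT_RATE = 0.05   # 5% of branches are mispredicted

a8_like = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 13)
a9_like = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 8)

print(f"13-cycle penalty: {a8_like:.2f} CPI")
print(f" 8-cycle penalty: {a9_like:.2f} CPI")
print(f"IPC gain from the shorter penalty alone: {a8_like / a9_like - 1:.1%}")
```

With these particular numbers, the shorter penalty accounts for only part of a 20% IPC gain; the rest would have to come from elsewhere, such as out-of-order execution.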

From what I can tell, branch prediction, TLBs and execution paths haven’t changed between A8 and A9, although I’m still awaiting further details from ARM on this.

NVIDIA claims the end result is a 20% increase in IPC from A8 to A9. That’s actually a bit lower than I’d expect, but combined with the move to dual core you should see a significant increase in performance compared to current Snapdragon and A8-based devices.

If the best improvement we see on single-threaded workloads is 20%, Qualcomm’s dual-core 1.2GHz Snapdragon due out later this year could still be performance competitive.
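
The arithmetic behind that comparison is simply single-threaded performance ≈ IPC × clock speed. The Python sketch below works it through. It assumes, purely for illustration, that the Snapdragon core's IPC is about the same as the Cortex A8's, which is not established here, and it takes NVIDIA's 20% IPC claim at face value.

```python
def relative_performance(relative_ipc, clock_ghz):
    """Single-threaded performance relative to a 1GHz Cortex A8 (IPC = 1.0)."""
    return relative_ipc * clock_ghz


# Relative IPC values are assumptions: A8 is the baseline, A9 uses NVIDIA's
# claimed 20% gain, and the Snapdragon core is assumed to match the A8.
a8_1ghz = relative_performance(1.00, 1.0)
tegra2_1ghz = relative_performance(1.20, 1.0)
snapdragon_1_2ghz = relative_performance(1.00, 1.2)

print(f"Cortex A8 @ 1.0GHz:           {a8_1ghz:.2f}")
print(f"Tegra 2 (A9) @ 1.0GHz:        {tegra2_1ghz:.2f}")
print(f"Snapdragon @ 1.2GHz (assumed): {snapdragon_1_2ghz:.2f}")
```

Under those assumptions a 20% clock advantage cancels a 20% IPC advantage, which is why the two parts could end up roughly even on single-threaded code.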

While all Cortex A8 designs incorporated ARM’s SIMD engine called NEON, A9 gives you the option of integrating either a SIMD engine (ARM’s Media Processing Engine, aka NEON) or a non-vector floating point unit (VFPv3-D16). NVIDIA chose not to include the A9’s MPE and instead opted for the FPU. Unlike the A8’s FPU, the A9’s FPU is fully pipelined, so performance is much improved. The A9’s FPU, however, is still not as quick at math as the optional SIMD MPE.

Minimum Instruction Latencies (Single Precision, in cycles)

Instruction                 FADD   FSUB   FMUL   FMAC   FDIV   FSQRT
ARM Cortex A8 (FPU)         9      9      10     18     20     19
ARM Cortex A9 (FPU)         4      4      5      8      15     17
ARM Cortex A8 (NEON)        1      1      1      1      N/A    N/A
ARM Cortex A9 (MPE/NEON)    1      1      1      1      10     13

NVIDIA claims implementing MPE would incur a 30% die area penalty for a performance improvement that impacts only a minimal amount of code. It admits that at some point integrating a SIMD engine makes sense, just not yet. The table above compares instruction latencies across the floating point and SIMD engines in A8 and A9.
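
To put the FPU rows of the table in perspective, the short Python sketch below computes how much lower the A9's minimum latency is for each instruction. The numbers are copied from the table. Note that this compares latency only; it says nothing about throughput, where the A9's fully pipelined FPU should help further.

```python
# Minimum single-precision latencies in cycles, copied from the table above:
# instruction -> (Cortex A8 FPU, Cortex A9 FPU)
LATENCY_CYCLES = {
    "FADD": (9, 4),
    "FSUB": (9, 4),
    "FMUL": (10, 5),
    "FMAC": (18, 8),
    "FDIV": (20, 15),
    "FSQRT": (19, 17),
}

# Ratio of A8 latency to A9 latency; higher means a bigger improvement.
speedups = {op: a8 / a9 for op, (a8, a9) in LATENCY_CYCLES.items()}

for op, ratio in speedups.items():
    print(f"{op:<6} A9 FPU latency is {ratio:.2f}x lower than A8")
```

The common operations (add, subtract, multiply, multiply-accumulate) improve by 2x or more, while divide and square root improve much less.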

TI’s OMAP 4, on the other hand, will integrate ARM’s Cortex A9 MPE. On code that makes use of NEON, OMAP 4 could have a significant performance advantage.


21 Comments

  • tumbleweed - Wednesday, January 5, 2011 - link

    Any OMAP 4 phone design wins announced yet? I really want to see how that battle plays out.
  • bplewis24 - Wednesday, January 5, 2011 - link

    When will you guys be able to post the benchmarking numbers?

    Also, if the 2X is running a beta build of 2.2 Froyo, why is the status bar black? Are those stock/borrowed photos?

    Brandon
  • strikeback03 - Thursday, January 6, 2011 - link

    Manufacturers could skin the status bar if they wanted, the Galaxy S phones with 2.1 have a black status bar
  • snoozemode - Wednesday, January 5, 2011 - link

    Man, why did they have to do that.. After a few months the images get so bad when the plastic is all messy.
  • strikeback03 - Thursday, January 6, 2011 - link

    Because otherwise it is the front lens of the camera getting nasty?
  • TareX - Wednesday, January 5, 2011 - link

    Why did you describe the Optimus 2X as "one of the fastest smartphones we’ve encountered"??

    That's very disappointing. What phone can be faster?
  • KidneyBean - Thursday, January 6, 2011 - link

    The iPhone? Seriously, it does seem to be fast for what it can do. I'm an Android type of guy myself.
  • TareX - Thursday, January 6, 2011 - link

    I understand the iPhone 4 "appearing" to be the fastest. But they said the aforementioned phrase right after mentioning that they can't reveal benchmarking scores. So it sounds like another phone beat it in benchmarks.
  • Aloonatic - Friday, January 7, 2011 - link

    Well, even if it was/is the fastest phone that they have tested/benchmarked, they couldn't say it here as that would probably be revealing a little too much. As such they probably have to be as vague as they are, so that we are left guessing.
  • Cali3350 - Thursday, January 6, 2011 - link

    Probably the iPhone 4. When it comes to being smooth iOS is untouched at this point in time (probably because everything is GPU accelerated).
