ARM Cortex A9: What I'm Excited About

NVIDIA won't talk about Tegra GPU architecture, but ARM is more than willing to talk about the Cortex A9.

I'm not used to seeing so much pipeline variance between microprocessor cores. The ARM11 core was introduced in 2003 and featured a single-issue 8-stage integer pipeline. Floating point was optional. The Cortex A8 was announced in 2005 and doubled the issue width: it has a dual-issue in-order 13-stage integer pipeline. Doubling issue width increased IPC (instructions per clock) and the deeper pipeline gave it frequency headroom.

The Cortex A9 goes back down to an 8-stage pipeline. It's still a dual-issue pipeline, but instructions can execute out of order. What's even more ridiculous are the frequencies you can get out of this core. TI is going to be shipping a 750MHz and 1GHz SoC based on the Cortex A9. NVIDIA's Tegra 2 will run at up to 1GHz. And even ARM is willing to supply Cortex A9 designs that can run at up to 2GHz on TSMC's 40nm process. Privately I've heard that designs scaling beyond 2GHz, especially at 28nm, are going to be possible.

This is huge for two reasons. Cortex A9 has a shallower pipeline than A8, so it does more per clock. It also has an out-of-order execution engine, which lets it do even more per clock. At the same clock speed, A9 should destroy A8. ARM estimates that the A8 can do up to 2 DMIPS per MHz (or 2000 DMIPS at 1GHz), whereas the A9 can do 2.5 DMIPS per MHz (2500 DMIPS at 1GHz). Given that most A8 implementations have been at or below 600MHz (1200 DMIPS), and TI's A9s are running at 750MHz or 1GHz (1875 DMIPS or 2500 DMIPS), I'd expect anywhere from a 30 - 100% performance improvement over existing Cortex A8 designs.
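
The arithmetic above can be checked with a short sketch. The DMIPS-per-MHz figures and clock speeds come from ARM's estimates quoted in the text; the function name and the choice of a 600MHz A8 as the baseline are assumptions made here for illustration.

```python
# Peak Dhrystone throughput per MHz, per ARM's estimates quoted above.
DMIPS_PER_MHZ = {"A8": 2.0, "A9": 2.5}


def peak_dmips(core: str, mhz: float) -> float:
    """Peak DMIPS for a single core at the given clock speed."""
    return DMIPS_PER_MHZ[core] * mhz


# Baseline: an A8 at 600MHz, the upper end of most shipping A8 designs.
baseline = peak_dmips("A8", 600)

for mhz in (750, 1000):
    a9 = peak_dmips("A9", mhz)
    gain_pct = (a9 / baseline - 1) * 100
    print(f"A9 @ {mhz}MHz: {a9:.0f} DMIPS, {gain_pct:.0f}% over a 600MHz A8")
```

On peak numbers alone the gap works out to roughly 56% and 108%, a little above the 30 - 100% range. That fits reading the range as a real-world expectation rather than a peak ratio, since actual workloads rarely sustain peak Dhrystone throughput.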

That's just for a single core though. At 40nm there's enough room to cram two of these out of order cores on a single SoC. That's what NVIDIA's doing at first with Tegra 2. Two cores together running multithreaded code and now you're looking at multiples of Cortex A8 performance. I'm talking iPhone to 3GS levels of performance improvement. And then some.
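
How much the second core helps depends on how much of the workload actually runs in parallel. Here is a rough sketch using Amdahl's law, a standard scaling model brought in for illustration rather than anything ARM or NVIDIA has published. The parallel fractions are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup when only part of the work scales with core count."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Single-core gain of a 1GHz A9 (2500 DMIPS) over a 600MHz A8 (1200 DMIPS).
single_core_gain = 2500 / 1200

for p in (0.0, 0.5, 0.9):  # hypothetical parallel fractions of the workload
    total = single_core_gain * amdahl_speedup(p, cores=2)
    print(f"parallel fraction {p:.1f}: ~{total:.2f}x a 600MHz A8")
```

Fully serial code sees only the single-core gain. The more of the work that can be threaded, the closer a dual-core A9 gets to twice that figure.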

The shallower pipeline is very important for keeping power consumption low. Mispredicted branches have a much lower performance and power impact on shallow pipelines than they do on deep ones.
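
A back-of-the-envelope model shows why depth matters. It assumes the penalty for a mispredicted branch is roughly one full pipeline refill, which is a simplification. The branch frequency and mispredict rate below are illustrative guesses, not measured figures for either core.

```python
def cycles_lost_per_instruction(pipeline_depth: int,
                                branch_fraction: float,
                                mispredict_rate: float) -> float:
    """Average cycles wasted per instruction refilling the pipeline after mispredicts."""
    # Assumption: a mispredict costs about as many cycles as the pipeline has stages.
    flush_penalty = pipeline_depth
    return branch_fraction * mispredict_rate * flush_penalty


# Hypothetical workload: 20% of instructions are branches, 5% of those mispredict.
BRANCH_FRACTION = 0.20
MISPREDICT_RATE = 0.05

for name, depth in (("Cortex A8 (13-stage)", 13), ("Cortex A9 (8-stage)", 8)):
    lost = cycles_lost_per_instruction(depth, BRANCH_FRACTION, MISPREDICT_RATE)
    print(f"{name}: {lost:.3f} cycles lost per instruction")
```

Under these assumptions the 8-stage design wastes a bit under two thirds of the cycles the 13-stage design does on the same branch behavior. Flushed cycles burn power without retiring any work, so to a first approximation the same ratio applies to wasted energy.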

Each core in a Cortex A9 MPCore has its own private L1 instruction and data caches. I'd expect these to be 32KB in size (each) just as they are today on the A8s. The L2 cache is shared by all cores on the SoC. A shared L2 makes sense, especially with a dual-core design. The architecture can scale up to 8MB of L2, but that seems a bit excessive. I'd expect L2 sizes to stay at around 256KB or 512KB. The L2 can run at the CPU's clock speed or, for extremely high clocked versions of the A9, at a divider of it.

What we're seeing is a repetition of the sort of evolution we had in the desktop microprocessor, just on a much smaller scale. The Pentium processor was Intel's last high end in-order chip. The Pentium Pro brought out of order execution into the mix. ARM took that same evolutionary step going from the Cortex A8 to A9.

The world is very different today than it was when the Pentium Pro first came out. Multithreaded code is far more commonplace and thus we see that ARM's first out-of-order processor is also multi-core capable. Technically ARM11 could be used in multi-core environments; it just wasn't (at least not commonly). Even NVIDIA's Tegra 1 used the ARM11 MPCore processor, but only put one core on its SoC. Cortex A9 will change all of that. The first implementations announced by TI as well as NVIDIA are dual-core designs. The next stage in smartphone evolution is enabling usable multitasking through interfaces like what we saw on the Palm Pre. In order to enable good performance in smartphone multitasking you'll need multiple cores.

There is of course a single core version of the Cortex A9. ARM suggests that the single core A9 is a great upgrade path for ARM11 designs. You get full backwards compatibility on code, an extremely small core (most ARM11 designs were 130nm, at 40nm a single A9 core is very space efficient) and much higher performance.

NEON Optional

With the Cortex A8 ARM introduced its own vector FP instruction set called NEON (think of it like ARM's SSE). A8 processors included a NEON core, but with Cortex A9 partners can either choose to use an ARM FPU or NEON. The FPU based Cortex A9s will most likely be single core implementations designed to be ARM11 replacements. The FPU will be smaller to implement than a full NEON unit and thus save cost/power.

55 Comments

  • T2k - Tuesday, January 12, 2010 - link

    http://www.slashgear.com/imagination-technologies-...

    Nvidia has nothing against Imagination's new PowerVR chip, period.

    Anand is licking the wrong @ss again.
  • bnolsen - Monday, January 11, 2010 - link

    That part is bothersome...the core general purpose cpu being only 10% of the transistors in the package. Makes me wonder if there isn't some better way to design cpus and socs in general.
  • techadd - Monday, January 11, 2010 - link

    Most of the job is now done on specialized processors. Get used to it. The general purpose CPUs are going to matter less and less. They are slow for hard tasks and will be giving way to special gear like video and graphics processors.
  • jconan - Saturday, January 09, 2010 - link

    is the TEGRA2 CUDA compliant as others have mentioned?
  • techadd - Sunday, January 10, 2010 - link

    I doubt it. That would draw more power. It's good as it is, but I have hopes for future Tegras
  • Mike1111 - Saturday, January 09, 2010 - link

    Anand, Imagination has their own dedicated HD video decode (VXD) and encode (VXE) processors, just like Nvidia. They offer comparable features (1080p h.264 high profile decode and encode) in a low power envelope. This has nothing to do with the GPU (SGX vs. Nvidia's 2D/3D graphics processor).
    VXD390: http://www.imgtec.com/news/Release/index.asp?NewsI...
    VXE380: http://www.imgtec.com/news/Release/index.asp?NewsI...

    Plus the iPhone3GS officially supports not only 480p but (720x)576p anamorphic (PAL DVD resolution) with high bitrates (if you go too high you just have to manually restrict your encoder to h.264 level 3.0 or iTunes won't transfer the file). Unofficially the iPhone 3GS supports even 1080p, you just have to know which h.264 options to tweak and how to transfer the file. So the problem with 1080p decode is Apple, not the Samsung SoC. Of course that's nothing compared to the announced Tegra2 SKU, but that's no surprise since it's newer and aimed at tablets/smartbooks etc.
  • thebeastie - Friday, January 08, 2010 - link

    Good article this one, why? Because I had no idea Nvidia were working on a good SoC technology, I simply ignored just about ANYTHING with the word Tegra on it, thinking it was just some power sucking first gut shot thing created by nvidia as a side show.

    I was so ultra wrong! This looks truly impressive.
  • vol7ron - Thursday, January 07, 2010 - link

    Anand,

    You certainly hyped the A9 up, maybe a little too much. I agree with you and everything, but the repetition of the Cortex A9 support kind of made me a little sick. (please read on)

    Personally, I'm happy if there are any improvements, but this still isn't where it should be. What I would like to know, though, is if you plan on doing any performance testing on phone devices in the future?

    I believe smartphones/PDAs/pocket pcs - whatever you want to call them - are reaching that last step of maturity and have enough features and variance that they are worthy of testing.

    I even started thinking, "should I pay to upgrade my phone?" I have 1 1/2 years left on my contract! Had this been one of my previous, non-touch devices, I would have gladly saved money and waited 'til even after my contract expired. But now, I started thinking that I'm using my phone a lot more than my desktop - the $/time-used would say it'd be a better buy.

    Please start doing some in-depth analysis and, if you can, please push the phone manufacturers to include pico-projectors / good external speakers. I for one use my phone to watch my workout videos, it'd be nice just to set it down or let others view things at the same time.

    vol7ron
  • QChronoD - Thursday, January 07, 2010 - link

    Dear Santa,
    I plan to be very good this year, so please start your elves working on a new phone running Android on a Tegra2 with a 4.5" OLED screen.
