CPU Performance

The big news with Tegra 3 is that you get four ARM Cortex A9 cores with NEON support instead of just two (sans NEON) in the case of the Tegra 2, or two in most other smartphone-class SoCs. In the short period of time I had to test the tablet, I couldn't draw many definitive conclusions, but I did come away with some observations.

Linpack showed us healthy gains over Tegra 2 thanks to full NEON support in Tegra 3:

[Chart: Linpack - Single-threaded]

[Chart: Linpack - Multi-threaded]

As expected, finding applications and usage models to task all four cores is pretty difficult. That being said, it's not hard to use the tablet in such a way that you do stress more than two cores. You won't see 100% CPU utilization across all four cores, but there will be a tangible benefit to having more than two. Whether or not the benefit is worth the cost in die area is irrelevant; it only means that NVIDIA (and/or its partners) has to pay more, since the price of the end product to you is already pretty much capped.

[Chart: SunSpider JavaScript Benchmark 0.9.1]

[Chart: Rightware BrowserMark]

The bigger benefit I saw to having four cores vs. two is that you're pretty much never CPU limited in anything you do when multitasking. Per-core performance can always go up, but I found myself bound either by the broken WiFi or by NAND speed. In fact, the only thing that would bring the Prime to a halt was doing a lot of writing to NAND over USB. Keyboard and touch interrupts were a low priority at that point, something I hope to see addressed as we finally enter the era of performance good enough to bring on some I/O-crushing multitasking workloads.

Despite having many cores at its disposal, NVIDIA appears to have erred on the side of caution when it comes to power consumption. While I often saw the third and fourth cores fire up when browsing the web or just using the tablet, NVIDIA did a good job of powering them down when their help wasn't needed. Furthermore, NVIDIA also seems to prefer running more cores at lower voltage/frequency settings rather than fewer cores at a higher point in the v/f curve. This makes sense given the non-linear relationship between voltage and power: dynamic power scales roughly with the square of voltage.
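To put rough numbers on that trade-off, here is a minimal sketch using the textbook approximation that dynamic power per core scales with C·V²·f. The operating points, the unit capacitance, and the `dynamic_power` helper are all assumptions made up for illustration, not measured Tegra 3 values, and the comparison assumes the workload spreads evenly across cores.

```python
def dynamic_power(cores, voltage, freq_ghz, capacitance=1.0):
    """Relative dynamic power in arbitrary units: cores * C * V^2 * f."""
    return cores * capacitance * voltage ** 2 * freq_ghz


# Hypothetical operating points, chosen only for illustration.
# Both deliver the same aggregate clock rate (2 x 1.2 GHz == 4 x 0.6 GHz),
# which only translates to equal throughput if the work scales across cores.
two_fast = dynamic_power(cores=2, voltage=1.2, freq_ghz=1.2)
four_slow = dynamic_power(cores=4, voltage=0.9, freq_ghz=0.6)

print(f"2 cores @ 1.2 GHz, 1.2 V: {two_fast:.3f}")
print(f"4 cores @ 0.6 GHz, 0.9 V: {four_slow:.3f}")
print(f"ratio (four_slow / two_fast): {four_slow / two_fast:.2f}")
```

With these made-up numbers, the four-core configuration delivers the same aggregate clock cycles for a bit over half the dynamic power. Leakage is ignored here, which is exactly the part a power-gated or low-leakage core is meant to address.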

From a die area perspective I'm not entirely sure having four (technically, five) A9 cores is the best way to deliver high performance, but without a new microprocessor architecture it's surely more efficient than just ratcheting up clock speed. I plan on providing a more thorough look at Tegra 3 SoC performance as I spend more time with a fixed Prime, but my initial impressions are that the CPU performance isn't really holding the platform back.


204 Comments


  • abcgum091 - Thursday, December 01, 2011

    After seeing the performance benchmarks, it's safe to say that the iPad 2 is an efficiency marvel. I don't believe I will be buying a tablet until Windows 8 is out.
  • ltcommanderdata - Thursday, December 01, 2011

    I'm guessing the browser and most other apps are not well optimized for quad cores. The question is will developers actually bother focusing on quad cores? Samsung is going with fast dual core A15 in its next Exynos. The upcoming TI OMAP 4470 is a high clock speed dual core A9 and OMAP5 seems to be high clock speed dual core A15. If everyone else standardizes on fast dual cores, Tegra 3 and its quad cores may well be a checkbox feature that doesn't see much use, putting it at a disadvantage.
  • Wiggy McShades - Thursday, December 01, 2011

    If the developer is writing something in Java (most likely native code applications too) it would be more work for them to ensure they are at most using 2 threads instead of just creating as many threads as needed. The number of threads a Java application can create and use is not limited to the number of cores on the CPU. If you created 4 threads and there are 2 cores then the 4 threads will be split between the two cores. The 2 threads per core will take turns executing, with the thread that has the highest priority getting more execution time than the other. All non-real-time operating systems are constantly pausing threads to let another run; that's how multitasking existed before we had dual core CPUs. The easiest way to write an application that takes advantage of multiple threads is to split up the application into pieces that can run independently of each other, the number of pieces being dependent on the type of application it is. Essentially if a developer is going to write a threaded application, the number of threads he will use will be determined by what the application is meant to do rather than the cores he believes will be available. The question to ask is what kind of application could realistically use more than 2 threads and can that application be used on a tablet.
  • Operaa - Monday, January 16, 2012

    Making a responsive UI today most certainly requires you to use threads, so it shouldn't be a big problem. I'd say 2 threads per application is absolutely a minimum. For example, talking about browsing the web, I would imagine it useful to handle the UI in one thread, loading the page in one, loading pictures in a third and running Flash in a fourth (or more), etc.
  • UpSpin - Thursday, December 01, 2011

    ARM introduced big.LITTLE which only makes sense in Quad or more core systems.
    NVIDIA is the only company with a Quad core right now because they integrated this big.LITTLE idea already. Without such a companion core, a quad core consumes too much power.
    So I think Samsung released an A15 dual core because it's easier and they are able to release an A15 SoC earlier. They'll work on a Quad core or six or eight core, but then they have to use the big.LITTLE idea, which probably takes a few more months of testing.
    And as we all know, time is money.
  • metafor - Thursday, December 01, 2011

    /boggle

    big.LITTLE can work with any configuration and works just as well. Even in quad-core, individual cores can be turned off. The companion core is there because even at the lowest throttled level, a full core will still produce a lot of leakage current. A core made with lower-leakage (but slower) transistors can solve this.

    Also, big.LITTLE involves using different CPU architectures. For example, an A15 along with an A7.

    nVidia's solution is the first step, but it only uses A9s for all of the cores.
  • UpSpin - Friday, December 02, 2011

    I haven't said anything different. I just added that Samsung wants to be one of the first to release an A15 SoC. To speed things up they released a dual core only, because there the advantage of a companion core isn't that big and the leakage current is 'ok'. It just makes the dual core more expensive (additional transistors needed, without such a huge advantage).
    But if you want to build a quad core, you must, just as Nvidia did, add such a companion core, else the leakage current is too high. But integrating the big.LITTLE idea probably takes additional time, thus they wouldn't be the first to produce an A15 based SoC.
    So to be one of the first, they chose to take the easiest design, a dual core A15. After a few months and additional time of R&D they will release a quad core with big.LITTLE and probably a dual core and six core and eight core with big.LITTLE, too.
  • hob196 - Friday, December 02, 2011

    You said:
    "ARM introduced big.LITTLE which only makes sense in Quad or more core systems"

    big.LITTLE would apply to single core systems if the A7 and A15 pairing was considered one core.
  • UpSpin - Friday, December 02, 2011

    Power consumption wise it makes sense to pair an A7 with a single and dual core already.
    Cost wise it doesn't really make sense.
    I really doubt that we will see some single core A15 SoC with a companion core. And dual core, maybe, but not at the beginning.
  • GnillGnoll - Friday, December 02, 2011

    It doesn't matter how many "big" cores there are, big.LITTLE is for those situations where turning on even a single "big" core is a relatively large power draw.

    A quad core with three cores power gated has no more leakage than a single core chip.
