GPU Performance

All of our discussions around the new iPad and its silicon thus far have been theoretical. Unfortunately, the state of Android/iOS benchmarking is abysmal at best today. Convincing game developers to include useful benchmarks and timedemo modes in their games is seemingly impossible without a suitably large check. I have no doubt this will happen eventually, but today we're left with some great games and no way to benchmark them.

Without suitable game benchmarks, we rely heavily on GLBenchmark to evaluate mobile GPU performance. Even the most stressful current GLBenchmark test (Egypt) is a far cry from what modern Android/iOS games look like, but it's the best we've got today.

We'll start with the synthetic tests, which should show roughly a 2x increase in performance compared to the iPad 2. Remember that the PowerVR SGX 543MP4 simply bundles four SGX 543 cores instead of two. Since we're still on a 45nm LP process, GPU clocks haven't increased, so we're looking at a pure doubling of virtually all GPU resources.

GLBenchmark 2.1—Fill Test

GLBenchmark 2.1—Triangle Test (White)

GLBenchmark 2.1—Triangle Test (Textured, Fragment Lit)

Indeed, we see a roughly 2x increase in triangle and fill rates. Below is the output from GLBenchmark's low-level tests. Pay particular attention to how performance doubles compared to the iPad 2 at 1024 x 768, but at 2048 x 1536 it can drop well below what the iPad 2 was able to deliver at 1024 x 768. Because of this drop at the new iPad's native resolution, we won't see many (if any) visually taxing games run at anywhere near 2048 x 1536.

GLBenchmark 2.1.3 Low Level Comparison

| Test | iPad 2 (10x7) | iPad 3 (10x7) | iPad 3 (20x15) | ASUS TF Prime |
|---|---|---|---|---|
| Trigonometric test - vertex weighted | 35 fps | 60 fps | 57 fps | 47 fps |
| Trigonometric test - fragment weighted | 7 fps | 14 fps | 4 fps | 20 fps |
| Trigonometric test - balanced | 5 fps | 10 fps | 2 fps | 9 fps |
| Exponential test - vertex weighted | 59 fps | 60 fps | 60 fps | 41 fps |
| Exponential test - fragment weighted | 25 fps | 49 fps | 13 fps | 18 fps |
| Exponential test - balanced | 19 fps | 37 fps | 8 fps | 7 fps |
| Common test - vertex weighted | 49 fps | 60 fps | 60 fps | 35 fps |
| Common test - fragment weighted | 8 fps | 16 fps | 4 fps | 28 fps |
| Common test - balanced | 6 fps | 13 fps | 2 fps | 12 fps |
| Geometric test - vertex weighted | 57 fps | 60 fps | 60 fps | 27 fps |
| Geometric test - fragment weighted | 12 fps | 24 fps | 6 fps | 20 fps |
| Geometric test - balanced | 9 fps | 18 fps | 4 fps | 9 fps |
| For loop test - vertex weighted | 59 fps | 60 fps | 60 fps | 28 fps |
| For loop test - fragment weighted | 30 fps | 57 fps | 16 fps | 42 fps |
| For loop test - balanced | 22 fps | 43 fps | 11 fps | 15 fps |
| Branching test - vertex weighted | 58 fps | 60 fps | 60 fps | 45 fps |
| Branching test - fragment weighted | 58 fps | 60 fps | 30 fps | 46 fps |
| Branching test - balanced | 22 fps | 43 fps | 16 fps | 16 fps |
| Array test - uniform array access | 59 fps | 60 fps | 60 fps | 60 fps |
| Fill test - Texture Fetch | 1001483136 texels/s | 1977874688 texels/s | 1904501632 texels/s | 415164192 texels/s |
| Triangle test - white | 65039568 triangles/s | 133523176 triangles/s | 85110008 triangles/s | 55729532 triangles/s |
| Triangle test - textured | 56129984 triangles/s | 116735856 triangles/s | 71362616 triangles/s | 54023840 triangles/s |
| Triangle test - textured, vertex lit | 45314484 triangles/s | 93638456 triangles/s | 46841924 triangles/s | 28916834 triangles/s |
| Triangle test - textured, fragment lit | 43527292 triangles/s | 92831152 triangles/s | 39277916 triangles/s | 26935792 triangles/s |
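
To make the scaling in the table above easier to see, here is a minimal Python sketch that computes two ratios for the fragment-weighted tests: the new iPad's speedup over the iPad 2 at 1024 x 768, and the new iPad's native-resolution frame rate relative to the iPad 2. The frame rates are copied from the table; the function and variable names are purely illustrative. The Branching test is left out because its 1024 x 768 results sit at the 60 fps cap, which would hide the real speedup.

```python
# Frame rates (fps) copied from the GLBenchmark 2.1.3 low-level table.
# Tuple order: iPad 2 at 1024x768, new iPad at 1024x768, new iPad at 2048x1536.
FRAGMENT_WEIGHTED = {
    "Trigonometric": (7, 14, 4),
    "Exponential": (25, 49, 13),
    "Common": (8, 16, 4),
    "Geometric": (12, 24, 6),
    "For loop": (30, 57, 16),
}


def scaling(ipad2, ipad3_low, ipad3_native):
    """Return (same-resolution speedup, native-resolution fps relative to iPad 2)."""
    return ipad3_low / ipad2, ipad3_native / ipad2


def summarize(results):
    """Apply scaling() to every test in the results dictionary."""
    return {name: scaling(*fps) for name, fps in results.items()}


if __name__ == "__main__":
    for name, (speedup, native_ratio) in summarize(FRAGMENT_WEIGHTED).items():
        print(f"{name:14s} {speedup:.2f}x at 1024x768, "
              f"{native_ratio:.2f}x of iPad 2 at 2048x1536")
```

The pattern is what you would expect from the arithmetic: 2048 x 1536 has four times the pixels of 1024 x 768, so doubling the GPU resources leaves a fragment-bound test at roughly half the iPad 2's frame rate.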

GLBenchmark also includes two tests designed to be representative of a workload you could see in an actual 3D game. The older Pro test uses OpenGL ES 1.0 while Egypt is an ES 2.0 test. These tests can either run at the device's native resolution with vsync enabled, or render offscreen at 1280 x 720 with vsync disabled. The latter gives us a way to compare GPUs without differences in screen resolution creating unfair advantages.
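
As a rough illustration of why a fixed offscreen resolution matters, the short Python sketch below compares how many pixels each mode asks the GPU to render per frame. The resolutions are the ones quoted in this article; the dictionary keys and helper names are made up for the example.

```python
# Resolutions mentioned in the article (width, height).
RESOLUTIONS = {
    "iPad 2 native": (1024, 768),
    "new iPad native": (2048, 1536),
    "GLBenchmark offscreen": (1280, 720),
}


def pixels(width, height):
    """Number of pixels rendered per frame at the given resolution."""
    return width * height


def relative_load(resolutions, baseline):
    """Per-frame pixel count of each entry relative to the baseline entry."""
    base = pixels(*resolutions[baseline])
    return {name: pixels(*size) / base for name, size in resolutions.items()}


if __name__ == "__main__":
    loads = relative_load(RESOLUTIONS, "GLBenchmark offscreen")
    for name, ratio in loads.items():
        print(f"{name:22s} {pixels(*RESOLUTIONS[name]):>9,d} pixels "
              f"({ratio:.2f}x the offscreen load)")
```

Running every device at the same 1280 x 720 target removes this per-frame pixel difference from the comparison, which is what the offscreen mode is for.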

Unfortunately, a bug in the iOS version of GLBenchmark 2.1.2 caused all on-screen benchmarks to run at 1024 x 768 rather than the new iPad's native 2048 x 1536 resolution. This is why all of the native GLBenchmark scores from the new iPad are capped at 60 fps. It's not because the new GPU is fast enough to render at speeds above 60 fps at 2048 x 1536; it's because the benchmark is actually showing performance at 1024 x 768. Luckily, GLBenchmark 2.1.3 fixes this problem and delivers results at the new iPad's native screen resolution:

GLBenchmark 2.1—Egypt (Standard)

GLBenchmark 2.1—Pro (Standard)

Surprisingly enough, the A5X is actually fast enough to complete these tests at over 50 fps. Perhaps this is more of an indication of how light the Egypt workload has become, as the current crop of Retina Display enhanced 3D titles for the iPad all render offscreen to a non-native resolution due to performance constraints. The bigger takeaway is that with the 543MP4 and a quad-channel LP-DDR2 interface, it is possible to run a 3D game at 2048 x 1536 and deliver playable frame rates. It won't be the prettiest game around, but it's definitely possible.

The offscreen results give us the competitive analysis that we've been looking for. With a ~2x die size advantage, the fact that we're seeing a 2-3x gap in performance here vs. NVIDIA's Tegra 3 isn't surprising:

GLBenchmark 2.1—Egypt—Offscreen 720p

GLBenchmark 2.1—Pro—Offscreen 720p

The bigger worry is what happens when the first 1920 x 1200 Tegra 3 tablets start shipping. With (presumably) no additional GPU horsepower or memory bandwidth under the hood, we'll see this gap widen at native resolution.
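
A back-of-the-envelope Python sketch of that concern: divide the measured texture-fetch fill rate by the number of pixels that must be drawn each second to get a rough per-pixel texel budget. The fill rates come from the low-level table earlier on this page. Everything else is an assumption for illustration: the 60 fps target, the 1280 x 800 panel for the ASUS TF Prime (not stated in this article), and the idea that the measured fill rate stays the same regardless of resolution.

```python
# Texture-fetch fill rates (texels/s) from the GLBenchmark 2.1.3 low-level table.
FILL_RATE = {
    "new iPad": 1_904_501_632,  # measured at 2048x1536
    "Tegra 3 (ASUS TF Prime)": 415_164_192,
}


def texels_per_pixel(fill_rate, width, height, fps=60):
    """Texel fetches available for each screen pixel in every frame at `fps`.

    ASSUMPTION: fill rate is independent of resolution and 60 fps is the target.
    """
    return fill_rate / (width * height * fps)


def budget_table():
    """Rough per-pixel texel budgets for the configurations discussed."""
    tegra = FILL_RATE["Tegra 3 (ASUS TF Prime)"]
    return {
        "new iPad @ 2048x1536": texels_per_pixel(FILL_RATE["new iPad"], 2048, 1536),
        # ASSUMPTION: 1280x800 is the TF Prime's panel; not stated in the article.
        "Tegra 3 @ 1280x800": texels_per_pixel(tegra, 1280, 800),
        # Hypothetical: the same Tegra 3 fill rate driving a 1920x1200 panel.
        "Tegra 3 @ 1920x1200": texels_per_pixel(tegra, 1920, 1200),
    }


if __name__ == "__main__":
    for config, budget in budget_table().items():
        print(f"{config:22s} {budget:5.2f} texel fetches per pixel per frame")
```

Under these assumptions, moving the same Tegra 3 from 1280 x 800 to 1920 x 1200 spreads an unchanged fill rate over 2.25 times as many pixels, which is the mechanism behind the expected widening of the gap.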


233 Comments


  • Steelbom - Thursday, March 29, 2012

    I'm curious why we didn't see any graphics benchmarks from the UDK like with the iPhone 4S review?
  • Craig234 - Thursday, March 29, 2012

    Wow, this is good to buy... 'if you are in desperate need for a tablet'?

    That's a pretty weak recommendation, I expected a much stronger endorsement based on the review.
  • Chaki Shante - Friday, March 30, 2012

    Great, thorough review, thanks Anand et al.

    Given the sheer size of the SoC (like 4x larger than Tegra2 or OMAP4430, and 2x Tegra3), you'd bet Apple has the fastest current SoC, at least GPU-wise.

    This SoC is just huge and Apple's margin is certainly lowered. Is this sustainable in the long run?

    I wonder if any other silicon manufacturer could make same-size devices (not technologically but from a price perspective) and expect to sell them.
  • dagamer34 - Friday, March 30, 2012

    No one else needs to crank out so many chips that are the same. Also, other companies will be waiting long enough to use 28nm, so there's little chance they'll be hitting the same size as the A5X on 45nm.
  • Aenean144 - Friday, March 30, 2012

    Since Apple is both the chip designer/licensee and hardware vendor, it saves them the cost of paying a middleman. I.e., Nvidia has to make a profit on a Tegra sale, Apple does not, and can afford a more expensive chip from the fab compared to the business component chain from Asus to Nvidia to GF/TSMC and other IP licensees.

    I bet there is at least 50% margin somewhere in the transaction chain from Asus to Nvidia to GF/TSMC. Apple may also have a sweetheart IP deal from both ARMH and IMGTEC that competitors may not have.
  • shompa - Friday, March 30, 2012

    @Aenean144

    Tegra2 cost 25 dollars for OEMs and 15 dollars to manufacture. A5 cost Apple 25 dollars to manufacture. By designing its own SoC Apple got a 30% larger SoC at the same price as Android OEMs.

    Tegra3 is huge. That is a problem for Nvidia. It costs at least 50% more to manufacture. Nvidia is rumored to charge 50 dollars for the SoC.

    A5X is 50%+ larger than Tegra3. Depending on yields, it costs Apple 35-50 dollars per SoC.

    The integrated model gives Apple cheaper SoCs that are also custom designed for their needs. Apple has a long history of accelerating stuff in its OS. Back in 2002 it was AltiVec. Encoding a DVD on a 667MHz PowerBook took 90 minutes. On the fastest x86, a 1.5GHz AMD, it took 15 hours (and it was almost impossible to have XP not bluescreen for 15 hours under full load). Since 2002 Apple has accelerated OS X with Quartz Extreme. Both these techniques are now used in iOS with SIMD acceleration and GPU acceleration. It's much more elegant than the brute force x86 approach. Integration makes it possible to use slower, cheaper and more efficient designs.
  • shompa - Friday, March 30, 2012

    The A5X SoC is a disaster. It's a desperation SoC that had to be implemented when TSMC's 28nm process slipped almost 2 years. That is the reason why Apple did not tape out a 32nm A5X on Samsung. PA Semi had to crank out a new tapeout fast with existing assets. So they took the A5 and added 2 more graphics cores.

    The real A6 SoC has probably been ready for a long time, but TSMC can't deliver enough wafers. The rumored tapeout for A6 was mid 2011. Apple got test wafers from TSMC in June and another batch of test wafers in October. Still at this point Apple believed they would use TSMC for iPad 3.

    ARM is about small, cheap and low power SoCs. That is the future of computing. The A5X is larger than many x86 chips. Technically Intel manufactures many of its CPUs cheaper than Apple manufactures the A5X SoC. That is insane.
  • stimudent - Friday, March 30, 2012

    Product reviews are fun to look at, but where there's a bright side, there is always a dark side. Maybe product scoring should also reflect how a manufacturer treats its employees.
  • name99 - Friday, March 30, 2012

    You mean offers them a better wage than they could find in the rest of China, and living conditions substantially superior to anywhere else they could work?
    Yes, by all means let's use that scoring.

    Or perhaps you'd like to continue to live your Mike Daisey dystopia because god-forbid that the world doesn't conform to your expectations?
  • Craig234 - Friday, March 30, 2012

    I'm all for including 'how a company treats its employees' and other social issues; but I'd list them separately, not put them in a product rating.
