Earlier this morning we published our first impressions of Apple's iPad 2, including an analysis of camera quality and a dive into the architecture behind Apple's A5 SoC. Our SoC investigation mostly focused on CPU performance, which we found to be a healthy 50% faster than the A4 in the original iPad, at least in web browsing. We were able to exceed Apple's claim of up to a 2x performance increase in some synthetic tests, but even a 50% increase in JavaScript and web page loading performance isn't anything to be upset about. We only briefly touched on the GPU: Imagination Technologies' PowerVR SGX 543MP2. Here Apple is promising up to a 9x increase in performance, a claim we wanted to investigate for ourselves.

Architecturally the 543MP2 has more than twice the compute horsepower of the SGX 535 used in Apple's A4. Each shader pipeline can execute twice as many instructions per clock as the SGX 535's, and there are four times as many pipes in an SGX 543MP2 as in a 535. There are efficiency improvements as well: hidden surface removal runs at twice the rate in the 543MP2 as it did in the 535. There's also a big boost in texture filtering performance, as you'll see below.
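The two ratios in this paragraph compound rather than add. A minimal sketch, taking the stated 2x-per-pipe and 4x-pipe-count figures at face value: per clock, the theoretical shader ceiling works out to 2 × 4 = 8x. It deliberately ignores clock speeds (not given here) and how fully real workloads keep the pipes busy, so it is an upper bound that sits comfortably alongside the more conservative "more than twice" figure rather than replacing it.

```python
# Per-clock ratios stated above for the SGX 543MP2 relative to the SGX 535.
INSTRUCTIONS_PER_PIPE_RATIO = 2  # each shader pipe issues 2x the instructions per clock
PIPE_COUNT_RATIO = 4             # the 543MP2 has 4x as many shader pipes


def peak_ratio_per_clock(per_pipe_ratio: float, pipe_count_ratio: float) -> float:
    """Theoretical peak shader throughput ratio per clock.

    Independent scaling factors multiply. Clock speed and pipe utilization
    are deliberately ignored, so this is a ceiling, not a prediction.
    """
    return per_pipe_ratio * pipe_count_ratio


ratio = peak_ratio_per_clock(INSTRUCTIONS_PER_PIPE_RATIO, PIPE_COUNT_RATIO)
print(f"Peak shader throughput per clock: {ratio}x the SGX 535")
```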

As always we turn to GLBenchmark 2.0, a benchmark crafted by developers who have worked, or still work, at some of the big dev houses in the industry. We'll start with some of the synthetics.

Over the course of PC gaming's evolution we saw geometry complexity increase significantly. We'll likely see a similar evolution in ultra mobile games, and as a result this next round of ultra mobile GPUs will seriously ramp up geometry performance.

Here we look at two geometry tests that represent (almost) the best and worst case triangle throughput measured by GLBenchmark 2.0. First we have the best case scenario, a textured triangle:

Geometry Throughput - Textured Triangle Test

The original iPad could manage 8.7 million triangles per second in this test. The iPad 2? 29 million, an increase of over 3x. Developers with existing iPad titles could conceivably triple geometry complexity with no performance penalty on the iPad 2.
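One way to read "triple geometry complexity" is as a per-frame triangle budget: divide throughput by the frame rate. A minimal sketch, assuming a hypothetical 30 fps target and that triangle throughput is the only bottleneck (real games are also limited by fill rate, shading and CPU time):

```python
# GLBenchmark 2.0 textured-triangle throughput, converted from kTriangles/s.
IPAD_TRIS_PER_SEC = 8_691_500    # iPad, PowerVR SGX 535
IPAD2_TRIS_PER_SEC = 29_019_900  # iPad 2, PowerVR SGX 543MP2
TARGET_FPS = 30                  # hypothetical frame-rate target (assumption)


def triangles_per_frame(tris_per_sec: int, fps: int) -> int:
    """Upper bound on triangles per frame if geometry were the only limit."""
    return tris_per_sec // fps


for name, rate in (("iPad", IPAD_TRIS_PER_SEC), ("iPad 2", IPAD2_TRIS_PER_SEC)):
    print(f"{name}: {triangles_per_frame(rate, TARGET_FPS):,} triangles/frame at {TARGET_FPS} fps")

print(f"Per-frame budget ratio: {IPAD2_TRIS_PER_SEC / IPAD_TRIS_PER_SEC:.2f}x")
```

The ratio is independent of the frame rate chosen; the fps target only sets the absolute size of each budget.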

Now for the more complex case - a fragment lit triangle test:

Geometry Throughput - Fragment Lit Triangle Test

The performance gap widens. While the PowerVR SGX 535 in the A4 could barely break 4 million triangles per second in this test, the PowerVR SGX 543MP2 in the A5 manages just under 20 million. There's just no competition here.
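Why the gap widens becomes clearer when you compare how much each GPU loses once per-fragment lighting is added. A small sketch using the two triangle results quoted above: the SGX 535 keeps roughly half of its textured-triangle rate, while the SGX 543MP2 keeps about two thirds, so the ratio between the two chips grows from about 3.3x to about 4.8x.

```python
# GLBenchmark 2.0 triangle tests in kTriangles/s: (iPad / SGX 535, iPad 2 / SGX 543MP2).
TEXTURED = (8691.5, 29019.9)
FRAGMENT_LIT = (4084.9, 19695.8)


def retained(lit: float, unlit: float) -> float:
    """Fraction of textured-triangle throughput kept once fragment lighting is added."""
    return lit / unlit


for i, device in enumerate(("iPad", "iPad 2")):
    kept = retained(FRAGMENT_LIT[i], TEXTURED[i])
    print(f"{device}: keeps {kept:.0%} of its textured-triangle rate with fragment lighting")

print(f"Textured speedup: {TEXTURED[1] / TEXTURED[0]:.2f}x")
print(f"Fragment-lit speedup: {FRAGMENT_LIT[1] / FRAGMENT_LIT[0]:.2f}x")
```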

I mentioned an improvement in texturing performance earlier. The GLBenchmark texture fetch test puts numbers to that statement:

Fill Rate - Texture Fetch

We're talking about nearly a 5x increase in texture fetch performance. This has to be due to more than an increase in the amount of texturing hardware. An improvement in throughput? An increase in memory bandwidth? It's tough to say without knowing more at this point.
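To put "nearly 5x" in per-pixel terms, the sketch below divides each texture fetch result by the number of pixels drawn per second. It assumes the 1024×768 display shared by both iPads is fully redrawn at a hypothetical 60 fps with no overdraw, and that the synthetic fetch rate is sustainable in a real game; neither holds exactly, so treat the output as a rough budget.

```python
# GLBenchmark 2.0 texture fetch results, converted from kTexels/s to texels/s.
IPAD_TEXELS_PER_SEC = 179_116_200
IPAD2_TEXELS_PER_SEC = 890_077_600

WIDTH, HEIGHT = 1024, 768  # native resolution of both iPads
FPS = 60                   # assumed refresh target; overdraw is ignored


def texels_per_pixel(texels_per_sec: int) -> float:
    """Texture fetches available per screen pixel per frame at FPS."""
    return texels_per_sec / (WIDTH * HEIGHT * FPS)


print(f"Speedup: {IPAD2_TEXELS_PER_SEC / IPAD_TEXELS_PER_SEC:.2f}x")
for name, rate in (("iPad", IPAD_TEXELS_PER_SEC), ("iPad 2", IPAD2_TEXELS_PER_SEC)):
    print(f"{name}: {texels_per_pixel(rate):.1f} texel fetches per pixel per frame")
```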

Apple iPad vs. iPad 2

| Test | Units | Apple iPad (PowerVR SGX 535) | Apple iPad 2 (PowerVR SGX 543MP2) |
|---|---|---|---|
| Array test - uniform array access | kVertex/s | 3412.4 | 3864.0 |
| Branching test - balanced | kShaders/s | 2002.2 | 11412.4 |
| Branching test - vertex weighted | kVertex/s | 3905.9 | 3870.6 |
| Common test - balanced | kShaders/s | 1025.3 | 4092.5 |
| Common test - fragment weighted | kFragments/s | 1603.7 | 3708.2 |
| Common test - vertex weighted | kVertex/s | 1516.6 | 3714.0 |
| Geometric test - balanced | kShaders/s | 1276.2 | 6238.4 |
| Geometric test - fragment weighted | kFragments/s | 2000.6 | 6382.0 |
| Geometric test - vertex weighted | kVertex/s | 1921.5 | 3780.9 |
| Exponential test - balanced | kShaders/s | 2013.2 | 11758.0 |
| Exponential test - fragment weighted | kFragments/s | 3632.3 | 11151.8 |
| Exponential test - vertex weighted | kVertex/s | 3118.1 | 3634.1 |
| Fill test - texture fetch | kTexels/s | 179116.2 | 890077.6 |
| For loop test - balanced | kShaders/s | 1295.1 | 3719.1 |
| For loop test - fragment weighted | kFragments/s | 1777.3 | 6182.8 |
| For loop test - vertex weighted | kVertex/s | 1418.3 | 3813.5 |
| Triangle test - textured | kTriangles/s | 8691.5 | 29019.9 |
| Triangle test - textured, fragment lit | kTriangles/s | 4084.9 | 19695.8 |
| Triangle test - textured, vertex lit | kTriangles/s | 6912.4 | 20907.1 |
| Triangle test - white | kTriangles/s | 9621.7 | 29771.1 |
| Trigonometric test - balanced | kShaders/s | 1292.6 | 3249.9 |
| Trigonometric test - fragment weighted | kFragments/s | 1103.9 | 3502.5 |
| Trigonometric test - vertex weighted | kVertex/s | 1018.8 | 3091.7 |

Branching test - fragment weighted: only one result, 5784.3 kFragments/s, is listed, without a device attribution.
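Because the rows mix units, the easiest way to scan the table is by ratio. A minimal sketch with the table's numbers transcribed by hand (the branching fragment-weighted test, which has only one value, is omitted): the largest gain is the balanced exponential test at about 5.8x, and the vertex-weighted branching test is the one place the iPad 2 comes in slightly behind, at about 0.99x.

```python
# iPad 2 / iPad ratios for the GLBenchmark 2.0 synthetic tests in the table.
# Each entry: test -> (iPad / SGX 535, iPad 2 / SGX 543MP2), same units within a row.
RESULTS = {
    "Array test - uniform array access": (3412.4, 3864.0),
    "Branching test - balanced": (2002.2, 11412.4),
    "Branching test - vertex weighted": (3905.9, 3870.6),
    "Common test - balanced": (1025.3, 4092.5),
    "Common test - fragment weighted": (1603.7, 3708.2),
    "Common test - vertex weighted": (1516.6, 3714.0),
    "Geometric test - balanced": (1276.2, 6238.4),
    "Geometric test - fragment weighted": (2000.6, 6382.0),
    "Geometric test - vertex weighted": (1921.5, 3780.9),
    "Exponential test - balanced": (2013.2, 11758.0),
    "Exponential test - fragment weighted": (3632.3, 11151.8),
    "Exponential test - vertex weighted": (3118.1, 3634.1),
    "Fill test - texture fetch": (179116.2, 890077.6),
    "For loop test - balanced": (1295.1, 3719.1),
    "For loop test - fragment weighted": (1777.3, 6182.8),
    "For loop test - vertex weighted": (1418.3, 3813.5),
    "Triangle test - textured": (8691.5, 29019.9),
    "Triangle test - textured, fragment lit": (4084.9, 19695.8),
    "Triangle test - textured, vertex lit": (6912.4, 20907.1),
    "Triangle test - white": (9621.7, 29771.1),
    "Trigonometric test - balanced": (1292.6, 3249.9),
    "Trigonometric test - fragment weighted": (1103.9, 3502.5),
    "Trigonometric test - vertex weighted": (1018.8, 3091.7),
}


def ranked_speedups(results: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Return (test, iPad 2 / iPad) pairs sorted from largest to smallest gain."""
    ratios = [(test, new / old) for test, (old, new) in results.items()]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


for test, ratio in ranked_speedups(RESULTS):
    print(f"{ratio:5.2f}x  {test}")
```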
Swapbuffer Speed

Enough with the synthetics: how much of an improvement does all of this yield in the actual GLBenchmark 2.0 game tests? Oh, it's big.

GLBenchmark 2.0 Egypt & PRO Performance
Comments

  • Azethoth - Saturday, March 12, 2011

    I think you misspelled "will scale poorly". The worst would be battery life if all 4 cores were actually doing work. Still, it would be interesting to see just how well or poorly it scales irl.
  • tipoo - Sunday, March 13, 2011

    Why would it scale poorly? The 543 is built to be modular; it's supposed to scale better than any other GPU solution.
  • ncb1010 - Sunday, March 13, 2011

    Imagination Technologies says that the GPU scales at an efficiency of 95% when you add more cores. That is likely the rate of falloff from the theoretical 2x performance when you go from 1 to 2 cores and from 2 cores to 4 cores.
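The 95% figure can be read two ways, and the readings diverge at four cores. A sketch of both, where `scaled_speedup` and `linear_speedup` are hypothetical helpers: under the per-doubling reading in the comment, an MP4 would be about 3.61x a single core; if 95% instead applies to perfectly linear scaling at any core count, it would be 3.8x. Imagination's exact definition isn't given here.

```python
def scaled_speedup(cores: int, efficiency: float = 0.95) -> float:
    """Speedup over one core if every doubling of core count yields
    `efficiency` of an ideal 2x (the interpretation in the comment above)."""
    if cores < 1 or cores & (cores - 1):
        raise ValueError("cores must be a power of two")
    doublings = cores.bit_length() - 1  # 1 -> 0, 2 -> 1, 4 -> 2
    return (2 * efficiency) ** doublings


def linear_speedup(cores: int, efficiency: float = 0.95) -> float:
    """Alternative reading: `efficiency` of perfectly linear scaling at any core count."""
    return cores * efficiency if cores > 1 else 1.0


for n in (1, 2, 4):
    print(f"{n} core(s): per-doubling {scaled_speedup(n):.2f}x, linear {linear_speedup(n):.2f}x")
```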
  • jalexoid - Saturday, March 12, 2011

    Ahem.... Actually GPUs scale almost linearly.
  • somata - Sunday, March 13, 2011

    As impressive as the performance of modern low-power GPUs is, it helps to put things in perspective:

    Tegra 2 - 4.8 GFLOPS (8, 1-way ALUs @ ~300MHz)
    PowerVR SGX543MP2 - 19.2 GFLOPS (8, 4-way ALUs @ ~300MHz??)
    Radeon 9700 Pro - 33.8 GFLOPS (8, 4-way ALUs (pixel) + 4, 5-way ALUs (vertex) @ 325MHz)
    Radeon 2400 Pro - 42 GFLOPS (8, 5-way ALUs @ 525MHz)
    Radeon 5450 - 104 GFLOPS (16, 5-way ALUs @ 650MHz)
    Xenos (Xbox 360) - 240 GFLOPS (48, 5-way ALUs @ 500MHz)
    RSX (PS3) - 255.2 GFLOPS (24, 2 x 4-way ALUs (pixel) + 8, 5-way ALUs (vertex) @ 550MHz)
    Radeon 6970 - 2703.4 GFLOPS (384, 4-way ALUs @ 880MHz)
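Every figure in this list follows from the same peak-throughput formula: ALUs × lanes per ALU × 2 FLOPs per multiply-add × clock. The sketch below re-derives them from the parenthesized specs. Counting a multiply-add as two FLOPs is an assumption about the commenter's method, but it reproduces each listed number (the Radeon 6970 comes out at 2703.36, shown rounded).

```python
# Peak shader FLOPS = ALUs x lanes per ALU x 2 FLOPs per multiply-add x clock.
# Each GPU maps to a list of (ALU count, lanes per ALU, clock in MHz) groups,
# because some designs have separate pixel and vertex ALUs. Figures are from the comment.
FLOPS_PER_LANE_PER_CLOCK = 2  # a multiply-add counts as two floating-point ops

GPUS = {
    "Tegra 2": [(8, 1, 300)],
    "PowerVR SGX543MP2": [(8, 4, 300)],             # clock uncertain in the comment
    "Radeon 9700 Pro": [(8, 4, 325), (4, 5, 325)],  # pixel + vertex ALUs
    "Radeon 2400 Pro": [(8, 5, 525)],
    "Radeon 5450": [(16, 5, 650)],
    "Xenos (Xbox 360)": [(48, 5, 500)],
    "RSX (PS3)": [(24 * 2, 4, 550), (8, 5, 550)],   # 24 pipes x 2 pixel ALUs, plus vertex
    "Radeon 6970": [(384, 4, 880)],
}


def peak_gflops(groups: list[tuple[int, int, int]]) -> float:
    """Sum lane-MHz over all ALU groups, then convert MFLOPS to GFLOPS."""
    lane_mhz = sum(alus * lanes * mhz for alus, lanes, mhz in groups)
    return lane_mhz * FLOPS_PER_LANE_PER_CLOCK / 1000


for name, groups in GPUS.items():
    print(f"{name}: {peak_gflops(groups):.1f} GFLOPS")
```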

    Granted, this only compares theoretical peak shader performance, and doesn't take into account the better ALU utilization of modern designs, but it should roughly correlate with general performance on modern workloads. Note that the iPad's GPU is just starting to approach Radeon 9700 (circa late-2002) levels of performance. It's impressive given the power-profile, but still nowhere near the performance of the 5-year-old consoles, and quite a bit lower than even a very low-end Radeon 2400 Pro from 2007.

    The MP4, however, might come close to the Radeon 2400, depending on clocks. Once the next generation of consoles launch (hopefully next year, we'll see at E3) and game graphics likewise catch up to what modern high-end GPUs are capable of, the low-power GPUs will once again be put in their place for a number of years.
  • Juzcallmeneo - Sunday, March 13, 2011

    I'm almost positive I saw the XBox GPU scoring like twice the PS3 GPU somewhere. Where did you get all these stats? I know the PS3 is capable of more graphics due to its strange but amazing CPU, but when comparing only GPU's the XBox's should be stronger.
  • tipoo - Sunday, March 13, 2011

    "but when comparing only GPU's the XBox's should be stronger."

    Yeah, it probably is. The 360 has a unified shader architecture, so it automatically splits up tasks depending on the workload. The PS3's GPU has separate (non-unified) pixel and vertex shaders, meaning all their power might not be used fully at any given point.
  • somata - Sunday, March 13, 2011

    I tried to show how I derived all the figures, but perhaps you're questioning the "2 x 4-way ALUs" part that gives the RSX the edge in this comparison. Recall that RSX is based on G70, which had 24 "shader pipelines" and each pipeline had *two* 4-way FMADD units, for a total of 48. The caveat is that the instructions for both units have to come from the same instruction stream, a restriction not shared by Xenos or any other modern ALU organization.

    So yeah, the RSX figure is the most optimistic of the bunch. Xenos no doubt sees much better utilization, due to not sharing the above restriction, having unified shaders, and having much better branch performance. In practice I'd say the Xbox 360's GPU has superior shader performance compared to RSX, but RSX does have a bit of a texturing advantage (24 vs 16 TMUs).
  • coredump27 - Sunday, March 13, 2011

    "Once the next generation of consoles launch (hopefully next year, we'll see at E3) and game graphics likewise catch up to what modern high-end GPUs are capable of, the low-power GPUs will once again be put in their place for a number of years."

    Dream on!

    ST-Ericsson has announced its Nova A9600 mobile SoC, which uses Imagination Technologies' next-gen PowerVR Series 6 GPU. The chip samples later this year, and its GPU delivers in excess of 210 GFLOPS.

    Source - http://www.stericsson.com/press_releases/NovaThor....

    "The Nova A9600, built in 28nm, will deliver groundbreaking multimedia and graphics performance, featuring a dual-core ARM Cortex-A15-based processor running up to 2.5 GHz breaking the 20k DMIPS barrier, and a POWERVR Rogue GPU that delivers in excess of 210 GFLOPS. The graphics performance of the A9600 will exceed 350 million ‘real’ polygons per second and more than 5 gigapixels per second visible fill rate (which given POWERVR’s deferred rendering architecture results in more than 13 gigapixels per second effective fill rate). Thanks to Rogue Nova will support all existing APIs such as Microsoft DirectX. The Nova A9600 is sampling in 2011."
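The gap between "visible" and "effective" fill rate comes from deferred rendering: PowerVR GPUs resolve hidden surfaces before shading, so fragments that would have been overdrawn cost no fragment work. The release does not state the overdraw it assumes; dividing its two quoted minimums gives the implied average depth complexity, as this minimal sketch does.

```python
# Figures quoted in the ST-Ericsson press release (both stated as minimums).
VISIBLE_FILL_GPIX = 5.0     # "more than 5 gigapixels per second visible fill rate"
EFFECTIVE_FILL_GPIX = 13.0  # "more than 13 gigapixels per second effective fill rate"


def implied_overdraw(effective: float, visible: float) -> float:
    """Average fragments per visible pixel (depth complexity) implied by the two
    rates, assuming hidden fragments are rejected before shading at no cost."""
    return effective / visible


print(f"Implied average overdraw: {implied_overdraw(EFFECTIVE_FILL_GPIX, VISIBLE_FILL_GPIX):.1f}x")
```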
  • somata - Sunday, March 13, 2011

    What's to dream on about? Game consoles being released no later than next year (which may be dreaming given the current climate) or low-power GPUs not being able to catch high-power GPUs?

    First of all, 210 GFLOPS is still less than 1/10 of current GPU performance, and you can bet the next-gen consoles will have nothing less powerful than the current top-of-the-line. Again, I'm not dismissing Img. Tech's feat of cramming so much performance into such a small power-envelope, but does anyone realistically expect a sub-1W GPU to be able to take on GPUs that can consume 10s of watts built on the same process? You'll always be able to do more with more.
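The "less than 1/10" claim checks out against the peak figures listed earlier in the thread. A quick sketch, treating both numbers as theoretical peaks: 210 GFLOPS is about 7.8% of the Radeon 6970, yet still roughly 11x the SGX543MP2 estimate.

```python
# Peak shader throughput figures discussed in this thread, in GFLOPS.
ROGUE_GFLOPS = 210.0         # "in excess of 210 GFLOPS" for the Nova A9600's GPU
RADEON_6970_GFLOPS = 2703.4  # from the earlier list in this thread
SGX543MP2_GFLOPS = 19.2      # from the same list (its clock speed is uncertain)

share_of_high_end = ROGUE_GFLOPS / RADEON_6970_GFLOPS
print(f"Rogue vs. Radeon 6970: {share_of_high_end:.1%} (roughly 1/{1 / share_of_high_end:.0f})")
print(f"Rogue vs. SGX543MP2: {ROGUE_GFLOPS / SGX543MP2_GFLOPS:.1f}x")
```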

    Lastly, "sampling in 2011" means we'll be lucky to see any shipping devices based on this even next year, especially ones with the "up to" specs mentioned in the PR.
