The GPU: Apple's Gift to Game Developers

The GPU side of the A5 is really what's most exciting. As we mentioned in our iPad 2 GPU Performance analysis, the A5 includes a dual-core PowerVR SGX 543 - also known as the SGX 543MP2. In our earlier article we showed the SGX 543MP2 easily beating both an iPad 1 and the Tegra 2 based Motorola Xoom.

To understand why the SGX 543MP2 has such a performance advantage we need to first remember that NVIDIA's Tegra 2 is nearly a year late. NVIDIA's first competitive ultra mobile GPU was supposed to be shipping in products in the first half of 2010; instead it found itself shipping in 2011. While NVIDIA is good at designing GPUs, it's not so good that it can release a product and maintain a two-year performance advantage over the competition. Let's look at the architecture, shall we?

NVIDIA's Tegra 2 features a DirectX 9-class GPU. NVIDIA used to call it the GeForce ULP (Ultra Low Power) but now it's just GeForce. As a DX9 class GPU we're dealing with a conventional, non-unified shader architecture. While all OpenGL ES 2.0 GPUs can execute pixel and vertex shader instructions, the GeForce in Tegra 2 runs pixel and vertex shaders on separate groups of hardware.

NVIDIA calls each pixel and vertex shader ALU a core. The Tegra 2 has four pixel shader cores and four vertex shader cores. The four pixel shader ALUs make up a single Vec4 and the same goes for the four vertex shader ALUs. NVIDIA wouldn't elaborate on what limitations exist when dispatching operations to the cores. All pixel shader operations happen at 20-bits per component precision while all vertex shader operations happen at 32-bits per component.

Each core is capable of executing one multiply+add (MAD) operation per clock. Do the math and that works out to be a peak rate of 8 MADs per clock for the entire GPU. The maximum operating frequency for the Tegra 2 GeForce GPU is 300MHz, however device vendors may run the GPU at a lower frequency to save on power. At 300MHz this works out to be 4.8 GFLOPS (counting a MAD as two FLOPs).

Imagination Technologies' PowerVR SGX 543MP2 is fundamentally a bigger GPU than the GeForce in NVIDIA's Tegra 2. Let's go through the math.

The SGX 543 features four USSE2 pipes. This is a unified shader architecture, so both vertex and pixel shader code run on the same set of hardware. The benefit of this approach is better performance in peaky situations, where you're running mostly vertex or mostly pixel shader code rather than a balance that's perfectly tailored to your architecture. The Tegra 2 will only run at peak efficiency if it encounters a mix of 50% vertex and 50% pixel shader code. The PowerVR SGX series won't leave execution pipes idle just because the instruction mix is lopsided.
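To make the unified vs. non-unified argument concrete, here's a rough toy model in Python. It's only a sketch: it assumes shader work divides perfectly across ALUs, ignores everything else in the pipeline, and compares a 4+4 split design against a hypothetical unified design with the same 8 ALUs (not the SGX 543's actual ALU count). The function names are made up for illustration.

```python
# Toy model: how busy are the ALUs for a given vertex/pixel work mix?
# Assumption: work is measured in abstract "ALU-clocks" and divides evenly.

def split_time(vertex_work, pixel_work, vertex_alus=4, pixel_alus=4):
    """Non-unified design: each pool only runs its own shader type,
    so the frame finishes when the slower pool finishes."""
    return max(vertex_work / vertex_alus, pixel_work / pixel_alus)

def unified_time(vertex_work, pixel_work, alus=8):
    """Unified design: any ALU can run any shader type."""
    return (vertex_work + pixel_work) / alus

def split_utilization(vertex_work, pixel_work):
    """Fraction of peak throughput the split design achieves,
    relative to a unified design with the same total ALU count."""
    return unified_time(vertex_work, pixel_work) / split_time(vertex_work, pixel_work)

# Vertex/pixel work mixes, out of 100 units of work per frame
for vertex_work, pixel_work in [(50, 50), (25, 75), (0, 100)]:
    util = split_utilization(vertex_work, pixel_work)
    print(f"{vertex_work:3d}% vertex / {pixel_work:3d}% pixel -> {util:.0%} of peak")
```

Under these assumptions the split design only hits 100% of its peak at an even mix, and drops to half of peak when the workload is entirely pixel (or entirely vertex) shader code.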

Each USSE2 pipe has a 4-wide vector ALU capable of cranking out 4 MADs per clock. Two of these pipes are enough to equal the peak throughput of what NVIDIA built in Tegra 2, but the PowerVR SGX 543 has four of them. As for the MP2? Go ahead and double that number again. The SGX 543MP2 is simply two 543s placed next to one another.

All of this works out to be 16 MADs per clock for the SGX 543 and 32 MADs per clock for the SGX 543MP2. At 200MHz that's 12.8GFLOPS and at 250MHz we're talking about 16 GFLOPS.
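If you want to check the math yourself, the peak numbers above fall out of a one-line formula: MADs per clock x 2 FLOPs per MAD x clock speed. Here's a small Python sketch of it; the helper name and constants are ours, and these are theoretical peaks, not measured throughput.

```python
FLOPS_PER_MAD = 2  # a multiply+add counts as two floating point operations

def peak_gflops(mads_per_clock, clock_mhz):
    """Theoretical peak GFLOPS for a GPU issuing `mads_per_clock` MADs per cycle."""
    return mads_per_clock * FLOPS_PER_MAD * clock_mhz / 1000.0

# Tegra 2 GeForce: 8 cores, 1 MAD per core per clock
tegra2_mads = 8 * 1
# SGX 543: 4 USSE2 pipes, 4 MADs per pipe per clock
sgx543_mads = 4 * 4
# SGX 543MP2: two SGX 543s side by side
sgx543mp2_mads = 2 * sgx543_mads

print(f"Tegra 2 @ 300MHz:     {peak_gflops(tegra2_mads, 300):.1f} GFLOPS")
print(f"SGX 543MP2 @ 200MHz:  {peak_gflops(sgx543mp2_mads, 200):.1f} GFLOPS")
print(f"SGX 543MP2 @ 250MHz:  {peak_gflops(sgx543mp2_mads, 250):.1f} GFLOPS")
```

Plugging in other clocks or MAD counts is an easy way to sanity check any of the peak figures quoted for these parts.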

Mobile SoC GPU Comparison
| | PowerVR SGX 530 | PowerVR SGX 535 | PowerVR SGX 540 | PowerVR SGX 543 | PowerVR SGX 543MP2 | GeForce ULP | Kal-El GeForce |
|---|---|---|---|---|---|---|---|
| SIMD Name | USSE | USSE | USSE | USSE2 | USSE2 | Core | Core |
| # of SIMDs | 2 | 2 | 4 | 4 | 8 | 8 | 12 |
| MADs per SIMD | 2 | 2 | 2 | 4 | 4 | 1 | ? |
| Total MADs | 4 | 4 | 8 | 16 | 32 | 8 | ? |
| GFLOPS @ 200MHz | 1.6 | 1.6 | 3.2 | 6.4 | 12.8 | 3.2 | ? |
| GFLOPS @ 300MHz | 2.4 | 2.4 | 4.8 | 9.6 | 19.2 | 4.8 | ? |

At its lowest expected clock speed, the 543MP2 already has over twice the compute power of the Tegra 2's GPU at its highest operating frequency. Take into account that the A5 likely has more memory bandwidth than Tegra 2, and that the SGX 543MP2 is a tile-based architecture with lower bandwidth requirements, and the performance numbers we talked about last time shouldn't be all that surprising.

The real competition for the SGX 543MP2 will be NVIDIA's Kal-El. That part is expected to ship on time and will feature a boost in core count: from 8 to 12. The ratio of pixel to vertex shader cores is not known at this point but I'm guessing it won't be balanced anymore. NVIDIA is promising 3x the GPU performance out of Kal-El so I suspect that we'll see an increase in throughput per core.

GPU Performance

Taken from our iPad 2 GPU Performance Preview:

As always we turn to GLBenchmark 2.0, a benchmark crafted by a bunch of developers who either have or had experience doing development work for some of the big dev houses in the industry. We'll start with some of the synthetics.

Over the course of PC gaming evolution we noticed a significant increase in geometry complexity. We'll likely see a similar evolution with games in the ultra mobile space, and as a result this next round of ultra mobile GPUs will seriously ramp up geometry performance.

Here we look at two different geometry tests amounting to the (almost) best and worst case triangle throughput measured by GLBenchmark 2.0. First we have the best case scenario - a textured triangle:

Geometry Throughput - Textured Triangle Test

The original iPad could manage 8.7 million triangles per second in this test. The iPad 2? 29 million. An increase of over 3x. Developers with existing titles on the iPad could conceivably triple geometry complexity with no impact on performance on the iPad 2.
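To put those rates in per-frame terms, here's a quick back-of-the-envelope sketch in Python. The 60 fps target is our assumption, and these are synthetic peak rates, so a real game's triangle budget would be considerably lower on both devices.

```python
def triangles_per_frame(triangles_per_second, fps=60):
    """Upper bound on triangles per frame at a given frame rate (assumed 60 fps)."""
    return triangles_per_second / fps

ipad_rate = 8.7e6    # textured triangles per second, original iPad
ipad2_rate = 29e6    # textured triangles per second, iPad 2

print(f"iPad:   ~{triangles_per_frame(ipad_rate):,.0f} triangles/frame")
print(f"iPad 2: ~{triangles_per_frame(ipad2_rate):,.0f} triangles/frame")
print(f"Ratio:  {ipad2_rate / ipad_rate:.1f}x")
```

The ratio is what matters here: whatever fraction of the peak a title actually reaches, the iPad 2 leaves room for roughly three times the geometry at the same frame rate.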

Now for the more complex case - a fragment lit triangle test:

Geometry Throughput - Fragment Lit Triangle Test

The performance gap widens. While the PowerVR SGX 535 in the A4 could barely break 4 million triangles per second in this test, the PowerVR SGX 543MP2 in the A5 manages just under 20 million. There's just no competition here.

I mentioned an improvement in texturing performance earlier. The GLBenchmark texture fetch test puts numbers to that statement:

Fill Rate - Texture Fetch

We're talking about nearly a 5x increase in texture fetch performance. This has to be due to more than an increase in the amount of texturing hardware. An improvement in throughput? Increase in memory bandwidth? It's tough to say without knowing more at this point.

Apple iPad vs. iPad 2
| Test | Apple iPad (PowerVR SGX 535) | Apple iPad 2 (PowerVR SGX 543MP2) |
|---|---|---|
| Array test - uniform array access | 3412.4 kVertex/s | 3864.0 kVertex/s |
| Branching test - balanced | 2002.2 kShaders/s | 11412.4 kShaders/s |
| Branching test - fragment weighted | 5784.3 kFragments/s | 22402.6 kFragments/s |
| Branching test - vertex weighted | 3905.9 kVertex/s | 3870.6 kVertex/s |
| Common test - balanced | 1025.3 kShaders/s | 4092.5 kShaders/s |
| Common test - fragment weighted | 1603.7 kFragments/s | 3708.2 kFragments/s |
| Common test - vertex weighted | 1516.6 kVertex/s | 3714.0 kVertex/s |
| Geometric test - balanced | 1276.2 kShaders/s | 6238.4 kShaders/s |
| Geometric test - fragment weighted | 2000.6 kFragments/s | 6382.0 kFragments/s |
| Geometric test - vertex weighted | 1921.5 kVertex/s | 3780.9 kVertex/s |
| Exponential test - balanced | 2013.2 kShaders/s | 11758.0 kShaders/s |
| Exponential test - fragment weighted | 3632.3 kFragments/s | 11151.8 kFragments/s |
| Exponential test - vertex weighted | 3118.1 kVertex/s | 3634.1 kVertex/s |
| Fill test - texture fetch | 179116.2 kTexels/s | 890077.6 kTexels/s |
| For loop test - balanced | 1295.1 kShaders/s | 3719.1 kShaders/s |
| For loop test - fragment weighted | 1777.3 kFragments/s | 6182.8 kFragments/s |
| For loop test - vertex weighted | 1418.3 kVertex/s | 3813.5 kVertex/s |
| Triangle test - textured | 8691.5 kTriangles/s | 29019.9 kTriangles/s |
| Triangle test - textured, fragment lit | 4084.9 kTriangles/s | 19695.8 kTriangles/s |
| Triangle test - textured, vertex lit | 6912.4 kTriangles/s | 20907.1 kTriangles/s |
| Triangle test - white | 9621.7 kTriangles/s | 29771.1 kTriangles/s |
| Trigonometric test - balanced | 1292.6 kShaders/s | 3249.9 kShaders/s |
| Trigonometric test - fragment weighted | 1103.9 kFragments/s | 3502.5 kFragments/s |
| Trigonometric test - vertex weighted | 1018.8 kVertex/s | 3091.7 kVertex/s |
| Swapbuffer Speed | 600 | 599 |
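The raw numbers are easier to digest as speedups. Here's a short Python sketch that divides the iPad 2 result by the original iPad result for a handful of the tests in the table above. The selection of tests is ours; the values are copied straight from the table.

```python
# (original iPad, iPad 2) results copied from the table; units differ per test
results = {
    "Fill test - texture fetch (kTexels/s)": (179116.2, 890077.6),
    "Branching test - balanced (kShaders/s)": (2002.2, 11412.4),
    "Triangle test - textured, fragment lit (kTriangles/s)": (4084.9, 19695.8),
    "Triangle test - textured (kTriangles/s)": (8691.5, 29019.9),
    "Array test - uniform array access (kVertex/s)": (3412.4, 3864.0),
    "Branching test - vertex weighted (kVertex/s)": (3905.9, 3870.6),
}

def speedup(ipad, ipad2):
    """How many times faster the iPad 2 result is than the original iPad's."""
    return ipad2 / ipad

speedups = {name: speedup(*pair) for name, pair in results.items()}

# Print from largest to smallest gain
for name, ratio in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:55s} {ratio:5.2f}x")
```

The gains aren't uniform: texture fetch and the fragment-heavy tests improve by several times, while a couple of the vertex weighted tests barely move at all.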

Enough with the synthetics - how much of an improvement does all of this yield in the actual GLBenchmark 2.0 game tests? Oh it's big.

GLBenchmark 2.0 Egypt

Without AA, the Egypt test runs at 5.4x the frame rate of the original iPad. It's even 3.7x the speed of the Tegra 2 in the Xoom running at 1280 x 800 (granted that's an iOS vs. Android comparison as well).

GLBenchmark 2.0 Egypt - FSAA

With AA enabled the iPad 2 advantage grows to 7x. In a game with the complexity of the Egypt test the original iPad wouldn't be remotely playable while the iPad 2 could run it smoothly.

The Pro test is a little more reasonable, showing a 3 - 4x increase in performance compared to the original iPad:

GLBenchmark 2.0 PRO

GLBenchmark 2.0 PRO - FSAA

While we weren't able to reach the 9x figure claimed by Apple (I'm not sure that you'll ever see 9x running real game code), a range of 3 - 7x in GLBenchmark 2.0 is more reasonable. In practice I'd expect something less than 5x but that's nothing to complain about.


189 Comments


  • name99 - Sunday, March 20, 2011 - link

    "you cant be a very tech inclined person if if you think you are, if you dont know that 1.2 GHz quad core arm cortex is coming later this year and so most tech people are waiting on that to happen"

    Really? You're going to buy that crappy 1.2GHz quad core A9? You're not going to wait the even better 1.8GHz quad core A15 that will be available in late 2012? Sucker!

    Personally I think that if you buy now, before the 802.11s wireless spec is standardized, and before the chipsets support OpenGL 6, you're just throwing your money away. But I tell you, come 2020, that's going to be one SWEET rig that I finally get round to buying.
  • CZroe - Sunday, March 20, 2011 - link

    "Just to test it out, I shot a series of videos of my car and stitched them together using iMovie, then added some titles and a soundtrack."

    I found iMovie completely useless on my iPhone 4 and iPhone 3GS because I could not combine two clips/videos nor could I make a running commentary with titles.

    Are you sure that the iPad 2 version can do this or were all the "videos" in the "series" made from the same longer video?
  • CZroe - Sunday, March 20, 2011 - link

    "Lately Apple has been trying its hand at first party case solutions. It stated with the bumper on the iPhone 4, carried over to the original iPad, and continues now with the iPad 2."
    When you fix that typo ("stated" instead of "started"), you may also want to correct that fact about what came first.

    The iPad launched before the iPhone 4 so the official iPad case launched before the iPhone 4 bumper case, unless I somehow missed it and the official iPad case came out mid-life for the iPad.
  • darwiniandude - Sunday, March 20, 2011 - link

    pja: The 64gb 3G version was at most $1049 AUD rrp, before the price drop, the 64gb WiFi one was $899 AUD rrp. The 64gb WiFi was never $1100 AUD unless you were looking at eBay pricing while stock was scarce. Anyway as this article states, the iPad, provided it does what you require, is a great combination of battery life, weight and size. Tablets certainly aren't for everyone though.

    Deepcover96: Agreed. Hopefully this changes later and I'm sure it will, but for the moment Android has a poor selection of AAA titles. Nothing like Garageband or iMovie, but certainly nothing like Infinity Blade, Nanostudio, Beatmaker 2, World of Goo etc. I'm sure Gameloft and EA will eventually do more, provided they can monetize ok on Android. And for the limitations of iOS apps, I wouldn't be able to have an iPad as my only portable device if it were not for Pages/Keynote/Numbers/TouchDraw/Photogene and so on.

    CZroe: iMovie for iPhone (last year even) could do what you ask after the first update. This year it's greatly improved. A downside to this app and other Apple apps can be a lack of well known gestures. People don't know in Pages that if you hold your finger on an object, swiping with another finger moves it by one pixel, swipe with two moves it by 5 pixels, and so on. Likewise in iMovie, you swipe down through footage like you were cutting it at the playhead to make a cut. Each cut is a faultless transition, but then you can title each cut area separately. So you cut where you want the text to change, and label accordingly. In the new iMovie (only used on iPhone 4 as I sold 1st gen iPad whilst waiting for iPad2) when you import video there are standard iOS movie trim handles over the clip, you only need import the bits you want from each clip. But you could definitely always import more videos into one project in the last version. I think Apple need a modal help "Would you like to watch a short video about iMovie?" dialog or something on the first few launches with a website link, all these apps have their features tucked away so people often think they're less powerful than they are. I'm not sure Apple is choosing the best ratio of controls to expose to the user here. And yes, iPad case came out before iPhone 4, definitely.
  • kschaffner - Sunday, March 20, 2011 - link

    An awesome free web browser for the iPad is Terra; it gives you tabs, has an incognito mode, etc. I would definitely check it out.
  • darwiniandude - Monday, March 21, 2011 - link

    Thanks, I'll check it out. I only use iCab as I bought it for iPhone, it got a universal update and I've been happy enough not to bother looking elsewhere. (it does have a 'privacy' mode) also caching of pages for when you're offline. Anyway, I've downloaded Terra and will play with it on the new iPad. It looks nice.
    Ha, there's a Terra Incognito HD game, lol
  • medi01 - Monday, March 21, 2011 - link

    Looking at the rounded back of ipads, ipad2 in particular, it's hard to understand, why the newer version is easier to hold.

    With rounded surface, they both should be harder to hold, and ip2 in particular.
  • darwiniandude - Monday, March 21, 2011 - link

    The original had flat sides, probably about 4 or 5mm, and a giant convex back, domed in the centre. The new one is thinner, has no flat sides (the curve just falls away from the front) but it's more of a bevelled edge, and once you're about 1cm in from the edges the back is perfectly flat.

    Is it easier to hold? Dunno, haven't got mine yet :) But that's what people are saying.
  • thebeastie - Monday, March 21, 2011 - link

    Everyday I use my Ipad even when I don't think about it.
    I use it as my wake up Radio clock via TuneIn Radio app. This app is great as I can go to sleep with the timer and then wake up to Internet radio which beats the hell out of analog radio. I been looking at a digital radio for a while but there is no reason now for me in the world to do that, and digital radios aren't cheap, it is just another device the Ipad as replaced perfectly with much better screen interface, and life time of free updates as app software evolves.

    I think the Anandtech authors here saying that they found them selfs not using their original Ipad1 after a while didn't adapt their imaginations enough of where it can be used, maybe it is something to do with age and being hardwired into their life styles, dare I say it but becoming 'old school'.
    I am wondering how they wake up in the morning, I find it hard to believe there is a better way to wake up in the morning then from an Ipad radio app, if it is about sound quality there are plenty of speaker options.

    For people who don't get it then I say you just don't see things the same way, I would rather shove a pine cone up my backside than wait more than 2 seconds to be able to look at my email. A laptop takes ages to boot up let alone the loading of the email client.

    The main reason I got an Ipad was because I LOVE to read the paper outside, but the wind blowing the paper around drives me nuts, the Ipad is a killer in this regard.
  • damianrobertjones - Monday, March 21, 2011 - link

    I have an Asus EP121, 4GB RAM, SSD drive, etc. It takes 20 seconds to start from cold onto the desktop. Another 2 seconds to open my email application.

    Is that fast enough?

    from sleep, we're talking seconds
