GPU Analysis/Performance

Section by Anand Shimpi

Understanding the A6's GPU architecture is a walk in the park compared to what we had to do to get a high level understanding of Swift. The die photos give us a clear indication of the number of GPU cores and the width of the memory interface, while the performance and timing of release fill in the rest of the blanks. Apple has not abandoned driving GPU performance on its smartphones and increased the GPU compute horsepower by 2x. Rather than double up GPU core count, Apple adds a third PowerVR SGX 543 core and runs the three at a higher frequency than in the A5. The result is roughly the same graphics horsepower as the four-core PowerVR SGX 543MP4 in Apple's A5X, but with a smaller die footprint.

As a recap, Imagination Technologies' PowerVR SGX543 GPU core features four USSE2 pipes. Each pipe has a 4-way vector ALU that can crank out 4 multiply-adds per clock, which works out to be 16 MADs per clock or 32 FLOPS. Imagination lets the customer stick multiple 543 cores together, which scales compute performance linearly.

SoC die size however dictates memory interface width, and it's clear that the A6 is significantly smaller in that department than the A5X, which is where we see the only tradeoff in GPU performance: the A6 maintains a 64-bit LPDDR2 interface compared to the 128-bit LPDDR2 interface in the A5X. The tradeoff makes sense given that the A5X has to drive 4.3x the number of pixels that the A6 has to drive in the iPhone 5. At high resolutions, GPU performance quickly becomes memory bandwidth bound. Fortunately for iPhone 5 users, the A6's 64-bit LPDDR2 interface is a good match for the comparatively low 1136 x 640 display resolution. The end result is 3D performance that looks a lot like the new iPad, but in a phone:

Mobile SoC GPU Comparison
  Adreno 225 PowerVR SGX 540 PowerVR SGX 543MP2 PowerVR SGX 543MP3 PowerVR SGX 543MP4 Mali-400 MP4 Tegra 3
SIMD Name - USSE USSE2 USSE2 USSE2 Core Core
# of SIMDs 8 4 8 12 16 4 + 1 12
MADs per SIMD 4 2 4 4 4 4 / 2 1
Total MADs 32 8 32 48 64 18 12
GFLOPS @ 200MHz 12.8 GFLOPS 3.2 GFLOPS 12.8 GFLOPS 19.2 GFLOPS 25.6 GFLOPS 7.2 GFLOPS 4.8 GFLOPS

We ran through the full GLBenchmark 2.5 suite to get a good idea of GPU performance. The results below are largely unchanged from our iPhone 5 Performance Preview, with the addition of the Motorola RAZR i and RAZR M. I also re-ran the iPad results on iOS 6, although I didn't see major changes there.

We'll start out with the raw theoretical numbers beginning with fill rate:

GLBenchmark 2.5 - Fill Test

The iPhone 5 nips at the heels of the 3rd generation iPad here, at 1.65GTexels/s. The performance advantage over the iPhone 4S is more than double, and even the Galaxy S 3 can't come close.

GLBenchmark 2.5 - Fill Test (Offscreen 1080p)

Triangle throughput is similarly strong:

GLBenchmark 2.5 - Triangle Texture Test

Take resolution into account and the iPhone 5 is actually faster than the new iPad, but normalize for resolution using GLBenchmark's offscreen mode and the A5X and A6 look identical:

GLBenchmark 2.5 - Triangle Texture Test (Offscreen 1080p)

The fragment lit texture test does very well on the iPhone 5, once again when you take into account the much lower resolution of the 5's display performance is significantly better than on the iPad:

GLBenchmark 2.5 - Triangle Texture Test - Fragment Lit

GLBenchmark 2.5 - Triangle Texture Test - Fragment Lit (Offscreen 1080p)

GLBenchmark 2.5 - Triangle Texture Test - Vertex Lit

GLBenchmark 2.5 - Triangle Texture Test - Vertex Lit (Offscreen 1080p)

The next set of results are the gameplay simulation tests, which attempt to give you an idea of what game performance based on Kishonti's engine would look like. These tests tend to be compute monsters, so they'll make a great stress test for the iPhone 5's new GPU:

GLBenchmark 2.5 - Egypt HD

Egypt HD was the great equalizer when we first met it, but the iPhone 5 does very well here. The biggest surprise however is just how well the Qualcomm Snapdragon S4 Pro with Adreno 320 GPU does by comparison. LG's Optimus G, a device Brian flew to Seoul, South Korea to benchmark, is hot on the heels of the new iPhone.

GLBenchmark 2.5 - Egypt HD (Offscreen 1080p)

When we run everything at 1080p the iPhone 5 looks a lot like the new iPad, and is about 2x the performance of the Galaxy S 3. Here, LG's Optimus G actually outperforms the iPhone 5! It looks like Qualcomm's Adreno 320 is quite competent in a phone. Note just how bad Intel's Atom Z2460 is, the PowerVR SGX 540 is simply unacceptable for a modern high-end SoC. I hope Intel's slow warming up to integrating fast GPUs on die doesn't plague its mobile SoC lineup for much longer.

GLBenchmark 2.5 - Egypt Classic

The Egypt classic tests are much lighter workloads and are likely a good indication of the type of performance you can expect from many games today available on the app store. At its native resolution, the iPhone 5 has no problems hitting the 60 fps vsync limit.

GLBenchmark 2.5 - Egypt Classic (Offscreen 1080p)

Remove vsync, render at 1080p and you see what the GPUs can really do. Here the iPhone 5 pulls ahead of the Adreno 320 based LG Optimus G and even slightly ahead of the new iPad.

Once again, looking at GLBenchmark's on-screen and offscreen Egypt tests we can get a good idea of how the iPhone 5 measures up to Apple's claims of 2x the GPU performance of the iPhone 4S:

Removing the clearly vsync limited result from the on-screen Egypt Classic test, the iPhone 5 performs about 2.26x the speed of the 4S. If we include that result in the average you're still looking at a 1.95x average. As we've seen in the past, these gains don't typically translate into dramatically higher frame rates in games, but games with better visual quality instead.

General Purpose Performance Increased Dynamic Range: Understanding the Power Profile of Modern SoCs
Comments Locked

276 Comments

View All Comments

  • themossie - Tuesday, October 16, 2012 - link

    The manufacturer's charger uses a set of pull-up resistors connected between the various USB lines, to indicate that the phone can pull maximum current. Unfortunately, every manufacturer (and sometimes different phones) use different resistances.

    See http://electronics.stackexchange.com/questions/144... for a brief writeup.

    For what it's worth, I've only had this problem with iDevices and the HP Touchpad. I own circa-2011+ HTC, Motorola and Samsung phones, and they all work fine with every charger. My Droid 2 Global was my primary work phone until a few months ago, and works great with every charger. Not sure why your wife is having problems there.
  • crankerchick - Tuesday, October 16, 2012 - link

    "The non-LTE phones see a sharp drop in battery life. At least at 28nm the slower air interfaces simply have to remain active (and drawing power) for longer, which results in measurably worse battery life. Again, the thing to be careful of here is there's usually a correlation between network speed and how aggressive you use the device. With a workload that scaled with network speed you might see closer numbers between 3G and 4G LTE."

    Perhaps you all could devise a test for this? Something like, change your LTE and 3G tests, where you decrease the time between page loads for the LTE test, to simulate doing more browsing since the pages load faster? One data point on this, with a reasonably selected change in page load duration, would be very helpful now that we have this very interesting dynamic clearly visible.

    That said, as always, I appreciate the reviews presented here. Always thorough with lots of information to chew on beyond specs and "user opinion on user experience."

    Just wish the reviews didn't take so long, but they are always worth it in the end.
  • TofDriver - Tuesday, October 16, 2012 - link

    Thanks for this awesome article. Gigantic work, we'll worth the wait.
    I've learnt so much.
    Would still appreciate it as an ebook, even after the web reading!
    Seems like you're perfectionists who love to push limits... To me it does resonate with the team who designed the reviewed product.
  • name99 - Tuesday, October 16, 2012 - link

    "Another potential explanation is that the 3-wide front end allowed for better utilization of the existing two ALUs, although it's also unlikely that we see better than perfect scaling simply due to the addition of an extra decoder."

    Remember the standard numbers. On this type of integer code:
    1/6 instructions are branch
    1/6 instructions are store
    1/3 instructions are load
    1/3 instructions are ALU
    This means the usual first throttling point i cache access, if you can only do one load/store cycle.
    If you limit your cache to one op/cycle, it's generally not worth going beyond 2-wide --- too often you're waiting on the cache.
    Once you widen your cache (usually, at this stage, by allowing simultaneous read and write per cycle) three-wide makes sense.
    Each cycle now (on ideal and some sort of "average" idealized code) you can now do some sort of combination of half a branch, 1.5 load/stores, and 1 ALU. Meaning that 2 ALUs (as long as they are not overloaded and also handling some aspect of the load/store) is enough for now.
    [Of course things never work out quite this ideal --- you have burstiness in operation types, not to mention delays. But the compiler should try to schedule instructions to get this sort of average, and likewise the re-order queues will do what they can to shuffle things to this sort of average. 2ALUs helps with the bursts, 3ALUs is overkill.]

    So I would say the primary important change made to go to three-wide in a way that is not a waste of time was to convert the L1 cache to dual-ported, supporting simultaneous load & store per cycle.
  • jiffylube1024 - Tuesday, October 16, 2012 - link

    I have to commend the Anandtech team for the great review! It was a long wait, but well worth it. The info on anodizing, the "Swift" CPU @ 1.3 GHz, camera performance, etc. was worth waiting for. This article, in my eyes, is a culmination of the Anandtech team's knowledge in the tech industry - deconstructing A6 to figure out what it's made of, discussing Apple's manufacturing capabilities, etc. Very informative and well written!

    I am always amazed at how many complaints (and petty platform wars) get exposed on the comment board. I certainly appreciate them when an article is poorly written, contains false information or outright lies, but with an article like this, the comments section seems shy of the effusive praise it deserves!
    ------

    On a slight tangent, I've enjoyed the first 8 Anandtech podcasts as well, and I have to say that I look forward to more non-iPhone related disucssion on future podcasts. The information was much appreciated, but for a tech site as broad as Anandtech, the first 8 podcasts have been VERY iPhone heavy in their content! Keep up the good work.
  • jamyryals - Thursday, October 18, 2012 - link

    I think you're right, it has been iPhone heavy, but the start of the podcast kind of lined up with the launch/review process. Let's be honest, it is a huge selling high quality device and it's treated as such. I have a feeling Brian and Anand will have a lot to say about all of the impending Nexii/WP8 when they come out this quarter.
  • krumme - Tuesday, October 16, 2012 - link

    Good to see reviewers apreciation of low light capabilities for the BSI sensor, reflecting real world usage. Oposite to a lot of uninformed stupid reviews on the net.

    Its exactly the same practical difference between s2 and s3 cameras. Big difference for real usage.

    All the mpix race must stop now. 8M is way to much for the quality anyway.
  • Zanegray - Tuesday, October 16, 2012 - link

    I love the level of analysis and attention to detail. Keep it up!
  • mrdude - Tuesday, October 16, 2012 - link

    Wow, what an article. Really fantastic read. The lengths you guys have gone to in this review is stunning, frankly. Well done. Although I'm no Apple fanatic, I must say that this is one of the better articles I've read on AT :)
  • Dennis Travis - Tuesday, October 16, 2012 - link

    Totally outstanding review. You guys covered everything. Thanks so much!

Log in

Don't have an account? Sign up now