GPU Performance Using Unreal Engine 3

In our iPad 2 review I called the PowerVR SGX 543MP2 Apple's gift to game developers. Apple boasted a roughly 9x improvement in raw GPU compute power in the move from the A4 to the A5, an increase that came from both more execution resources and a higher GPU clock. The A5 in the iPhone 4S gets the same GPU, simply clocked lower than the iPad 2 version. Apple claims the iPhone 4S can deliver up to 7x the GPU performance of the iPhone 4, down from 9x in the iPad 2 vs. iPad 1 comparison. Why the delta?

The iPad 2 has both a larger battery and a higher resolution display. The iPad 2 has 28% more pixels to deal with than the iPhone 4S, and 9x vs. 7x works out to a 28% difference as well. The lower GPU clock, like the lower CPU clock in the 4S' version of the A5, helps keep power consumption in check, and the phone simply doesn't need as much GPU performance as the iPad 2 with its higher resolution display.
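
For the curious, the arithmetic behind that 28% figure is simple enough. A quick back-of-the-envelope sketch (my own check, not anything Apple publishes):

```cpp
#include <cstdio>

int main() {
    // Display resolutions: iPad 2 (1024 x 768) vs. iPhone 4S (960 x 640)
    const double ipad2_pixels    = 1024.0 * 768.0;
    const double iphone4s_pixels = 960.0 * 640.0;

    // How many more pixels the iPad 2 has to push
    printf("Pixel count ratio: %.2fx\n", ipad2_pixels / iphone4s_pixels); // ~1.28x

    // Ratio of Apple's claimed GPU speedups: 9x (iPad 2) vs. 7x (iPhone 4S)
    printf("Claimed speedup ratio: %.2fx\n", 9.0 / 7.0);                  // ~1.29x
    return 0;
}
```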

Mobile SoC GPU Comparison

|                 | Adreno 225 | PowerVR SGX 540 | PowerVR SGX 543 | PowerVR SGX 543MP2 | Mali-400 MP4 | GeForce ULP | Kal-El GeForce |
|-----------------|------------|-----------------|-----------------|--------------------|--------------|-------------|----------------|
| SIMD Name       | -          | USSE            | USSE2           | USSE2              | Core         | Core        | Core           |
| # of SIMDs      | 8          | 4               | 4               | 8                  | 4 + 1        | 8           | 12             |
| MADs per SIMD   | 4          | 2               | 4               | 4                  | 4 / 2        | 1           | ?              |
| Total MADs      | 32         | 8               | 16              | 32                 | 18           | 8           | ?              |
| GFLOPS @ 200MHz | 12.8       | 3.2             | 6.4             | 12.8               | 7.2          | 3.2         | ?              |
| GFLOPS @ 300MHz | 19.2       | 4.8             | 9.6             | 19.2               | 10.8         | 4.8         | ?              |
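
The GFLOPS rows in the table above fall straight out of the MAD counts if you count each MAD (multiply-add) as two floating point operations per clock. A minimal sketch of that arithmetic - the 2-FLOPS-per-MAD convention is the only assumption here, and these are theoretical peaks rather than measured throughput:

```cpp
#include <cstdio>

// Peak GFLOPS = total MADs x 2 FLOPS per MAD (multiply + add) x clock (GHz)
double peak_gflops(int total_mads, double clock_mhz) {
    return total_mads * 2.0 * clock_mhz / 1000.0;
}

int main() {
    // PowerVR SGX 543MP2: 32 MADs, at the two reference clocks from the table
    printf("SGX 543MP2 @ 200MHz: %.1f GFLOPS\n", peak_gflops(32, 200.0)); // 12.8
    printf("SGX 543MP2 @ 300MHz: %.1f GFLOPS\n", peak_gflops(32, 300.0)); // 19.2

    // GeForce ULP (Tegra 2): 8 MADs
    printf("GeForce ULP @ 300MHz: %.1f GFLOPS\n", peak_gflops(8, 300.0)); // 4.8
    return 0;
}
```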

GLBenchmark continues to be our go-to guy for GPU performance under iOS. While there are other reputable 3D benchmarks, GLBench remains the only good cross-platform (iOS and Android) solution we have today.

The performance gains live up to Apple's expectations (Update: our original 4S results for Egypt/Pro were incorrect. We had two sets of graphs, one internal and one external - the latter had incorrect data. We have since updated the charts to reflect the 4S' actual performance. Sorry for the mixup!):

GLBenchmark 2.1 - Egypt - Offscreen

GLBenchmark 2.1 - Pro - Offscreen

GLBenchmark gets around vsync by rendering offscreen, so the 4S is allowed to run as fast as it can. Here the 4S delivers 6.46x the frame rate of the iPhone 4.

It's obvious that GLBenchmark is designed first and foremost to be bound by shader performance rather than memory bandwidth, otherwise all of these performance increases would be capped at 2x since that's the improvement in memory bandwidth from the 4 to the 4S. Note that even though the offscreen tests scale pixel count up by roughly 50% over the 4S' native resolution - hardly a realistic workload - we're clearly not overly bound by memory bandwidth here. Most games won't be shader bound; instead they should be more limited by memory bandwidth.
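
To put rough numbers on that: the offscreen tests render to what I believe is a 1280 x 720 target (an assumption on my part), which is about 50% more pixels than the 4S' native 960 x 640 panel, yet the measured speedup still sails well past the 2x memory bandwidth improvement:

```cpp
#include <cstdio>

int main() {
    // Offscreen render target (assumed 1280 x 720) vs. the 4S' native panel
    const double offscreen_pixels = 1280.0 * 720.0;
    const double native_pixels    = 960.0 * 640.0;
    printf("Offscreen vs. native pixels: %.2fx\n", offscreen_pixels / native_pixels); // 1.50x

    // If the test were purely memory bandwidth bound, the 4S couldn't scale
    // beyond its bandwidth advantage over the 4
    const double bandwidth_gain = 2.0;   // 4 -> 4S memory bandwidth improvement
    const double observed_gain  = 6.46;  // Egypt offscreen result above
    printf("Bandwidth-bound cap: %.1fx, observed: %.2fx\n", bandwidth_gain, observed_gain);
    return 0;
}
```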

At the iPhone 4S introduction Epic was on stage showing off Infinity Blade 2, which will include new visual enhancements present only on the 4S thanks to its faster GPU. Thus far Epic has been using GPU performance improvements to make its games look better rather than simply run faster (although they do that too), since the target is playability on all supported platforms. What I wanted, however, was a true apples-to-apples comparison using Epic's engine, as it's arguably the best looking engine available for iOS game development today.

Epic offers Unreal Engine 3 free of charge to anyone who wants to use it non-commercially. If you want to sell your UE3-based iOS game, you don't have to pay a large sum to license Epic's engine up front; instead you toss Epic $99 and pay royalties (25%) on any revenue beyond the first $50K. It's a great deal for aspiring game developers since you get access to one of the best 3D engines around without needing any additional startup capital. If your game is a hit Epic gets a cut, but you're still making money so all is good in the world.

The process starts with UDK, the Unreal Development Kit. Epic actually offers a great deal of documentation on developing using UDK, making the whole process extremely easy. The freely available UDK can target Windows, Mac OS X and iOS platforms. If you want Android support you'll have to pay to license the dev kit unfortunately. Given how successful Infinity Blade has been under iOS, I suspect this is a move partially designed to keep Apple happy. It's also possible the Android UE3 dev kit is simply not as far along as the iOS version.

Along with every UDK download, Epic now provides the full source code to its well known iOS Citadel demo. With access to Citadel's source code and Epic's excellent (and freely available) development tools I put together a real-world GPU test for iOS.


What's that? A frame counter in iOS? Huzzah!

The test shows us frame rate over the course of a flythrough of Epic's Citadel demo. This is simply the standard Citadel guided tour but with UE3's frame recording statistics enabled. Once again, UDK gave me the tools needed to accurately profile what was going on. For developers this is helpful in tuning an app's performance, but for me it delivered the one thing I've been hoping for: the average frame rate of a UE3 game on iOS.
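
UE3's stat system did all of the heavy lifting here, but the underlying mechanism is simple. Purely as an illustration - this is not Epic's API, just a generic sketch of what a per-frame timer has to do - it boils down to something like this:

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

// Illustrative sketch of a per-frame timer, in the spirit of UE3's frame
// recording stats. Not Epic's actual API.
class FrameTimeLog {
public:
    void BeginFrame() { start_ = Clock::now(); }
    void EndFrame() {
        using ms = std::chrono::duration<double, std::milli>;
        frame_ms_.push_back(ms(Clock::now() - start_).count());
    }
    void Dump() const {
        for (size_t i = 0; i < frame_ms_.size(); ++i)
            printf("frame %zu: %.2f ms\n", i, frame_ms_[i]);
    }
private:
    using Clock = std::chrono::steady_clock;
    Clock::time_point start_;
    std::vector<double> frame_ms_;   // one entry per rendered frame
};

int main() {
    FrameTimeLog log;
    for (int frame = 0; frame < 3; ++frame) {
        log.BeginFrame();
        // ... render the frame here ...
        log.EndFrame();
    }
    log.Dump();
    return 0;
}
```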

The raw data looks like this, a graph of frame render times:


iPhone 4S frame time

You're looking at frame render time in ms, so lower numbers mean better performance. Notice how the iPhone 4S graph remains mostly flat for the majority of the benchmark run? That's because it's limited by vsync. At 60Hz vsync puts a 16.7ms floor under frame time, which is approximately where the 4S' curve flattens out. The 4S could likely run through this demo even quicker (or maintain the same speed with a heavier graphical workload) if we had a way to disable vsync in iOS.
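
If you're wondering why the floor sits at exactly 16.7ms: with vsync enabled a finished frame still has to wait for the next display refresh, so effective frame time rounds up to a multiple of the refresh interval. A quick sketch of that relationship (my own illustration, not anything iOS exposes):

```cpp
#include <cmath>
#include <cstdio>

// With vsync, a completed frame waits for the next display refresh, so the
// effective frame time rounds up to a multiple of the refresh interval.
double vsync_frame_time_ms(double render_ms, double refresh_hz = 60.0) {
    const double interval_ms = 1000.0 / refresh_hz;              // 16.7ms at 60Hz
    return std::ceil(render_ms / interval_ms) * interval_ms;
}

int main() {
    printf("%.1f ms\n", vsync_frame_time_ms(9.0));   // GPU finishes early -> 16.7ms
    printf("%.1f ms\n", vsync_frame_time_ms(20.0));  // missed a refresh   -> 33.3ms
    return 0;
}
```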


iPhone 4 frame time

On the iPhone 4 however, frame times are significantly higher - more than 2x on average. You also see significant spikes in frame time, indicating periods where the frame rate drops significantly. Not only does the 4S offer better average performance here but its performance is far more consistent, hugging vsync rather than wildly bouncing around.

The chart below summarizes the two graphs above by looking at the average frames rendered per second throughout the benchmark:

AnandTech UE3 Performance Test

The iPhone 4S averages 2.3x the frame rate of the iPhone 4 throughout our test. I believe this gives us a more realistic value than the 6x we saw in GLBenchmark. A major cause of the difference is the vsync limit imposed on all iOS apps that render to the screen. On top of that, while we're obviously not completely limited by memory bandwidth, it's clear that memory bandwidth plays a larger role here than it does in GLBenchmark.
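
One note on the math: to turn frame time logs like the ones above into an average frame rate, you want total frames divided by total elapsed time, not the mean of per-frame instantaneous fps values (the latter overweights the fast frames). A minimal sketch, using made-up frame times rather than our measured data:

```cpp
#include <cstdio>
#include <vector>

// Average frame rate over a run = frames rendered / total elapsed time
double average_fps(const std::vector<double>& frame_ms) {
    double total_ms = 0.0;
    for (double t : frame_ms) total_ms += t;
    return total_ms > 0.0 ? 1000.0 * frame_ms.size() / total_ms : 0.0;
}

int main() {
    // Hypothetical frame times: one device hugging vsync, one bouncing around
    const std::vector<double> smooth = {16.7, 16.7, 16.7, 16.7};
    const std::vector<double> spiky  = {33.3, 50.0, 33.3, 40.0};
    printf("smooth: %.1f fps\n", average_fps(smooth)); // ~59.9 fps
    printf("spiky:  %.1f fps\n", average_fps(spiky));  // ~25.5 fps
    return 0;
}
```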

The Citadel demo by default increases rendering quality on the iPhone 4, but a quick look at the game's configuration files didn't show any new features enabled for the 4S. Chances are the version of Citadel included with the UDK was built prior to the 4S being available. In other words, the 4 and 4S should be rendering the same workload in our benchmark. To confirm I also grabbed a couple of screenshots to ensure the two devices were running at the same settings:


iPhone 4


iPhone 4S

This is actually the most stressful scene in the level; it causes even the 4S to drop below 30 fps. With the camera stationary in roughly the same position I saw a 74% increase in performance on the 4S vs. the iPhone 4.

Most game developers still target the iPhone 3GS, but the 4S allows them to significantly ramp up image quality without any performance penalty. Because of that lower hardware target and forced vsync, I wouldn't expect to see 2x increases in frame rate for the 4S over the 4 in most games out today or in the near future. What you can expect is a smoother frame rate and better looking games if developers follow Epic's lead and simply enable more eye candy on the 4S.

Comments

  • robco - Monday, October 31, 2011 - link

    I've been using the 4S from launch day and agree that Siri needs some work. That being said, it's pretty good for beta software. I would imagine Apple released it as a bonus for 4S buyers, but also to keep the load on their servers small while they get some real-world data before the final version comes in an update.

    The new camera is great. As for me, I'm glad Apple is resisting the urge to make the screen larger. The Galaxy Nexus looks nice, but the screen will be 4.65". I want a smartphone, not a tablet that makes phone calls. I honestly wouldn't want to carry something much larger than the iPhone and I would imagine I'm not the only one.

    Great review as always.
  • TrackSmart - Monday, October 31, 2011 - link

    I'm torn on screen size myself. Pocketable is nice. But I'm intrigued by the idea of a "mini-tablet" form factor, like the Samsung Galaxy Note with its 5.3" screen (1280x800 resolution) and almost no bezel. That's HUGE for a phone, but if it replaces a tablet and a phone, and fits my normal pants pockets, it would be an interesting alternative. The pen/stylus is also intriguing. I'll be torn between the small form factor and the mini-tablet when I make my phone upgrade in the near future.

    To Anand and Brian: I'd love to see a review of the Samsung Galaxy Note. Maybe Samsung can send you a demo unit. It looks like a refined Dell Streak with a super-high resolution display and Wacom digitizer built in. Intriguing.
  • Rick83 - Wednesday, November 2, 2011 - link

    That's why I got an Archos 5 two years ago. And what can I say? It works.

    Sadly the Note is A) three times as expensive as the Archos,
    B) not yet on Android 4,
    and C) codec support will suck compared to the Archos, and I'm pretty sure Samsung won't release an open bootloader like Archos does.

    I'm hoping that Archos will soon release a refresh of their smaller tablets based on OMAP 4 and Android 4.
    Alternatively, and equally as expensive as the Note, there's the Sony dual-screen tablet. Looks interesting, but the same caveats apply....
  • kylecronin - Monday, October 31, 2011 - link

    > It’s going to be a case by case basis to determine which 4 cases that cover the front of the display work with the 4S.

    Clever
  • metafor - Monday, October 31, 2011 - link

    "Here we have two hypothetical CPUs, one with a max power draw of 1W and another with a max power draw of 1.3W. The 1.3W chip is faster under load but it draws 30% more power. Running this completely made-up workload, the 1.3W chip completes the task in 4 seconds vs. 6 for its lower power predecessor and thus overall power consumed is lower. Another way of quantifying this is to say that in the example above, CPU A does 5.5 Joules of work vs. 6.2J for CPU B."

    The numbers are off. 4 seconds vs 6 seconds isn't 30% faster. Time-to-complete is the inverse of clockspeed.

    Say a task takes 100 cycles. It would take 1 second on a 100Hz, 1 IPC CPU and 0.77 seconds on a 130Hz, 1 IPC CPU. This translates to 4.62 sec if given a task that takes 600 cycles of work (6 sec on the 100Hz, 1 IPC CPU).

    Or 1W * 6s = 6J = 1.3W * 4.62s

    Exactly the same amount of energy used for the task.
  • Anand Lal Shimpi - Monday, October 31, 2011 - link

    Err sorry, I should've clarified. For the energy calculations I was looking at the entire period of time (10 seconds) and assumed CPU A & B have the same 0.05W idle power consumption.

    Doing the math that way you get 1W * 6s + 0.05W * 4s = 6.2J (CPU B)

    and

    1.3W * 4s + 0.05W * 6s = 5.5J (CPU A)
  • metafor - Monday, October 31, 2011 - link

    Erm, that still presents the same problem. That is, a processor running at 130% the clockspeed will not finish in 4 seconds, it will finish in 4.62s.

    So the result is:

    1W * 6s + 0.05W * 4s = 6.2J (CPU B)
    1.3W * 4.62s + 0.05W * 5.38s = 6.275J (CPU A)

    There's some rounding error there. If you use whole numbers, say 200Hz vs 100Hz:

    1W * 10s + 0.05W * 10s = 10.5J (CPU B running for 20s with a task that takes 1000 cycles)

    2W * 5s + 0.05W * 15s = 10.75J (CPU A running for 20s with the same 1000 cycle task)
  • Anand Lal Shimpi - Monday, October 31, 2011 - link

    I wasn't comparing clock speeds, you have two separate processors - architectures unknown, 100% hypothetical. One draws 1.3W and completes the task in 4s, the other draws 1W and completes in 6s. For the sake of drawing a parallel to the 4S vs 4 you could assume that both chips run at the same clock. The improvements are entirely architectural, similar to A5 vs. A4.

    Take care,
    Anand
  • metafor - Tuesday, November 1, 2011 - link

    In that case, the CPU that draws 1.3W is more power efficient, as it trades a 30% increase in power draw for *more* than a 30% performance increase.

    I absolutely agree that this is the situation with the A5 compared to the A4, but that has nothing to do with the "race to sleep" problem.

    That is to say, if CPU A finishes a task in 4s and CPU B finishes the same task in 6s, CPU A is more than 30% faster than CPU B; it has higher perf/W.
  • Anand Lal Shimpi - Tuesday, November 1, 2011 - link

    It is race to sleep though. The more power efficient CPU can get to sleep quicker (hurry up and wait is what Intel used to call it), which offsets any increases in peak power consumption. However, given the right workload, the more power efficient CPU can still use more power.

    Take care,
    Anand
