GPU Performance Using Unreal Engine 3

In our iPad 2 review I called the PowerVR SGX 543MP2 Apple's gift to game developers. Apple claimed the A5 offered roughly 9x the raw GPU compute power of the A4. The increase came through more execution resources and a higher GPU clock. The A5 in the iPhone 4S gets the same GPU, simply clocked lower than the iPad 2 version. Apple claims the iPhone 4S can deliver up to 7x the GPU performance of the iPhone 4, down from 9x in the iPad 2 vs. iPad 1 comparison. Why the delta?

The iPad 2 has both a larger battery and a higher resolution display. The iPad 2 has 28% more pixels to drive than the iPhone 4S, and 9x vs. 7x works out to roughly the same 28% increase. The lower clocked GPU goes along with the lower clocked CPU in the 4S' version of the A5 to keep power consumption in check, and because the phone, with its lower resolution display, doesn't need as much performance as the iPad 2.
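The pixel math is easy to check. Below is a quick Python sketch using the native resolutions of the two displays (1024x768 for the iPad 2, 960x640 for the iPhone 4S; these figures aren't stated above). The two ratios agree to within about half a percentage point.

```python
# Native display resolutions (not given in the text above)
IPAD2_PIXELS = 1024 * 768      # 786,432 pixels
IPHONE4S_PIXELS = 960 * 640    # 614,400 pixels

pixel_ratio = IPAD2_PIXELS / IPHONE4S_PIXELS   # extra pixels the iPad 2 must drive
claim_ratio = 9 / 7                            # Apple's 9x (iPad 2) vs 7x (iPhone 4S) claims

print(f"iPad 2 pixels vs iPhone 4S: +{pixel_ratio - 1:.1%}")   # +28.0%
print(f"9x vs 7x GPU claim:         +{claim_ratio - 1:.1%}")   # +28.6%
```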

Mobile SoC GPU Comparison
| | Adreno 225 | PowerVR SGX 540 | PowerVR SGX 543 | PowerVR SGX 543MP2 | Mali-400 MP4 | GeForce ULP | Kal-El GeForce |
|---|---|---|---|---|---|---|---|
| SIMD Name | - | USSE | USSE2 | USSE2 | Core | Core | Core |
| # of SIMDs | 8 | 4 | 4 | 8 | 4 + 1 | 8 | 12 |
| MADs per SIMD | 4 | 2 | 4 | 4 | 4 / 2 | 1 | ? |
| Total MADs | 32 | 8 | 16 | 32 | 18 | 8 | ? |
| GFLOPS @ 200MHz | 12.8 | 3.2 | 6.4 | 12.8 | 7.2 | 3.2 | ? |
| GFLOPS @ 300MHz | 19.2 | 4.8 | 9.6 | 19.2 | 10.8 | 4.8 | ? |
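The GFLOPS rows follow directly from the MAD counts: a multiply-add counts as two floating-point operations, so peak throughput is total MADs × 2 × clock. Below is a minimal Python sketch reproducing the table's figures. Kal-El GeForce is omitted because its MAD count is unknown, and `peak_gflops` is a name made up for this example.

```python
def peak_gflops(total_mads: int, clock_mhz: float) -> float:
    """Peak shader throughput: each MAD (multiply-add) counts as 2 FLOPs per clock."""
    flops_per_clock = total_mads * 2
    return flops_per_clock * clock_mhz / 1000.0   # MHz -> GHz gives GFLOPS

# Total MADs per clock, taken from the comparison table
gpus = {
    "Adreno 225": 32,
    "PowerVR SGX 540": 8,
    "PowerVR SGX 543": 16,
    "PowerVR SGX 543MP2": 32,
    "Mali-400 MP4": 18,   # 4 SIMDs x 4 MADs + 1 SIMD x 2 MADs, per the table
    "GeForce ULP": 8,
}

for name, mads in gpus.items():
    print(f"{name}: {peak_gflops(mads, 200):.1f} GFLOPS @ 200MHz, "
          f"{peak_gflops(mads, 300):.1f} GFLOPS @ 300MHz")
```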

GLBenchmark continues to be our go-to benchmark for GPU performance under iOS. While there are other reputable 3D benchmarks, GLBenchmark remains the only good cross-platform (iOS and Android) solution we have today.

The performance gains live up to Apple's expectations (Update: our original 4S results for Egypt/Pro were incorrect. We had two sets of graphs, one internal and one external - the latter had incorrect data. We have since updated the charts to reflect the 4S' actual performance. Sorry for the mixup!):

GLBenchmark 2.1 - Egypt - Offscreen

GLBenchmark 2.1 - Pro - Offscreen

GLBenchmark gets around vsync by rendering offscreen, so the 4S is allowed to run as fast as it can. Here we see a 6.46x higher frame rate compared to the iPhone 4.

It's obvious that GLBenchmark is designed first and foremost to be bound by shader performance rather than memory bandwidth; otherwise all of these performance increases would be capped at 2x, since that's the improvement in memory bandwidth from the 4 to the 4S. Note that we're clearly not overly bound by memory bandwidth in these tests even if we scale pixel count by 50%, which is hardly realistic. Most games won't be shader bound; they're more likely to be limited by memory bandwidth.
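To see why a bandwidth-bound workload would cap scaling near 2x, here's a minimal Amdahl-style sketch. It assumes (hypothetically) that frame time splits cleanly into a memory-bandwidth-bound portion and a shader-bound portion that don't overlap. The 2x bandwidth figure comes from the text. The 7x shader speedup is borrowed from Apple's overall GPU claim, and the bandwidth-bound fractions are made up for illustration. Real GPUs overlap these costs, so treat this as a rough bound rather than a prediction.

```python
def frame_speedup(bw_fraction: float, shader_speedup: float = 7.0,
                  bw_speedup: float = 2.0) -> float:
    """Amdahl-style toy model of iPhone 4 -> 4S frame rate scaling.

    The old frame time is normalized to 1.0 and split into a bandwidth-bound
    fraction and a shader-bound remainder; each part shrinks by its own speedup.
    """
    new_time = bw_fraction / bw_speedup + (1.0 - bw_fraction) / shader_speedup
    return 1.0 / new_time

# Hypothetical bandwidth-bound fractions, from purely shader bound to purely bandwidth bound
for f in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{f:.0%} bandwidth bound -> {frame_speedup(f):.2f}x")
```

Even a modest bandwidth-bound share pulls the result well below the shader speedup. That is consistent with the point above: scaling beyond 6x implies a workload that is mostly shader bound.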

At the iPhone 4S introduction Epic was on stage showing off Infinity Blade 2, which will have new visual enhancements only present on the 4S thanks to its faster GPU. Thus far Epic has been using GPU performance improvements to make its games look better and not necessarily run faster (although they do), since the target is playability on all platforms. What I wanted, however, was a true apples-to-apples comparison using Epic's engine, as it is arguably the best looking platform to develop iOS games on today.

Epic offers a free license to Unreal Engine 3 to anyone who wants to use it for non-commercial purposes. If you want to sell your UE3-based iOS game, you don't have to pay a large sum up front to license Epic's engine. Instead you toss Epic $99 and pay a 25% royalty on any revenue beyond the first $50K. It's a great deal for aspiring game developers since you get access to one of the best 3D engines around and don't need any additional startup capital to use it. If your game is a hit, Epic gets a cut, but you're still making money, so all is good in the world.
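The licensing terms above reduce to simple arithmetic. Here's a minimal sketch using only the figures given here: $99 up front, plus 25% of revenue beyond the first $50K. It ignores Apple's App Store cut, taxes and any other fine print in Epic's actual license. Treat the helper `owed_to_epic` (a name made up for this example) as illustrative only.

```python
UPFRONT_FEE_USD = 99            # up-front fee mentioned above
ROYALTY_RATE = 0.25             # 25% royalty...
ROYALTY_FREE_USD = 50_000       # ...only on revenue beyond the first $50K

def owed_to_epic(revenue_usd: float) -> float:
    """Total paid to Epic for a given amount of game revenue (simplified)."""
    royalty = ROYALTY_RATE * max(0.0, revenue_usd - ROYALTY_FREE_USD)
    return UPFRONT_FEE_USD + royalty

for revenue in (20_000, 50_000, 150_000):
    print(f"${revenue:,} revenue -> ${owed_to_epic(revenue):,.2f} to Epic")
```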

The process starts with UDK, the Unreal Development Kit. Epic offers a great deal of documentation on developing with UDK, making the whole process extremely easy. The freely available UDK can target Windows, Mac OS X and iOS. Unfortunately, if you want Android support you'll have to pay to license the dev kit. Given how successful Infinity Blade has been under iOS, I suspect this is a move partially designed to keep Apple happy. It's also possible the Android UE3 dev kit is simply not as far along as the iOS version.

Along with every UDK download, Epic now provides the full source code to its well known iOS Citadel demo. With access to Citadel's source code and Epic's excellent (and freely available) development tools I put together a real-world GPU test for iOS.


What's that? A frame counter in iOS? Huzzah!

The test shows us frame rate over the course of a flythrough of Epic's Citadel demo. This is simply the standard Citadel guided tour but with UE3's frame recording statistics enabled. Once again, UDK gave me the tools needed to accurately profile what was going on. For developers this is helpful when tuning an app's performance, but for me it provided the one thing I've been hoping for: average frame rate in a UE3 game for iOS.

The raw data looks like this, a graph of frame render times:


iPhone 4S frame time

You're looking at frame render time in ms, so lower numbers mean better performance. Notice how the iPhone 4S graph remains mostly flat for the majority of the benchmark run? That's because it's limited by vsync. At 60Hz the frame render time is capped at 16.7ms, which is approximately where the 4S' curve flattens out. The 4S could likely run through this demo even quicker (or maintain the same speed with a heavier graphical workload) if we had a way to disable vsync in iOS.
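The ~16.7ms floor falls straight out of the refresh rate. Here's a minimal sketch, assuming classic double-buffered vsync where a finished frame waits for the next 60Hz refresh (iOS's actual buffering scheme may differ). Under that model the displayed frame time rounds up to a whole number of refresh intervals. A GPU that could finish a frame in, say, 9ms still shows up at ~16.7ms, and one that just misses the deadline drops straight to ~33.3ms (30 fps). The function name is hypothetical.

```python
import math

REFRESH_HZ = 60
VSYNC_INTERVAL_MS = 1000.0 / REFRESH_HZ   # ~16.67 ms per refresh

def displayed_frame_time_ms(render_ms: float) -> float:
    """On-screen frame time under double-buffered vsync (assumed model):
    round the render time up to the next whole refresh interval."""
    intervals = max(1, math.ceil(render_ms / VSYNC_INTERVAL_MS))
    return intervals * VSYNC_INTERVAL_MS

for render_ms in (9.0, 16.0, 17.0, 25.0, 40.0):
    shown = displayed_frame_time_ms(render_ms)
    print(f"render {render_ms:4.1f} ms -> shown {shown:5.2f} ms ({1000 / shown:.0f} fps)")
```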


iPhone 4 frame time

On the iPhone 4 however, frame times are significantly higher - more than 2x on average. You also see significant spikes in frame time, indicating periods where the frame rate drops significantly. Not only does the 4S offer better average performance here but its performance is far more consistent, hugging vsync rather than wildly bouncing around.

The chart below summarizes the two graphs above by looking at the average frames rendered per second throughout the benchmark:

AnandTech UE3 Performance Test

The iPhone 4S averages 2.3x the frame rate of the iPhone 4 throughout our test. I believe this gives us a more realistic value than the 6x we saw in GLBenchmark. A major cause for the difference is the vsync limitations present in all iOS apps that render to the screen. On top of that, while we're obviously not completely limited by memory bandwidth, it's clear that memory bandwidth does play a larger role here than it does in GLBenchmark.
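Summarizing a frame-time trace as an average frame rate deserves some care. The usual approach is total frames divided by total elapsed time, rather than the mean of per-frame instantaneous fps, which over-weights fast frames. The sketch below assumes that approach. Whether UE3's frame recording stats compute their average the same way isn't stated here, and the sample frame times are invented for illustration, not taken from our runs.

```python
def average_fps(frame_times_ms: list[float]) -> float:
    """Average frame rate over a run: frames rendered / total elapsed seconds."""
    total_s = sum(frame_times_ms) / 1000.0
    return len(frame_times_ms) / total_s

def mean_instantaneous_fps(frame_times_ms: list[float]) -> float:
    """Naive alternative: mean of 1000/t per frame (over-weights fast frames)."""
    return sum(1000.0 / t for t in frame_times_ms) / len(frame_times_ms)

# Invented traces: one hugging vsync, one slower with spikes
steady = [16.7, 16.7, 16.7, 16.7, 16.7, 16.7]
spiky = [33.3, 33.3, 50.0, 33.3, 66.7, 33.3]

ratio = average_fps(steady) / average_fps(spiky)
print(f"steady: {average_fps(steady):.1f} fps, spiky: {average_fps(spiky):.1f} fps, ratio {ratio:.2f}x")
print(f"naive mean for spiky trace: {mean_instantaneous_fps(spiky):.1f} fps")
```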

The Citadel demo by default increases rendering quality on the iPhone 4, but a quick look at the game's configuration files didn't show any new features enabled for the 4S. Chances are the version of Citadel included with the UDK was built before the 4S was available. In other words, the 4 and 4S should be rendering the same workload in our benchmark. To confirm, I also grabbed a couple of screenshots to ensure the two devices were running at the same settings:


iPhone 4


iPhone 4S

This is actually the most stressful scene in the level; it causes even the 4S to drop below 30 fps. With the camera stationary in roughly the same position, I saw a 74% increase in performance on the 4S vs. the iPhone 4.

Most game developers still target the iPhone 3GS, but the 4S allows them to significantly ramp up image quality without any performance penalty. Because of the lower hardware target for most iOS games and forced vsync, I wouldn't expect to see 2x increases in frame rate for the 4S over the 4 in most games out today or in the near future. You can expect a smoother frame rate and better looking games if developers follow Epic's lead and simply enable more eye candy on the 4S.


199 Comments


  • metafor - Tuesday, November 1, 2011

    When you say power efficiency, don't you mean perf/W?

    I agree that perf/W varies depending on the workload, exactly as you explained in the article. However, the perf/W is what makes the difference in terms of total energy used.

    It has nothing to do with race-to-sleep.

    That is to say, if CPU B takes longer to go to sleep but has better perf/W, it would use less energy. In fact, I think this was what you demonstrated with your second example :)

    The total energy consumption is directly related to how power-efficient a CPU is. Whether it's a slow processor that runs for a long time or a fast processor that runs for a short amount of time; whichever one can process more instructions per second vs joules per second wins.

    Or, when you take seconds out of the equations, whichever can process more instructions/joule wins.

    Now, I assume you got this idea from one of Intel's people. The thing their marketing team usually forgets to mention is that when they say race-to-sleep is more power efficient, they're not talking about the processor, they're talking about the *system*.

    Take the example of a high-performance server. The DRAM array and storage can easily make up 40-50% of the total system power consumption.
    Let's then say we had two hypothetical CPU's with different efficiencies. CPU A being faster but less power efficient and CPU B being slower but more power efficient.

    The total power draw of DRAM and the rest of the system remains the same. And on top of that, the DRAM and storage can be shut down once the CPU is done with its processing job but must remain active (DRAM refreshed, storage controllers powered) while the CPU is active.

    In this scenario, even if CPU A draws more power processing the job compared to CPU B, the system with CPU B has to keep the DRAM and storage systems powered for longer. Thus, under the right circumstances, the system containing CPU A actually uses less overall energy because it keeps those power-hungry subsystems active for a shorter amount of time.

    However, how well this scenario translates into a smartphone system, I can't say. I suspect not as well.
  • Anand Lal Shimpi - Tuesday, November 1, 2011

    I believe we're talking about the same thing here :)

    The basic premise is that you're able to guarantee similar battery life, even if you double core count and move to a power hungry OoO architecture without a die shrink. If your performance gains allow your CPU/SoC to remain in an ultra low power idle state for longer during those workloads, the theoretically more power hungry architecture can come out equal or ahead in some cases.

    You are also right about platform power consumption as a whole coming into play. However, with the shift from LPDDR1 to LPDDR2, an increase in effective bandwidth and a number of other changes, it's difficult to deal with them independently.

    Take care,
    Anand
  • metafor - Tuesday, November 1, 2011

    "If your performance gains allow your CPU/SoC to remain in an ultra low power idle state for longer during those workloads, the theoretically more power hungry architecture can come out equal or ahead in some cases."

    Not exactly :) The OoOE architecture has to perform more tasks per joule. That is, it has to have better perf/W. If it had worse perf/W, it doesn't matter how much longer it remains idle compared to the slower processor. It will still use more net energy.

    It's total platform power that may see savings, despite a less power-efficient and more power-hungry CPU. That's why I suspect that this "race to sleep" situation won't translate to the smartphone system.

    The entire crux relies on the fact that although the CPU itself uses more power per task, it saves power by allowing the rest of the system to go to sleep faster.

    But smartphone subsystems aren't that power hungry, and CPU power consumption generally increases with the *square* of performance. (This wasn't the case going from A8 to A9, but you can bet it will be going from A9 to A15.)

    If the increase in CPU power per task is greater than the savings of having the rest of the system active for shorter amounts of time, it will still be a net loss in power efficiency.

    Put it another way. A9 may be a general power gain over A8, but don't expect A15 to be so compared to A9, no matter how fast it finishes a task :)
  • doobydoo - Tuesday, November 1, 2011

    You are both correct, and you are also both wrong.

    Metafor is correct because any chip, given a set number of tasks to do over a fixed number of seconds, regardless of how much faster it can perform, will consume more energy than an equally power efficient but slower chip. In other words, being able to go to sleep quicker never means a chip becomes more power efficient than it was before. It actually becomes less.

    This is easily provable by splitting the energy into two parts. If 2 chips are equally power efficient (as in they can both perform the same number of 'tasks' per joule) and one is twice as fast, it will draw twice the power while working but finish in half the time, so that part will ALWAYS be equal for both chips. However, the chip that finished sooner then has to sit idle for LONGER, so the idle energy will always be higher for the faster chip. This assumes, as I said, that the idle power draw of both chips is equal.

    Anand is correct, because if you DO have a more power efficient chip with a higher maximum power draw, race-to-sleep is OFTEN (assuming reasonable idle times) the reason it can actually use less energy. Consider 2 chips. The first draws 1.3 W (max) and can carry out '2' tasks per second. The second draws 1 W (max) and can carry out '1' task per second (so it is less power efficient). Now consider a world without race-to-sleep. To carry out '10' tasks over a 10 second period, chip one would take 5 seconds but would remain at full power for the full 10 seconds, thereby using 13 J. Chip two would take 10 seconds and would use a total of 10 J over that period. Thus, the more power efficient chip actually proved less energy efficient.

    Now if we factor in race-to-sleep, the first chip draws 1.3 W for the first 5 seconds, then drops to 0.05 W for the last 5, consuming 6.75 J. The second chip would still consume the same 10 J.

    Conclusion:

    If the chip is not more power efficient, it can never consume less energy, with or without race-to-sleep. If the chip IS more power efficient but can't sleep, it may not use less energy in all scenarios.

    In other words, for a higher powered chip to reduce energy in ALL situations, it needs to (a) be fundamentally more power efficient and (b) be able to sleep (race-to-sleep).
  • djboxbaba - Monday, October 31, 2011

    Well done on the review Brian and Anand, excellent job as always. I was resisting the urge to tweet you about the ETA of the review, and of course I ended up doing it the same day you released the review :).
  • Mitch89 - Monday, October 31, 2011

    "This same confidence continues with the 4S, which is in practice completely usable without a case, unlike the GSM/UMTS iPhone 4. "

    Every time I read something like this, I can't help but compare it to my experience with iPhone 4 reception, which was never a problem. I'm on a very good network here in Australia (Telstra), and never did I have any issues with reception when using the phone naked. Calls in lifts? No problem. Way outside the suburbs and cities? Signal all the way.

    I never found the iPhone 4 to be any worse than other phones when I used it on a crappy network either.

    Worth noting, battery life is noticeably better on a strong network too...
  • wonderfield - Tuesday, November 1, 2011

    Same here. It's certainly possible to "death grip" the GSM iPhone 4 to the point where it's rendered unusable, but this certainly isn't the typical use case. For Brian to make the (sideways) claim that the 4 is unusable without a case is fairly disingenuous. Certainly handedness has an impact here, but considering 70-90% of the world is right-handed, it's safe to assume that 70-90% of the world's population will have few to no issues with the iPhone 4, given it's being used in an area with ample wireless coverage.
  • doobydoo - Tuesday, November 1, 2011

    I agree with both of these. I am in a major capital city which may make a difference, but no amount or technique of gripping my iPhone 4 ever caused dropped calls or stopped it working.

    Very much an over-stated issue in the press, I think
  • ados_cz - Tuesday, November 1, 2011

    It was not over-stated at all, and the argument that most people are right-handed doesn't hold up. I live in a small town in Scotland and my usual signal strength is about 2-3 bars. If I'm browsing the net on 3G without a case, holding the iPhone 4 naturally in my left hand (using the right hand for touch commands), I lose signal completely.
  • doobydoo - Tuesday, November 1, 2011

    Well the majority of people don't lose signal.

    I have hundreds of friends who have iPhone 4's who've never had any issue with signal loss at all.

    The point is you DON'T have to be 'right handed' for them to work, I have left handed friends who also have no issues.

    You're the exception, rather than the rule - which is why the issue was overstated.

    For what it's worth, I don't believe you anyway.
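The race-to-sleep exchange in the comments boils down to energy accounting (energy in joules = power in watts × seconds). Here's a minimal Python sketch under simplifying assumptions: constant active power while busy, a constant whole-platform idle power afterwards, and optionally a "rest of system" draw (DRAM, storage) that stays on only while the CPU is busy, as metafor describes. It reproduces doobydoo's figures (13 J, 6.75 J, 10 J). It then runs metafor's server-vs-smartphone point with hypothetical numbers chosen purely for illustration. With power-hungry subsystems the faster, less efficient CPU A wins at the system level, while with lean subsystems the more efficient CPU B does.

```python
def window_energy_j(active_w: float, busy_s: float, window_s: float,
                    idle_w: float, rest_w: float = 0.0) -> float:
    """Energy in joules (W x s) over a fixed window.

    active_w: CPU power while working
    busy_s:   time needed to finish the job
    idle_w:   platform power once the job is done (sleep, or full power if it can't sleep)
    rest_w:   other subsystems (DRAM, storage) kept awake only while the CPU is busy
    """
    assert 0 <= busy_s <= window_s
    return (active_w + rest_w) * busy_s + idle_w * (window_s - busy_s)

# doobydoo's example: 10 tasks in a 10 s window
chip1_busy = 10 / 2   # 2 tasks/s -> 5 s
chip2_busy = 10 / 1   # 1 task/s -> 10 s
chip1_no_sleep = window_energy_j(1.3, chip1_busy, 10, idle_w=1.3)   # 13 J
chip1_sleep = window_energy_j(1.3, chip1_busy, 10, idle_w=0.05)     # 6.75 J
chip2 = window_energy_j(1.0, chip2_busy, 10, idle_w=0.05)           # 10 J (never idles)

# metafor's system-level point, hypothetical numbers:
# CPU A: 2.0 W for 1 s (2 J); CPU B: 0.8 W for 2 s (1.6 J, better perf/W)
for rest_w in (1.5, 0.1):   # power-hungry server-like subsystems vs. lean phone-like ones
    sys_a = window_energy_j(2.0, 1.0, 2.0, idle_w=0.05, rest_w=rest_w)
    sys_b = window_energy_j(0.8, 2.0, 2.0, idle_w=0.05, rest_w=rest_w)
    print(f"rest of system {rest_w} W: A = {sys_a:.2f} J, B = {sys_b:.2f} J")
```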
