The Apple iPad 2 Review
by Brian Klug, Anand Lal Shimpi & Vivek Gowri on March 19, 2011 8:01 PM EST
The GPU: Apple's Gift to Game Developers
The GPU side of the A5 is really what's most exciting. As we mentioned in our iPad 2 GPU Performance analysis, the A5 includes a dual-core PowerVR SGX 543 - also known as the SGX 543MP2. In our earlier article we showed the SGX 543MP2 easily beating both an iPad 1 and the Tegra 2 based Motorola Xoom.
To understand why the SGX 543MP2 has such a performance advantage we need to first remember that NVIDIA's Tegra 2 is nearly a year late. NVIDIA's first competitive ultra mobile GPU was supposed to be shipping in products in the first half of 2010; instead it found itself shipping in 2011. While NVIDIA is good at designing GPUs, it's not so good that it can release a product and maintain a two year performance advantage over the competition. Let's look at the architecture, shall we?
NVIDIA's Tegra 2 features a DirectX 9-class GPU. NVIDIA used to call it the GeForce ULP (Ultra Low Power) but now it's just GeForce. As a DX9 class GPU we're dealing with a conventional, non-unified shader architecture. While all OpenGL ES 2.0 GPUs can execute pixel and vertex shader instructions, the GeForce in Tegra 2 runs pixel and vertex shaders on separate groups of hardware.
NVIDIA calls each pixel and vertex shader ALU a core. The Tegra 2 has four pixel shader cores and four vertex shader cores. The four pixel shader ALUs make up a single Vec4 and the same goes for the four vertex shader ALUs. NVIDIA wouldn't elaborate on what limitations exist when dispatching operations to the cores. All pixel shader operations happen at 20 bits of precision per component while all vertex shader operations happen at 32 bits per component.
Each core is capable of executing one multiply+add (MAD) operation per clock. Do the math and that works out to be a peak rate of 8 MADs per clock for the entire GPU. The maximum operating frequency for the Tegra 2 GeForce GPU is 300MHz, however device vendors may run the GPU at a lower frequency to save on power. At 300MHz this works out to be 4.8 GFLOPS (counting a MAD as two FLOPs).
Imagination Technologies' PowerVR SGX 543MP2 is fundamentally a bigger GPU than the GeForce in NVIDIA's Tegra 2. Let's go through the math.
The SGX 543 features four USSE2 pipes. This is a unified shader architecture, so both vertex and pixel shader code run on the same set of hardware. The benefit of this approach is better performance in lopsided situations, where you're running mostly vertex or mostly pixel shader code rather than a balance that's perfectly tailored to your architecture. The Tegra 2 will only run at peak efficiency if it encounters a mix of 50% vertex and 50% pixel shader code. The PowerVR SGX series, by contrast, never has to leave execution pipes idle because of the instruction mix.
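To make the unified versus split argument concrete, here is a toy throughput model in Python. It is only a sketch under simplifying assumptions of our own, not anything NVIDIA or Imagination has published: work is measured in abstract ALU-cycles, every ALU retires one unit of work per clock, and scheduling overhead is ignored. The function names are hypothetical.

```python
def split_utilization(vertex_fraction, vertex_alus=4, pixel_alus=4):
    """Fraction of ALUs kept busy when vertex and pixel work run on separate hardware.

    vertex_fraction is the share of total shader work that is vertex work (0.0 to 1.0).
    The default ALU counts mirror the 4 + 4 arrangement described for Tegra 2.
    """
    pixel_fraction = 1.0 - vertex_fraction
    total_alus = vertex_alus + pixel_alus
    # Each group can only chew through its own kind of work, so the slower
    # group decides how long one unit of total work takes.
    time_needed = max(vertex_fraction / vertex_alus, pixel_fraction / pixel_alus)
    # If every ALU could take any work, the same job would take 1 / total_alus.
    ideal_time = 1.0 / total_alus
    return ideal_time / time_needed


def unified_utilization(vertex_fraction):
    """Idealized unified design: any ALU can take either kind of work."""
    return 1.0


# Utilization of the split design across a few vertex/pixel mixes.
mixes = [0.0, 0.1, 0.25, 0.5, 0.75, 1.0]
split_results = {mix: split_utilization(mix) for mix in mixes}
```

In this model the split design only reaches full utilization at the 50/50 mix and falls to half when the workload is entirely one shader type, which is the behavior the paragraph above describes. Real hardware will not follow the model exactly.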
Each USSE2 pipe has a 4-wide vector ALU capable of cranking out 4 MADs per clock. Two of these pipes are enough to equal the peak throughput of what NVIDIA built in Tegra 2, but the PowerVR SGX 543 has four of them. As for the MP2? Go ahead and double that number again. The SGX 543MP2 is simply two 543s placed next to one another.
All of this works out to be 16 MADs per clock for the SGX 543 and 32 MADs per clock for the SGX 543MP2. At 200MHz that's 12.8 GFLOPS and at 250MHz we're talking about 16 GFLOPS.
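The arithmetic behind these figures is simple enough to write down. The short Python sketch below uses the same convention as the text (one MAD counted as two FLOPs); the function name `peak_gflops` is ours, and the results are theoretical peaks, not measured performance.

```python
def peak_gflops(simd_count, mads_per_simd, clock_mhz, flops_per_mad=2):
    """Theoretical peak throughput in GFLOPS.

    simd_count * mads_per_simd gives MADs per clock; each MAD is counted
    as two floating point operations (one multiply, one add).
    """
    mads_per_clock = simd_count * mads_per_simd
    return mads_per_clock * flops_per_mad * clock_mhz / 1000.0


# Tegra 2 GeForce: 8 cores, 1 MAD each, 300MHz maximum clock.
tegra2_peak = peak_gflops(simd_count=8, mads_per_simd=1, clock_mhz=300)

# SGX 543MP2: 8 USSE2 pipes (two 543s with four pipes each), 4 MADs per pipe.
mp2_at_200 = peak_gflops(simd_count=8, mads_per_simd=4, clock_mhz=200)
mp2_at_250 = peak_gflops(simd_count=8, mads_per_simd=4, clock_mhz=250)

# Ratio of the MP2 at its lowest expected clock to Tegra 2 at its highest.
advantage = mp2_at_200 / tegra2_peak
```

Plugging in the numbers above gives 4.8 GFLOPS for Tegra 2 at 300MHz, and 12.8 and 16 GFLOPS for the SGX 543MP2 at 200MHz and 250MHz, a peak advantage of roughly 2.7x even at the lower clock.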
Mobile SoC GPU Comparison
GPU | PowerVR SGX 530 | PowerVR SGX 535 | PowerVR SGX 540 | PowerVR SGX 543 | PowerVR SGX 543MP2 | GeForce ULP | Kal-El GeForce
SIMD Name | USSE | USSE | USSE | USSE2 | USSE2 | Core | Core
# of SIMDs | 2 | 2 | 4 | 4 | 8 | 8 | 12
MADs per SIMD | 2 | 2 | 2 | 4 | 4 | 1 | ?
Total MADs | 4 | 4 | 8 | 16 | 32 | 8 | ?
GFLOPS @ 200MHz | 1.6 GFLOPS | 1.6 GFLOPS | 3.2 GFLOPS | 6.4 GFLOPS | 12.8 GFLOPS | 3.2 GFLOPS | ?
GFLOPS @ 300MHz | 2.4 GFLOPS | 2.4 GFLOPS | 4.8 GFLOPS | 9.6 GFLOPS | 19.2 GFLOPS | 4.8 GFLOPS | ?
At its lowest expected clock speed, the 543MP2 already has over twice the compute power of the Tegra 2's GPU at its highest operating frequency. Add to that the likelihood that the A5 has more memory bandwidth than Tegra 2, plus the fact that the SGX 543MP2 is a tile based architecture with lower bandwidth requirements, and the performance numbers we talked about last time shouldn't be all that surprising.
The real competition for the SGX 543MP2 will be NVIDIA's Kal-El. That part is expected to ship on time and will feature a boost in core count: from 8 to 12. The ratio of pixel to vertex shader cores is not known at this point but I'm guessing it won't be balanced anymore. NVIDIA is promising 3x the GPU performance out of Kal-El so I suspect that we'll see an increase in throughput per core.
GPU Performance
Taken from our iPad 2 GPU Performance Preview:
As always we turn to GLBenchmark 2.0, a benchmark crafted by a bunch of developers who either have or had experience doing development work for some of the big dev houses in the industry. We'll start with some of the synthetics.
Over the course of PC gaming evolution we noticed a significant increase in geometry complexity. We'll likely see a similar evolution with games in the ultra mobile space, and as a result this next round of ultra mobile GPUs will seriously ramp up geometry performance.
Here we look at two different geometry tests amounting to the (almost) best and worst case triangle throughput measured by GLBenchmark 2.0. First we have the best case scenario - a textured triangle:
The original iPad could manage 8.7 million triangles per second in this test. The iPad 2? 29 million. An increase of over 3x. Developers with existing titles on the iPad could conceivably triple geometry complexity with no impact on performance on the iPad 2.
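To put those throughput numbers in per-frame terms, the Python sketch below converts triangles per second into a triangle budget per frame. The 30 fps target is an assumption chosen for illustration, and the helper name is hypothetical. These are synthetic peak rates, so a real game would see a smaller budget on both devices.

```python
def triangles_per_frame(triangles_per_second, frames_per_second):
    """Upper bound on triangles per frame at a given frame rate."""
    return triangles_per_second / frames_per_second


TARGET_FPS = 30  # assumed frame rate target, not a figure from the benchmark

# Textured triangle results quoted above, in triangles per second.
IPAD_TEXTURED = 8.7e6
IPAD2_TEXTURED = 29e6

ipad_budget = triangles_per_frame(IPAD_TEXTURED, TARGET_FPS)
ipad2_budget = triangles_per_frame(IPAD2_TEXTURED, TARGET_FPS)
```

Under that assumption the original iPad tops out at about 290,000 textured triangles per frame, while the iPad 2 has room for roughly 967,000.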
Now for the more complex case - a fragment lit triangle test:
The performance gap widens. While the PowerVR SGX 535 in the A4 could barely break 4 million triangles per second in this test, the PowerVR SGX 543MP2 in the A5 manages just under 20 million. There's just no competition here.
I mentioned an improvement in texturing performance earlier. The GLBenchmark texture fetch test puts numbers to that statement:
We're talking about nearly a 5x increase in texture fetch performance. This has to be due to more than an increase in the amount of texturing hardware. An improvement in throughput? Increase in memory bandwidth? It's tough to say without knowing more at this point.
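The multipliers quoted in this section can be reproduced directly from the raw GLBenchmark 2.0 scores. The Python sketch below does that for the three tests discussed so far; the dictionary layout and the `speedup` helper are ours, while the scores are taken from the results table.

```python
# (original iPad score, iPad 2 score), in the units GLBenchmark reports:
# kTriangles/s for the triangle tests, kTexels/s for the fill test.
RESULTS = {
    "Triangle test - textured": (8691.5, 29019.9),
    "Triangle test - textured, fragment lit": (4084.9, 19695.8),
    "Fill test - texture fetch": (179116.2, 890077.6),
}


def speedup(old_score, new_score):
    """How many times faster the new score is than the old one."""
    return new_score / old_score


speedups = {name: speedup(old, new) for name, (old, new) in RESULTS.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```

The textured triangle test comes out a little over 3.3x, the fragment lit test a little over 4.8x, and texture fetch just under 5x, in line with the figures quoted in the text.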
Apple iPad vs. iPad 2
Test | Apple iPad (PowerVR SGX 535) | Apple iPad 2 (PowerVR SGX 543MP2)
Array test - uniform array access | 3412.4 kVertex/s | 3864.0 kVertex/s
Branching test - balanced | 2002.2 kShaders/s | 11412.4 kShaders/s
Branching test - fragment weighted | 5784.3 kFragments/s | 22402.6 kFragments/s
Branching test - vertex weighted | 3905.9 kVertex/s | 3870.6 kVertex/s
Common test - balanced | 1025.3 kShaders/s | 4092.5 kShaders/s
Common test - fragment weighted | 1603.7 kFragments/s | 3708.2 kFragments/s
Common test - vertex weighted | 1516.6 kVertex/s | 3714.0 kVertex/s
Geometric test - balanced | 1276.2 kShaders/s | 6238.4 kShaders/s
Geometric test - fragment weighted | 2000.6 kFragments/s | 6382.0 kFragments/s
Geometric test - vertex weighted | 1921.5 kVertex/s | 3780.9 kVertex/s
Exponential test - balanced | 2013.2 kShaders/s | 11758.0 kShaders/s
Exponential test - fragment weighted | 3632.3 kFragments/s | 11151.8 kFragments/s
Exponential test - vertex weighted | 3118.1 kVertex/s | 3634.1 kVertex/s
Fill test - texture fetch | 179116.2 kTexels/s | 890077.6 kTexels/s
For loop test - balanced | 1295.1 kShaders/s | 3719.1 kShaders/s
For loop test - fragment weighted | 1777.3 kFragments/s | 6182.8 kFragments/s
For loop test - vertex weighted | 1418.3 kVertex/s | 3813.5 kVertex/s
Triangle test - textured | 8691.5 kTriangles/s | 29019.9 kTriangles/s
Triangle test - textured, fragment lit | 4084.9 kTriangles/s | 19695.8 kTriangles/s
Triangle test - textured, vertex lit | 6912.4 kTriangles/s | 20907.1 kTriangles/s
Triangle test - white | 9621.7 kTriangles/s | 29771.1 kTriangles/s
Trigonometric test - balanced | 1292.6 kShaders/s | 3249.9 kShaders/s
Trigonometric test - fragment weighted | 1103.9 kFragments/s | 3502.5 kFragments/s
Trigonometric test - vertex weighted | 1018.8 kVertex/s | 3091.7 kVertex/s
Swapbuffer Speed | 600 | 599
Enough with the synthetics - how much of an improvement does all of this yield in the actual GLBenchmark 2.0 game tests? Oh it's big.
Without AA, the Egypt test runs at 5.4x the frame rate of the original iPad. It's even 3.7x the speed of the Tegra 2 in the Xoom running at 1280 x 800 (granted that's an iOS vs. Android comparison as well).
With AA enabled the iPad 2 advantage grows to 7x. In a game with the complexity of the Egypt test the original iPad wouldn't be remotely playable while the iPad 2 could run it smoothly.
The Pro test is a little more reasonable, showing a 3 - 4x increase in performance compared to the original iPad:
While we weren't able to reach the 9x figure claimed by Apple (I'm not sure that you'll ever see 9x running real game code), a range of 3 - 7x in GLBenchmark 2.0 is more reasonable. In practice I'd expect something less than 5x but that's nothing to complain about.
189 Comments
Mike1111 - Sunday, March 20, 2011
Well, Anandtech is a site for geeks, but shouldn't you have at least mentioned how you think the iPad 2 could fit into the average person's life? People who don't "work" with PCs in their free time and who don't have a dedicated PC workflow?
Some thoughts regarding the review:
- I thought the glass was supposed to be from Asahi Glass (Dragontrail)?
- Okay, the Xoom can't play videos with b-frames without problems. But what h.264 videos can the iPad 2 play? Same as iPad? More? High-profile? Blu-ray class h.264 videos?
- I wish you could have gone more in-depth regarding the A5. Why is it so big compared to the Tegra2? How efficient does it work? What kind of video decoder/encoder are used? etc.
Zebo - Sunday, March 20, 2011
Nothin like the real things baby..... I have used a x201 tablet since April of 10 and it's the best investment I ever made. True outdoor viewable with upgraded outdoor IPS screen and 500 nits. true keyboard, true dual core processor, true work machine. I have ATT card to get internet and take it everywhere I go. I bet I travel more than Anand and it's the only way to fly.
tcool93 - Sunday, March 20, 2011
I don't even own the Ipad. Yet I do know for a fact there are at least two other browsers you can use with it besides Safari. The Atomic browser, and the Skyfire browser... both supporting tabs and supposedly are much better than Safari. Skyfire even has partial flash support, and viewing social network sites built in (twitter, facebook, etc). Both of those browsers have very good reviews also.
secretmanofagent - Sunday, March 20, 2011
I'm starting to play with iCab, but I don't have an iPad.
dagamer34 - Sunday, March 20, 2011
3rd party browsers unfortunately don't get the Javascript speedup built into iOS 4.3
tipoo - Sunday, March 20, 2011
The javascript engine is built into the browser. Of course they don't get the faster Safari engine, they aren't Safari. They use their own engines.
name99 - Sunday, March 20, 2011
Yes and no.
The current iOS will not allow third party apps to create code on the fly, so those browsers will not be able to use JIT'ing, even if they wanted to write a sophisticated javascript engine.
On the other hand, Apple is well aware of the limitations of their current browser tech and are actively working on ways to run different parts of the browsing code in different processes (for both performance --- multi-threading, non-blocked UI --- and security reasons), on both OSX and iOS.
When this effort comes to fruition, who knows how much of the underlying tech (in particular, in this case the ability to create code on the fly, perhaps in some sandboxed fashion) will be made available to devs?
Zebo - Sunday, March 20, 2011
You'll never even think about a slate tablet after that.
10 hrs battery
all windows apps
plays games
IPS screen (with upgrade)
can use as HTPC when on road
can publish this site effortlessly
I doubt you'll use another device besides your iphone
VivekGowri - Sunday, March 20, 2011
And I could buy three iPads for the same price. It simply isn't a valid comparison for the same reason the MB Air, Asus Slate, and other $1000+ devices aren't; not in the same category, not even in the same price range. It's like saying that after driving a Mercedes S-class, you'll never think about driving a Lotus Elise or Porsche Boxster ever again - it's not really a useful or valid comparison to make.
I don't doubt that the X220t is going to be an excellent, excellent device - fixes every problem I had with the X200/201t, goes back to the IPS display, and it's going to be pretty fast too. It looks pretty awesome, IMO. If I was in the market for a tablet PC (as opposed to a smartphone-based tablet), this and the ASUS Slate would be the only two I'd really look at - the ASUS is kind of like a cheaper version of the X220 except without the built-in keyboard.
snouter - Sunday, March 20, 2011
But I left it on a plane. What did I replace it with? An 11" MacBook Air. Honestly, it's no comparison. The Air can do so many things that the iPad could not. Tablets will stick around and find niche applications in lots of places, but I'd keep my eye on the super thin super light notebooks. BTW, the Air has a ULV Core 2 Duo 1.6GHz and will get Sandy Bridge in the next update. The processing power is far superior to the tablets and the netbooks. It's everything I wanted to do with my iPad, and it's a notebook when I need it to be. My main work laptop is still a 17" MacBook Pro, but none of these tablets, netbooks or ULV laptops are in competition with it. When the next Air comes out with a backlit keyboard and ULV Sandy Bridge, I'll be there.