GL/DXBenchmark 2.7 & Final Words

While the 3DMark tests were all run at 720p, the GL/DXBenchmark results are run at 1080p: 2.25x the pixel count. GL/DXBenchmark 2.7 gives us a mixture of low level and simulated game benchmarks; the former isn't something 3DMark offers across all platforms today. The game simulation tests are far more strenuous here, which should do a better job of putting all of this in perspective. The other benefit of moving to Kishonti's test is the ability to compare to iOS and Windows RT as well. There will be a 3DMark release for both of those platforms this quarter; we just don't have final software yet.
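For reference, the move from 720p to 1080p works out to exactly 2.25x as many pixels per frame; a quick sketch of the arithmetic:

```python
# Pixel counts for the two offscreen render resolutions used in this comparison.
pixels_720p = 1280 * 720      # 921,600 pixels (3DMark offscreen tests)
pixels_1080p = 1920 * 1080    # 2,073,600 pixels (GL/DXBenchmark 2.7 offscreen tests)

print(pixels_1080p / pixels_720p)  # 2.25 -- each frame pushes 2.25x the pixels
```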

We'll start with the low level tests, beginning with Kishonti's fill rate benchmark:

GL/DXBenchmark 2.7 - Fill Test (Offscreen)

Looking at raw pixel pushing power, everything post Apple's A5 seems to have displaced NVIDIA's GeForce 6600. NVIDIA's Tegra 3 doesn't appear to be quite up to snuff with the NV4x class of hardware here, despite similarities in the architectures. Both ARM's Mali-T604 (Nexus 10) and ImgTec's PowerVR SGX 554MP4 (iPad 4) do extremely well here. Both deliver higher fill rate than AMD's Radeon HD 6310, and the iPad 4 is capable of delivering midrange desktop GPU-class performance from 2004 - 2005.
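If you want to sanity check fill rate figures yourself, peak theoretical fill rate is simply the number of pixels a GPU can write per clock multiplied by its core clock. The sketch below uses made-up pipe counts and clocks purely for illustration; it isn't the spec of any card in these charts:

```python
def peak_fill_rate_mpixels(pixels_per_clock: int, core_clock_mhz: float) -> float:
    """Theoretical peak fill rate in Mpixels/s: pixels written per clock x core clock."""
    return pixels_per_clock * core_clock_mhz

# Hypothetical examples: an 8-ROP desktop part at 400MHz vs. a 4-ROP mobile GPU at 500MHz.
print(peak_fill_rate_mpixels(8, 400))  # 3200.0 Mpixels/s
print(peak_fill_rate_mpixels(4, 500))  # 2000.0 Mpixels/s
```

Measured fill rates always come in below these theoretical peaks, which is exactly why an offscreen synthetic like this one is useful for comparing architectures.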

Next we'll look at raw triangle throughput. The vertex shader bound test from 3DMark did some funny stuff to the old G7x based architectures, but GL/DXBenchmark 2.7 seems to be a bit kinder:

GL/DXBenchmark 2.7 - Triangle Throughput, Fragment Lit (Offscreen)

Here the 8500 GT definitely benefits from its unified architecture as it is able to direct all of its compute resources towards the task at hand, giving it better performance than the 7900 GTX. The G7x and NV4x based architectures unfortunately have limited vertex shader hardware, and suffer as a result. That being said, most of the higher end G7x parts are a bit too much for the current crop of ultra mobile GPUs. The midrange NV4x hardware, however, isn't. The GeForce 6600 manages to deliver triangle throughput just south of the two Tegra 3 based devices (Surface RT, Nexus 7).
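To illustrate why the unified design helps, here's a toy model: a non-unified GPU can only ever apply its dedicated vertex units to a vertex-heavy test, while a unified GPU can (in principle) schedule all of its ALUs onto that work. Every number below is hypothetical and chosen only to make the point:

```python
# Toy model of a vertex-bound workload -- all unit counts, clocks and costs are hypothetical.
def vertex_rate_mverts(alus_on_vertex_work: int, clock_mhz: float, cycles_per_vertex: float) -> float:
    """Millions of vertices per second processed by the ALUs assigned to vertex work."""
    return alus_on_vertex_work * clock_mhz / cycles_per_vertex

# Non-unified part: only its 8 dedicated vertex shaders can touch vertex work.
print(vertex_rate_mverts(8, 650, 10))    # 520.0
# Unified part: all 16 of its ALUs can be scheduled onto vertex work, despite a lower clock.
print(vertex_rate_mverts(16, 450, 10))   # 720.0
```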

Apple's iPad 4 even delivers better performance here than the Radeon HD 6310 (E-350).

ARM's Mali-T604 doesn't do very well in this test, but none of ARM's Mali architectures have been particularly impressive in the triangle throughput tests.

With the low level tests out of the way, it's time to look at the two game scenes. We'll start with the less complex of the two, Egypt HD:

GL/DXBenchmark 2.5 - Egypt HD (Offscreen)

Now we have what we've been looking for. The iPad 4 is able to deliver similar performance to the GeForce 7900 GS and 7800 GT, which by extension means it should be able to outperform a 6800 Ultra in this test. The vanilla GeForce 6600 remains faster than NVIDIA's Tegra 3, which is a bit disappointing for that part. The good news is Tegra 4 should be somewhere around high-end NV4x/upper-mid-range G7x performance in this sort of workload. Again we're seeing Intel's HD 4000 do remarkably well here. I do have to caution anyone looking to extrapolate game performance from these charts. At best we know how well these GPUs stack up in these benchmarks; until we get true cross-platform games we can't really be sure of anything.

For our last trick, we'll turn to the insanely heavy T-Rex HD benchmark. This test is supposed to tide the mobile market over until the next wave of OpenGL ES 3.0 based GPUs takes over, at which point GL/DXBenchmark 3.0 will step in and keep everyone's ego in check.

GL/DXBenchmark 2.7 - T-Rex HD (Offscreen)

T-Rex HD puts the iPad 4 (PowerVR SGX 554MP4) squarely in the class of the 7800 GT and 7900 GS. Note that the similar performance of the 7800 GT and 7900 GS indicates T-Rex HD is relatively insensitive to absurd amounts of memory bandwidth (relatively speaking). Given that all of the ARM platforms south of the iPad 4 have less than 12.8GB/s of memory bandwidth (and those are the platforms these benchmarks were designed for), a lack of appreciation for the 256-bit memory interfaces on some of the discrete cards is understandable. Here the 7900 GTX shows a 50% increase in performance over the 7900 GS; given the 62.5% advantage the GTX holds in raw pixel shader performance, that scaling makes sense.
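That 62.5% figure is just pixel pipeline count multiplied by core clock. A back-of-the-envelope check (the 650MHz and 480MHz clocks are assumptions about the particular cards here; retail clocks vary):

```python
# Back-of-the-envelope pixel shader throughput: pixel pipes x core clock.
# Clocks are assumed (650MHz for the 7900 GTX, 480MHz for the 7900 GS); factory cards vary.
gtx = 24 * 650   # 15,600 "pipe-MHz"
gs = 20 * 480    #  9,600 "pipe-MHz"

print(f"{gtx / gs - 1:.1%}")  # 62.5% -- the advantage quoted above
```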

The 8500 GT's leading performance here is likely due to a combination of factors: newer drivers, a unified shader architecture that lines up better with what the benchmark is optimized to run on, and so on. It's still remarkable how well the iPad 4's A6X SoC does here, as does Qualcomm's Snapdragon 600/Adreno 320. The latter is even more impressive given that it's constrained to the power envelope of a large smartphone and not a tablet. The fact that we're this close with such portable hardware is seriously amazing.

At the end of the day I'd say it's safe to assume the current crop of high-end ultra mobile devices can deliver GPU performance similar to that of mid to high-end GPUs from 2006. The caveat there is that we have to be talking about performance in workloads that don't have the same memory bandwidth demands as the games from that same era. While compute power has definitely kept up (as has memory capacity), memory bandwidth is nowhere near as good as it was on even low-end to mainstream cards from that time period. For these ultra mobile devices to really shine as gaming devices, it will take a combination of further increasing compute as well as significantly enhancing memory bandwidth. Apple (and now companies like Samsung as well) has been steadily increasing memory bandwidth on its mobile SoCs for the past few generations, but it will need to do more. I suspect the mobile SoC vendors will take a page from the console folks and/or Intel and begin looking at embedded/stacked DRAM options over the coming years to address this problem.
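For anyone who wants to run the bandwidth numbers themselves, peak theoretical bandwidth is just the interface width times the effective transfer rate. The configurations below are illustrative rather than the specs of any particular product (the mobile example simply lands at the 12.8GB/s figure mentioned earlier):

```python
def peak_bandwidth_gbps(bus_width_bits: int, transfer_rate_mtps: float) -> float:
    """Peak theoretical memory bandwidth in GB/s: bytes per transfer x transfers per second."""
    return (bus_width_bits / 8) * transfer_rate_mtps / 1000

# A 64-bit mobile memory interface at an effective 1600 MT/s.
print(peak_bandwidth_gbps(64, 1600))    # 12.8 GB/s
# A generic 256-bit GDDR3 interface at an effective 1400 MT/s -- 2006-era discrete card territory.
print(peak_bandwidth_gbps(256, 1400))   # 44.8 GB/s
```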

 

Comments

  • pSupaNova - Sunday, April 7, 2013 - link

    You're not listening to what Wilco1 is saying.

    Microsoft used a poor Tegra 3 part; the HTC One X+ ships with a Tegra 3 clocked at 1.7GHz.

    So by comparing the Atom based tabs to the Surface RT, Anand puts the Intel chip in a much better light.
  • zeo - Tuesday, April 16, 2013 - link

    Incorrect, Wilco1 is ignoring the differences in the SoCs. The Tegra 3 is a quad core, and that means it can have up to 50% more performance than an equivalent dual core.

    The Clover Trail, meanwhile, is only a dual core... so while the clock speed may favor the ATOM, the number of cores favors the Tegra 3.

    It doesn't help that the ATOM still wins the run time tests as well, so overall efficiency is clearly in the ATOM's favor. And needing a quad core to beat a dual core still means the ATOM has better performance per core!

    Not that it matters much, as Intel is set to upgrade the ATOM to Bay Trail by the end of the year, which promises up to double the CPU performance (along with going up to quad cores) and triple the GPU performance compared to the present Clover Trail.

    While also going full 64bit and offering up to 8GB of RAM... Something ARM won't do till about the latter half of 2014 at the earliest, and Nvidia specifically won't do until the Tegra 6... with Tegra 4 yet to come out in actual products now...
  • nofumble62 - Friday, April 5, 2013 - link

    LTE is not available on the Intel platform yet; that is why they don't offer it in the US. But I heard the new Intel LTE chip is pretty good (won an award), so next year will be interesting.
    The ARM big cores suck up a lot of power when they are running. That is the reason the Qualcomm Snapdragon is winning in the latest Samsung S4 (over Samsung's own Exynos chip) and Nexus 7 (over Nvidia Tegra).
  • Spunjji - Friday, April 5, 2013 - link

    Nvidia Tegra's not really ready for the new Nexus 7, so it's not entirely fair to say it's out because of power issues. When you consider that the S4 situation you described isn't strictly true either (if I buy an S4 here in the UK it's going to have the Exynos chip in it), it tends to harm your conclusion a bit.
  • zeo - Tuesday, April 16, 2013 - link

    LTE will be introduced with the XMM 7160, which will be an optional addition to the Clover Trail+ series that's starting to come out now... Lenovo's K900 being one of the first design wins that has already been announced.

    MWC 2013 showed the K900 off with the 2GHz Z2580, which ups the graphics to a dual-core PowerVR SGX 544 at 533MHz... So they showcased it running some games and demos like Epic Citadel at the full 1080P and max FPS that demo allows.

    Only issue is the LTE is not integrated into the SoC... so won't be as power efficient as the other ARM solutions that are coming out with integrated LTE... at least for the LTE part...
  • WaltC - Friday, April 5, 2013 - link

    Unfortunately, that's not what this article delivers. It doesn't tell you a thing about current desktop gpu performance versus current ARM performance. What it does is tell you how obsolete cpus & gpus from roughly TEN YEARS AGO look against state-of-the-art cell-phone and iPad ARM running a few isolated 3DMark graphics tests. What a disappointment. Nobody's even using these desktop cpus & gpus anymore. All this article does is show you how poorly ARM-powered mobile devices do when stacked up against common PC technology from a decade ago! (That's assuming the 3DMark tests used here, such as they are, are actually representative of anything.) AH, if only he had simply used state-of-the-art desktops & cpus to compare with state-of-the-art ARM devices--well, the ARM stuff would have been crushed by such a wide margin it would astound most people. Why *would you* compare current ARM tech with decade-old desktop cpus & gpus? Beats me. Trying to make ARM look better than it has any right to look? Maybe in the future Anand will use a current desktop for his comparison, such as it is. Right now, the article provides no useful information--unless you like learning about really old x86 desktop technology that's been hobbled...;)

    To be fair, in the end Anand does admit that current ARM horsepower is roughly on a par with ~10-year-old desktop technology IF you don't talk about bandwidth or add it into the equation--in which case the ARMs don't even do well enough to stand up to 10-year-old commonplace cpu & gpu technology. So what was the point of this article? Again, beats me, as the comparisons aren't relevant because nobody is using that old desktop stuff anymore--they're running newer technology from ~5 years old to brand new--and it runs rings around the old desktop nVidia gpus Anand used for this article.

    BTW, and I'm sure Anand is aware of this, you can take DX11.1 gpus and run DX9-level software on them just fine (or OpenGL 3.x-level software, too.) Comments like this are baffling: "While compute power has definitely kept up (as has memory capacity), memory bandwidth is nowhere near as good as it was on even low-end to mainstream cards from that time period." What's "kept up" with what? It sure isn't ARM technology as deployed in mobile devices--unless you want to count reaching ~decade-old x86 "compute power" levels (sans real gpu bandwidth) as "keeping up." I sure wouldn't say that.

    Neither Intel nor AMD will be sitting still on the x86 desktop, so I'd imagine the current performance advantage (huge) of x86 over ARM will continue to hold if not grow even wider as time moves on. I think the biggest flaw in this entire article is that it pretends you can make some kind of meaningful comparisons between current x86 desktop performance and current ARM performance as deployed in the devices mentioned. You just can't do that--the disparity would be far too large--it would be embarrassing for ARM. There's no need for that because in mobile ARM cpu/gpu technology, performance is *not* king by a long shot--power conservation for long battery life is king in ARM, however. x86 performance desktops, especially those set up for 3D gaming, are engineered for raw horsepower first and every other consideration, including power conservation, second. That's why Apple doesn't use ARM cpus in Macs and why you cannot buy a desktop today powered by an ARM cpu--the compute power just isn't there, and no one wants to retreat 10-15 years in performance just to run an ARM cpu on the desktop. The forte for ARM is mobile-device use, and the forte for x86 power cpus is on the desktop (and no, I don't count Atom as a powerful cpu...;))
  • pSupaNova - Sunday, April 7, 2013 - link

    How is it embarrassing for ARM? 90% of consumers don't require the power of a desktop CPU for most of their computing needs.

    Mobile devices have taken the world by storm and have been able to increase their pixel pushing ability exponentially.

    No one is suggesting that mobile chips will suddenly catch their desktop brethren, but it is interesting to see that they are only three times slower than a typical CPU/discrete GPU combo of 2004!
  • zeo - Tuesday, April 16, 2013 - link

    That percentage would be much higher if you eliminated cloud support... the only reason they get away with not needing a lot of performance for the average person is because a lot is offloaded to run on the cloud instead of on the device.

    Apple's Siri for example runs primarily on Apple Servers!

    While some applications like augmented reality, voice control, and other developing features aren't widespread or developed enough to be a factor yet, when they are, performance requirements will skyrocket!

    People's needs may be small now, but they were even smaller before... so they're steadily increasing, though maybe not as quickly as historically. Never underestimate what people may need even just a few years from now.
  • Wolfpup - Friday, April 5, 2013 - link

    Yeah, I've been wanting to know more about these architectures and how they compare to PC components for ages! Nice article.
  • robredz - Sunday, April 7, 2013 - link

    It certainly puts things in perspective in terms of gaming on mobile platforms.
