GL/DXBenchmark 2.7 & Final Words

While the 3DMark tests were all run at 720p, the GL/DXBenchmark results run at roughly 2.25x the pixel count: 1080p. We get a mixture of low level and simulated game benchmarks with GL/DXBenchmark 2.7, the former isn't something 3DMark offers across all platforms today. The game simulation tests are far more strenuous here, which should do a better job of putting all of this in perspective. The other benefit we get from moving to Kishonti's test is the ability to compare to iOS and Windows RT as well. There will be a 3DMark release for both of those platforms this quarter, we just don't have final software yet.

We'll start with the low level tests, beginning with Kishonti's fill rate benchmark:

GL/DXBenchmark 2.7 - Fill Test (Offscreen)

Looking at raw pixel pushing power, everything post Apple's A5 seems to have displaced NVIDIA's GeForce 6600. NVIDIA's Tegra 3 doesn't appear to be quite up to snuff with the NV4x class of hardware here, despite similarities in the architectures. Both ARM's Mali-T604 (Nexus 10) and ImgTec's PowerVR SGX 554MP4 (iPad 4) do extremely well here. Both deliver higher fill rate than AMD's Radeon HD 6310, and in the case of the iPad 4 are capable to delivering midrange desktop GPU class performance from 2004 - 2005.

Next we'll look at raw triangle throughput. The vertex shader bound test from 3DMark did some funny stuff to the old G7x based architectures, but GL/DXBenchmark 2.7 seems to be a bit kinder:

GL/DXBenchmark 2.7 - Triangle Throughput, Fragment Lit (Offscreen)

Here the 8500 GT definitely benefits from its unified architecture as it is able to direct all of its compute resources towards the task at hand, giving it better performance than the 7900 GTX. The G7x and NV4x based architectures unfortunately have limited vertex shader hardware, and suffer as a result. That being said, most of the higher end G7x parts are a bit too much for the current crop of ultra mobile GPUs. The midrange NV4x hardware however isn't. The GeForce 6600 manages to deliver triangle throughput just south of the two Tegra 3 based devices (Surface RT, Nexus 7).

Apple's iPad 4 even delivers better performance here than the Radeon HD 6310 (E-350).

ARM's Mali-T604 doesn't do very well in this test, but none of ARM's Mali architectures have been particularly impressive in the triangle throughput tests.

With the low level tests out of the way, it's time to look at the two game scenes. We'll start with the less complex of the two, Egypt HD:

GL/DXBenchmark 2.5 - Egypt HD (Offscreen)

Now we have what we've been looking for. The iPad 4 is able to deliver similar performance to the GeForce 7900 GS, and 7800 GT, which by extension means it should be able to outperform a 6800 Ultra in this test. The vanilla GeForce 6600 remains faster than NVIDIA's Tegra 3, which is a bit disappointing for that part. The good news is Tegra 4 should be somewhere around high-end NV4x/upper-mid-range G7x performance in this sort of workload. Again we're seeing Intel's HD 4000 do remarkably well here. I do have to caution anyone looking to extrapolate game performance from these charts. At best we know how well these GPUs stack up in these benchmarks, until we get true cross-platform games we can't really be sure of anything.

For our last trick, we'll turn to the insanely heavy T-Rex HD benchmark. This test is supposed to tide the mobile market over until the next wave of OpenGL ES 3.0 based GPUs take over, at which point GL/DXBenchmark 3.0 will step in and keep everyone's ego in check.

GL/DXBenchmark 2.7 - T-Rex HD (Offscreen)

T-Rex HD puts the iPad 4 (PowerVR SGX 554MP4) squarely in the class of the 7800 GT and 7900 GS. Note the similarity in performance between the 7800 GT and 7900 GS indicates the relatively independent nature of T-Rex HD when it comes to absurd amounts of memory bandwidth (relatively speaking). Given that all of the ARM platforms south of the iPad 4 line have less than 12.8GB/s of memory bandwidth (and those are the platforms these benchmarks were designed for), a lack of appreciation for the 256-bit memory interfaces on some of the discrete cards is understandable. Here the 7900 GTX shows a 50% increase in performance over the 7900 GS. Given the 62.5% advantage the GTX holds in raw pixel shader performance, the advantage makes sense.

The 8500 GT's leading performance here is likely due to a combination of factors. Newer drivers, a unified shader architecture that lines up better with what the benchmark is optimized to run on, etc... It's still remarkable how well the iPad 4's A6X SoC does here as well as Qualcomm's Snapdragon 600/Adreno 320. The latter is even more impressive given that it's constrained to the power envelope of a large smartphone and not a tablet. The fact that we're this close with such portable hardware is seriously amazing.

At the end of the day I'd say it's safe to assume the current crop of high-end ultra mobile devices can deliver GPU performance similar to that of mid to high-end GPUs from 2006. The caveat there is that we have to be talking about performance in workloads that don't have the same memory bandwidth demands as the games from that same era. While compute power has definitely kept up (as has memory capacity), memory bandwidth is no where near as good as it was on even low end to mainstream cards from that time period. For these ultra mobile devices to really shine as gaming devices, it will take a combination of further increasing compute as well as significantly enhancing memory bandwidth. Apple (and now companies like Samsung as well) has been steadily increasing memory bandwidth on its mobile SoCs for the past few generations, but it will need to do more. I suspect the mobile SoC vendors will take a page from the console folks and/or Intel and begin looking at embedded/stacked DRAM options over the coming years to address this problem.


Choosing a Testbed CPU & 3DMark Performance


View All Comments

  • Wilco1 - Friday, April 5, 2013 - link

    Yes frequency still matters. Surface RT looks bad because MS chose the lowest frequency. If they had used the 1.7GHz Tegra 3 instead then Surface RT would look a lot more competitive just because of the frequency.

    So my point stands and is confirmed by your link: at similar frequencies Tegra 3 beats the Z-2760 even on SunSpider.
  • tech4real - Friday, April 5, 2013 - link

    but why do we have to compare them at similar frequencies? one of atom's strength is working at high freq within thermal budget. If tegra 3 can't hit 2GHz within power budget, it's nvidia/arm's problem. why should atom bother to downclock itself. Reply
  • Wilco1 - Friday, April 5, 2013 - link

    There is no need to clock the Atom down - typical A9-based tablets are at 1.6 or 1.7GHz. Yes an Z-2760 beats a 1.3GHz Tegra 3 on SunSpider, but that's not true for Cortex-A9's used today (Tegra 3 goes up to 1.7GHz, Exynos 4 does 1.6GHz), let alone future ones. So it's incorrect to claim that Atom is generally faster than A9 - that implies Atom has an IPC advantage (which it does not have - it only wins if it has a big frequency advantage). I believe MS made a mistake by choosing the slowest Tegra 3 for Surface RT as it gives RT as well as Tegra a bad name - hopefully they fix this in the next version.

    Beating an old low clocked Tegra 3 on performance/power is not all that difficult, however beating more modern SoCs is a different matter. Pretty much all ARM SoCs are already at 28 or 32nm, while Tegra 3 is still 40nm. That will finally change with Tegra 4.
  • tech4real - Sunday, April 7, 2013 - link

    Based on this anand article
    the linearly projected 1.7GHz Tegra 3 specint2000 score is about 1.12, while the 1.8Ghz atom stands at 1.20, so the gap is still there. If you consider 2GHz atom turbo case, we can argue the gap is even wider. Of course since this specint data is provided by intel, we have to take it with a grain of salt, but i think the general idea has its merit.
  • Wilco1 - Monday, April 8, 2013 - link

    Those are Intel marketing numbers indeed - Intel uses compiler tricks to get good SPEC results, and this doesn't translate to real world performance or help Atom when you use a different compiler (Android uses GCC, Windows uses VC++).

    Geekbench gives a better idea of CPU performance:

    A 1.6GHz Exynos 4412 soundly thrases the Z-2760 at 1.8GHz on the integer, FP and memory performance. Atom only wins the Stream test. Before you go "but but Atom has only 2 cores!", it has 4 threads, so it is comparable with 4 cores, and in any case it loses all but 3 single thread benchmarks despite having a 12.5% frequency advantage.

    There are also several benchmark runs by Phoronix which test older Atoms against various ARM SoCs using the same Linux kernel and GCC compiler across a big test suite of benchmarks which come to the same conclusion. This is what I base my opinion of, not some Intel marketing scores blessed by Anand or some rubbish Javascript benchmark.
  • tech4real - Wednesday, April 10, 2013 - link

    Cross ISA cross platform benchmarking is a daunting task to be done fairly, or at least trying to :-)
    SPEC benchmark has established its position after many years of tuning, and I think most people would prefer using it to gauge processor performance. If samsung or nvidia believe they can do a better job to showcase their CPUs than Intel(which I totally expect they could do better, after all it doesn't make sense for intel to spend time tuning its competitors' products), they can publish their SPEC scores. However in the absence of that, it's very hard to argue samsung/nvidia/arm has a better performing product. Remember "the worst way to lose a fight is by not showing up".
    I don't have much knowledge of these new benchmark suites, and they may well be decent, but it takes time to mature and gain professional acceptance.
    A past example of taking hobby benchmarks on face value too seriously is: back in early 2011, nvidia showed a tegra 3 is performing on the same level as(or faster than?) core 2 duo T7200 under CoreMarks. Needless to say, now we all know tegra 3 in real life is around atom level performance. This shows there is a reason we have and need a benchmark suite like SPEC.
  • Wilco1 - Sunday, April 14, 2013 - link

    SPEC is hardly used outside high-end server CPUs (it's difficult to even run SPEC on a mobile phone due to memory and storage constraints). However the main issue is that Intel has tuned its compiler to SPEC, giving it an unfair advantage. Using GCC results in a much lower score. The funny thing is, GCC typically wins on real applications (I know because I have done those comparisons). That makes Intel's SPEC scores useless as an indication of actual CPU speed in real world scenarios. Yes, ARM, NVidia, Samsung etc could tune GCC in the same way by pouring in 10's of millions over many years (it really takes that much effort). But does it really make sense to use compiler tricks to pretend you are faster?

    The NVidia T7200 claim was based on a public result on the EEMBC website that was run by someone else. It used an old GCC version with non-optimal settings. However for the Tegra score they used a newer version of GCC, giving it an unfair advantage. Same thing as with Intel's SPEC scores... This shows how much CPU performance is affected by compiler and settings.
  • theduckofdeath - Thursday, April 4, 2013 - link

    That is not true. A few months ago Anandtech themselves made a direct comparison between the Tegra 3 in the Surface tablet and an Atom processor, and the Atom beat the Tegra 3 both on performance and power efficiency. Reply
  • Wilco1 - Friday, April 5, 2013 - link

    I was talking about similar frequencies - did you read what I said? Yes the first Surface RT is a bit of a disappointment due to the low clocked Tegra 3, but hopefully MS will use a better SoC in the next version. Tegra 4(+) or Exynos Octa would make it shine. We can then see how Atom does against that. Reply
  • SlyNine - Saturday, April 6, 2013 - link

    Nobody cares if the frequencies are different, if one performs better and uses less power that's a win; REGARDLESS OF FREQUENCY.

    Give one good reason, that matters to the consumer and manufacture, for frequencies being an important factor.

Log in

Don't have an account? Sign up now