For the past several days I've been playing around with Futuremark's new 3DMark for Android, as well as Kishonti's GLBenchmark and DXBenchmark 2.7. All of these tests are scheduled to be available on Android, iOS, Windows RT and Windows 8, giving us the beginning of something truly valuable: a set of benchmarks that let us roughly compare mobile hardware across (virtually) all OSes. The computing world is headed for convergence in a major way, and with benchmarks like these we'll be able to better track everyone's progress as the high-performance folks go low power and the low-power folks aim for higher performance.

The previous two articles I did on the topic focused on comparing smartphones to smartphones and tablets to tablets. What we've been lacking, however, is perspective. On the CPU side we've known how fast Atom was for quite a while. Back in 2008 I concluded that a 1.6GHz single-core Atom processor delivered performance similar to that of a 1.2GHz Pentium M, or a mainstream Centrino notebook from 2003. Higher clock speeds and a second core would likely push that performance forward by another year or two at most. Given that most of the ARM-based CPU competitors tend to be a bit slower than Atom, you could estimate that any of the current crop of smartphones delivers CPU performance somewhere in the range of a notebook from 2003-2005. Not bad. But what about graphics performance?

To find out, I went through my parts closet in search of GPUs from a similar time period. I needed cards that supported PCIe (to make testbed construction easier) and DirectX 9, which had me starting at 2004. I don't always keep everything I've ever tested, but I try to hold on to parts that might be of value in future comparisons. Rest assured that back in 2004-2007, I didn't think I'd be using these GPUs to put smartphone performance in perspective.

Here's what I dug up:

The Lineup (Configurations as Tested)

| GPU | Release Year | Pixel Shaders | Vertex Shaders | Core Clock | Memory Data Rate | Memory Bus Width | Memory Size |
|---|---|---|---|---|---|---|---|
| NVIDIA GeForce 8500 GT | 2007 | 16 (unified) | (unified) | 520MHz (1040MHz shader clock) | 1.4GHz | 128-bit | 256MB DDR3 |
| NVIDIA GeForce 7900 GTX | 2006 | 24 | 8 | 650MHz | 1.6GHz | 256-bit | 512MB DDR3 |
| NVIDIA GeForce 7900 GS | 2006 | 20 | 7 | 480MHz | 1.4GHz | 256-bit | 256MB DDR3 |
| NVIDIA GeForce 7800 GT | 2005 | 20 | 7 | 400MHz | 1GHz | 256-bit | 256MB DDR3 |
| NVIDIA GeForce 6600 | 2004 | 8 | 3 | 300MHz | 500MHz | 128-bit | 256MB DDR |
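
The table lists memory data rate and bus width rather than bandwidth. Theoretical peak bandwidth follows from the usual formula, effective data rate × bus width ÷ 8 bits per byte. A minimal sketch using the as-tested figures above (the dictionary and function names are just illustrative):

```python
# Theoretical peak memory bandwidth = effective data rate x bus width / 8.
# Figures are the "as tested" values from the lineup table.
CARDS = {
    # name: (effective memory data rate in GHz, memory bus width in bits)
    "GeForce 8500 GT": (1.4, 128),
    "GeForce 7900 GTX": (1.6, 256),
    "GeForce 7900 GS": (1.4, 256),
    "GeForce 7800 GT": (1.0, 256),
    "GeForce 6600": (0.5, 128),
}


def peak_bandwidth_gbs(data_rate_ghz: float, bus_width_bits: int) -> float:
    """Return theoretical peak memory bandwidth in GB/s."""
    return data_rate_ghz * bus_width_bits / 8


for name, (rate, width) in CARDS.items():
    print(f"{name:18s} {peak_bandwidth_gbs(rate, width):5.1f} GB/s")
```

By this measure the 7900 GTX has 6.4× the vanilla 6600's peak bandwidth (51.2GB/s vs. 8GB/s). These are theoretical peaks; achieved bandwidth in a real workload is lower.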

I wanted to toss in a GeForce 6600 GT, given just how awesome that card was back in 2004, but alas I had cleared out my stock of PCIe 6600 GTs long ago. I had an AGP 6600 GT, but using it would have ruined my ability to keep CPU performance in line with Surface Pro, so I had to resort to a vanilla GeForce 6600. Both core clock and memory bandwidth suffered as a result: the core clock on the base 6600 was only 300MHz compared to 500MHz for the GT, and memory bandwidth was cut in half by the slower DDR memory. What does make the vanilla GeForce 6600 very interesting, however, is that it delivered performance similar to that of a very famous card: the Radeon 9700 Pro (chip codename: R300). The Radeon 9700 Pro also had 8 pixel pipes, but 4 vertex shader units, and ran at 325MHz. The 9700 Pro did have substantially higher memory bandwidth, but since our only cross-platform benchmarks target bandwidth-constrained mobile hardware, we won't always see tons of memory bandwidth put to good use here.
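
One crude way to see why the 6600 and 9700 Pro land close together is to multiply pixel pipelines by core clock. This is a rough proxy under a stated assumption (comparable work per pipe per clock across the two architectures); it ignores ROP count, vertex units and memory bandwidth, where the 9700 Pro had a large lead:

```python
# Crude proxy: pixel pipelines x core clock ("pixel-pipe cycles per second").
# Assumes comparable per-pipe work per clock across NVIDIA and ATI designs,
# and ignores ROPs, vertex units and memory bandwidth.
def pixel_pipe_rate(pipes: int, core_clock_mhz: int) -> int:
    """Return pipes x clock in millions of pixel-pipe cycles per second."""
    return pipes * core_clock_mhz


geforce_6600 = pixel_pipe_rate(8, 300)     # vanilla GeForce 6600 as tested
radeon_9700_pro = pixel_pipe_rate(8, 325)  # Radeon 9700 Pro (R300)
ratio = geforce_6600 / radeon_9700_pro

print(f"GeForce 6600:    {geforce_6600} M pipe-cycles/s")
print(f"Radeon 9700 Pro: {radeon_9700_pro} M pipe-cycles/s")
print(f"6600 relative to 9700 Pro: {ratio:.1%}")
```

On this proxy the 6600 reaches about 92% of the 9700 Pro, which fits "similar performance" as long as memory bandwidth isn't the bottleneck.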

The 7800 GT and 7900 GS/GTX were included to show the impact of scaling up compute units and memory bandwidth, as their architectures aren't fundamentally all that different from the GeForce 6600's; they're just bigger and better. The 7800 GT in particular was exciting, as it delivered performance competitive with the previous-generation GeForce 6800 Ultra at a more attractive price point. Given that the 6800 Ultra was the cream of the crop in 2004, the 7800 GT's performance will be an important data point.
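
To put rough numbers on that scaling, here is a minimal sketch that multiplies shader unit counts by core clock for each G70-class card and divides by the GeForce 6600's figure, using the table values. It assumes per-unit throughput is the same across these related architectures, which is only an approximation:

```python
# Crude scaling model: shader throughput ~ unit count x core clock.
# Ignores per-unit ALU differences, ROPs, memory bandwidth and drivers.
BASELINE = ("GeForce 6600", 8, 3, 300)  # (name, pixel, vertex, core MHz)
G70_CARDS = [
    ("GeForce 7800 GT", 20, 7, 400),
    ("GeForce 7900 GS", 20, 7, 480),
    ("GeForce 7900 GTX", 24, 8, 650),
]


def shader_rate(units: int, clock_mhz: int) -> int:
    """Units x clock, in millions of unit-cycles per second."""
    return units * clock_mhz


def scaling_vs_baseline(card, baseline=BASELINE):
    """Return (name, pixel scaling, vertex scaling) relative to the baseline."""
    _, base_ps, base_vs, base_clk = baseline
    name, ps, vs, clk = card
    pixel = shader_rate(ps, clk) / shader_rate(base_ps, base_clk)
    vertex = shader_rate(vs, clk) / shader_rate(base_vs, base_clk)
    return name, pixel, vertex


for card in G70_CARDS:
    name, pixel, vertex = scaling_vs_baseline(card)
    print(f"{name}: {pixel:.2f}x pixel, {vertex:.2f}x vertex vs {BASELINE[0]}")
```

Even by this crude measure the 7900 GTX has about 6.5× the 6600's pixel shader rate, so the three G70 cards cover a wide range of scaling.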

Finally we have a mainstream part from NVIDIA's G8x family: the GeForce 8500 GT. Prior to G80 and its derivatives, NVIDIA used dedicated pixel and vertex shader hardware, similar to what it does today with its ultra mobile GPUs (Tegra 2-4). Starting with G80 (and eventually trickling down to G86, the basis of the 8500 GT), NVIDIA embraced a unified shader architecture, with a single set of execution resources that can run either pixel or vertex shader programs. NVIDIA will make a similar transition in its Tegra lineup with Logan in 2014. The 8500 GT won't outperform the 7900 GTX in most gaming workloads, but it does give us a look at how NVIDIA's unified architecture handles our two cross-platform benchmarks. Remember that both 3DMark and GL/DXBenchmark 2.7 were designed (mostly) to run on modern hardware. Although hardly modern, the 8500 GT looks a lot more like today's architectures than the G70-based cards do.
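
The practical benefit of a unified design is load balancing. Below is a toy model, not how any real NVIDIA GPU schedules work: each unit retires one unit of work per cycle, the dedicated design uses a 24 pixel + 8 vertex split like the 7900 GTX in the table, and the 32-unit unified pool is hypothetical, chosen only to keep the total unit count equal:

```python
# Toy model: every shader unit retires one unit of work per cycle.
# The 32-unit unified pool is hypothetical; it matches the 24 + 8 total.


def dedicated_cycles(pixel_work, vertex_work, pixel_units, vertex_units):
    """Split design: each pool only runs its own kind of work, in parallel,
    so the busier pool sets the frame time while the other one idles."""
    return max(pixel_work / pixel_units, vertex_work / vertex_units)


def unified_cycles(pixel_work, vertex_work, units):
    """Unified design: any unit can run either kind of work, so the
    combined load is spread across the whole pool."""
    return (pixel_work + vertex_work) / units


WORKLOADS = {
    "pixel-heavy": (2400, 200),
    "matches 3:1 split": (2400, 800),
    "vertex-heavy": (1200, 1200),
}

for label, (pixels, vertices) in WORKLOADS.items():
    split = dedicated_cycles(pixels, vertices, pixel_units=24, vertex_units=8)
    shared = unified_cycles(pixels, vertices, units=32)
    print(f"{label:18s} dedicated: {split:6.2f}  unified: {shared:6.2f} cycles")
```

With equal unit counts the unified pool is never slower in this model (the pooled per-unit load can never exceed the busier pool's per-unit load), and the two only tie when the workload happens to match the hardware's fixed 3:1 ratio.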

You'll notice a distinct lack of ATI video cards here, and not for lack of trying. I dusted off an old X800 GT and an X1650 Pro, but neither would complete the first graphics test in 3DMark or DXBenchmark's T-Rex HD test. Drivers seem to be at fault: ATI dropped support for DX9-only GPUs long ago, and the latest Catalyst release available for these cards (10.2) came out well before either benchmark was conceived. Unfortunately I don't have any AMD-based ultraportables, but I did grab the old Brazos E-350. As a reminder, the E-350 was a 40nm APU with two Bobcat cores and 80 GPU cores (Radeon HD 6310). While we won't see the E-350 in a tablet, a faster member of its lineage will find its way into tablets beginning this year.

Comments

  • Wilco1 - Friday, April 5, 2013

    Yes, frequency still matters. Surface RT looks bad because MS chose the lowest frequency. If they had used the 1.7GHz Tegra 3 instead, Surface RT would look a lot more competitive just because of the frequency.

    So my point stands and is confirmed by your link: at similar frequencies Tegra 3 beats the Z-2760 even on SunSpider.
  • tech4real - Friday, April 5, 2013

    But why do we have to compare them at similar frequencies? One of Atom's strengths is running at high frequency within its thermal budget. If Tegra 3 can't hit 2GHz within the power budget, that's NVIDIA/ARM's problem. Why should Atom bother to downclock itself?
  • Wilco1 - Friday, April 5, 2013

    There is no need to clock the Atom down - typical A9-based tablets are at 1.6 or 1.7GHz. Yes, a Z-2760 beats a 1.3GHz Tegra 3 on SunSpider, but that's not true for the Cortex-A9s used today (Tegra 3 goes up to 1.7GHz, Exynos 4 does 1.6GHz), let alone future ones. So it's incorrect to claim that Atom is generally faster than A9 - that implies Atom has an IPC advantage (which it does not have - it only wins if it has a big frequency advantage). I believe MS made a mistake by choosing the slowest Tegra 3 for Surface RT, as it gives both RT and Tegra a bad name - hopefully they fix this in the next version.

    Beating an old, low-clocked Tegra 3 on performance/power is not all that difficult; beating more modern SoCs is a different matter. Pretty much all ARM SoCs are already at 28 or 32nm, while Tegra 3 is still 40nm. That will finally change with Tegra 4.
  • tech4real - Sunday, April 7, 2013

    Based on this Anand article, the linearly projected SPECint2000 score for a 1.7GHz Tegra 3 is about 1.12, while the 1.8GHz Atom stands at 1.20, so the gap is still there. If you consider the 2GHz Atom turbo case, we can argue the gap is even wider. Of course, since this SPECint data is provided by Intel, we have to take it with a grain of salt, but I think the general idea has merit.
  • Wilco1 - Monday, April 8, 2013

    Those are Intel marketing numbers indeed - Intel uses compiler tricks to get good SPEC results, and this doesn't translate to real-world performance or help Atom when you use a different compiler (Android uses GCC, Windows uses VC++).

    Geekbench gives a better idea of CPU performance:

    A 1.6GHz Exynos 4412 soundly thrashes the Z-2760 at 1.8GHz on integer, FP and memory performance. Atom only wins the Stream test. Before you go "but but Atom has only 2 cores!", it has 4 threads, so it is comparable with 4 cores, and in any case it loses all but 3 single-thread benchmarks despite having a 12.5% frequency advantage.

    There are also several benchmark runs by Phoronix which test older Atoms against various ARM SoCs using the same Linux kernel and GCC compiler across a large suite of benchmarks, and they come to the same conclusion. This is what I base my opinion on, not some Intel marketing scores blessed by Anand or some rubbish JavaScript benchmark.
  • tech4real - Wednesday, April 10, 2013

    Cross-ISA, cross-platform benchmarking is a daunting task to do fairly, or at least to try :-)
    SPEC has established its position after many years of tuning, and I think most people would prefer using it to gauge processor performance. If Samsung or NVIDIA believe they can do a better job of showcasing their CPUs than Intel (which I totally expect they could; after all, it doesn't make sense for Intel to spend time tuning its competitors' products), they can publish their own SPEC scores. In the absence of that, however, it's very hard to argue that Samsung/NVIDIA/ARM has a better-performing product. Remember, "the worst way to lose a fight is by not showing up".
    I don't know much about these new benchmark suites, and they may well be decent, but it takes time for them to mature and gain professional acceptance.
    A past example of taking hobby benchmarks too seriously at face value: back in early 2011, NVIDIA showed Tegra 3 performing on the same level as (or faster than?) a Core 2 Duo T7200 in CoreMark. Needless to say, we now all know Tegra 3 in real life delivers roughly Atom-level performance. This shows there is a reason we have, and need, a benchmark suite like SPEC.
  • Wilco1 - Sunday, April 14, 2013

    SPEC is hardly used outside high-end server CPUs (it's difficult to even run SPEC on a mobile phone due to memory and storage constraints). However, the main issue is that Intel has tuned its compiler to SPEC, giving it an unfair advantage. Using GCC results in a much lower score. The funny thing is, GCC typically wins on real applications (I know because I have done those comparisons). That makes Intel's SPEC scores useless as an indication of actual CPU speed in real-world scenarios. Yes, ARM, NVIDIA, Samsung etc. could tune GCC in the same way by pouring in tens of millions over many years (it really takes that much effort). But does it really make sense to use compiler tricks to pretend you are faster?

    The NVIDIA T7200 claim was based on a public result on the EEMBC website that was run by someone else. It used an old GCC version with non-optimal settings, while the Tegra score used a newer version of GCC, giving it an unfair advantage. Same thing as with Intel's SPEC scores... This shows how much CPU performance is affected by compiler and settings.
  • theduckofdeath - Thursday, April 4, 2013

    That is not true. A few months ago Anandtech themselves made a direct comparison between the Tegra 3 in the Surface tablet and an Atom processor, and the Atom beat the Tegra 3 both on performance and power efficiency.
  • Wilco1 - Friday, April 5, 2013

    I was talking about similar frequencies - did you read what I said? Yes, the first Surface RT is a bit of a disappointment due to the low-clocked Tegra 3, but hopefully MS will use a better SoC in the next version. Tegra 4(+) or Exynos Octa would make it shine. We can then see how Atom does against that.
  • SlyNine - Saturday, April 6, 2013

    Nobody cares if the frequencies are different: if one performs better and uses less power, that's a win, REGARDLESS OF FREQUENCY.

    Give one good reason, that matters to the consumer and manufacturer, for frequency being an important factor.
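
Much of the disagreement in these comments is about normalization: absolute score versus score per clock. A minimal sketch of that arithmetic, using only the figures quoted in the thread and assuming performance scales linearly with frequency (an assumption the thread itself disputes, not a measurement):

```python
# Normalizing scores by clock speed, assuming linear frequency scaling.
# Real chips only approximate this (e.g. memory latency doesn't scale
# with core clock).
def score_per_ghz(score: float, clock_ghz: float) -> float:
    """Score divided by clock speed."""
    return score / clock_ghz


def clock_advantage(clock_a_ghz: float, clock_b_ghz: float) -> float:
    """Fractional frequency advantage of chip A over chip B."""
    return clock_a_ghz / clock_b_ghz - 1


# SPECint2000 figures quoted in the comments (Intel-provided).
tegra3 = score_per_ghz(1.12, 1.7)  # Tegra 3, linearly projected to 1.7GHz
atom = score_per_ghz(1.20, 1.8)    # Atom at 1.8GHz

print(f"Tegra 3: {tegra3:.3f} per GHz")
print(f"Atom:    {atom:.3f} per GHz")
print(f"1.8GHz vs 1.6GHz: {clock_advantage(1.8, 1.6):.1%} frequency advantage")
```

On these figures the per-GHz scores differ by only about 1%, so if linear scaling holds, most of the absolute gap comes from clock speed.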
