Choosing a Testbed CPU

Although I was glad I could put some of these old GPUs to use (somewhat justifying the space they've occupied in my parts closet for years), there was the question of what CPU to pair them with. Go too insane on the CPU and I'd risk unfairly tilting performance in favor of these cards. What I decided to do was simulate the performance of the Core i5-3317U in Microsoft's Surface Pro. That part is a dual-core Ivy Bridge with Hyper-Threading enabled (4 threads). Its max turbo is 2.6GHz for a single core and 2.4GHz for two cores. I grabbed a desktop Core i3-2100, disabled turbo, and forced its clock speed to 2.4GHz. In many cases these mobile CPUs spend a lot of time at or near their max turbo until things get a little too toasty in the chassis, so a fixed 2.4GHz is a reasonable approximation.

To verify that I had picked correctly, I ran the 3DMark Physics test to see how close I came to the performance of the Surface Pro. As the Physics test is multithreaded and should be completely CPU bound, it shouldn't matter which GPU I paired with the testbed - they should all perform the same as the Surface Pro:

3DMark - Physics Test

3DMark - Physics

Great success! With the exception of the 8500 GT, which for some reason is a bit of an overachiever here (7% faster than the Surface Pro), the rest of the NVIDIA cards all score within 3% of the Surface Pro - despite being run on an open-air desktop testbed.

With these results we also get a quick look at how AMD's Bobcat cores compare against the ARM competitors they may eventually do battle with. With only two Bobcat cores running at 1.6GHz in the E-350, AMD actually does really well here. The E-350's performance is 18% better than the dual-core Cortex-A15 based Nexus 10, though it's still not quite enough to top some of the quad-core competitors here. We could be seeing differences in drivers and/or thermal management with some of these devices, since they are far more thermally constrained than the E-350. Bobcat won't surface as a competitor to anything you see here, but its faster derivative (Jaguar) will. If AMD can get Temash's power under control, it could have a very compelling tablet platform on its hands.

The sad part in all of this is that AMD seems to have the right CPU (and possibly GPU) architectures to be quite competitive in the ultra mobile space today. If AMD had the capital and the relationships with smartphone/tablet vendors, it could be a force to be reckoned with. As we've seen from watching Intel struggle, however, it takes more than a good architecture to break into the new mobile world. You need a good baseband strategy and the ability to land key design wins.

Enough about what could be; let's look at how these mobile devices stack up against some of the best GPUs from 2004 - 2007.

We'll start with 3DMark. Here we're looking at performance at 720p, which immediately stops some of the cards with 256-bit memory interfaces from flexing their muscles. Never fear, we will have GL/DXBenchmark's 1080p offscreen mode for that in a moment.

Graphics Test 1

Ice Storm Graphics test 1 stresses the hardware’s ability to process lots of vertices while keeping the pixel load relatively light. Hardware on this level may have dedicated capacity for separate vertex and pixel processing. Stressing both capacities individually reveals the hardware’s limitations in both aspects.

In an average frame, 530,000 vertices are processed leading to 180,000 triangles rasterized either to the shadow map or to the screen. At the same time, 4.7 million pixels are processed per frame.

Pixel load is kept low by excluding expensive post processing steps, and by not rendering particle effects.
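
To put those per-frame figures in throughput terms, here's a minimal sketch (Python) that scales them to per-second rates. The 30 fps target is purely an assumption for illustration; it isn't a number from the benchmark or from any of the hardware tested here.

    # Back-of-the-envelope throughput for Ice Storm Graphics test 1, using the
    # average per-frame figures quoted above. The frame rate is an assumption
    # chosen purely for illustration.
    VERTICES_PER_FRAME = 530_000     # vertices processed per frame
    TRIANGLES_PER_FRAME = 180_000    # triangles rasterized (shadow map + screen)
    PIXELS_PER_FRAME = 4_700_000     # pixels processed per frame
    ASSUMED_FPS = 30                 # hypothetical frame rate

    def per_second(per_frame: int, fps: int) -> float:
        """Scale a per-frame workload figure to a per-second rate."""
        return per_frame * fps

    print(f"Vertices/s:  {per_second(VERTICES_PER_FRAME, ASSUMED_FPS) / 1e6:.1f}M")
    print(f"Triangles/s: {per_second(TRIANGLES_PER_FRAME, ASSUMED_FPS) / 1e6:.1f}M")
    print(f"Pixels/s:    {per_second(PIXELS_PER_FRAME, ASSUMED_FPS) / 1e6:.1f}M")

At the assumed 30 fps that works out to roughly 16 million vertices and 141 million pixels per second, which is why a GPU's vertex/geometry front end matters so much more here than in the second test.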

3DMark - Graphics Test 1

Right off the bat you should notice something wonky. All of NVIDIA's G70 and earlier architectures do very poorly here. This test is very heavy on the vertex shaders, yet the 7900 GTX and friends should be doing a lot better than they are. These workloads, however, were designed for a very different set of architectures. Looking at the unified 8500 GT, we get some perspective. The fastest mobile platforms here (Adreno 320) deliver a little over half the vertex processing performance of the GeForce 8500 GT. The Radeon HD 6310 featured in AMD's E-350 is remarkably competitive as well.

The praise goes both ways, of course. The fact that these mobile GPUs can do as well as they do right now is very impressive.

Graphics Test 2

Graphics test 2 stresses the hardware’s ability to process lots of pixels. It tests the ability to read textures, do per pixel computations and write to render targets.

On average, 12.6 million pixels are processed per frame. The additional pixel processing compared to Graphics test 1 comes from including particles and post processing effects such as bloom, streaks and motion blur.

In each frame, an average 75,000 vertices are processed. This number is considerably lower than in Graphics test 1 because shadows are not drawn and the processed geometry has a lower number of polygons.
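
A quick back-of-the-envelope comparison of the two tests' per-frame loads (a sketch using only the averages quoted above - nothing here is measured on any of the hardware in this article) makes the shift in emphasis obvious:

    # Contrast Ice Storm Graphics test 1 (vertex-heavy) with test 2 (pixel-heavy)
    # using the average per-frame figures from the workload descriptions.
    TEST1 = {"vertices": 530_000, "pixels": 4_700_000}
    TEST2 = {"vertices": 75_000, "pixels": 12_600_000}

    pixel_ratio = TEST2["pixels"] / TEST1["pixels"]       # ~2.7x the pixels
    vertex_ratio = TEST2["vertices"] / TEST1["vertices"]  # ~0.14x the vertices

    print(f"Test 2 vs test 1: {pixel_ratio:.1f}x the pixels, {vertex_ratio:.2f}x the vertices")
    print(f"Pixels per vertex, test 1: {TEST1['pixels'] / TEST1['vertices']:.0f}")
    print(f"Pixels per vertex, test 2: {TEST2['pixels'] / TEST2['vertices']:.0f}")

Test 2 pushes about 2.7x the pixels but only a seventh of the vertices of test 1 - roughly 170 pixels of work per vertex versus about 9 - so fill rate and pixel shading, not geometry throughput, set the pace here.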

3DMark - Graphics Test 2

The data starts making a lot more sense when we look at the pixel-shader-bound Graphics test 2. In this benchmark, Adreno 320 appears to deliver better performance than the GeForce 6600 and, once again, roughly half the performance of the GeForce 8500 GT. Compared to the 7800 GT (or perhaps the 6800 Ultra), we're looking at a bit under a third of their performance. The Radeon HD 6310 in AMD's E-350 appears to deliver performance competitive with the Adreno 320.

3DMark - Graphics

The overall graphics score is a bit misleading given how poorly the G7x and NV4x architectures did on the first graphics test. We can conclude that the E-350 has roughly the same graphics performance as Qualcomm's Snapdragon 600, while the 8500 GT appears to have roughly 2x that. The overall Ice Storm scores pretty much repeat what we've already seen:

3DMark - Ice Storm

Again, the new 3DMark appears to unfairly penalize the older non-unified NVIDIA GPU architectures. Keep in mind that the last NVIDIA driver drop for DX9 hardware (G7x and NV4x) is about a month older than the latest driver available for the 8500 GT.

It's also worth pointing out that Ice Storm makes Intel's HD 4000 look very good, when in reality we've seen varying degrees of competitiveness against discrete GPUs depending on the workload. If 3DMark's Ice Storm test mapped directly to real-world gaming performance, it would mean that devices like the Nexus 4 or HTC One could run BioShock 2-like titles at 10x7 in the 20 fps range. As impressive as that would be, this is ultimately the downside of relying on these types of benchmarks for comparisons - they fundamentally tell us how well these platforms run the benchmark itself, not other games, unfortunately.

At a high level, it looks like we're narrowing down the level of performance that today's high-end ultra mobile GPUs deliver when put in discrete GPU terms. Let's see what GL/DXBenchmark 2.7 tells us.

Comments

  • Wilco1 - Friday, April 5, 2013

    Yes, frequency still matters. Surface RT looks bad because MS chose the lowest frequency. If they had used the 1.7GHz Tegra 3 instead, Surface RT would look a lot more competitive just because of the frequency.

    So my point stands and is confirmed by your link: at similar frequencies Tegra 3 beats the Z-2760 even on SunSpider.
  • tech4real - Friday, April 5, 2013

    But why do we have to compare them at similar frequencies? One of Atom's strengths is working at high frequency within a thermal budget. If Tegra 3 can't hit 2GHz within its power budget, that's NVIDIA/ARM's problem. Why should Atom bother to downclock itself?
  • Wilco1 - Friday, April 5, 2013

    There is no need to clock the Atom down - typical A9-based tablets are at 1.6 or 1.7GHz. Yes, a Z-2760 beats a 1.3GHz Tegra 3 on SunSpider, but that's not true for the Cortex-A9s used today (Tegra 3 goes up to 1.7GHz, Exynos 4 does 1.6GHz), let alone future ones. So it's incorrect to claim that Atom is generally faster than A9 - that implies Atom has an IPC advantage (which it does not have - it only wins if it has a big frequency advantage). I believe MS made a mistake by choosing the slowest Tegra 3 for Surface RT, as it gives RT as well as Tegra a bad name - hopefully they fix this in the next version.

    Beating an old, low-clocked Tegra 3 on performance/power is not all that difficult; beating more modern SoCs is a different matter. Pretty much all ARM SoCs are already at 28 or 32nm, while Tegra 3 is still 40nm. That will finally change with Tegra 4.
  • tech4real - Sunday, April 7, 2013

    Based on this AnandTech article
    http://www.anandtech.com/show/6340/intel-details-a...
    the linearly projected SPECint2000 score for a 1.7GHz Tegra 3 is about 1.12, while the 1.8GHz Atom stands at 1.20, so the gap is still there. If you consider the 2GHz Atom turbo case, we can argue the gap is even wider. Of course, since this SPECint data is provided by Intel, we have to take it with a grain of salt, but I think the general idea has its merit.
  • Wilco1 - Monday, April 8, 2013

    Those are Intel marketing numbers indeed - Intel uses compiler tricks to get good SPEC results, and this doesn't translate to real world performance or help Atom when you use a different compiler (Android uses GCC, Windows uses VC++).

    Geekbench gives a better idea of CPU performance:

    http://browser.primatelabs.com/geekbench2/compare/...

    A 1.6GHz Exynos 4412 soundly thrashes the Z-2760 at 1.8GHz on integer, FP and memory performance. Atom only wins the Stream test. Before you go "but but Atom has only 2 cores!", it has 4 threads, so it is comparable with 4 cores, and in any case it loses all but 3 single-threaded benchmarks despite having a 12.5% frequency advantage.

    There are also several benchmark runs by Phoronix that test older Atoms against various ARM SoCs using the same Linux kernel and GCC compiler across a big suite of benchmarks, and they come to the same conclusion. This is what I base my opinion on, not some Intel marketing scores blessed by Anand or some rubbish JavaScript benchmark.
  • tech4real - Wednesday, April 10, 2013

    Cross-ISA, cross-platform benchmarking is a daunting task to do fairly, or at least to try to :-)
    SPEC has established its position after many years of tuning, and I think most people would prefer using it to gauge processor performance. If Samsung or NVIDIA believe they can do a better job of showcasing their CPUs than Intel (which I totally expect they could - after all, it doesn't make sense for Intel to spend time tuning its competitors' products), they can publish their SPEC scores. However, in the absence of that, it's very hard to argue that Samsung/NVIDIA/ARM has a better-performing product. Remember "the worst way to lose a fight is by not showing up".
    I don't have much knowledge of these new benchmark suites, and they may well be decent, but it takes time to mature and gain professional acceptance.
    A past example of taking hobby benchmarks at face value too seriously: back in early 2011, NVIDIA showed a Tegra 3 performing on the same level as (or faster than?) a Core 2 Duo T7200 under CoreMark. Needless to say, we now all know that Tegra 3 in real life is around Atom-level performance. This shows there is a reason we have and need a benchmark suite like SPEC.
  • Wilco1 - Sunday, April 14, 2013

    SPEC is hardly used outside high-end server CPUs (it's difficult to even run SPEC on a mobile phone due to memory and storage constraints). However, the main issue is that Intel has tuned its compiler to SPEC, giving it an unfair advantage. Using GCC results in a much lower score. The funny thing is, GCC typically wins on real applications (I know because I have done those comparisons). That makes Intel's SPEC scores useless as an indication of actual CPU speed in real-world scenarios. Yes, ARM, NVIDIA, Samsung, etc. could tune GCC in the same way by pouring in tens of millions over many years (it really takes that much effort). But does it really make sense to use compiler tricks to pretend you are faster?

    The NVIDIA T7200 claim was based on a public result on the EEMBC website that was run by someone else. It used an old GCC version with non-optimal settings. However, for the Tegra score they used a newer version of GCC, giving it an unfair advantage. Same thing as with Intel's SPEC scores... This shows how much CPU performance is affected by compiler and settings.
  • theduckofdeath - Thursday, April 4, 2013

    That is not true. A few months ago AnandTech itself made a direct comparison between the Tegra 3 in the Surface tablet and an Atom processor, and the Atom beat the Tegra 3 on both performance and power efficiency.
  • Wilco1 - Friday, April 5, 2013

    I was talking about similar frequencies - did you read what I said? Yes the first Surface RT is a bit of a disappointment due to the low clocked Tegra 3, but hopefully MS will use a better SoC in the next version. Tegra 4(+) or Exynos Octa would make it shine. We can then see how Atom does against that.
  • SlyNine - Saturday, April 6, 2013

    Nobody cares if the frequencies are different; if one performs better and uses less power, that's a win - REGARDLESS OF FREQUENCY.

    Give one good reason that matters to the consumer and manufacturer for frequency being an important factor.
