Kabini vs CT/ARM: GPU Performance

I pulled out 3DMark and GFXBenchmark (formerly GL/DXBenchmark) for some cross-platform GPU comparisons. We'll start with 3DMark Ice Storm and its CPU-bound, multithreaded physics benchmark:

3DMark—Physics

The physics test scales almost unrealistically well with core count, which is why we see a 75% uplift over AMD's dual-core E-350 (the A4-5000 has four Jaguar cores). For FP-heavy game physics workloads, however, Jaguar does quite well. While a big Ivy Bridge is still going to be quicker, AMD's A4-5000 gets surprisingly close given its much lower cost.

The 3DMark graphics test is more of what we're interested in seeing here. Two GCN compute units (128 SPs/cores) running at 500MHz will really put the old Radeon HD 6310 in Brazos to shame:
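As a back-of-the-envelope check on what those two compute units can deliver, the sketch below computes theoretical peak single-precision throughput. It assumes each GCN stream processor retires one fused multiply-add (counted as two FLOPs) per clock; the function name is purely illustrative, and a peak figure says nothing about how well a real game keeps the shaders busy.

```python
def peak_gflops(stream_processors: int, clock_mhz: float) -> float:
    """Theoretical peak single-precision GFLOPS.

    Assumes one fused multiply-add (2 FLOPs) per stream processor per clock.
    """
    flops_per_clock = stream_processors * 2
    return flops_per_clock * clock_mhz / 1000.0  # MFLOPS -> GFLOPS


# Kabini: 2 GCN compute units x 64 SPs = 128 SPs at 500MHz
kabini = peak_gflops(128, 500)
print(f"Kabini peak: {kabini:.1f} GFLOPS")  # Kabini peak: 128.0 GFLOPS
```

The model is linear in both SP count and clock speed, so it is only a ceiling, not a prediction of benchmark results.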

3DMark—Graphics

The results are quite good. Kabini manages a 61% performance advantage over AMD's old Brazos platform and actually gets surprisingly close to Intel's HD 4000. As we discovered earlier, this isn't really enough performance to play modern PC games, but casual (and especially tablet) gaming workloads should do wonderfully here.

3DMark—Ice Storm

The overall Ice Storm score combines the physics and graphics test results. As expected, Kabini continues to lead everything other than the i5-3317U.
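As we understand Futuremark's documentation, the overall score is a weighted harmonic mean of the sub-scores rather than a simple average. The sketch below shows that kind of combination; the 0.75/0.25 weights are assumptions for illustration and should be checked against Futuremark's technical guide, and the sub-scores in the example are hypothetical, not measured results.

```python
def overall_score(graphics: float, physics: float,
                  w_graphics: float = 0.75, w_physics: float = 0.25) -> float:
    """Weighted harmonic mean of the graphics and physics sub-scores.

    The default weights are assumed values for illustration only.
    """
    if graphics <= 0 or physics <= 0:
        raise ValueError("sub-scores must be positive")
    return (w_graphics + w_physics) / (w_graphics / graphics + w_physics / physics)


# Hypothetical sub-scores, not measured results:
print(round(overall_score(graphics=30000, physics=10000)))  # 20000
```

A weighted arithmetic mean of the same hypothetical sub-scores would give 25000; the harmonic mean lets a weak sub-score drag the total down more than a strong one pulls it up.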

Finally, we have the GFXBenchmark T-Rex HD test. I threw in a handful of older PC GPUs, although keep in mind that T-Rex HD isn't especially memory-bandwidth intensive, which works against some of the older high-end PC GPUs that relied on ample memory bandwidth. The test is also better optimized for unified shader architectures, which helps explain the 8500 GT's excellent performance here.

GL/DXBenchmark 2.7—T-Rex HD (Offscreen)

Kabini does very well here too. If we look at the tablet-oriented Temash part (A4-1200), we see that the number of GCN compute units remains unchanged, but max GPU frequency drops from 500MHz to 225MHz. If we assume perfect scaling with GPU clock speed, Temash could offer roughly the same graphics performance as the fourth-generation iPad. AMD claims a TDP of only 3.9W for the A4-1200 Temash APU, which would make it a very interesting part from a GPU perspective if our napkin math holds.
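The napkin math above boils down to multiplying Kabini's score by the ratio of GPU clocks. Here's a minimal sketch of that estimate; the constant and function names are hypothetical, and linear scaling is the same assumption the text makes, not something we measured.

```python
KABINI_GPU_MHZ = 500  # A4-5000 max GPU clock, per the text above
TEMASH_GPU_MHZ = 225  # A4-1200 max GPU clock, per the text above


def scale_by_clock(score: float, from_mhz: float, to_mhz: float) -> float:
    """Estimate a score at a different GPU clock, assuming perfectly linear scaling.

    This is a rough estimate, not a bound: memory bandwidth, CPU clocks and
    thermal limits all differ between the two parts.
    """
    return score * (to_mhz / from_mhz)


ratio = TEMASH_GPU_MHZ / KABINI_GPU_MHZ
print(f"Temash keeps {ratio:.0%} of Kabini's GPU clock")  # Temash keeps 45% of Kabini's GPU clock
```

Feeding Kabini's T-Rex HD result into `scale_by_clock(score, KABINI_GPU_MHZ, TEMASH_GPU_MHZ)` gives the kind of estimate we compared against the iPad.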

OpenCL Performance

For our last comparison, we're looking at the OpenCL performance of these on-die GPUs. We're using a subset of Ryan's GPU Compute workload, partly because many of those tests don't yet run properly on Kabini and partly because some of them are really built for much more powerful GPUs. We've got LuxMark 2.0 and two CLBenchmark 1.1.3 tests here. Their descriptions follow:

SmallLuxGPU is an OpenCL-accelerated ray tracer that is part of the larger LuxRender suite. Ray tracing has become a stronghold for GPUs in recent years since it maps well to GPU pipelines, allowing artists to render scenes much more quickly than with CPUs alone.

CLBenchmark contains a number of subtests; we’re focusing on the most practical of them, the computer vision test and the fluid simulation test. The former is a useful proxy for computer imaging tasks in which systems must parse images and identify features (e.g. humans), while fluid simulations are common in professional graphics work and games alike.

OpenCL GPU Performance
GPU                             LuxMark 2.0 (K samples/s)   CLBenchmark 1.1.3 Vision   CLBenchmark 1.1.3 Fluid
AMD A4-5000 (Radeon HD 8330)    18                          1041                       1496
AMD E-350 (Radeon HD 6310)      23                          292                        505
Intel Core i5-3317U (HD 4000)   107                         819                        1383

LuxMark is a corner case in which Kabini actually regresses compared to Brazos (18K vs. 23K samples/s). The explanation is simple: some workloads are better suited to AMD's older VLIW GPU architectures. GCN's scalar design usually makes more efficient use of its shader ALUs, but every now and then you'll see a regression like this one. Intel's HD 4000 does remarkably well in LuxMark 2.0, at nearly 6x Kabini's score.

Our two CLBenchmark tests, however, paint a very different picture. Here Kabini not only significantly outperforms its predecessor but is faster than Ivy Bridge as well.
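To put the OpenCL table in relative terms, the sketch below divides Kabini's (A4-5000) results by each of the other two parts. The numbers are taken straight from the table; the dictionary keys and the `relative` helper are illustrative names of our own, and higher is better in all three tests.

```python
# Results from the OpenCL table (LuxMark in K samples/s, CLBenchmark as scores)
results = {
    "A4-5000 (HD 8330)": {"luxmark": 18, "vision": 1041, "fluid": 1496},
    "E-350 (HD 6310)": {"luxmark": 23, "vision": 292, "fluid": 505},
    "i5-3317U (HD 4000)": {"luxmark": 107, "vision": 819, "fluid": 1383},
}


def relative(part: str, baseline: str) -> dict:
    """Return part / baseline for every test (higher is better in all three)."""
    return {test: results[part][test] / results[baseline][test]
            for test in results[part]}


for baseline in ("E-350 (HD 6310)", "i5-3317U (HD 4000)"):
    ratios = relative("A4-5000 (HD 8330)", baseline)
    summary = ", ".join(f"{test}: {r:.2f}x" for test, r in ratios.items())
    print(f"Kabini vs {baseline}: {summary}")
```

This works out to roughly 3.57x and 2.96x the E-350 in the CLBenchmark vision and fluid tests, and 1.27x and 1.08x the HD 4000, while LuxMark drops to 0.78x of Brazos and 0.17x of the HD 4000.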

130 Comments

  • georgec84 - Thursday, May 23, 2013

    These chips look great! I hope they can give AMD a small spark. Things certainly seem to be looking up compared to 2 years ago.
  • Nintendo Maniac 64 - Thursday, May 23, 2013

    I think it would have been interesting if Anand tested the CPU against some older mid-range to high-end CPUs. From my own assessments it looks like Jaguar has slightly better IPC than K8 and is overall comparable to the original Phenom (though obviously without the huge power consumption).
  • JDG1980 - Thursday, May 23, 2013

    I really want to see a comprehensive rundown of single-threaded tests with constant clock rate. We have a rough idea of which architectures have better IPC, but I'd like to see some hard numbers.
  • Streetwind - Thursday, May 23, 2013

    This is the first real step forward for AMD I've seen in nearly a decade... everything else was minor clock speed bumps, experimental architectures that ended up being slower clock-for-clock than the old ones, big iGPUs, and shuffling around its product stack to target a changing market with the same technology.

    The performance advantage Intel has accumulated over the years means that AMD can still only really compete on price, but Kabini is finally the kind of product that attempts to narrow the gap with the competition again. Please AMD, more of this! Maybe in one or two years we the consumers will have a real choice in the x86 market again if you keep it up.
  • KaarlisK - Thursday, May 23, 2013

    Regarding memory performance: as I understand it, Kabini supports two DIMMs, but only single-channel.
  • darkich - Thursday, May 23, 2013

    Why are you always comparing that dual-core ARM chip?
    Why not the Octa chip (i.e., the best ARM chip currently available)?
    And why always avoid Geekbench and instead use heavily software-dependent tests?
    This always seems to be the case when dealing with ARM on this site.
    Really, it looks like a deliberate undermining of the architecture, in my mind.
  • kyuu - Thursday, May 23, 2013

    One: the "octa chip" is really quad-core.

    Two: Geekbench is not a great benchmark utility, especially when comparing cross-platform.

    Three: Attributing an anti-ARM agenda to this website is pretty freakin' silly.
  • darkich - Thursday, May 23, 2013

    One: the chip has 4+4 independently operated core clusters.
    Running a task on low-power cores is a far more advanced solution than having big cores rev down for it.
    Besides, what does your remark have to do with what I said?
    My point is, Octa is a FAR more capable ARM chip than the one used in this comparison... yet it doesn't cost more, and consumes up to 70% less power.

    Two: as opposed to what? Comparing Chrome for Android with Chrome for Windows?
    Geekbench is not perfect, but it is the best you can do when comparing across platforms.
    It is the ONLY credible comparison of pure processing ability in this case.

    Three: answer the first two then. What am I missing here?
  • darkich - Thursday, May 23, 2013

    Correction: I meant two modules (core clusters) with 4 cores each, of course.
  • Wilco1 - Thursday, May 23, 2013

    And those 2x4 cores can run simultaneously with the right software, hence the name Octa.

    I agree with darkich that Anand always appears to show ARM in the worst light, first by only showing JavaScript browser tests rather than native code benchmarks, and second by insisting on the Chrome browser rather than the stock or fastest available browser. For example, Geekbench shows that Exynos Octa easily beats Bobcat at the same frequency:

    http://browser.primatelabs.com/geekbench2/compare/...

    This means Jaguar will get very close to A15, until Cortex-A57 is released of course.
