Kabini vs CT/ARM: GPU Performance

I pulled out 3DMark and GFXBenchmark (formerly GL/DXBenchmark) for some cross-platform GPU comparisons. We'll start with 3DMark Ice Storm and its CPU-bound, multithreaded physics benchmark:

3DMark—Physics

The physics test is more aggressively multithreaded than most real game physics, which is why we see a 75% uplift compared to AMD's dual-core E-350: Kabini brings four Jaguar cores to Brazos' two Bobcats. For FP-heavy game physics workloads, Jaguar does quite well. While a big Ivy Bridge is still going to be quicker, AMD's A4-5000 gets surprisingly close given its much lower cost.
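As a rough sanity check on that number, the napkin math below uses spec-sheet clocks and my own (generous) scaling assumptions; it suggests core count and clocks alone account for most of the uplift:

```python
# Napkin math: how much of the 75% physics uplift is just cores x clocks?
e350_cores, e350_ghz = 2, 1.6        # Bobcat: dual-core at 1.6GHz
a4_5000_cores, a4_5000_ghz = 4, 1.5  # Jaguar: quad-core at 1.5GHz

# Assume perfect multithreaded scaling and equal IPC (both assumptions generous):
throughput_ratio = (a4_5000_cores * a4_5000_ghz) / (e350_cores * e350_ghz)
print(f"{throughput_ratio - 1:.1%}")  # 87.5% theoretical; the measured 75% implies sub-linear scaling
```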

The 3DMark graphics test is more of what we're interested in seeing here. Two GCN compute units (128 SPs/cores) running at 500MHz should really put the old 80-SP Radeon HD 6310 in Brazos to shame:

3DMark—Graphics

The results are quite good. Kabini manages a 61% performance advantage over AMD's old Brazos platform, and actually gets surprisingly close to Intel's HD 4000 in performance. As we discovered earlier, this isn't really enough performance to play modern PC games, but casual (and especially tablet) gaming workloads should run wonderfully here.
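That 61% lines up almost exactly with the raw shader math. As a sanity check, here's the napkin calculation in Python; the HD 6310's 80 VLIW5 shaders at roughly 500MHz are spec-sheet figures, not something measured here:

```python
# Peak FP32 throughput: shaders * 2 FLOPs/clock (MAD) * clock in GHz.
kabini_gflops = 128 * 2 * 0.500  # Radeon HD 8330: 2 GCN CUs = 128 SPs @ 500MHz -> 128 GFLOPS
brazos_gflops = 80 * 2 * 0.500   # Radeon HD 6310: 80 VLIW5 SPs @ ~500MHz -> 80 GFLOPS

print(f"{kabini_gflops / brazos_gflops - 1:.0%}")  # 60% -> right in line with the measured 61%
```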

3DMark—Ice Storm

The overall Ice Storm score simply combines the physics and graphics test components. As expected, Kabini continues to lead everything other than the i5-3317U.

Finally we have the GFXBenchmark T-Rex HD test. I threw in a handful of older PC GPUs, although keep in mind that T-Rex HD isn't very memory bandwidth intensive, which blunts the main advantage of some of the big old PC GPUs: their generous memory bandwidth. The test is also better optimized for unified shader architectures, which helps explain the 8500 GT's excellent performance here.

GL/DXBenchmark 2.7—T-Rex HD (Offscreen)

Kabini does very well in this test as well. If we look at the tablet-oriented Temash part (A4-1200), we see that the number of GCN compute units remains unchanged, but the max GPU frequency drops from 500MHz to 225MHz. If we assume perfect scaling with GPU clock speed, Temash could offer roughly the same graphics performance as the 4th generation iPad. AMD claims the A4-1200 Temash APU carries a TDP of only 3.9W, which makes it a potentially very interesting part from a GPU perspective if our napkin math holds true.
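To show our work, here's the clock-scaling estimate as a quick sketch; kabini_trex_fps is a placeholder to be filled in with the A4-5000's measured offscreen result, and linear scaling with clock is an assumption (bandwidth or thermal limits would drag the real number down):

```python
# Linear-with-clock estimate for Temash (A4-1200) graphics performance.
# Same 2 GCN CUs as Kabini; only the GPU clock changes.
temash_mhz, kabini_mhz = 225, 500

kabini_trex_fps = 20.0  # placeholder: substitute the A4-5000's measured T-Rex HD offscreen FPS

temash_estimate = kabini_trex_fps * (temash_mhz / kabini_mhz)
print(f"Estimated Temash T-Rex HD: {temash_estimate:.1f} fps")  # 45% of whatever Kabini scores
```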

OpenCL Performance

For our last comparison we're looking at the OpenCL performance of these on-die GPUs. We're using a subset of Ryan's GPU Compute workload, partly because many of those tests don't yet work properly on Kabini and partly because some of them are really built for much more powerful GPUs. We've got LuxMark 2.0 and two CLBenchmark 1.1.3 tests here; their descriptions follow:

SmallLuxGPU is an OpenCL accelerated ray tracer that is part of the larger LuxRender suite. Ray tracing has become a stronghold for GPUs in recent years as the workload maps well to wide GPU pipelines, allowing artists to render scenes much more quickly than with CPUs alone.

CLBenchmark contains a number of subtests; we're focusing on the most practical of them: the computer vision test and the fluid simulation test. The former is a useful proxy for computer imaging tasks where systems must parse images and identify features (e.g. humans), while fluid simulations are common in professional graphics work and games alike.
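For readers who haven't poked at OpenCL, here's a minimal sketch (using pyopencl, and not taken from either benchmark's actual code) of the kind of data-parallel work these tests hammer the GPU with; one work-item per grid cell is the basic building block of a fluid-simulation step:

```python
# Minimal OpenCL sketch: one work-item integrates one cell of a velocity field.
# Illustrative only; the kernel and names are ours, not CLBenchmark's.
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()   # picks up the Radeon/HD 4000 iGPU if one is exposed
queue = cl.CommandQueue(ctx)

kernel_src = """
__kernel void step(__global float *v, __global const float *f, const float dt) {
    int i = get_global_id(0);
    v[i] += dt * f[i];   // integrate one cell: thousands of these run in parallel
}
"""
prg = cl.Program(ctx, kernel_src).build()

n = 1 << 20
v = np.zeros(n, dtype=np.float32)
f = np.random.rand(n).astype(np.float32)

mf = cl.mem_flags
v_buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=v)
f_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=f)

prg.step(queue, (n,), None, v_buf, f_buf, np.float32(0.016))  # one 16ms timestep
cl.enqueue_copy(queue, v, v_buf)  # read the result back to the host
```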

OpenCL GPU Performance
                               LuxMark 2.0      CLBenchmark—Vision  CLBenchmark—Fluid
AMD A4-5000 (Radeon HD 8330)   18K samples/s    1041                1496
AMD E-350 (Radeon HD 6310)     23K samples/s    292                 505
Intel Core i5-3317U (HD 4000)  107K samples/s   819                 1383

LuxMark is really a corner case here, where Kabini shows a performance regression compared to Brazos. The explanation is simple: some workloads are better suited to AMD's older VLIW GPU architectures. GCN is a scalar architecture that usually nets more efficient shader utilization, but every now and then you'll see a slight regression. Intel's HD 4000 actually does amazingly well in the LuxMark 2.0 benchmark.
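To make the VLIW-vs-scalar point concrete, here's a toy utilization model; the slot-fill figures are illustrative assumptions, not measurements of either chip:

```python
# Toy model: VLIW5 only hits peak throughput when the compiler fills all five
# slots per instruction bundle; scalar GCN issues one op per lane regardless of
# instruction-level parallelism. Slot-fill numbers are made up for illustration.
def vliw5_effective(total_sps, avg_slots_filled):
    return total_sps * (avg_slots_filled / 5.0)

def scalar_effective(total_sps):
    return total_sps  # lanes issue independently; ILP doesn't matter

print(vliw5_effective(80, 4.5))  # 72.0 -> a high-ILP ray tracer keeps VLIW5 nearly full
print(vliw5_effective(80, 2.0))  # 32.0 -> branchy, low-ILP code leaves most slots idle
print(scalar_effective(128))     # 128  -> GCN's SPs stay usable either way
```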

Our two CLBenchmark tests, however, paint a very different picture. Here Kabini not only significantly outperforms its predecessor, but is faster than Ivy Bridge as well.

Comments

  • HisDivineOrder - Thursday, May 23, 2013 - link

    Given AMD's traditional design wins and how those systems end up, I suspect this is not going to matter much. I have more hope of Bay Trail providing a solid deal for once than I do this.

    It's a shame because this really should be AMD's niche to dominate, but I doubt any OEM'll give them a serious try.
  • Desperad@ - Thursday, May 23, 2013 - link

    On competitive positioning, is it even near IB Pentium?
  • brainee - Saturday, May 25, 2013 - link

    I think so, yes. The IB Pentium 2117U (17W TDP) should be around 33% faster in legacy Intel-optimised CPU benchmarks, doing the math and according to, say, TechSpot. I would think ULV Pentiums are more expensive for OEMs; notebooks are a different story. Not to mention Kabini should cost a fraction to make for AMD compared with even crippled 2C Ivy Bridges aka Celeron / Pentium. Kabini wins in games and OpenCL, and in AVX-enabled applications it should eat the Pentium alive since the latter doesn't support AVX extensions (should be mentioned at least). I'd prefer AVX extensions to Cinebench but this site seems to suggest I am a minority...
  • yhselp - Saturday, May 25, 2013 - link

    Comparing a 3W SoC (Z2760) to a 15W SoC (A4-5000), and calling the former laughable... not really fair.

    Sure, Kabini is definitely faster than the old Atom architecture and, yes, I understand this is not a definitive comparison; nevertheless - it seems misleading.

    What would happen if we compare a 3W Kabini to a 15W Haswell? Laughable wouldn't even begin to describe the performance difference.
  • silverblue - Saturday, May 25, 2013 - link

    But... an A4-5000 doesn't use anywhere near 15W, as far as I've heard. Still, let's consider the evidence - the Z2760 is a 32-bit, dual core, hyperthreaded CPU at 1.8GHz with a low powered graphics unit and 1MB of L2. The A4-5000 is a 64-bit, quad core CPU at 1.5GHz with a far stronger graphics unit and 2MB of dynamic L2. Temash would be a different proposition I expect as the A4-1200 is only clocked at 1GHz.
  • yhselp - Saturday, May 25, 2013 - link

    Yes, absolutely, I agree - it's just that the direct comparisons and conclusions made are a bit stark.

    There's always another side to an argument; in your case, I could argue that comparing the brand new Jaguar to a terribly old Atom architecture isn't the way to go. Consider the following evidence - Silvermont is 64-bit, quad-core, 2MB L2 cache, OoO, 2GHz+, 22nm, far more energy efficient, supports 1st gen Core instructions and Turbo Boost; it would decimate Jaguar.

    In the article, I also discovered that the 2020M is referred to as a 1.8GHz 35W part, when it's actually 2.4GHz. Were the benchmarks done on an underclocked 2020M, or was that simply a typo?

    That's the kind of stuff I'm talking about, not AMD vs. Intel.
  • jcompagner - Sunday, May 26, 2013 - link

    So this is the core that will be in the next two big consoles?
    Am I the only one who thinks these are quite weak, even if you have 8 of them?

    That does mean that if one of those two consoles is the lead platform for development, games will be forced to be really well multithreaded. (So I guess the next PC games will also be using multiple cores way more.)

    Why did they go for the Jaguar core that's really targeted at very low-end or mobile stuff?

    Why didn't they just go for a Richland 8-core system with a very good GPU that's, say, a 100W part?

    What's the guess for the TDP of the Xbox One or PS4? A console can easily take 100W, that doesn't matter, so why choose a core that is dedicated to mobile?
  • yhselp - Sunday, May 26, 2013 - link

    Yes, the Jaguar core is 'weak', but what does 'weak' mean? That is such a vague definition. For one usage scenario Jaguar might be unacceptable, for another it might be overkill. Remember, Sony/MS are not building a contemporary PC. Jaguar might seem slow to us, and in a gaming desktop it would be, but that's not the point. Think of consoles, in this case the PS4 and the Xbox One, as non-PC devices such as tablets. Would you say the latest Samsung/Apple running on a Cortex A15 is slow? No, you would say it's super fast. Well, Jaguar is even faster. Yes, a console has to deal with different workloads than a tablet, but that's why it has very different hardware.

    Why did Sony/MS choose Jaguar? Jaguar is easier to integrate, more power efficient and most importantly cheaper than Richland. It's a far simpler architecture than Richland, and probably easier to work with over a console's life. Also, it's very important to note that Sony/MS wanted an integrated solution - they weren't going to build a system with a dedicated video card like a gaming PC.

    Cost, cost, cost - everything is about the cost. A console cannot be expensive (the way a gaming PC is) - it has to sell very well in order to establish an install base to sell games to. Sony/MS will probably sell their 8th gen consoles at a loss initially - AMD's Jaguar/GCN was their best/only choice. What else could they do at the same price or even at all? Silvermont isn't ready yet and NVIDIA probably wouldn't be willing to integrate a GPU of theirs the way AMD did, and both of those would be more expensive than Jaguar/GCN. Not to mention, MS has had a ton of trouble with NVIDIA in the original Xbox - they are probably not willing to go down that road again.

    It's not really an 8-core solution - it's two quad-core modules and communication between the two might be problematic; so games on the new PS/Xbox would probably run on four Jaguar cores at 1.6 GHz. However, don't forget that neither of the two consoles has a ton of raw graphics power under the hood - the Xbox GPU is roughly equivalent to an HD 7770 (but with better memory bandwidth), and the PS to an HD 7850. Games would be specifically developed for this kind of hardware (unlike PC games) and would most probably be GPU limited so the Jaguar cores would really be sufficient.

    I hope this answers your questions.
  • Kevin G - Monday, May 27, 2013 - link

    A Piledriver module is much larger than a Jaguar core. For die size concerns, going with Jaguar made sense if core counts are the same. Steamroller cores are due out in 2014 and are expected to bring higher IPC and a slight clock speed increase compared to Piledriver.

    Power consumption is also an issue. The bulk of the power consumption from the Xbox One and PS4 SoCs will come from their GPUs. Adding a high-power CPU core like Piledriver would have ballooned power consumption close to 200W, which makes cooling impractical and expensive. Jaguar still adds power, but it is far more manageable in comparison.

    In addition, Steamroller is tied to processes from GlobalFoundries (though IBM could likely manufacture them if need be). TSMC is the preferred foundry for bulk processes due to cost and a slight edge in density, and Jaguar was prepared for manufacturing at TSMC from the start. AMD could have stuck with GF, but it would have had to port the GCN functional units to that same process. Such efforts are currently underway for Kaveri, which is looking to be a 2014 part, so for any type of 2013 launch, going that route was not an option.
  • aikyucenter - Sunday, June 30, 2013 - link

    Great OpenCL performance ... love it ... just launch it sooner and decrease the TDP too = PERFECT :D
