Kabini vs CT/ARM: GPU Performance

I pulled out 3DMark and GFXBenchmark (formerly GL/DXBenchmark) for some cross-platform GPU comparisons. We'll start with 3DMark Ice Storm and its CPU-bound, multithreaded physics benchmark:

3DMark—Physics

The physics test scales across threads far better than most real-world game code, which is why the quad-core A4-5000 shows a 75% uplift over AMD's dual-core E-350. For FP-heavy game physics workloads, however, Jaguar does quite well. While Ivy Bridge's big cores are still going to be quicker, AMD's A4-5000 gets surprisingly close given its much lower cost.
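Uplift figures like the 75% above are simple score ratios. The minimal Python sketch below computes a percentage uplift and, under the loudly simplifying assumption that the physics test scales perfectly with core count, the per-core throughput ratio that such an uplift would imply between four Jaguar cores and two Bobcat cores. The function names and the perfect-scaling model are ours, not part of 3DMark.

```python
def uplift_percent(new_score: float, old_score: float) -> float:
    """Percentage improvement of new_score over old_score."""
    if old_score <= 0:
        raise ValueError("old_score must be positive")
    return (new_score / old_score - 1.0) * 100.0


def implied_per_core_ratio(uplift_pct: float, new_cores: int, old_cores: int) -> float:
    """Per-core throughput ratio (new / old), ASSUMING perfectly linear core scaling.

    total_ratio = per_core_ratio * (new_cores / old_cores)
    => per_core_ratio = total_ratio * old_cores / new_cores
    """
    total_ratio = 1.0 + uplift_pct / 100.0
    return total_ratio * old_cores / new_cores


# 75% uplift: four Jaguar cores (A4-5000) vs. two Bobcat cores (E-350)
print(implied_per_core_ratio(75.0, new_cores=4, old_cores=2))  # 0.875
```

Treat the per-core figure with care: it folds clock speed differences and any imperfect thread scaling into a single number, so it is a rough sanity check rather than an IPC measurement.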

The 3DMark graphics test is more of what we're interested in seeing here. Two GCN compute units (128 SPs/cores) running at 500MHz will really put the old Radeon HD 6310 in Brazos to shame:
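The two-CU, 500MHz configuration translates into a theoretical peak with the usual FLOPS arithmetic. This sketch assumes each GCN stream processor retires one fused multiply-add (two floating-point operations) per clock, the standard convention for quoting GCN peak single-precision throughput; real workloads land well below this ceiling.

```python
def peak_gflops(stream_processors: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical single-precision GFLOPS.

    flops_per_clock=2 assumes one fused multiply-add (FMA) per SP per clock.
    """
    return stream_processors * clock_mhz * flops_per_clock / 1000.0


# Two GCN compute units x 64 SPs each = 128 SPs at 500 MHz
kabini = peak_gflops(128, 500)
print(f"Kabini peak: {kabini:.0f} GFLOPS")  # Kabini peak: 128 GFLOPS
```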

3DMark—Graphics

The results are quite good. Kabini manages a 61% performance advantage over AMD's old Brazos platform and actually gets surprisingly close to Intel's HD 4000. As we discovered earlier, this isn't really enough performance to play modern PC games, but casual (and especially tablet) gaming workloads should run wonderfully here.

3DMark—Ice Storm

The overall Ice Storm score combines the physics and graphics test results. As expected, Kabini continues to lead everything other than the i5-3317U.

Finally, we have the GFXBenchmark T-Rex HD test. I threw in a handful of older PC GPUs, although keep in mind that T-Rex HD isn't very memory bandwidth intensive, which works against some of the older high-end PC GPUs that had plenty of memory bandwidth. The test is also better optimized for unified shader architectures, which helps explain the 8500 GT's excellent performance here.

GL/DXBenchmark 2.7—T-Rex HD (Offscreen)

Kabini does very well in this test too. If we look at the tablet-oriented Temash part (A4-1200), the number of GCN compute units remains unchanged, but the maximum GPU frequency drops from 500MHz to 225MHz. Assuming perfect scaling with GPU clock speed, Temash could offer roughly the same graphics performance as the 4th generation iPad. AMD claims a TDP of only 3.9W for the A4-1200 Temash APU, which would make it a very interesting part from a GPU perspective if our napkin math holds true.
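The napkin math here amounts to scaling Kabini's score by the ratio of GPU clocks. A minimal sketch, assuming performance is perfectly linear in GPU clock with everything else held equal, looks like this; the input score is a hypothetical placeholder, not a measured T-Rex HD result.

```python
def scale_by_clock(score: float, from_mhz: float, to_mhz: float) -> float:
    """Estimate a score at another GPU clock.

    ASSUMPTION: performance is perfectly linear in GPU clock, with everything else
    (compute unit count, memory bandwidth, drivers, thermals) held equal.
    """
    return score * (to_mhz / from_mhz)


KABINI_GPU_MHZ = 500  # A4-5000 max GPU clock
TEMASH_GPU_MHZ = 225  # A4-1200 max GPU clock, same two GCN compute units

hypothetical_kabini_score = 100.0  # placeholder only, NOT a measured T-Rex HD result
estimate = scale_by_clock(hypothetical_kabini_score, KABINI_GPU_MHZ, TEMASH_GPU_MHZ)
print(f"Temash estimate: {estimate:.1f} ({TEMASH_GPU_MHZ / KABINI_GPU_MHZ:.0%} of Kabini)")
```

Whatever Kabini's real score is, this model places Temash at 45% of it; any real Temash result will differ to the extent that the test is not purely clock-bound.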

OpenCL Performance

For our last comparison, we're looking at the OpenCL performance of these on-die GPUs. We're using a subset of Ryan's GPU compute workload, partly because many of those tests don't work properly on Kabini yet and partly because some of them are really built for much more powerful GPUs. We've got LuxMark 2.0 and two CLBenchmark 1.1.3 tests here; their descriptions follow:

SmallLuxGPU is an OpenCL-accelerated ray tracer that is part of the larger LuxRender suite. Ray tracing has become a GPU stronghold in recent years because it maps well to GPU pipelines, allowing artists to render scenes much more quickly than with CPUs alone.
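Ray tracing suits GPUs largely because every pixel's primary ray can be computed independently of every other pixel. The toy sketch below, a ray-sphere intersection evaluated once per pixel, illustrates that data-parallel structure. It is not SmallLuxGPU's code; the one-sphere scene and orthographic rays along +z are hypothetical simplifications, and on a GPU the body of the per-pixel loop would roughly correspond to one OpenCL work-item.

```python
import math
from typing import List, Optional


def hit_sphere(ox: float, oy: float, cx: float, cy: float, cz: float, r: float) -> Optional[float]:
    """Distance t along the ray (ox, oy, 0) + t * (0, 0, 1) to the nearest sphere hit, or None.

    Simplification: origins inside the sphere are treated as misses.
    """
    # Substituting the ray into |p - c|^2 = r^2 gives
    # t^2 - 2*cz*t + (dx^2 + dy^2 + cz^2 - r^2) = 0, with dx = ox - cx, dy = oy - cy.
    dx, dy = ox - cx, oy - cy
    b = -2.0 * cz
    c = dx * dx + dy * dy + cz * cz - r * r
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None  # ray misses the sphere
    t = (-b - math.sqrt(disc)) / 2.0  # nearer of the two roots
    return t if t > 0.0 else None


def render(width: int, height: int) -> List[str]:
    """Trace one orthographic ray per pixel. No iteration depends on any other,
    which is the property that lets a GPU run them all in parallel."""
    rows = []
    for y in range(height):
        row = ""
        for x in range(width):
            # Map the pixel centre into [-1, 1] screen space
            ox = (x + 0.5) / width * 2.0 - 1.0
            oy = (y + 0.5) / height * 2.0 - 1.0
            hit = hit_sphere(ox, oy, cx=0.0, cy=0.0, cz=3.0, r=0.8)
            row += "#" if hit is not None else "."
        rows.append(row)
    return rows


for line in render(20, 10):
    print(line)
```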

CLBenchmark contains a number of subtests; we're focusing on the two most practical of them, the computer vision test and the fluid simulation test. The former is a useful proxy for computer imaging tasks where systems must parse images and identify features (e.g. humans), while fluid simulations are common in both professional graphics work and games.

OpenCL GPU Performance (higher is better)
GPU | LuxMark 2.0 (K samples/s) | CLBenchmark 1.1.3 Vision (score) | CLBenchmark 1.1.3 Fluid (score)
AMD A4-5000 (Radeon HD 8330) | 18 | 1041 | 1496
AMD E-350 (Radeon HD 6310) | 23 | 292 | 505
Intel Core i5-3317U (HD 4000) | 107 | 819 | 1383

LuxMark is a corner case where Kabini actually regresses compared to Brazos. The explanation is simple: some workloads are better suited to AMD's older VLIW GPU architectures. GCN is a scalar architecture that usually achieves more efficient use of its shader hardware, but every now and then you'll see a regression like this one (18K vs. 23K samples/s). Intel's HD 4000 does remarkably well in LuxMark 2.0, scoring nearly six times Kabini's result.
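The VLIW-versus-scalar point can be made concrete with a deliberately simplified utilization model. In a VLIW design such as the VLIW5 shaders in Brazos's Radeon HD 6310, the compiler has to pack independent operations from a single work-item into each five-slot bundle, so utilization depends on instruction-level parallelism; GCN's SIMD lanes instead each execute one operation from a different work-item. The sketch assumes ideal scheduling and ignores memory, branching, and occupancy, so it is a teaching model, not a performance predictor, and the instruction streams are hypothetical.

```python
import math
from typing import List


def vliw_utilization(independent_ops_per_step: List[int], slots: int = 5) -> float:
    """Fraction of VLIW slots filled in a toy model of one work-item's code.

    Each entry is how many mutually independent operations are available at one
    dependency step; a step with k ops occupies ceil(k / slots) bundles.
    """
    bundles = sum(math.ceil(k / slots) for k in independent_ops_per_step)
    if bundles == 0:
        raise ValueError("need at least one operation")
    return sum(independent_ops_per_step) / (bundles * slots)


# Hypothetical instruction streams (not taken from LuxMark):
serial_chain = [1, 1, 1, 1]  # each op depends on the one before it
vec4_math = [4, 4, 4, 4]     # e.g. independent x/y/z/w components of vector math

print(f"VLIW5, serial chain: {vliw_utilization(serial_chain):.0%}")  # 20%
print(f"VLIW5, vec4 math:    {vliw_utilization(vec4_math):.0%}")     # 80%
# In the same idealized model, a scalar SIMD design like GCN keeps every lane busy
# in both cases, because its parallelism comes from running many work-items side
# by side rather than from packing one work-item's independent operations.
```

Code rich in independent vector math keeps a VLIW5 design busy and narrows the gap, while long dependency chains leave most slots empty, which is why GCN usually comes out ahead.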

Our two CLBenchmark tests, however, paint a very different picture. Here Kabini not only significantly outperforms its predecessor (roughly 3.6x in Vision and 3.0x in Fluid) but is faster than Ivy Bridge as well (by about 27% and 8%, respectively).
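The relative standings can be computed directly from the OpenCL table. This short Python sketch uses only the values in that table (LuxMark in thousands of samples per second, CLBenchmark as raw scores) and prints each GPU's results relative to a chosen baseline, here the A4-5000.

```python
# Scores from the OpenCL GPU Performance table (higher is better).
# LuxMark 2.0 in K samples/s; CLBenchmark 1.1.3 Vision and Fluid are raw scores.
RESULTS = {
    "A4-5000 (HD 8330)": {"LuxMark": 18, "Vision": 1041, "Fluid": 1496},
    "E-350 (HD 6310)": {"LuxMark": 23, "Vision": 292, "Fluid": 505},
    "i5-3317U (HD 4000)": {"LuxMark": 107, "Vision": 819, "Fluid": 1383},
}


def relative_to(baseline: str) -> dict:
    """Each GPU's score divided by the baseline GPU's score, per test."""
    base = RESULTS[baseline]
    return {
        gpu: {test: score / base[test] for test, score in scores.items()}
        for gpu, scores in RESULTS.items()
    }


for gpu, ratios in relative_to("A4-5000 (HD 8330)").items():
    summary = ", ".join(f"{test} {ratio:.2f}x" for test, ratio in ratios.items())
    print(f"{gpu}: {summary}")
```

Passing "E-350 (HD 6310)" as the baseline instead shows Kabini at about 3.6x its predecessor in Vision and 3.0x in Fluid, but only about 0.78x in LuxMark.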

130 Comments

  • darkich - Friday, May 24, 2013

    There you go.

    AnandTech, speak up!
    I'll take silence as a confirmation that I was right
  • JarredWalton - Friday, May 24, 2013

    Most of the smartphone/tablet testing is done elsewhere (Brian for smartphones, Anand for tablets). Given we're looking at tablets and laptops here, comparing performance to a smartphone would be silly, so then we need to find a tablet with the Octa...which doesn't exist except in prototype form.

    As for the "octa" having eight cores, that's true, but it typically only runs four at a time -- either the four A7 or the four A15. With the right software (basically only a benchmark designed to do something the Galaxy S4 won't ever do on its own), you can get the theoretical performance, but in practice you won't ever get this (at least not on the only currently shipping Exynos 5 device).

    Finally, as pointed out by Kyuu, Geekbench is not a great benchmark. Sure, it can tell you some theoretical performance numbers, but many of the tests have very little to do with real workloads. I don't think we've ever used Geekbench outside of some smartphone testing, just like we don't generally report things like SuperPi or Sandra performance. Then again, I don't necessarily like Cinebench or x264 HD much either. If you want the Geekbench results, here's the 32-bit score for the A4-5000: 2987

    browser.primatelabs.com/geekbench2/1983485
  • Exophase - Friday, May 24, 2013

    If you're doing an SoC comparison I don't see why it matters if that SoC runs on a phone instead of a tablet. And I understand that this review may not be an SoC review, but that's what a lot of people are looking for right now.

    Geekbench's integer tests aren't that bad. Crypto, bz2 and JPEG compression/decompression done in native code are actually relatively common tasks on a variety of hardware. The code being run in the Lua test (prime testing) is junk, but since Lua is interpreted, most of what's measured is how well the CPU handles interpreters, so running junk code doesn't make much difference.

    IMO your criticism applies more to Kraken, which you conspicuously left out of your list of not-so-great (but we use them anyway) benchmarks. I gave a bunch of reasons why I don't like it in an earlier post, but I'd like to add a little bit to that - it's not just that it does a lot of DSP (audio and image processing) and crypto stuff but that these tests take up proportionately a lot more of the runtime, drowning out the little path finding and string parsing scores.

    These tasks (DSP and crypto) are useful on a variety of platforms like Geekbench's, but the problem is that they're greatly distorted by being executed in JavaScript - which is not where they'll usually be run. It's going to have a hard time optimizing beyond double precision - assuming the code wasn't intended to be double precision in the first place, which would make it even less relevant. It'll have a lot of memory overhead issues and vectorization is pretty much out of the question, despite these being vector-friendly operations. This all makes it a bad proxy for how native code would perform at these tasks, especially if we're comparing with hand-optimized SSE and NEON.
  • Wilco1 - Saturday, May 25, 2013

    No, the right software is a Linux kernel patch which allows all 8 cores to be used, and the S4 will be upgraded to use it. Although it will improve performance, the actual goal is lower power consumption because you can now mix and match cores. Today a single high-performance task forces all processes to use the A15 cores even when they don't need them, and when the task finishes all processes have to be migrated back again. In the new world you enable 1 A15 as needed and keep 1 or 2 A7s running the background processes.

    Like most benchmarks, Geekbench is not perfect. But I agree with Exophase that it is most definitely a lot better than JavaScript benchmarks. Geekbench does test real workloads (many of the tests are actual code people use), quite unlike JS benchmarks, which have nothing to do with browsing performance, let alone CPU performance.

    The state of smartphone/tablet benchmarking is a shambles - and this is an opportunity for AnandTech to make a difference. You could take a set of Linux benchmarks (eg. freely available versions of SPEC subsets, Phoronix and other common benchmarks like the ones used in Geekbench) and create an app for Android and iOS.

    Thanks for the Geekbench link. Integer performance of Jaguar is slightly better than I expected vs Exynos Octa (http://browser.primatelabs.com/geekbench2/compare/... This may be partly due to comparing a phone SoC with a laptop SoC (Jaguar has a major advantage on the memory/stream part), but this kind of detailed comparison is far more interesting, and reveals far more about the relative strengths and weaknesses of the microarchitectures, than looking at JS performance.
  • darkich - Tuesday, May 28, 2013

    Wow.. can you give a source about that kernel update?
    I can imagine all eight cores mixing would be beneficial on all areas.
    While four A15 cores can work asynchronously with each other (independently change frequency, idle/sleep state), their voltage is inherently higher than that of A7 cores.
    If the A57 SoC will be able to mix cores too, then that will be an overall amazing prospect.

    And I completely agree about Geekbench... no matter how realistic the workloads it represents are, it beyond any doubt DOES give an idea of raw processing power.
    It's ridiculous to neglect that.
  • Wilco1 - Thursday, May 30, 2013

    Note there are actually 3 different variants of big.LITTLE software: ARM's hypervisor code, which is OS-unaware; the Linaro In-Kernel Switcher; and the MP switcher (the latter supports 8 cores).

    This is the team developing the big.LITTLE MP software: https://wiki.linaro.org/projects/big.LITTLE.MP. Here is a presentation: http://www.linaro.org/documents/download/6d58a63e4...

    Yes, A57 supports big.LITTLE with A53.
  • darkich - Saturday, June 1, 2013

    Thank you.

    This makes me wonder about the Snapdragon 800 for the Note 3 rumours..an upclocked Octa on that kernel should really be more than good enough.
    The only advantage I can see in Snapdragon is the GPU..Adreno 330 looks like a whole step above anything on the market right now.
  • Gaugamela - Thursday, May 23, 2013

    This was a really underwhelming review... Comparing Kabini to an Intel Core i5 Ivy Bridge. Really?
    Why don't you make the charts with relevant comparisons instead of forcing people to dig through benchmarks to find comparable CPUs?

    And you just got to wonder what's the point of this sentence:
    "After all the bad news in terms of performance (not that it’s really bad, but it can certainly look that way at times), the good news is that not only is Kabini noticeably faster than Brazos, but it’s also mighty frugal when it comes to power use. "

    Bad news in terms of performance??? Why, because it doesn't compete with an i5 Ivy Bridge??
    If anyone wants to read a decent review of Kabini, with more comparisons to relevant notebooks, head on over to Notebookcheck.net.
    Here's the link: http://translate.google.com/translate?sl=de&tl...

    To sum up: The Kabini A4-5000 is competitive with a Sandy Bridge i3 in terms of CPU and GPU performance (number of cores compensating for lower single thread performance) and it sometimes shadows an Ivy Bridge i3.
  • JarredWalton - Thursday, May 23, 2013

    That's being awfully generous on "competitive". Single-threaded, i3-3217U is about twice as fast as A4-5000; multi-threaded it's only about 20% faster. In their graphics testing, the HD 4000 in an i3-3217U is consistently leading by 20-40%. That's a Core i3 laptop with Ivy Bridge that you can get for under $500, right now, and it's ahead by 20% or more in every test I looked at...and Core i3 with HD 4000 isn't exactly known for being a performance monster.

    I'd say that AMD is over-reaching with their targets; A6 is more like a match for Pentium, A4 for Celeron, and anything below that isn't really worth discussing (i.e. Atom). When we see the Haswell update next month, the margin in favor of Intel will only increase, but at least I don't think AMD will have to worry about ULV i3 Haswell for a few more months. Based on currently available laptops, Kabini needs to be well under $500 to compete -- or I'd say $500 is acceptable if you get a decent LCD.
  • Gaugamela - Thursday, May 23, 2013

    Considering that Notebookcheck said this:

    "Even though the A4-5000 is on paper only slightly higher clocked than the recently tested A6-1450, the performance differences in practice are quite large. The reason for this is the A4-5000's higher TDP classification: to stay within its maximum consumption of 8 watts (without "Turbo Dock"), the A6-1450 can reach its full turbo clock of 1.4 GHz only when a single core is loaded; under full load, by contrast, its frequency drops to just over 1.0 GHz. Thanks to its constant 1.5 GHz, the A4-5000 pulls ahead by almost 50 percent in some benchmarks, making a clear leap forward.

    When all four cores are loaded, the APU even narrowly beats the Core i3-2367M and in places comes close to the newer Core i3-3217U. However, the gap in per-thread performance remains considerable: even a Pentium 987 should be at least 50 percent faster per core. Although modern applications have become much better parallelized, you should not disregard this point entirely.

    In everyday use, the test unit provided by AMD feels quite fast and responsive. The extra performance over the A6-1450 or the previous E2-1800 is quite noticeable and could have been even higher with a turbo mode; it is a pity that the A4-5000 has to do without this feature. For office and multimedia applications including full HD video, however, the APU's reserves are perfectly sufficient. "

    I'll go by their words since they have a more thorough review than the poor job you guys did here. In their benchmarks, the A4-5000 beats the Pentium in every aspect except single-threaded performance. The Kabini GPU is comparable with the HD3000 in many of the graphical benchmarks and can run some non-demanding or old games. Hands-down the A4 eats the Pentium brand, so no, AMD isn't over-reaching with their targets - have you tested the A6 yet and compared it with an i3 IB?

    And why are you talking about Haswell, when Kabini sits below the Intel Core brand? AMD defeats Intel below the Cores and this just confirms that.
    Now if you want to talk about Richland versus Ivy Bridge and Haswell I'll concede that AMD is really behind and Steamroller can't come soon enough.