Kabini vs CT/ARM: GPU Performance

I pulled out 3DMark and GFXBenchmark (formerly GL/DXBenchmark) for some cross-platform GPU comparisons. We'll start with 3DMark Ice Storm and its CPU-bound, multithreaded physics benchmark:

3DMark—Physics

The physics test is multithreaded to a somewhat unrealistic degree, which is why we see a 75% uplift compared to AMD's E-350. For FP-heavy game physics workloads, however, Jaguar does quite well. While a big Ivy Bridge is still going to be quicker, AMD's A4-5000 gets surprisingly close given its much lower cost.

The 3DMark graphics test is more of what we're interested in seeing here. Two GCN compute units (128 SPs/cores) running at 500MHz will really put the old Radeon HD 6310 in Brazos to shame:

3DMark—Graphics

The results are quite good. Kabini manages a 61% performance advantage over AMD's old Brazos platform, and actually gets surprisingly close to Intel's HD 4000 in performance. As we discovered earlier, this isn't really enough performance to play modern PC games but casual (and especially tablet) gaming workloads should do wonderfully here.

3DMark—Ice Storm

The overall Ice Storm score simply combines the physics and graphics test components. As expected, Kabini continues to lead over everything other than the i5-3317U.

Finally we have the GFXBenchmark T-Rex HD test. I threw in a handful of older PC GPUs, although keep in mind that T-Rex HD isn't very memory bandwidth intensive (penalizing some of the old big PC GPUs that had good amounts of memory bandwidth). The test is also better optimized for unified shader architectures, which helps explain the 8500 GT's excellent performance here.

GL/DXBenchmark 2.7—T-Rex HD (Offscreen)

Kabini does very well in this test as well. If we look at the tablet-oriented Temash part (A4-1200) we see that the number of GCN compute units remains unchanged, but max GPU frequency drops to 225MHz from 500MHz. If we assume perfect scaling with GPU clock speed, Temash could offer roughly the same graphics performance as the 4th generation iPad. AMD claims the A4-1200 Temash APU carries a TDP of only 3.9W, a potentially very interesting part from a GPU perspective if our napkin math holds true.
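The napkin math above can be sketched in a few lines of Python. This is only an illustration of the perfect-scaling assumption stated in the text; the function name and the placeholder frame rate are hypothetical, and real performance rarely scales perfectly with clock speed.

```python
def scale_with_clock(score: float, base_mhz: float, target_mhz: float) -> float:
    """Estimate a score at target_mhz, assuming it scales linearly with GPU clock."""
    if base_mhz <= 0 or target_mhz < 0:
        raise ValueError("base clock must be positive and target clock non-negative")
    return score * target_mhz / base_mhz


# Max GPU clocks quoted in the text: Kabini (A4-5000) and Temash (A4-1200).
KABINI_GPU_MHZ = 500.0
TEMASH_GPU_MHZ = 225.0

# Fraction of Kabini's GPU performance Temash would keep under perfect scaling.
temash_fraction = scale_with_clock(1.0, KABINI_GPU_MHZ, TEMASH_GPU_MHZ)
print(f"Temash keeps {temash_fraction:.0%} of Kabini's GPU performance")

# Hypothetical placeholder, NOT a measured result: substitute the A4-5000's
# actual T-Rex HD offscreen frame rate from the chart to get the estimate.
placeholder_kabini_fps = 20.0
estimated_temash_fps = scale_with_clock(
    placeholder_kabini_fps, KABINI_GPU_MHZ, TEMASH_GPU_MHZ
)
print(f"Estimated Temash result: {estimated_temash_fps:.1f} fps")
```

With the same number of compute units, the clock ratio alone puts Temash at a bit under half of Kabini's GPU throughput in this model.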

OpenCL Performance

For our last comparison we're looking at the OpenCL performance of these on-die GPUs. We're using a subset of Ryan's GPU Compute workload, partially because many of those tests don't work properly on Kabini yet and also because some of those tests are really built for much more powerful GPUs. We've got LuxMark 2.0 and two CLBenchmark 1.1.3 tests here. Their descriptions follow:

SmallLuxGPU is an OpenCL accelerated ray tracer that is part of the larger LuxRender suite. Ray tracing has become a stronghold for GPUs in recent years as ray tracing maps well to GPU pipelines, allowing artists to render scenes much more quickly than with CPUs alone.

CLBenchmark contains a number of subtests; we’re focusing on the most practical of them, the computer vision test and the fluid simulation test. The former is a useful proxy for computer imaging tasks where systems are required to parse images and identify features (e.g. humans), while fluid simulations are common in professional graphics work and games alike.

OpenCL GPU Performance

| | LuxMark 2.0 | CLBenchmark - Vision | CLBenchmark - Fluid |
|---|---|---|---|
| AMD A4-5000 (Radeon HD 8330) | 18K samples/s | 1041 | 1496 |
| AMD E-350 (Radeon HD 6310) | 23K samples/s | 292 | 505 |
| Intel Core i5-3317U (HD 4000) | 107K samples/s | 819 | 1383 |

LuxMark is really a corner case here where Kabini shows a performance regression compared to Brazos. The explanation is simple: some workloads are better suited to AMD's older VLIW GPU architectures. GCN is a scalar architecture that is usually going to net more efficient usage, but every now and then you'll see a slight regression. Intel's HD 4000 actually does amazingly well in the LuxMark 2.0 benchmark.

Our two CLBenchmark tests however paint a very different picture. Here Kabini not only significantly outperforms its predecessor, but is faster than Ivy Bridge as well. 
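To put numbers on those comparisons, here is a small Python sketch that turns the OpenCL results above into ratios. The scores are copied from the table; the dictionary layout and the helper name are assumptions made for this illustration, and all three tests are treated as higher-is-better.

```python
# OpenCL scores from the table above (LuxMark in thousands of samples/s).
RESULTS = {
    "AMD A4-5000 (Radeon HD 8330)": {"luxmark": 18, "vision": 1041, "fluid": 1496},
    "AMD E-350 (Radeon HD 6310)": {"luxmark": 23, "vision": 292, "fluid": 505},
    "Intel Core i5-3317U (HD 4000)": {"luxmark": 107, "vision": 819, "fluid": 1383},
}


def relative_scores(subject: str, baseline: str) -> dict:
    """Return subject/baseline ratios per test; values above 1.0 favor the subject."""
    return {
        test: RESULTS[subject][test] / RESULTS[baseline][test]
        for test in RESULTS[subject]
    }


KABINI = "AMD A4-5000 (Radeon HD 8330)"
for other in RESULTS:
    if other == KABINI:
        continue
    ratios = relative_scores(KABINI, other)
    summary = ", ".join(f"{test}: {ratio:.2f}x" for test, ratio in ratios.items())
    print(f"Kabini vs. {other} -> {summary}")
```

Run as written, this shows Kabini at roughly 0.78x the E-350 in LuxMark but about 3.6x and 3.0x in the two CLBenchmark tests, and about 1.27x and 1.08x the HD 4000 in those same two tests.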

130 Comments


  • whyso - Thursday, May 23, 2013 - link

    kabini only has a single channel memory controller. Going to two DIMMS would not improve performance at all.
  • Gaugamela - Thursday, May 23, 2013 - link

    He didn't mention dual channel memory even once in his comment. He mentioned comparing an inexpensive chip that competes against Atoms/Pentiums in price/performance/TDP with an intel i5 which is a much more powerful chip.
  • Roland00Address - Thursday, May 23, 2013 - link

    I wonder how starved the cores are for memory bandwidth. Even the xbox one will be using quad channel memory (4 64 bit controllers) and 2133 mhz instead of 1600, that is over five times the bandwidth. The ps4 keeps the quad channel but uses gddr5 memory
  • whyso - Thursday, May 23, 2013 - link

    Well the xbox one has 6x the igp and the ps4 has 9x the igp.
  • Roland00Address - Thursday, May 23, 2013 - link

    I am not asking if the gpu is bandwidth starved but is the cpu bandwidth starved

    I understand why amd went single channel, I am just curious if dual channel would make a difference
  • whyso - Thursday, May 23, 2013 - link

    CPU is almost never bandwidth starved. Literally, you can run a 3770k on 1033 mhz ram and outside winrar you will never notice any differences compared to 2133 mhz ram. 3770k is dual channel but more than 4x as powerful as this kabini apu.
  • Roland00Address - Thursday, May 23, 2013 - link

    Your statement that an i7 does not need faster memory does not necessarily translate to this situation with jaguar and that is why I am curious if the cores are starved. I wish it was possible for anandtech to test this (but it is very hard to do so for there isn't really laptop ram faster than 1600 mhz). Here is why you can't just assume what works with an i7 translates to the jaguar chip.

    533 mhz dual channel is equivalent to 1066 single channel. (Remember i7-3770k is dual channel while amd jaguar in this implementation is single channel). Thus you can't just assume that because an i7 is fine with dual channel, the jaguar would be fine with single channel, for the i7 has 100% more bandwidth due to it being dual channel.

    Furthermore i7-3770k is intel's high performance architecture with large r&d vs amd's energy efficient architecture with small r&d. It is very likely intel has better data predictors in their i7 so there are fewer cache misses and thus you don't care about the memory speed. Intel R&D money is a big deal and translates into better IPC for many reasons including branch prediction.

    Let's put it this way: arm is going higher bandwidth for more ipc. Tegra 3 is a mere single channel 32 bit controller, Tegra 4 is likely to be a dual channel 32 bit controller, Exynos 5 uses a dual channel 32 bit controller. Apple on their iPads now use a quad channel 32 bit controller. One reason why exynos and ipad were faster than tegra 3 was better memory bandwidth.

    I am not saying you will see improvements greater than 20% but I am curious if faster memory speeds would cause ipc to go up by 5 to 20%. For example with bulldozer amd gets 8% faster encoding with x264 first pass with 2133 memory dual channel instead of 1333 dual channel according to vr-zone.

    It may not be in amd's best interest to make the jaguar chip dual channel 64 bit for cheap laptops that compete against intel's celeron or pentium lines. Yet at the same time Jaguar is going to be AMD's process for low power in the future. It is quite possible that AMD may in the future (if the tdp is low enough) make a 20nm dual channel chip for the higher margin tablets; high margin tablets would insist on higher ipc for the cpu and more bandwidth for the screen, while low margin cheap computers would not care about increased ipc. It is possible that in the future AMD can find a sweet spot between intel atom and intel haswell. (I am not saying this is likely, just merely possible, all depends on intel's pricing to oems for haswell.)
  • whyso - Thursday, May 23, 2013 - link

    It is extremely unlikely for this chip's cpu to be hurt by memory bandwidth. Sure the i7 has 100% more bandwidth but you must factor in the power of the chip. The i7-3770k is going to be something like 6x more powerful (3.7 ghz turbo + IPC advantage + HT). Even if the kabini chip tested in this review was half as efficient as the 3770k in utilizing memory bandwidth it still wouldn't show any differences in cpu performance.
    Phone/tablet SOCs mainly need the bandwidth for the gpu portion of the die (the high res screen was the reason that apple needed such an interface). If they need more bandwidth it'll be for the igp portion of the die (definitely 1066 mhz ram on a single channel will hurt this thing) for kabini.
  • geoflouw - Thursday, May 23, 2013 - link

    Need a comparison against Haswell and Baytrail please. This data is misleading, i really would hope AMD can compete with 1 year old CPUs and SOCs.....
  • smilingcrow - Thursday, May 23, 2013 - link

    It’s hardly misleading. Haswell is still under NDA and Bay Trail is due around Q4 which is why they aren’t compared. The better comparison will be Bay Trail based on pricing and performance. AMD should still have a healthy lead for GPU but if Intel’s recent info is true it should be close on the CPU front. The main difference is that Bay Trail has a low enough TDP for a cheap fanless tablet whereas the chip reviewed today is 15W so Ultrabook class. So you get two out of four with AMD:
    The Good: GPU + Price
    The Bad: power consumption + low IPC
    So business as usual really.
