At CES, Samsung announced its Exynos 5 Octa SoC featuring four ARM Cortex A7s and four ARM Cortex A15s. Conspicuously absent from the announcement was any mention of the Exynos 5 Octa's GPU configuration. Given that the Exynos 5 Dual featured an ARM Mali-T604 GPU, we had assumed that the 4/8-core version would do the same. Based on multiple sources, we're now fairly confident in reporting that Samsung included a PowerVR SGX 544MP3 GPU, running at up to 533MHz, in the Exynos 5 Octa.

The PowerVR SGX 544 is a lot like the 543 used in Apple's A5/A5X, but it adds DirectX 10 class texturing hardware and 2x faster triangle setup. There are no changes to the unified shader ALU count. Taking into account the very aggressive max GPU frequency, peak graphics performance of the Exynos 5 Octa should fall between Apple's A5X and the A6X (assuming Samsung's memory interface is just as efficient as Apple's):

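Mobile SoC GPU Comparison

|                             | PowerVR SGX 543MP2 | PowerVR SGX 543MP4 | PowerVR SGX 544MP3 | PowerVR SGX 554MP4 |
|-----------------------------|--------------------|--------------------|--------------------|--------------------|
| Used In                     | A5                 | A5X                | Exynos 5 Octa      | A6X                |
| # of SIMDs                  | 8                  | 16                 | 12                 | 32                 |
| MADs per SIMD               | 4                  | 4                  | 4                  | 4                  |
| Total MADs                  | 32                 | 64                 | 48                 | 128                |
| GFLOPS @ Shipping Frequency | 16.0 GFLOPS        | 32.0 GFLOPS        | 51.1 GFLOPS        | 71.6 GFLOPS        |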

It's good to see continued focus on GPU performance by the major SoC vendors, although I'd like to see a device ship with something faster than Apple's highest-end iPad. At the show we heard that we might see this happen in the form of an announcement in 2013, with a shipping device in 2014.


  • edlee - Monday, January 14, 2013 - link

  • ltcommanderdata - Monday, January 14, 2013 - link

    It would be nice if they added a row that explicitly states the max shipping frequency of each chip used to calculate the GFLOPS rating. Working backwards the frequencies are: 250MHz (A5), 250MHz (A5X), 533MHz (Exynos 5 Octa), 280MHz (A6X). The 280MHz A6X frequency in particular would be good to get explicitly on paper since it's been hinted at before, but I don't think anyone has explicitly stated it.

    And to be strictly correct, I believe the figures actually round to 51.2 GFLOPS for the Exynos at 533MHz rather than 51.1 GFLOPS, and to 71.7 GFLOPS for the A6X at 280MHz rather than 71.6 GFLOPS.
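The back-calculation described in this comment can be sketched in a few lines of Python. This is only a minimal sketch: it assumes each MAD counts as 2 FLOPs per clock (one multiply plus one add), the function names `implied_mhz` and `gflops_at` are illustrative, and the MAD totals and GFLOPS figures are copied from the article's table. The 250MHz and 280MHz clocks are the commenter's inferences, not confirmed vendor figures.

```python
# Total MADs and published GFLOPS, copied from the article's table.
TABLE = {
    "A5 (SGX 543MP2)": (32, 16.0),
    "A5X (SGX 543MP4)": (64, 32.0),
    "Exynos 5 Octa (SGX 544MP3)": (48, 51.1),
    "A6X (SGX 554MP4)": (128, 71.6),
}

FLOPS_PER_MAD = 2  # assumption: one multiply + one add per MAD


def implied_mhz(total_mads, gflops):
    """Work backwards from a GFLOPS figure to the implied clock in MHz."""
    return gflops * 1000.0 / (total_mads * FLOPS_PER_MAD)


def gflops_at(total_mads, mhz):
    """Peak GFLOPS for a given MAD count and clock in MHz."""
    return total_mads * FLOPS_PER_MAD * mhz / 1000.0


# Backward direction: which clock does each published figure imply?
for name, (mads, gflops) in TABLE.items():
    print(f"{name}: ~{implied_mhz(mads, gflops):.0f} MHz")

# Forward direction: unrounded values at the stated clocks.
print(gflops_at(48, 533))   # Exynos 5 Octa at 533 MHz
print(gflops_at(128, 280))  # A6X at 280 MHz
```

Under these assumptions the forward values come out to 51.168 and 71.68, which round to 51.2 and 71.7; the table's 51.1 and 71.6 look like truncation rather than rounding.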
  • alexvoda - Monday, January 14, 2013 - link

    How does the GPU in the Tegra4 compare to this, the A6X and to the future iPad 5?

    Since Nvidia is a GPU company, I expect their chip, while it is brand new, to stomp everything else on the GPU side. How does it actually compare?
  • augiem - Monday, January 14, 2013 - link

    Hasn't happened yet. PVR and Apple have ruled the roost since 2007. Don't hold your breath. Every single halo mobile gpu launched in that timeframe has been one disappointment after another... well except of course for PVR launches. I'm really tired of waiting for 1) Some other device maker to use as many PVR cores as Apple and 2) Some other GPU developer (Nvidia!) to get off their butts and make something competitive. I think I'm going to close my eyes and check back in 3 years.... AGAIN.
  • alexvoda - Monday, January 14, 2013 - link

    In that case, here is an extra question.
    How do they count cores?
    From what I understand, current SoCs with PowerVR have at most 4 PowerVR cores.
    Mali designs are similar, having at most 8 cores according to Wikipedia.
    The Tegra 4 has 72 GPU cores of some kind.
    Are they counting different things? Is the architecture so significantly different?
  • ltcommanderdata - Monday, January 14, 2013 - link

    Yes, they are counting different things. Each PowerVR "core" is really a complete GPU with shader ALUs, texture units, control logic, etc. Each nVidia "core" is really just a shader ALU. That's why Anand counts SIMDs, which is basically the ALU count. Each SGX543/SGX544 "core" has 4 SIMDs/ALUs, whereas each SGX554 "core" doubles the SIMD/ALU count to 8 per core. As such the SGX554MP4 in the A6X has 32 ALUs. PowerVR ALUs are also beefier and are able to process 4 MAD instructions each, whereas each nVidia ALU can only do 1 MAD. That works out to the SGX554MP4 being able to process 128 MAD instructions per clock, whereas Tegra 3 is only able to do 12. Tegra 4 is reported to use basically the same GPU shader architecture as Tegra 3, just with everything increased 6x. Tegra 4's 72 cores therefore can likely process 72 MAD instructions per clock vs 128 MAD instructions per clock for the SGX554MP4. Clock speeds, memory bandwidth, and other factors will affect real world performance, but clock-for-clock, the SGX554MP4 in the A6X should be faster than the Tegra 4.
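The counting argument in this comment can be written out as a small Python sketch. The `GpuConfig` class and its field names are hypothetical, introduced only for illustration; the per-core ALU and MAD figures are the ones the comment states, and the Tegra 4 entry rests on the reported (not confirmed) 72-core configuration.

```python
from dataclasses import dataclass


@dataclass
class GpuConfig:
    name: str
    cores: int          # vendor-advertised "core" count
    alus_per_core: int  # SIMDs/ALUs inside each advertised core
    mads_per_alu: int   # MAD instructions each ALU can process per clock

    def mads_per_clock(self) -> int:
        """Normalize the vendor's core count to MADs per clock."""
        return self.cores * self.alus_per_core * self.mads_per_alu


CONFIGS = [
    GpuConfig("SGX543MP2 (A5)", 2, 4, 4),
    GpuConfig("SGX543MP4 (A5X)", 4, 4, 4),
    GpuConfig("SGX544MP3 (Exynos 5 Octa)", 3, 4, 4),
    GpuConfig("SGX554MP4 (A6X)", 4, 8, 4),
    # Per the comment, each nVidia "core" is a single ALU doing 1 MAD.
    GpuConfig("Tegra 3", 12, 1, 1),
    GpuConfig("Tegra 4 (reported)", 72, 1, 1),
]

for cfg in CONFIGS:
    print(f"{cfg.name}: {cfg.cores} cores -> {cfg.mads_per_clock()} MADs/clock")
```

Normalizing to MADs per clock makes the differently counted "cores" comparable, but it says nothing about clock speed or memory bandwidth, so it is a per-clock ceiling rather than a performance prediction.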
  • frenchy_2001 - Monday, January 14, 2013 - link

    The problem is always the same and likely to stay: cost.
    This is reflected in the die area of each mobile chip. A5X (iPad3) was ~163mm2 while Tegra3 was ~80mm2. You are talking of a 2x factor. For twice the cost, Apple can afford to be better.
    (see this table to see the different sizes: )

    In discrete GPUs, this would be like comparing a GTX680 (top of the line, $400) to a Radeon 7870 (top middle, $200). The Nvidia card will trample the Radeon, at the cost of... money.

    Of course, in the case of tablets and phones, we do NOT see those costs, as they get hidden in the total cost and often pocketed by the product manufacturer (another interesting link explaining why Apple may have introduced the iPad4 as a cost reduction: )

    So, as GPU capabilities will be capped by costs, it is unlikely that we will see anyone else beat Apple (which, honestly, overspends on graphics power at the moment as a differentiator and an advantage for gaming).
  • mayankleoboy1 - Monday, January 14, 2013 - link

    money and.... power usage at load/idle.
  • djgandy - Tuesday, January 29, 2013 - link

    The cost difference between an 80mm2 die and a 160mm2 die is not that huge if you are shipping in the tens of millions. You'd spend that $10 when it significantly improves your $500 device and puts less demand on the battery and reduces the amount of heat from the device.

    Having cool-running silicon is very important in these slim devices. That's why Apple burns area. It means they can design the device they want, rather than the device that the chip forces on them.

    Having a smaller, higher clocked and hotter running chip would probably cost Apple more in working around the constraints or having a larger battery.
  • mikegonzalez2k - Monday, March 4, 2013 - link

    I thought calculating the GFLOPS was

    # of Cores x Clock Speed x FLOPS/clock

    I'm not getting the correct values so I was wondering if someone could show the calculation.
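The formula in this comment does reproduce the table once "FLOPS/clock" is taken per core. The worked version below is a sketch under two assumptions: each MAD counts as 2 FLOPs (multiply plus add), and the 250MHz and 280MHz clocks are the values back-calculated earlier in the thread rather than confirmed figures. The SIMD counts per core and MADs per SIMD come from the article's table and the discussion above.

```latex
\begin{align*}
\text{GFLOPS} &= N_{\text{cores}} \times f_{\text{GHz}}
  \times \underbrace{\left(\text{SIMDs per core} \times \text{MADs per SIMD} \times 2\right)}_{\text{FLOPS per clock per core}} \\[4pt]
\text{SGX 543MP2 (A5):} \quad & 2 \times 0.250 \times (4 \times 4 \times 2) = 16.0 \\
\text{SGX 543MP4 (A5X):} \quad & 4 \times 0.250 \times (4 \times 4 \times 2) = 32.0 \\
\text{SGX 544MP3 (Exynos 5 Octa):} \quad & 3 \times 0.533 \times (4 \times 4 \times 2) = 51.168 \\
\text{SGX 554MP4 (A6X):} \quad & 4 \times 0.280 \times (8 \times 4 \times 2) = 71.68
\end{align*}
```

A common way to get the wrong values is to use 4 or 16 FLOPS per clock per core; a 543/544 core works out to 32 and a 554 core to 64 under the assumptions above.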
