The Mali-400

Now that we've settled the issue of what type of GPU it is, let's talk about the physical makeup of the Mali-400. The Mali-400 isn't a unified shader architecture; it has discrete execution hardware for vertex and fragment (pixel) processing. ARM calls the Mali-400 a multicore GPU, available in configurations of one to four cores. When ARM refers to a core, however, it's talking about a fragment (pixel shader) processor, not an entire GPU core. This is somewhat similar to NVIDIA's approach with Tegra 2, although NVIDIA counts each vertex and fragment processor as an individual core.

In its simplest configuration the Mali-400 features a single combined geometry front end and vertex processor and a single fragment processor. The 400 is also available in 2 and 4 core versions, both of which still have only a single vertex processor. The two core version has two fragment processors and the four core version has four fragment processors. Note that ARM decided to scale fragment shading performance with core count while keeping vertex performance static. This is likely the best decision given current workloads, but a risky one, since vertex throughput doesn't grow with the rest of the GPU. NVIDIA, on the other hand, standardized on a 1:1 ratio between fragment and vertex processors, compared to ARM's 4:1 on a 4-core Mali-400. The 4-core Mali-400 MP4 is what Samsung uses in the Exynos 4210.

ARM, like Qualcomm, isn't particularly interested in having the details of its GPUs available publicly. Unfortunately this means that we know very little about the makeup of each of these vertex and fragment processors. I suspect that both companies will eventually learn to share (just as AMD and NVIDIA did) but as this industry is still in its infancy, it will take some time.

Earlier documentation on Mali revealed that the GPU is a VLIW architecture, meaning each processor is actually a collection of multiple parallel execution units capable of working on vector data. Unfortunately, there's no public documentation indicating how wide each processor is, but we can make some educated guesses.

We know from history that AMD felt a 5-wide VLIW architecture made sense for DX9 class games, later moving down to a 4-wide architecture for DX11 games. AMD didn't have the die constraints that ARM and other SoC GPU suppliers do, so a 5-wide unit is likely out of the question, especially considering that Imagination settled on a VLIW4 architecture. Furthermore, pixels have four color elements (RGBA), making VLIW4 a natural choice.

Based on this as well as some internal information we can assume that a single Mali fragment shader is a 4-wide VLIW processor. The vertex shader is a big unknown as well, but knowing that vertex processing happens on two coordinate elements (U & V) Mali's vertex shader is likely a 2-wide unit.
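To make that guess concrete, here is a minimal Python sketch that counts MAD lanes per clock for each Mali-400 configuration. The 4-wide fragment and 2-wide vertex figures are the assumptions from the paragraph above, not numbers confirmed by ARM, and the function and constant names are hypothetical.

```python
# Assumed widths from the estimate above; ARM has not published these.
FRAGMENT_LANES = 4   # assumed 4-wide VLIW fragment processor
VERTEX_LANES = 2     # assumed 2-wide vertex processor
VERTEX_PROCESSORS = 1  # every Mali-400 configuration has a single vertex processor


def mali400_mads_per_clock(fragment_cores: int) -> int:
    """Estimated MADs per clock for a Mali-400 with the given number of fragment cores."""
    if not 1 <= fragment_cores <= 4:
        raise ValueError("Mali-400 is offered with 1 to 4 fragment cores")
    fragment_mads = fragment_cores * FRAGMENT_LANES
    vertex_mads = VERTEX_PROCESSORS * VERTEX_LANES
    return fragment_mads + vertex_mads


for cores in (1, 2, 4):
    print(f"{cores} fragment core(s): {mali400_mads_per_clock(cores)} MADs/clock")
```

Under these assumptions only the fragment term grows with core count, which is the scaling trade-off described earlier: the four-core part comes out at 18 MADs per clock, of which just 2 belong to the vertex processor.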

Thus far every architecture we've looked at has been able to process one FP16 MAD (multiply+add) per execution unit per clock. If we make another assumption about the Mali-400 and say it can do the same, we get the following table:

Mobile SoC GPU Comparison

                | PowerVR SGX 535 | PowerVR SGX 540 | PowerVR SGX 543 | PowerVR SGX 543MP2 | Mali-400 MP4 | GeForce ULP | Kal-El GeForce
SIMD Name       | USSE            | USSE            | USSE2           | USSE2              | Core         | Core        | Core
# of SIMDs      | 2               | 4               | 4               | 8                  | 4 + 1        | 8           | 12
MADs per SIMD   | 2               | 2               | 4               | 4                  | 4 / 2        | 1           | ?
Total MADs      | 4               | 8               | 16              | 32                 | 18           | 8           | ?
GFLOPS @ 200MHz | 1.6             | 3.2             | 6.4             | 12.8               | 7.2          | 3.2         | ?
GFLOPS @ 300MHz | 2.4             | 4.8             | 9.6             | 19.2               | 10.8         | 4.8         | ?

Based on this estimated data alone, it would appear that a four-core Mali-400 has roughly the shader compute power of a PowerVR SGX 543. In other words, about half the compute horsepower of the iPad 2's GPU, or over twice the compute of any smartphone GPU today. The Mali-400 is targeted at 275MHz operation, which by the same math works out to roughly 9.9 GFLOPS, so its real figures are likely even higher relative to the competition. Although MADs are quite common in shader execution, they aren't the be-all and end-all; we need to look at application performance to really see how it stacks up.
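The arithmetic behind the table can be reproduced with a short Python sketch. It assumes, as the table does, one MAD per execution unit per clock, and counts each MAD as two floating point operations, which is the convention that matches the table's figures. The Total MADs values are copied from the table; the function and dictionary names are hypothetical.

```python
# Total MADs per clock, copied from the comparison table (estimates, not vendor specs).
TOTAL_MADS = {
    "PowerVR SGX 535": 4,
    "PowerVR SGX 540": 8,
    "PowerVR SGX 543": 16,
    "PowerVR SGX 543MP2": 32,
    "Mali-400 MP4": 18,
    "GeForce ULP": 8,
}

FLOPS_PER_MAD = 2  # one multiply plus one add


def gflops(total_mads: int, clock_mhz: float) -> float:
    """Peak GFLOPS given MADs per clock and a clock speed in MHz."""
    return total_mads * FLOPS_PER_MAD * clock_mhz / 1000.0


# Reproduce the two GFLOPS rows of the table.
for name, mads in TOTAL_MADS.items():
    print(f"{name:<20} {gflops(mads, 200):5.1f} @ 200MHz  {gflops(mads, 300):5.1f} @ 300MHz")

# The Mali-400's 275MHz target clock, under the same assumptions.
print(f"Mali-400 MP4 @ 275MHz: {gflops(TOTAL_MADS['Mali-400 MP4'], 275):.1f} GFLOPS")
```

These are peak numbers built on assumed unit widths, so they are only useful for rough relative comparison, not as a prediction of application performance.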

132 Comments

  • shamalh108 - Monday, September 12, 2011

    Thanks a lot, going to do that today. However, if you read my post above, I'm not sure it's an individual app causing it. Maybe I should root so I can wipe the battery stats and recalibrate. Besides that, I'm also going to purchase the official extended battery from Samsung; I don't mind losing a little slimness :)
  • ph00ny - Monday, September 12, 2011

    I didn't even bother with rooting for a month or two until I wanted to try out chainfire plugins. Even in stock form, battery life was great, certainly better than my Captivate.

    One thing to understand about a SAMOLED screen is that it uses no power on black pixels and more power on white pixels. So maybe try out a darker themed wallpaper, and also check whether you have widgets that tend to use more juice than an alternative.

    Also, as an example, Samsung's stock music app uses roughly half the power of Google's music app. It gets worse with Spotify (offline mode of course).
  • Remeniz - Monday, September 12, 2011

    The trick is to adjust the power saving features to suit and make sure very little is going on in the background. I only run GPS if I need to, and the WiFi gets turned off when I'm out and about, unless I know I'm in a WiFi zone and want to browse the www.

    I get at least a days use out of my SGS2.
  • supercurio - Monday, September 12, 2011

    Note:

    "When idle, processor goes back to 200 MHz"

    That's the case when idle with the screen on, or when something is using a wakelock to keep the device on.
    Otherwise the whole CPU is literally turned OFF, with everything frozen in RAM.

    And in this situation, the baseband, Wi-Fi chip or an external timer will wake up the CPU and restore the Linux kernel to a working state when needed, like if you received a new mail or a phone call.

    I point this out because most people believe the CPU stays ON all the time, but it's the opposite: with standard usage, the CPU is ON only a fraction of the day.
  • Lucian Armasu - Sunday, September 11, 2011

    Brian, I don't think it's fair to compare the "tablet" version of A5 with the "smartphone" version of the Exynos and all the other chips. Even Nvidia's Tegra 2 has either 50% or 100% higher clock frequency for its GPU in the tablets, compared to the one in smartphones.

    It's very likely that all tablet chips are more powerful than the smartphone ones, and for all we know the iPhone 5 GPU will only have 1 GPU core instead of 2 like in the iPad 2, or they'll be clocked at a lower frequency.

    I know you'll review the iPhone 5, too, but I think you're setting too low an expectation for the Exynos and the others compared to the "A5 chip". You know what I mean? You should've at least thrown a Xoom or a Transformer in there to see how it fares against the Tegra 2 phones.

    I hope at least you'll correct this in future reviews. Great review otherwise, though.
  • privater - Sunday, September 11, 2011

    An iPad 2 can run SunSpider 0.9 with a score of 1980 (4.3.5).
    If the Exynos is superior to the A5 in every aspect, the result is difficult for me to understand.
  • Lucian Armasu - Sunday, September 11, 2011

    Just as I mentioned above, it's not fair to compare the tablet versions with the phone versions of the chips. All the latest smartphones get around 4000 in the Sun Spider test, but all tablets get around 2000 in that test, so even on the CPU side, it's still not a fair comparison.
  • Mike1111 - Sunday, September 11, 2011

    Great review!

    But why are you so late with the review of the INTERNATIONAL version? I mean, I would get it if you decided to wait for the US versions, but waiting almost 4 1/2 months and then publishing a review of the international version only a week before the US versions get released? Seems strange to me...
  • ph00ny - Sunday, September 11, 2011

    Brian said in the other reviews' comment sections that he was waiting to get ahold of a review unit. I did offer mine if he was nearby, but he's nearly on the west coast and I live on the opposite side of the country.
  • shamalh108 - Sunday, September 11, 2011

    Another pity is that even games from gameloft which are supposed to be adapted to the SGS2 cause significant heating of the phone.. for example the Asphalt 6 available for free in Samsung Apps .. it would be great if more games were coded to make better use of the SGS2 gpu ...
