The Mali-400

Now that we've settled the issue of what type of GPU it is, let's talk about the physical makeup of the Mali-400. The Mali-400 isn't a unified shader architecture; it has discrete execution hardware for vertex and fragment (pixel) processing. ARM calls the Mali-400 a multicore GPU, with configurations available with 1 to 4 cores. When ARM refers to a core, however, it's talking about a fragment (pixel shader) processor, not an entire GPU core. This is somewhat similar to NVIDIA's approach with Tegra 2, although NVIDIA counts each vertex and fragment processor as an individual core.

In its simplest configuration, the Mali-400 features a single combined geometry front end and vertex processor and a single fragment processor. The 400 is also available in 2 and 4 core versions, both of which still have only a single vertex processor. The two core version has two fragment processors and the four core version has four fragment processors. Note that ARM decided to scale fragment shading performance with core count while keeping vertex performance static. This is likely the best decision given current workloads, but a risky one. NVIDIA, on the other hand, standardized on a 1:1 ratio between fragment and vertex processors, compared to ARM's 4:1 on a 4-core Mali-400. The 4-core Mali-400 MP4 is what Samsung uses in the Exynos 4210.

ARM, like Qualcomm, isn't particularly interested in having the details of its GPUs available publicly. Unfortunately this means that we know very little about the makeup of each of these vertex and fragment processors. I suspect that both companies will eventually learn to share (just as AMD and NVIDIA did) but as this industry is still in its infancy, it will take some time.

Earlier documentation on Mali revealed that the GPU is a VLIW architecture, meaning each processor is actually a collection of multiple parallel execution units capable of working on vector data. Unfortunately, there's no public documentation indicating how wide each processor is, but we can make some educated guesses.

We know from history that AMD felt a 5-wide VLIW architecture made sense for DX9 class games, later moving down to a 4-wide architecture for DX11 games. AMD didn't have the die constraints that ARM and other SoC GPU suppliers do, so a 5-wide unit is likely out of the question, especially considering that Imagination settled on a VLIW4 architecture. Furthermore, pixels have four color elements (RGBA), making VLIW4 an ideal choice.

Based on this, as well as some internal information, we can assume that a single Mali fragment shader is a 4-wide VLIW processor. The vertex shader is a big unknown as well, but knowing that vertex processing happens on two coordinate elements (U & V), Mali's vertex shader is likely a 2-wide unit.

Thus far every architecture we've looked at has been able to process one FP16 MAD (multiply+add) per execution unit per clock. If we make another assumption about the Mali-400 and say it can do the same, we get the following table:

Mobile SoC GPU Comparison

|                  | PowerVR SGX 535 | PowerVR SGX 540 | PowerVR SGX 543 | PowerVR SGX 543MP2 | Mali-400 MP4 | GeForce ULP | Kal-El GeForce |
| SIMD Name        | USSE            | USSE            | USSE2           | USSE2              | Core         | Core        | Core           |
| # of SIMDs       | 2               | 4               | 4               | 8                  | 4 + 1        | 8           | 12             |
| MADs per SIMD    | 2               | 2               | 4               | 4                  | 4 / 2        | 1           | ?              |
| Total MADs       | 4               | 8               | 16              | 32                 | 18           | 8           | ?              |
| GFLOPS @ 200MHz  | 1.6 GFLOPS      | 3.2 GFLOPS      | 6.4 GFLOPS      | 12.8 GFLOPS        | 7.2 GFLOPS   | 3.2 GFLOPS  | ?              |
| GFLOPS @ 300MHz  | 2.4 GFLOPS      | 4.8 GFLOPS      | 9.6 GFLOPS      | 19.2 GFLOPS        | 10.8 GFLOPS  | 4.8 GFLOPS  | ?              |

Based on this estimated data alone, it would appear that a four-core Mali-400 has roughly the shader compute power of a PowerVR SGX 543 (18 MADs per clock versus 16). In other words, it has about half the compute horsepower of the iPad 2's GPU, or over twice the compute of any smartphone GPU today. The Mali-400 is targeted at 275MHz operation, so its figures are likely even higher relative to the competition. Although MADs are quite common in shader execution, they aren't the be-all and end-all; we need to look at application performance to really see how it stacks up.
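The arithmetic behind these estimates can be sketched in a few lines of Python. This is only an illustration under the same assumptions as the table: one MAD per execution unit per clock, a 4-wide fragment processor, and a 2-wide vertex processor for Mali-400. The `Gpu` class and its field names are hypothetical, and each MAD is counted as two floating point operations.

```python
from dataclasses import dataclass

FLOPS_PER_MAD = 2  # a multiply-add counts as one multiply plus one add


@dataclass(frozen=True)
class Gpu:
    name: str
    # (number of units, assumed MADs per unit per clock) pairs.
    # Widths are the article's estimates, not vendor-confirmed figures.
    units: tuple

    def total_mads(self) -> int:
        return sum(count * width for count, width in self.units)

    def gflops(self, mhz: float) -> float:
        # MADs/clock * FLOPs/MAD * clocks/second, expressed in GFLOPS
        return self.total_mads() * FLOPS_PER_MAD * mhz / 1000.0


GPUS = [
    Gpu("PowerVR SGX 535", ((2, 2),)),
    Gpu("PowerVR SGX 540", ((4, 2),)),
    Gpu("PowerVR SGX 543", ((4, 4),)),
    Gpu("PowerVR SGX 543MP2", ((8, 4),)),
    # Four assumed 4-wide fragment processors plus one assumed 2-wide vertex processor
    Gpu("Mali-400 MP4", ((4, 4), (1, 2))),
    Gpu("GeForce ULP", ((8, 1),)),
]

for gpu in GPUS:
    print(f"{gpu.name:20s} {gpu.total_mads():3d} MADs  "
          f"{gpu.gflops(200):5.1f} GFLOPS @ 200MHz  "
          f"{gpu.gflops(300):5.1f} GFLOPS @ 300MHz")
```

Under the same assumptions, calling `gflops(275)` on the Mali-400 MP4 entry gives 9.9 GFLOPS at its 275MHz target clock. Kal-El is left out because its MADs per core are unknown.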

132 Comments

  • jcompagner - Monday, September 12, 2011 - link

    When i am already using it for months and months now, and i am already thinking maybe next month or 2 i will replace it with its successor the Nexus Prime or what ever it may be called...

    Again here the complaints about no updates.
    What are you people complaining about, please...
    Samsung releases, yes not officially but they are real samsung releases, quite often roms
    for example here are the SGS2 ones:

    http://www.samfirmware.com/WEBPROTECT-i9100.htm

    A few releases per month, i am now on the latest one (2.3.4 of August 12)

    If you look there to other phones you also will see many updates of all the latest phones of samsung.

    So it is very easy and you dont need to root if you don't want to, just flash these roms. and you have a updated samsung made rom. (but yes 'leaked')
  • Reikon - Monday, September 12, 2011 - link

    "Vellamo produces its scores directly from frame counters, so what you're looking at is a direct representation of how fast these devices scroll through the three web tests above. The Galaxy S II is 20 - 35% faster than the Photon 4G and 45 - 100% faster than the EVO 3D."

    You mixed up Photon 4G and EVO 3D, either in the table or the comment under it. The data shows the SGS2 20-35% faster than the EVO 3D and 45-100% faster than the Photon 4G.
  • Stormkroe - Saturday, September 17, 2011 - link

    I thought I was the only one noticing this too. I'm also concerned with the adreno missing from the 2.1 off screen render tests, as well as pointing out that it would definitely be beating the S2 in GL 2.0 Pro if resolutions were normalized there. Feels like the whole thing was meant to really set the mali on a high horse. Don't get me wrong, I think it's great, just not "double the speed of the competition" when you throw the entire lineup in the mix.
  • poohbear - Tuesday, September 13, 2011 - link

    Nice to read this review finally, it is indeed an awesome piece of hardware. Im not even sure the iphone 5 will be able to compete? guess we'll find out next month!
  • PWRuser - Tuesday, September 13, 2011 - link

    Is the SGS2 memory the newer 30nm LPDDR2 1066 or the 800 one found in older phones?
  • QWIKSTRIKE - Tuesday, September 13, 2011 - link

    When will you do a Sprint review with CDMA antenna signal responsiveness?
  • Olrac - Tuesday, September 13, 2011 - link

    Just for those who would like to know I am running The galaxy s2 overclocked to 1.6Ghz and its rock steady no crashes or freezes does not even get much warmer

    Linpack Single Threaded = 74 average and Multithreaded = 114

    Revolution rom with ninphetamine 2.1.3 kernel
  • lamecake - Wednesday, September 14, 2011 - link

    Just for comparison.. I have a HTC Sensation clocked at 1.6ghz with a Sense based rom. It's actually perfectly stable to 1.78ghz here but 1.6 should be no problem for any sensation.

    Linpack Single Threaded = 58 average and Multithreaded = 95

    Pyramid3D 7.4.0 with faux123 0.1.4 kernel.

    CM based roms just popping up, so anxious to see how a non-sense rom compares to the SGS2.
  • Pessimism - Tuesday, September 13, 2011 - link

    is not a coin cell battery. it is a supercapacitor, sort of a cross between a capacitor and a battery, they use them as a buffer between the phone and battery
  • lchen66666 - Tuesday, September 13, 2011 - link

    Rumors of iPhone5 indicate that iPhone5 will have the iPhone4 form. If this is real, that would quite disappoint me. I will definitely consider GSS2 when I upgrade my old iPhone3GS. I was hoping iPhone5 comes with 4"+720P display+some other improvement(new camera chip, new design of antenna, and new CPU). 4" seems to be the sweet spot for a smart phone(not too big, and not too small). If Apple doesn't have much improvement in the display, the faster CPU is not that useful.

    The review is very detailed. Not very happy with a couple of things on GSS2. Resolution is not high enough for a 4.3" display. Audio quality is not good. Seems like GSS2 has very good camera chip for video and photo. Really like it. From your other review, I got impression that Super AMOLED Plus display is much better than IPS. From this review, seems like SAMOLED+ is similar to SIPS display used on other smart phones. I haven't seen Samsung SAMOLED+ display in person.
