The Mali-400

Now that we've settled the issue of what type of GPU it is, let's talk about the physical makeup of the Mali-400. The Mali-400 isn't a unified shader architecture; it has discrete execution hardware for vertex and fragment (pixel) processing. ARM calls the Mali-400 a multicore GPU, with configurations available with one to four cores. When ARM refers to a core, however, it's talking about a fragment (pixel shader) processor, not an entire GPU core. This is somewhat similar to NVIDIA's approach with Tegra 2, although NVIDIA counts each vertex and fragment processor as an individual core.

In its simplest configuration, the Mali-400 features a single combined geometry front end and vertex processor and a single fragment processor. The 400 is also available in two- and four-core versions, both of which still have only a single vertex processor. The two-core version has two fragment processors and the four-core version has four. In other words, ARM decided to scale fragment shading performance with core count while keeping vertex performance static. This is likely the best decision given current workloads, but a risky one. NVIDIA, on the other hand, standardized on a 1:1 ratio between fragment and vertex processors, compared to ARM's 4:1 on a four-core Mali-400. The four-core Mali-400 MP4 is what Samsung uses in the Exynos 4210.

ARM, like Qualcomm, isn't particularly interested in having the details of its GPUs available publicly. Unfortunately this means that we know very little about the makeup of each of these vertex and fragment processors. I suspect that both companies will eventually learn to share (just as AMD and NVIDIA did) but as this industry is still in its infancy, it will take some time.

Earlier documentation on Mali revealed that the GPU is a VLIW architecture, meaning each processor is actually a collection of multiple parallel execution units capable of working on vector data. Unfortunately, there's no public documentation indicating how wide each processor is, but we can make some educated guesses.

We know from history that AMD felt a 5-wide VLIW architecture made sense for DX9 class games, later moving down to a 4-wide architecture for DX11 games. AMD didn't have the die constraints that ARM and other SoC GPU suppliers do, so a 5-wide unit is likely out of the question, especially considering that Imagination settled on a VLIW4 architecture. Furthermore, pixels have four color elements (RGBA), making VLIW4 an ideal choice.

Based on this, as well as some internal information, we can assume that a single Mali fragment shader is a 4-wide VLIW processor. The vertex shader is a big unknown as well, but knowing that vertex processing happens on two coordinate elements (U & V), Mali's vertex shader is likely a 2-wide unit.

Thus far every architecture we've looked at has been able to process one FP16 MAD (multiply+add) per execution unit per clock. If we make another assumption about the Mali-400 and say it can do the same, we get the following table:

Mobile SoC GPU Comparison

|   | PowerVR SGX 535 | PowerVR SGX 540 | PowerVR SGX 543 | PowerVR SGX 543MP2 | Mali-400 MP4 | GeForce ULP | Kal-El GeForce |
|---|---|---|---|---|---|---|---|
| SIMD Name | USSE | USSE | USSE2 | USSE2 | Core | Core | Core |
| # of SIMDs | 2 | 4 | 4 | 8 | 4 + 1 | 8 | 12 |
| MADs per SIMD | 2 | 2 | 4 | 4 | 4 / 2 | 1 | ? |
| Total MADs | 4 | 8 | 16 | 32 | 18 | 8 | ? |
| GFLOPS @ 200MHz | 1.6 GFLOPS | 3.2 GFLOPS | 6.4 GFLOPS | 12.8 GFLOPS | 7.2 GFLOPS | 3.2 GFLOPS | ? |
| GFLOPS @ 300MHz | 2.4 GFLOPS | 4.8 GFLOPS | 9.6 GFLOPS | 19.2 GFLOPS | 10.8 GFLOPS | 4.8 GFLOPS | ? |
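
The arithmetic behind these estimates is simple enough to sketch. The snippet below is only an illustration of how the table's figures fall out of the assumptions made above (one MAD per execution unit per clock, counted as two floating-point operations, with a 4-wide fragment processor and a 2-wide vertex processor); the function and variable names are made up for this example, and none of the widths are confirmed by ARM.

```python
# Peak-throughput estimate under the article's assumptions.
# A MAD (multiply+add) is counted as 2 floating-point operations.
FLOPS_PER_MAD = 2


def total_mads(units):
    """Sum MADs per clock over (unit_count, mads_per_unit) pairs."""
    return sum(count * width for count, width in units)


def peak_gflops(mads_per_clock, clock_mhz):
    """Theoretical peak GFLOPS for a given MAD count and clock speed."""
    return mads_per_clock * FLOPS_PER_MAD * clock_mhz / 1000.0


# ASSUMED layouts (estimates from the text, not vendor-confirmed):
# Mali-400 MP4: four 4-wide fragment processors + one 2-wide vertex processor.
mali_400_mp4 = [(4, 4), (1, 2)]
# PowerVR SGX 543: four USSE2 pipes, 4 MADs each.
sgx_543 = [(4, 4)]

if __name__ == "__main__":
    for name, layout in (("Mali-400 MP4", mali_400_mp4), ("SGX 543", sgx_543)):
        mads = total_mads(layout)
        for mhz in (200, 300):
            print(f"{name}: {mads} MADs, {peak_gflops(mads, mhz):.1f} GFLOPS @ {mhz}MHz")
```

Because the result scales linearly with both the MAD count and the clock, any error in the guessed VLIW widths carries straight through to the GFLOPS figures.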

Based on this estimated data alone, it would appear that a four-core Mali-400 has roughly the shader compute power of a single PowerVR SGX 543 (7.2 vs. 6.4 GFLOPS at 200MHz). In other words, a little over half the compute horsepower of the iPad 2's GPU, or over twice the compute of any smartphone GPU today. The Mali-400 is targeted at 275MHz operation, so its real figures likely sit closer to the 300MHz row, and even further ahead of the competition. Although MADs are quite common in shader execution, they aren't the be-all and end-all; we need to look at application performance to really see how it stacks up.

132 Comments

  • Astri - Sunday, September 11, 2011 - link

    Great work, the difference is obvious! Cant wait for the release
    thanks for your reply. is good to know that is not hardware issue. it gives us hopes for quality gradients in future sw updates
  • supercurio - Sunday, September 11, 2011 - link

    I'm glad it works for you ;)

    Don't expect Samsung to change the screen rendering in an update because if some would prefer "Native", others would not after loosing some perceived sharpness even if it's an artificial one that creates halos and artifacts.
    Anyway the app is here, and free!
  • Jon Irenicus - Monday, September 12, 2011 - link

    Your audio section scared me about the audio quality, is there any chance the US sprint variant will use a different DAC? or get a tweaked version of the Yamaha DAC?
  • supercurio - Monday, September 12, 2011 - link

    From dumps I received AT&T and Sprint versions are exactly the same for audio.

    T-Mobile, I'm not sure yet, I got some dumps from an non released device with a separate Yamaha headphone+speaker driver that looked like a potential T-Mobile Galaxy S II.
    No idea about the DAC itself today.
  • Gnarr - Sunday, September 11, 2011 - link

    "TouchWiz 4.0 is a much cleaner, less claustrophobic, and considerably less garish experience."

    http://en.wikipedia.org/wiki/Claustrophobia
  • DeciusStrabo - Monday, September 12, 2011 - link

    "something feels claustrophobic" isn't an uncommon phrase for saying something feels small, cluttered and cramped.
  • jigglywiggly - Sunday, September 11, 2011 - link

    THIS IS THE MOST INDEPTH REVIEW FOR A PHONE EVAR
  • Omid.M - Monday, September 12, 2011 - link

    And their childishness?

    Look what they've done to the American versions of the SGS2. Childish, for wanting their own "version" of an amazing phone. Why mess with a great thing? Oh, because you don't want to just compete on service--as you should--you want "exclusive" features on your version of the phone?

    Wish I was on AT&T so I could import the Int'l version.

    Brian,

    I'm honestly amazed at your 180. I recall you being a little "so what?" about the SGS2 (this is way back before summer 2011) and now it looks to be your favorite smartphone (I think). And we know you're a harsh critic :)

    I hope we get to see soon what the SGS3 might look like: will Samsung keep with the Exynos SoC and add LTE to compete with Krait? What will the next gen Mali GPU look like? Next Gen SAMOLED? So curious...and yet, we know an SGS3 wouldn't reach America for at least another 18 months...hopefully, VZW customers won't be let down by a Nexus Prime (and that includes bloat).

    The addition of Supercurio (Francois) is perfect; you have a talented dev who is passionate enough to explain to the layman how things work. He's helped me on more than one occasion when I had a Fascinate :)

    Great work, Anand, Brian, and Francois. One of the best reviews I've ever read on any product. No question.

    @moids
  • ph00ny - Monday, September 12, 2011 - link

    Agreed. My main reason for purchasing the international version this time around was to receive more timely updates along with less restrictions.

    As for next gen, there is already a LTE version of SGS2 and ARM already announced the next gen Mali graphics quite some time ago. Regardless, no one knows if samsung will use mali's gpu on the SGS3 and hopefully the SGS3 will come in an ATT compatible flavor when it's released
  • Brian Klug - Monday, September 12, 2011 - link

    I definitely admit that I was very *meh* about the phone after seeing it at MWC. It clearly has come a really, really long way, and now it's my absolute favorite Android device because of all those reasons outlined above - just incredible smoothness and huge performance. :)

    -Brian
