The Mali-400

Now that we've settled the issue of what type of GPU it is, let's talk about the physical makeup of the Mali-400. The Mali-400 isn't a unified shader architecture; it has discrete execution hardware for vertex and fragment (pixel) processing. ARM calls the Mali-400 a multicore GPU, with configurations available with 1 to 4 cores. When ARM refers to a core, however, it's talking about a fragment (pixel shader) processor, not an entire GPU core. This is somewhat similar to NVIDIA's approach with Tegra 2, although NVIDIA counts each vertex and fragment processor as an individual core.

In its simplest configuration the Mali-400 features a single combined geometry front end and vertex processor and a single fragment processor. The 400 is also available in 2 and 4 core versions, both of which still have only a single vertex processor. The two core version has two fragment processors and the four core version has four fragment processors. Note that ARM decided to scale fragment shading performance with core count while keeping vertex performance static. This is likely the best decision given current workloads, but a risky one. NVIDIA on the other hand standardized on a 1:1 ratio between fragment and vertex processors compared to ARM's 4:1 on a 4-core Mali-400. The 4-core Mali-400 MP4 is what Samsung uses in the Exynos 4210.

ARM, like Qualcomm, isn't particularly interested in having the details of its GPUs available publicly. Unfortunately this means that we know very little about the makeup of each of these vertex and fragment processors. I suspect that both companies will eventually learn to share (just as AMD and NVIDIA did) but as this industry is still in its infancy, it will take some time.

Earlier documentation on Mali revealed that the GPU is a VLIW architecture, meaning each processor is actually a collection of multiple parallel execution units capable of working on vector data. Unfortunately, there's no public documentation indicating how wide each processor is, but we can make some educated guesses.

We know from history that AMD felt a 5-wide VLIW architecture made sense for DX9 class games, later moving down to a 4-wide architecture for DX11 games. AMD didn't face the die constraints that ARM and other SoC GPU suppliers do, so a 5-wide unit is likely out of the question for Mali, especially considering that Imagination settled on a VLIW4 architecture. Furthermore, pixels have four color elements (RGBA), making VLIW4 an ideal choice.

Based on this as well as some internal information we can assume that a single Mali fragment shader is a 4-wide VLIW processor. The vertex shader is a big unknown as well, but knowing that vertex processing happens on two coordinate elements (U & V) Mali's vertex shader is likely a 2-wide unit.

Thus far every architecture we've looked at has been able to process one FP16 MAD (multiply+add) per execution unit per clock. If we make another assumption about the Mali-400 and say it can do the same, we get the following table:

Mobile SoC GPU Comparison
                  PowerVR   PowerVR   PowerVR   PowerVR     Mali-400  GeForce   Kal-El
                  SGX 535   SGX 540   SGX 543   SGX 543MP2  MP4       ULP       GeForce
SIMD Name         USSE      USSE      USSE2     USSE2       Core      Core      Core
# of SIMDs        2         4         4         8           4 + 1     8         12
MADs per SIMD     2         2         4         4           4 / 2     1         ?
Total MADs        4         8         16        32          18        8         ?
GFLOPS @ 200MHz   1.6       3.2       6.4       12.8        7.2       3.2       ?
GFLOPS @ 300MHz   2.4       4.8       9.6       19.2        10.8      4.8       ?

Based on this estimated data alone, it would appear that a four-core Mali-400 has roughly the shader compute power of a PowerVR SGX 543. In other words, about half the compute horsepower of the iPad 2's GPU, or over twice the compute of any smartphone GPU today. The Mali-400 is targeted at 275MHz operation, so its figures are likely even higher than the competition. Although MADs are quite common in shader execution, they aren't the end-all, be-all - we need to look at application performance to really see how it stacks up.
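The arithmetic behind these estimates can be sketched in a few lines of Python. This is only an illustration of how the table's numbers fall out of its own assumptions: each MAD is counted as two floating-point operations, and the Mali-400 MP4 widths (4-wide fragment, 2-wide vertex) are the guesses made above, not published figures. The names `GPUS`, `total_mads` and `peak_gflops` are made up for this example.

```python
# Peak-throughput estimate: one MAD (multiply+add) counts as 2 floating-point ops.
FLOPS_PER_MAD = 2

# Each GPU is a list of (number of units, MADs per unit) groups, taken from the
# comparison table. The Mali-400 MP4 widths are the article's educated guesses.
GPUS = {
    "PowerVR SGX 535": [(2, 2)],
    "PowerVR SGX 540": [(4, 2)],
    "PowerVR SGX 543": [(4, 4)],
    "PowerVR SGX 543MP2": [(8, 4)],
    "Mali-400 MP4": [(4, 4), (1, 2)],  # 4 fragment cores + 1 vertex processor
    "GeForce ULP": [(8, 1)],
}


def total_mads(groups):
    """Sum MADs per clock across all unit groups of a GPU."""
    return sum(units * width for units, width in groups)


def peak_gflops(groups, clock_mhz):
    """Theoretical peak GFLOPS at the given clock, assuming one MAD per unit per clock."""
    return total_mads(groups) * FLOPS_PER_MAD * clock_mhz / 1000.0


if __name__ == "__main__":
    for name, groups in GPUS.items():
        print(
            f"{name:20s} {total_mads(groups):3d} MADs  "
            f"{peak_gflops(groups, 200):5.1f} GFLOPS @ 200MHz  "
            f"{peak_gflops(groups, 300):5.1f} GFLOPS @ 300MHz"
        )
```

Under the same assumptions, calling `peak_gflops(GPUS["Mali-400 MP4"], 275)` for the 275MHz target works out to 18 x 2 x 0.275 = 9.9 GFLOPS. These are theoretical peaks only and say nothing about real application performance.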


132 Comments


  • Mugur - Tuesday, September 13, 2011

    Well, for most Android devices I've tried (I currently own 3), if you just leave them doing nothing overnight (even with wifi on on some of them, but no 3G/HSDPA, no GPS etc.) the battery drain is like 2-3%. Of course, if some app or push email or an updating widget wakes them, the drain could reach 20-25%.

    You just have to play a bit with the phone and find out what is mostly consuming your battery, even get one of the "green" apps on the Market. Through experimentation, I'm sure most people (excluding the really heavy users) will get 50% more time out of the battery.
  • wuyuanyi - Monday, September 12, 2011

    This must be the final answer to my pending problem. My GS II has this problem and I have been very annoyed. The CPU current produces EMI on the output circuit, since the BT earphone DOESN'T play such hiss and noise. I appreciate it solving my problem rather than leaving me to suspect it was just my own unit. But the next question is how to solve it? Can we manually fix the shield, or generate a noise that works against the noise, with a reversed wave?
    hehe

    sorry for my poor ENGLISH
  • awesomedeleted - Monday, September 12, 2011

    This is a fresh copy of my current phone...Samsung Infuse 4G...which came out in May. I hate the newer Galaxy S round home button thingy too. What's so special, the name?
  • awesomedeleted - Monday, September 12, 2011

    Although I now notice a few small differences in hardware, such as 1.2Ghz Dual-core A9 vs. my Infuse's 1.2Ghz Single-core A8, and the 1GB RAM.
  • supercurio - Monday, September 12, 2011

    Infuse 4G is a Galaxy S "repackaged" with a Galaxy S II look, screen and probably camera sensor for AT&T.
  • bmgoodman - Monday, September 12, 2011

    So I understand that the audio quality of this phone is a step down from the original Galaxy. My question is how big a step down? For a non-"audiophile" who just wants to connect the headphone jack into the AUX port on his OEM car stereo to listen to his variable bit rate MP3 (~128 kbps IIRC) music collection, is this something that's likely to disappoint? Is it a notable shortcoming for a more typical music fan?
  • supercurio - Monday, September 12, 2011

    No doubt cars are in general a noisy environment.
    Furthermore it's very rare to find cars benefiting from good speakers and implementation, resulting in far from linear frequency response, left/right imbalance, resonance in other materials etc :P

    Trained ears or sensitive people are capable of detecting subtle differences in sound like nobody can imagine ^^ but I don't think the Galaxy S II DAC issues described will make a noticeable difference for most people when listening to music while driving a car.

    Note: I have no idea how the original Samsung Galaxy phone was in this regard, but it's a regression over Galaxy S.

    Headphones.. that's something else because even cheap ones (price doesn't matter) can provide some low distortion levels and let you perceive fine details.
  • Deusfaux - Monday, September 12, 2011

    It is there and does work, speaking from experience with a Nexus S.
  • Deusfaux - Monday, September 12, 2011

    An HTC I used did it best though, by integrating the feature right into the browser settings. No special URL strings needed to access functionality.
  • aNYthing24 - Monday, September 12, 2011

    But isn't there a version of the Tegra 2 that is clocked at 1.2 GHz? It's going to be at that clock speed in the Fusion Grid tablet.
