The Mali-400

Now that we've settled the issue of what type of GPU it is, let's talk about the physical makeup of the Mali-400. The Mali-400 isn't a unified shader architecture; it has discrete execution hardware for vertex and fragment (pixel) processing. ARM calls the Mali-400 a multicore GPU, with configurations available in 1 to 4 cores. When ARM refers to a core, however, it's talking about a fragment (pixel shader) processor, not an entire GPU core. This is somewhat similar to NVIDIA's approach with Tegra 2, although NVIDIA counts each vertex and fragment processor as an individual core.

In its simplest configuration the Mali-400 features a single combined geometry front end and vertex processor alongside a single fragment processor. The 400 is also available in 2 and 4 core versions, both of which still have only a single vertex processor; the two core version has two fragment processors and the four core version has four. Note that ARM decided to scale fragment shading performance with core count while keeping vertex performance static. This is likely the best decision given current workloads, but a risky one. NVIDIA, on the other hand, standardized on a 1:1 ratio of fragment to vertex processors, compared to ARM's 4:1 on a four-core Mali-400. The four-core Mali-400 MP4 is what Samsung uses in the Exynos 4210.

ARM, like Qualcomm, isn't particularly interested in having the details of its GPUs available publicly. Unfortunately this means that we know very little about the makeup of each of these vertex and fragment processors. I suspect that both companies will eventually learn to share (just as AMD and NVIDIA did) but as this industry is still in its infancy, it will take some time.

Earlier documentation on Mali revealed that the GPU is a VLIW architecture, meaning each processor is actually a collection of multiple parallel execution units capable of working on vector data. Unfortunately there's no public documentation indicating how wide each processor is, but we can make some educated guesses.

We know from history that AMD felt a 5-wide VLIW architecture made sense for DX9-class games, later moving down to a 4-wide architecture for DX11 games. AMD didn't have the die constraints that ARM and other SoC GPU suppliers do, so a 5-wide unit is likely out of the question, especially considering that Imagination settled on a VLIW4 architecture. Furthermore, pixels have four color elements (RGBA), making VLIW4 an ideal choice.

Based on this, as well as some internal information, we can assume that a single Mali fragment shader is a 4-wide VLIW processor. The vertex shader is a bigger unknown, but knowing that vertex processing operates on two coordinate elements (U & V), Mali's vertex shader is likely a 2-wide unit.
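
To make those width assumptions concrete, here's a minimal C sketch of the idea: a 4-wide VLIW fragment unit retires one MAD per lane per clock, so a full RGBA multiply-add completes in a single bundle, while a 2-wide vertex unit handles a coordinate pair the same way. The vector types and the per-lane widths are this article's estimates, not published ARM specifications.

```c
/* Illustrative sketch only: models the assumed Mali-400 execution widths.
 * The 4-wide fragment lane and 2-wide vertex lane are educated guesses,
 * not ARM documentation. */
#include <stdio.h>

typedef struct { float r, g, b, a; } vec4;  /* one pixel: four color channels */
typedef struct { float u, v; } vec2;        /* two coordinate elements        */

/* One 4-wide MAD bundle: all four channels computed in a single issue. */
static vec4 mad4(vec4 a, vec4 b, vec4 c) {
    return (vec4){ a.r * b.r + c.r, a.g * b.g + c.g,
                   a.b * b.b + c.b, a.a * b.a + c.a };
}

/* One 2-wide MAD bundle for the assumed 2-wide vertex unit. */
static vec2 mad2(vec2 a, vec2 b, vec2 c) {
    return (vec2){ a.u * b.u + c.u, a.v * b.v + c.v };
}

int main(void) {
    vec4 tex     = { 0.50f, 0.25f, 0.75f, 1.0f };
    vec4 light   = { 0.80f, 0.80f, 0.80f, 1.0f };
    vec4 ambient = { 0.10f, 0.10f, 0.10f, 0.0f };
    vec4 px = mad4(tex, light, ambient);      /* whole pixel in one bundle  */

    vec2 uv = mad2((vec2){ 0.5f, 0.5f },      /* scale and offset a texture */
                   (vec2){ 2.0f, 2.0f },      /* coordinate pair in one go  */
                   (vec2){ 0.1f, 0.1f });

    printf("pixel = %.2f %.2f %.2f %.2f, uv = %.2f %.2f\n",
           px.r, px.g, px.b, px.a, uv.u, uv.v);
    return 0;
}
```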

Thus far, every architecture we've looked at has been able to process one FP16 MAD (multiply+add) per execution unit per clock. If we assume the Mali-400 can do the same, we get the following table:

Mobile SoC GPU Comparison

GPU                 SIMD Name   # of SIMDs   MADs per SIMD   Total MADs   GFLOPS @ 200MHz   GFLOPS @ 300MHz
PowerVR SGX 535     USSE        2            2               4            1.6               2.4
PowerVR SGX 540     USSE        4            2               8            3.2               4.8
PowerVR SGX 543     USSE2       4            4               16           6.4               9.6
PowerVR SGX 543MP2  USSE2       8            4               32           12.8              19.2
Mali-400 MP4        Core        4 + 1        4 / 2           18           7.2               10.8
GeForce ULP         Core        8            1               8            3.2               4.8
Kal-El GeForce      Core        12           ?               ?            ?                 ?

(Mali-400 MP4: 4 fragment cores + 1 vertex core, with 4 MADs per fragment core and 2 per vertex core.)
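
The GFLOPS column follows directly from counting each MAD as two floating point operations (one multiply plus one add). As a sanity check, here's a quick C sketch that reproduces the table's numbers; the per-GPU MAD counts are this article's estimates, not vendor-published figures.

```c
/* Back-of-the-envelope check of the table above.
 * GFLOPS = total MADs * 2 FLOPs per MAD * clock in GHz.
 * MAD counts are the article's estimates, not vendor numbers. */
#include <stdio.h>

static double gflops(int total_mads, double clock_mhz) {
    return total_mads * 2 * clock_mhz / 1000.0;  /* 2 FLOPs per MAD */
}

int main(void) {
    /* Mali-400 MP4: 4 fragment cores * 4 MADs + 1 vertex core * 2 MADs */
    int mali_mads = 4 * 4 + 1 * 2;  /* = 18 */

    printf("Mali-400 MP4 @ 200MHz: %.1f GFLOPS\n", gflops(mali_mads, 200)); /* 7.2  */
    printf("Mali-400 MP4 @ 275MHz: %.1f GFLOPS\n", gflops(mali_mads, 275)); /* 9.9  */
    printf("Mali-400 MP4 @ 300MHz: %.1f GFLOPS\n", gflops(mali_mads, 300)); /* 10.8 */
    printf("SGX 543MP2   @ 200MHz: %.1f GFLOPS\n", gflops(32, 200));        /* 12.8 */
    return 0;
}
```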

Based on this estimated data alone, it would appear that a four-core Mali-400 has roughly the shader compute power of a PowerVR SGX 543. In other words, it has half the compute horsepower of the iPad 2's GPU, or more than twice that of any smartphone GPU shipping today. The Mali-400 is targeted at 275MHz operation, so its real-world figures are likely even higher than those of the competition. Although MADs are quite common in shader execution, they aren't the end-all, be-all; we need to look at application performance to see how the Mali-400 really stacks up.

Comments (132)

  • mbetter - Sunday, September 11, 2011 - link

    Nice-looking phone, but after my last Sprint Epic turned out to be such a piece of crap, I'm not getting burned again.
  • jmcb - Sunday, September 11, 2011 - link

    Sadly... this happened to me back in the old Win Mo days with the Omnia 1. I kept up with the GS 1 and now the GS2... and I give Samsung credit for whatever pros the phones have.

    But like with any phone manufacturer... a bad experience can have a lasting effect. And for me it was something simple: build quality and reception. Both were bad with the Omnia 1 IMO. And ever since... I've been leery of Samsung phones.

    But... all in all the GS2 looks like more of a winner than the GS1.
  • warisz00r - Monday, September 12, 2011 - link

    Eh, your loss. (you and the poster you're replying to)
  • steven75 - Sunday, September 11, 2011 - link

    ...still doesn't have functioning GPS? Yikes!
  • WinProcs - Sunday, September 11, 2011 - link

    The GPS now works very well. It finds the satellites faster than any other smartphone I have tried, including the iPhone 4. Navigon is preloaded onto the phone (in Australia at least). The earlier version of Navigon had some problems on the Galaxy S. That appears to be fixed with the latest software version. The S2 has never had a problem with the GPS.

    I loaded Litening ROM and find that the phone is faster than the original and battery life is much better too. I charge it every night, but it is normally sitting at about 65-70% after a normal day's use.

    I had an iPhone and a Galaxy S before the S2. It is better than both of those.
  • ph00ny - Sunday, September 11, 2011 - link

    Are we reading the same article?

    "GPS works this time around, and works well. I took the SGS2 on a 7-hour long road trip with me and used its GPS continually with no issues."

    Every review since the release has made it a point to check this and mention it clearly, given the SGS1 debacle.
  • Reikon - Monday, September 12, 2011 - link

    You missed the subject of the comment. He's talking about the original SGS, not the SGS2.
  • JMS3072 - Sunday, September 11, 2011 - link

    Does Hulu work using the Desktop user agent?
  • Astri - Sunday, September 11, 2011 - link

    Great review as always, but I was expecting more information about the famous color banding problem.
    Yes, the device is super etc. etc., but it's a pity not to be able to see everything in 24-bit.
  • supercurio - Sunday, September 11, 2011 - link

    Hi Astri.

    In some conditions, yes, you can perceive gradient banding or suboptimal dithering on the Galaxy S II.
    The reason is not hardware at all: the Super AMOLED+ controller and display work at a higher bit depth than 24-bit, and Gingerbread uses 32-bit surfaces by default.

    You can see three situations with degraded gradients:
    - gradients or images pre-dithered to 16-bit or lower
    - the web browser (automatic 16-bit dithering)
    - some games using 16-bit without dithering, instead of 32-bit as on other phones.

    Every available mDNIe preset applies a sharpness filter between the GPU and the screen itself. Of course, it doesn't play well with the three types of content listed above.

    I reverse-engineered the mDNIe controller registers to build a screen tuning app. Give the dev snapshots a try: https://market.android.com/details?id=org.projectv... - root required.
    The current version is basic, but eventually I'll offer complete rendering configuration.

    To avoid banding, use the "Native" preset: as its name suggests, no effect is applied.
