Qualcomm's New Snapdragon S4: MSM8960 & Krait Architecture Explored

by Brian Klug & Anand Lal Shimpi on October 7, 2011 12:35 PM EST

The Adreno 225 GPU

Qualcomm has historically been pretty silent about its GPU architectures. You'll notice that specific details of Adreno GPU execution resources have been absent from most of our SoC comparisons. With MSM8960, however, that is beginning to change.

The MSM8960 uses a current generation Adreno GPU with a couple of changes. Qualcomm calls this GPU the Adreno 225, a follow-on to Adreno 220. Subsequent Krait designs will use Adreno 3xx GPUs based on a brand new architecture.

As we discussed in our Samsung Galaxy S 2 review, Qualcomm's Adreno architecture is a tile based immediate mode renderer with early-z rejection. By Qualcomm's own admission, Adreno is somewhere in the middle of the rendering spectrum between IMRs and Imagination Technologies' TBDR architectures. One key difference is Adreno's tiling isn't as fine grained as IMG's.

Architecturally the Adreno 225 and 220 are identical. Adreno 2xx is a DX9-class unified shader design. There's a ton of compute on-board with eight 4-wide vector units and eight scalar units. Each 4-wide vector unit is capable of a maximum of 8 MADs per clock, while each scalar unit is similarly capable of 2 MADs per clock. That works out to 160 floating point operations per clock, or 32 GFLOPS at 200MHz.

Update: Qualcomm has clarified the capabilities of its 4-wide vector ALUs. Similar to the PowerVR SGX 543, each 4-wide vector ALU is capable of four MADs (one per component). The scalar units cannot be combined to do any MADs. Although they are helpful, we haven't really been tracking them in this table (IMG has something similar), so we've excluded them for now.

Mobile SoC GPU Comparison
	Adreno 225	PowerVR SGX 540	PowerVR SGX 543	PowerVR SGX 543MP2	Mali-400 MP4	GeForce ULP	Kal-El GeForce
SIMD Name	-	USSE	USSE2	USSE2	Core	Core	Core
# of SIMDs	8	4	4	8	4 + 1	8	12
MADs per SIMD	4	2	4	4	4 / 2	1	?
Total MADs	32	8	16	32	18	8	?
GFLOPS @ 200MHz	12.8 GFLOPS	3.2 GFLOPS	6.4 GFLOPS	12.8 GFLOPS	7.2 GFLOPS	3.2 GFLOPS	?
GFLOPS @ 300MHz	19.2 GFLOPS	4.8 GFLOPS	9.6 GFLOPS	19.2 GFLOPS	10.8 GFLOPS	4.8 GFLOPS	?
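
The peak figures in the table can be reproduced with simple arithmetic. The sketch below assumes the usual convention that one MAD (multiply-add) counts as two floating point operations, and takes the SIMD counts and MADs per SIMD straight from the table. The function and dictionary names are illustrative only, and Kal-El is omitted because its per-core MAD count is not given.

```python
# Theoretical peak throughput, assuming 1 MAD = 2 floating point operations.
# MAD totals come from the table above; all names here are illustrative.

def peak_gflops(total_mads: int, clock_mhz: float) -> float:
    """Peak GFLOPS = MADs per clock * 2 FLOPs per MAD * clock in GHz."""
    return total_mads * 2 * clock_mhz / 1000.0

# Total MADs per clock = number of SIMDs * MADs per SIMD.
TOTAL_MADS = {
    "Adreno 225": 8 * 4,
    "PowerVR SGX 540": 4 * 2,
    "PowerVR SGX 543": 4 * 4,
    "PowerVR SGX 543MP2": 8 * 4,
    "Mali-400 MP4": 4 * 4 + 1 * 2,  # four 4-MAD cores plus one 2-MAD core
    "GeForce ULP": 8 * 1,
}

for name, mads in TOTAL_MADS.items():
    print(
        f"{name}: {mads} MADs/clock, "
        f"{peak_gflops(mads, 200):.1f} GFLOPS @ 200MHz, "
        f"{peak_gflops(mads, 300):.1f} GFLOPS @ 300MHz"
    )
```

These are theoretical ceilings only; they say nothing about how much of that throughput a real workload can reach.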

Looking at the table above you'll see that this is as much computing power as even IMG's PowerVR SGX 543MP2 offers. However, as we've already seen in our tests, Adreno 220 isn't anywhere near as quick.

Shader compiler efficiency and data requirements to actually populate a Vec4+1 array are both unknowns, and I suspect both significantly gate overall Adreno performance. There's also the fact that the Adreno 22x family only has two TMUs compared to four in the 543MP2, limiting texturing performance. Combine that with the fact that most Adreno 220 GPUs have been designed into single-channel memory controller systems and you've got a recipe for tons of compute potential limited by other bottlenecks.

With Adreno 225 Qualcomm improves performance along two vectors, the first being clock speed. While Adreno 220 (used in the MSM8660) ran at 266MHz, Adreno 225 runs at 400MHz thanks to 28nm. Secondly, Qualcomm tells us Adreno 225 is accompanied by "significant driver improvements". Keeping in mind the sheer amount of compute potential of the Adreno 22x family, it only makes sense that driver improvements could unlock a lot of performance. Qualcomm expects the 225 to be 50% faster than the outgoing 220.
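
As a rough sanity check on the clock speed change, the sketch below compares the two quoted frequencies. It assumes both GPUs have the same 32 MADs per clock (the article describes them as architecturally identical) and counts one MAD as two floating point operations; the names are illustrative. Whether Qualcomm's 50% estimate is derived from the clock ratio alone is not stated.

```python
# Clock scaling between Adreno 220 and Adreno 225, using the clocks quoted above.
# Assumes an identical 32 MADs per clock on both parts and 1 MAD = 2 FLOPs.

ADRENO_220_MHZ = 266
ADRENO_225_MHZ = 400
MADS_PER_CLOCK = 32

def peak_gflops(total_mads: int, clock_mhz: float) -> float:
    """Peak GFLOPS = MADs per clock * 2 FLOPs per MAD * clock in GHz."""
    return total_mads * 2 * clock_mhz / 1000.0

# Fractional increase in clock speed going from the 220 to the 225.
clock_gain = ADRENO_225_MHZ / ADRENO_220_MHZ - 1.0

print(f"Clock increase: {clock_gain:.1%}")
print(f"Adreno 220 peak: {peak_gflops(MADS_PER_CLOCK, ADRENO_220_MHZ):.1f} GFLOPS")
print(f"Adreno 225 peak: {peak_gflops(MADS_PER_CLOCK, ADRENO_225_MHZ):.1f} GFLOPS")
```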

Qualcomm claims that MSM8960 will be able to outperform Apple's A5 in GLBenchmark 2.x at qHD resolutions. We'll have to wait until we have shipping devices in hand to really put that claim to the test, but if true it's good news for Krait as the A5 continues to be the high end benchmark for mobile GPU performance.

While Adreno 225 is only Direct3D feature level 9_3 compliant, Qualcomm insisted that when the time is right it will have a D3D11 capable GPU using its own IP - putting to rest rumors of Qualcomm looking to license a third party GPU in order to be competitive in Windows 8 designs. Although Qualcomm committed to delivering D3D11 support, it didn't commit to a timeframe.

108 Comments

dagamer34 - Friday, October 7, 2011 - link
Great stuff to look forward to. Some comments:

1) I wasn't aware that Microsoft released DirectX 9.3. Perhaps you meant 9.0c or 9.1?
2) Why is nVidia still using a single LPDDR2 channel when everyone else has gone to dual channel memory?

I do look forward to seeing what the next generation of GPUs will provide. Seems like we've stayed in this console generation too long with cell phones having graphics nearly on par with their 200W cousins.
A5 - Friday, October 7, 2011 - link
Re: DX 9.3, you beat me to it.

Ilomilo is pretty, but it's not exactly Gears or Battlefield, you know?
Ryan Smith - Friday, October 7, 2011 - link
It's actually more complex than that. When it comes to programming for Direct3D11, there are a number of different GPU feature level targets. The idea is that developers will write their application in DX11, and then have customized render backends to target each feature level they want to hit.

As it stands there are 6 feature levels: 11, 10_1, 10, 9_3, 9_2, and 9_1. Unfortunately everyone has been lax in their naming standards; DirectX and Direct3D often get thrown around interchangeably, as do periods and underscores in the feature levels (since prior to D3D 11, we'd simply refer to the version of D3D). This is how you end up with DirectX 9.3 and all permutations thereof. The article has been corrected to be more technically accurate to clear this up.

In any case, 9_1 is effectively identical to Direct3D 9.0. 9_3 is somewhere between D3D 9.0b and 9.0c; it implements a bunch of extra features like multiple render targets, but the shader language is 2.x (Vertex Shader 2.0a, Pixel Shader 2.0b) rather than 3.0
partylikeits1999 - Saturday, October 8, 2011 - link
Microsoft made such a mess out of its DirectX nomenclature in the DX9 timeframe that the rest of the industry started to ignore it and invent their own. Hardly anybody even bothers to distinguish between Direct3D and DirectX anymore...they're used interchangeably, even though the former is a subset of the latter.

Windows 8 requires Shader Model 3.0 to be supported by the hardware. Whether you call that 10Level9_3 or 9_3, or DX9.3, or D3D9.3, who cares...from a graphics perspective, it is all just Shader Model 3.0 in the end, whatever you want to call it. All of the Windows 8 launch chipsets from nVidia, TI and Qualcomm, including this MSM8960 will all support Shader Model 3.0 as far as I can tell.
ET - Sunday, October 9, 2011 - link
Feature level 9_3 isn't the same as Shader Model 3 support. The Qualcomm docs say DX9.3 though, which is quite confusing since it doesn't exist. That said, I agree with your assessment that it means Shader Model 3, and not feature level 9_3.
felixyang - Saturday, October 8, 2011 - link
2) I believe dual channels don't give any advantage due to tegra's system bus.
metafor - Friday, October 7, 2011 - link
MSM8260 and MSM8660 only have single-channel 32-bit LP-DDR2 memory, not dual.
z0mb13n3d - Friday, October 7, 2011 - link
Please read:

Although Scorpion featured a dual-channel LPDDR2 memory controller, in a PoP configuration only one channel was available to any stacked DRAM. In order to get access to both 32-bit memory channels the OEM had to implement a DRAM on-package as well as an external DRAM on the PCB. Memory requests could be interleaved between the two DRAM, however Qualcomm seemed to prefer load balancing between the two with CPU/GPU accesses being directed to the lower latency PoP DRAM. Very few OEMs seemed to populate both channels and thus Scorpion based designs were effectively single-channel offerings.

I can tell you with a modicum of confidence that this is true, at least partially.

Aren't you the same person who went on (ranting, obviously) about Krait using HKMG and hitting 2.5GHz next year, in another article?
ArunDemeure - Friday, October 7, 2011 - link
I suggest you stop embarrassing yourself. metafor knows what he's talking about, and you clearly don't. I read that previous thread - he was pretty much spot on for everything, as you would expect. I honestly don't know why he even bothers here given the reception he's getting...

Anyway unlike what the article says, the MSM8x60 indeed only has single-channel 32-bit LPDDR2. However there's a twist: Qualcomm offers it in a PoP (Package-on-Package) configuration at up to 266MHz or an 'ISM' (i.e. SiP or System-in-Package) at up to 333MHz. I wouldn't be surprised if many OEMs used the PoP for cost reasons.

I think the confusion might come from another (older) Qualcomm SoC working like the article described iirc, but this does not apply to the MSM8x60 AFAIK.
Anand Lal Shimpi - Friday, October 7, 2011 - link
Arun,

This information does come from Qualcomm, although the odd PoP + external DRAM configuration (that no one seems to use) basically means that MSM8x60 is a single-channel architecture (which is why I starred it in the table above). I will ask Qualcomm once more for confirmation that this applies to MSM8x60 as well as the older single core variants.

Take care,
Anand
