Snapdragon 800 (MSM8974) Performance Preview: Qualcomm Mobile Development Tablet Tested

Author: Brian Klug

by Brian Klug on June 18, 2013 8:00 PM EST


3DMark

3DMark for Android features the Ice Storm benchmark and uses OpenGL ES 2.0. Ice Storm is divided into two graphics tests and a physics test. The first graphics test is geometry heavy while the second test is more pixel shader intensive. The physics test, as you might guess, is CPU bound and multithreaded. The overall score takes into account both graphics and physics tests. The benchmark is rendered to an offscreen buffer at 720p/1080p and then scaled up to the native resolution of the device being tested. This is very similar to the approach we've seen game developers take to avoid rendering at native resolution on some of the ultra high resolution tablets. The beauty of 3DMark's approach here is the fact that all results are comparable, regardless of a device's native resolution. The downside is we don't get a good idea of how some of the ultra high resolution tablets would behave with these workloads running at their native (> 1080p) resolutions.

For these benchmarks we stuck with the default presets (720p, normal quality).
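The tradeoff described above comes down to pixel counts. As a rough sketch in Python, assuming the Nexus 10's 2560x1600 panel as the example of a beyond-1080p native resolution (the function and variable names are illustrative and not part of 3DMark):

```python
# Offscreen render targets offered by the benchmark (width, height).
RENDER_TARGETS = {"720p": (1280, 720), "1080p": (1920, 1080)}

# Assumption for illustration: a 2560x1600 native panel (Nexus 10).
NATIVE_PANEL = (2560, 1600)


def pixel_count(width, height):
    """Total number of pixels in a width x height surface."""
    return width * height


def native_to_target_ratio(native, target):
    """How many times more pixels the native panel has than the render target."""
    return pixel_count(*native) / pixel_count(*target)


for name, resolution in RENDER_TARGETS.items():
    ratio = native_to_target_ratio(NATIVE_PANEL, resolution)
    print(f"{name}: {pixel_count(*resolution):,} px rendered, "
          f"native panel has {ratio:.2f}x as many pixels")
```

The larger the ratio, the more work a native-resolution run would add on top of what the offscreen test measures.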

Here the key comparisons are against the Adreno 320 based HTC One/SGS4 (T-Mobile) and the PowerVR SGX 544MP3 based SGS4 (SHV-E300S). The Nexus 10 is interesting but pretty much a blowout. Snapdragon 800 is clearly the new high-end Android tablet SoC of choice.

3DMark - Graphics

The overall graphics score from Adreno is amazing. We're looking at almost 2x the next fastest contender here, the Adreno 320 based Snapdragon 600.

Graphics Test 1

Ice Storm Graphics test 1 stresses the hardware’s ability to process lots of vertices while keeping the pixel load relatively light. Hardware on this level may have dedicated capacity for separate vertex and pixel processing. Stressing both capacities individually reveals the hardware’s limitations in both aspects.

In an average frame, 530,000 vertices are processed leading to 180,000 triangles rasterized either to the shadow map or to the screen. At the same time, 4.7 million pixels are processed per frame.

Pixel load is kept low by excluding expensive post processing steps, and by not rendering particle effects.

3DMark - Graphics Test 1

Graphics Test 2

Graphics test 2 stresses the hardware’s ability to process lots of pixels. It tests the ability to read textures, do per pixel computations and write to render targets.

On average, 12.6 million pixels are processed per frame. The additional pixel processing compared to Graphics test 1 comes from including particles and post processing effects such as bloom, streaks and motion blur.

In each frame, an average 75,000 vertices are processed. This number is considerably lower than in Graphics test 1 because shadows are not drawn and the processed geometry has a lower number of polygons.
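To put the two graphics tests side by side, the per-frame figures quoted above can be turned into simple ratios. This is a back-of-the-envelope sketch: the names are made up for illustration, and the 1280x720 target size assumes the default preset used in this article.

```python
# Per-frame averages quoted for the two Ice Storm graphics tests.
GT1 = {"vertices": 530_000, "pixels": 4_700_000}
GT2 = {"vertices": 75_000, "pixels": 12_600_000}

# Assumption: default preset, i.e. a 1280x720 offscreen render target.
TARGET_PIXELS = 1280 * 720


def pixels_per_vertex(test):
    """Processed pixels per processed vertex in an average frame."""
    return test["pixels"] / test["vertices"]


def pixels_per_target_pixel(test):
    """Processed pixels per pixel of the 720p render target."""
    return test["pixels"] / TARGET_PIXELS


vertex_ratio = GT1["vertices"] / GT2["vertices"]  # GT1 vertex load relative to GT2
pixel_ratio = GT2["pixels"] / GT1["pixels"]       # GT2 pixel load relative to GT1

print(f"GT1 processes {vertex_ratio:.1f}x the vertices of GT2")
print(f"GT2 processes {pixel_ratio:.2f}x the pixels of GT1")
for name, test in (("GT1", GT1), ("GT2", GT2)):
    print(f"{name}: {pixels_per_vertex(test):.1f} px/vertex, "
          f"{pixels_per_target_pixel(test):.1f} processed px per target px")
```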

3DMark - Graphics Test 2

3DMark - Ice Storm

The overall Ice Storm score shows a 71% improvement over Snapdragon 600, which is the closest competitor.

3DMark - Physics

The physics test takes multicore CPU performance into account, but even then the Snapdragon 800 remains ahead of the pack. The performance advantage over the lower clocked Snapdragon 600 shrinks to just 20%, which is a bit lower than clock speeds alone would normally suggest.

3DMark - Physics Test

115 Comments

shodanshok - Thursday, June 20, 2013 - link
I forgot to specify the benchmark used. It is Coremark: http://www.coremark.org/

It is an industry-standard benchmark with freely available sources.
Wilco1 - Friday, June 21, 2013 - link
Really? Looking at the published results it shows Exynos 4 does 5560 Coremarks/core at 1.4GHz.

The fastest per-core Atom result is 2.3 CM/MHz for 1 thread, and 3.3 with Hyperthreading.

Cortex-A9 does 4.0 for 1 thread - so it is 74% faster single threaded, and 21% faster core for core.

So the A9 destroys Atom on CoreMark as well. I am surprised several of you are trying to argue that in-order cores beat out-of-order cores despite the facts.
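The percentages quoted in this comment follow directly from the CoreMark/MHz figures it gives. A quick check in Python, using only the numbers stated above (the helper name is made up for illustration):

```python
def percent_faster(a, b):
    """How much higher score a is than score b, in percent."""
    return (a / b - 1.0) * 100.0


# CoreMark/MHz figures as quoted in the comment.
A9_CM_PER_MHZ = 4.0
ATOM_CM_PER_MHZ_1T = 2.3   # one thread
ATOM_CM_PER_MHZ_HT = 3.3   # with Hyperthreading

# Exynos 4: 5560 CoreMarks per core at 1.4 GHz (1400 MHz).
exynos4_cm_per_mhz = 5560 / 1400

print(f"Exynos 4 per core: {exynos4_cm_per_mhz:.2f} CM/MHz")
print(f"A9 vs Atom, 1 thread: {percent_faster(A9_CM_PER_MHZ, ATOM_CM_PER_MHZ_1T):.0f}% faster")
print(f"A9 vs Atom, with HT:  {percent_faster(A9_CM_PER_MHZ, ATOM_CM_PER_MHZ_HT):.0f}% faster")
```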
shodanshok - Friday, June 21, 2013 - link
No, it is incredible how you claim to extrapolate _precise_ performance numbers from vague arch details.

Go back to the CoreMark site, because you misunderstand the benchmark results. The CM/MHz score represents the score of the entire SoC, so it doesn't rule out core count differences. Look at the CM/core score instead and you will find that Atom is in the same range as the A9 scores, sometimes much better.

Some examples: Atom z520 vs Tegra2 and Atom n2800 vs exynos4 quad.

Please also note that:
- CoreMark does not stress L2/memory in any way. This is the only reason why the A9's slow memory interface does not interfere here;
- the compiler has enormous importance in its score.

The real Atom problem was the terrible GPU and companion chipset.

Regards.
Wilco1 - Friday, June 21, 2013 - link
I listed the per core results, as I said A9 is 74% faster single threaded and 21% faster with Hyperthreading enabled. These are results from the EEMBC website, no complex extrapolation involved.

Coremark runs mostly in L1, however it does stress the branch predictor seriously. All benchmarks have a major compiler component. Coremark is horrible like pretty much any EEMBC stuff so I don't think it will become popular.
shodanshok - Saturday, June 22, 2013 - link
I cannot agree. From the CoreMark site:

### Comparison 1:
Tegra2 @ 1.00 GHz (2 A9 cores):
Coremark: 5866.39
Coremark/Core: 2933.20

Atom Z520 @ 1.33 GHz (1 Atom Core):
Coremark: 3192.17
Coremark/Core: 3192.17

Atom advantage: 9%

### Comparison 2:
Exynos4 Quad @ 1.4 GHz (4 A9 cores):
Coremark: 22243.00
Coremark/Core: 5560.75

Atom N2800 @ 1.86 GHz (2 Atom cores):
Coremark: 12286.90
Coremark/Core: 6143.45

Atom advantage: 10%
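The per-core scores and the two "Atom advantage" figures above can be reproduced from the total scores and core counts listed in the two comparisons. A minimal sketch in Python, with illustrative names:

```python
# Total CoreMark scores and core counts as listed in the two comparisons.
RESULTS = {
    "Tegra2":       {"coremark": 5866.39,  "cores": 2},
    "Atom Z520":    {"coremark": 3192.17,  "cores": 1},
    "Exynos4 Quad": {"coremark": 22243.00, "cores": 4},
    "Atom N2800":   {"coremark": 12286.90, "cores": 2},
}


def per_core(name):
    """Total CoreMark score divided by the number of cores (clock speed ignored)."""
    entry = RESULTS[name]
    return entry["coremark"] / entry["cores"]


def advantage_pct(a, b):
    """Per-core advantage of chip a over chip b, in percent."""
    return (per_core(a) / per_core(b) - 1.0) * 100.0


print(f"Atom Z520 vs Tegra2:        {advantage_pct('Atom Z520', 'Tegra2'):.0f}%")
print(f"Atom N2800 vs Exynos4 Quad: {advantage_pct('Atom N2800', 'Exynos4 Quad'):.0f}%")
```

Note that this per-core figure is not normalized for clock speed, so each Atom part here runs at a higher frequency than the A9 part it is compared with.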

### Note:
Why are the two A9 scores and the two Atom scores so different (see Tegra2 vs Exynos and Atom Z520 vs N2800)? The reason lies in the compiler: recent GCC versions have greatly improved their efficiency with in-order uarchs. Moreover, please also note that the high A9 score (Exynos) was obtained with the ARM-specific compiler. I am sure that, if benchmarked using the Intel C Compiler, the Atom score would be higher.

### Summary:
The Atom core is more than capable of competing against A9. You can argue that Atom has a higher clock, but in a phone/tablet environment clocks don't mean anything. What is important is performance/watt.
This brings us to Atom's two real problems:
1) a very low efficiency chipset and low integration. Moorestown (Intel's first attempt at mobile with Atom) was doomed from the start because it required 4/5 chips to enable a full-featured phone;

2) a very slow GPU (with very bad performance/watt).

Moreover, it is widely understood that the A9 OoO engine is only a mild implementation. A15 is much stronger in this regard, sometimes (not too often, anyway) even approaching AMD Bobcat single-thread performance.

Regards.
Wilco1 - Saturday, June 22, 2013 - link
No - the performance comparisons that are useful are:

1. Max score for a SoC - despite running at a far lower clock, in both comparisons A9-based SoCs win by more than 80% in overall performance.
2. Efficiency of a core at the same frequency (IPC) - Without Hyperthreading A9 is 74% faster, with Hyperthreading A9 wins by more than 20%.

Note that your comparison doesn't work. You can't come to a conclusion about A9 vs Atom performance when you compare with wildly different frequencies. Also it means giving Atom the advantage of having 2 threads vs 1 on A9. So to make the comparison fair you need to compare with an equal number of threads or at the same clock.
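Normalizing the per-core CoreMark results quoted earlier in the thread by clock speed gives a sketch of the same-frequency comparison argued for here. All scores, core counts and clocks below are the ones listed in the thread; the helper names are illustrative. The quoted figures do not state how many threads each run used, so this does not separate out Hyperthreading.

```python
# (total CoreMark, cores, clock in MHz) as listed in the thread.
CHIPS = {
    "Tegra2":       (5866.39,  2, 1000),
    "Atom Z520":    (3192.17,  1, 1330),
    "Exynos4 Quad": (22243.00, 4, 1400),
    "Atom N2800":   (12286.90, 2, 1860),
}


def cm_per_core_mhz(name):
    """CoreMark score normalized by both core count and clock frequency."""
    score, cores, mhz = CHIPS[name]
    return score / cores / mhz


def lead_pct(a, b):
    """Per-core, per-MHz lead of chip a over chip b, in percent."""
    return (cm_per_core_mhz(a) / cm_per_core_mhz(b) - 1.0) * 100.0


for name in CHIPS:
    print(f"{name}: {cm_per_core_mhz(name):.2f} CM/MHz per core")
print(f"Tegra2 vs Atom Z520:        {lead_pct('Tegra2', 'Atom Z520'):.0f}%")
print(f"Exynos4 Quad vs Atom N2800: {lead_pct('Exynos4 Quad', 'Atom N2800'):.0f}%")
```

With these inputs the A9 parts come out roughly 20% ahead per core per MHz, which is the opposite direction from the unnormalized per-core comparison.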

Yes GCC has improved a lot in recent years, on ARM it has become a reasonable compiler and competitive with ARM's armcc compiler. I don't know how much better ICC would be on Atom, but I suspect the gap is far smaller as well.

A9 is not hugely OoO indeed, just like Silvermont. A15 is aggressive OoO and beats Jaguar.
shodanshok - Saturday, June 22, 2013 - link
No, I don't agree again.
You explicitly talked about the Cortex-A9 and Atom uarchs, _not_ their SoC implementations.

You cannot use the total SoC score as a uarch benchmark, simply because it doesn't rule out differences in core count. To measure uarch performance you need to do a core-by-core comparison. Let me give an example: using total SoC score, a 4xA9 SoC is faster than a 2xA15 one. However, the latter uarch is considerably more advanced.

A very similar argument can be made for frequency: Atom was _from the start_ designed to hit a relatively high-clock, yet low power target. This was deliberately done to exploit Intel's 45/32nm HKMG process, which doesn't scale power down much for lower frequency targets. It is simply a question of design targets: for low power chips, you can get (relatively) high frequency _or_ (relatively) high IPC, not both (currently).

So, you must decide: are you comparing the uarch or the final SoC implementation? Because, from a uarch standpoint, Atom wins. On a performance/watt metric, the bare cores tend to be on par. As a final product, A9 is way better because there are many highly integrated, low power, low cost SoCs from a multitude of vendors. In contrast, Atom-based SoCs are offered only by Intel and with much lower integration (and higher cost), until now, where their latest platform begins to be very competitive against older A9 SoCs.

The "little problem" is that ARM is shipping with 2x and 4x A15 cores, and against them Atom is at a disadvantage.

Regards.
Wilco1 - Saturday, June 22, 2013 - link
While Atom was indeed designed for high frequency, A9 reaches higher frequencies: Atom maxes out at 2GHz on 32nm, while A9 does 1.7GHz on 40nm and 2.3GHz on 28nm. So you can't claim a "microarchitecture" win for Atom when you compare against a low clocked A9.

Secondly, since you argue that frequency is an important aspect of the microarchitecture, I would argue that core count matters equally. A9 was designed to be simple and small, so it is typically used as a quad-core. On the other hand Atom is a large and complex core which uses Hyperthreading rather than multiple cores. So if you want to do a fair comparison with Hyperthreading enabled then you have to use 2 A9 cores for every Atom core. That's how they have been designed to be used.

What is the difference between a module, a HT enabled core and a dual core? These are just different ways of improving multithreaded performance with different hardware tradeoffs - but to software they all appear identical.

In conclusion: you cannot just pick whatever comparison you want. Either you compare the whole SoC, including its frequency as well as core count, or you compare microarchitectures normalized on core count and frequency. You can't include one but not the other as frequency, core count and TDP are related.
shodanshok - Sunday, June 23, 2013 - link
So, you started with in-order vs OoO and now you are talking about die size and perf/mm2?

1) While Cortex-A9 was rated for 2 GHz operation, a single A9 core would dissipate more than 2 Watts at this frequency. Atom is not much different in this regard. Moreover, can you point me to a phone that uses a 2 GHz A9 implementation? I bet not.

2) Atom has also been MP-capable from the start: it has the same bus unit and MP capability as the Netburst uarch. By which metrics are these inferior to the ARM MP implementation?

3) By die size comparison, A9 is clearly better than Atom. However, its performance is lower.

4) HT is simply a smart sharing of some key structures in order to interleave two threads on the same core. You cannot count HT as another core. For example, barrel processors can interleave many threads on a single core: Sun T1 can interleave 4 threads per core, T2 8 per core. Do you count T1 as having 32 cores? If so, you are wrong.

Both I and other users pointed you to many reviews and benchmarks where Atom is clearly identified as faster than A9. However, you continue to change metrics.

The only benchmark that paints a different picture is Geekbench, which shows A9 in the same league as Sandy Bridge. Do you _really_ think this is true? In SPEC benchmarks, SB is quite close to the big, power hungry but powerful POWER7. Do you really think that A9 is remotely comparable to this core? Really?

I already stated this: if you compare SoCs, well, A9 wins, because there are many well done SoCs based around it. However, from the uarch/performance side, Atom wins.

The funny thing is that this is now totally irrelevant: A9 has been superseded by A15, and Atom is very near its EOL. Moreover, Jaguar seems to be a very competent tablet chip.

Regards.
MrPhilo - Sunday, June 23, 2013 - link
Unfair to compare the A9s to Atom. The Tegra 2 was an old revision of A9 and lacked NEON etc. The newer A9s are fairer to compare. Also a single A9 at 2GHz won't draw 2 watts at all; the 2.3GHz Tegra 4i would be worse than the A15 if it did. Remember the process is 28nm, not the old 40nm.
