Final Words

Qualcomm's Snapdragon 800 is quite possibly its most ambitious SoC to date. The goal? To drive absolute performance while maintaining power efficiency. While Snapdragon 600 was clearly about delivering evolutionary gains in performance, Snapdragon 800 intends to compete with ARM's Cortex A15 and Intel's Bay Trail platform. 

On the CPU performance front, Snapdragon 800's 2.3GHz Krait 400 cores do appear to hold their own quite well against ARM's Cortex A15. In some cases ARM holds the advantage, while in others the higher clocked Krait 400 takes the lead. We still have the question of power to answer, but Qualcomm bets it can deliver A15-like performance without A15-like power thanks to the 28nm HPM process at its foundry partners.

Qualcomm didn't have any power demos set up, so power analysis and battery life testing will have to come at a later date, but the claim is better performance at the same platform power as Snapdragon 600.

On the GPU side, we have a new king. Adreno 330 delivers huge performance improvements over Adreno 320 and everything else we've tested thus far. Snapdragon 800 is the new benchmark to beat. It's very clear to me why many tablet designs scheduled for later this year are based on Snapdragon 800 silicon.

115 Comments

  • shodanshok - Thursday, June 20, 2013 - link

    I forgot to specify the benchmark used. It is Coremark: http://www.coremark.org/

    It is an industry-standard benchmark with freely available sources.
  • Wilco1 - Friday, June 21, 2013 - link

    Really? Looking at the published results it shows Exynos 4 does 5560 Coremarks/core at 1.4GHz.

    The fastest per-core Atom result is 2.3 CM/MHz for 1 thread, and 3.3 with Hyperthreading.

    Cortex-A9 does 4.0 for 1 thread - so it is 74% faster single threaded, and 21% faster core for core.

    So the A9 destroys Atom on CoreMark as well. I am surprised several of you are trying to argue that in-order cores beat out-of-order cores despite the facts.
  • shodanshok - Friday, June 21, 2013 - link

    No, it is incredible how you pretend to extrapolate _precise_ performance numbers from vague arch details.

    Return to the Coremark site, because you misunderstand the benchmark results. The CM/MHz score represents the score of the entire SoC, so it doesn't rule out core count differences. Look at the CM/core score instead and you will find that Atom is in the same range as the A9 scores, sometimes much better.

    Some examples: Atom Z520 vs Tegra2 and Atom N2800 vs Exynos4 Quad.

    Please also note that:
    - Coremark does not stress L2/memory in any way. This is the only reason why the A9's slow memory interface does not interfere here;
    - the compiler has enormous importance in its score.

    The real Atom problem was the terrible GPU and companion chipset.

    Regards.
  • Wilco1 - Friday, June 21, 2013 - link

    I listed the per-core results: as I said, A9 is 74% faster single threaded and 21% faster with Hyperthreading enabled. These are results from the EEMBC website, no complex extrapolation involved.

    Coremark runs mostly in L1, however it does stress the branch predictor seriously. All benchmarks have a major compiler component. Coremark is horrible like pretty much any EEMBC stuff so I don't think it will become popular.
  • shodanshok - Saturday, June 22, 2013 - link

    I cannot agree. From the Coremark site:

    ### Comparison 1:
    Tegra2 @ 1.00 GHz (2 A9 cores):
    Coremark: 5866.39
    Coremark/Core: 2933.20

    Atom Z520 @ 1.33 GHz (1 Atom Core):
    Coremark: 3192.17
    Coremark/Core: 3192.17

    Atom advantage: 9%

    ### Comparison 2:
    Exynos4 Quad @ 1.4 GHz (4x A9 cores)
    Coremark: 22243.00
    Coremark/core: 5560.75

    Atom N2800 @ 1.86 GHz (2 Atom cores)
    Coremark: 12286.90
    Coremark/Core: 6143.45

    Atom advantage: 10%
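The per-core arithmetic behind the two comparisons can be checked with a short sketch. It uses only the totals and core counts quoted in this comment; the function names are illustrative, and the sketch says nothing about how the original submissions were compiled or run.

```python
# CoreMark totals and core counts exactly as quoted in the comment.
# The helper names below are hypothetical, chosen for this illustration.
results = {
    "Tegra2": (5866.39, 2),
    "Atom Z520": (3192.17, 1),
    "Exynos4 Quad": (22243.00, 4),
    "Atom N2800": (12286.90, 2),
}


def per_core(name):
    """Total CoreMark score divided by the number of physical cores."""
    total, cores = results[name]
    return total / cores


def advantage_pct(a, b):
    """Percentage by which a's per-core score exceeds b's."""
    return (per_core(a) / per_core(b) - 1.0) * 100.0


for atom, a9 in [("Atom Z520", "Tegra2"), ("Atom N2800", "Exynos4 Quad")]:
    print(f"{atom} vs {a9}: {advantage_pct(atom, a9):.1f}% per-core advantage")
```

The two results round to the 9% and 10% figures given above. Note that this normalizes by core count only; clock speed is left out of the calculation.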

    ### Note:
    Why are the two A9 scores and the two Atom scores so different (see Tegra2 vs Exynos and Atom Z520 vs N2800)? The reason lies in the compiler: recent GCC versions have greatly improved their efficiency with in-order uarchs. Moreover, please also note that the high A9 score (Exynos) was obtained with ARM's own compiler. I am sure that, if benchmarked using the Intel C Compiler, the Atom score would be higher.

    ### Summary:
    the Atom core is more than capable of competing against A9. You can argue that Atom has a higher clock, but in a phone/tablet environment clocks don't mean anything. What is important is performance/watt.
    This brings us to Atom's two real problems:
    1) a very low efficiency chipset and low integration. Moorestown (Intel's first attempt at mobile with Atom) was doomed from the start because it required 4 or 5 chips to enable a full-featured phone;

    2) a very slow GPU (with very bad performance/watt).

    Moreover, it is widely understood that the A9 OoO engine is only a mild implementation. A15 is much stronger in this regard, sometimes (not too often, anyway) even approaching AMD Bobcat single-thread performance.

    Regards.
  • Wilco1 - Saturday, June 22, 2013 - link

    No - the performance comparisons that are useful are:

    1. Max score for a SoC - despite running at a far lower clock, in both comparisons A9-based SoCs win by more than 80% in overall performance.
    2. Efficiency of a core at the same frequency (IPC) - Without Hyperthreading A9 is 74% faster, with Hyperthreading A9 wins by more than 20%.

    Note that your comparison doesn't work. You can't come to a conclusion about A9 vs Atom performance when you compare with wildly different frequencies. Also it means giving Atom the advantage of having 2 threads vs 1 on A9. So to make the comparison fair you need to compare with an equal number of threads or at the same clock.
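To illustrate how much the choice of normalization matters, here is a minimal sketch that divides the per-core scores quoted earlier in the thread by the quoted clock speeds as well. All inputs come from the thread; the helper names are made up for this example, and the published totals do not state how many threads each Atom run used, so treating them as per-core figures is an assumption.

```python
# name: (CoreMark total, physical cores, clock in MHz), as quoted in the thread.
chips = {
    "Tegra2": (5866.39, 2, 1000),
    "Atom Z520": (3192.17, 1, 1330),
    "Exynos4 Quad": (22243.00, 4, 1400),
    "Atom N2800": (12286.90, 2, 1860),
}


def cm_per_mhz_per_core(name):
    """CoreMark score normalized by both core count and clock frequency."""
    total, cores, mhz = chips[name]
    return total / cores / mhz


def a9_lead_pct(a9, atom):
    """Percentage by which the A9 part leads the Atom part per core per MHz."""
    return (cm_per_mhz_per_core(a9) / cm_per_mhz_per_core(atom) - 1.0) * 100.0


for a9, atom in [("Tegra2", "Atom Z520"), ("Exynos4 Quad", "Atom N2800")]:
    print(
        f"{a9}: {cm_per_mhz_per_core(a9):.2f} CM/MHz/core, "
        f"{atom}: {cm_per_mhz_per_core(atom):.2f} CM/MHz/core, "
        f"A9 lead: {a9_lead_pct(a9, atom):.0f}%"
    )
```

With the same four results, normalizing by core count alone favors Atom by about 9 to 10%, while normalizing by core count and frequency favors A9 by roughly 20%. The disagreement in the thread is about which normalization is the fair one, not about the underlying scores.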

    Yes GCC has improved a lot in recent years, on ARM it has become a reasonable compiler and competitive with ARM's armcc compiler. I don't know how much better ICC would be on Atom, but I suspect the gap is far smaller as well.

    A9 is not hugely OoO indeed, just like Silvermont. A15 is aggressive OoO and beats Jaguar.
  • shodanshok - Saturday, June 22, 2013 - link

    No, again I don't agree.
    You explicitly talked about the Cortex-A9 and Atom uarchs, _not_ their SoC implementations.

    You cannot use the total SoC score as a uarch benchmark, simply because it doesn't rule out differences in core count. To measure uarch performance you need to do a core-by-core comparison. Let me give an example: using total SoC score, a 4xA9 SoC is faster than a 2xA15 one. However, the latter uarch is considerably more advanced.

    A very similar argument can be made for frequency: Atom was _from the start_ designed to hit a relatively high-clock, yet low-power target. This was deliberately done to exploit Intel's 45/32nm HKMG process, which doesn't scale power down much for lower frequency targets. It is simply a question of design targets: for low power chips, you can get (relatively) high frequency _or_ (relatively) high IPC, not both (actually).

    So, you must decide: are you comparing the uarch or the final SoC implementation? Because, from a uarch point of view, Atom wins. From a performance/watt metric, their bare cores tend to be on par. From a final product standpoint, A9 is way better because there are many highly integrated, low-power, low-cost SoCs from a multitude of vendors. In contrast, Atom-based SoCs are offered only by Intel and with a much lower integration factor (and higher cost), until now, where their latest platform begins to be very competitive against older A9 SoCs.

    The "little problem" is that ARM is shipping with 2x and 4x A15 cores, and against them Atom is at a disadvantage.

    Regards.
  • Wilco1 - Saturday, June 22, 2013 - link

    While Atom was indeed designed for high frequency, A9 reaches higher frequencies: Atom maxes out at 2GHz on 32nm, while A9 does 1.7GHz on 40nm and 2.3GHz on 28nm. So you can't claim a "microarchitecture" win for Atom when you compare against a low clocked A9.

    Secondly, since you argue that frequency is an important aspect of the microarchitecture, I would argue that core count matters equally. A9 was designed to be simple and small, so it is typically used as a quad-core. On the other hand Atom is a large and complex core which uses Hyperthreading rather than multiple cores. So if you want to do a fair comparison with Hyperthreading enabled then you have to use 2 A9 cores for every Atom core. That's how they have been designed to be used.

    What is the difference between a module, a HT enabled core and a dual core? These are just different ways of improving multithreaded performance with different hardware tradeoffs - but to software they all appear identical.

    In conclusion: you cannot just pick whatever comparison you want. Either you compare the whole SoC, including its frequency as well as core count, or you compare microarchitectures normalized on core count and frequency. You can't include one but not the other as frequency, core count and TDP are related.
  • shodanshok - Sunday, June 23, 2013 - link

    So, you started with in-order vs OoO and now you are speaking of die size and perf/mm2?

    1) While Cortex-A9 was rated for 2 GHz operation, a single A9 core would dissipate more than 2 Watts at this frequency. Atom is not so different in this regard. Moreover, can you point me to a phone that uses a 2 GHz A9 implementation? I bet not.

    2) Atom is also MP from the start: it has the same bus unit and MP capability as the Netburst uarch. By which metrics are these inferior to the ARM MP implementation?

    3) By die size comparison, A9 is clearly better than Atom. However, its performance is lower.

    4) HT is simply a smart sharing of some key structures in order to interleave two threads on the same core. You cannot count HT as another core. For example, barrel microprocessors can interleave many threads on a single core: Sun T1 can interleave 4x threads per core, T2 8x per core. Do you count T1 as having 32 cores? If so, you are wrong.

    Both I and other users pointed you to many reviews and benchmarks where Atom is clearly identified as faster than A9. However, you continue to change metrics.

    The only benchmark that paints a different picture is Geekbench, which shows A9 in the same league as Sandy Bridge. Do you _really_ think this is true? In SPEC benchmarks, SB is quite close to the big, power-hungry but powerful POWER7. Do you really think that A9 is remotely comparable to this core? Really?

    I already stated this: if you compare SoCs, well, A9 wins, because there are many well done SoCs based around it. However, from uarch/performance side, Atom wins.

    The funny thing is that this is now totally irrelevant: A9 is superseded by A15, and Atom is very near its EOL. Moreover, Jaguar seems to be a very competent tablet chip.

    Regards.
  • MrPhilo - Sunday, June 23, 2013 - link

    It's unfair to compare those A9s to Atom. The Tegra 2 was an old revision of the A9 and lacked NEON, etc. The newer A9s make for a fairer comparison. Also, a single A9 at 2GHz won't draw 2 watts at all; the 2.3GHz Tegra 4i would be worse than the A15 if it did. Remember the process is 28nm, not the old 40nm.
