Qualcomm Snapdragon S4 (Krait) Performance Preview - 1.5 GHz MSM8960 MDP and Adreno 225 Benchmarks
by Brian Klug & Anand Lal Shimpi on February 21, 2012 3:01 AM EST
Posted in: Smartphones, Snapdragon, Qualcomm, Adreno, Krait, Mobile
We won't go too deep into Krait's CPU architecture, because we've already done so in an earlier piece. What we can provide, however, is a quick recap. Architecturally, Krait isn't a design of tradeoffs; rather, it's a significant step forward along almost all vectors. Each core can fetch, decode and execute more instructions in parallel than its predecessor (Scorpion, Snapdragon S1/S2/S3).
Qualcomm Architecture Comparison
Feature | Scorpion | Krait
Pipeline Depth | 10 stages | 11 stages
Decode | 2-wide | 3-wide
Issue Width | 3-wide? | 4-wide
Execution Ports | 3 | 7
L2 Cache (dual-core) | 512KB | 1MB
Core Configurations | 1, 2 | 1, 2, 4
Even if you're not comparing to Qualcomm's previous architecture, Krait maintains the same low level advantage over any ARM Cortex A9 based design (NVIDIA Tegra 2/3, TI OMAP 4, Apple A5). Clock speeds are up with only a small increase in pipeline depth. The combination of these two factors alone should result in significant performance improvements for even single threaded applications. If you want to abstract by one more level: Krait will be faster regardless of application, regardless of usage model. You're looking at a generational gap in architecture here, not simply a clock bump.
Architecture Comparison
Feature | ARM11 | ARM Cortex A8 | ARM Cortex A9 | Qualcomm Scorpion | Qualcomm Krait
Decode | single-issue | 2-wide | 2-wide | 2-wide | 3-wide
Pipeline Depth | 8 stages | 13 stages | 8 stages | 10 stages | 11 stages
Out of Order Execution | N | N | Y | Partial | Y
FPU | VFP11 (pipelined) | VFPv3 (not-pipelined) | Optional VFPv3 (pipelined) | VFPv3 (pipelined) | VFPv4 (pipelined)
NEON | N/A | Y (64-bit wide) | Optional MPE (64-bit wide) | Y (128-bit wide) | Y (128-bit wide)
Process Technology | 90nm | 65nm/45nm | 40nm | 40nm | 28nm
Typical Clock Speeds | 412MHz | 600MHz/1GHz | 1.2GHz | 1GHz | 1.5GHz
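As a rough illustration of how decode width and clock speed compound, the sketch below multiplies the two figures from the table to get a theoretical peak decode rate. This is only a ceiling under an assumed simplification: real throughput also depends on issue width, caches, branch prediction and memory latency, none of which are modeled here. The names `CORES` and `peak_decode_rate` are hypothetical and exist only for this example.

```python
# (decode width in instructions per cycle, typical clock in GHz),
# both copied from the comparison table above.
CORES = {
    "ARM Cortex A9": (2, 1.2),
    "Qualcomm Scorpion": (2, 1.0),
    "Qualcomm Krait": (3, 1.5),
}


def peak_decode_rate(width, clock_ghz):
    """Upper bound on instructions decoded per second, in billions.

    Assumption: every cycle decodes a full set of instructions,
    which real workloads never sustain.
    """
    return width * clock_ghz


rates = {name: peak_decode_rate(*spec) for name, spec in CORES.items()}
krait = rates["Qualcomm Krait"]

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} G instr/s peak (Krait is {krait / rate:.2f}x)")
```

On these numbers Krait's decode ceiling is a little under twice that of a 1.2GHz Cortex A9, which is an upper bound on the front end rather than a performance prediction.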
The memory interface of the chip has been improved tremendously. At a high level, the MSM8960 is Qualcomm's first SoC to feature PoP support for two LPDDR2 memory channels. We suspect there are lower level improvements to the memory interface as well; however, we don't have more details from Qualcomm, and the current state of memory latency/bandwidth testing on Android is pretty abysmal.
Quantifying the Krait performance advantage requires a mixture of synthetic and application level tests. We'll start with Linpack, a Java port of the classic memory bandwidth/FPU test:
Occasionally we'll see performance numbers that just make us laugh at their absurdity. Krait's Linpack performance is no exception. The performance advantage here is insane. The MSM8960 is able to deliver more than twice the performance of any currently shipping SoC. The gains are likely due in no small part to improvements in Krait's cache/memory controller. Krait can also multi-issue FP instructions, whereas A9 class architectures can apparently only dual-issue integer instructions.
Moving on we have our standard JavaScript benchmarks: Sunspider and Browsermark. Both of these tests show significant performance improvements, although understandably not by the margins we saw above in Linpack:
Krait and the MSM8960 are 20 - 35% faster than the dual-core Cortex A9s used in Samsung's Galaxy Nexus. For a look at how overall web page loading is impacted we loaded AnandTech.com three times and averaged the results. We presented results with the browser cache cleared after each run as well as results after all assets were cached:
AnandTech.com Page Loading Comparison (Stock ICS Browser)
Device | Browser Cache Cleared | Cache In Use
Qualcomm MDP MSM8960 (Krait) | 5.5 seconds | 3.0 seconds
Samsung Galaxy Nexus (ARM Cortex A9) | 5.8 seconds | 4.4 seconds
There's hardly any advantage when you're network bound, which is to be expected. However, whenever the device can pull assets from a local cache (something that is quite common as images, CSS and even many page elements remain static between loads) the advantage grows considerably. Here we're seeing a 46% advantage from Krait over the Cortex A9 in the Galaxy Nexus.
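The 46% figure corresponds to the ratio of the cached load times (4.4 s / 3.0 s ≈ 1.47). A minimal sketch of that arithmetic follows, using the numbers from the page loading table; the function names `speed_advantage` and `time_saved` are hypothetical. It also shows the same gap expressed as a reduction in load time, which is a smaller number and easy to confuse with the first.

```python
# Page load times in seconds, copied from the table above.
LOAD_TIMES = {
    "Browser Cache Cleared": {"krait": 5.5, "cortex_a9": 5.8},
    "Cache In Use": {"krait": 3.0, "cortex_a9": 4.4},
}


def speed_advantage(fast_s, slow_s):
    """Percent more page loads per unit time the faster device completes."""
    return (slow_s / fast_s - 1.0) * 100.0


def time_saved(fast_s, slow_s):
    """Percent reduction in load time on the faster device."""
    return (1.0 - fast_s / slow_s) * 100.0


for scenario, t in LOAD_TIMES.items():
    adv = speed_advantage(t["krait"], t["cortex_a9"])
    saved = time_saved(t["krait"], t["cortex_a9"])
    print(f"{scenario}: {adv:.1f}% faster, {saved:.1f}% less time")
```

With the cache in use this works out to roughly a 46.7% speed advantage, or about 31.8% less time per page load; with the cache cleared the gap is only around 5% either way.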
We turn to Qualcomm's own Vellamo as a system/CPU/browser performance test:
Again, we're showing a huge performance advantage here thanks to Krait. Seeing as how Vellamo is a Qualcomm benchmark, don't get too attached to the advantage here, but it does echo some of what we've seen earlier.
Finally we have Rightware's Basemark OS 1.1 RC, which is fast becoming an impressively polished system benchmark, one which will hopefully eventually take the place of the likes of Quadrant.
Basemark OS - System
Test | HTC Rezound | Galaxy Nexus | MDP MSM8960
System Overall Score | 658 | 538 | 907
Simple Java 1 | 298 loops/s | 210 loops/s | 375 loops/s
Simple Java 2 | 7.28 loops/s | 8.61 loops/s | 10.8 loops/s
SMP Test | 35.3 loops/s | 49.2 loops/s | 64.4 loops/s
100K File (eMMC->SD) | 6.49 MB/s | 9.52 MB/s | 8.64 MB/s
100K File (SD->eMMC) | 33.0 MB/s | 17.8 MB/s | 39.8 MB/s
100K File (eMMC->eMMC) | 37.8 MB/s | 34.5 MB/s | 48.9 MB/s
100K File (SD->SD) | 8.47 MB/s | 8.30 MB/s | 12.7 MB/s
Database Operation | 10.0 ops/s | 5.73 ops/s | 19.4 ops/s
Zip Compression | 0.509 s | 0.848 s | 0.561 s
Zip Decompression | 0.097 s | 0.206 s | 0.073 s
On the CPU centric tests Basemark OS is showing anywhere from a 20% - 80% increase in performance over the 1.5 GHz APQ8060 based HTC Rezound. IO performance is also tangibly improved although that could be a function of NAND performance rather than the SoC specifically.
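For reference, the per-test gains over the HTC Rezound can be recomputed from the Basemark OS table. The sketch below is a hedged illustration: treating the two Simple Java rows and the SMP Test as the "CPU centric" tests is an assumption made for this example, since the article does not list which rows it counts, and the names `CPU_TESTS` and `percent_gain` are hypothetical.

```python
# (HTC Rezound, MDP MSM8960) results in loops/s, copied from the table.
# Assumption: these three rows are the "CPU centric" tests; higher is better.
CPU_TESTS = {
    "Simple Java 1": (298, 375),
    "Simple Java 2": (7.28, 10.8),
    "SMP Test": (35.3, 64.4),
}


def percent_gain(baseline, new):
    """Percent improvement of `new` over `baseline` for higher-is-better scores."""
    return (new / baseline - 1.0) * 100.0


gains = {
    name: percent_gain(rezound, mdp)
    for name, (rezound, mdp) in CPU_TESTS.items()
}

for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}% vs HTC Rezound")
```

Under that assumption the gains run from roughly 26% (Simple Java 1) to roughly 82% (SMP Test), in line with the range quoted above.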
These results as a whole simply quantify what we've felt during our use of the MSM8960 MDP: this is the absolute smoothest we've ever seen Ice Cream Sandwich run.
86 Comments
k1ng617 - Tuesday, February 21, 2012
Honestly, I don't trust Linpack and believe it is probably one of the most outdated Android benchmarks, one that doesn't represent what a person will see with real-world user experience.
Can you try out Antutu & CFBench and post the scores please?
juicytuna - Tuesday, February 21, 2012
Indeed. Linpack is a test of software as much as hardware; who knows what kind of optimizations they could have done to the VM to get these headline grabbing scores.
GPU is distinctly meh for a 2012 SoC, and single threaded performance doesn't seem that impressive to me. Sunspider and Browsermark seem to be on a par with what you'd expect to see from an A9@1.5ghz.
And how much of that 'faster feel' can be attributed to NAND performance?
metafor - Wednesday, February 22, 2012
There are some hiccups in Android that have to do with the UI thread looking up storage but for the most part, it's a CPU thing. The thing to keep in mind is that UI fluidity is an entirely different type of code than Javascript parsing. And looking at the Basemark results, Krait is quite capable in that department.
arm.svenska - Tuesday, February 21, 2012
Why is the phone so long? I get that it is a reference design. But, could someone tell why it is like that?
douglaswilliams - Tuesday, February 21, 2012
I don't know for sure, not a definitive answer here, just adding to the discussion.
Like you said, it's a reference design (Mobile Development Platform). They put as little time as possible into making this pretty.
When I was in college we had some old development platforms for some Motorola chips that were essentially a large circuit card with ports on all the sides for all the I/O and buttons to push for different operating modes like programming mode. It in no way looked like what an actual product would look like - because that wasn't its purpose.
peevee - Tuesday, February 21, 2012
Out-of-order Krait core at 1.5GHz consumes only 750mW. An Atom core at the same frequency consumes as much as 10x of that! While being in-order, no faster, if not slower! What a fail for Intel!
Khato - Tuesday, February 21, 2012
You might consider reading Anandtech's article covering the Intel Atom Z2460 launch in January - http://www.anandtech.com/show/5365/intels-medfield...
Granted, we're only given SunSpider and BrowserMark benchmarks for the Atom Z2460 reference platform, but they're both actually ahead of the numbers for the Krait MDP - 1331.5 versus 1532 on SunSpider and 116425 vs 110345 on BrowserMark. While I expected Atom to be competitive, I'd thought it likely for Krait to be slightly ahead on the single threaded benchmarks, so I'm somewhat surprised that it's not. (Note that I'm somewhat surprised that there was no mention of how Krait compares to Atom Z2460 in the article.)
As for power, that same article states that the Atom Z2460 SoC consumes ~750 mW at 1.6GHz - that's for the entire SoC, not just the CPU core. It'll be quite interesting to see how actual battery life compares between products once released.
metafor - Tuesday, February 21, 2012
The difference is, one is Intel's numbers and the other is a 3rd party reviewer's on an actual device.
So yes, I agree. We'll have to see what actual phones using Atom will be like. Note that Sunspider isn't the end-all of "single-threaded performance" either. The JIT for Javascript on x86 is far more mature -- having been developed for a decade now -- than it is for ARM.
Khato - Tuesday, February 21, 2012
Well, I tend to trust Intel's numbers when they're actual hard numbers rather than percentages or normalized figures - they can't exactly get away with making up figures.
And no question about the fact that SunSpider/Browsermark aren't indicative of all too much... but I wouldn't claim that Intel's advantages on those benchmarks are due to a superior JIT/software advantage. Remember the performance figures from that Oak Trail Tablet prototype running an early Android port from June of 2011? That was a prime example of the sort of software disadvantage that Intel had to overcome in order to get Android running well on x86. While a bit dated, here's an excellent example of the performance differences on x86 java implementations between OS (note that linux had a slightly newer version, but they were both using the latest available) - http://www.phoronix.com/scan.php?page=article&...
metafor - Tuesday, February 21, 2012
No, but you'd be surprised how much a bit of pick-and-choose can help. Most comprehensive reviews are pretty rigorous with how many times they repeat a test, how much warm-up they give a device and whether or not they pick the median, average, etc.
One could easily pick the best number, which can vary quite a bit especially for a JIT benchmark.
I've also seen that comparison before. There was a rather thorough discussion of it and its relative lack of merits at RWT. I'd link, but it's being marked as spam :/