Original Link: http://www.anandtech.com/show/5563/qualcomms-snapdragon-s4-krait-vs-nvidias-tegra-3
The Qualcomm Snapdragon S4 (Krait) Preview Part II
by Anand Lal Shimpi on February 22, 2012 11:40 PM EST
Yesterday we presented the first results of Qualcomm's Krait based MSM8960 SoC. While we still await the first Krait based phones (widely expected to begin shipping sometime in Q2), courtesy of Qualcomm's MSM8960 Mobile Development Platform we were able to get a good idea of the upper bound for Krait and MSM8960 performance. I mention it's the upper bound because, at least in the past, MDP performance hasn't corresponded directly to shipping device performance. There was a pretty big delta between MSM8660 MDP performance and phones that used the MSM8660. Qualcomm tells us that this time around things are going to be different. Qualcomm is expecting a much narrower (nonexistent?) gap between the MSM8960 development platform and phones that use MSM8960 silicon.
One major difference between the MSM8960 MDP and our earlier MSM8660 MDP was the state of the CPU governor. In the earlier MDP the governor was set to max performance, always delivering the CPU's maximum clock frequency. With the MSM8960 platform the governor was set to ondemand, allowing for variable CPU speeds depending on what the OS requests of the device. The ondemand setting is in line with what we can expect device manufacturers to use when they ship phones. All of this goes to say that while we have a good handle on what Krait and the MSM8960 are capable of, there are still a lot of unknowns.
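For readers who want to check which governor a Linux-based device is actually running, the kernel's cpufreq subsystem reports it through sysfs. The following is a minimal sketch rather than part of our test procedure. It assumes the standard cpufreq sysfs layout and a shell with read access to it; `read_governors` is a helper name made up for this illustration.

```python
from pathlib import Path

def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every core that exposes cpufreq.

    Assumes the usual Linux layout: <root>/cpuN/cpufreq/scaling_governor.
    Cores that are offline or lack cpufreq support are simply skipped.
    """
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(sysfs_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name      # e.g. "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return governors
```

On a platform configured like the MSM8960 MDP we would expect each core to report `ondemand`, while a device pinned to its maximum clock would typically report `performance`.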
While it's true that shipping performance remains to be seen, some of the deltas we saw between MSM8960 and the current competition were so great that even a much slower implementation in a shipping phone would still be significantly faster than anything else out today.
We left our MSM8960 investigation with two major unknowns. The first was power consumption. We still haven't been able to get Qualcomm's Trepn tool running on the MSM8660 MDP, which has always been a bit finicky. To get a true feel for MSM8960 battery life we will have to wait for shipping devices. The other major unknown was really how MSM8960 stacks up against NVIDIA's Tegra 3.
Tegra 3 was everything Tegra 2 should have been. We got higher clocks, NEON support and a much faster GPU. The only thing missing from Tegra 3 was a dual channel memory interface. We were happy with Tegra 3 on ASUS' Eee Pad Transformer Prime, but in less than a week we'll get to meet some of the first smartphones based on T3 silicon.
Armed with the Eee Pad Transformer Prime (updated to Ice Cream Sandwich) we're able to get a rough idea of how these two heavyweights will compare. The same caveats that applied to the MDP apply to our Tegra 3 platform as well. Since we are using a tablet we're obviously dealing with a higher TDP than what you'll find in a phone. The comparison today is largely academic and naturally shipping devices may be better or worse than these two representatives. With the disclaimers out of the way, let's get to the comparison.
CPU Performance: Preferring Single vs. Multithreaded Performance
The MSM8960 features two Krait cores compared to the four ARM Cortex A9 cores in NVIDIA's Tegra 3. While the A9 is a very power efficient core, Krait offers a much wider front end, wider execution back end, faster FPU and an improved cache/memory interface. All of these factors together combined with similar clock speeds to what Tegra 3 is able to hit should result in better absolute performance in single or lightly threaded applications. As video decode and transcode are both fully offloaded in all modern SoCs, finding workloads that scale well across more than two cores is difficult. We noted this in our Eee Pad Transformer Prime review - it's just not easy coming up with current apps that scale well to four ARM cores. That's not to say that there are no advantages to more than two cores, but you're more likely to get a benefit from two faster cores vs. four slower ones.
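The two-fast-cores versus four-slower-cores argument can be put in rough numbers with an Amdahl's-law style model. This is only a sketch: the 1.5x per-core speed advantage used below is a hypothetical figure chosen for illustration, not a measured Krait-to-A9 ratio, and `relative_throughput` is our own helper name.

```python
def relative_throughput(parallel_fraction, cores, per_core_speed):
    """Amdahl-style estimate of work rate, relative to one core of speed 1.0.

    parallel_fraction: share of the work that scales across cores (0..1)
    cores:             number of identical cores
    per_core_speed:    single-thread speed relative to the baseline core
    """
    serial_fraction = 1.0 - parallel_fraction
    time = (serial_fraction + parallel_fraction / cores) / per_core_speed
    return 1.0 / time

# Hypothetical comparison: 2 cores at 1.5x speed vs. 4 cores at 1.0x speed.
for p in (0.5, 0.8, 0.95):
    two_fast = relative_throughput(p, cores=2, per_core_speed=1.5)
    four_slow = relative_throughput(p, cores=4, per_core_speed=1.0)
    print(f"parallel={p:.2f}  2 fast={two_fast:.2f}  4 slow={four_slow:.2f}")
```

Under these assumed numbers the two designs break even only when 80% of the work is parallel. Below that the two faster cores win, which is the case we expect for most current smartphone apps.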
NVIDIA's saving grace is the fact that it did ramp up A9 clock speed very high in Tegra 3, and it has that handy companion core 4-PLUS-1 architecture to keep power consumption low throughout very light workloads. There's also the fact that while very few smartphone apps will peg four cores constantly, there are periods of time when you'll see more than two cores in use. Multitasking, although more likely to happen in significant amounts on a tablet, can also increase usage of the third and fourth cores on Tegra 3.
We'll start with Linpack, our heaviest floating point/cache/memory bandwidth test:
Single threaded floating point performance is obviously a strength of the MSM8960 and Krait. Qualcomm tells us that Krait is able to multi-issue floating point instructions, something that the Cortex A9 cannot do. The MSM8960 memory controller also appears to be more efficient than previous designs, contributing to the magnitude of the win here.
Move to more threads and the situation doesn't change dramatically, although Tegra 3 is obviously far more competitive thanks to its sheer core count:
Browsermark tells a different story. Here the Tegra 3 based Transformer Prime is actually able to be slightly faster than the MSM8960. The margin of victory is small enough to be a wash, but the fact that NVIDIA is able to remain competitive is important.
Basemark OS echoes more of what we'd expect. In the overall score the MSM8960 is around 50% faster than the Tegra 3 based tablet. Even if the MSM8960 MDP is unrealistically fast for a Krait platform, it's likely that we'll still see a Krait advantage.
Basemark OS - System
| Test | HTC Rezound | Galaxy Nexus | ASUS Transformer Prime | MDP MSM8960 |
|---|---|---|---|---|
| System Overall Score | 658 | 538 | 602 | 907 |
| Simple Java 1 | 298 loops/s | 210 loops/s | 240 loops/s | 375 loops/s |
| Simple Java 2 | 7.28 loops/s | 8.61 loops/s | 7.27 loops/s | 10.8 loops/s |
| SMP Test | 35.3 loops/s | 49.2 loops/s | 81.2 loops/s | 64.4 loops/s |
| 100K File (eMMC->SD) | 6.49 MB/s | 9.52 MB/s | 11.0 MB/s | 8.64 MB/s |
| 100K File (SD->eMMC) | 33.0 MB/s | 17.8 MB/s | 14.5 MB/s | 39.8 MB/s |
| 100K File (eMMC->eMMC) | 37.8 MB/s | 34.5 MB/s | 29.7 MB/s | 48.9 MB/s |
| 100K File (SD->SD) | 8.47 MB/s | 8.30 MB/s | 8.06 MB/s | 12.7 MB/s |
| Database Operation | 10.0 ops/s | 5.73 ops/s | 4.56 ops/s | 19.4 ops/s |
| Zip Compression | 0.509 s | 0.848 s | 0.637 s | 0.561 s |
| Zip Decompression | 0.097 s | 0.206 s | 0.089 s | 0.073 s |
Most of the Basemark tests are lightly threaded, but looking at the SMP test gives you another example of Tegra 3's strengths given the right workload. With the right application, Tegra 3 can be faster than the MSM8960, however it's still our opinion that you're more likely to find a lightly threaded workload on a smartphone than you are going to encounter something that scales well to four cores.
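To make the relative standings explicit, the short sketch below divides the MSM8960 MDP result by the Transformer Prime result for three rows of the Basemark OS table. The numbers are copied from the table; the dictionary and function names are ours and exist only for this illustration.

```python
# (Tegra 3 Transformer Prime, MSM8960 MDP) pairs copied from the Basemark OS table.
# All three rows are "higher is better".
RESULTS = {
    "System Overall Score": (602, 907),
    "SMP Test (loops/s)": (81.2, 64.4),
    "Database Operation (ops/s)": (4.56, 19.4),
}

def msm8960_vs_tegra3(row):
    """Return the MSM8960 result divided by the Tegra 3 result for one row."""
    tegra3, msm8960 = RESULTS[row]
    return msm8960 / tegra3

for row in RESULTS:
    print(f"{row}: {msm8960_vs_tegra3(row):.2f}x")
```

The overall score works out to roughly 1.51x in the MSM8960's favor, which is where the "around 50% faster" figure comes from. The SMP row comes out near 0.79x, meaning Tegra 3 is about 26% faster in the one test that scales well across its four cores.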
Prior to today there was a bug in GLBenchmark that prevented it from running on some Android 4.0.3 devices. Our Eee Pad Transformer Prime was one of those devices and thus we couldn't produce updated Tegra 3 scores using GLBenchmark. Thankfully GLBenchmark 2.1.1 finally made it through testing/validation and includes a slightly different workload, with a number of bug fixes. Android 4.0.3 now works properly and we were able to continue our MSM8960 vs. Tegra 3 comparison. Note that the iOS build of GLBenchmark 2.1.1 is not yet available so we can't provide any iPad 2 comparisons yet.
Tegra 3's GPU performance is much improved compared to Tegra 2, and in the Egypt benchmark we see a tangible advantage over MSM8960. As we mentioned yesterday, only the first Krait SoC will use Adreno 225 - future versions will ship with Adreno 3xx, offering even better GPU performance. As the initial showdown is likely going to be Tegra 3 vs. MSM8960, this is a valid comparison.
Qualcomm and NVIDIA swap places once again when we look at the older GLBenchmark Pro test, although both perform well thanks to the lighter nature of this test.
Basemark ES 2.0 is completely dominated by Adreno however:
I'm still not totally sure why Basemark favors Adreno architectures so much but the results are what they are.
We've also been playing with Electopia, another Qualcomm-friendly test:
We do bump into Vsync limits with both the Tegra 3 and Qualcomm MDP at 800 x 480. Unfortunately Electopia doesn't allow for custom display resolutions, the only options are WVGA or native. The MDP has a native resolution of 1024 x 600 compared to the TF Prime's 1280 x 800 making a comparison at native resolutions unfair. That being said, according to Qualcomm the MSM8960 should be able to deliver around 40 fps at 1280 x 720 compared to the 24.6 fps we measured on the Transformer Prime at 1280 x 800.
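Because the two Electopia figures are quoted at different resolutions, one crude way to put them on the same footing is to compare pixels rendered per second. The sketch below does only that. It assumes frame rate scales inversely with pixel count, which ignores geometry and CPU-bound work, and the 40 fps input is Qualcomm's own estimate rather than something we measured. The function name is ours.

```python
def pixels_per_second(width, height, fps):
    """Pixels drawn per second at a given resolution and frame rate."""
    return width * height * fps

# Qualcomm's stated estimate for MSM8960 at 1280 x 720 (not measured by us).
msm8960_claimed = pixels_per_second(1280, 720, 40.0)
# Our measured Transformer Prime result at its native 1280 x 800.
tegra3_measured = pixels_per_second(1280, 800, 24.6)

ratio = msm8960_claimed / tegra3_measured
print(f"MSM8960 (claimed): {msm8960_claimed / 1e6:.1f} Mpixels/s")
print(f"Tegra 3 (measured): {tegra3_measured / 1e6:.1f} Mpixels/s")
print(f"ratio: {ratio:.2f}x")
```

By this rough measure the claimed MSM8960 figure is a bit under 1.5x the Tegra 3 result, somewhat less than the raw 40 vs. 24.6 fps gap suggests, since the Transformer Prime is pushing about 11% more pixels per frame.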
Although Electopia is a game, it's still tough to tell how killer 3D titles on Android will end up performing. Oh the things I would do for an Unreal Engine 3 benchmark on Android...
As I mentioned at the start of this comparison, we're trying to compare two SoCs in two platforms that may offer wildly different experiences than shipping devices based on these SoCs. The hope (on both sides) is that we'll see similar, but likely slightly lower performance in phones. The reality will have to wait until we have final hardware in hand.
Qualcomm's strengths are clearly single/lightly threaded CPU performance as Krait is able to offer some significant steps forward in that department. Tegra 3 can hold onto an advantage in heavily threaded apps, but I'm not entirely convinced that in phones we'll see a lot of that.
The bigger question is about power efficiency, and this is the one not as easily answered based on what we know today. Qualcomm gains a lot by being on a 28nm LP process, however it also has more power hungry cores on that process. Device level power efficiency for a given workload may truly improve as a result of having faster cores on a lower power process (race to sleep, lower power idle). Generally speaking however, single threaded performance often comes at the expense of core level power efficiency. That's the reason it's taken this long for a 3-wide out-of-order core to make it into a smartphone. Will Moore's Law, and the 28nm LP process in particular, be enough to offset the power consumption of a higher performance Krait core under full load? Depending on how conservative device makers choose to build their power profiles we may get varying answers to this question.
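The race-to-sleep trade-off is easy to see with a toy energy model: a task's total energy is active power times active time, plus idle power for the rest of the window. Every number below is hypothetical and picked purely for illustration. We have no power measurements for either SoC, and `energy_joules` is a name invented for this sketch.

```python
def energy_joules(active_power_w, active_time_s, idle_power_w, window_s):
    """Energy used over a fixed window: busy for active_time_s, idle afterwards."""
    idle_time_s = window_s - active_time_s
    if idle_time_s < 0:
        raise ValueError("task does not finish within the window")
    return active_power_w * active_time_s + idle_power_w * idle_time_s

WINDOW_S = 10.0   # hypothetical observation window
IDLE_W = 0.05     # hypothetical idle power, same for both designs

# Hypothetical slower core: 1.0 W for 4 s.
slow = energy_joules(1.0, 4.0, IDLE_W, WINDOW_S)
# Hypothetical faster core that finishes in half the time at 1.5 W.
fast_efficient = energy_joules(1.5, 2.0, IDLE_W, WINDOW_S)
# Same speedup, but at 2.5 W while active.
fast_hungry = energy_joules(2.5, 2.0, IDLE_W, WINDOW_S)

print(f"slow: {slow:.2f} J  fast@1.5W: {fast_efficient:.2f} J  fast@2.5W: {fast_hungry:.2f} J")
```

With these made-up inputs the faster core saves energy only as long as its active power stays below twice that of the slower core. That is exactly why the answer depends on how much the 28nm LP process claws back, and on how device makers tune their power profiles.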
Tegra 3 on the other hand should be a known quantity from a power consumption standpoint. All of the A9s in Tegra 3 are power gated (unlike in Tegra 2) and there's the fifth core for light workloads. For typical usage models I would expect better battery life out of Tegra 3 phones compared to Tegra 2 counterparts since the extra cores will likely be power gated, and idle power consumption should be lower. It's only for the heavier workloads where all cores are engaged that the impact of Tegra 3 remains to be seen.
There's also the LTE component. Today we're focused on the SoC comparisons, however the first MSM8960 devices will also benefit from having an integrated 28nm LTE baseband. Qualcomm will have discrete 28nm LTE baseband solutions as well (e.g. MDM9615) for device makers who choose not to use Qualcomm application processors.
We'll obviously figure all of this out in due time, but my final concern remains with the device vendors. Far too often we review great platforms that are burdened with horrible software sold under the guise of differentiation. We're finally on the cusp of getting some really powerful smartphone hardware, and I do hope the device vendors do these SoCs justice.