
86 Comments


  • hans_ober - Tuesday, March 14, 2017 - link

    A10 deepdive? Reply
  • Araa - Tuesday, March 14, 2017 - link

    That's the smell of a dead elephant in the room. Reply
  • MrSpadge - Tuesday, March 14, 2017 - link

    Wrong thread, this one is about Kirin 960. Reply
  • niva - Tuesday, March 14, 2017 - link

    Well, I for one am glad to see this deepdive into the performance. It gives a much more complete picture of what's happening. People on Android Central were giving this chip such glowing reviews, and I really wasn't sold on it yet. That being said, I'm fairly confident AC is sponsored by Huawei, because any phone they push out gets glowing reviews despite its Chinese hackware. Only Huawei phone worth buying is the Nexus 6P and still remains so. This won't change even if/when they actually do make better hardware. Reply
  • close - Tuesday, March 14, 2017 - link

    "Only Huawei phone worth buying is the Nexus 6P [...]. This won't change even if/when they actually do make better hardware."
    So you "know" that even if they make better hardware only the 6P will be worth buying? Crystal ball much? Are you also sponsored by somebody or did you just choose ignorance? And I'm being delicate here.
    Reply
  • BertrandsBox - Tuesday, March 14, 2017 - link

    "Only Huawei phone worth buying is the Nexus 6P and still remains so. This won't change even if/when they actually do make better hardware."

    If Android Central gets accused of hiding sponsorship from Huawei, do silly comments like these make you someone who's being paid by Huawei's competitors?
    Reply
  • Alexvrb - Tuesday, March 14, 2017 - link

    A lot of reviewers take free bread. Especially ones that aren't making enough off ads alone, or that have a personal slant, etc. I'm not saying that includes AC, but you shouldn't dismiss it so casually either. The most effective place to apply grease is reviewers and review sites. The Kirin 960 is obviously a step sideways, so to give it glowing reviews is hilarious.

    However... even though the process and design they chose may be hindering power consumption at these clocks, they did have another goal in mind. Cost. If they really achieved such massive increases in density, and have a similar % yield, they can sell bucketloads of these chips for cheap. So at least for the mid-range devices, these would be plenty good for the foreseeable future.
    Reply
  • close - Wednesday, March 15, 2017 - link

    A lot of internet commentators are paid to praise one site or company and accuse competing ones. This is an undisputed fact, so you shouldn't dismiss this casually.

    The point wasn't whether AC is or isn't on Huawei's payroll, but rather the attitude of the person commenting, who basically disqualified themselves by saying that "even with better products they will never be worth buying". That sounds like the user doesn't actually care about the product or the facts, and is only here to criticize Huawei and anything that might be related to them.

    And no matter what he says in the future, none of it can be taken seriously or assumed to contain any trace of objective value ;).
    Reply
  • fanofanand - Wednesday, March 15, 2017 - link

    Where does one go to cash in on the whole "paid internet commenting" bit? I'd like some of that "internet money" please :) Reply
  • Meteor2 - Wednesday, March 15, 2017 - link

    Yeah me too Reply
  • nikhilmaurya10 - Friday, July 21, 2017 - link

    Believe me, I bought the Honor 8 Pro with Kirin 960 at 466 USD (India: Rs 30k), so they have achieved their goal of making a flagship-level chip for below flagship price. After reading this review and finding that 8W power draw from that GPU, I am worried about VR content on this 2K display. One good thing is that this phone is a huge metallic slab; that should keep it somewhat cool. Reply
  • socalbigmike - Thursday, March 16, 2017 - link

    They ARE sponsored by Huawei. Reply
  • Meteor2 - Wednesday, March 15, 2017 - link

    This. This is a great article but what's missing is the A10 (and Core M and Atom for comparison).

    I'm less interested in the deeper technical stuff, tbh. But I'm very interested in the performance, power consumption, and resulting efficiency. So I'd love to see this test battery for the A10 and Core too.

    Mind you, why don't you do SPi2000 and GB4 against power consumption, rather than only PCMark?
    Reply
  • jjj - Tuesday, March 14, 2017 - link

    ARM was comparing A73 on 10nm vs A72 on 16nm in efficiency, not peak power for both on the same process.
    Likely the memory controller and the interconnect have an impact too in increasing the differences between the 950 and 960.
    Reply
  • Matt Humrick - Tuesday, March 14, 2017 - link

    ARM's power comparison was for the same process and same frequency. Reply
  • jjj - Tuesday, March 14, 2017 - link

    On a per-task basis, not peak load. http://images.anandtech.com/doci/10347/11.PNG

    In this slide it's 10nm vs 16nm http://images.anandtech.com/doci/10347/1_575px.PNG
    Reply
  • degasus - Tuesday, March 14, 2017 - link

    > I cannot think of any CPU-centric workloads for a phone that would load two big cores for anywhere near this long

    You haven't run an emulator, have you? With slightly improved GPU drivers (specifically, EXT_buffer_storage support), this will be a very good device for playing GameCube and Wii games. That will stress two threads, and the GPU a bit.
    Reply
  • tuxRoller - Friday, March 17, 2017 - link

    You're joking.
    Dolphin doesn't run on my Pixel C at anything resembling a useful frame rate (even when it actually works).
    Reply
  • MajGenRelativity - Tuesday, March 14, 2017 - link

    I'm not read up on all the lingo, but what does 16FFC stand for, and how does it differ from 16FF+? Reply
  • Ian Cutress - Tuesday, March 14, 2017 - link

    Page 4:

    The Kirin 950 uses TSMC’s 16FF+ FinFET process, but HiSilicon switches to TSMC’s 16FFC FinFET process for the Kirin 960. The newer 16FFC process reduces manufacturing costs and die area to make it competitive in mid- to low-end markets, giving SoC vendors a migration path from 28nm. It also claims to reduce leakage and dynamic power by being able to run below 0.6V, making it suitable for wearable devices and IoT applications. Devices targeting price-sensitive markets, along with ultra low-power wearable devices, tend to run at lower frequencies, however, not 2.36GHz like Kirin 960. It’s possible that pushing the less performance-oriented 16FFC process, which targets lower voltages/frequencies, to higher frequencies that lay beyond its peak efficiency point may partially explain the higher power consumption relative to 16FF+.
    Reply
  • MajGenRelativity - Tuesday, March 14, 2017 - link

    I'm a dunce sometimes. I totally missed that. Thank you Ian! Reply
  • fanofanand - Tuesday, March 14, 2017 - link

    I love that you have begun moderating (to a degree) the comments section! It's nice to have someone with so much knowledge there to dispel the FUD! Not saying his question was bad, but I really do like that you are getting in the mud with us plebs :) Reply
  • MajGenRelativity - Tuesday, March 14, 2017 - link

    My question wasn't bad, just stupid :P Should have read that page a little more closely. Reply
  • fanofanand - Tuesday, March 14, 2017 - link

    I didn't mean to imply your question was bad at all, and I certainly wasn't lumping you in with those spreading FUD. Ian has become a growing presence in the comments section, and I for one like what he's doing. The comments section in nearly every tech article has become ugly, and having a calming, logical, rational presence like Ian helps contribute to a more polite atmosphere where disagreement can be had without presuming that the person with an opposing viewpoint is Hitler. Reply
  • MajGenRelativity - Tuesday, March 14, 2017 - link

    I thought this was the Internet, where the opposing viewpoint is always Hitler? :P Reply
  • fanofanand - Tuesday, March 14, 2017 - link

    Hitler has become omnipresent; now the barista who underfoams your latte must be Hitler! Reply
  • lilmoe - Tuesday, March 14, 2017 - link

    Shouldn't this provide you with even more evidence that max frequency workloads are super artificial, and are completely unrepresentative of normal, day-to-day workloads? This further supports my claim in earlier article comments that chip designers are targeting a certain performance target, and optimizing efficiency for that point in particular.

    I keep saying this over and over (like a broken record at this point), but I do firmly believe that the entire blogosphere's benchmarking methodology for mobile parts is seriously misleading. You're testing these processors the same way you would normally test workstation processors. The author even said it himself, but the article contradicts his very statement. I believe further research/investigation should be done as to where that performance target is. It definitely differs from year to year, with different popular app trends, and from one OS upgrade to another.

    SPEC, Geekbench, and browser benchmarks, if run on the same device across the same OS versions, are a good indication of what the chip can artificially achieve. But the real test, I believe, is launching a website, using Facebook, Snapchat, etc., and comparing the power draw of various chips, since that's what these chips were designed to run.

    There's also the elephant in the room that NO ONE is accounting for when testing and benchmarking, and that's touch input overhead. Most user interaction is through touch. I don't know about iOS, but everyone knows that Android ramps up the clock when the touchscreen detects input, to reduce lag and latency. Your browser battery tests DO NOT account for that, further reducing their potential credibility as a valid representation of actual usage.

    I mention touch input clock ramps in particular because I believe this is the clock speed that OEMs believe delivers optimal efficiency on the performance curve for a given SoC, at least for the smaller cluster. A better test would be logging the CPU clocks of certain workloads, taking the average, and then calculating the power draw of the CPU at that particular average clock.

    This is where I believe Samsung's SoCs shine the most. I believe they deliver the best efficiency for common workloads, evident in the battery life of their devices after normalization of screen size/resolution to battery capacity.

    Worth investigating IMO.
    Reply
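lilmoe's clock-logging idea can be made concrete. A minimal Python sketch, assuming the standard Linux cpufreq sysfs layout (real node paths, permissions, and governor behavior vary by device and OEM kernel, so treat this as illustrative only): poll a core's current frequency while a workload runs, then average the samples.

```python
import time

def read_cur_freq_khz(cpu, root="/sys/devices/system/cpu"):
    """Read one core's current frequency (kHz) from the standard
    Linux cpufreq sysfs node. Paths and permissions vary by device."""
    with open(f"{root}/cpu{cpu}/cpufreq/scaling_cur_freq") as f:
        return int(f.read().strip())

def sample_average_mhz(samples_khz):
    """Average a list of frequency samples (kHz) into MHz."""
    return sum(samples_khz) / len(samples_khz) / 1000.0

def log_average_clock(cpu=0, polls=100, interval_s=0.1):
    """Poll one core's clock during a workload; return the average in MHz."""
    samples = []
    for _ in range(polls):
        samples.append(read_cur_freq_khz(cpu))
        time.sleep(interval_s)
    return sample_average_mhz(samples)

if __name__ == "__main__":
    # Canned samples (kHz, illustrative): a burst at the Kirin 960's
    # 2362 MHz big-core peak, then settling to a lower state.
    print(f"average clock: {sample_average_mhz([2362000, 2362000, 533000, 533000]):.1f} MHz")
```

The per-app average clock this produces is the number lilmoe proposes mapping back onto a measured power curve; isolating the CPU from screen/radio draw is the hard part the sketch leaves out.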
  • fanofanand - Tuesday, March 14, 2017 - link

    If you can come up with a methodology where opening snapchat is a repeatable scientific test, send your hypothesis to Ryan, I'm sure he will indulge your fantasy. Reply
  • lilmoe - Tuesday, March 14, 2017 - link

    Yea, we all love fantasies. Thing is, in the last couple of paragraphs, Matt literally said that the entirety of the review does not match the actual real-world performance and battery life of the Mate 9.

    But sure, go ahead and keep testing mobile devices using these "scientific" conventions anyway, since it makes readers like fanofanand happy.
    Reply
  • close - Tuesday, March 14, 2017 - link

    That is, of course, an awesome goal. Now imagine that in the next review the battery life varies between 10 and 18 hours even on the same phone. Now judge for yourself whether this kind of result is more useful for determining which phone has better battery life. Not only is your real-world usage vastly different from mine (thus irrelevant), but you yourself can't even get through 2 days with identical battery life or identical usage. If you can't determine one phone's battery life properly, how do you plan on comparing that figure to the ones I come up with?

    If you judged your comment by the same standards you judge the article you wouldn't have posted it. You implicitly admit there's no good way of testing in the manner you suggest (by refusing or being unable to provide a clearly better methodology) but still insisted on posting it. I will join the poster above in asking you to suggest something better. And don't skimp on the details. I'm sure that if you have a reasonable proposal it will be taken into consideration not for your benefit but for all of ours.

    Some of these benchmarks try to simulate a sort of average real world usage (a little bit of everything) in a reproducible manner in order to be used in a comparison. That won't be 100% relevant but there is a good overlap and it's the best comparative tool we've got. Your generic suggestion would most likely provide even less relevant figures unless you come up with that better scenario that you insist on keeping to yourself.
    Reply
  • lilmoe - Tuesday, March 14, 2017 - link

    I read things thoroughly before criticizing. You should do the same before jumping in to support an idiotic comment like fanofanand's. He's more interested in insulting people than finding the truth.

    These tests are the ones which aren't working. No one gets nearly as much battery life as reviewers report, nor are the performance gains anywhere near what benchmarks like Geekbench suggest. If something isn't working, one should really look for other means. That's how progress works.

    You can't test a phone the same way you test a workstation. You just can't. NO ONE leaves their phone lying on a desk for hours waiting on it to finish compiling 500K lines of code, or rendering a one-hour 3D project or a 4K video file for their channel on Youtube. But they do spend a lot of time watching video on Youtube, browsing the web with 30 second pauses between each scroll, and uploading photos/videos to social media after applying filters. Where are these tests??? You know, the ones that actually MATTER for most people? You know, the ones that ST performance matters less for, etc, etc...

    Anyway, I did suggest what I believe is a better, more realistic method of testing. Hint: it's in the fifth paragraph of my original reply. But who cares, right? We just want to know "which is the fastest", using whichever method confirms our biases, regardless of how such performance is achieved. Who cares about the truth.

    People are stubborn. I get that. I'm stubborn too. But there's a limit at how stubborn people can be, and they need to be called out for it.
    Reply
  • Meteor2 - Wednesday, March 15, 2017 - link

    I'm with fanof and close on this one. Here we have a consistent battery of repeatable tests. They're not perfectly 'real-world', but they're not far off either; there are only so many things a CPU can do.

    I like this test suite (though I'd like to see GB/clock and SPi and GB/power calculated and graphed too). If you can propose a better one, do so.
    Reply
  • close - Wednesday, March 15, 2017 - link

    This isn't about supporting someone's comment, I was very clear which part I agree with: the one where you help come up with a practical implementation of your suggestion.

    Phones can and should be tested like normal desktops, since the vast majority of desktops spend most of their time idling, just like phones. The next thing is running Office-like applications, normal browsing, and media consumption.

    You're saying that "NO ONE leaves their phone lying on a desk for hours waiting on it to finish compiling 500K lines of code". But how many people would find even that relevant? How many people compile 500K lines of code regularly? Or render hours of 4K video? And I'm talking about percentage of the total.

    Actually, the ideal case for testing any device is multiple scenarios covering more user types: from light browsing and a handful of phone calls to heavy gaming or media consumption. These all produce vastly different results, as a SoC/phone might be optimized for sporadic light use or heavier use, for example. So a phone that has the best battery life and efficiency while gaming won't do so while browsing. So, just like benchmarks, any result would only be valid for people whose daily routine closely follows the test scenario.

    But the point wasn't whether an actual "real world" type scenario is better, rather how exactly do you apply that real world testing into a sequence of steps that can be reproduced for every phone consistently? How do you make sure that all phones are tested "equally" with that scenario and that none has an unfair (dis)advantage from the testing methodology? Like Snapchat or FB being busier one day and burning through the battery faster.

    Just like the other guy was more interested in insults (according to you), you seem more interested in cheap sarcasm than in actually providing an answer. I asked for a clear methodology. You basically said that "it would be great if we had world peace and end hunger". Great for a beauty pageant, not so great when you were asked for a testing methodology. A one liner is not enough for this. A methodology is you describing exactly how you proceed with testing the phones, step by step, while guaranteeing reproducibility and fairness. Also please explain how opening a browser, FB, or Snapchat is relevant for people who play games 2 hours per day, watch movies or actually use the phone as a phone and talk to other people.

    You're making this more difficult than it should be. You look like you had plenty of time to think about this. I had half a day and already I came up with a better proposal than yours (multiple scenarios vs. single scenario). And of course, I will also leave out the exact methodology part, because this is a comment competition, not an actual search for solutions.
    Reply
  • lilmoe - Wednesday, March 15, 2017 - link

    I like people who actually spend some time to reply. But, again, I'd appreciate it more if you read my comments more carefully. I told you that the answer you seek is in my first reply, in the fifth paragraph. If you believe I have "plenty of time" just for "cheap sarcasm", then sure we can end it here. If you don't, then go on reading.

    I actually like this website. That's why I go out of my way to provide constructive criticism. If I were simply here to troll, my comments wouldn't be nearly as long.

    SoCs don't live in a vacuum; they come bundled with other hardware and software (screen, radios, OS/kernel), optimized to work on the device being reviewed. In the smartphone world, you can't come to a concrete conclusion on the absolute efficiency of a certain SoC based on one device, because many devices with the same SoC can be configured to run that SoC differently. This isn't like benchmarking a Windows PC, where the kernel and governor are fixed across hardware, and screens are interchangeable.

    Authors keep acknowledging this fact, yet do very little to go about testing these devices using other means. It's making it hard for everyone to understand the actual performance of said devices, or the real bang for the buck they provide. I think we can agree on that.

    "You're making this more difficult than it should be"
    No, really, I'm not. You are. When someone is suggesting something a bit different, but everyone is slamming them for the sake of "convention" and "familiarity", then how are we supposed to make progress?

    I'm NOT saying that one should throw benchmarks out. But I do believe that benchmarks should stay in meaningful context. They give you a rough idea about the snappiness of a ultra-mobile device, since it's been proven time after time that the absolute performance of these processors is ONLY needed for VERY short bursts, unlike workstations. However, they DO NOT give you anywhere near a valid representation of average power draw and device battery life, and neither do scripts written to run synthetic/artificial workloads. Period.

    This is my point. I believe the best way to measure a specific configuration is by first identifying the performance point a particular OEM is targeting, and then measuring the power draw at that target. This comes down to the average clocks the CPU/GPU runs at in various workloads, from gaming, browsing, and video playback to social media. It doesn't matter how "busy" these content providers are at specific times; the average clocks will be the same regardless, because the workload IS the same.

    I have reason to believe that OEMs are optimizing their kernels/governors for each app individually. Just like they did with benchmarks several years ago, where they ramped clocks up when they detected a benchmark running. Except they're doing it the right way now, optimizing specific apps to run differently on the device to provide the user with the best experience.

    When you've figured out the average the OEM is targeting for various workloads, you'd know how much power it's drawing, and how much battery life to expect, AFTER you've isolated other factors such as the screen and radios. It also makes for a really nice read, as a bonus (hence, "worth investigating").

    This review leaves an important question unanswered about this SoC's design (I'm really interested to know the answer): did HiSilicon cheap out on the fab process to make more money and leech off the success of its predecessor? Or did they do it with good intentions, to optimize their SoC further for modern, real-world workloads that currently used benchmarks are not detecting? I simply provided a suggestion to answer that question. Does that warrant the language in his reply, or yours? Hence my sarcasm.
    Reply
  • fanofanand - Tuesday, March 14, 2017 - link

    It's exciting to see the envelope being pushed, and though these are some interesting results, I like that they are pushing forward and not with a decacore. The G71 looks like a botched implementation if it's guzzling power that heavily; I wonder if some firmware/software could fix that? The A73 still looks awesome, and I can't wait to see a better implementation! Reply
  • psychobriggsy - Tuesday, March 14, 2017 - link

    TBH the issue with the GPU appears to be down to the clock speed it is configured with.

    It's clear that this is set for benchmarking purposes, and it's good that this has been caught.

    Once the GPU settles down into a more optimal 533MHz configuration, power consumption goes down significantly. Sadly, it looks like there are four clock settings for the GPU, and they've wasted three of them on stupidly high clocks. A better setup would be 800MHz, 666MHz, 533MHz, and a power-saving 400MHz that most Android games would still find overkill.
    Reply
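The power savings psychobriggsy expects from a 533MHz state follow from the first-order CMOS dynamic power model, P ≈ C·f·V². A toy estimate: the ~8W GPU figure and the 533MHz target come from this thread, and 1037MHz is the Kirin 960 G71's peak clock from the article, but the voltage values below are invented purely for illustration, not Kirin 960 data.

```python
def dynamic_power_scale(p_old_w, f_old_mhz, f_new_mhz, v_old, v_new):
    """First-order CMOS dynamic power: P ~ C * f * V^2, so
    P_new = P_old * (f_new / f_old) * (v_new / v_old)^2.
    Ignores leakage, which matters at these geometries."""
    return p_old_w * (f_new_mhz / f_old_mhz) * (v_new / v_old) ** 2

# ~8 W at the 1037 MHz peak; 0.90 V and 0.75 V are made-up rail voltages.
p_533 = dynamic_power_scale(8.0, 1037, 533, 0.90, 0.75)
print(f"estimated GPU power at 533 MHz: {p_533:.1f} W")
```

Under these made-up voltages the estimate lands near 2.9 W, consistent with the comment's expectation that the 533MHz state is far cheaper, though leakage and the real voltage/frequency table would change the exact numbers.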
  • Meteor2 - Wednesday, March 15, 2017 - link

    Performance/Watt is frankly rubbish whatever the clock speed. Clearly they ran out of time or money to implement Bifrost properly. Reply
  • fanofanand - Wednesday, March 15, 2017 - link

    That's what I'm thinking. I read the preview of Bifrost and thought, "wow, this thing is going to be a killer!" I was right on the money, except that it's a killer of batteries, not competing GPUs. Reply
  • Shadowmaster625 - Tuesday, March 14, 2017 - link

    What is HTML5 DOM doing that wrecks the Snapdragon 821 so badly? Reply
  • joms_us - Tuesday, March 14, 2017 - link

    Just some worthless test that the Monkey devs put in to show how awesome iPhones are. But if you do a real side-by-side website comparison between an iPhone and a phone with an SD821, the SD821 will wipe the floor. Reply
  • Meteor2 - Wednesday, March 15, 2017 - link

    Evidence? Reply
  • Shadowmaster625 - Tuesday, March 14, 2017 - link

    You should use an older iPhone for these graphs, not just to make the graphs look better, but also to help these Android phones appear to not get murdered so badly. Reply
  • fanofanand - Tuesday, March 14, 2017 - link

    I was really impressed with what I was seeing until that iPhone 7 reared its ugly head; it certainly puts the ecosystem in perspective, doesn't it? Reply
  • joms_us - Tuesday, March 14, 2017 - link

    Doesn't matter; real-world tests show iPhones are pathetic in comparison. They're just fast at loading games and playing them at low resolution, bwahaha. Reply
  • MrSpadge - Tuesday, March 14, 2017 - link

    Trolling troll is trolling. Reply
  • joms_us - Tuesday, March 14, 2017 - link

    Butt-hurt iCrap fan...

    https://www.youtube.com/watch?v=mcTAXsFHu5I
    Reply
  • MrSpadge - Wednesday, March 15, 2017 - link

    Never bought or recommended any Apple device yet. Reply
  • fanofanand - Wednesday, March 15, 2017 - link

    I've never even owned an Apple device in my life, but I am not delusional enough to think the Cyclone/Twister or whatever their current gen is named, doesn't mop the floor with all other mobile SOCs. Reply
  • MrSpadge - Wednesday, March 15, 2017 - link

    +1 Reply
  • CrazyElf - Tuesday, March 14, 2017 - link

    I still cannot get over how much more potent the iPhone's SOC is compared to the rest of the Android phones.

    Why won't an Android SOC vendor try to match that awesome IPC?
    Reply
  • BedfordTim - Tuesday, March 14, 2017 - link

    I suspect it comes down to cost and usage. The iPhone cores are roughly four times the size of an A73. Reply
  • name99 - Tuesday, March 14, 2017 - link

    True. But the iPhone cores are still small ENOUGH. The main CPU complex on an A10 (two big cores, two small cores, and L2) is maybe 15 mm^2.
    ARM STILL seems to be optimizing for core area, and then spending that same core area anyway on octacores and decacores. It makes no sense to me.

    Obviously part of it is that Apple must be throwing a huge number of engineers at the problem. But that's not enough; there has to be some truly incredible project management involved to keep all those different teams in sync, and I don't think anyone has a clue how they have done that.
    They certainly don't seem to be suffering from any sort of "mythical man-month" Fred Brooks problems so far...

    My personal suspicion is that, by luck or by hiring the best senior engineer in the world, they STARTED OFF at a place that is pretty much optimal for the trajectory they wanted.
    They designed a good 3-wide core, then (as far as anyone can tell) converted that to a 6-wide core by clustering and (this is IMPORTANT) not worrying about all the naysayers who said that a very wide core could not be clocked very high.

    Once they had the basic 6-wide core in place, they've had a superb platform on top of which different engineers can figure out improved sub-systems and just slot them in when ready. So we had the FP pipeline redesigned for lower latency, we had an extra NEON functional unit added, we've doubtless had constant improvements to branch prediction, I-fetching, pre-fetching, cache placement and replacement; and so on --- but these are all (more or less) "easy" to optimize given a good foundation on which to build.

    I suspect, also, that unlike some in the industry, they have been extremely open to new ideas from academia, so that there's an implementation turnaround time of maybe two years or so from encountering a good idea (say a new design for a cluster predictor) through simulating it to validate its value, to implementing it.
    I'm guessing that management (again unlike most companies) is willing to entertain a constant stream of ideas (from engineers, from reading the literature, from talking to academics) and to ACCEPT and NOT COMPLAIN about the cost of writing the simulations, in the full understanding that only 5 or 10% of simulated ideas are worth emulating. My guess is that they've managed to increase frequency rapidly (in spite of the 6-wide width) by implementing a constant stream of the various ideas that have been published (and generally mocked or ignored by the industry) for ways to scale things like load-store queues, issue, and rename --- the standard frequency/power pain-points in OoO design.

    Meanwhile ARM seems to suffer from terminal effort-wasting. Apple has a great design, which they have been improving every year. ARM's response, meanwhile, has been to hop like a jack rabbit from A57 to A72 to A73, with no obvious conceptual progression. If each design spends time revising basics like the decoder and the optimal pipeline width, there's little time left to perform the huge number of experiments that I think Apple perform to keep honing the branch predictors, the instruction fusion, the pre-fetchers, and so on.

    It reminds me of a piece of under-appreciated software, namely Mathematica, which started off with a ridiculously good foundation and horrible performance. But because the foundation was so good, every release had to waste very little time re-inventing the wheel, it could just keep adding and adding, until the result is just unbelievable.
    Reply
  • Meteor2 - Wednesday, March 15, 2017 - link

    Didn't Jim Keller have something to do with their current architecture?

    And yes, Apple seems to have excellent project management. Really, they have every stage of every process nailed. They're not the biggest company in the world by accident.
    Reply
  • Meteor2 - Wednesday, March 15, 2017 - link

    Also don't forget that (like Intel) ARM has multiple design teams. A72 and A73 are from separate teams; from that perspective, ARM's design progression does make sense. The original A73 'deepdive' by Andrei explained it very well. Reply
  • name99 - Wednesday, March 15, 2017 - link

    This is a facet of what I said about project management.
    The issue is not WHY there are separate CPU design teams --- no-one outside the companies cares about the political compromises that landed up at that point.
    The issue is --- are separate design teams and restarting each design from scratch a good fit to the modern CPU world?

    It seems to me that the answer has been empirically shown to be no, and that every company that follows this policy (which seems to include IBM; I don't know about QC or the GPU design teams) really ought to rethink. We don't recreate compilers, or browsers, or OSes every few years from scratch, but we seem to have taken it for granted that doing so for CPUs made sense.

    I'm not sure this hypothesis explains everything --- no-one outside Apple (and few inside) have the knowledge necessary to answer the question. But I do wonder if the biggest part of Apple's success came from their being a SW company, and thus looking at CPU design as a question of CONSTANTLY IMPROVING a good base, rather than as a question of re-inventing the wheel every few years the way the competition has always done things.
    Reply
  • Meteor2 - Wednesday, March 15, 2017 - link

    Part of having separate teams is to engender competition; another is to hedge bets and allow risk-taking. Core replacing Netburst is the standard example, I suppose. I'm sure there are others but they aren't coming to mind at the moment... Does replacing Windows CE with Windows 10 count? Reply
  • Meteor2 - Wednesday, March 15, 2017 - link

    Methinks it's more to do with Safari having some serious optimisations for browser benchmarks baked in deep.

    I'd like to see the A10 subjected to GB4 and SpecInt.
    Reply
  • name99 - Wednesday, March 15, 2017 - link

    The A10 GeekBench numbers are hardly secret. Believe me, they won't make you happy.
    SPEC numbers, yeah, we're still waiting on those...
    Reply
  • name99 - Wednesday, March 15, 2017 - link

    Here's an example:
    https://browser.primatelabs.com/v4/cpu/959859
    Summary:

    Single-Core Score 3515
    Crypto Score 2425
    Integer Score 3876
    Floating Point Score 3365
    Memory Score 3199

    The even briefer summary is that basically every sub-benchmark has the A10 at 1.5x to 2x the Kirin 960 score. FP is even more brutal, with some scores at 3x, and SGEMM at ~4.5x.

    (And that's the A10... The A10X will likely be out within a month, likely fabbed on TSMC 10nm, likely an additional ~50% faster...)
    Reply
  • Meteor2 - Wednesday, March 15, 2017 - link

    Thanks. Would love to see those numbers in Anandtech charts, and normalised for power. Reply
  • lilmoe - Friday, March 17, 2017 - link

    We've been asking for this for years now... Reply
  • wardrive2017 - Tuesday, March 14, 2017 - link

    I'm quite surprised by the Snapdragon 650 here. It has two 1.8 GHz A72s and a 180 GFLOPS GPU on 28nm, and manages to stay a little north of 2 watts on the GPU power consumption charts. Upgrade to the Snapdragon 652 with four A72s and I bet that's a high mid-ranger with little to no throttling, and sustained game performance much closer to the high end than the specs would suggest. Reply
  • wardrive2017 - Tuesday, March 14, 2017 - link

    I wonder about the sustained performance of the Snapdragon 625 as well. I understand it has half the single-thread performance (or less) of these flagship SoCs, but it's got a Vulkan/ES 3.2, 130 GFLOPS GPU and 8 homogeneous A53s on 14nm. You know that won't throttle, and it's going to be in many devices this year. I wonder what sustained performance it would have after an hour of Modern Combat 5 or something similar compared to these flagships. Reply
  • wardrive2017 - Tuesday, March 14, 2017 - link

    Nvm, I guess a game that the 625 could run wouldn't exactly be pushing these flagships into a higher thermal envelope anyway. Really need an edit button. Reply
  • whitecliff90 - Wednesday, March 15, 2017 - link

    28nm is quite mature now, and with the Adreno GPU you can expect it to give better perf/W. Reply
  • zeeBomb - Tuesday, March 14, 2017 - link

    When you were expecting a deep dive but not this kind of chipset... lecry

    I'd rather have an A11 or the newer Exynos 8895 (or whatever it's called) dive though!
    Reply
  • zodiacfml - Tuesday, March 14, 2017 - link

    Thanks. It looks to me like the high power consumption was intentional, so it could keep up in the benchmarks. Otherwise it would have great battery life but lower scores. Reply
  • joms_us - Tuesday, March 14, 2017 - link

    SD821 will continue to shine, and this article shows it can still compete even with the SD835, which will have a similar cluster config to the Kirin 960. Reply
  • Eden-K121D - Tuesday, March 14, 2017 - link

    The Android SoC ecosystem is not in very good shape, with few bright spots Reply
  • MrSpadge - Tuesday, March 14, 2017 - link

    Yeah.. with the bright spots being Qualcomm, Samsung and HiSilicon. Reply
  • Eden-K121D - Tuesday, March 14, 2017 - link

    Samsung only Reply
  • Meteor2 - Wednesday, March 15, 2017 - link

    I think the 820 acquitted itself well here. The 835 could be even better. Reply
  • name99 - Tuesday, March 14, 2017 - link

    "Despite the substantial microarchitectural differences between the A73 and A72, the A73’s integer IPC is only 11% higher than the A72’s."

    Well, sure, if you're judging by Intel standards...
    Apple has been able to sustain about a 15% increase in IPC from A7 through A8, A9, and A10, while also ramping up frequency aggressively, maintaining power, and reducing throttling. But sure, not a BAD showing by ARM; the real issue is whether they will keep delivering this sort of improvement at least annually.

    Of more technical interest:
    - the largest jump is in mcf. This is a strongly memory-bound benchmark, which suggests a substantially improved prefetcher. In particular simplistic prefetchers struggle with it, suggesting a move beyond just next-line and stride prefetchers (or at least the smarts to track where these are doing more harm than good and switch them off.) People agree?

    - twolf appears to have the hardest branches to predict of the set, with vpr coming up second. So it's POSSIBLE (?) that their relative shortcomings reflect changes in the branch/fetch engine that benefit most apps but hurt specifically weird branching patterns?

    One thing that ARM has not made clear is where instruction fusion occurs, and so how it impacts the two-decode limit. If, for example, fusion is handled (to some extent anyway) as a pre-decode operation when lines are pulled into L1I, and if fusion possibilities are being aggressively pursued [basically all the ideas that people have floated --- compare+branch, large immediate calculation, op+storage (?), short (+8) branch+op => predication like POWER8 (?)], there could be a SUBSTANTIAL fraction of fused instructions going through the system, so that the 2-wide decode is basically as good as the 3-wide of A72?
    Reply
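    [Editor's note] The "simplistic prefetchers" point above can be illustrated with a toy model. This is a hypothetical sketch, not ARM's actual design: a per-PC stride table that confirms a stride before issuing a prefetch. Regular strided loops train it quickly; the pointer-chasing access pattern of a benchmark like mcf never does, which is why smarter prefetchers help there.

    ```python
    class StridePrefetcher:
        """Toy per-PC stride prefetcher (illustrative only)."""

        def __init__(self):
            self.table = {}  # pc -> (last_addr, stride, confidence)

        def access(self, pc, addr):
            """Record a load at (pc, addr); return a prefetch address or None."""
            last, stride, conf = self.table.get(pc, (addr, 0, 0))
            new_stride = addr - last
            if new_stride == stride and stride != 0:
                conf = min(conf + 1, 3)   # stride repeated: gain confidence
            else:
                conf = 0                  # stride changed: start over
            self.table[pc] = (addr, new_stride, conf)
            # Prefetch only once the same nonzero stride has repeated twice
            return addr + new_stride if conf >= 2 else None
    ```

    A regular stream (100, 108, 116, ...) triggers a prefetch on the fourth access; an irregular pointer chase never builds confidence and issues nothing.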
  • fanofanand - Wednesday, March 15, 2017 - link

    Once WinArm (or whatever they want to call it) is released, we will FINALLY be able to compare apples to apples when it comes to these designs. Right now there are mountains of speculation, but few people actually know where things are at. We will see just how performant Apple's cores are once they can be accurately compared to Ryzen/Core designs. I have the feeling a lot of Apple worshippers are going to be sorely disappointed. Time will tell. Reply
  • name99 - Wednesday, March 15, 2017 - link

    We can compare Apple's ARM cores to the Intel cores in Apple laptops today, with both GeekBench and Safari. The best matchup I can find is this:
    https://browser.primatelabs.com/v4/cpu/compare/177...

    (I'd prefer to compare against the MacBook 12" 2016 edition with Skylake, but for some reason there seem to be no GB4 results for that.)

    This compares an iPhone (so ~5W max power?) against a Broadwell that turbos up to 3.1 GHz (GB tends to run everything at the max turbo speed because it allows the core to cool between the [short] tests), and with a TDP of 15W.

    Even so, the performance is comparable. When you normalize for frequency, you get that the A10 has about 20% better IPC than Broadwell, which probably drops to maybe 15% better IPC against Skylake.
    Of course that A10 runs at a lower (peak) frequency --- but also at much lower power.

    There's every reason to believe that the A10X will absolutely beat the equivalent Skylake chip in this class (not just m-class but also U-class), running at a frequency of ?between 3 and 3.5GHz? while retaining that 15-20% IPC advantage over Skylake, and at a power of ?<10W?
    Hopefully we'll see in a few weeks --- the new iPads should be released either end March or beginning April.

    Point is --- I don't see why we need to wait for a WinARM server --- especially since MS has made no commitment to selling WinARM to the public; all they've committed to is using ARM for Azure.
    Comparing GB4 or Safari on Apple devices gives us comparable compilers, comparable browsers, comparable OSs, comparable hardware design skill. I don't see what a Windows equivalent brings to the table that adds more value.
    Reply
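    [Editor's note] The frequency normalization described above is simple arithmetic. A minimal sketch: the A10 score is the one quoted earlier in the thread, while the A10 clock and the Broadwell score/clock are illustrative placeholders, not measurements.

    ```python
    # Crude IPC proxy: Geekbench 4 single-core score per GHz.
    def score_per_ghz(gb_score, ghz):
        return gb_score / ghz

    a10_ipc = score_per_ghz(3515, 2.34)   # A10 score quoted above; ~2.34 GHz assumed
    bdw_ipc = score_per_ghz(3900, 3.10)   # placeholder 15W Broadwell at 3.1 GHz turbo
    advantage = a10_ipc / bdw_ipc - 1     # ~0.19, i.e. roughly the 20% cited
    ```

    With these placeholder numbers the A10 comes out about 19% ahead per clock, consistent with the rough 20% figure in the comment.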
  • joms_us - Wednesday, March 15, 2017 - link

    Bwahaha, keep dreaming iTard, GB is your most trusted benchmark. =D

    Why don't you run both machines, one with the A10 and one with a Celeron released in 2010? You will see how pathetic your A10 is in real-world apps.
    Reply
  • name99 - Wednesday, March 15, 2017 - link

    When I was 10 years old, I was in the car and my father and his friend were discussing some technical chemistry. I was bored with this professional talk of pH and fractionation and synthesis, so after my father described some particular reagent he'd mixed up, I chimed in with "and then you drank it?", to which my father said "Oh be quiet. Listen to the adults and you might learn something." While some might have treated this as a horrible insult, the cause of all their later failures in life, I personally took it as serious advice and tried (somewhat successfully) to abide by it, to my great benefit.
    Thanks Dad!

    Relevance to this thread is an exercise left to the reader.
    Reply
  • joms_us - Wednesday, March 15, 2017 - link

    Even the latest Ryzen is just barely equal to or faster than Skylake clock-for-clock, so what makes you think a worthless low-powered mobile chip will surpass them? The A10 is not even better than the SD821 in real-world app comparisons. Again, real-world apps, not AnTuTu, not Geekbench. Reply
  • zodiacfml - Wednesday, March 15, 2017 - link

    Intel's chips are smaller than Apple's. Apple also has the luxury of spending much more on the SoC. Reply
  • Andrei Frumusanu - Tuesday, March 14, 2017 - link

    Stamp of approval. Reply
  • Meteor2 - Wednesday, March 15, 2017 - link

    Andrei! That's a pretty big stamp :). I hope you're well. Reply
  • aryonoco - Tuesday, March 14, 2017 - link

    A great article, very insightful, and absolutely unique on the web.

    Well done Matt, well done AT.
    Reply
  • name99 - Tuesday, March 14, 2017 - link

    One issue in comparing the GeekBench results to the SPEC results is the question of compiler optimization.
    Were the SPEC results specifically targeted at the A73? And using the most recent version of LLVM?

    GeekBench (as far as I can tell) compiles a particular version (say version 4) with a particular compiler and target, and does not update those over time until say GeekBench 5 is released. This is not an awful practice --- it certainly makes it a lot easier to compare results in such a way that compiler optimizations don't confuse the issue. But it DOES mean that
    - the SPEC results may be picking up A73 specific tuning that GeekBench does not reflect.
    - depending on when you compiled the comparison AArch64 binaries, some fraction (and this may be high, 5% or more) of the A73 "improvement" may reflect LLVM improvement, both generally and in specific ARMv8 optimizations.

    If the SPEC results were "maximally" compiled (so that LTO was used, something that only really started to work well in the most recent LLVM versions) there could be even more of a compiler-based discrepancy.
    Reply
  • MrSewerPickle - Thursday, March 16, 2017 - link

    Thank you guys for the review. Please keep these details coming. You guys are a rarity these days and still doing great. Facts are hard to post without rambling and excessive opinion accompanying them. Reply
  • socalbigmike - Thursday, March 16, 2017 - link

    Just don't try and run any apps on it! Because half of them will NOT run. Reply
  • darkich - Thursday, March 23, 2017 - link

    Wow, they messed up again with that high-clocked GPU implementation. Just ridiculous... Huawei making the same mistake over and over again, refusing to take note from Samsung. I'm certain Samsung's low-clocked G71 MP18 will consume far less power than this 8-core setup! Reply
