System Performance Cont'd

Moving on to more GPU-bound workloads, we use our standard suite of benchmarks such as GFXBench and 3DMark to get a good idea of performance. Unfortunately, with the move to iOS 9 the Unity engine version used in Basemark X no longer works, so for now we’re left with 3DMark and GFXBench. There is also Basemark OS II’s graphics test, but it is embedded in a larger benchmark that also includes CPU and storage performance tests.

3DMark 1.2 Unlimited - Overall

3DMark 1.2 Unlimited - Graphics

3DMark 1.2 Unlimited - Physics

As always with 3DMark, there are some issues with the data structures the physics test uses. Because of the data dependencies within the physics test, the CPU has to stall until data is committed to memory before continuing on to the next portion of the test, instead of executing instructions in parallel. This significantly reduces the practical performance of the CPU, because the architecture relies primarily on instruction-level parallelism to deliver its major performance gains. However, thanks to their strong graphics performance, the iPhone 6s and 6s Plus still manage to take the lead.
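Why data dependencies hurt a wide, ILP-focused core can be illustrated with a toy dataflow model. The sketch below is not 3DMark's code. It assumes an idealized core with unlimited issue width and perfect register renaming, plus a hypothetical 3-cycle latency per operation, and computes how many cycles a sequence of operations needs when only true (read-after-write) dependencies constrain it. The same principle applies whether the dependency passes through a register or through memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One instruction: writes `dest`, reads `srcs`, takes `latency` cycles."""
    dest: str
    srcs: tuple
    latency: int = 3  # hypothetical latency, purely illustrative


def critical_path(ops):
    """Cycles needed with unlimited issue width and perfect renaming,
    so only true (read-after-write) dependencies delay an operation."""
    ready = {}  # register name -> cycle at which its latest value is available
    finish = 0
    for op in ops:
        start = max((ready.get(src, 0) for src in op.srcs), default=0)
        ready[op.dest] = start + op.latency
        finish = max(finish, ready[op.dest])
    return finish


# Eight additions into one accumulator: each must wait for the previous one.
serial = [Op("acc", ("acc",)) for _ in range(8)]

# The same eight additions spread over four independent accumulators,
# followed by a small reduction tree that combines them.
parallel = [Op(f"a{i % 4}", (f"a{i % 4}",)) for i in range(8)]
parallel += [Op("s0", ("a0", "a1")), Op("s1", ("a2", "a3")), Op("sum", ("s0", "s1"))]

for name, ops in (("serial", serial), ("parallel", parallel)):
    cycles = critical_path(ops)
    print(f"{name}: {len(ops)} ops in {cycles} cycles, {len(ops) / cycles:.2f} ops/cycle")
# serial: 8 ops in 24 cycles, 0.33 ops/cycle
# parallel: 11 ops in 12 cycles, 0.92 ops/cycle
```

Even though the parallel version executes more operations, it finishes in half the cycles, because a wide core can overlap independent chains but can do nothing to shorten a single dependent one.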

GFXBench 3.0 Manhattan (Onscreen)

GFXBench 3.0 T-Rex HD (Onscreen)

GFXBench 3.0 Manhattan (Offscreen)

GFXBench 3.0 T-Rex HD (Offscreen)

In GFXBench, the A9 SoC shows simply absurd performance. It’s strange to think that the iPad Air 2’s GPU seemed incredibly quick at the time, yet with the A9 Apple has surpassed that level of performance in a smartphone SoC. The move to a new generation of PowerVR GPU IP, together with the move to a FinFET process node, is what really drives this kind of performance improvement.

Overall, the Apple A9 is the best SoC in any phone shipping today. In web browsing, gaming, and even just navigating the UI, it’s quite evident that this new SoC is a major factor in improving performance and smoothness across the board. Something as simple as visiting a few popular tech websites will show this, which goes to show how much “specs” still matter because of their influence on user experience.

NAND Performance

At this point it almost goes without saying that storage performance is important, but in a lot of ways testing it is still in its early days. We’ve already discussed what distinguishes the iPhone 6s’s storage from others in the industry, but for those who are unaware, the iPhone 6s uses PCIe and NVMe rather than UFS or eMMC. In a lot of ways this makes the onboard storage closer to the SSD you might find in a more expensive PC, but due to PCB limitations you won’t necessarily see the enormous parallelism you might expect from a true SSD. Since the initial results, we’ve found that all of our review units use Hynix-supplied NAND. To test how this storage solution performs, we use Eric Patno’s storage test, a simple test comparable to AndroBench 3.6.
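For readers unfamiliar with what a simple AndroBench-style test measures, here is a rough sketch of the four access patterns charted below: large sequential reads and writes, and small (4 KiB) random reads and writes. This is not Eric Patno’s tool; the file size, block sizes and operation count are hypothetical choices. One important caveat: this sketch does not bypass the OS page cache, so reads issued right after writing the file will largely be served from RAM rather than from NAND. Real storage benchmarks take steps to avoid that.

```python
import os
import random
import tempfile
import time

MIB = 1024 * 1024


def mib_per_s(nbytes, seconds):
    return nbytes / MIB / max(seconds, 1e-9)


def storage_test(size=32 * MIB, seq_block=MIB, rand_block=4096, rand_ops=2048, seed=0):
    """Rough sequential/random throughput test in MiB/s (illustrative parameters).
    `size` must be a multiple of both block sizes."""
    rng = random.Random(seed)
    fd, path = tempfile.mkstemp()
    os.close(fd)
    results = {}
    try:
        # Sequential write in large blocks; fsync so data actually reaches storage.
        block = os.urandom(seq_block)
        start = time.perf_counter()
        with open(path, "wb") as f:
            for _ in range(size // seq_block):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())
        results["seq_write"] = mib_per_s(size, time.perf_counter() - start)

        # Sequential read from start to end in large blocks.
        start = time.perf_counter()
        total = 0
        with open(path, "rb") as f:
            while chunk := f.read(seq_block):
                total += len(chunk)
        results["seq_read"] = mib_per_s(total, time.perf_counter() - start)

        # Random small-block accesses at block-aligned offsets.
        offsets = [rng.randrange(size // rand_block) * rand_block for _ in range(rand_ops)]

        start = time.perf_counter()
        total = 0
        with open(path, "rb") as f:
            for off in offsets:
                f.seek(off)
                total += len(f.read(rand_block))
        results["rand_read"] = mib_per_s(total, time.perf_counter() - start)

        small = os.urandom(rand_block)
        start = time.perf_counter()
        with open(path, "r+b") as f:
            for off in offsets:
                f.seek(off)
                f.write(small)
            f.flush()
            os.fsync(f.fileno())
        results["rand_write"] = mib_per_s(rand_ops * rand_block, time.perf_counter() - start)
    finally:
        os.remove(path)
    return results


if __name__ == "__main__":
    for test, speed in storage_test().items():
        print(f"{test:>10}: {speed:8.1f} MiB/s")
```

Random 4 KiB accesses are where controller and protocol overhead show up most, which is part of why a low-overhead protocol like NVMe matters even on a phone.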

Internal NAND - Sequential Read

Internal NAND - Sequential Write

Internal NAND - Random Read

Internal NAND - Random Write

Here we can really see the enormous performance improvements that come from combining TLC NAND with an SLC cache, along with the new NVMe protocol, which has low CPU overhead and removes architectural bottlenecks to storage performance. This should allow for things like faster burst photos and faster app updates. Downloading and updating apps on the iPhone 6s feels noticeably faster than on the iPhone 6, to the point that small apps seem to install almost instantly when I’m on a WiFi connection fast enough to saturate storage bandwidth.

531 Comments

  • toukale - Monday, November 2, 2015

    Damn, "Now."
  • Kevin G - Monday, November 2, 2015

    Not only is it enough to scare all other ARM SoCs, but Intel has to be frightened by what Apple's engineers are capable of. Normalizing for clock speeds, it seems that the A9 is around Sandy Bridge/Ivy Bridge IPC, and now with FinFET there is clock speed overlap with those chips as well. Intel has two newer generations of core designs (Haswell and Skylake), but they don't offer huge leaps over Sandy Bridge/Ivy Bridge. I'm really, really curious how the A9X in the iPad Pro will perform against various Core M designs in tablets. It is very conceivable that Apple could take the performance crown.

    Against low-power i3/i5/i7 Skylake chips, Intel should still have the performance lead. Granted, those chips have a higher power budget, but it makes me wonder what Apple could pull off with a similar power budget.

    As for the A9 itself, it is a very solid improvement and there is still room to grow. My personal prediction for the A9, SMT, appears to be absent. Considering the width of the A9 design, SMT should offer some performance gains. Power consumption will certainly be higher while running in a 4T2C mode, but 2T1C should be lower power than 2T2C.

    My predictions for the A10? I'm still sticking to the idea that SMT makes sense in Apple's CPU designs, so there is that. 4 MB of L2 cache and 12 MB of L3 cache are natural evolutions of their current topology. The GPU will move to an 8-core Rogue 7 design. The real SoC change will be in the memory subsystem, with Apple adopting WideIO. I predict that the iPhone 7 will be the first product to drop the Lightning connector and offer a USB Type-C port, so USB and DisplayPort blocks will be included in the next iteration.
  • aliasfox - Monday, November 2, 2015

    While Intel should be worried about the performance Apple's SoC engineers are capable of, what they should really be worried about is price. Sure, Apple might only offer 75% of the performance of a ULV Core chip, but when it comes at 20-30% of the price, that's serious competition.
  • Kevin G - Monday, November 2, 2015

    There is the whole dichotomy of Apple being an end-product supplier with the iPhone/iPad vs. Intel being a parts supplier. There is also the difference that Apple needs a third party to manufacture the A9 chip, whereas Intel does this in house. Intel is more of a middleman here and thus inflates the end cost of OEM handsets and tablets. Intel could make the same amount of profit if it were able to spur volume sales, but that trade-off has never appealed much to Intel, which has historically enjoyed healthy margins on component pricing.
  • name99 - Monday, November 2, 2015

    Guys, it's time to stop this pretense that Apple is "almost" at Intel performance.
    Apple IPC has exceeded the best Intel has to offer by about 15%.
    (gcc SPEC)
    A9 vs Haswell = (3148/1.85) / (4800/3.3) ≈ 1.17
    http://gcc.opensuse.org/SPEC/CINT/sb-czerny-head-6...
    i5-4670T boost 3.3G ~4800

    Or compare against the Broadwell in a MacBook:
    https://browser.primatelabs.com/geekbench3/compare...
    (Note that while the Broadwell is nominally at 1.3GHz, Geekbench is short enough that it can turbo to 2.9GHz.)

    With the A7, Apple got an "inner" core that was equal to the best Intel has to offer. With the A9 they now have an uncore that matches Intel (look at all the memory-dependent benchmarks in the Geekbench comparison above, things like Sobel, Sharpen, and FFT: Apple now matches Intel pretty much exactly).

    The only place where Apple still lags behind Intel (as far as the mobile space is concerned) is turbo (i.e., an accurate on-SoC thermal model that allows parts of the SoC to run faster than rated until the thermal budget is exceeded).
    This does not necessarily mean that turbo is the feature Apple will implement next. There are other directions they could go which provide (in their opinion) a better tradeoff than turbo, at least for now. Possibilities include:
    - het core (add a low-power, low-performance core. This sounds like big.LITTLE, but done right. Core selection and switching are done by a dedicated microcontroller that tracks various CPU statistics such as branch mispredictions and cache misses and uses those to decide which core to use. The OS only sees one CPU; the het core is purely an internal implementation detail.
    Papers suggest that, done right, this can buy you about a 20% power reduction.)

    - KIP (kilo-instruction processor). A set of ideas that extend OoO execution from its current ability to tolerate latency out to L3 (but not to RAM) all the way out to RAM. This requires a ROB of 1000 entries or so, and numerous modifications to allow the physical register file and load-store queues to match this size.

    - post-rename loop buffer. Places the loop buffer not just after fetch, not just after decode, but all the way after rename. Requires various modifications (to handle the "frozen" renaming) but is capable of a nice drop in power whenever executing out of the loop buffer.

    Apart from starting down these paths, the obvious visible changes for the A10 would appear to be that they
    - drop 32-bit support (which should probably allow them to drop at least one pipeline stage, and simplify the decoder substantially)
    - add support for the ARMv8.1a instructions.

    SMT is (IMHO) a low priority for Apple. They can add more cores faster than they can design in SMT, and area won't be a critical constraint until the Moore's law scaling party stops.
  • vFunct - Monday, November 2, 2015

    They basically already have big.LITTLE with their M9 co-processor.
  • doggface - Monday, November 2, 2015

    I'm sorry, but no. Your Geekbench scores mean nothing. Intel still has quite the lead. Otherwise Apple's MacBook Pros would be using Apple SoCs.

    Apple will find that all the easy gains in CPU IPC/clocks are disappearing and, like Intel, will struggle to make speed improvements beyond a certain level. Then chip cost will start going up. It is inevitable; it is physics.

    All that aside. The A9 is impressive. Kudos to Apple.
  • IanHagen - Wednesday, November 4, 2015

    Whilst I mostly agree with you, the MacBook Pros not sporting an Apple SoC is, IMHO, proof of nothing. The migration would be very costly and would break compatibility with a ton of software. They can't simply slap a nice ARM chip in that thing and call it a day.
  • DerekZ06 - Wednesday, November 4, 2015

    Switching architectures on the MacBook Pros would be like going from PowerPC to x86 all over again.
  • gonsolo - Tuesday, November 3, 2015

    Interesting. Can you quote some of the mentioned papers?
