Improved ISP in A5

So we’ve been over the optical system and the sensor, but there’s another factor as well: image signal processing (ISP). It surprised me to see Apple bring this up on stage, but it’s a hugely important point to make: the quality of images captured on a given platform depends on everything in the image processing chain. The A5 SoC includes an improved ISP over what was in the A4, referred to as the H4. You can also watch the OS power gate the ISP and activate it on the console when you launch the camera:

Oct 18 16:35:02 unknown kernel[0] : AppleH4CamIn::ISP_LoadFirmware_gated: fw len=1171480
Oct 18 16:35:02 unknown kernel[0] : AppleH4CamIn::ISP_LoadFirmware_gated - firmware checksum: 0x0545E78A
Oct 18 16:35:02 unknown kernel[0] : AppleH4CamIn::power_on_hardware

The changes include faster processing to accommodate an 8 MP sensor, vastly improved white balance (which we will show later), and some face detection algorithms that work in conjunction with autofocus and autoexposure. I’ve also noticed that the A5’s ISP seems to have improved AF speed (it’s hard to measure, but it just seems much faster) and, more importantly, the framerate of the capture preview is much higher. I’ve included a small video showing just how much smoother the 4S looks than the 4. Even on my 1080p60 camera (whose footage YouTube then reduces to 30 fps), the difference is noticeable.

When the ISP detects a face, it’ll paint a green rectangle over the region and run the AF/AE routine just like it would if you tapped to focus. Like all face detection algorithms, it’s decent but not perfect, and I saw the face detection rectangle come up while shooting pictures of pumpkins at a pumpkin patch (which was fairly repeatable on one pumpkin), and a few other random occasions. Apple claims their ISP will run face detection on up to 10 faces and balance AF/AE accordingly for the best exposure.

I mentioned that the camera application preview framerate is improved, which it is, but the camera application is also speedier. Word on the street is that camera application launch time was a significant focus for the 4S, and I set out to measure the difference over its predecessors’ cameras. Camera launch time is one thing that was singled out during the presentation, but another that can be measured is HDR processing time. I quit all tasks and launched the camera application fresh five times (from tapping camera to seeing the iris fully open), then averaged.

Camera Performance Comparison
| Property | iPhone 3GS | iPhone 4 | iPhone 4S |
|---|---|---|---|
| Camera Launch Time (seconds) | 2.8 | 2.3 | 1.4 |
| HDR Capture Time (seconds) | - | 4.9 | 3.2 |
| Working Distance (cm) | ~7.0 | 7.0 | 6.5 |

The result on the 4S is a bit behind Apple’s quoted 1.1 seconds, though it’s possible they were measuring after an initial launch, whereas I’m starting with the camera completely closed each time. Still, 0.3 seconds isn’t that far away from their own measurements. The 4S is almost an entire second faster at launching the camera app than the 4, and 1.5x faster at merging three images to HDR than the 4. I also decided to get a rough measure of working distance on the three cameras, or the closest an object can be to the camera and still be focused on.
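The comparisons drawn from the launch and HDR figures are simple arithmetic, and a minimal Python sketch makes them easy to check. The numbers are copied from the table and from Apple’s quoted 1.1 seconds; the variable and function names are my own illustrative choices, not anything from Apple or the test setup.

```python
# Measured times in seconds, copied from the Camera Performance Comparison table.
launch_time = {"iPhone 3GS": 2.8, "iPhone 4": 2.3, "iPhone 4S": 1.4}
hdr_time = {"iPhone 4": 4.9, "iPhone 4S": 3.2}  # no HDR figure for the 3GS

# Apple's quoted launch time for the 4S, as mentioned in the text.
apple_quoted_launch = 1.1


def seconds_saved(old: float, new: float) -> float:
    """Absolute improvement: how many seconds faster the new device is."""
    return old - new


def speedup(old: float, new: float) -> float:
    """Relative improvement: how many times faster the new device is."""
    return old / new


# 4S versus 4 on camera launch (absolute) and HDR merge (relative).
launch_saved = seconds_saved(launch_time["iPhone 4"], launch_time["iPhone 4S"])
hdr_speedup = speedup(hdr_time["iPhone 4"], hdr_time["iPhone 4S"])

# Gap between my measured 4S launch time and Apple's quoted figure.
gap_to_apple = seconds_saved(launch_time["iPhone 4S"], apple_quoted_launch)

print(f"Launch time saved, 4S vs 4: {launch_saved:.1f} s")
print(f"HDR speedup, 4S vs 4: {hdr_speedup:.2f}x")
print(f"Measured vs Apple-quoted 4S launch: {gap_to_apple:.1f} s")
```

Keeping the absolute difference and the ratio as separate helpers mirrors how the two results are reported: launch time as seconds saved, HDR processing as a speedup factor.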


199 Comments


  • doobydoo - Friday, December 2, 2011 - link

    It's still absolute nonsense to claim that the iPhone 4S can only use '2x' the power when it has available power of 7x.

    Not only does the iPhone 4S support wireless streaming to TVs, making performance very important, there are also games ALREADY out which require this kind of GPU in order to run fast on the superior resolution of the iPhone 4S.

    Not only that, but you failed to take into account the typical life-cycle of iPhones - this phone has to be capable of performing well for around a year.

    The bottom line is that Apple really got one over all Android manufacturers with the GPU in the iPhone 4S - it's the best there is, in any phone, full stop. Trying to turn that into a criticism is outrageous.
  • PeteH - Tuesday, November 1, 2011 - link

    Actually it is about the architecture. How GPU performance scales with size is in large part dictated by the GPU architecture, and Imagination's architecture scales better than the other solutions.
  • loganin - Tuesday, November 1, 2011 - link

    And I showed above that Apple's chip isn't larger than Samsung's.
  • PeteH - Tuesday, November 1, 2011 - link

    But chip size isn't relevant, only GPU size is.

    All I'm pointing out is that not all GPU architectures scale equivalently with size.
  • loganin - Tuesday, November 1, 2011 - link

    But you're comparing two different architectures here, not two carrying the same architecture so the scalability doesn't really matter. Also is Samsung's GPU significantly smaller than A5's?

    Now we've discussed back and forth about nothing, you can see the problem with Lucian's argument. It was simply an attempt to make Apple look bad and the technical correctness didn't really matter.
  • PeteH - Tuesday, November 1, 2011 - link

    What I'm saying is that Lucian's assertion, that the A5's GPU is faster because it's bigger, ignores the fact that not all GPU architectures scale the same way with size. A GPU of the same size but with a different architecture would have worse performance because of this.

    Put simply, architecture matters. You can't just throw silicon at a performance problem to fix it.
  • metafor - Tuesday, November 1, 2011 - link

    Well, you can. But it might be more efficient not to. At least with GPU's, putting two in there will pretty much double your performance on GPU-limited tasks.

    This is true of desktops (SLI) as well as mobile.

    Certain architectures are more area-efficient. But the point is, if all you care about is performance and can eat the die-area, you can just shove another GPU in there.

    The same can't be said of CPU tasks, for example.
  • PeteH - Tuesday, November 1, 2011 - link

    I should have been clearer. You can always throw area at the problem, but the architecture dictates how much area is needed to add the desired performance, even on GPUs.

    Compare the GeForce and the SGX architectures. The GeForce provides an equal number of vertex and pixel shader cores, and thus can only achieve theoretical maximum performance if it gets an even mix of vertex and pixel shader operations. The SGX on the other hand provides general purpose cores that can do either vertex or pixel shader operations.

    This means that as the SGX adds cores its performance scales linearly under all scenarios, while the GeForce (which adds a vertex and a pixel shader core as a pair) gains only half the benefit under some conditions. Put simply, if a GeForce core is limited by the number of pixel shader cores available, the addition of a vertex shader core adds no benefit.

    Throwing enough core pairs onto silicon will give you the performance you need, but not as efficiently as general purpose cores would. Of course a general purpose core architecture will be bigger, but that's a separate discussion.
  • metafor - Tuesday, November 1, 2011 - link

    I think you need to check your math. If you double the number of cores in a Geforce, you'll still gain 2x the relative performance.

    Double is a multiplier, not an adder.

    If a task was vertex-shader bound before, doubling the number of vertex-shaders (which comes with doubling the number of cores) will improve performance by 100%.

    Of course, in the case of 543MP2, we're not just talking about doubling computational cores.

    It's literally 2 GPU's (I don't think much is shared, maybe the various caches).

    Think SLI but on silicon.

    If you put 2 Geforce GPU's on a single die, the effect will be the same: double the performance for double the area.

    Architecture dictates the perf/GPU. That doesn't mean you can't simply double it at any time to get double the performance.
  • PeteH - Tuesday, November 1, 2011 - link

    But I'm not talking about relative performance, I'm talking about performance per unit area added. When bound by one operation adding a core that supports a different operation is wasted space.

    So yes, doubling space always doubles relative performance, but adding 20 square millimeters means different things to the performance of different architectures.
