A Word on Packaging

Unlike the first two iPads, the 3rd generation iPad abandons the high-density flip-chip PoP SoC/DRAM stack and instead uses a discrete flip-chip BGA package for the SoC and two discrete BGA packages for the DRAM.

If you think of SoC silicon as a stack, the lowest layer is where you'll find the actual transistor logic, while the layers of metal above it connect everything together. In the old days, the silicon stack would sit just as I've described it: logic at the bottom, metal layers on top. Pads around the perimeter of the top of the silicon would connect to very thin wires that routed to the package substrate and eventually out to balls or pins on the underside of the package. These wire-bonded packages, as they were called, placed tighter limits on how many pins could connect to your chip.

There are also cooling concerns. In a traditional wire-bonded package, your cooling solution ultimately rests on a piece of your packaging substrate. The actual silicon itself isn't exposed.

As its name implies, a flip-chip package is literally the inverse of this. Before packaging, the silicon is flipped over so that the metal layers sit at the bottom of the stack instead of the top. Solder bumps on the top of the silicon stack (now flipped to the bottom) connect the topmost metal layer to the package itself. Since the solder bumps sit across the face of the silicon rather than relying on wires routed to its edge, there's much more surface area for signals to get in and out of the chip.
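To put rough numbers on the surface-area argument, here's a minimal Python sketch comparing how many connections fit in a single row around the edge of a square die versus a full grid across its face. The die size and pitch are hypothetical round numbers, not A5X figures, and the same pitch is used for both styles purely to isolate the geometry (real bond-pad and bump pitches differ).

```python
def perimeter_io(die_side_um: int, pitch_um: int) -> int:
    """Wire bonding: one row of pads along each of the four edges (corner overlap ignored)."""
    per_side = die_side_um // pitch_um
    return 4 * per_side


def area_array_io(die_side_um: int, pitch_um: int) -> int:
    """Flip-chip: a full grid of solder bumps across the face of the die."""
    per_side = die_side_um // pitch_um
    return per_side * per_side


# Hypothetical 10 mm x 10 mm die with a 100 um connection pitch
print(perimeter_io(10_000, 100))   # 400
print(area_array_io(10_000, 100))  # 10000
```

Doubling the die edge doubles the perimeter count but quadruples the area-array count. In practice many of those bumps carry power and ground rather than signals, but the scaling is why flip-chip leaves so much more room for I/O.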

Since the chip is flipped, the back of the die, sitting directly above the active logic, is now exposed in a flip-chip package, and the hottest part of the silicon can be attached directly to a cooling solution.


An example of a PoP stack

To save on PCB real estate, however, many SoC vendors take a flip-chip SoC and stack DRAM on top of it in a package-on-package (PoP) configuration. Ultimately this reintroduces many of the problems of older packaging techniques: it becomes difficult to build very wide memory interfaces because the ball-out for the PoP stack is limited to the area around your die, and cooling is a concern once more. For low-power, low-bandwidth mobile SoCs this hasn't really been a problem, which is why we see PoP stacks deployed all over the place.
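A similar back-of-the-envelope sketch shows why PoP ball-out is the constraint: the DRAM package can only connect through the ring of package area outside the SoC die. Every dimension below is a hypothetical value for illustration, and the model is simplified (real PoP packages typically use only a few rows of balls in that ring).

```python
def pop_ring_balls(package_side_um: int, die_side_um: int, pitch_um: int) -> int:
    """Grid positions on the package minus those covered by the SoC die underneath."""
    full_grid = (package_side_um // pitch_um) ** 2
    covered = (die_side_um // pitch_um) ** 2
    return full_grid - covered


# Hypothetical 14 mm package with a 500 um ball pitch
print(pop_ring_balls(14_000, 10_000, 500))  # 384 with a 10 mm die
print(pop_ring_balls(14_000, 12_000, 500))  # 208 with a 12 mm die
```

A bigger SoC die leaves less room for the DRAM connections, which is exactly the wrong direction for a large chip that wants a wide memory interface.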

Take a look at the A5, a traditional FC-BGA SoC with PoP DRAM vs. the A5X (this isn't to scale):


Images courtesy iFixit

The A5X in this case is a FC-BGA SoC but without any DRAM stacked on top of it. The A5X is instead covered in a thermally conductive paste and then with a metallic heatspreader to conduct heat away from the SoC and protect the silicon.

Given the size and complexity of the A5X SoC, it's no surprise that Apple didn't want to insulate the silicon with a stack of DRAM on top of it. In typical package-on-package stacks, you'd see solder bumps around the silicon, on the package itself, that a separate DRAM package would adhere to. Instead of building up a PoP stack here, Apple simply located its two 64-bit DRAM devices on the opposite side of the iPad's logic board and routed the four 32-bit LP-DDR2 memory channels through the PCB layers.
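The memory configuration reduces to simple arithmetic: two 64-bit devices provide the same 128 bits of total width as four 32-bit channels. A minimal sketch using only the widths stated above, with one assumption the text doesn't spell out: that the channels are divided evenly between the two devices.

```python
DRAM_DEVICES = 2
BITS_PER_DEVICE = 64
CHANNEL_WIDTH_BITS = 32

total_width_bits = DRAM_DEVICES * BITS_PER_DEVICE  # 128-bit aggregate interface
channels = total_width_bits // CHANNEL_WIDTH_BITS  # four 32-bit channels
# Assumption: the channels are split evenly between the two devices.
channels_per_device = channels // DRAM_DEVICES

print(total_width_bits, channels, channels_per_device)  # 128 4 2
```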


iPad (3rd gen) logic board back (top) and front (bottom), courtesy iFixit

If I'm seeing this correctly, it looks like the DRAM devices are shifted lower than the center point of the A5X. Routing high speed parallel interfaces isn't easy and getting the DRAM as close to the memory controller as possible makes a lot of sense. For years motherboard manufacturers and chipset vendors alike complained about the difficulties of routing a high-speed, 128-bit parallel DRAM interface on a (huge, by comparison) ATX motherboard. What Apple and its partners have achieved here is impressive when you consider that this type of interface only made it to PCs within the past decade.

Looking Forward: 12.8GB/s, the Magical Number

The DRAM speeds in the new iPad haven't changed. The -8D in the Elpida DRAM string tells us this memory is rated at the same 800MHz data rate as what's used in the iPhone 4S and iPad 2. With twice the number of channels to transfer data over, however, the total available bandwidth (at least to the GPU) doubles. I brought back the graph I made for our iPhone 4S review to show just how things have improved:

The A5X's memory interface is capable of sending/receiving data at up to 12.8GB/s. While this is still nowhere near the 100GB/s+ we need for desktop-quality graphics at Retina Display resolutions, it's absolutely insane for a mobile SoC. Bandwidth utilization is another story entirely; we have no idea how good Apple's memory controller is (it is designed in-house), but the A5X has 4x the theoretical bandwidth available to NVIDIA's Tegra 3.
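The figures in these two paragraphs follow from simple arithmetic: peak bandwidth is bus width times transfer rate. Here's a minimal Python sketch, assuming 32-bit channels running at the 800MHz data rate mentioned above (800 million transfers per second), with two channels for the A5 and four for the A5X. The Tegra 3 figure is not a published spec; it is only what the "4x" comparison implies.

```python
def peak_bandwidth_gbs(channels: int, channel_bits: int, transfers_per_s: float) -> float:
    """Theoretical peak bandwidth in GB/s (decimal: 1 GB = 1e9 bytes)."""
    bytes_per_transfer = channels * channel_bits / 8
    return bytes_per_transfer * transfers_per_s / 1e9


RATE = 800e6  # LP-DDR2 at an 800MHz data rate = 800 million transfers per second

a5 = peak_bandwidth_gbs(channels=2, channel_bits=32, transfers_per_s=RATE)
a5x = peak_bandwidth_gbs(channels=4, channel_bits=32, transfers_per_s=RATE)

# Derived comparisons (not measurements):
desktop_shortfall = 100.0 / a5x  # how far below the ~100GB/s desktop-class figure
implied_tegra3 = a5x / 4         # peak implied by the "4x Tegra 3" statement

print(f"A5: {a5:.1f} GB/s, A5X: {a5x:.1f} GB/s")           # A5: 6.4 GB/s, A5X: 12.8 GB/s
print(f"{desktop_shortfall:.1f}x short of 100GB/s")        # 7.8x short of 100GB/s
print(f"implied Tegra 3 peak: {implied_tegra3:.1f} GB/s")  # implied Tegra 3 peak: 3.2 GB/s
```

Note that these are theoretical peaks; as the text points out, how much of that bandwidth the memory controller actually delivers is a separate question.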

There's a ton of memory bandwidth here, but Apple got to this point by building a huge, very power-hungry SoC, too power-hungry for use in a smartphone. As I mentioned at the start of this article, the SoC alone in the new iPad can consume more power than the entire iPhone 4S (A5X running Infinity Blade 2 vs. iPhone 4S loading a web page):

Power Consumption Comparison
Device                              Estimated Power Consumption   Workload
Apple A5X (SoC + mem interface)     2.6W                          Infinity Blade 2
Apple iPhone 4S (entire device)     1.6W                          Web Page Loading
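For scale, here are the two estimates from the table expressed as a ratio. The figures come from different workloads, so treat this as a rough comparison rather than a like-for-like measurement.

```python
A5X_SOC_W = 2.6    # A5X SoC + memory interface, Infinity Blade 2 (estimate)
IPHONE_4S_W = 1.6  # entire iPhone 4S, loading a web page (estimate)

ratio = A5X_SOC_W / IPHONE_4S_W
extra_pct = (ratio - 1) * 100
print(f"A5X alone: {ratio:.3f}x the whole iPhone 4S ({extra_pct:.1f}% more)")
# A5X alone: 1.625x the whole iPhone 4S (62.5% more)
```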

There's no question that we need this much (and more) memory bandwidth, but the A5X's route to delivering it is too costly in terms of power. There is a solution to this problem, however: Wide IO DRAM.

Instead of using wires to connect DRAM to solder balls on a package that's then stacked on top of your SoC package, Wide IO DRAM uses through-silicon vias (TSVs) to connect a DRAM die directly to the SoC die. It's an even more costly packaging technique, but the benefits are huge.

Just as we saw in our discussion of flip-chip vs. wire-bonded packages, conventional PoP solutions have limits on how many IO pins you can have in the stack. If you can use the entire silicon surface for direct IO, however, you can build some very wide interfaces. It also turns out that these through-silicon interfaces are extremely power efficient.

The first Wide IO DRAM spec calls for a 512-bit, 200MHz SDR (single data rate) interface delivering an aggregate of 12.8GB/s of bandwidth. The bandwidth comes at much lower power consumption, while delivering all of the integration benefits of a traditional PoP stack. There are still cooling concerns, but for lower wattage chips they are less worrisome.
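Wide IO reaches the same 12.8GB/s peak as the A5X by a different route: a much wider bus at a much lower transfer rate. A quick Python check using the figures from the text (512 bits at 200 million transfers per second for Wide IO's SDR interface; 4 x 32 = 128 bits at 800 million transfers per second for the A5X):

```python
def peak_gbs(bus_bits: int, transfers_per_s: float) -> float:
    """Theoretical peak bandwidth in GB/s (decimal: 1 GB = 1e9 bytes)."""
    return bus_bits / 8 * transfers_per_s / 1e9


wide_io = peak_gbs(512, 200e6)  # 512-bit, 200MHz single data rate: one transfer per clock
a5x = peak_gbs(128, 800e6)      # four 32-bit LP-DDR2 channels at an 800MHz data rate

print(f"Wide IO: {wide_io:.1f} GB/s, A5X: {a5x:.1f} GB/s")  # Wide IO: 12.8 GB/s, A5X: 12.8 GB/s
print(f"width ratio: {512 // 128}x, transfer-rate ratio: {200e6 / 800e6}")
# width ratio: 4x, transfer-rate ratio: 0.25
```

Same peak, four times the width at a quarter of the rate; the TSV connections are what make a bus that wide practical.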

Intel originally predicted that by 2015 we'd see 3D die stacking using through-silicon vias. Qualcomm's roadmaps project usage of TSVs by 2015 as well. The iPhone won't need this much bandwidth in its next generation thanks to a lower-resolution display, but when the time comes, there will be a much lower-power solution available thanks to Wide IO DRAM.

Oh, and 2015 appears to be a very conservative estimate. I'm expecting to see the first Wide IO memory controllers implemented long before then...

234 Comments

  • name99 - Friday, March 30, 2012

    Just to clarify, this is NOT some Apple proprietary thing. The Apple ports are following the USB charging spec. This is an optional part of the spec, but any other manufacturer is also welcome to follow it --- if they care about the user experience.
  • darkcrayon - Thursday, March 29, 2012

    All recent Macs (last 2-3 years) can supply additional power via their USB ports which is enough to charge an iPad that's turned on (though probably not if it's working very hard doing something). Most non-Mac computer USB ports can only deliver the standard amount of USB power, which is why you're seeing this.

    Your Lenovo *should* still recharge the iPad if the iPad is locked and sleeping, though it will do so very slowly.
  • dagamer34 - Friday, March 30, 2012

    I did the calculations and it would take about 21 hours to recharge an iPad 3 on a normal non-fast charging USB port from dead to 100%. Keep in mind, we're talking about a battery that's larger in capacity than the 11" MacBook Air.
  • snoozemode - Thursday, March 29, 2012

    http://www.qualcomm.com/media/documents/files/snap...
  • Aenean144 - Thursday, March 29, 2012

    Anandtech: "iPhoto is a very tangible example of where Apple could have benefitted from having four CPU cores on A5X"

    Is iPhoto really the kind of app that can actually take advantage of 2 cores? If there is batch image-processing functionality, certainly, though I don't know if iPhoto for iOS has that. The slowness could just be from a 1.0 product, and further tuning and refinement will fix it.

    I'm typically highly skeptical of the generic "if the app is multithreaded, it can make use of all of the cores" line of thought. Basically all of the threads, save one, are typically just waiting on user input.
  • Anand Lal Shimpi - Thursday, March 29, 2012

    It very well could be that iOS iPhoto isn't well written, but in using the editing tools I can typically use 60 - 95% of the A5X's two hardware threads. Two more cores, at the bare minimum, would improve UI responsiveness as it gives the scheduler another, lightly scheduled core to target.

    Alternatively, a 50% increase in operating frequency and an improvement in IPC could result in the same net benefit.

    Take care,
    Anand
  • shompa - Friday, March 30, 2012

    *hint* Use top on an iOS/Android device and you will see 30-60 processes at all times. The single-threaded, single-program thinking is Windows-specific and has been solved on Unix since the late 1960s. Today's Windows phones are all single-threaded because the Windows kernel is not good at multithreading.

    With many processes running, it will always be beneficial to have additional cores. Apple has also addressed this in OS X by adding Grand Central Dispatch to its development tools, making multithreaded programs easy.

    iPhoto for iPad: editing 3 million pixels demands a huge amount of CPU/GPU time and memory. Apple has so far been able to program elegant solutions around the limits of ARM CPUs by using NEON SIMD extensions and GPU acceleration. An educated guess is that iPhoto is not fully optimized yet and will be at a later time.

    (The integrated approach gives Apple a huge advantage over Android, since Apple can accelerate things with SIMD. Google does not control the hardware and therefore cannot optimize its code the same way. That is one of the reasons why the single-core A4 was almost as fast as dual-core Tegras. I was surprised when Google managed to implement its own acceleration in Android 4.x. Instead of SIMD, Google uses GL, since all devices have GPUs. This is the best feature in Android 4.x.)
  • name99 - Thursday, March 29, 2012

    [quote]
    Apple’s design lifespan directly correlates to the maturity of the product line as well as the competitiveness of the market the product is in.
    [/quote]

    I think this is completely the wrong way to look at it. Look across the entire Apple product line.
    I'd say a better analysis of chassis is that when a product first comes out, Apple can't be sure how it will be used and perceived, so there is some experimentation with different designs. But as time goes by, the design becomes more and more perfected (yes yes, if you hate Apple we know your feelings about the use of this word) and so there's no need to change until something substantial drives a large change.

    Look, for example, at the evolution of iMac from the Luxo Jr version to the white all-in-one flatscreen, to the current aluminum-edged flatscreen, which has been largely unchanged for what, five or six years now. Likewise for the MacBook Pro.
    Look at the MacBook Air. The first two revs showed the same experimentation, trying different curves and angles, but Apple (and I'd say customers) seems to feel that the current wedge shape is optimal --- a definite improvement on the previous MBA models, and without anything that obviously needs to be improved. (Perhaps the sharp edges could be rounded a little, and if someone could work out the mechanicals, perhaps the screen could tilt further back.)

    And people accept and are comfortable with this, in spite of the "people buy Apple as a fashion statement" idiocy. No-one will be at all upset if the Ivy Bridge iMacs and MBAs and Mac Minis look like their predecessors (apart from minor changes like USB3 ports); in fact, people expect it.

    So too for iPhone and iPad. Might Apple keep using the same iPhone 4 chassis for the next two years, with only minor changes? Why not? There's no obvious improvement it needs.
    (Except, maybe, a magnet on the side like iPad has, so you could slip a book-like case on it that covered the screen, and switched it on by opening the book.)
    Likewise for iPad.

    New must have features in phones/tablets (NFC? near-field charging? waterproof? built-in projector like Samsung Beam?) might change things. But absent those, really, the issue is not "Apple uses two year design cycles", it is "Apple perfects the design, then sticks with it".
  • mr_ripley - Thursday, March 29, 2012

    "In situations where a game is available in both the iOS app store as well as NVIDIA's Tegra Zone, NVIDIA generally delivers a comparable gaming experience to what you get on the iPad... The iPad's GPU performance advantage just isn't evident in those cases..."

    Would you expect it to be, if none of the games you compare have been optimized for the new iPad yet? They run at great frame rates but suffer in visuals or are only available at iPad 2 resolutions. The Tegra Zone games are clearly optimized for Tegra while their iOS counterparts are not optimized for the A5X, so of course the GPU advantage is not evident.

    This comparison does not seem fair unless there is a valid reason to believe that the Tegra Zone games cannot be further enhanced/optimized to take advantage of the new iPad hardware.

    I suspect that Tegra Zone games optimized for the A5X will offer tangibly superior performance and experience. And the fact that real-world performance suffers today does not mean we won't see that advantage shortly.
  • Steelbom - Thursday, March 29, 2012

    Exactly this.
