A Word on Packaging

Unlike the first two iPads, the 3rd generation iPad abandons the high density flip-chip PoP SoC/DRAM stack and uses a discrete, flip-chip BGA package for the SoC and two discrete BGA packages for the DRAMs.

If you think of SoC silicon as a stack, the lowest layer is where you'll find the actual transistor logic, while the layers of metal above it connect everything together. In the old days, the silicon stack would sit just as I've described it: logic at the bottom, metal layers on top. Pads around the perimeter of the top of the silicon would connect to very thin wires that would then route to the package substrate and eventually out to balls or pins on the underside of the package. These wire bonded packages, as they were called, put a low ceiling on how many pins you could have connecting to your chip.

There are also cooling concerns. In a traditional wire bonded package, your cooling solution ultimately rests on a piece of your packaging substrate. The actual silicon itself isn't exposed.

As its name implies, a flip-chip package is literally the inverse of this. Before packaging, the silicon is inverted so that the metal layers end up at the bottom of the stack rather than the top. Solder bumps on what was the top of the silicon stack (now flipped and at the bottom) connect the topmost metal layer to the package itself. Since we're dealing with solder bumps spread across the face of the silicon rather than wires routed to its edge, there's much more surface area for signals to get in and out of the silicon.
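To put rough numbers on the perimeter vs. area argument, here's a minimal sketch. The die size and pad pitch are made-up round values chosen only for illustration (they are not A5X figures), and real wire-bond and bump pitches differ from each other; the point is just how the two counts scale.

```python
def perimeter_pads(die_side_um: int, pitch_um: int) -> int:
    """Pads that fit in a single ring around the die edge (wire-bond style)."""
    per_side = die_side_um // pitch_um
    if per_side <= 1:
        return per_side
    # Each corner pad is shared by two sides, so remove the four duplicates.
    return 4 * per_side - 4


def area_array_bumps(die_side_um: int, pitch_um: int) -> int:
    """Bumps that fit in a full grid across the die face (flip-chip style)."""
    per_side = die_side_um // pitch_um
    return per_side * per_side


# Hypothetical 10mm x 10mm die with a 200 micron pitch for both cases.
DIE_SIDE_UM = 10_000
PITCH_UM = 200

print("perimeter ring:", perimeter_pads(DIE_SIDE_UM, PITCH_UM))
print("area array:    ", area_array_bumps(DIE_SIDE_UM, PITCH_UM))
```

The ring grows linearly with the die edge while the grid grows with its square, which is why the gap widens as dies get bigger.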

Since the chip is flipped, the active logic is now exposed in a flip-chip package and the hottest part of the silicon can be directly attached to a cooling solution.


An example of a PoP stack

To save on PCB real estate however, many SoC vendors would take a flip-chip SoC and stack DRAM on top of it in a package-on-package (PoP) configuration. Ultimately this re-introduces many of the problems from older packaging techniques—mainly it becomes difficult to have super wide memory interfaces as your ball-out for the PoP stack is limited to the area around your die, and cooling is a concern once more. For low power, low bandwidth mobile SoCs this hasn't really been a problem, which is why we see PoP stacks deployed all over the place.

Take a look at the A5, a traditional FC-BGA SoC with PoP DRAM vs. the A5X (this isn't to scale):


Images courtesy iFixit

The A5X in this case is a FC-BGA SoC but without any DRAM stacked on top of it. The A5X is instead covered in a thermally conductive paste and then capped with a metallic heatspreader to conduct heat away from the SoC and protect the silicon.

Given the size and complexity of the A5X SoC, it's no surprise that Apple didn't want to insulate the silicon with a stack of DRAM on top of it. In typical package-on-package stacks, you'd see solder bumps around the silicon, on the package itself, that a separate DRAM package would adhere to. Instead of building up a PoP stack here, Apple simply located its two 64-bit DRAM devices on the opposite side of the iPad's logic board and routed the four 32-bit LP-DDR2 memory channels through the PCB layers.


iPad (3rd gen) logic board back (top) and front (bottom), courtesy iFixit

If I'm seeing this correctly, it looks like the DRAM devices are shifted lower than the center point of the A5X. Routing high speed parallel interfaces isn't easy and getting the DRAM as close to the memory controller as possible makes a lot of sense. For years motherboard manufacturers and chipset vendors alike complained about the difficulties of routing a high-speed, 128-bit parallel DRAM interface on a (huge, by comparison) ATX motherboard. What Apple and its partners have achieved here is impressive when you consider that this type of interface only made it to PCs within the past decade.

Looking Forward: 12.8GB/s, the Magical Number

The DRAM speeds in the new iPad haven't changed. The -8D in the Elpida DRAM string tells us this memory is rated at the same 800MHz datarate as what's used in the iPhone 4S and iPad 2. With twice the number of channels to transfer data over however, the total available bandwidth (at least to the GPU) doubles. I brought back the graph I made for our iPhone 4S review to show just how things have improved:

The A5X's memory interface is capable of sending/receiving data at up to 12.8GB/s. While this is still nowhere near the 100GB/s+ we need for desktop quality graphics at Retina Display resolutions, it's absolutely insane for a mobile SoC. Bandwidth utilization is another story entirely. We have no idea how good Apple's memory controller is (it is designed in-house), but there's 4x the theoretical bandwidth available to the A5X as there is to NVIDIA's Tegra 3.
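The 12.8GB/s figure falls straight out of the channel count, channel width and data rate quoted above. Here's a small sketch of that arithmetic; the function name is my own, and the two-channel A5 configuration is inferred from the "twice the number of channels" comparison rather than taken from a datasheet.

```python
def peak_bandwidth_gb_s(channels: int, bits_per_channel: int,
                        mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bus_width_bits = channels * bits_per_channel
    bits_per_second = bus_width_bits * mega_transfers_per_s * 1_000_000
    return bits_per_second / 8 / 1e9


# A5X: four 32-bit LP-DDR2 channels at an 800MHz data rate.
a5x_gb_s = peak_bandwidth_gb_s(channels=4, bits_per_channel=32,
                               mega_transfers_per_s=800)

# A5 (assumed): half the channels at the same data rate.
a5_gb_s = peak_bandwidth_gb_s(channels=2, bits_per_channel=32,
                              mega_transfers_per_s=800)

print(f"A5X: {a5x_gb_s:.1f} GB/s")
print(f"A5:  {a5_gb_s:.1f} GB/s")
```

These are theoretical peaks only; as noted above, how much of that the memory controller can actually sustain is a separate question.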

There's a ton of memory bandwidth here, but Apple got to this point by building a huge, very power hungry SoC. Too power hungry for use in a smartphone. As I mentioned at the start of this article, the SoC alone in the new iPad can consume more power than the entire iPhone 4S (e.g. A5X running Infinity Blade 2 vs. iPhone 4S loading a web page):

Power Consumption Comparison
                              Apple A5X (SoC + mem interface)   Apple iPhone 4S (entire device)
Estimated Power Consumption   2.6W (Infinity Blade 2)           1.6W (Web Page Loading)

There's no question that we need this much (and more) memory bandwidth, but the A5X's route to delivering it is too costly from a standpoint of power. There is a solution to this problem however: Wide IO DRAM.

Instead of using wires to connect DRAM to solder balls on a package that's then stacked on top of your SoC package, Wide IO DRAM uses through-silicon-vias (TSVs) to connect a DRAM die directly to the SoC die. It's an even more costly packaging technique, but the benefits are huge.

Just as we saw in our discussion of flip-chip vs. wire bonded packages, conventional PoP solutions have limits to how many IO pins you can have in the stack. If you can use the entire silicon surface for direct IO however, you can build some very wide interfaces. It also turns out that these through silicon interfaces are extremely power efficient.

The first Wide IO DRAM spec calls for a 512-bit, 200MHz SDR (single data rate) interface delivering an aggregate of 12.8GB/s of bandwidth. The bandwidth comes at much lower power consumption, while delivering all of the integration benefits of a traditional PoP stack. There are still cooling concerns, but for lower wattage chips they are less worrisome.
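It's worth seeing how a slow, very wide interface lands on the same number as the A5X's narrower, faster one. The sketch below is just the arithmetic; the function name is hypothetical, and the 400MHz DDR clock for the A5X side is my assumption for how an 800MHz data rate is reached.

```python
def interface_bandwidth_gb_s(width_bits: int, clock_mhz: int,
                             transfers_per_clock: int) -> float:
    """Peak bandwidth in GB/s for a parallel memory interface.

    transfers_per_clock is 1 for SDR and 2 for DDR signaling.
    """
    bits_per_second = width_bits * clock_mhz * 1_000_000 * transfers_per_clock
    return bits_per_second / 8 / 1e9


# First-generation Wide IO: 512 bits wide, 200MHz, single data rate.
wide_io_gb_s = interface_bandwidth_gb_s(width_bits=512, clock_mhz=200,
                                        transfers_per_clock=1)

# A5X for comparison: 128 bits wide, assumed 400MHz clock, double data rate.
a5x_gb_s = interface_bandwidth_gb_s(width_bits=128, clock_mhz=400,
                                    transfers_per_clock=2)

print(f"Wide IO: {wide_io_gb_s:.1f} GB/s")
print(f"A5X:     {a5x_gb_s:.1f} GB/s")
```

Same peak bandwidth, but Wide IO gets there with four times the width at a quarter of the transfer rate.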

Intel originally predicted that by 2015 we'd see 3D die stacking using through-silicon-vias. Qualcomm's roadmaps project usage of TSVs by 2015 as well. The iPhone won't need this much bandwidth in its next generation thanks to a lower resolution display, but when the time comes, there will be a much lower power solution available thanks to Wide IO DRAM.

Oh and 2015 appears to be a very conservative estimate. I'm expecting to see the first Wide IO memory controllers implemented long before then...


  • Ammaross - Wednesday, March 28, 2012 - link

    "It has the fastest and best of nearly every component inside and out."

    Except the CPU is the same as in the iPad2, and by far not the "best" by any stretch of the imagination. Hey, what's the problem though? I have this nice shiny new tower, loads of RAM, bluray, SSD, and terabytes of hard drive space. Oh, don't mind that Pentium D processor, it's "good enough," or you must be using it wrong.
  • tipoo - Wednesday, March 28, 2012 - link

    What's better that's shipping today? Higher clocked A9s, or quad core ones like the T3? Either would mean less battery life, worse thermal issues, or higher costs. Krait isn't in a shipping product yet. Tegra 3's additional cores still have dubious benefit. These operating systems don't have true multitasking, you basically have one thing running at a time plus some background services like music, and even on desktops after YEARS few applications scale well past four cores outside of the professional space. The next iPad will be out before quad core on tablets becomes useful, that I assure you of.
  • zorxd - Wednesday, March 28, 2012 - link

    I'd gladly trade GPU power for CPU power.
    That GPU is power hungry too, probably more than two extra A9 cores, and the benefit is even more dubious unless you are a hardcore tablet gamer.
  • TheJian - Wednesday, March 28, 2012 - link

    LOL, the problem is you'll have to buy that new ipad to take advantage because YOURS doesn't have those cores now. Once apps become available that utilize these cores (trust me they're coming, anyone making an app today knows they'll have at least quad cpu and gpu in the phones they're programming for next year, heck end of this year), the tegra 3 won't need to be thrown away to multitask. Google just has to put out the next rev of android and these tegra3's etc should become even better (I say etc because everyone else has quad coming at 28nm).

    The writing is on the wall for single/dual. The quad race on phones/tablets is moving FAR faster than it did on PC's. After win8 these things will start playing a lot more nicely with our current desktops. Imagine an Intel x86 based quad (hopefully) with someone else's graphics running the same stuff as your desktop without making you cringe over the performance hit.

    I'm not quite sure how you get to Tegra3 costing more, having higher thermals (umm, ipad 3 is hot, not tegra3). The die is less than 1/2 the size of A5x. Seems they could easily slap double the gpus and come out about even with QUAD cpu too. IF NV double the gpus what would the die size be? 162mm or smaller I'd say. They should have went 1920x1200 which would have made it faster than ipad 2 no matter what game etc you ran. Unfortunately the retina screen makes it slower (which is why apple isn't pushing TEGRA ZONE quality graphics in their games for the most part...Just blade?). They could have made this comparison a no brainer if they would have went 1920x1200. I'm still waiting to see how long these last running HOT for a lot of people. I'm not a fan of roasted nuts :) Too bad they didn't put it off for 3 months and die shrink it to at least 32nm or even 40nm would have helped the heat issue, upclock the cpu a bit to make up for 2 core etc. More options to even things out. Translation everything at xmas or later will be better...Just wait if you can no matter what you want. I'm salivating over a galaxy S2 but it's just not quite powerful enough until the shrinks for s3 etc.
  • tipoo - Wednesday, March 28, 2012 - link

    I didn't say the Tegra 3 is more expensive or has higher thermals; I said the A5X, with higher clocked cores or more cores would be, and we all know Apple likes comfortable margins. Would I like a quad core A5X? Sure. Would I pay more for it? Nope. Would I switch for reduced battery life and an even hotter chip than what Apple already made? Nope. With the retina display, the choice to put more focus on the GPU made sense, with Android tablets resolution maybe Tegra 3 makes more sense, so you can stop attacking straw man arguments I never made. There are still only a handful of apps that won't run on the first iPad and that's two years old, "only" two cores won't hold you back for a while, plus iOS devs have less variation of specs to deal with so I'm sure compatibility with this iPad will be assured for at least two or three years. If I was buying one today, which I am not, I wouldn't be worried about that.

    Heck, even the 3GS runs most apps still and gets iOS updates.
  • pickica - Monday, April 2, 2012 - link

    The New Ipad 2 is probably gonna have a dual A15, which means dual cores will stay.
  • Peter_St - Monday, April 2, 2012 - link

    The problem here is that most people have no idea what they are talking about. It was just a few years ago that we all used Dual Core CPUs on our Desktop Computers and we ran way more CPU load intensive applications, and now all of a sudden some marketing bonzo from HTC and Samsung is telling me that I need Quad Core CPUs for Tablets and mobile devices, and 2+ GB of RAM.
    If you really need that hardware to run your mobile OS, then I would recommend you to fire all your OS developers, get a new crew, and start from scratch...
  • BSMonitor - Wednesday, March 28, 2012 - link

    If you were to run the same applications a tablet is designed to, then yes, your Pentium D would actually be overkill.
  • PeteH - Wednesday, March 28, 2012 - link

    The point made in the article is that it would be impossible to provide the quad GPUs (necessary to handle that display) AND quad CPUs. Given you can only do one or the other, quad GPUs is the right choice.
  • zorxd - Wednesday, March 28, 2012 - link

    was it also the right choice to NOT upgrade the GPU when going from the iPhone 3GS to iPhone 4?
