AMD Discloses Bobcat & Bulldozer Architectures at Hot Chips 2010by Anand Lal Shimpi on August 24, 2010 1:33 AM EST
It’s an Out of Order Atom
Ever since the Pentium Pro (P6), we have been blessed with out of order microprocessor architectures - these being designs that can execute instructions out of program order to improve performance. Out of order architectures let you schedule independent instructions ahead of others that are either waiting for data from main memory or waiting for specific execution resources to free up. The resulting performance boost comes at the expense of power and die size. All of the tracking logic to make sure that instructions executed out of order still retire in order eats up die area as well as more power.
When Intel designed the Atom processor it went back to an in-order design as a way of reducing power. Intel has committed to using in-order architectures in Atom for 4 - 5 years post introduction (that would end sometime in the 2012 - 2013 time frame).
For smartphones, Intel’s commitment to in-order makes sense. Average power consumption under load needs to remain at less than 1W and you simply can’t hit that with an out-of-order Atom at 45nm.
For netbooks and notebooks however, the tradeoff makes less sense. Jarred has often argued that a CULV notebook is a far better performer than a netbook at very similar price/battery life metrics. No one is pleased with Atom’s performance in a netbook, but there’s clearly demand for the form factor and price point. Where there’s an architectural opportunity like this, AMD is usually there to act.
Over the past decade AMD has refrained from copying an Intel design, instead AMD usually looks to leapfrog Intel by implementing forward looking technologies earlier than its competitor. We saw this with the 64-bit K8 and the cache hierarchy of the original Phenom and Phenom II processors. Both featured design decisions that Intel would later adopt, they were simply ahead of their time.
With Atom stuck in an in-order world for the near future, AMD’s opportunity to innovate is clear.
Admittedly I was caught off guard by Bobcat’s architecture: it’s a dual-issue design, the first AMD has introduced since the K6 and also the same issue width Intel chose for Atom. Where AMD and Intel diverge however is in the execution side: Bobcat is a fully out of order architecture.
The move to out of order should provide a healthy single threaded performance boost over Atom, assuming AMD can ramp clocks up. Bobcat has a 15 stage integer pipeline, very close to Atom's 16 stage pipe. The two pipeline diagrams are below:
Intel's Atom pipeline
You’ll note that there are technically six fetch stages, although only the first three are included in the 15 stage number I mentioned above. AMD mentioned that the remaining three stages are used for branch prediction, but in a manner it is unwilling to disclose at this time due to competitive concerns.
Bobcat has two independent, dual ported integer scheduler. One feeds two ALUs (one of which can perform integer multiplies) while the other feeds two AGUs (one for loads and one for stores).
The FPU has a single dual ported scheduler that feeds two independent FPUs. Similar to the Atom processor, only one of the ports can handle floating point multiplies. The FP mul and add units can perform two single precision (32-bit) multiplies/adds per cycle. Like the integer side, the FPU uses a physical register file to reduce power.
Bobcat supports SSE1-3, with future versions adding more instructions as necessary.
Bobcat supports out of order loads and stores similar to Intel’s Core architecture as well.
The Bobcat core has a 3-cycle 64KB L1 (32KB instruction + 32KB data cache) that’s 8-way set associative. The L2 cache is a 17-cycle, 512KB 16-way set associative cache. I originally measured Atom’s L1 and L2 at 3 and 18 cycles respectively (I’ve heard numbers as low as 15 for Atom’s L2) so AMD is definitely in the right ballpark here.
Intel's Atom Microarchitecture
Unlike the original Atom, Bobcat will never ship as a standalone microprocessor. Instead it will be integrated with other cores and a GPU and sold as a single SoC. The first incarnation of Bobcat will be a processor due out in early 2011 for netbooks and thin and light notebooks called Ontario. Ontario will integrate two Bobcat cores with an AMD GPU manufactured on TSMC’s 40nm process (Bobcat will be the first x86 core made at TSMC). This will be the first Fusion product to hit the market.
Note that there's an on-die memory controller but it's actually housed in between the CPU and GPU in order to equally serve both masters.