CPU & GPU Hardware Analyzed

Although Microsoft did its best to minimize AMD’s role in all of this, the Xbox One features a semi-custom 28nm APU designed with AMD. If this sounds familiar, it’s because the strategy is very similar to the one Sony employed for the PS4’s silicon.

The phrase “semi-custom” comes from the fact that AMD is leveraging much of its already developed IP for the SoC. On the CPU front we have two Jaguar compute units, each with four independent processor cores and a shared 2MB L2 cache. Together, the two give the Xbox One its 8-core CPU. This is the same basic layout as the PS4’s SoC.

If you’re not familiar with it, Jaguar is the follow-on to AMD’s Bobcat core; think of it as AMD’s answer to Intel’s Atom. Jaguar is a 2-issue out-of-order (OoO) architecture with roughly 20% higher IPC than Bobcat, thanks to a number of tweaks. In ARM terms we’re talking about something that’s faster than a Cortex A15. I expect Jaguar to come close to, but likely fall behind, Intel’s Silvermont, at least at the highest shipping frequencies. Jaguar is the foundation of AMD’s Kabini and Temash APUs, where it will ship first. I’ll have a deeper architectural look at Jaguar later this week. Update: It's live!

Inside the Xbox One, courtesy Wired

There’s no word on clock speed, but Jaguar at 28nm is good for up to 2GHz depending on thermal headroom. Current rumors point to both the PS4 and Xbox One running their Jaguar cores at 1.6GHz, which sounds about right. In terms of TDP, on the CPU side you’re likely looking at 30W with all cores fully loaded.

The move away from PowerPC to 64-bit x86 cores means the One breaks backwards compatibility with all Xbox 360 titles. Microsoft won’t be pursuing any sort of backwards compatibility strategy, although a game developer that wanted to could port an older title to the new console. Interestingly enough, the first Xbox was also an x86 design; from a hardware/ISA standpoint the new Xbox One is backwards compatible with its grandfather, although Microsoft would have to enable that as a feature in software, something that’s quite unlikely.

Microsoft Xbox One vs. Sony PlayStation 4 Spec Comparison

|                           | Xbox 360            | Xbox One          | PlayStation 4     |
|---------------------------|---------------------|-------------------|-------------------|
| CPU Cores/Threads         | 3/6                 | 8/8               | 8/8               |
| CPU Frequency             | 3.2GHz              | 1.6GHz (est.)     | 1.6GHz (est.)     |
| CPU µArch                 | IBM PowerPC         | AMD Jaguar        | AMD Jaguar        |
| Shared L2 Cache           | 1MB                 | 2 x 2MB           | 2 x 2MB           |
| GPU Cores                 |                     | 768               | 1152              |
| Peak Shader Throughput    | 0.24 TFLOPS         | 1.23 TFLOPS       | 1.84 TFLOPS       |
| Embedded Memory           | 10MB eDRAM          | 32MB eSRAM        | -                 |
| Embedded Memory Bandwidth | 32GB/s              | 102GB/s           | -                 |
| System Memory             | 512MB 1400MHz GDDR3 | 8GB 2133MHz DDR3  | 8GB 5500MHz GDDR5 |
| System Memory Bus         | 128-bit             | 256-bit           | 256-bit           |
| System Memory Bandwidth   | 22.4GB/s            | 68.3GB/s          | 176.0GB/s         |
| Manufacturing Process     |                     | 28nm              | 28nm              |
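
As a rough sanity check on the memory bandwidth rows, peak bandwidth is simply bus width (in bytes) multiplied by the effective data rate. The sketch below assumes the MHz figures in the table are effective transfer rates (MT/s) and uses 1GB = 10^9 bytes; the function name is purely illustrative. The eSRAM line rests on a further assumption not made in the article: that the 102GB/s figure comes from a 1024-bit path running at the rumored 800MHz GPU clock.

```python
# Sketch: peak bandwidth = (bus width / 8 bytes) x effective transfer rate.
# Assumption: the table's "MHz" memory figures are effective data rates (MT/s).

def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mts: float) -> float:
    """Peak bandwidth in GB/s, using 1 GB = 10**9 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mts * 1e6 / 1e9


system_memory = {
    "Xbox 360 (GDDR3)": (128, 1400),
    "Xbox One (DDR3)": (256, 2133),
    "PlayStation 4 (GDDR5)": (256, 5500),
}

for name, (width_bits, rate_mts) in system_memory.items():
    print(f"{name}: {peak_bandwidth_gbs(width_bits, rate_mts):.1f} GB/s")

# Assumption (not stated in the article): a 1024-bit eSRAM path clocked at
# the rumored 800MHz GPU clock would account for the ~102GB/s eSRAM figure.
print(f"Xbox One eSRAM: {peak_bandwidth_gbs(1024, 800):.1f} GB/s")
```

The three system memory results round to the table's 22.4, 68.3 and 176.0GB/s, which shows the PS4’s advantage comes entirely from GDDR5’s much higher transfer rate on the same 256-bit bus width.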

On the graphics side it’s once again obvious that Microsoft and Sony are shopping at the same store, as the Xbox One’s SoC integrates an AMD GCN-based GPU. Here’s where things start to get a bit controversial. Sony opted for an 18 Compute Unit (CU) GCN configuration, totaling 1152 shader processors/cores/ALUs. Microsoft went for a far smaller configuration: 768 ALUs (12 CUs).
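
The CU and ALU counts are two views of the same number: each GCN Compute Unit contains four 16-wide SIMD units, or 64 ALUs. A minimal sketch of that conversion (the helper name is hypothetical):

```python
# Each GCN Compute Unit (CU) has four SIMD units, each 16 lanes wide.
SIMDS_PER_CU = 4
LANES_PER_SIMD = 16
ALUS_PER_CU = SIMDS_PER_CU * LANES_PER_SIMD  # 64 shader ALUs per CU


def gcn_alu_count(compute_units: int) -> int:
    """Total shader ALUs for a GCN GPU with the given number of CUs."""
    return compute_units * ALUS_PER_CU


print("Xbox One:", gcn_alu_count(12))       # 12 CUs
print("PlayStation 4:", gcn_alu_count(18))  # 18 CUs
```

Twelve CUs yield 768 ALUs and eighteen yield 1152, so the six-CU gap translates directly into 384 fewer shader ALUs for the Xbox One.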

Microsoft can’t make up the difference with clock speed alone (AMD’s GCN seems to top out around 1GHz on 28nm), and based on current leaks it looks like both MS and Sony are running their GPUs at the same 800MHz clock. The result is a 33% reduction in compute power, from 1.84 TFLOPS in the PS4 to 1.23 TFLOPS in the Xbox One. That is still over 5x the peak theoretical shader performance of the Xbox 360, and likely even more in practice given the efficiency gains of AMD’s scalar GCN architecture (MS quotes up to 8x better GPU performance), but there’s no escaping the fact that Microsoft has given the Xbox One less GPU hardware than Sony gave the PlayStation 4. Note that unlike in the Xbox 360 vs. PS3 era, Sony's hardware advantage here won't require any clever developer work to extract: the architectures are nearly identical, and Sony simply has more resources available.
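
The TFLOPS figures follow from a simple peak-throughput formula: ALUs × 2 floating-point operations per clock (one fused multiply-add) × clock speed. The sketch below assumes the rumored 800MHz GPU clock for both consoles and takes the Xbox 360’s 0.24 TFLOPS straight from the spec table. These are theoretical peaks, not measured performance.

```python
# Peak single-precision throughput for a GCN GPU:
#   ALUs x 2 FLOPs per clock (one fused multiply-add) x clock speed.
FLOPS_PER_ALU_PER_CLOCK = 2


def peak_tflops(alus: int, clock_ghz: float) -> float:
    """Theoretical peak in TFLOPS (ALUs x FLOPs/clock x GHz gives GFLOPS)."""
    return alus * FLOPS_PER_ALU_PER_CLOCK * clock_ghz / 1000


GPU_CLOCK_GHZ = 0.8  # assumption: the rumored 800MHz clock for both consoles

xbox_one = peak_tflops(768, GPU_CLOCK_GHZ)
ps4 = peak_tflops(1152, GPU_CLOCK_GHZ)
xbox_360 = 0.24  # peak figure taken from the spec table

print(f"Xbox One: {xbox_one:.2f} TFLOPS")
print(f"PlayStation 4: {ps4:.2f} TFLOPS")
print(f"Xbox One deficit vs. PS4: {1 - xbox_one / ps4:.0%}")
print(f"Xbox One vs. Xbox 360: {xbox_one / xbox_360:.1f}x")
```

Because both GPUs share the same architecture and (by assumption) the same clock, the 33% deficit is just the ALU ratio 768/1152 = 2/3; clock speed is the only other lever, which is why matching clocks leaves the gap intact.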

Remember all of my talk earlier about a slight pivot in strategy? Microsoft seems to believe that throwing as much power as possible at the next Xbox wasn’t the key to success, and its silicon choices reflect that.

244 Comments


  • Niabureth - Wednesday, May 29, 2013

    And just how do you expect them to do that? Decisions on what hardware to use were made a lot earlier than Sony's PS4 presentation, meaning that train has already left the station. I'm guessing AMD is mass-producing the hardware by now. Microsoft: Oh, we saw that Sony is going for a much more powerful architecture, and we don't want any of the millions of APUs you've just produced for us!
  • JDG1980 - Wednesday, May 22, 2013

    If AMD is using Jaguar here, isn't that basically an admission that Bulldozer/Piledriver is junk, at least for gaming/desktop usage? Why don't they use a scaled-up Jaguar in their desktop APUs instead of Piledriver? The only thing Bulldozer/Piledriver seems to be good for is very heavily threaded loads - i.e. servers. Most desktop users are well served by even 4 cores, and it looks like they've already scaled Jaguar to 8. And AMD is getting absolutely killed on the IPC front on the desktop - if Jaguar is a step in the right direction then by all means it should be taken. BD/PD is a sunk cost, it should be written off, or restricted to Opterons only.
  • tipoo - Wednesday, May 22, 2013

    Too big.
  • Slaimus - Wednesday, May 22, 2013

    Bulldozer/Piledriver needs SOI. Steamroller is not ready yet, and it is not portable outside of GlobalFoundries' gate-first 28nm process. Jaguar is bulk 28nm and gate-last, which can be made by TSMC in large quantities at lower cost per wafer.
  • JDG1980 - Wednesday, May 22, 2013

    All the more reason for AMD to switch to Jaguar in their mass-market CPUs and APUs.
    I'd be willing to bet money that a 4-core Jaguar clocked up to 3 GHz would handily beat a 4-module ("8-core") Piledriver clocked to 4 GHz. BD/PD is AMD's Netburst, a total FAIL of an architecture that needs to be dropped before it takes the whole company down with it.
  • Exophase - Wednesday, May 22, 2013

    Jaguar can't be clocked at 3GHz; 2GHz is closer to the hard limit as far as we currently know. It's clock-limited by design: just look at the clock latency of its FPU operations. IPC is at best similar to Piledriver (in practice probably a little worse), so in tasks heavily limited by single-threaded performance Jaguar will do much worse. Consoles can tolerate limited single-threaded performance to some extent, but PCs can't.
  • Spunjji - Wednesday, May 22, 2013

    It's effectively a low-power optimised Athlon 64 with added bits, so it's not going to scale any higher than Phenom did, and that already ran out of steam on the desktop. Bulldozer/Piledriver may not have been the knockout blow AMD needed, but they're scaling better than die-shrinking the same architecture yet again would have.
  • JDG1980 - Wednesday, May 22, 2013

    Bobcat/Jaguar is a new architecture specifically designed for low-power usage. It's not the same as the K10 design, though it wouldn't surprise me if they did share some parts.
    And even just keeping K10 with tweaks and die-shrinks would have worked better on the desktop than the Faildozer series. Phenom II X6 1100T was made on an outdated 45nm process, and still beat the top 32nm Bulldozer in most benchmarks. A die-shrink to 28nm would not only be much cheaper to manufacture per chip than Bulldozer/Piledriver, but would perform better as well. It's only pride and the refusal to admit sunk costs that has kept AMD on their trail of fail.
  • kyuu - Wednesday, May 22, 2013

    That's a nice bit of FUD there. K10 had pretty much been pushed as far as it was going to go. Die-shrinking and tweaking it was not going to cut it. AMD needed a new architecture.

    Piledriver already handily surpasses K10 in every metric, including single-threaded performance.
  • JDG1980 - Wednesday, May 22, 2013

    In terms of single-threaded performance *per clock*, Thuban > Piledriver. Sure, if you crank up the clock rate *and the heat and power consumption* on Piledriver, you can barely edge out Deneb and Thuban on single-threaded benchmarks. But if you clock them the same, the Thuban uses less power, generates less heat, and performs better. Tom's Hardware once ran a similar test with Netburst vs. Pentium M, and the author's conclusion was quite blunt: the test called into question the P4's "right to exist". The same is true of the Bulldozer/Piledriver line.
    And I don't buy the argument that K10 is too old to be fixable. Remember that Ivy Bridge and Haswell are part of a line stretching all the way back to the original Pentium Pro. The one time Intel tried a clean break with the past (Netburst), it was an utter fail. The same is true of AMD's excavation equipment line, and for the same reason: IPC is terrible, so the only way to get acceptable performance is to crank up clock rate, power, noise, and thermals.
