CPU & GPU Hardware Analyzed

Although Microsoft did its best to minimize AMD’s role in all of this, the Xbox One features a semi-custom 28nm APU designed with AMD. If this sounds familiar, it’s because the strategy is very similar to the one Sony employed for the PS4’s silicon.

The phrase semi-custom comes from the fact that AMD is leveraging much of its already developed IP for the SoC. On the CPU front we have two Jaguar compute units, each with four independent processor cores and a shared 2MB L2 cache. The combination of the two gives the Xbox One its 8-core CPU. This is the same basic layout as the PS4’s SoC.

If you’re not familiar with it, Jaguar is the follow-on to AMD’s Bobcat core - think of it as AMD’s answer to the Intel Atom. Jaguar is a 2-issue, out-of-order (OoO) architecture with roughly 20% higher IPC than Bobcat thanks to a number of tweaks. In ARM terms we’re talking about something that’s faster than a Cortex A15. I expect Jaguar to be close to but likely fall behind Intel’s Silvermont, at least at the highest shipping frequencies. Jaguar is the foundation of AMD’s Kabini and Temash APUs, where it will ship first. I’ll have a deeper architectural look at Jaguar later this week. Update: It's live!

Inside the Xbox One, courtesy Wired

There’s no word on clock speed, but Jaguar at 28nm is good for up to 2GHz depending on thermal headroom. Current rumors point to both the PS4 and Xbox One running their Jaguar cores at 1.6GHz, which sounds about right. In terms of TDP, on the CPU side you’re likely looking at around 30W with all cores fully loaded.

The move away from PowerPC to 64-bit x86 cores means the One breaks backwards compatibility with all Xbox 360 titles. Microsoft won’t be pursuing any sort of backwards compatibility strategy, although if a game developer wanted to, it could port an older title to the new console. Interestingly enough, the first Xbox was also an x86 design - from a hardware/ISA standpoint the new Xbox One is backwards compatible with its grandfather, although Microsoft would have to enable that as a feature in software - something that’s quite unlikely.

Microsoft Xbox One vs. Sony PlayStation 4 Spec Comparison

|                           | Xbox 360            | Xbox One         | PlayStation 4     |
|---------------------------|---------------------|------------------|-------------------|
| CPU Cores/Threads         | 3/6                 | 8/8              | 8/8               |
| CPU Frequency             | 3.2GHz              | 1.6GHz (est)     | 1.6GHz (est)      |
| CPU µArch                 | IBM PowerPC         | AMD Jaguar       | AMD Jaguar        |
| Shared L2 Cache           | 1MB                 | 2 x 2MB          | 2 x 2MB           |
| GPU Cores                 | -                   | 768              | 1152              |
| Peak Shader Throughput    | 0.24 TFLOPS         | 1.23 TFLOPS      | 1.84 TFLOPS       |
| Embedded Memory           | 10MB eDRAM          | 32MB eSRAM       | -                 |
| Embedded Memory Bandwidth | 32GB/s              | 102GB/s          | -                 |
| System Memory             | 512MB 1400MHz GDDR3 | 8GB 2133MHz DDR3 | 8GB 5500MHz GDDR5 |
| System Memory Bus         | 128-bit             | 256-bit          | 256-bit           |
| System Memory Bandwidth   | 22.4 GB/s           | 68.3 GB/s        | 176.0 GB/s        |
| Manufacturing Process     | -                   | 28nm             | 28nm              |
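
The system memory bandwidth figures in the table fall straight out of bus width and effective data rate. A quick sanity check (a sketch of my own, using the table’s numbers):

```python
# Peak bandwidth = effective data rate (MT/s) x bus width (bytes per transfer).
def peak_bandwidth_gb_s(mega_transfers, bus_width_bits):
    """Peak theoretical memory bandwidth in GB/s."""
    return mega_transfers * 1e6 * (bus_width_bits / 8) / 1e9

print(peak_bandwidth_gb_s(1400, 128))  # Xbox 360 GDDR3: 22.4 GB/s
print(peak_bandwidth_gb_s(2133, 256))  # Xbox One DDR3:  ~68.3 GB/s
print(peak_bandwidth_gb_s(5500, 256))  # PS4 GDDR5:      176.0 GB/s
```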

On the graphics side it’s once again obvious that Microsoft and Sony are shopping at the same store, as the Xbox One’s SoC integrates an AMD GCN based GPU. Here’s where things start to get a bit controversial. Sony opted for an 18 Compute Unit GCN configuration, totaling 1152 shader processors/cores/ALUs. Microsoft went for a far smaller configuration: 768 shader processors (12 CUs).

Microsoft can’t make up the difference in clock speed alone (AMD’s GCN seems to top out around 1GHz on 28nm), and based on current leaks it looks like both MS and Sony are running their GPUs at the same 800MHz clock. The result is a 33% reduction in compute power, from 1.84 TFLOPS in the PS4 to 1.23 TFLOPS in the Xbox One. We’re still talking about over 5x the peak theoretical shader performance of the Xbox 360, likely even more given increases in efficiency thanks to AMD’s scalar GCN architecture (MS quotes up to 8x better GPU performance) - but there’s no escaping the fact that Microsoft has given the Xbox One less GPU hardware than Sony gave the PlayStation 4. Note that unlike the Xbox 360 vs. PS3 era, Sony's hardware advantage here won't require any clever developer work to extract - the architectures are near identical, Sony just has more resources available to use.
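
Those peak compute numbers are easy to verify: each GCN ALU executes one fused multiply-add (two floating point operations) per clock. A quick sketch, assuming the rumored 800MHz clock:

```python
# Peak shader throughput = ALUs x 2 FLOPs per clock (one FMA) x clock speed.
def peak_tflops(alus, clock_mhz):
    return alus * 2 * clock_mhz * 1e6 / 1e12

print(peak_tflops(768, 800))   # Xbox One, 12 CUs: ~1.23 TFLOPS
print(peak_tflops(1152, 800))  # PS4, 18 CUs:      ~1.84 TFLOPS
```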

Remember all of my talk earlier about a slight pivot in strategy? Microsoft seems to believe that throwing as much power as possible at the next Xbox wasn’t the key to success and its silicon choices reflect that.

Comments

  • JDG1980 - Wednesday, May 22, 2013 - link

    In terms of single-threaded performance *per clock*, Thuban > Piledriver. Sure, if you crank up the clock rate *and the heat and power consumption* on Piledriver, you can barely edge out Deneb and Thuban on single-threaded benchmarks. But if you clock them the same, the Thuban uses less power, generates less heat, and performs better. Tom's Hardware once ran a similar test with Netburst vs Pentium M, and the conclusion was quite blunt: the test called into question the P4's "right to exist". The same is true of the Bulldozer/Piledriver line.
    And I don't buy the argument that K10 is too old to be fixable. Remember that Ivy Bridge and Haswell are part of a line stretching all the way back to the original Pentium Pro. The one time Intel tried a clean break with the past (Netburst), it was an utter failure. The same is true of AMD's excavation equipment line, and for the same reason - IPC is terrible, so the only way to get acceptable performance is to crank up clock rate, power, noise, and thermals.
  • silverblue - Wednesday, May 22, 2013 - link

    It's true that K10 is generally more effective per clock, but look at it this way - AMD believed the third AGU was unnecessary as it was barely used, much like when VLIW4 took over from VLIW5 because the average slot utilisation within a streaming processor was only 3.4 out of 5. Put simply, they made trade-offs where it made sense to make them. Additionally, K10 was most likely hampered by its 3-issue front end, but it also lacked a whole load of ISA extensions - SSE4.1 and SSE4.2 are good examples.

    Thuban compares well with the FX-8150 in most cases and favourably so when we're considering lighter workloads. The work done to rectify some of Bulldozer's ills shows that Piledriver is not only about 7% faster per clock, but can clock higher within the same power envelope. AMD was obviously aiming for more performance within a given TDP. The FX-83xx series is out of reach of Thuban in terms of performance.

    The 6300 compares with the 1100T BE as such:

    http://www.cpu-world.com/Compare/316/AMD_FX-Series...

    Oddly, one of your arguments for having a Thuban in the first place was power consumption. The very reason a Thuban isn't clocked as high as the top X4s is to keep power consumption in check. Those six cores perform very admirably against even a 2600K in some circumstances, and with Bulldozer and Piledriver you'd generally look to the FX-8xxx CPUs if comparing against Thuban. However, I expect the FX-6350 will be just enough to edge out the 1100T BE in pretty much any area:

    http://www.cpu-world.com/Compare/321/AMD_FX-Series...

    The two main issues with the current "excavation equipment line", as you put it, are a lack of single-threaded power, plus the inherent inability to switch between threads more than once per clock - clocking Bulldozer high may offset the latter in some way, but at the expense of power usage. The very fact that Steamroller fixes the latter with some work done to help the former, and that Excavator improves IPC whilst (supposedly) significantly reducing power consumption, should be evidence enough that whilst it started off bad, AMD truly believes it will get better. In any case, how much juice does anybody expect eight cores to use at 4GHz with a shedload of cache? Does anybody remember how hungry Nehalem was, let alone the P4?

    I doubt that Jaguar could come anywhere near even a downclocked A10-4600M. The latter has a high-speed dual channel architecture and a 4-issue front end; to be perfectly honest, I think that even with its faults, it would easily beat Jaguar at the same clock speed.

    Tacking bits onto K10 is a lost cause. AMD doesn't have the money, and even if it did, Bulldozer isn't actually a bad idea. Give them a chance - how much faster was Phenom II over the original Phenom once AMD worked on the problem for a year?
  • Shadowmaster625 - Wednesday, May 22, 2013 - link

    Yeah, but AMD would not have stood still with K10. Look at how much faster Regor is compared to the previous Athlon:

    http://www.anandtech.com/bench/Product/121?vs=27

    The previous Athlon had a higher clock speed and the same amount of cache, but Regor crushes it by almost 30% in Far Cry 2. It is 10% faster across the board despite being lower clocked and consuming far less power. Had they continued with Thuban, it is possible they would have continued to squeeze 10% per year out of it as well as reduce power consumption by 15%, which, if you do the math (see the sketch below), leaves us with something relatively competitive today. Not to mention they would have saved a LOT of money. They could have easily added AVX or any other extensions to it.
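
To put rough numbers on that back-of-the-envelope claim (using the commenter's hypothetical 10%/year performance and 15%/year power figures, not measured data):

```python
# Compound the hypothetical yearly gains of a continued K10 line:
# +10% performance and -15% power per year (pure speculation from the comment).
years = 3  # e.g. Thuban (2010) through 2013
performance = 1.10 ** years
power = 0.85 ** years
print(f"After {years} years: {performance:.2f}x performance at {power:.2f}x power")
# -> After 3 years: 1.33x performance at 0.61x power
```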
  • Hubb1e - Wednesday, May 22, 2013 - link

    Per clock, Thuban > Piledriver, but power consumption favors Piledriver. Compare two chips of similar performance: the PhII 965 is a 125W CPU and the FX4300 is a 95W CPU, yet they perform similarly, with the FX4300 actually beating the PhII by a small margin.
  • kyuu - Wednesday, May 22, 2013 - link

    ... Lol? You can't simply clock a low-power architecture up to 4GHz. Even if you could, a 4GHz Jaguar-based CPU would still be slower than a 4GHz Piledriver-based one.

    Jaguar is a low-power architecture. It's not able (or meant to) compete with full-power CPUs in raw processing power. It's being used in the Xbox One and PS4 for two reasons: power efficiency, and cost. It's not because of its processing power (although it's still a big step up from the CPUs in the 360/PS3).
  • plcn - Wednesday, May 22, 2013 - link

    BD/PD have plenty of viability in big power envelope, big/liquid cooler, desktop PC arrangements. consoles aspire to be much quieter, cooler, energy efficient - thus the sensible jaguar selection. even the best ITX gaming builds out there are still quite massive and relatively unsightly vs what seems achievable with jaguar... now for laptops on the other hand, a dual jaguar 'netbook' could be very very interesting. you can probably cook your eggs on it, too, but still interesting..
  • lmcd - Wednesday, May 22, 2013 - link

    It isn't a step in the right direction in IPC. Piledriver is 40% faster than Jaguar at the same clock and also clocks higher.

    Stop spreading the FUD about Piledriver -- my A8-4500m is a very solid processor with very strong graphics performance and excellent CPU performance for all but the most taxing tasks.
  • lightsout565 - Wednesday, May 22, 2013 - link

    Pardon my ignorance, but what is the "Embedded Memory" used for?
  • tipoo - Wednesday, May 22, 2013 - link

    It's a fast memory pool for the GPU. It could help by holding the framebuffer or caching textures, etc.
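
For scale, a rough sketch (my own numbers, assuming 32-bit color and depth formats) of how 1080p render targets fit in the Xbox One's 32MB eSRAM:

```python
# Size of a 1080p render target at 4 bytes per pixel (32-bit color or depth).
WIDTH, HEIGHT, BYTES_PER_PIXEL = 1920, 1080, 4

buffer_mb = WIDTH * HEIGHT * BYTES_PER_PIXEL / 2**20
print(f"One 1080p buffer: {buffer_mb:.1f} MB")      # ~7.9 MB
print(f"Color + depth:    {2 * buffer_mb:.1f} MB")  # ~15.8 MB of the 32 MB eSRAM
```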
  • BSMonitor - Wednesday, May 22, 2013 - link

    Embedded memory latency is MUCH closer to L1/L2 cache latency than system memory latency. System memory is Brian and Stewie taking the airline to Vegas, while cache/embedded memory is the teleporter to Vegas...
