Performance - An Update

The Chipworks PS4 teardown last week told us a lot about how the Xbox One and PlayStation 4 compare in terms of hardware. It turns out that Microsoft's silicon budget was actually a little larger than Sony's, at least for the main APU. The Xbox One APU is a 363mm^2 die, compared to 348mm^2 for the PS4's APU. Both use a similar 8-core Jaguar CPU (2 x quad-core islands), but they feature different implementations of AMD's Graphics Core Next GPU. Microsoft elected to implement 12 compute units, two geometry engines and 16 ROPs, while Sony went for 18 CUs, two geometry engines and 32 ROPs. How did Sony manage to fit more compute and ROP hardware into a smaller die? By not including any eSRAM on-die.

Both APUs implement a 256-bit wide memory interface, but Sony chose GDDR5 memory running at a 5.5GHz data rate while Microsoft stuck with more conventionally available DDR3 running at less than half that speed (2133MHz data rate). To make up for the bandwidth deficit, Microsoft included 32MB of eSRAM on its APU to alleviate some of the GPU's bandwidth needs. The eSRAM is accessible in 8MB chunks and offers a total of 204GB/s of bandwidth (102GB/s in each direction). The eSRAM is designed for GPU access only; CPU access requires a copy to main memory.
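As a quick sanity check, the headline system memory numbers fall straight out of bus width times data rate; a short sketch of the arithmetic using the figures in this article:

```python
# Peak bandwidth = (bus width in bytes) x (data rate in MT/s).
# Interface widths and data rates are the ones quoted in the article.

def bandwidth_gb_s(bus_bits: int, data_rate_mt_s: float) -> float:
    """Peak bandwidth in GB/s for a DDR-style interface."""
    return (bus_bits / 8) * data_rate_mt_s / 1000

xbox_one_ddr3 = bandwidth_gb_s(256, 2133)   # ~68.3 GB/s
ps4_gddr5     = bandwidth_gb_s(256, 5500)   # ~176.0 GB/s
esram_total   = 102 * 2                     # 102GB/s each direction, 204GB/s aggregate

print(f"Xbox One DDR3:  {xbox_one_ddr3:.1f} GB/s")
print(f"PS4 GDDR5:      {ps4_gddr5:.1f} GB/s")
print(f"Xbox One eSRAM: {esram_total} GB/s aggregate")
```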

Unlike Intel's Crystalwell, the eSRAM isn't a cache - instead it's mapped to a specific address range in memory. And unlike the embedded DRAM in the Xbox 360, the eSRAM in the One can hold more than just a render target or Z-buffer. Virtually any type of GPU accessible surface/buffer can now be stored in eSRAM (e.g. z-buffer, G-buffer, stencil buffers, shadow buffer, etc.). Developers can also choose to keep other data there, like frequently used textures; nothing dictates that the eSRAM hold one of these buffer types, just whatever the developer finds most important. It's also possible for a single surface to be split between main memory and eSRAM.
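To make the distinction concrete, here is a minimal sketch of what an address-range-mapped scratchpad implies for developers: placement is explicit rather than managed by cache hardware, and a surface that doesn't fit can straddle both pools. The placement helper below is hypothetical, purely for illustration - it is not the actual Xbox One SDK API.

```python
# Hypothetical illustration (not the real Xbox One SDK): the developer decides
# which surfaces live in the 32MB eSRAM window and which live in DDR3, and a
# single surface can be split across the two pools.

ESRAM_SIZE = 32 * 1024 * 1024  # 32MB, mapped to a fixed GPU address range

class SurfacePlacement:
    def __init__(self):
        self.esram_free = ESRAM_SIZE

    def place(self, name: str, size: int):
        """Put as much of the surface as fits into eSRAM, spill the rest to DDR3."""
        in_esram = min(size, self.esram_free)
        self.esram_free -= in_esram
        return {"surface": name, "esram_bytes": in_esram, "ddr3_bytes": size - in_esram}

heap = SurfacePlacement()
# A four-target 1080p G-buffer nearly fills the 32MB window, so the depth
# buffer ends up split between eSRAM and main memory.
print(heap.place("g-buffer (1080p, 4 x 32bpp)", 4 * 1920 * 1080 * 4))
print(heap.place("depth/stencil (1080p)", 1920 * 1080 * 4))
```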

Obviously sticking important buffers and other frequently used data here can reduce demands on the memory interface, which should help Microsoft get by with only ~68GB/s of system memory bandwidth. Microsoft has claimed publicly that actual bandwidth to the eSRAM is somewhere in the 140 - 150GB/s range, which is likely about equal to the effective bandwidth (after overhead/efficiency losses) of the PS4's GDDR5 memory interface. The difference is that on the Xbox One you only get that bandwidth to your most frequently used data. It's still not clear to me what effective memory bandwidth looks like on the Xbox One; I suspect it's still a bit lower than on the PS4, but after talking with Ryan Smith (AT's Senior GPU Editor) I'm now wondering if memory bandwidth isn't really the issue here.

Microsoft Xbox One vs. Sony PlayStation 4 Spec Comparison

|                           | Xbox 360            | Xbox One                               | PlayStation 4     |
|---------------------------|---------------------|----------------------------------------|-------------------|
| CPU Cores/Threads         | 3/6                 | 8/8                                    | 8/8               |
| CPU Frequency             | 3.2GHz              | 1.75GHz                                | 1.6GHz            |
| CPU µArch                 | IBM PowerPC         | AMD Jaguar                             | AMD Jaguar        |
| Shared L2 Cache           | 1MB                 | 2 x 2MB                                | 2 x 2MB           |
| GPU Cores                 | -                   | 768                                    | 1152              |
| GCN Geometry Engines      | -                   | 2                                      | 2                 |
| GCN ROPs                  | -                   | 16                                     | 32                |
| GPU Frequency             | -                   | 853MHz                                 | 800MHz            |
| Peak Shader Throughput    | 0.24 TFLOPS         | 1.31 TFLOPS                            | 1.84 TFLOPS       |
| Embedded Memory           | 10MB eDRAM          | 32MB eSRAM                             | -                 |
| Embedded Memory Bandwidth | 32GB/s              | 102GB/s bi-directional (204GB/s total) | -                 |
| System Memory             | 512MB 1400MHz GDDR3 | 8GB 2133MHz DDR3                       | 8GB 5500MHz GDDR5 |
| System Memory Bus         | 128-bit             | 256-bit                                | 256-bit           |
| System Memory Bandwidth   | 22.4 GB/s           | 68.3 GB/s                              | 176.0 GB/s        |
| Manufacturing Process     | -                   | 28nm                                   | 28nm              |
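The peak shader throughput figures in the table follow directly from the GCN configuration: 64 ALUs per CU, each capable of one fused multiply-add (two FLOPs) per clock.

```python
# Peak GCN shader throughput = CUs x 64 ALUs x 2 FLOPs x clock.

def gcn_tflops(cus: int, clock_ghz: float) -> float:
    return cus * 64 * 2 * clock_ghz / 1000

print(f"Xbox One: {gcn_tflops(12, 0.853):.2f} TFLOPS")  # ~1.31
print(f"PS4:      {gcn_tflops(18, 0.800):.2f} TFLOPS")  # ~1.84
```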

In order to accommodate the eSRAM on die, Microsoft not only had to move to a 12 CU GPU configuration, it also dropped down to 16 ROPs (half of the PS4's count). The ROPs (render outputs/raster operations pipes) are responsible for final pixel output, and at the resolutions these consoles target, 16 ROPs makes the Xbox One the odd man out compared to PC GPUs. AMD's GPUs targeting 1080p typically ship with 32 ROPs, which is where the PS4 sits, but the Xbox One ships with half that. The difference in raw shader performance (12 CUs vs. 18 CUs) can definitely show up in games that run more complex lighting routines and other long shader programs on each pixel, but the more recent reports of resolution differences between Xbox One and PS4 games at launch are likely the result of being ROP bound on the One. This is probably why Microsoft claimed it saw a bigger increase in realized performance from raising the GPU clock from 800MHz to 853MHz than from adding two extra CUs. The ROPs run at the GPU clock, so in a ROP bound scenario an increase in GPU clock buys more performance than additional compute hardware.
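A rough sketch of the fill-rate math behind that argument (peak numbers only, ignoring bandwidth and blending limits):

```python
# Peak pixel fill rate scales with ROP count x GPU clock, so the 53MHz clock
# bump helps ROP-bound workloads in a way that two extra CUs would not have.

def fill_rate_gpix(rops: int, clock_ghz: float) -> float:
    return rops * clock_ghz

xbox_800 = fill_rate_gpix(16, 0.800)   # 12.8 Gpixels/s (pre-upclock)
xbox_853 = fill_rate_gpix(16, 0.853)   # ~13.6 Gpixels/s
ps4      = fill_rate_gpix(32, 0.800)   # 25.6 Gpixels/s

print(f"Clock bump gain for ROP-bound work: {xbox_853 / xbox_800 - 1:.1%}")  # ~6.6%
# Enabling two extra CUs (12 -> 14) would add ~16.7% shader throughput but 0% fill rate.
print(f"PS4 fill-rate advantage: {ps4 / xbox_853:.2f}x")                     # ~1.88x
```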

The PS4's APU - Courtesy Chipworks

Microsoft's admission that the Xbox One dev kits have 14 CUs does make me wonder what the Xbox One die looks like. Chipworks found that the PS4's APU actually features 20 CUs, despite only exposing 18 to game developers. I suspect those last two are there for defect mitigation, to increase effective yields in the case of bad CUs, and I wonder if the same isn't true for the Xbox One.
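A simple binomial yield model shows why carrying two spare CUs is worth the area. The 5% per-CU defect probability below is purely an assumption for illustration, not a figure disclosed by AMD, Sony or Microsoft.

```python
# Illustrative-only yield math for CU redundancy.
from math import comb

def yield_with_spares(physical: int, needed: int, p_defect: float) -> float:
    """Probability that at least `needed` of `physical` CUs are defect-free."""
    max_bad = physical - needed
    return sum(comb(physical, k) * p_defect**k * (1 - p_defect)**(physical - k)
               for k in range(max_bad + 1))

p = 0.05  # assumed per-CU defect probability, for the sake of the example
print(f"All 18 of 18 CUs good:      {yield_with_spares(18, 18, p):.1%}")  # ~39.7%
print(f"At least 18 of 20 CUs good: {yield_with_spares(20, 18, p):.1%}")  # ~92.5%
```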

At the end of the day Microsoft appears to have ended up with its GPU configuration not for silicon cost reasons, but for platform power/cost and component availability reasons. Sourcing DDR3 is much easier than sourcing high density GDDR5. Sony obviously managed to launch with a ton of GDDR5 just fine, but I can definitely understand why Microsoft would be hesitant to go down that route in the planning stages of the Xbox One. To put some numbers in perspective, Sony has shipped 1 million PS4s thus far. That's 16 million GDDR5 chips, or 7.6 petabytes of RAM. Had both Sony and Microsoft tried to do this, I do wonder if GDDR5 supply would've become a problem. That's a ton of RAM in a very short period of time. The only other major consumer of GDDR5 is video cards, and the list of cards sold in the last couple of months that would ever use that much RAM is a narrow one.
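The supply math checks out; a quick sketch assuming 16 x 4Gb (512MB) GDDR5 chips per 8GB console, with the 7.6 figure being petabytes in the binary sense:

```python
# GDDR5 supply back-of-the-envelope for 1 million PS4s.
consoles    = 1_000_000
chips_each  = 16                      # 16 x 4Gb (512MB) chips make up each 8GB PS4
total_chips = consoles * chips_each   # 16 million chips
total_bytes = consoles * 8 * 1024**3  # 8GB per console

print(f"{total_chips / 1e6:.0f} million chips, "
      f"{total_bytes / 1024**5:.1f} PiB of GDDR5")  # 16 million chips, ~7.6 PiB
```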

Microsoft will obviously have an easier time scaling its platform down over the years (eSRAM should shrink nicely at smaller geometry processes), but that’s not a concern to the end user unless Microsoft chooses to aggressively pass along cost savings.

Comments

  • kyuu - Wednesday, November 20, 2013 - link

    I don't care. Why should I? The only thing that goes on in my living room is playing games and watching TV. So even in the unlikely event that the Kinect camera is feeding somebody (NSA? Microsoft interns? Who exactly am I supposed to be afraid of again?) a 24/7 feed of my living room and somebody is actually looking at it, big whoop.

    I'm not planning on purchasing either console, btw. Just irritated by the tin-foil hat brigade pretending it's reasonable to be scared by the Kinect.
  • kyuu - Wednesday, November 20, 2013 - link

    Oh, and not to mention that if that is actually taking place, it'll be found out pretty quickly and there'll be a huge backlash against Microsoft. The huge potential for negative press and lost sales for absolutely no gain makes me pretty sure it's not going on, though.
  • prophet001 - Thursday, November 21, 2013 - link

    How sad.

    Microsoft, Google, Sony, and any other corporation out there have absolutely zero right to my privacy. Whether I am or am not doing anything "wrong." You my friend will not know what you've lost until it is truly gone.
  • mikato - Monday, November 25, 2013 - link

    I don't think it will be a problem (see kyuu), but I really disagree with your "nothing to hide" attitude.
    http://en.wikipedia.org/wiki/Nothing_to_hide_argum...
  • Floew - Wednesday, November 20, 2013 - link

    I recently built a Steam box. With a 360 controller/wireless adapter and Steam Big Picture set to launch on startup, it's a surprisingly console-like experience. Works much better than I had expected, frankly. My motivation to plunk down cash for the new consoles is now very low.
  • Quidam67 - Wednesday, November 20, 2013 - link

    Anand, just wondering if the Xbox One controller works with a Windows based PC (as per the 360 controller)? Would be great if you could try that out and let us know :)
  • The Von Matrices - Wednesday, November 20, 2013 - link

    The wireless XBOX 360 controller required a special USB receiver to work with a PC, and that took a few years to be released. I don't know if XBOX One controllers are compatible with the 360 wireless controller receiver or if a new one is required. I actually liked the wired XBOX 360 controller for certain PC games, and I'm curious to know if Microsoft will make wired XBOX One controllers.
  • Quidam67 - Sunday, November 24, 2013 - link

    Targeted to work with PC in 2014 apparently http://www.polygon.com/2013/8/12/4615454/xbox-one-...
  • errorr - Wednesday, November 20, 2013 - link

    There is a lot of discussion about the memory bandwidth issues, but what I want to know is how latency affects the performance picture. The eSRAM latency might be an order of magnitude lower even if its capacity is small. What workloads are latency dependent enough that the Xbox design might have a performance advantage?
  • khanov - Wednesday, November 20, 2013 - link

    It is important to understand that GPUs work in a fundamentally different way to CPUs. The main difference when it comes to memory access is how they deal with latency.

    CPUs require cache to hide memory access latency. If the required instructions/data are not in cache there is a large latency penalty and the CPU core sits there doing nothing useful for hundreds of clock cycles. For this reason CPU designers pay close attention to cache size and design to ensure that cache hit rates stay north of 99% (on any modern CPU).

    GPUs do it differently. Any modern GPU has many thousands of threads in flight at once (even if it has, for example, only 512 shader cores). When a memory access is needed, it is queued up and attended to by the memory controller in a timely fashion, but there is still the latency of hundreds of clock cycles to consider. So what the GPU does is switch to a different group of threads and process those other threads while it waits for the memory access to complete.

    In fact, whenever the needed data is not available, the GPU will switch thread groups so that it can continue to do useful work. If you consider that any given frame of a game contains millions of pixels, and that GPU calculations need to be performed for each and every pixel, then you can see how there would almost always be more threads waiting to switch over to. By switching threads instead of waiting and doing nothing, GPUs effectively hide memory latency very well. But they do it in a completely different way to a CPU.

    Because a GPU has many thousands of threads in flight at once, and each thread group is likely at some point to require some data fetched from memory, the memory bandwidth becomes a much more important factor than memory latency. Latency can be hidden by switching thread groups, but bandwidth constraints limit the overall amount of data that can be processed by the GPU per frame.

    This is, in a nutshell, why all modern pc graphics cards at the mid and high end use GDDR5 on a wide bus. Bandwidth is king for a GPU.

    The Xbox One attempts to offset some of its apparent lack of memory bandwidth by storing frequently used buffers in eSRAM. The eSRAM has a fairly high effective bandwidth, but its size is small. It still remains to be seen how effectively it can be used by talented developers. But you should not worry about its latency. Latency is really not important to the GPU.

    I hope this helps you to understand why everyone goes on and on about bandwidth. Sorry if it is a little long-winded.
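khanov's point about hiding latency through thread-group switching can be illustrated with a toy occupancy model. The 8-cycle ALU burst and 300-cycle memory latency below are made-up figures for the example, not measured numbers for either console.

```python
# Toy model: each thread group issues a short burst of ALU work, then waits
# on memory. With enough groups resident, the SIMD always has work ready.

def utilization(groups: int, alu_cycles: int = 8, mem_latency: int = 300) -> float:
    """Fraction of cycles spent doing useful ALU work in a steady-state loop."""
    work_available = groups * alu_cycles
    period = alu_cycles + mem_latency    # one group's issue time plus its stall
    return min(1.0, work_available / period)

for g in (1, 4, 16, 39, 64):
    print(f"{g:3d} thread groups resident -> {utilization(g):.0%} ALU utilization")
```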
