Performance - An Update

The Chipworks PS4 teardown last week told us a lot about how the Xbox One and PlayStation 4 compare in terms of hardware. It turns out that Microsoft’s silicon budget was actually a little larger than Sony’s, at least for the main APU. The Xbox One APU is a 363mm^2 die, compared to 348mm^2 for the PS4’s APU. Both use a similar 8-core Jaguar CPU (2 x quad-core islands), but they feature different implementations of AMD’s Graphics Core Next GPU. Microsoft elected to implement 12 compute units, two geometry engines and 16 ROPs, while Sony went for 18 CUs, two geometry engines and 32 ROPs. How did Sony manage to fit more compute and ROP partitions into a smaller die? By not including any eSRAM on-die.

While both APUs implement a 256-bit wide memory interface, Sony chose GDDR5 memory running at a 5.5GHz data rate, while Microsoft stuck with more conventionally available DDR3 running at less than half that speed (2133MHz data rate). To make up for the bandwidth deficit, Microsoft included 32MB of eSRAM on its APU to alleviate some of the GPU’s bandwidth needs. The eSRAM is accessible in 8MB chunks and offers a total of 204GB/s of bandwidth (102GB/s in each direction). The eSRAM is designed for GPU access only; CPU access requires a copy to main memory.
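
As a sanity check on those figures, peak interface bandwidth is simply bus width times data rate. The sketch below just plugs the bus widths and data rates mentioned above into that standard formula; it’s an illustration, not anything sourced from Microsoft or Sony beyond those numbers:

```python
# Peak DRAM bandwidth = bus width (bytes) x data rate (transfers per second)
def peak_bandwidth_gbps(bus_bits, data_rate_mtps):
    return (bus_bits / 8) * data_rate_mtps / 1000   # GB/s

print(peak_bandwidth_gbps(256, 2133))   # Xbox One DDR3:  ~68.3 GB/s
print(peak_bandwidth_gbps(256, 5500))   # PS4 GDDR5:      176.0 GB/s
print(peak_bandwidth_gbps(128, 1400))   # Xbox 360 GDDR3:  22.4 GB/s
```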

Unlike Intel’s Crystalwell, the eSRAM isn’t a cache - instead it’s mapped to a specific address range in memory. And unlike the embedded DRAM in the Xbox 360, the eSRAM in the One can hold more than just a render target or Z-buffer. Virtually any GPU accessible surface or buffer can be stored in eSRAM (e.g. Z-buffer, G-buffer, stencil buffer, shadow buffer, etc.). Developers can also choose to store things like frequently used textures there; nothing dictates that the contents be one of those buffer types - anything the developer finds important qualifies. It’s also possible for a single surface to be split between main memory and eSRAM.
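
To make the “mapped address range, not a cache” distinction concrete, here’s a toy model of what a developer-controlled placement decision might look like. The allocator, the buffer names, the sizes and the base address are all hypothetical and purely illustrative - the actual Xbox One SDK interface isn’t public:

```python
# Toy model of eSRAM as a mapped address range rather than a cache.
# The base address, placement policy and buffer sizes are illustrative only;
# this is not the real Xbox One SDK.
ESRAM_BASE, ESRAM_SIZE = 0x8000_0000, 32 * 1024 * 1024

class ToyAllocator:
    def __init__(self):
        self.esram_used = 0

    def place(self, name, size, prefer_esram=False):
        """Return (name, bytes placed in eSRAM, bytes placed in DDR3)."""
        in_esram = 0
        if prefer_esram:
            in_esram = min(size, ESRAM_SIZE - self.esram_used)
            self.esram_used += in_esram
        in_dram = size - in_esram            # remainder spills to main memory
        return name, in_esram, in_dram

alloc = ToyAllocator()
print(alloc.place("depth buffer", 8 * 1024 * 1024, prefer_esram=True))
print(alloc.place("G-buffer", 40 * 1024 * 1024, prefer_esram=True))   # split across both pools
print(alloc.place("streamed textures", 512 * 1024 * 1024))            # DDR3 only
```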

Sticking important buffers and other frequently used data in eSRAM can significantly reduce demands on the memory interface, which should help Microsoft get by with only ~68GB/s of system memory bandwidth. Microsoft has claimed publicly that actual bandwidth to the eSRAM is somewhere in the 140 - 150GB/s range, which is likely similar to the effective bandwidth (after overhead/efficiency losses) of the PS4’s GDDR5 memory interface. The difference is that on the Xbox One you only get that bandwidth to your most frequently used data. It’s still not clear to me what effective memory bandwidth looks like on the Xbox One; I suspect it’s still a bit lower than on the PS4, but after talking with Ryan Smith (AT’s Senior GPU Editor) I’m now wondering if memory bandwidth isn’t really the issue here.
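
One simple way to think about “bandwidth to your most frequently used data” is as a weighted average: the fraction of GPU traffic served out of eSRAM gets eSRAM bandwidth, the rest falls back to DDR3. The sketch below is a deliberately crude model (it ignores concurrent access to both pools, and the hit fractions are made up); only the 150GB/s and 68.3GB/s endpoints come from the figures above:

```python
# Crude effective-bandwidth model: traffic served by eSRAM gets eSRAM
# bandwidth, everything else gets DDR3 bandwidth. Hit fractions are guesses.
ESRAM_BW, DDR3_BW = 150.0, 68.3   # GB/s

def effective_bw(esram_fraction):
    return esram_fraction * ESRAM_BW + (1 - esram_fraction) * DDR3_BW

for f in (0.0, 0.25, 0.5, 0.75):
    print(f"{f:.0%} of traffic from eSRAM -> ~{effective_bw(f):.0f} GB/s")
```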

Microsoft Xbox One vs. Sony PlayStation 4 Spec Comparison

                          | Xbox 360            | Xbox One                               | PlayStation 4
CPU Cores/Threads         | 3/6                 | 8/8                                    | 8/8
CPU Frequency             | 3.2GHz              | 1.75GHz                                | 1.6GHz
CPU µArch                 | IBM PowerPC         | AMD Jaguar                             | AMD Jaguar
Shared L2 Cache           | 1MB                 | 2 x 2MB                                | 2 x 2MB
GPU Cores                 | -                   | 768                                    | 1152
GCN Geometry Engines      | -                   | 2                                      | 2
GCN ROPs                  | -                   | 16                                     | 32
GPU Frequency             | -                   | 853MHz                                 | 800MHz
Peak Shader Throughput    | 0.24 TFLOPS         | 1.31 TFLOPS                            | 1.84 TFLOPS
Embedded Memory           | 10MB eDRAM          | 32MB eSRAM                             | -
Embedded Memory Bandwidth | 32GB/s              | 102GB/s bi-directional (204GB/s total) | -
System Memory             | 512MB 1400MHz GDDR3 | 8GB 2133MHz DDR3                       | 8GB 5500MHz GDDR5
System Memory Bus         | 128-bit             | 256-bit                                | 256-bit
System Memory Bandwidth   | 22.4 GB/s           | 68.3 GB/s                              | 176.0 GB/s
Manufacturing Process     | -                   | 28nm                                   | 28nm
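
The peak shader throughput rows fall straight out of the GCN core counts and clocks: each shader core can retire one fused multiply-add (two FLOPs) per clock. A quick check of the table’s numbers:

```python
# Peak FP32 throughput for GCN: shader cores x 2 FLOPs (FMA) x clock
def peak_tflops(shader_cores, clock_mhz):
    return shader_cores * 2 * clock_mhz * 1e6 / 1e12

print(f"Xbox One: {peak_tflops(768, 853):.2f} TFLOPS")   # ~1.31
print(f"PS4:      {peak_tflops(1152, 800):.2f} TFLOPS")  # ~1.84
```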

In order to accommodate the eSRAM on die, Microsoft not only had to move to a 12 CU GPU configuration, it also had to drop to 16 ROPs (half that of the PS4). The ROPs (render outputs/raster operations pipes) are responsible for final pixel output, and at the resolutions these consoles are targeting, 16 ROPs makes the Xbox One the odd man out compared to PC GPUs. AMD’s GPUs targeting 1080p typically come with 32 ROPs, which is where the PS4 sits, while the Xbox One ships with half that. The difference in raw shader performance (12 CUs vs 18 CUs) can definitely show up in games that run more complex lighting routines and other long shader programs on each pixel, but the more recent reports of resolution differences between Xbox One and PS4 launch titles are likely the result of being ROP bound on the One. This is probably why Microsoft claimed it saw a bigger increase in realized performance from raising the GPU clock from 800MHz to 853MHz than from adding two extra CUs. The ROPs operate at the GPU clock, so in a ROP bound scenario a clock increase improves performance more than additional compute hardware would.
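
To put the ROP argument in rough numbers: peak pixel fill rate is ROP count times GPU clock, and that budget gets consumed quickly once blending, multiple render targets and overdraw are factored in. The sketch below compares peak fill rate against the raw pixel demand of a single 1080p60 pass; treating a frame as a number of “full-screen passes” is my own simplification for illustration:

```python
# Peak fill rate = ROPs x clock; compare against one 1080p pass at 60fps.
def fill_rate_gpix(rops, clock_mhz):
    return rops * clock_mhz * 1e6 / 1e9

PIXELS_1080P60 = 1920 * 1080 * 60 / 1e9       # ~0.12 Gpixels/s for a single pass
for name, rops, clk in (("Xbox One", 16, 853), ("PS4", 32, 800)):
    peak = fill_rate_gpix(rops, clk)
    passes = peak / PIXELS_1080P60            # full-screen writes that fit per frame
    print(f"{name}: {peak:.1f} Gpixels/s peak, ~{passes:.0f} 1080p60 passes of headroom")
```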

The PS4's APU - Courtesy Chipworks

Microsoft’s admission that the Xbox One dev kits have 14 CUs does make me wonder what the Xbox One die looks like. Chipworks found that the PS4’s APU actually features 20 CUs, despite only exposing 18 to game developers. I suspect those last two are there for defect mitigation, to increase effective yields in the case of bad CUs, and I wonder if the same isn’t true for the Xbox One.

At the end of the day Microsoft appears to have ended up with its GPU configuration not for silicon cost reasons, but for platform power/cost and component availability reasons. Sourcing DDR3 is much easier than sourcing high density GDDR5. Sony obviously managed to launch with a ton of GDDR5 just fine, but I can definitely understand why Microsoft would be hesitant to go down that route in the planning stages of the Xbox One. To put some numbers in perspective, Sony has shipped 1 million PS4s thus far. That's 16 million GDDR5 chips, or 7.6 petabytes of RAM. Had both Sony and Microsoft tried to do this, I do wonder whether GDDR5 supply would've become a problem; that's a ton of RAM to procure in a very short period of time. The only other major consumer of GDDR5 is video cards, and the list of cards sold in the last couple of months that would ever use that sort of RAM is a narrow one.
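
The back-of-envelope math behind those figures, assuming 4Gb (512MB) GDDR5 chips, which is how you arrive at 16 per console:

```python
# 8GB of GDDR5 per PS4, built from 4Gb (512MB) chips
consoles   = 1_000_000
chips_each = 8 * 1024 // 512                               # 16 chips per console
total_gb   = consoles * 8                                  # GB of GDDR5 shipped
print(f"{consoles * chips_each / 1e6:.0f} million chips")  # 16 million
print(f"{total_gb / 1024**2:.1f} PB of RAM")               # ~7.6 (binary petabytes)
```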

Microsoft will obviously have an easier time scaling its platform down over the years (eSRAM should shrink nicely at smaller geometry processes), but that’s not a concern to the end user unless Microsoft chooses to aggressively pass along cost savings.

Comments

  • psychobriggsy - Wednesday, November 20, 2013 - link

    Shame that it can only use that ESRAM bandwidth on a total of 1/256th of the system's memory... so you need to account for that in your sums. I.e., it's useless for most things except small data areas that are accessed a lot (framebuffer, z-buffer, etc).
  • smartypnt4 - Wednesday, November 20, 2013 - link

    Except you just said it... You store what's used the most, and you get to realize a huge benefit from it. It's the same theory as a cache, but it gives programmers finer control over what gets stored there. Giving the developers the ability to choose what they want to put in the super low-latency, high bandwidth eSRAM is really a good idea too.

    Computer architecture is mainly about making the common case fast, or in other words, making the things that are done the most the fastest operations in the system. In this case, accessing the z-buffer, etc. is done constantly, making it a good candidate for optimization via placing it in a lower latency, higher bandwidth storage space.
  • cupholder - Thursday, November 21, 2013 - link

    LOL. No. The majority of things that actually affect quality and frame rate are going to be larger in size than the ESRAM. 192 ENTIRE 8GB vs. 204 for a dinky amount of that... It's painfully obvious what the bottlenecks will be. Oh... Forgot the whole PS4 running a 7850 compared to the XB1's 7770.. Oh, and the 8GB ram vs. 5 true GB of ram(3 OSs take up 3GB).

    With that said, get the console that your friends will play, or has the games you want... Anyone pretending the XB1 is better in raw power is deluding themselves(it's hardly even close).
  • smartypnt4 - Friday, November 22, 2013 - link

    I'm simply describing how the eSRAM should work, given that this should be a traditional PC architecture. Nowhere did I comment on which is the more powerful console. I really don't feel I'm qualified in saying which is faster, but the GPU seems to indicate it's the PS4, as you rightly said.

    Now, it is true that the PS4 has larger bandwidth to main memory. My point was that if the eSRAM has a good hit rate, let's say 80%, you'll see an effective speed of 0.8*204 = 163GB/s. This is a horrible measure, as it's just theoretically what you'll see, not accounting for overhead.

    The other difference is that GDDR5's timings make it higher latency than traditional DDR3, and it will be an order of magnitude higher in latency than the eSRAM in the XB1. Now, that's not to say that it will make a big difference in games because memory access latency can be hidden by computing something else while you wait, but still. My point being that the XB1 likely won't be memory bandwidth bound. That was literally my only point. ROP/memory capacity/shader bound is a whole other topic that I'm not going to touch with a 10-foot pole without more results from actual games.

    But yes, buy the console your friends play, or buy the one with the exclusives you want.
  • rarson - Saturday, November 23, 2013 - link

    It's not even close to a traditional PC architecture. I mean, it totally is, if you completely ignore the eSRAM and custom silicon on the die.

    Test after test after test after test has shown that latency makes practically zero impact on performance, and that the increased speed and bandwidth of GDDR5 is much more important, at least when it comes to graphics (just compare any graphics card that has a DDR3 and GDDR5 variant). Latency isn't that much greater for GDDR5, anyway.

    The eSRAM is only accessible via the GPU, so anything in it that the CPU needs has to be copied to DDR anyway. Further, in order to even use the eSRAM, you still have to put the data in there, which means it's coming from that slow-ass DDR3. The only way you'll get eSRAM bandwidth 80% of the time is if 80% of your RAM access is a static 32 MB of data. Obviously that's not going to be the majority of your graphics data, so you're not going to get anywhere near 80%.

    The most important part here is that in order for anyone to actually use the eSRAM effectively, they're going to have to do the work. Sony's machine is probably going to be more developer-friendly because of this. I can see how the eSRAM could help, but I don't see how it could possibly alleviate the DDR3 bottleneck. All of this is probably a moot point anyway, since the eSRAM seems to be tailored more towards all the multimedia processing stuff (the custom bits on the SoC) and has to be carefully optimized for developers to even use it anyway (nobody is going to bother to do this on cross-platform games).
  • 4thetimebeen - Saturday, November 23, 2013 - link

    I'm sorry to burst your bubble and I'm sorry to butt in, but you are wrong about the eSRAM only being available to the GPU, cause if you look at the Digital Foundry interview with the Microsoft Xbox One architects and creators, and the Hot Chips diagram, IT SHOWS AND THEY SAID that the CPU has access to the eSRAM as well.
  • smartypnt4 - Monday, November 25, 2013 - link

    Yes, latency has very little impact on graphics workloads due to the ability to hide the latency by doing other work. Which is exactly what I said in my comment, so I'm confused as to why you're bringing it up...

    As far as the CPU getting access, I was under the impression that the XB1 and PS4 both have unified memory access, so the GPU and CPU share memory. If that's the case, then yes, the CPU does get access to the eSRAM.

    As far as the hit rate on that eSRAM, if the developer optimizes properly, then they should be able to get significant benefits from it. Cross platform games, as you rightly said, likely won't get optimized to use the eSRAM as effectively, so they won't realize much of a benefit.

    And yes, you do incur a set of misses in the eSRAM corresponding to first accesses. That's assuming the XB1's prefetcher doesn't request the data from memory before you need it.

    A nontrivial number of accesses from a GPU are indeed static. Things like the frame buffer and z-buffer are needed by every separate rendering thread, and hence may well be useful. 32MB is also a nontrivial amount when it comes to caching textures as well. Especially if the XB1 compresses the textures in memory and decodes them on the fly. If I recall correctly, that's actually how most textures are stored by GPUs anyway (compressed and then uncompressed on the fly as they're needed). I'm not saying that's definitely the case, because that's not how every GPU works, but still. 32MB is enough for the frame buffers at a minimum, so maybe that will help more than you think; maybe it will help far less than I think. It's incredibly difficult to tell how it will perform given that we know basically nothing about it.

    To actually say if eSRAM sucks, we need to know how often you can hit in the eSRAM. To know that, we need to know lots of things we have no clue about: prefetcher performance, how the game is optimized to make use of the eSRAM, etc.

    In general though, I do agree that the PS4 has more raw GPU horsepower and more raw memory bandwidth exposed to naive developers. My only point that I made was that the XB1 likely won't be that far off in memory bandwidth compared to the PS4 in games that properly optimize for the platform.

    There's a whole other thing about CPUs being very latency sensitive, etc., that I won't go into because I don't know nearly enough about it, but I think there's going to be a gap in CPU performance as well because things that are optimized to work on the XB1's CPU aren't going to perform the same on the PS4's, especially if they're using the CPU to decompress textures (which is something the 360 did).

    And with that, I reiterate: buy the console your friends buy or the one with the exclusives you want to play. Or if you're really into the Kinect or something.
  • Andromeduck - Wednesday, November 27, 2013 - link

    163 GB/s and hogging the main memory bandwidth - that data doesn't just magically appear
  • smartypnt4 - Wednesday, November 20, 2013 - link

    Also, not saying the guy above you isn't an idiot for adding the two together. The effective rate Anand quotes takes into account approximately how often you go to the eSRAM vs. going all the way out to main memory. The dude above you doesn't get it.
  • bill5 - Wednesday, November 20, 2013 - link

    yes i do get it, dork.

    small caches of high speed memory are the norm in console design. ps2, gamecube, wii, x360, wii u, on and on.

    the gpu can read from both pools at once so technically they can be added. even if it's not exactly the same thing.

    peak bw, xone definitely has an advantage on ps4, especially on a per-flop basis due to feeding a weaker gpu to begin with.
