The PCIe Layout

Ask anyone at Apple why they need Ivy Bridge EP instead of a conventional desktop Haswell for the Mac Pro and you’ll get two responses: core count and PCIe lanes. The first one is obvious. Haswell tops out at 4 cores today. Even though each of those cores is faster than what you get with Ivy Bridge EP, for applications that can spawn more than 4 CPU-intensive threads you’re better off taking the IPC/single-threaded hit and going with an older architecture that supports more cores. The second point is a connectivity argument.

Here’s what a conventional desktop Haswell platform looks like in terms of PCIe lanes:

You’ve got a total of 16 PCIe 3.0 lanes that branch off the CPU, and then (at most) another 8 PCIe 2.0 lanes hanging off of the Platform Controller Hub (PCH). In a dual-GPU configuration those 16 PCIe 3.0 lanes are typically divided into an 8 + 8 configuration. The 8 remaining lanes are typically more than enough for networking and extra storage controllers.

Ivy Bridge E/EP, on the other hand, doubles the total number of PCIe lanes compared to Intel’s standard desktop platform:

Here the CPU has a total of 40 PCIe 3.0 lanes. That’s enough for each GPU in a dual-GPU setup to get a full 16 lanes, and to have another 8 left over for high-bandwidth use. The PCH also has another 8 PCIe 2.0 lanes, just like in the conventional desktop case.
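The lane budgets of the two platforms can be summed up in a few lines. This is just a sketch of the arithmetic above (the names are illustrative, not Intel's):

```python
# PCIe lane budgets from the article; dict keys are illustrative names.
haswell_desktop = {
    "cpu_pcie3_lanes": 16,   # typically split x8/x8 between two GPUs
    "pch_pcie2_lanes": 8,    # networking, extra storage, etc.
}

ivy_bridge_ep = {
    "cpu_pcie3_lanes": 40,   # enough for two full x16 GPUs with lanes to spare
    "pch_pcie2_lanes": 8,    # same PCH budget as the desktop platform
}

# Dual-GPU allocation on each platform
desktop_lanes_per_gpu = haswell_desktop["cpu_pcie3_lanes"] // 2  # 8 per GPU
ep_spare_lanes = ivy_bridge_ep["cpu_pcie3_lanes"] - 2 * 16       # 8 left over

print(desktop_lanes_per_gpu, ep_spare_lanes)  # 8 8
```

The spare 8 PCIe 3.0 lanes on Ivy Bridge EP are what make the rest of the Mac Pro's I/O story possible, as the following sections show.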

I wanted to figure out how these PCIe lanes were used by the Mac Pro, so I set out to map everything out as best as I could without taking apart the system (alas, Apple tends to frown upon that sort of behavior when it comes to review samples). Here’s what I was able to come up with. Let’s start off of the PCH:

Here each Gigabit Ethernet port gets a dedicated PCIe 2.0 x1 lane, and the same goes for the 802.11ac controller. All Mac Pros ship with a PCIe x4 SSD, and those four lanes also come off the PCH. That leaves a single PCIe lane unaccounted for in the Mac Pro.

Here we really get to see how much of a mess Intel’s workstation chipset lineup is: the C600/X79 PCH doesn’t natively support USB 3.0. That’s right, it’s nearly 2014 and Intel is shipping a flagship platform without USB 3.0 support. The 8th PCIe lane off of the PCH is used by a Fresco Logic USB 3.0 controller. I believe it’s the FL1100, which is a PCIe 2.0 to 4-port USB 3.0 controller. A single PCIe 2.0 lane offers a maximum of 500MB/s of bandwidth in either direction (1GB/s aggregate), which is enough for the real world max transfer rates over USB 3.0. Do keep this limitation in mind if you’re thinking about populating all four USB 3.0 ports with high-speed storage with the intent of building a low-cost Thunderbolt alternative. You’ll be bound by the performance of a single PCIe 2.0 lane.
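As a sanity check on that bottleneck, here's the back-of-the-envelope math. The per-lane rates come from the PCIe 2.0 and USB 3.0 signaling specs (5 GT/s and 5 Gbit/s respectively, both with 8b/10b line coding), not from anything Mac Pro specific:

```python
# Effective bandwidth of a single PCIe 2.0 lane, per direction.
PCIE2_GT_PER_S = 5.0      # 5 GT/s per lane
ENCODING_8B10B = 8 / 10   # 8b/10b line coding: 10 bits on the wire per byte... per 8 bits

lane_bw_MBps = PCIE2_GT_PER_S * 1e9 * ENCODING_8B10B / 8 / 1e6
print(lane_bw_MBps)  # 500.0 MB/s per direction

# USB 3.0 signals at 5 Gbit/s with the same 8b/10b coding, so even one
# fast USB 3.0 SSD can in principle saturate the shared upstream x1 link.
usb3_bw_MBps = 5.0e9 * ENCODING_8B10B / 8 / 1e6
print(usb3_bw_MBps)  # 500.0 MB/s
```

In other words, the four ports together have exactly one port's worth of theoretical upstream bandwidth behind them.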

That takes care of the PCH, now let’s see what happens off of the CPU:

Of the 40 PCIe 3.0 lanes, 32 are already occupied by the two AMD FirePro GPUs. Having a full x16 interface to each GPU isn’t really necessary for gaming performance, but if you want to treat each GPU as a first-class citizen then this is the way to go. That leaves us with 8 PCIe 3.0 lanes left.

The Mac Pro has a total of six Thunderbolt 2 ports, with each pair driven by a single Thunderbolt 2 controller. Each Thunderbolt 2 controller accepts four PCIe 2.0 lanes as an input and delivers that bandwidth to any Thunderbolt devices downstream. If you do the math you’ll see we have a bit of a problem: 3 TB2 controllers x 4 PCIe 2.0 lanes per controller = 12 PCIe 2.0 lanes, but only 8 lanes are left to allocate in the system.
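The lane shortfall described above is simple enough to state as arithmetic:

```python
# Thunderbolt 2 lane math: three controllers want more PCIe 2.0 lanes
# than the CPU has left after the two x16 GPU links.
tb2_controllers = 3
lanes_per_controller = 4                              # each TB2 controller takes a x4 input
lanes_wanted = tb2_controllers * lanes_per_controller # 12 lanes demanded
lanes_available = 40 - 2 * 16                         # 8 PCIe 3.0 lanes remaining

print(lanes_wanted, lanes_available)  # 12 8
assert lanes_wanted > lanes_available  # something has to share
```

Whenever demand exceeds the physical lane count like this, the usual answer is a PCIe switch, which is exactly what turns out to be in the Mac Pro.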

I assumed there had to be a PCIe switch sharing the 8 PCIe input lanes among the Thunderbolt 2 controllers, but I needed proof. Our Senior GPU Editor, Ryan Smith, did some digging into the Mac Pro’s enumerated PCIe devices and discovered a very familiar vendor id: 10B5, the id used by PLX Technology. PLX is a well known PCIe bridge/switch manufacturer. The part used in the Mac Pro (PEX 8723) is of course not listed on PLX’s website, but it’s pretty close to another one that PLX is presently shipping: the PEX 8724. The 8724 is a 24-lane PCIe 3.0 switch. It can take 4 or 8 PCIe 3.0 lanes as an input and share that bandwidth among up to 16 (20 in the case of a x4 input) downstream PCIe lanes. Normally that would create a bandwidth bottleneck but remember that Thunderbolt 2 is still based on PCIe 2.0. The switch provides roughly 15GB/s of bandwidth to the CPU and 3 x 5GB/s of bandwidth to the Thunderbolt 2 controllers.
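Those bandwidth figures are raw (both-directions) link rates, which you can reproduce from the per-lane signaling speeds. This is a sketch; the effective numbers after line-coding overhead are a bit lower, as noted in the comments:

```python
# Aggregate (sum of both directions) raw bandwidth through the switch.
# PCIe 3.0 runs at 8 GT/s per lane, PCIe 2.0 at 5 GT/s per lane.
upstream_raw_GBps = 8 * 8.0 * 2 / 8   # x8 PCIe 3.0 upstream: 16 GB/s raw,
                                      # ~15.75 GB/s effective after 128b/130b coding
per_tb2_raw_GBps = 4 * 5.0 * 2 / 8    # x4 PCIe 2.0 per TB2 controller: 5 GB/s raw,
                                      # 4 GB/s effective after 8b/10b coding

print(upstream_raw_GBps, per_tb2_raw_GBps)  # 16.0 5.0
```

So even with all three Thunderbolt 2 controllers fully loaded, the x8 PCIe 3.0 upstream link has enough headroom that the switch shouldn't be a practical bottleneck.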

Any of the six Thunderbolt 2 ports on the back of the Mac Pro will give you access to the 8 remaining PCIe 3.0 lanes living off of the CPU. It’s pretty impressive when you think about it: external access to a high-speed interface located on the CPU die itself.

The part I haven’t quite figured out yet is how Apple handles DisplayPort functionality. All six Thunderbolt 2 ports are capable of outputting to a display, which means that there’s either a path from the FirePro to each Thunderbolt 2 controller or the PEX 8723 switch also handles DisplayPort switching. It doesn’t really matter from an end user perspective as you can plug a monitor into any port and have it work, it’s more of me wanting to know how it all works.


  • zepi - Wednesday, January 1, 2014 - link

    How about virtualization and, for example, VT-d support with multiple GPUs and Thunderbolt ports, etc.?

    I.e. running Windows in a virtual machine with half a dozen cores + another GPU while using the rest for OS X simultaneously?

    I'd assume some people would benefit from having both OS X and Windows content creation applications and development environments available to them at the same time. Not to mention gaming in a virtual machine with a dedicated GPU instead of virtual machine overhead / incompatibility etc.
  • japtor - Wednesday, January 1, 2014 - link

    This is something I've wondered about too, for a while now really. I'm kinda iffy on this stuff, but last I checked (admittedly quite a while back) OS X wouldn't work as the hypervisor and/or didn't have whatever necessary VT-d support. I've heard of people using some other OS as the hypervisor with OS X and Windows VMs, but then I think you'd be stuck with hard resource allocation in that case (without restarting at least). Fine if you're using both all the time but a waste of resources if you predominantly use one vs the other.
  • horuss - Thursday, January 2, 2014 - link

    Anyway, I still would like to see some virtualization benchmarks. In my case, I could pretty much make it an ideal home server with external storage, while taking advantage of the incredible horsepower to run multiple VMs for my tests, development, gaming and everything else!
  • iwod - Wednesday, January 1, 2014 - link

    I have been wondering how likely it is we get a Mac (non-Pro) spec.
    Nvidia has realized that the extra die space spent on GPGPU wasn't worth it. After all, their main targets are gamers and gaming benchmarks. So for Kepler they decided to have two lines, one for GPGPU and one for the mainstream. Unless they change course again I think Maxwell will very likely follow the same route. AMD is a little different since they are betting on OpenCL Fusion with their APUs, so GPGPU is critical for them.
    That could mean Apple diverges its product line, going with Nvidia on the non-professional Macs like the iMac and MacBook Pro (urg...) while continuing to use AMD FirePro on the Mac Pro line.

    Last time, it was rumoured Intel wasn't so interested in getting Broadwell, the 14nm die shrink of Haswell, out for the desktop. Mostly because mobile/notebook CPUs have overtaken desktop and will continue to do so. It is much more important to cater to the biggest market. Not to mention die shrinks nowadays are much more about power savings than performance improvements. So Intel could milk the desktop and server markets while continuing to lead in mobile and trying to catch up with 14nm Atom SoCs.

    If that is true, the rumor of a Haswell Refresh on the desktop could mean Intel is no longer delaying server products by just a single cycle; they will be doing the same for the desktop as well.

    That means there could be a Mac Pro with Haswell-EP along with a Mac with a Haswell Refresh.
    And by using Nvidia graphics instead of AMD, Apple wouldn't need to worry about the Mac eating into the Mac Pro market. And there could be less cost involved in not using a Pro graphics card, only having 3 TB displays, etc.
  • words of peace - Wednesday, January 1, 2014 - link

    I keep thinking that if the MP is a good seller, maybe Apple could enlarge the unit so it contains a four-sided heatsink, which could allow for dual CPUs.
  • Olivier_G - Wednesday, January 1, 2014 - link

    Hi,

    I don't understand the comment about the lack of HiDPI modes here.

    I would think it's simply the last one down the list, listed as 1920x1080 HiDPI. It does make the screen be perceived as such for apps, yet photos and text render at 4x resolution, which is what we're looking for, I believe?

    I tried such a mode on my iMac out of curiosity, and while 1280x720 is a bit ridiculously small, it allowed me to confirm it does work since OS X Mavericks. So I do expect the same behaviour to use my 4K monitor correctly with the Mac Pro?

    Am I wrong?
  • Gigaplex - Wednesday, January 1, 2014 - link

    The article clearly states that it worked at 1920 HiDPI but the lack of higher resolutions in HiDPI mode is the problem.
  • Olivier_G - Wednesday, January 1, 2014 - link

    Well, no, it does not state that at all. I read it again, and he did not mention trying the last option in the selector.
  • LumaForge - Wednesday, January 1, 2014 - link

    Anand,

    Firstly, thank you very much for such a well researched and well thought out piece of analysis - extremely insightful. I've been testing a 6 core and 12 core nMP all week using real-life post-production workflows, and your scientific analysis helps explain why I've gotten good or OK results in some situations and not always seen the kinds of real-life improvements I was expecting in others.

    Three follow up questions if I may:

    1) DaVinci Resolve 10.1 ... have you done any benchmarking on Resolve with 4K files? ... like FCP X 10.1, BMD have optimized Resolve 10.1 to take full advantage of split CPU and GPU architecture but I'm not seeing the same performance gains as with FCP x 10.1 .... wondering if you have any ideas on system optimization or the sweet spot? I'm still waiting for my 8 core to arrive and that may be the machine that really takes advantage of the processor speed versus cores trade-off you identify.

    2) Thunderbolt 2 storage options? ... external storage I/O also plays a significant role in overall sustained processing performance, especially with 4K workflows ... I posted a short article on the Creative Cow SAN section detailing some of my findings (nowhere near as detailed or scientific as your approach, I'm afraid) ... be interested to know your recommendations on Tbolt2 storage.

    http://forums.creativecow.net/readpost/197/859961

    3) IP over Tbolt2 as peer-to-peer networking topology? ... as well as running the nMPs in DAS, NAS and SAN modes I've also been testing IP over Tbolt2 .... only been getting around 500 MB/s sustained throughput between two nMPs ... if you look at the AJA diskwhack tests I posted on Creative Cow you'll see that the READ speeds are very choppy ... looks like a read-ahead caching issue somewhere in the pipeline or lack of 'Jumbo Frames' across the network ... have you played with TCP/IP over Thunderbolt2 yet and come to any conclusions on how to optimize throughput?

    Keep up the good work and all the best for 2014.

    Cheers,
    Neil
  • modeleste - Wednesday, January 1, 2014 - link

    I noticed that the Toshiba 65" 4k TV is about the same price as the Sharp 32" The reviews seem nice.

    Does anyone have any idea what the issues would be with using this display?
