The year was 2004. I was enrolled in ECE 466 at NCSU, a compiler optimization/scheduling class. I remember walking into the lecture hall and seeing far too many PowerBooks and white iBooks. This was the computer engineering department, right? It wasn’t much later that I started my month with a Mac experiment. Up to that point I had spent most of my life staying away from Apple hardware, but I wanted to give the platform a fair shake, so I bought the fastest thing Apple offered back then: a 2GHz PowerMac G5.

More recently, in 2012, I was talking to my friend Lyle who was setting out to build a new gaming PC. Without any coercion on my part, he opted for a mini-ITX build. I’d been on a mini-ITX kick for a while, but motherboard and case vendors kept reiterating that, as exciting as mini-ITX was, the sales volumes just weren’t there. So I was surprised when my gamer friend settled on building a new desktop that was seriously small. He used a BitFenix Prodigy case, a great choice.

The last Mac Pro I reviewed was in 2010. Little had changed externally since the PowerMac G5 I bought years ago. I lamented the chassis’ lack of support for 2.5” drives. A year later I abandoned the Mac Pro entirely for a Sandy Bridge MacBook Pro. I was a late adopter for the notebook as a desktop usage model, but a lack of progress on the Mac Pro drove me away from the design.


From left to right: Apple PowerMac Dual G5, Apple Mac Pro (Mid 2006), Apple Mac Pro (Early 2009), Apple Mac Pro (Late 2013)

Apple tends to be pretty early to form factor revolutions, but given the company’s obsession with mobile it’s understandable that the same didn’t hold true for the Mac Pro. When it finally came time to redesign the system, I was reminded of the same realization Lyle came to when building his most recent desktop: why does a modern desktop need to be big?

The answer is, for a lot of users, that it really doesn’t. Notebooks already outsell desktops by a healthy margin, and there’s no room for expansion inside a notebook. You may be able to swap out a drive or fiddle around with some sticks of DRAM, but no one is adding discrete cards (at least internally) to a notebook.

The situation for Mac desktops is even more cut and dried. With the exception of the occasional aftermarket Mac video card and the more adventurous users who are fine with modifying/flashing PC video cards to work on a Mac, I suspect there’s little GPU upgrading going on in the Mac desktop market. That leaves other PCIe devices that get cut out if you go to a design with less internal flexibility. In the spirit of the MasterCard commercials: for everything else, there’s Thunderbolt.

You can do roughly 1.5GB/s over a single Thunderbolt 2 connector. The protocol passes unmodified PCIe and it’s a technology that Apple has strongly backed since its introduction. Other than a GPU, virtually anything you’d want to connect over PCIe you can do externally via Thunderbolt 2.
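
To put that figure in context, here’s a back-of-the-envelope sketch. Thunderbolt 2 bonds two 10Gbps channels into a single 20Gbps link; the efficiency factor below is my assumption, chosen to reflect typical protocol/PCIe overhead, not a published spec.

# Rough Thunderbolt 2 throughput estimate (sketch, assumed efficiency)
raw_link_gbps = 20.0                          # two 10Gbps channels bonded together
raw_gbytes_per_s = raw_link_gbps / 8          # 2.5 GB/s of raw bandwidth
assumed_efficiency = 0.60                     # hypothetical overhead factor
usable_gbytes_per_s = raw_gbytes_per_s * assumed_efficiency
print(f"~{usable_gbytes_per_s:.1f} GB/s usable")  # ~1.5 GB/s, in line with the figure above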

I think you can adequately make the argument for a smaller form factor Mac Pro desktop; after all, that’s where the market is headed. I remember coveting (and eventually owning) Super Micro’s SC830 chassis for my personal machine years ago. I wanted a huge desktop. Regardless of whether I’m talking about a Mac or a PC today, I no longer want something massive.

The argument for even building a high-end desktop is easy to make. It all boils down to TDP. Regardless of what device you’re building for, assuming you have competent architects, you’re limited by power. The bigger the device, the greater your ability to remove heat and the more performance you can unlock. I’m surprised by how much performance you can cram into a 15-inch MacBook Pro, but there’s still room for more - particularly if you care about CPU and GPU performance.

Given how power limited everything else is, it’s no surprise that Apple focuses so heavily on the new Mac Pro’s thermal core. It’s a single, unified heatsink that is directly responsible for cooling the three major processors in the new Mac Pro: the CPU and two GPUs. The thermal core is shaped like a triangular prism, with each lateral surface attaching directly to one of the three processors. The shared heatsink makes a lot of sense once you consider how Apple divides compute/display workloads among the three processors (more on this later).

A single fan at the top of the Mac Pro’s cylindrical chassis pulls in cool air from the bottom of the machine and exhausts it, quietly, out the top.

Ultimately it’s the thermal core that the new Mac Pro is designed around. It’s the most area efficient dual-GPU setup I’ve ever seen. There’s little functional benefit to having a desktop chassis that small, but you could say the same about Apple’s recent iMac redesign that focused on making a thinner all-in-one. If the desktop market is to not just stick around but grow as well, it needs to evolve - and that also includes design.


Mac Pro Thermal Core - iFixit

The new Mac Pro is a dramatic departure from its predecessors. The chassis is still all aluminum (with the exception of a plastic cover over the fan), but it features a dark anodized finish vs. the bright silver of the older machines. The finish is glossy, but the good news is that, unlike a mobile device, it's pretty easy to keep the system looking clean. The surface of the new Mac Pro is also incredibly smooth. There's a heft and quality to the design that is at odds with how small and portable it is.

I'm hardly an art critic but I do feel like there's a lot to appreciate about the design and construction of the new Mac Pro. I needed to move the system closer to my power testing rig, so it ended up immediately to the left of me. I have to admit that I've been petting it regularly ever since. It's really awesomely smooth. It's actually the first desktop in a very long time that I want very close to me. It feels more like a desk accessory than a computer, which is funny to say given just how much power is contained within this tiny package.

Thanks to its small size (9.9” tall with a 6.6” diameter), the Mac Pro belongs on your desk - not underneath it. The design doesn’t attempt to hide IO, but rather draws careful attention to it. All IO ports are located on the same side of the machine. There’s an integrated sensor that can detect tilt/rotation and illuminates the IO panel on the Mac Pro to help you plug in cables. Admittedly port density is so high back there that I don’t know if illuminating it helps all that much, but it’s a nice effect nonetheless. Otherwise there’s only the power button LED that indicates the system is on.

Internal expansion is more or less out of the question, but the Mac Pro remains the easiest Mac to get into. There’s no special screwdriver needed, just a simple latch on the back that unlocks the external housing.

Lift it up and you’re presented with the backs of the CPU and GPU cards. Behind one of the GPUs is the removable PCIe SSD, and flanking the IO panel are four user-accessible DDR3 DIMM slots. Push down on the lever marked with an arrow and you’ll release the angled DIMM slots, giving you access to remove/upgrade memory. The Mac Pro supports up to 64GB of memory, which you’ll want to install in groups of four in order to populate all four memory channels stemming from the Ivy Bridge-EP CPU.
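
A quick sketch of how DIMM counts map onto those channels. The 12GB and 16GB rows match the stock configurations in the table later in this piece; the 32GB and 64GB kit sizes are my assumptions about the build-to-order options.

# Ivy Bridge-EP exposes four DDR3 memory channels and the Mac Pro has one
# DIMM slot per channel, so only four-DIMM configurations use every channel.
configs = {
    "12GB (3 x 4GB)": 3,    # entry config: one channel left empty
    "16GB (4 x 4GB)": 4,
    "32GB (4 x 8GB)": 4,    # assumed BTO kit
    "64GB (4 x 16GB)": 4,   # assumed BTO kit, the supported maximum
}
for name, dimms in configs.items():
    note = "all four channels populated" if dimms == 4 else f"{4 - dimms} channel empty"
    print(f"{name}: {note}")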


Mac Pro custom FirePro D300 GPU board - iFixit

Both GPU boards are custom, so it would appear that Apple has effectively killed the limited 3rd party Mac GPU upgrade market. It’s entirely possible that someone will clone Apple’s GPU card design here, but that seems like a lot of effort for very limited potential sales.

The CPU board is the only one fully obscured from view; it’s behind the IO panel. There’s only a single LGA-2011 CPU socket on that board, capable of supporting Intel’s latest Ivy Bridge-EP Xeon CPUs. Interestingly enough, Apple appears to be using unmodified Xeon processors with their integrated heat spreader attached. Long-time readers of our Mac Pro reviews will remember that the Nehalem Mac Pro actually featured Xeons sans IHS, which made aftermarket upgrades a little trickier (and potentially dangerous). What this means is that you should, in theory, be able to upgrade the Mac Pro’s CPU down the road should you want to. It’s definitely not a simple task, but it’s at least feasible. Especially as Xeon pricing drops over time, this may be a good way of extending the lifespan of your Mac Pro.


Mac Pro Main Logic Board - iFixit

All three boards connect to the main logic board (MLB) at the bottom of the chassis. It’s on the MLB that you’ll find Intel’s C602 PCH (Platform Controller Hub) along with high density connectors (CPU board) and flex cables (GPUs) for all of the daughter boards.

The new Mac Pro still has an internal speaker. There's not much to say about it; it's ok in a pinch if you need audio and don't want to hook up external speakers. I've had one weird issue with the internal speaker: it occasionally produces a high pitched noise, requiring a power cycle to clear. I haven't been able to root cause the problem yet. It seems to happen while the speaker is muted (only to surface once I've unmuted the speaker) and after I've been torturing/benchmarking the machine. I'm not sure if it's tied to plugging/unplugging Thunderbolt 2 devices while the system is on or if it's something in software that's triggering it. Either way, if you see it on your system, know that you can clear it with a full power cycle (not a soft reset).

Pricing and Configurations


Mac Pro (Late 2013) Default Configurations

| | 4-Core Config | 6-Core Config |
|---|---|---|
| CPU | Intel Xeon E5-1620 v2 | Intel Xeon E5-1650 v2 |
| Base CPU Clock | 3.7GHz | 3.5GHz |
| Max Turbo | 3.9GHz | 3.9GHz |
| Cores / Threads | 4 / 8 | 6 / 12 |
| L3 Cache | 10MB | 12MB |
| Memory | 12GB ECC DDR3-1866 (3 x 4GB) | 16GB ECC DDR3-1866 (4 x 4GB) |
| SSD | 256GB PCIe SSD | 256GB PCIe SSD |
| GPU | Dual AMD FirePro D300 | Dual AMD FirePro D500 |
| GPU Memory | 2GB GDDR5 per card | 3GB GDDR5 per card |
| Network | Dual Gigabit LAN + 3-stream 802.11ac | Dual Gigabit LAN + 3-stream 802.11ac |
| Thunderbolt 2 | 6 x Thunderbolt 2 Ports | 6 x Thunderbolt 2 Ports |
| Display Support | 2 x 4K/60Hz + 1 x 4K/30Hz, or up to 6 x 2560 x 1440 Thunderbolt/DisplayPort displays | 2 x 4K/60Hz + 1 x 4K/30Hz, or up to 6 x 2560 x 1440 Thunderbolt/DisplayPort displays |
| USB 3.0 | 4 x USB 3.0 Ports | 4 x USB 3.0 Ports |
| Other IO | Optical digital/analog audio out, headphone jack w/ headset+mic support, integrated speaker | Optical digital/analog audio out, headphone jack w/ headset+mic support, integrated speaker |
| Dimensions (L x W x H) | 6.6 x 6.6 x 9.9" | 6.6 x 6.6 x 9.9" |
| Weight | 11 lbs | 11 lbs |
| Warranty | 1 Year Limited | 1 Year Limited |
| Price | $2999 | $3999 |

Apple offers two Mac Pro configurations with several upgrade options from the factory. The entry level machine remains a quad-core configuration with 12GB of RAM, while the high end model moves to a 6-core design with 16GB of RAM. Both ship with two GPUs by default, but you can upgrade the pair’s potency.

The Mac Pro’s pricing is a point of contention given that the cheapest configuration starts at $2999 and can go all the way up to $9848 before adding in a display. Given the lower volumes we’re talking about here, and the fact that Apple continues to spec workstation hardware for the Mac Pro only on the CPU (and somewhat on the GPU side, more on that later), I’m not sure we’ll see the same aggressive price drops that we’ve seen in other Mac segments.

The last time I did a Mac Pro vs. OEM PC comparison, Apple came out quite competitive on pricing, although a DIY system won by a huge margin. The same is true for the new Mac Pro. I poked around the Dell, HP and Lenovo websites looking for comparable systems. It seems like Ivy Bridge-EP systems are still a bit rare, with Dell not offering any. Both HP and Lenovo offered fairly comparable systems:

Mac Pro vs. HP Z420 vs. Lenovo S30 Pricing Comparison

| | Entry Level Mac Pro | HP Z420 | Lenovo ThinkStation S30 |
|---|---|---|---|
| CPU | Intel Xeon E5-1620 v2 | Intel Xeon E5-1620 v2 | Intel Xeon E5-1620 v2 |
| Memory | 12GB ECC DDR3-1866 (3 x 4GB) | 12GB ECC DDR3-1866 (3 x 4GB) | 12GB ECC DDR3-1600 (3 x 4GB) |
| SSD | 256GB PCIe SSD | 256GB SATA SSD | 128GB SATA SSD |
| GPU | Dual AMD FirePro D300 | Dual AMD FirePro W7000* | Dual NVIDIA Quadro K4000 |
| GPU Memory | 2GB GDDR5 per card | 4GB GDDR5 per card | 3GB GDDR5 per card |
| Network | Dual Gigabit LAN + 3-stream 802.11ac | Dual Gigabit LAN | Dual Gigabit LAN |
| Thunderbolt 2 | 6 x Thunderbolt 2 Ports | 1 x Thunderbolt 2 Port | - |
| USB 3.0 | 4 x USB 3.0 Ports | 4 x USB 3.0 Ports + 5 x USB 2.0 Ports | 2 x USB 3.0 Ports + 10 x USB 2.0 Ports |
| Dimensions (D x W x H) | 6.6 x 6.6 x 9.9" | 17.5 x 7.0 x 17.63" | 19.0 x 6.89 x 18.8" |
| Weight | 11 lbs | ? | 38.5 lbs |
| Warranty | 3 Year Limited (w/ AppleCare) | 3 Year Limited | 3 Year Limited |
| Price | $3248 | $3695 + $795 for second W7000 GPU | $4373 |

As I learned last time, there are typically some hefty discounts associated with workstation orders, so take this pricing with a grain of salt. I also had to fudge the HP numbers a bit, as I could only configure a single FirePro W7000 in the Z420 - I just doubled the W7000 adder to simulate what a theoretical dual GPU version would cost. There are other imbalances in the comparison (HP supports more displays, Apple features more Thunderbolt 2 ports, the FirePro W7000 features ECC GDDR5, etc…), but the point here is to see if Apple’s pricing is out of touch with reality. It’s not.

The DIY PC route is still going to be more affordable. If we go the Ivy Bridge-E route and opt for a Core i7-4930K, you get more cores than either of the quad-core options above for around $600 for the CPU. Adding in another $330 for a motherboard, $180 for 12GB of DDR3-1866 memory, $1400 for two W7000 GPUs and $220 for a fast SATA SSD (Samsung 840 Pro), we’re at $2730 for a configuration that would cost at least $3499 from Apple. That’s excluding case, PSU and OS, but adding another ~$350 takes care of those and still saves you some money. If you opt for Radeon HD 7870s instead of the W7000s, you can knock another $1000 off of that total price. All of that being said, I don’t expect there to be a lot of cross shopping between DIY builders and those looking for a Mac Pro.
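
For reference, here’s the arithmetic from the paragraph above written out as a quick sketch, using the late-2013 street prices quoted in the text.

# DIY Ivy Bridge-E build from the paragraph above (article's price estimates).
diy_parts = {
    "Core i7-4930K (6 cores)": 600,
    "LGA-2011 motherboard": 330,
    "12GB DDR3-1866": 180,
    "2 x FirePro W7000": 1400,
    "Fast SATA SSD (Samsung 840 Pro)": 220,
}
subtotal = sum(diy_parts.values())   # $2730
total = subtotal + 350               # add ~$350 for case, PSU and OS
print(subtotal, total)               # 2730 3080, vs. at least $3499 from Apple
# Swapping the two W7000s for Radeon HD 7870s takes roughly another $1000 off.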

267 Comments

  • uhuznaa - Wednesday, January 1, 2014 - link

    For whatever it's worth: I'm supporting a video pro and what I can see in that crowd is that NOBODY cares for internal storage. Really. Internal storage is used for the software and of course the OS and scratch files and nothing else. They all use piles of external drives which are much closer to actual "media" you can carry around and work with in projects with others and archive.

    In fact I tried for a while to convince him of the advantages of big internal HDDs and he wouldn't have any of it. He found the flood of cheap USB drives you can even pick up at the gas station in the middle of the night the best thing to happen and USB3 a gift from heaven. They're all wired this way. Compact external disks that you can slap paper labels on with the name of the project on it and the version of that particular edit and that you can carry around are the best thing since sliced bread for them. And after a short while I had to agree that they're perfectly right with that for what they do.

    Apple is doing this quite right. Lots of bays are good for servers, but this is not a server. It's a workstation and work here means mostly work with lots of data that wants to be kept in nice little packages you can plug in and safely out and take with you or archive in well-labeled shelves somewhere until you find a use for it later on.

    (And on a mostly unrelated note: Premiere Pro may be the "industry standard" but god does this piece of software suck gas giants through nanotubes. It's a nightmarish UI thinly covering a bunch of code held together by chewing gum and duct tape. Apple may have the chance of a snowflake in hell against that with FCP but they absolutely deserve kudos for trying. I don't know if I love Final Cut, but I know I totally hate Premiere.)
  • lwatcdr - Wednesday, January 1, 2014 - link

    "My one hope is that Apple won’t treat the new Mac Pro the same way it did its predecessor. The previous family of systems was updated on a very irregular (for Apple) cadence. "

    This is the real problem. Haswell-EP will ship this year and it uses a new socket. The proprietary GPU physical interface will mean those will probably not get updates quickly and they will be expensive. Today the Pro is a very good system but next year it will be falling behind.
  • boli - Wednesday, January 1, 2014 - link

    Hi Anand, cheers for the enjoyable and informative review.

    Regarding your HiDPI issue, I'm wondering if this might be an MST issue? Did you try in SST mode too?

    Just wondering because I was able to add 1920x1080 HiDPI to my 2560x1440 display no problem, by adding a 3840x2160 custom resolution to Switch Res X, which automatically added 1920x1080 HiDPI to the available resolutions (in Switch Res X).
  • mauler1973 - Wednesday, January 1, 2014 - link

    Great review! Now I am wondering if I can replicate this kind of performance in a hackintosh.
  • Technology Never Sleeps - Wednesday, January 1, 2014 - link

    Good article but I would suggest that your editor or proof reader review your article before its posted. It takes away from the professional nature of the article and website with so many grammatical errors.
  • Barklikeadog - Wednesday, January 1, 2014 - link

    Once again, a standard 2009 model wouldn't fair nearly as well here. Even with a Radeon HD 4870 I bet we'd be seeing significantly lower performance.

    Great review Anand, but I think you meant fare in that sentence.
  • name99 - Wednesday, January 1, 2014 - link

    " Instead what you see at the core level is a handful of conservatively selected improvements. Intel requires that any new microarchitectural feature introduced has to increase performance by 2% for every 1% increase in power consumption."

    What you say is true, but not the whole story. It implies that these sorts of small improvements are the only possibility for the future and that's not quite correct.
    In particular branch prediction has become good enough that radically different architectures, like CFP (Continuous Flow Processing), become possible. The standard current OoO architecture used by everyone (including IBM for both POWER and z, and the ARM world) grew from a model based on no speculation to some, but imperfect, speculation. So what it does is collect speculated results (via the ROB and RAT) and dribble those out in small doses as it becomes clear that the speculation was valid. This model never goes drastically off the rails, but is very much limited in how many OoO instructions it can process, both at the complete end (size of the ROB, now approaching 200 fused µ-instructions in Haswell) and at the scheduler end (trying to find instructions that can be processed because their inputs are valid, now approaching I think about 60 instructions in Haswell).
    These figures give us a system that can handle most latencies (FP instructions, divisions, reasonably long chains of dependent instructions, L1 latency, L2 latency, maybe even on a good day L3 latency) but NOT memory latency.

    And so we have reached a point where the primary thing slowing us down is data memory latency. This has been a problem for 20+ years, but now it's really the only problem. If you use best of class engineering for your other bits, really the only thing that slows you down is waiting on (data) memory. (Even waiting on instructions should not ever be a problem. It probably still is, but work done in 2012 showed that the main reason instruction prefetching failed was that the prefetcher was polluted by mispredicted branches and interrupts. It's fairly easy to filter both of these once you appreciate the issue, at which point your I prefetcher is basically about 99.5% accurate across a wide variety of code. This seems like such an obvious and easy win that I expect it to move into all the main CPUs within 5 yrs or so.)

    OK, so waiting on memory is a problem. How do we fix it?
    The most conservative answer (i.e. requires the fewest major changes) is data prefetchers, and we've had these growing in sophistication over time. They can now detect array accesses with strides across multiple cache lines, including backwards strides, and we have many (at least 16 on Intel) running at the same time. Each year they become smarter about starting earlier, ending earlier, not polluting the cache with unneeded data. But they only speed up regular array accesses.

    Next we have a variety of experimental prefetchers that look for correlations in the OFFSETs of memory accesses; the idea being that you have things like structs or B-tree nodes that are scattered all over memory (linked by linked lists or trees or god knows what), but there is a common pattern of access once you know the base address of the struct. Some of these seem to work OK, with realistic area and power requirements. If a vendor wanted to continue down the conservative path, this is where they would go.

    Next we have a different idea, runahead execution. Here the idea is that when the “real” execution hits a miss to main memory, we switch to a new execution mode where no results will be stored permanently (in memory or in registers); we just run ahead in a kind of fake world, ignoring instructions that depend on the load that has missed. The idea is that, during this period we’ll trigger new loads to main memory (and I-cache misses). When the original miss to memory returns its result, we flush everything and restart at the original load, but now, hopefully, the runahead code started some useful memory accesses so that data is available to us earlier.
    There are many ways to slice this. You can implement it fairly easily using SMT infrastructure if you don’t have a second thread running on the core. You can do crazy things that try to actually preserve some of the results you generate during the runahead phase. Doing this naively you burn a lot of power, but there are some fairly trivial things you can do to substantially reduce the power.
    In the academic world, the claim is that for a Nehalem type of CPU this gives you about a 20% boost at the cost of about 5% increased power.
    In the real world it was implemented (but in a lousy cheap-ass fashion) on the POWER6 where it was underwhelming (it gave you maybe a 2% boost over the existing prefetchers); but their implementation sucked because it only ran 64 instructions during the run ahead periods. The simulations show that you generate about one useful miss to main memory per 300 instructions executed, so maybe two or three during a 400 to 500 cycles load miss to main memory, but 64 is just too short.
    It was also supposed to be implemented in the SUN Rock processor which was cancelled when Oracle bought Sun. Rock tried to be way more ambitious in their version of this scheme AND suffered from a crazy instruction fetch system that had a single fetch unit trying to feed eight threads via round robin (so each thread gets new instructions every eight cycles).
    Both these failures don’t, I think, tell us if this would work well if implemented on, say, an ARM core rather than adding SMT.

    Which gets us to SMT. Seems like a good idea, but in practice it’s been very disappointing, apparently because now you have multiple threads fighting over the same cache. Intel, after trying really hard, can’t get it to give more than about a 25% boost. IBM added 4 SMT threads to POWER7, but while they put a brave face on it, the best the 4 threads give you is about 2x single threaded performance. Which, hey, is better than 1x single threaded performance, but it’s not much better than what they get from their 2 threaded performance (which can do a lot better than Intel given truly massive L3 caches to share between threads).

    But everything so far is just add-ons. CFP looks at the problem completely differently.
    The problem we have is that the ROB is small, so on a load miss it soon fills up completely. You’d want the ROB to be about 2000 entries in size and that’s completely impractical. So why do we need the ROB? To ensure that we write out updated state properly (in small dribs and drabs every cycle) as we learn that our branch prediction was successful.
    But branch prediction these days is crazy accurate, so how about a different idea. Rather than small scale updating successful state every cycle, we do a large scale checkpoint every so often, generally just before a branch that’s difficult to predict. In between these difficult branches, we run out of order with no concern for how we writeback state — and in the rare occasions that we do screw up, we just roll back to the checkpoint. In between difficult branches, we just run on ahead even across misses to memory — kinda like runahead execution, but now really doing the work, and just skipping over instructions that depend on the load, which will get their chance to run (eventually) when the load completes.
    Of course it’s not quite that simple. We need to have a plan for being able to unwind stores. We need a plan for precise interrupts (most obviously for VM). But the basic idea is we trade today’s horrible complexity (ROB and scheduler window) for a new ball of horrible complexity that is not any simpler BUT which handles the biggest current problem, that the system grinds to a halt at misses to memory, far better than the current scheme.

    The problem, of course, is that this is a hell of a risk. It’s not just the sort of minor modification to your existing core where you know the worst that can go wrong; this is a leap into the wild blue yonder on the assumption that your simulations are accurate and that you haven’t forgotten some show-stopping issue.
    I can’t see Intel or IBM being the first to try this. It’s the sort of thing that Apple MIGHT be ambitious enough to try right now, in their current state of so much money and not having been burned by a similar project earlier in their history. What I’d like to see is a university (like a Berkeley/Stanford collaboration) try to implement it and see what the real world issues are. If they can get it to work, I don’t think there’s a realistic chance of a new SPARC or MIPS coming out of it, but they will generate a lot of valuable patents, and their students who worked on the project will be snapped up pretty eagerly by Intel et al.
  • stingerman - Wednesday, January 1, 2014 - link

    I think Intel has another two years left on the Mac. Apple will start phasing it out on the MacBook Air, Mac Mini and iMac. The MacBook rPros and finally the Mac Pro. Discrete x86 architecture is dead-ending. Apple's going to move their Macs to SOC that they design. It will contain most of the necessary components and significantly reduce the costs of the desktops and notebooks. The Mac Pro will get it last giving time for the Pro Apps to be ported to Apple's new mobile and desktop 64-bit processors.
  • tahoey - Wednesday, January 1, 2014 - link

    Remarkable work as always. Thank you.
  • DukeN - Thursday, January 2, 2014 - link

    Biased much, Anand?

    Here's the Lenovo S30 I bought a couple of weeks back, and no it wasn't $4000 + like you seem to suggest.

    http://www.cdw.com/shop/products/Lenovo-ThinkStati...

    You picked probably the most overpriced SKU in the bunch just so you can prop up the ripoff that is your typical Apple product.

    Shame.
