If there were one word to describe the launch of NVIDIA’s Pascal generation products, it would be “expedient.” On the consumer side of the business the company has launched three different GeForce cards and announced a fourth (Titan X), while on the HPC side the company has already launched their Tesla P100 accelerator, with the PCIe version due next quarter. With the company moving so quickly it was only a matter of time until a Quadro update was announced, and today at SIGGRAPH 2016 the company is doing just that.

Being announced today are the two Quadro models that will fill out the high end of the Quadro family, the P6000 and P5000. As hinted at by the names, these are based on NVIDIA’s latest Pascal generation GPUs, marking the introduction of Pascal to the Quadro family. And like NVIDIA’s consumer cards, these new cards should offer significant performance and feature upgrades over their Maxwell 2 based predecessors.

NVIDIA Quadro Specification Comparison
                        P6000          P5000          M6000          M5000
CUDA Cores              3840           2560           3072           2048
Texture Units           240?           160            192            128
ROPs                    96?            64             96             64
Core Clock              ?              ?              N/A            N/A
Boost Clock             ~1560MHz       ~1730MHz       ~1140MHz       ~1050MHz
Memory Clock            9Gbps GDDR5X   9Gbps GDDR5X   6.6Gbps GDDR5  6.6Gbps GDDR5
Memory Bus Width        384-bit        256-bit        384-bit        256-bit
VRAM                    24GB           16GB           24GB           8GB
FP64                    1/32 FP32      1/32 FP32      1/32 FP32      1/32 FP32
TDP                     250W           180W           250W           150W
GPU                     GP102          GP104          GM200          GM204
Architecture            Pascal         Pascal         Maxwell 2      Maxwell 2
Manufacturing Process   TSMC 16nm      TSMC 16nm      TSMC 28nm      TSMC 28nm
Launch Date             October 2016   October 2016   03/22/2016     08/11/2015
Launch Price (MSRP)     TBD            TBD            $5000          $2000

We will start, as always, at the top, with the Quadro P6000. As NVIDIA’s impending flagship Quadro card, this is based on the just-announced GP102 GPU. The direct successor to the GM200 used in the Quadro M6000, the GP102 mixes a larger number of SMs/CUDA cores and higher clockspeeds to significantly boost performance.

Paired with P6000 is 24GB of GDDR5X memory, running at a conservative 9Gbps, for a total memory bandwidth of 432GB/sec. This is the same amount of memory as in the 24GB M6000 refresh launched this spring, so there’s no capacity boost at the top of NVIDIA’s lineup. But for customers who didn’t jump on the 24GB M6000 (which is likely a lot of them, including most 12GB M6000 owners), this is a doubling (or more) of memory capacity compared to past Quadro cards. The largest capacity GDDR5X memory chips we know of at this time are 8Gb, so this is as large a capacity as P6000 can currently be built with. Meanwhile this is so far the first and only Pascal card with GDDR5X to support ECC, with NVIDIA implementing an optional soft-ECC method for the DRAM only, just as was the case on M6000.
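As a quick check on the bandwidth figure, peak memory bandwidth is just the per-pin data rate multiplied by the bus width. The sketch below uses the data rates and bus widths from the specification table; the function name is our own and purely illustrative.

```python
def memory_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/sec: per-pin data rate times bus width, over 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8


# Figures taken from the specification table above.
p6000 = memory_bandwidth_gb_s(9.0, 384)   # 9Gbps GDDR5X on a 384-bit bus
m6000 = memory_bandwidth_gb_s(6.6, 384)   # 6.6Gbps GDDR5 on a 384-bit bus

print(f"P6000: {p6000:.0f} GB/sec")
print(f"M6000: {m6000:.1f} GB/sec")
print(f"Gain:  {(p6000 / m6000 - 1) * 100:.0f}%")
```

Since both cards share a 384-bit bus, the generational gain comes entirely from the faster memory.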

NVIDIA has also sent over pictures of the card design, and confirmed that the card ships with the Quadro 6000-series standard TDP of 250W. Utilizing the same basic metal shroud and blower design as the M6000 cards, the P6000 should be suitable as a drop-in replacement for older M6000 cards. Do note however that like M6000, external power is pulled via a single 8-pin power connector. With 150W from the connector and 75W from the PCIe slot, that allows for 225W against a 250W TDP, so technically this card is out of spec (not that this was a problem for M6000).

Unfortunately, in NVIDIA’s zeal to get this announcement out in time for SIGGRAPH (a frequent venue for Quadro announcements), we don’t have specific performance numbers available. NVIDIA has not locked down the GPU clockspeeds, and as a result we don’t know just how P6000’s clockspeeds and total throughput will compare to M6000’s. It goes without saying that they should be higher, but how much higher remains to be seen.

For overall expected performance, NVIDIA has published that the P6000 is rated for 12 TFLOPs FP32. Given that it's a fully enabled GP102 we're looking at, this works out to a clockspeed of around 1560MHz. On paper this gives P6000 around 71% more shading performance and 37% more ROP throughput than the older Maxwell 2 M6000. This also puts the P6000 around 9% ahead of the recently announced NVIDIA Titan X.
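The arithmetic behind these estimates can be sketched as follows. This assumes the usual 2 FLOPs per CUDA core per clock (one fused multiply-add), that P6000 keeps 96 ROPs like M6000 (the table marks this as unconfirmed), and Titan X figures of 3584 CUDA cores at a 1531MHz boost clock, which are not stated in this article. The helper names are illustrative only.

```python
def fp32_tflops(cuda_cores: int, clock_mhz: float) -> float:
    """Peak FP32 throughput, assuming 2 FLOPs (one FMA) per core per clock."""
    return 2 * cuda_cores * clock_mhz * 1e6 / 1e12


def clock_mhz_for(tflops: float, cuda_cores: int) -> float:
    """Clockspeed implied by a rated FP32 throughput, under the same assumption."""
    return tflops * 1e12 / (2 * cuda_cores) / 1e6


P6000_TFLOPS = 12.0                              # NVIDIA's published rating
p6000_clock = clock_mhz_for(P6000_TFLOPS, 3840)  # implied boost clock

m6000_tflops = fp32_tflops(3072, 1140)           # from the spec table
titan_x_tflops = fp32_tflops(3584, 1531)         # assumed Titan X figures

shading_gain = P6000_TFLOPS / m6000_tflops - 1
rop_gain = p6000_clock / 1140 - 1                # same ROP count assumed, so clocks decide
titan_gain = P6000_TFLOPS / titan_x_tflops - 1

print(f"Implied P6000 clock: {p6000_clock:.1f} MHz")
print(f"Shading vs. M6000:   +{shading_gain * 100:.0f}%")
print(f"ROPs vs. M6000:      +{rop_gain * 100:.0f}%")
print(f"FP32 vs. Titan X:    +{titan_gain * 100:.0f}%")
```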

On a quick technical note, as this announcement comes just 4 days after NVIDIA announced the GP102 GPU used on this card, this Quadro announcement does confirm a few more things about GP102. Quadro P6000 ships with 3840 CUDA cores (30 SMs), confirming our earlier suspicions that GP102 is a 30 SM part (or at least has that many). Meanwhile this also confirms that GP102 can be outfitted with 24GB of GDDR5X. Finally, NVIDIA has confirmed that there’s no high-speed FP64 support on GP102, which is why we’re looking at a 1/32 rate for even the top Quadro card.

P5000

Moving on, let’s talk about Quadro P5000. Based on NVIDIA’s GP104 GPU, this is the smaller, cheaper, lower power sibling to the P6000. This is a fully enabled part with all 2560 CUDA cores (20 SMs) active, so the performance gains versus M5000 should be similar to what we saw with the consumer GeForce GTX 1080. Clockspeeds are also comparable, so we're looking at a sizable boost in shading/compute/texture performance of 2.06x, while ROP throughput increases by 65%. Of the two cards, P5000 is going to be the bigger upgrade versus its direct predecessor.
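The generational ratios for the GP104-based card follow from unit counts multiplied by boost clocks. Here is a minimal sketch using the numbers in the specification table; the `Card` class is our own illustrative construct, and the approximate boost clocks are treated as exact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    name: str
    cuda_cores: int
    rops: int
    boost_mhz: float

    @property
    def shader_rate(self) -> float:
        # Relative shading/compute/texture throughput: cores times clock.
        return self.cuda_cores * self.boost_mhz

    @property
    def rop_rate(self) -> float:
        # Relative pixel throughput: ROPs times clock.
        return self.rops * self.boost_mhz


p5000 = Card("P5000", cuda_cores=2560, rops=64, boost_mhz=1730)
m5000 = Card("M5000", cuda_cores=2048, rops=64, boost_mhz=1050)

shader_ratio = p5000.shader_rate / m5000.shader_rate
rop_gain = p5000.rop_rate / m5000.rop_rate - 1

print(f"Shading: {shader_ratio:.2f}x")
print(f"ROPs:    +{rop_gain * 100:.0f}%")
```

With the ROP count unchanged at 64, the ROP gain is purely a clockspeed gain, while shading benefits from both more cores and higher clocks.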

Meanwhile on the memory front, P5000 is equipped with 16GB of GDDR5X memory. This is attached to GP104’s 256-bit memory bus, and like P6000 is clocked at 9Gbps. P5000’s predecessor, M5000, maxed out at just 8GB of memory, so along with a 36% increase in memory bandwidth, this doubles the amount of memory available for a Quadro 5000 tier card.

Looking at the card design itself, to no surprise it strongly resembles the M5000, with its plastic blower dressed up in Quadro livery. The card’s TDP stands at 180W, which is a slight increase over M5000’s 150W, but this shouldn’t significantly impact the drop-in replacement nature of the design.

Pascal Features & Availability

Along with the significant performance increase afforded by the Pascal architecture and TSMC’s 16nm FinFET manufacturing process, the other big news here is of course the functionality that comes to the Quadro P-series courtesy of Pascal. While for our regular readers there’s nothing new we haven’t seen already with GeForce, Pascal’s new functionality will apply a bit differently to the Quadro lineup.

Perhaps the biggest change here is Pascal’s new display controller. With both the P6000 and P5000 shipping with 4 DisplayPorts, the DisplayPort 1.4 capable controller means that both cards can now support higher resolutions and refresh rates. Whereas the M-series maxed out at 4 4K@60Hz monitors, the P-series can now handle 4 monitors running 5K@60Hz, 4 4K monitors running at 120Hz, or even 8K monitors with additional limitations. Do note however that the per-card monitor limit is still 4 displays, as this is as many displays as Pascal can support.
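To see roughly why the newer controller unlocks these modes, the sketch below compares uncompressed pixel data rates against per-link payload bandwidth. It assumes the M-series outputs are DisplayPort 1.2 (HBR2, 5.4Gbps per lane) and that DisplayPort 1.4 runs at HBR3 (8.1Gbps per lane), both over 4 lanes with 8b/10b encoding. It also assumes 24 bits per pixel and ignores blanking intervals, so real requirements are somewhat higher than shown.

```python
# Usable payload per 4-lane link after 8b/10b encoding overhead (assumed link rates).
DP12_HBR2_GBPS = 4 * 5.4 * 0.8   # about 17.28 Gbps
DP14_HBR3_GBPS = 4 * 8.1 * 0.8   # about 25.92 Gbps


def pixel_rate_gbps(width: int, height: int, refresh_hz: int,
                    bits_per_pixel: int = 24) -> float:
    """Uncompressed active-pixel data rate in Gbps (blanking ignored)."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


modes = {
    "4K@60Hz": (3840, 2160, 60),
    "5K@60Hz": (5120, 2880, 60),
    "4K@120Hz": (3840, 2160, 120),
    "8K@60Hz": (7680, 4320, 60),
}

# name -> (data rate, fits a DP 1.2 link, fits a DP 1.4 link)
results = {}
for name, (w, h, hz) in modes.items():
    rate = pixel_rate_gbps(w, h, hz)
    results[name] = (rate, rate <= DP12_HBR2_GBPS, rate <= DP14_HBR3_GBPS)
    print(f"{name:9s} {rate:6.2f} Gbps  "
          f"DP1.2: {results[name][1]}  DP1.4: {results[name][2]}")
```

Under these assumptions 8K@60Hz exceeds what a single uncompressed link can carry, which is consistent with 8K support coming with additional limitations.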

Speaking of multiple displays, alongside the Quadro card announcements NVIDIA is also announcing a new Quadro Sync card, the aptly named Quadro Sync 2. The multi-adapter/multi-display timing synchronization card is being updated to support the Pascal cards, and will support a larger number of adapters as well. The new Sync 2 will support 8 cards in sync, as opposed to 4 on the original Sync card. Coupled with the 4 display per card capability of Pascal, this means synchronized video walls and other systems can now be built out to 32 displays.

NVIDIA will also be heavily promoting Simultaneous Multi-Projection (SMP), the company’s multi-viewport technology. Like the consumer cards, VR is a big driver here, with NVIDIA looking to reach out to VR developers. NVIDIA is also pitching this at VR CAVE systems, as they can see similar benefits from SMP’s geometry reprojection.

Taking a look at the overall Quadro lineup, the P6000 and P5000 will at least for the time being be sitting alongside the existing M4000 and lower cards. Within the Quadro lineup these cards are meant for the most demanding workloads (massive memory sets and complex rendering/compute tasks), and they will be priced accordingly. Specific pricing has not been announced, but NVIDIA tells us to expect them to be priced similarly to the last generation cards. This would work out to $5000+ for Quadro P6000, and $2000+ for Quadro P5000 at launch.

Finally, as we mentioned before, NVIDIA is announcing these cards early, before the final clockspeeds have been locked down. This means that while the cards are being announced today, they won’t launch for another two months; NVIDIA expects them to be available in early October. It’s not unusual for Quadro cards to be announced ahead of time, though as SIGGRAPH is also a popular venue for AMD pro card announcements, the earlier than usual announcement may have been for multiple reasons.

Ecosystem Announcements: New SDKs, Iray VR, & OptiX 4

Along with the announcement of the Quadro P-series, NVIDIA is also using SIGGRAPH to announce updates to various software and ecosystem initiatives within the company. Overall a number of the company’s SDKs are receiving an update in some form, ranging from rendering to video encode and capture, the latter taking advantage of Pascal’s 8K encode/decode capabilities.

Of particular note here, NVIDIA’s Iray physically based render plugin for 3D modeling applications is getting a significant update. As with other parts of their ecosystem, NVIDIA is doubling down on VR here as well. The next update to Iray will include support for generating panoramic VR lightfields – think high detail fixed position 3D panoramas – which can then be displayed on other devices. NVIDIA has been showing off an early version of this technology at GTC 2016 and other events, where it was used to show off renders of the company’s under-construction headquarters.

The Iray update will also be part of a larger focus on integrating the company’s software with their DGX-1 server, which incorporates 8 Tesla P100 accelerators. Iray will be coming to DGX-1 this fall, supporting the same features that are already available in multi-GPU setups with the older Quadro VCA. Longer term, in 2017, the company will be adding NVLink support for better multi-GPU scaling.

NVIDIA’s OptiX ray tracing engine is the other product that’s getting a DGX-1 update. OptiX 4.0, which is being released this week, adds support for the DGX-1, including NVLink support. It is interesting to note though that the company is only supporting clusters of 4 GPUs, despite the fact that DGX-1 has 8 GPUs (the other 4 GPUs form a second cluster). This may mean that OptiX needs direct GPU links to perform best – as in an 8-way configuration, some GPUs are 2 hops away – or it may just be that OptiX naturally doesn’t scale well beyond 4 GPUs.
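The hop-count point can be illustrated with a small graph search. The sketch below assumes a DGX-1-like layout in which each GPU has 4 NVLink links: two fully connected groups of 4 GPUs, with each GPU also linked to one counterpart in the other group. That topology is our assumption for illustration rather than something detailed in this article, and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_dgx1_like_topology() -> dict[int, set[int]]:
    """Assumed layout: two fully connected quads plus one cross-link per GPU."""
    links: dict[int, set[int]] = {gpu: set() for gpu in range(8)}

    def connect(a: int, b: int) -> None:
        links[a].add(b)
        links[b].add(a)

    for quad in ((0, 1, 2, 3), (4, 5, 6, 7)):
        for a, b in combinations(quad, 2):
            connect(a, b)
    for gpu in range(4):
        connect(gpu, gpu + 4)   # link to the counterpart in the other quad
    return links


def hops_from(links: dict[int, set[int]], start: int) -> dict[int, int]:
    """Breadth-first search returning the minimum hop count to every GPU."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in links[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


links = build_dgx1_like_topology()
max_hops_in_quad = max(hops_from(links, a)[b] for a, b in combinations(range(4), 2))
max_hops_overall = max(hops_from(links, a)[b] for a, b in combinations(range(8), 2))

print(f"Links per GPU:          {sorted({len(v) for v in links.values()})}")
print(f"Max hops within a quad: {max_hops_in_quad}")
print(f"Max hops across all 8:  {max_hops_overall}")
```

Under this assumed layout every GPU in a 4-GPU cluster is directly linked to the other three, while spanning all 8 GPUs introduces 2-hop paths.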

Finally, NVIDIA is also announcing a change to how mental ray support is handled for Maya. Previously, integrating the ray tracer with Maya was handled by Autodesk, but NVIDIA is currently in the process of taking that over. The goal of doing so is to allow mental ray to be updated and have features added at the brisker pace that NVIDIA tends to work at. The new plugin is currently scheduled to ship in September, and as one of their first actions, NVIDIA will be integrating a new global illumination engine, GI-Next.

40 Comments

  • extide - Thursday, August 4, 2016

    No, there are 3840 cores in both GP100, and GP102. In GP100 they allow you to run FP64 work by ganging up two FP32 cores -- it does take extra transistors to enable them to function like this and that is what they ripped out. There are not another 1920 complete cores for FP64 work.

    So, they removed stuff and ended up with a smaller die, they just removed all the stuff that isnt really used by gaming workloads.
  • Tigran - Tuesday, July 26, 2016

    Why GP102 with it's smaller die size and less transistors (see other sources, and it's also obvious from it's number) has more CUDA cores than GP100?
  • DanNeely - Tuesday, July 26, 2016

    It probably has the same number; but being a smaller die GP102 managed to (barely) have yields high enough to allow a product with a 100% active die instead of one with a few sections disabled. WIth super computing customers buying the cards by the shipping container full the Tesla probably also needed to be available in a much larger volume than the top of the line Quadro.
  • Dobson123 - Tuesday, July 26, 2016

    They have the same number (3840) in hardware, GP100 currently uses a partially deactivated GP100, just like the new Titan X uses a partially deactivated GP102. But GP100 has a different, more HPC focused architecture with 64 CUDA cores per SM, larger register files, NVLink and so on.
  • Tigran - Tuesday, July 26, 2016

    So it's not because of TMU&ROP non-existence in Tesla P100? CUDA cores and TMU&ROPs are different and independent from each other calculating blocks, aren't they?
  • Tigran - Tuesday, July 26, 2016

    Quick answer for my stupid question: "each SM has 64 CUDA cores and four texture units" (Nvidia ©). And I guess ROP are outside SM.
  • DanNeely - Tuesday, July 26, 2016

    Correct, ROPs are packaged in with the memory controllers.
  • DanNeely - Tuesday, July 26, 2016

    See Ryan's reply to me above; NVidia says GP100 is fully graphics capable.
  • eddman - Tuesday, July 26, 2016

    Ryan, there are a few M5000 typos in the article. In at least two occasions you've written M5000 instead of P5000.
  • Mirel Aretu - Monday, March 20, 2017

    I really hope you guys can help me decide about what graphics card to use, since I'm not that tech savvy. I will make a list with all the software I use for my work, to give you a better idea. So here it is: Cinema 4D, RealFlow, XParticles, Turbulence FD, Houdini, After Effect - for compositing and visual effects, Illustrator, Photoshop (extensively, on a daily bases), Maya, 3DS Max, Blender, Mocha, Z-Brush/Mudbox (when needed), basically anything that gets the job done, the list is very long. In essence, I need a graphics card with a high computational power to help me with particle simulation, rendering, video encoding and so forth.
    Will the new, cheaper, GTX 1080TI FE do the job or should I just go ahead, sacrifice my soul, and buy a very expensive Quadro P5000?
    Since I never had the chance to put them both to test and never will, nor I understand what one does better than the other, I simply can not decide.
