Earlier this week we took a look at the GeForce GTX Titan X, NVIDIA’s first product to use their new high-end Maxwell GPU, the GM200. Now, just two days later, the company is back again with GM200, this time launching it in the Titan X’s professional graphics counterpart, the Quadro M6000.

Much as Titan is the flagship of the GeForce family, the 6000 series is NVIDIA’s flagship Quadro card, and today’s launch sees the new GM200-based Quadro M6000 take its place at the top of the Quadro graphics stack. What makes this launch interesting is that NVIDIA has never launched a flagship Quadro card so close to a flagship GeForce card. Quadro cards usually launch months down the line, not days. The end result is that professional users are getting much earlier access to NVIDIA’s best hardware.

NVIDIA Quadro Specification Comparison
                      | M6000        | K6000      | K5200      | 6000
CUDA Cores            | 3072         | 2880       | 2304       | 448
Texture Units         | 192          | 240        | 192        | 56
ROPs                  | 96           | 48         | 32         | 48
Core Clock            | N/A          | 900MHz     | 650MHz     | 574MHz
Boost Clock           | ~1140MHz     | N/A        | N/A        | N/A
Memory Clock          | 6.6GHz GDDR5 | 6GHz GDDR5 | 6GHz GDDR5 | 3GHz GDDR5
Memory Bus Width      | 384-bit      | 384-bit    | 256-bit    | 384-bit
FP64                  | 1/32 FP32    | 1/3 FP32   | 1/3 FP32   | 1/2 FP32
TDP                   | 250W         | 225W       | 150W       | 204W
GPU                   | GM200        | GK110      | GK110      | GF110
Architecture          | Maxwell 2    | Kepler     | Kepler     | Fermi
Transistor Count      | 8B           | 7.1B       | 7.1B       | 3B
Manufacturing Process | TSMC 28nm    | TSMC 28nm  | TSMC 28nm  | TSMC 40nm

So just what is Quadro M6000? Packing a fully enabled GPU, this is GM200 at its best. All 3072 CUDA cores are enabled, and with a maximum clockspeed of 1.14GHz the card is capable of pushing 7 TFLOPS of single precision performance. Coupled with this are GM200’s double-sized ROP clusters, giving M6000 96 ROPs and better than 2x the pixel throughput of the outgoing K6000.

Meanwhile it’s interesting to note that NVIDIA’s GPU Boost technology has finally come to the Quadro lineup via the M6000. The M6000 supports 10 different boost states, the fastest of which is the 1.14GHz state that gives the card its 7 TFLOPS of performance. As with GeForce and Tesla cards, GPU Boost allows NVIDIA to raise their shipping clockspeeds for better performance without violating the card’s cooling or power delivery restrictions.
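For readers who want to see where the 7 TFLOPS figure comes from, here is a rough sketch of the usual peak-throughput arithmetic. It assumes each CUDA core retires one fused multiply-add (counted as two floating-point operations) per clock, which is the conventional way such peak numbers are quoted; the function and constant names are our own.

```python
# Peak single precision throughput estimate for Quadro M6000.
# Assumption: one fused multiply-add (2 FLOPs) per CUDA core per clock.
CUDA_CORES = 3072              # from the spec table
BOOST_CLOCK_GHZ = 1.14         # approximate top boost state
FLOPS_PER_CORE_PER_CLOCK = 2   # assumed: one FMA per clock


def peak_fp32_tflops(cores, clock_ghz, flops_per_clock=FLOPS_PER_CORE_PER_CLOCK):
    """Return theoretical peak FP32 throughput in TFLOPS."""
    gflops = cores * flops_per_clock * clock_ghz
    return gflops / 1000.0


m6000_tflops = peak_fp32_tflops(CUDA_CORES, BOOST_CLOCK_GHZ)
print(f"M6000 peak FP32: {m6000_tflops:.2f} TFLOPS")
```

This is a theoretical ceiling at the highest boost state; sustained throughput will depend on which of the boost states the card actually holds under load.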

Paired with the GM200 is 12GB of GDDR5 memory, which is as much as the K6000 and still the most one can pack on a memory bus of this size. M6000 clocks its memory at 6.6GHz, which is good for 317GB/sec of memory bandwidth. Furthermore, as with past high-end Quadro cards ECC protection is available for the memory (and only the memory, no cache), which trades off some memory bandwidth for better protection against memory errors.
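The bandwidth figure follows directly from the memory clock and bus width. As a quick sketch (the helper name is ours, and the 6.6GHz figure is treated as the effective per-pin data rate in Gbps):

```python
# Memory bandwidth = effective data rate per pin * bus width / 8 bits per byte.
def memory_bandwidth_gbs(data_rate_gbps, bus_width_bits):
    """Return theoretical memory bandwidth in GB/sec."""
    return data_rate_gbps * bus_width_bits / 8


m6000_bw = memory_bandwidth_gbs(6.6, 384)   # M6000 figures from the spec table
k6000_bw = memory_bandwidth_gbs(6.0, 384)   # K6000 figures from the spec table

print(f"M6000: {m6000_bw:.1f} GB/sec")
print(f"K6000: {k6000_bw:.1f} GB/sec")
```

These are peak numbers with ECC disabled; as noted above, enabling ECC gives up some of that bandwidth.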

On the overall performance front, Quadro M6000 is expected to offer a significant performance boost over K6000, similar to what we’ve seen on the consumer side with GTX Titan X. Along with the greater clockspeed and the slight increase in the number of CUDA cores, M6000 brings with it the Maxwell 2 family architecture and its efficiency improvements. Actual performance will depend on the application, but gains of 50% or more are possible, especially in exotic scenarios that stress the ROPs. To that end NVIDIA gave Lucasfilm some of the first M6000 cards, and they reported a better than expected performance increase:

"To create the most immersive and visually exciting imagery imaginable, Lucasfilm artists and developers need optimal graphics performance and GPU power," said Lutz Latta, Principal Engineer at Lucasfilm. "With the NVIDIA Quadro M6000 GPU, we saw overall gains of 55% in a heavy compute and memory access ray-tracing application using layered shadow maps. This kind of performance boost gives our artists a necessary edge to realize their creative vision."
(Emphasis ours)

Along with Maxwell 2’s architectural efficiency improvements, Maxwell 2 also brings with it a series of feature improvements that make their debut in the Quadro family on the M6000. On the display side, M6000 is the first Quadro capable of driving four 4K displays (previous gen Quadros were limited to two such displays) thanks to the updated display controller. Meanwhile Quadro also gains the latest NVENC video encoder, which though unlikely to be used at this early stage, opens the door up to real-time HEVC encoding on Quadro.

As for the card’s construction and power requirements, both have changed compared to K6000. M6000’s TDP is 250W, up from 225W on K6000. The increased TDP allows for higher clockspeeds than the Quadro family’s historically conservative clockspeeds, and is at this point equivalent to the consumer GTX Titan X’s power requirements. Interestingly, despite this increase, M6000 only requires a single 8-pin PCIe power connector (located on the far side of the card, as in past Quadro designs); this technically puts the M6000 out of spec on PCIe since 250W is more than what the slot + 8-pin connector can provide (225W). We asked NVIDIA about this, and they have told us that the card is pulling the extra power from the 8-pin connector, and though not officially in spec, the kind of systems expected to house the M6000 are expected to have no problem delivering the extra amperage necessary.
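To put the out-of-spec gap in perspective, here is a small sketch of the power budget. It uses the standard PCIe allowances of 75W for the slot and 150W for an 8-pin connector, and it assumes (our simplification) that the entire shortfall is drawn over the connector's 12V rail.

```python
# PCIe power budget vs. M6000 TDP.
SLOT_W = 75          # PCIe x16 slot allowance
EIGHT_PIN_W = 150    # 8-pin PCIe power connector allowance
CARD_TDP_W = 250     # M6000 TDP from the spec table
RAIL_VOLTS = 12.0    # assumption: shortfall is drawn entirely at 12V

spec_budget_w = SLOT_W + EIGHT_PIN_W        # what the spec allows
shortfall_w = CARD_TDP_W - spec_budget_w    # what the card needs beyond that
extra_amps = shortfall_w / RAIL_VOLTS       # extra current on the 8-pin

print(f"In-spec budget: {spec_budget_w}W")
print(f"Shortfall: {shortfall_w}W (~{extra_amps:.1f}A at 12V)")
```

Under those assumptions the 8-pin connector is being asked for roughly 2A more than its rating, which is consistent with NVIDIA's view that workstation-class power supplies can absorb it.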

Meanwhile the card’s construction has seen the K6000’s plastic shroud and cooling apparatus replaced with the metal GTX Titan shroud and cooler, similar to the GTX Titan X. This change is largely driven by the power increase, as the GTX Titan cooler is already qualified to handle 250W designs. To set it apart from the GTX Titan X, the M6000 gets a black & green paint job rather than the Titan X’s all-black paint job. Otherwise the change in coolers has no effect on the card’s dimensions, with the card still being a double-slot 10.5” long card, just like the K6000.

Moving on, while M6000 will be a graphics monster, its use of the GM200 GPU means that it also inherits GM200’s compute capabilities, including the GPU’s highly limited double precision (FP64) performance. On the more recent Quadro 6000 series cards, NVIDIA has used GPUs with high FP64 throughput (largely an artifact of also using these GPUs in Tesla compute cards) and left FP64 throughput unrestricted on Quadro cards. This made the Quadro K6000 a sort of jack of all trades, offering NVIDIA’s best pro graphics performance along with their full compute performance.

However GM200 and the Quadro M6000 change that. With Quadro M6000 having a native FP64 rate of 1/32 FP32, M6000 will only have minimal FP64 capabilities. In our GTX Titan X article we discuss the development rationale for this, but NVIDIA has essentially opted to build the best graphics and FP32 compute GPU they can, and not waste space on FP64 resources. Consequently this is the first Quadro 6000 series card in some time to have such poor FP64 performance. However as FP64 compute is not widely used in graphics, this is not something NVIDIA believes will be an issue. In the far more common scenario of FP32 compute (e.g. most ray-tracing engines), M6000 will be far more performant than its predecessors.
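To illustrate how large the FP64 gap is, the sketch below derives theoretical peak rates from the spec table. It assumes 2 FLOPs per CUDA core per clock for FP32 and applies the table's FP64 ratios; the K6000 figure uses its 900MHz core clock, and all names are ours.

```python
from fractions import Fraction

# Theoretical peak FP32 and FP64 rates derived from the spec table.
# Assumption: 2 FLOPs (one FMA) per CUDA core per clock for FP32.
def peak_tflops(cores, clock_ghz, fp64_ratio):
    """Return (fp32_tflops, fp64_tflops) for the given configuration."""
    fp32 = cores * 2 * clock_ghz / 1000.0
    fp64 = fp32 * float(fp64_ratio)
    return fp32, fp64


m6000_fp32, m6000_fp64 = peak_tflops(3072, 1.14, Fraction(1, 32))
k6000_fp32, k6000_fp64 = peak_tflops(2880, 0.90, Fraction(1, 3))

print(f"M6000: {m6000_fp32:.2f} TFLOPS FP32, {m6000_fp64:.2f} TFLOPS FP64")
print(f"K6000: {k6000_fp32:.2f} TFLOPS FP32, {k6000_fp64:.2f} TFLOPS FP64")
```

On paper, then, the older K6000 offers several times the FP64 throughput of the M6000 while trailing it in FP32, which is the trade-off described above.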

Finally, as far as use cases go, NVIDIA is aiming the M6000 at a cross-section of possible markets. There is of course the traditional pro visualization market, the high-end of which is always in need of greater GPU performance, something the M6000 can provide in spades. However the company is also pushing the use of Physically Based Rendering (PBR), a compute-intensive rendering technique that models the physical characteristics of a material, in essence properly capturing how light will interact with that material and reflect off of it rather than using a rough approximation. We’ll have more on PBR a bit later this week when we talk about Quadro developments at GDC.

Wrapping things up, NVIDIA tells us that Quadro M6000 will be available soon in complete systems through the company’s regular OEM partners, and as individual cards via the typical retail channels. As is customary for NVIDIA, they have not announced a launch price for the M6000, but we would expect to see it launch at $5000+, as has been the case with past Quadro 6000 series cards.

Quadro VCA (2015)

Meanwhile with the launch of the Quadro M6000, NVIDIA is also using this opportunity to refresh their Iray Visual Computing Appliance (VCA), the company’s high-end network-attached render server. The VCA specializes in very high performance remote rendering jobs, packing multiple GPUs into a single server box, with further scale-out capabilities to multiple VCA boxes via 10GigE and InfiniBand.

Now dubbed the Quadro VCA, this updated VCA packs in 8 of NVIDIA’s high-end Quadro cards. The cards themselves are GM200 based but are technically not M6000 (NVIDIA is quick to note that they have a different BIOS that has them clocked slightly differently), though they should perform similarly to the aforementioned M6000. These cards have 12GB per GPU and are fully enabled, giving the entire VCA some 96GB of VRAM and 24,576 CUDA cores.

Driving the Quadro cards will be a pair of 10-core Xeon processors (we don’t have the specific model at this time, but believe it to be from the Xeon E5 V3 family), 256GB of system memory, and 2TB of solid state storage. Other than the change in processors and the updated Quadro cards, the rest of these specs are identical to the previous generation VCA.

On the software side, the new Quadro VCA runs CentOS 6.6. It will also come with Iray 2015 and Chaos’s V-Ray RT pre-installed to make setup easier. However, it should be noted that the VCA does not include the licenses for those software packages; those must be purchased separately.

The Quadro VCA will be available soon through NVIDIA's VCA partners for $50,000.



View All Comments

  • ddriver - Friday, March 20, 2015

    Simulations for architecture and automotive industry. DP is important, if you don't want buildings to collapse and people to die.

    And just because you talk out of your butt doesn't mean it is a common practice everywhere, it's just you and it isn't normal...
  • ddriver - Friday, March 20, 2015

    Also, you clearly have no idea of the degree of creeping approximation errors which result in just a few calculations at 32 bit resolution. Even for sound or image processing you are far better off with 64bit precision, anything less is plain out unprofessional.
  • Evarin - Thursday, March 19, 2015

    Can someone explain to me beyond just saying "servers" what this is used for? What would be a specific task you would assign to a rig like the Quadro VCA?
  • LukaP - Thursday, March 19, 2015

    Its never for servers. This is a render farm node. basically you have a bunch of these nodes, and you send rendering data to them from your lousy 1 card 1 cpu workstation, and they render it way faster for you :)
  • WithoutWeakness - Thursday, March 19, 2015

    A company like Pixar might be looking to build a new render farm to crank out their next animated film. They would be looking to buy something like these VCA nodes and stuff a ton of them in a bunch of racks and connect them all together. A desktop computer with one of these cards might take an hour to render a scene. A rack with 8 VCAs with 8 cards in each unit could take less than a minute to render the same scene. If they buy a few racks' worth of VCAs and interconnect them then they would be able to crank out video even faster or go the other way and render more complex things in less time.

    Things like Merida's hair in Brave or Sully from Monsters Inc are ridiculously difficult to render and even with a rendering farm can take hours to render a single frame. Rendering something like that with a regular workstation would be totally impractical.
  • npz - Thursday, March 19, 2015

    Could you add some integer (32 & 64 bit) benchmarks to your GPU testsuite?

    Maxwell improved in integer over Kepler and despite the DP = 1/32 FP32, I would like to know how Maxwell 2 does.

    Also would like to know how more general computing performs on GPU if there are benchmarks for that too. For example how does it handle branchy code or thread synchronization?
  • Kevin G - Thursday, March 19, 2015

    I wonder when the M4000 and M2000 are going to be released. They should be GM204 based so nVidia isn't necessarily waiting around to finish a new chip for them.
  • HisDivineOrder - Thursday, March 19, 2015

    Oh, I get it. People assume they built these Big Maxwell boards with an emphasis on gaming because of gamers buying $1k cards (or cut-down variants for $650-750). No. They built these cards to slap into GRID centers everywhere and provide more vGPU's with less space.

    That explains why they left Prosumers behind. GRID is a more important need atm.
  • Mikemk - Thursday, March 19, 2015

    When I saw the M, I thought this was a mobile card.
