NVIDIA has introduced a new version of its DGX-2 server that is outfitted with higher-performing CPUs and GPUs. The DGX-2H server is powered by 16 Tesla V100 GPUs that run at higher clocks and feature a 450 W TDP each. The whole system consumes up to 12 kW of power and delivers 2.1 PetaFLOPS of compute horsepower.

NVIDIA’s DGX-2H is an updated version of the DGX-2 machine the company introduced earlier this year. The new system is based on two 24-core Intel Xeon Platinum 8174 processors accompanied by 1.5 TB of DDR4 memory, as well as 30 TB of NVMe storage. The key improvement of the new server over the previous one is its faster NVIDIA Tesla V100 GPUs, which carry 512 GB of HBM2 memory in total. Meanwhile, the new DGX-2H has similar networking capabilities: 10/25/40/50/100 GbE.

UPDATE 11/29: NVIDIA has reached out to clarify a number of data points regarding the DGX servers, so the story has been updated.

NVIDIA DGX Series (with Volta)

|   | DGX-2H | DGX-2 | DGX-1 |
|---|---|---|---|
| CPUs | 2 x Intel Xeon Platinum 8174 | 2 x Intel Xeon Platinum 8168 | 2 x Intel Xeon E5-2600 v4 |
| GPUs | 16 x NVIDIA Tesla V100 32 GB HBM2 (450 W) | 16 x NVIDIA Tesla V100 32 GB HBM2 (350 W) | 8 x NVIDIA Tesla V100 32 GB HBM2 |
| System Memory | Up to 1.5 TB DDR4 | Up to 1.5 TB DDR4 | Up to 0.5 TB DDR4 |
| GPU Memory | 512 GB HBM2 (16 x 32 GB) | 512 GB HBM2 (16 x 32 GB) | 256 GB HBM2 (8 x 32 GB) |
| Storage | 30 TB NVMe, up to 60 TB | 30 TB NVMe, up to 60 TB | 4 x 1.92 TB NVMe |
| Networking | 8 x InfiniBand or dual 100 GbE | 8 x InfiniBand or dual 100 GbE | 4 x InfiniBand + 2 x 10 GbE |
| Power | 12 kW | 10 kW | 3.5 kW |
| Weight | 360 lbs | 360 lbs | 134 lbs |
| GPU Throughput | Tensor: 2100 TFLOPS, FP16: ? TFLOPS, FP32: ? TFLOPS, FP64: ? TFLOPS | Tensor: 1920 TFLOPS, FP16: 480 TFLOPS, FP32: 240 TFLOPS, FP64: 120 TFLOPS | Tensor: 960 TFLOPS, FP16: 240 TFLOPS, FP32: 120 TFLOPS, FP64: 60 TFLOPS |
| Cost | ? | $399,000 | $149,000 |

Thanks to faster graphics processors with a 450 W TDP each, the system can now deliver 2.1 PFLOPS of compute performance, up from 2 PFLOPS before. To handle the higher power draw, it looks like NVIDIA had to change the cooling: ServeTheHome believes that the DGX-2H uses a new cooling subsystem because it weighs 20 pounds more than its predecessor (360 pounds vs. 340 pounds), though the company has not confirmed this. Along with the performance improvements, NVIDIA had to lower the maximum operating temperature of the DGX-2H from 35°C to 25°C.

NVIDIA has not disclosed pricing of the DGX-2H, though it is likely to cost more than $399,000, the price of the DGX-2. What remains to be seen is whether NVIDIA customers find the DGX-2H's performance worth the extra 2 kW of power consumption.
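To put that trade-off in numbers, here is a rough back-of-the-envelope sketch in Python. It uses only the peak tensor throughput, maximum system power, and per-GPU TDP figures from the table above; the variable and function names are our own, and since these are peak and rated values rather than measurements, the result says nothing about sustained real-world performance.

```python
# Peak tensor throughput (TFLOPS), maximum system power (kW), and per-GPU TDP (W),
# copied from the spec table in this article. These are rated figures, not benchmarks.
SYSTEMS = {
    "DGX-2H": {"tensor_tflops": 2100, "power_kw": 12, "gpu_tdp_w": 450},
    "DGX-2": {"tensor_tflops": 1920, "power_kw": 10, "gpu_tdp_w": 350},
}


def pct_change(new, old):
    """Relative change from old to new, in percent."""
    return (new / old - 1) * 100


def tflops_per_kw(system):
    """Peak tensor TFLOPS per kW of maximum system power."""
    return system["tensor_tflops"] / system["power_kw"]


new, old = SYSTEMS["DGX-2H"], SYSTEMS["DGX-2"]

perf_gain = pct_change(new["tensor_tflops"], old["tensor_tflops"])
power_gain = pct_change(new["power_kw"], old["power_kw"])
gpu_tdp_gain = pct_change(new["gpu_tdp_w"], old["gpu_tdp_w"])
efficiency_change = pct_change(tflops_per_kw(new), tflops_per_kw(old))

print(f"Peak tensor throughput: {perf_gain:+.1f}%")
print(f"Maximum system power:   {power_gain:+.1f}%")
print(f"Per-GPU TDP:            {gpu_tdp_gain:+.1f}%")
print(f"Peak TFLOPS per kW:     {efficiency_change:+.1f}%")
```

On paper, then, the DGX-2H trades roughly 9% more peak tensor throughput for 20% more system power, so its peak TFLOPS per kW drops from 192 to 175. Whether that matters depends on how much of the rated power a given workload actually draws.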

Sources: NVIDIA, ServeTheHome

14 Comments

  • mode_13h - Wednesday, November 21, 2018

    Got it. Thanks.

    BTW, I previously read the mezzanine V100's were rated at 300 W. Maybe the DGX-1 was already overclocking them.
  • DanNeely - Wednesday, November 21, 2018

    I'm not sure if the newer model really makes a lot of sense unless you need the better networking. 10% faster, 20% (28% if you just look at the Tesla cards' share - might be relevant if running a workload that has the CPU and network at idle) more power used isn't an attractive option unless there're scalability issues with spreading workloads across multiple boxes.
  • Yojimbo - Wednesday, November 21, 2018

    In situations where the performance is being bound by thermal constraints in the original DGX-2 the increase in the theoretical throughput is not useful to compare the utility of the new system's higher thermal allowance. I think it's safe to assume that it is exactly those situations this new system is meant to target. We would need real world benchmarks to draw any conclusions, but the safer assumption would be that NVIDIA didn't make this system just to keep their systems engineers and salesmen busy because they had no other work to do.
  • Impetuous - Wednesday, November 21, 2018

    you know you're getting old when no one has asked if it can run Crysis yet...
