NVIDIA has introduced a new version of its DGX-2 server that is outfitted with higher-performing CPUs and GPUs. The DGX-2H server is powered by 16 Tesla V100 GPUs that run at higher clocks and feature a 450 W TDP each. The whole system consumes up to 12 kW of power and delivers 2.1 PetaFLOPS of compute horsepower.

NVIDIA’s DGX-2H is an updated version of the DGX-2 machine the company introduced earlier this year. The new system is based on two 24-core Intel Xeon Platinum 8174 processors accompanied by 1.5 TB of DDR4 memory, as well as 30 TB of NVMe storage. The key improvement of the new server over the previous one is its faster NVIDIA Tesla V100 GPUs, which carry 512 GB of HBM2 memory in total. Meanwhile, the new DGX-2H has similar networking capabilities: 10/25/40/50/100 GbE.

UPDATE 11/29: NVIDIA has reached out to clarify a number of data points regarding the DGX servers, so the story has been updated.

NVIDIA DGX Series (with Volta)

| | DGX-2H | DGX-2 | DGX-1 |
|---|---|---|---|
| CPUs | 2 x Intel Xeon Platinum 8174 | 2 x Intel Xeon Platinum 8168 | 2 x Intel Xeon E5-2600 v4 |
| GPUs | 16 x NVIDIA Tesla V100 32 GB HBM2 (450 W) | 16 x NVIDIA Tesla V100 32 GB HBM2 (350 W) | 8 x NVIDIA Tesla V100 32 GB HBM2 |
| System Memory | Up to 1.5 TB DDR4 | Up to 1.5 TB DDR4 | Up to 0.5 TB DDR4 |
| GPU Memory | 512 GB HBM2 (16 x 32 GB) | 512 GB HBM2 (16 x 32 GB) | 256 GB HBM2 (8 x 32 GB) |
| Storage | 30 TB NVMe, up to 60 TB | 30 TB NVMe, up to 60 TB | 4 x 1.92 TB NVMe |
| Networking | 8 x InfiniBand or dual 100 GbE | 8 x InfiniBand or dual 100 GbE | 4 x IB + 2 x 10 GbE |
| Power | 12 kW | 10 kW | 3.5 kW |
| Weight | 360 lbs | 360 lbs | 134 lbs |
| GPU Throughput: Tensor | 2100 TFLOPS | 1920 TFLOPS | 960 TFLOPS |
| GPU Throughput: FP16 | ? | 480 TFLOPS | 240 TFLOPS |
| GPU Throughput: FP32 | ? | 240 TFLOPS | 120 TFLOPS |
| GPU Throughput: FP64 | ? | 120 TFLOPS | 60 TFLOPS |
| Cost | ? | $399,000 | $149,000 |

Thanks to faster GPUs with a 450 W TDP each, the system can now deliver 2.1 PFLOPS of compute performance, up from 2 PFLOPS before. To cope with the higher power draw, it looks like NVIDIA had to change the cooling: ServeTheHome believes that the DGX-2H uses a new cooling subsystem because it weighs 20 pounds more than its predecessor (360 pounds vs. 340 pounds), though the company has not confirmed this. Along with the performance improvements, NVIDIA had to lower the maximum operating temperature of the DGX-2H from 35°C to 25°C.

NVIDIA has not disclosed pricing of the DGX-2H, though it is likely that it will cost more than $399,000, the price of the DGX-2. What remains to be seen is whether NVIDIA customers find the DGX-2H's performance gain worth an extra 2 kW of power consumption.
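To put the performance-versus-power question in numbers, here is a rough back-of-the-envelope sketch in Python. It uses only the peak tensor throughput and maximum system power figures from the spec table above; the dictionary layout and the function names are illustrative choices, and peak figures say nothing about how real workloads scale.

```python
# Peak tensor throughput (TFLOPS) and maximum system power (kW),
# copied from the spec table. These are peak figures, not measured results.
SYSTEMS = {
    "DGX-2H": {"tensor_tflops": 2100, "power_kw": 12.0},
    "DGX-2": {"tensor_tflops": 1920, "power_kw": 10.0},
    "DGX-1": {"tensor_tflops": 960, "power_kw": 3.5},
}


def tflops_per_kw(name):
    """Peak tensor TFLOPS delivered per kW of maximum system power."""
    system = SYSTEMS[name]
    return system["tensor_tflops"] / system["power_kw"]


def relative_change(new, old):
    """Fractional change going from old to new (0.2 means +20%)."""
    return (new - old) / old


if __name__ == "__main__":
    for name in SYSTEMS:
        print(f"{name}: {tflops_per_kw(name):.1f} tensor TFLOPS per kW")

    perf_gain = relative_change(
        SYSTEMS["DGX-2H"]["tensor_tflops"], SYSTEMS["DGX-2"]["tensor_tflops"]
    )
    power_gain = relative_change(
        SYSTEMS["DGX-2H"]["power_kw"], SYSTEMS["DGX-2"]["power_kw"]
    )
    print(f"DGX-2H vs DGX-2: {perf_gain:+.1%} throughput, {power_gain:+.1%} power")
```

On these peak numbers, the DGX-2H offers roughly 9% more tensor throughput for 20% more power, which works out to 175 TFLOPS per kW versus 192 for the DGX-2.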

Sources: NVIDIA, ServeTheHome

14 Comments

  • Kevin G - Tuesday, November 20, 2018

    The change in weight and cooling spec makes me wonder if they included a liquid cooling system internally.
  • DanNeely - Wednesday, November 21, 2018

    I doubt it. The weight increase would only allow ~1kg for each CPU/GPU's share of waterblock, radiator, and coolant. It's a server setup, so the air-cooled version would have relatively small heatsinks with massive, wear-hearing-protection levels of case airflow, so dropping the air-cooling heatsinks doesn't free up much additional weight.
  • MrSpadge - Tuesday, November 20, 2018

    The maximum GPU memory of the DGX-1 should be 8 x 16 GB = 128 GB, shouldn't it?
  • plopke - Tuesday, November 20, 2018

    The data sheet says "GPU Memory 256 GB total system",
    but when I open the DGX-1 white paper it says "The eight Tesla V100 GPUs have a total of 128 GB HBM2 memory".

    Maybe part of system memory is reserved for the GPU?
  • Eric Klien - Wednesday, November 21, 2018

    The original DGX-1 had 128 GB while the latest DGX-1 has 256 GB as the memory per GPU has doubled. So this chart should be fixed showing that each GPU has 32 GB in all 3 systems. I believe you can still buy the original DGX-1 for a mere $129,000.
  • Charlie22911 - Tuesday, November 20, 2018

    Maximum operating temperature of 25C?! Is that normal for systems like this? Why so low?
  • jimjamjamie - Tuesday, November 20, 2018

    There's 16x 450W GPUs in that box. If you're going to spend half a million bucks on something like this, you should probably get some nice AC to stop it from going nuclear when you try and run minecraft.
  • Death666Angel - Wednesday, November 21, 2018

    For AC controlled server rooms, that seems quite high, at least compared to the ones I know. You don't want to bake your millions of dollars worth of computer equipment anyway.
  • mode_13h - Tuesday, November 20, 2018

    I'm actually more impressed they doubled the tensor throughput simply by going to 350 W. The extra bump from going to 450 W isn't worth it, IMO.
  • Santoval - Tuesday, November 20, 2018

    You misread the specs. They did not double the tensor (along with the FP16/32/64) performance of the DGX-1 by raising the TDP of the DGX-2 graphics cards but by doubling their number. Since the numbers exactly doubled we can safely assume that the TDP of the DGX-1 and DGX-2 Tesla V100s is exactly the same (350W).