NVIDIA Unveils PCIe version of 80GB A100 Accelerator: Pushing PCIe to 300 Watts
by Ryan Smith on June 28, 2021 8:00 AM EST

As part of today's burst of ISC 2021 trade show announcements, NVIDIA this morning is announcing that they're bringing the 80GB version of their A100 accelerator to the PCIe form factor. First announced in NVIDIA's custom SXM form factor last fall, the 80GB version of the A100 not only doubled the accelerator's total memory capacity from 40GB to 80GB, it also delivered a rare mid-generation spec bump, cranking up memory clockspeeds by a further 33%. Now, a bit over six months later, NVIDIA is releasing a PCIe version of the accelerator for customers who need discrete add-in cards.
The new 80GB version of the PCIe A100 joins the existing 40GB version, and NVIDIA will continue selling both versions of the card. On the whole, this is a straightforward transfer of the 80GB SXM A100 over to PCIe, with NVIDIA dialing down the TDP of the card and the number of exposed NVLinks to match the capabilities of the form factor. The release of the 80GB PCIe card gives NVIDIA's traditional PCIe form factor customers a second, higher-performing accelerator option, particularly for those users who need more than 40GB of GPU memory.
**NVIDIA Accelerator Specification Comparison**

| | 80GB A100 (PCIe) | 80GB A100 (SXM4) | 40GB A100 (PCIe) | 40GB A100 (SXM4) |
|---|---|---|---|---|
| FP32 CUDA Cores | 6912 | 6912 | 6912 | 6912 |
| Boost Clock | 1.41GHz | 1.41GHz | 1.41GHz | 1.41GHz |
| Memory Clock | 3.0Gbps HBM2 | 3.2Gbps HBM2 | 2.43Gbps HBM2 | 2.43Gbps HBM2 |
| Memory Bus Width | 5120-bit | 5120-bit | 5120-bit | 5120-bit |
| Memory Bandwidth | 1.9TB/sec (1935GB/sec) | 2.0TB/sec (2039GB/sec) | 1.6TB/sec (1555GB/sec) | 1.6TB/sec (1555GB/sec) |
| VRAM | 80GB | 80GB | 40GB | 40GB |
| Single Precision | 19.5 TFLOPs | 19.5 TFLOPs | 19.5 TFLOPs | 19.5 TFLOPs |
| Double Precision | 9.7 TFLOPs (1/2 FP32 rate) | 9.7 TFLOPs (1/2 FP32 rate) | 9.7 TFLOPs (1/2 FP32 rate) | 9.7 TFLOPs (1/2 FP32 rate) |
| INT8 Tensor | 624 TOPs | 624 TOPs | 624 TOPs | 624 TOPs |
| FP16 Tensor | 312 TFLOPs | 312 TFLOPs | 312 TFLOPs | 312 TFLOPs |
| TF32 Tensor | 156 TFLOPs | 156 TFLOPs | 156 TFLOPs | 156 TFLOPs |
| Relative Performance (SXM Version) | 90%? | 100% | 90% | 100% |
| Interconnect | NVLink 3, 12 Links (600GB/sec) | NVLink 3, 12 Links (600GB/sec) | NVLink 3, 12 Links (600GB/sec) | NVLink 3, 12 Links (600GB/sec) |
| GPU | GA100 (826mm2) | GA100 (826mm2) | GA100 (826mm2) | GA100 (826mm2) |
| Transistor Count | 54.2B | 54.2B | 54.2B | 54.2B |
| TDP | 300W | 400W | 250W | 400W |
| Manufacturing Process | TSMC 7N | TSMC 7N | TSMC 7N | TSMC 7N |
| Interface | PCIe 4.0 | SXM4 | PCIe 4.0 | SXM4 |
| Architecture | Ampere | Ampere | Ampere | Ampere |
At a high level, the 80GB upgrade to the PCIe A100 is pretty much identical to what NVIDIA did for the SXM version. The 80GB card’s GPU is being clocked identically to the 40GB card’s, and the resulting performance throughput claims are unchanged.
Instead, this release is all about the on-board memory, with NVIDIA equipping the card with newer HBM2E memory. HBM2E is the informal name given to the most recent update to the HBM2 memory standard, which in early 2020 defined a new maximum memory speed of 3.2Gbps/pin. Alongside that frequency improvement, manufacturing improvements have also allowed memory manufacturers to double the capacity of the memory, going from 1GB/die to 2GB/die. The net result is that HBM2E offers both greater capacities and greater bandwidths, two things which NVIDIA is taking advantage of here.
With five active stacks of 16GB, 8-Hi memory, the updated PCIe A100 gets a total of 80GB of memory, which, running at 3.0Gbps/pin, works out to just under 1.9TB/sec of memory bandwidth for the accelerator, a 25% increase over the 40GB version. This means that the 80GB accelerator not only offers more local memory but, rare for a larger-capacity model, also offers some extra memory bandwidth to go with it. As a result, in memory bandwidth-bound workloads the 80GB version should be faster than the 40GB version even without using its extra memory capacity.
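The capacity and bandwidth figures above can be sanity-checked with some quick arithmetic. A minimal sketch (the function name and the nominal per-pin rates are mine; NVIDIA's official 1935GB/sec figure implies a slightly different exact clock of roughly 3.024Gbps/pin):

```python
# Back-of-the-envelope check of the A100 memory figures quoted above.
# Bandwidth = per-pin data rate x bus width / 8 bits per byte.

BUS_WIDTH_BITS = 5120  # 5 active HBM2E stacks x 1024 bits per stack

def hbm_bandwidth_gbps(pin_rate_gbps: float, bus_width_bits: int = BUS_WIDTH_BITS) -> float:
    """Aggregate memory bandwidth in GB/s for a given per-pin rate."""
    return pin_rate_gbps * bus_width_bits / 8

print(hbm_bandwidth_gbps(3.0))   # 80GB PCIe A100: 1920.0 GB/s, i.e. just under 1.9 TB/s
print(hbm_bandwidth_gbps(3.2))   # 80GB SXM A100:  2048.0 GB/s, i.e. ~2.0 TB/s
print(hbm_bandwidth_gbps(2.43))  # 40GB A100:      ~1555.2 GB/s, i.e. ~1.6 TB/s

# Capacity: five active stacks, each 8-Hi with 2GB dies
print(5 * 8 * 2)  # 80 (GB)
```

The 1920/1555 ratio also confirms the quoted 25% bandwidth uplift over the 40GB card.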
This additional memory does come at a cost, however: power consumption. For the 80GB A100, NVIDIA has needed to dial things up to 300W to accommodate the higher power consumption of the denser, higher-frequency HBM2E stacks. This is a notable (if not outright surprising) change, as NVIDIA has long held the line for its PCIe compute accelerators at 250W, broadly considered the limit for PCIe cooling. So a 300W card not only deviates from NVIDIA's past cards, it means that system integrators will need to find a way to provide another 50W of cooling per card. I don't expect this to be a hurdle for too many designs, but I won't be surprised if some integrators continue to offer only 40GB cards as a result.
And even then, the 80GB PCIe A100 would seem to be held back a bit by its form factor. Its 3.0Gbps memory clock is about 6% lower than the 3.2Gbps clock of the 80GB SXM A100. So NVIDIA is apparently leaving some memory bandwidth on the table just to get the card to fit in the expanded 300W profile.
On that note, it doesn’t appear that NVIDIA has changed the form factor of the PCIe A100 itself. The card is entirely passively cooled, designed to be used with servers with (even more) powerful chassis fans, and fed by dual 8-pin PCIe power connectors.
With regards to overall performance expectations, the new 80GB PCIe card should trail the SXM card in a similar fashion as the 40GB models. Unfortunately, NVIDIA's updated A100 datasheet doesn't include a relative performance metric this time around, so we don't have any official figures for how the PCIe card will compare to the SXM card. But, given the continued TDP differences (300W vs 400W), I would expect the real-world performance of the 80GB PCIe card to land near the same 90% mark as the 40GB PCIe card. Which serves to reiterate that GPU clockspeeds aren't everything, especially in this age of TDP-constrained hardware.
In any case, the 80GB PCIe A100 is designed to appeal to the same broad use cases as the SXM version of the card, which roughly boils down to AI dataset sizes, and enabling larger Multi-Instance GPU (MIG) instances. In the case of AI, there are numerous workloads which can benefit in terms of training time or accuracy by using a larger dataset, and overall GPU memory capacity has regularly been a bottleneck in this field, as there’s always someone who could use more memory. Meanwhile NVIDIA’s MIG technology, which was introduced on the A100, benefits from the memory increase by allowing each instance to be allocated more memory; running at a full 7 instances, each can now have up to 10GB of dedicated memory.
Wrapping things up, NVIDIA isn’t announcing specific pricing or availability information today. But customers should expect to see the 80GB PCIe A100 cards soon.
Source: NVIDIA
Comments
shabby - Monday, June 28, 2021 - link
So will we ever see any more gpu reviews here Ryan?

YB1064 - Monday, June 28, 2021 - link
Good luck trying to find a GPU to review.

JimRamK - Monday, June 28, 2021 - link
I mean media tend to get samples from the manufacturers so it shouldn't be an issue for them. I think it's mostly an issue of time.

shabby - Monday, June 28, 2021 - link
Reviewers can't get cards?

Samus - Tuesday, June 29, 2021 - link
Even if there were video card reviews, I've been trying to get a 3070 since Christmas for anywhere near retail price and just kind of gave up on PC gaming for the year. I guess I'll revisit the concept of upgrading my 1070Ti when Battlefield comes out. After all, what's the point of getting all excited for something that you can't reasonably get?
flyingpants265 - Wednesday, June 30, 2021 - link
My friend got a 3070 and I got two 3060tis, just by waiting on a waiting list.

df99 - Tuesday, July 6, 2021 - link
Which waiting list, how long did you wait? How much did you pay? TIA.
DominionSeraph - Monday, June 28, 2021 - link
Still waiting on the GTX 960 review...

Ryan Smith - Monday, June 28, 2021 - link
Yes, you will.shabby - Monday, June 28, 2021 - link
Did Nvidia always use tsmc for these type of gpus? Thought they were on the samsung bandwagon?