The NVIDIA Titan V Deep Learning Deep Dive: It's All About The Tensor Cores
by Nate Oh on July 3, 2018 10:15 AM EST
DeepBench Inference: Convolutions
Moving on to convolutions, the 8-bit multiply with 32-bit accumulate operation again makes an appearance with INT8 inferencing.
The most striking aspect of the average convolutions performance is the Titan Xp's superior INT8 throughput. The numbers appear to be correct, as they are comparable to DeepBench's own Titan Xp inference results, and padding is not responsible for the disparity either.
Breaking out the convolutions into application-specific workloads, we see that the Resnet, Speaker ID, and Vision workloads all showcase the Titan Xp's superior INT8 performance.
Nothing obvious stands out from the kernels themselves, but if anything, the gap is likely due to DP4A library/driver maturity on Pascal compared to its Volta implementation. There's also the chance that Volta is handling INT8 solely through its INT cores.