Compute & Synthetics

Shifting gears, we'll look at the compute and synthetic aspects of the RTX 2070. Though it has its own GPU in the form of TU106, the hardware resources at hand are similar in progression to what we've seen in TU102 and TU104.

Starting off with GEMM tests, the RTX 2070's tensor cores are pulled into action with half-precision matrix multiplication, though using binaries originally compiled for Volta. Because Turing is backwards compatible and in the same compute capability family as Volta (sm_75 compared to Volta's sm_70), the benchmark continues to work out-of-the-box, though without any Turing optimizations.

Compute: General Matrix Multiply Half Precision (HGEMM)Compute: General Matrix Multiply Single Precision (SGEMM)

At reference specifications, peak theoretical tensor throughput is around 107.6 TFLOPS for the RTX 2080 Ti, 80.5 TFLOPS for the RTX 2080, and 59.7 TFLOPS for the RTX 2070. Unlike the 89% efficiency with the Titan V's 97.5 TFLOPS, the RTX cards are essentially at half that level, with around 47%, 48%, and 45% efficiency for the RTX 2080 Ti, 2080, and 2070 respectively. A Turing-optimized binary should bring that up, though it is possible that the GeForce RTX cards may not be designed for efficient tensor FP16 operations as opposed to the INT dot-product acceleration. After all, the GeForce RTX cards are for consumers and ostensibly intended for inferencing rather than training, which is the reasoning for the new INT support in Turing tensor cores.

In terms of SGEMM efficiency though, the RTX 2070 is hitting a ridiculous 97% of its touted 7.5 TFLOPS, though to be fair the reference specifications here are done manually rather with a reference vBIOS. The other two GeForce RTX cards are at similar 90+% levels of efficiency, though a GEMM test like this is specifically designed for maximum utilization.

Compute: CompuBench 2.0 - Level Set Segmentation 256

Compute: CompuBench 2.0 - N-Body Simulation 1024KCompute: CompuBench 2.0 - Optical Flow

 

Compute: Folding @ Home Single Precision

Compute: Geekbench 4 - GPU Compute - Total Score

The breakdown of the GB4 subscores seems to reveal a similar uplift like we spotted with the Titan V, which had scored in excess of 509,000 points. We'll have to investigate further but Turing and Volta are clearly accelerating some of these workloads beyond what was capable in Pascal and Maxwell.

Synthetic: TessMark, Image Set 4, 64x Tessellation

Given that TU106 has 75% of the hardware resources of TU104, the tessellation performance is in line with expectrations. For reference, we noted earlier that the Titan V scored 703 while the Titan Xp scored 604.

Synthetic: Beyond3D Suite - Pixel Fillrate

Synthetic: Beyond3D Suite - Integer Texture Fillrate (INT8)

Total War: Warhammer II Power, Temperature, and Noise
Comments Locked

121 Comments

View All Comments

  • ThanosPAS - Saturday, November 10, 2018 - link

    Does 2070 worth it vs 1070 for its Tensor cores? I would utilize the card on machine learning. In Final Words, this wasn't covered. It seems to me that 2070 is the cheapest solution in rder to get dedicated Tensor cores, which if I am not mistaken make a great portion of difference in the computational performance between these two cards. Opinions?

Log in

Don't have an account? Sign up now