Compute & Synthetics

Shifting gears, we'll look at the compute and synthetic aspects of the GTX 1660 Ti.

Beginning with CompuBench 2.0, the latest iteration of Kishonti's GPU compute benchmark suite, we have a wide array of practical compute workloads to choose from; we've decided to focus on level set segmentation, optical flow modeling, and N-Body physics simulations.

Compute: CompuBench 2.0 - Level Set Segmentation 256

Compute: CompuBench 2.0 - N-Body Simulation 1024K

Compute: CompuBench 2.0 - Optical Flow

On paper, the GTX 1660 Ti offers around 85% of the RTX 2060's compute and shading throughput; in CompuBench, we see it achieve around 82% of the RTX 2060's performance.
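The "on paper" figure follows from peak FP32 throughput: CUDA cores × 2 FLOPs per clock (one fused multiply-add) × boost clock. The sketch below assumes the reference specifications of 1536 CUDA cores at a 1770 MHz boost clock for the GTX 1660 Ti and 1920 CUDA cores at 1680 MHz for the RTX 2060. Retail cards typically boost higher under GPU Boost, so treat these as nominal figures only.

```python
# Peak FP32 throughput from CUDA core count and clock.
# Assumption: reference boost clocks; real cards under GPU Boost may run faster.

def peak_fp32_tflops(cuda_cores: int, boost_mhz: float) -> float:
    """Each CUDA core can retire one FMA (2 FLOPs) per clock."""
    return cuda_cores * 2 * boost_mhz * 1e6 / 1e12

gtx_1660_ti = peak_fp32_tflops(1536, 1770)
rtx_2060 = peak_fp32_tflops(1920, 1680)

paper_ratio = gtx_1660_ti / rtx_2060
measured_ratio = 0.82  # approximate CompuBench result quoted in the text

print(f"GTX 1660 Ti: {gtx_1660_ti:.2f} TFLOPS")   # 5.44
print(f"RTX 2060:    {rtx_2060:.2f} TFLOPS")      # 6.45
print(f"On-paper ratio: {paper_ratio:.1%}")       # 84.3%
print(f"Measured ratio: {measured_ratio:.0%}")    # 82%
```

At these nominal clocks the ratio works out to roughly 84%, in line with the ~85% cited above; the measured CompuBench result lands just below it.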

Moving on, we'll also look at single precision floating point performance with FAHBench, the official Folding @ Home benchmark. Folding @ Home is the popular Stanford-backed research and distributed computing initiative that distributes work to millions of volunteer computers over the internet, each of which is responsible for a tiny slice of a protein folding simulation. FAHBench can test both single precision and double precision floating point performance, with single precision being the most useful metric for most consumer cards due to their low double precision performance.
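To put "low double precision performance" into numbers: consumer Turing GPUs execute FP64 at 1/32 of their FP32 rate. The sketch below assumes that ratio along with the GTX 1660 Ti's reference specification (1536 CUDA cores, 1770 MHz boost); both are nominal values, not measurements.

```python
# Compare nominal FP32 and FP64 peak throughput for a consumer Turing card.
# Assumptions: reference boost clock and a 1:32 FP64:FP32 rate.

FP64_TO_FP32_RATE = 1 / 32  # consumer Turing parts

def peak_tflops(cuda_cores: int, boost_mhz: float, rate: float = 1.0) -> float:
    """Peak throughput: cores x 2 FLOPs (FMA) x clock, scaled by the precision rate."""
    return cuda_cores * 2 * boost_mhz * 1e6 * rate / 1e12

fp32 = peak_tflops(1536, 1770)
fp64 = peak_tflops(1536, 1770, FP64_TO_FP32_RATE)

print(f"FP32: {fp32:.2f} TFLOPS")   # 5.44
print(f"FP64: {fp64:.3f} TFLOPS")   # 0.170
```

With well under a fifth of a TFLOPS available in FP64, a double precision run would say little about how these cards handle the single precision work they are built for.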

Compute: Folding @ Home Single Precision

Next is Geekbench 4's GPU compute suite. A multi-faceted test suite, Geekbench 4 runs seven different GPU sub-tests, ranging from face detection to FFTs, and then averages their scores via the geometric mean. As a result, Geekbench 4 isn't testing any one workload, but rather is an average of many different basic workloads.
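Why a geometric mean dilutes any single workload can be shown with a short sketch. The seven sub-test scores below are hypothetical placeholders, not Geekbench results; the point is that doubling one of seven sub-scores raises the total by only 2^(1/7), about 10%.

```python
import math

def geometric_mean(scores: list[float]) -> float:
    """nth root of the product, computed via logs to avoid overflow."""
    if not scores or any(s <= 0 for s in scores):
        raise ValueError("geometric mean needs positive scores")
    return math.exp(sum(math.log(s) for s in scores) / len(scores))

# Hypothetical sub-test scores for seven workloads (illustrative only).
baseline = [9000.0, 12000.0, 15000.0, 7000.0, 11000.0, 13000.0, 10000.0]

# Double a single sub-test's score and see how much the total moves.
boosted = baseline.copy()
boosted[0] *= 2

gain = geometric_mean(boosted) / geometric_mean(baseline)
print(f"Total score gain from doubling one of seven sub-tests: {gain:.3f}x")  # 1.104x
```

Unlike an arithmetic mean, the result does not depend on which sub-test happens to have the largest raw scores, so no single workload can dominate the total.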

Compute: Geekbench 4 - GPU Compute - Total Score

In lieu of Blender, which has yet to officially release a stable version with CUDA 10 support, we have the LuxRender-based LuxMark (OpenCL) and V-Ray (OpenCL and CUDA).

Compute/ProViz: LuxMark 3.1 - LuxBall and Hotel

Compute/ProViz: V-Ray Benchmark 1.0.8

We'll also take a quick look at tessellation performance.

Synthetic: TessMark, Image Set 4, 64x Tessellation

Finally, for texel and pixel fillrates, we have the Beyond3D Test Suite. This suite offers a slew of additional tests – many of which we use behind the scenes or in our earlier architectural analysis – but for now we'll stick to simple pixel and texel fillrates.

Synthetic: Beyond3D Suite - Pixel Fillrate

Synthetic: Beyond3D Suite - Integer Texture Fillrate (INT8)

Synthetic: Beyond3D Suite - Floating Point Texture Fillrate (FP32)

The practically identical pixel fillrates for the GTX 1660 Ti and RTX 2060 might seem odd at first blush, but they are entirely expected: both GPUs have the same number of ROPs, similar clockspeeds, the same GPC/TPC setup, and similar memory configurations. And because they share the same generation and architecture, there aren't any changes or improvements to delta color compression (DCC). In the same vein, the RTX 2060 puts up a 25% higher texture fillrate than the GTX 1660 Ti as a consequence of having 25% more TMUs (120 vs 96).
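Peak pixel fillrate scales with ROP count and texel fillrate with TMU count, each multiplied by clockspeed. The sketch below assumes 48 ROPs on both cards plus the TMU counts given above, and holds both at the same hypothetical 1700 MHz clock so that only the unit counts differ. At each card's actual clocks the absolute figures, and to a lesser degree the ratios, would shift.

```python
# Nominal fillrates: pixel = ROPs x clock, texel = TMUs x clock.
# Assumption: both cards run at the same hypothetical clock, isolating unit counts.

def fillrate_g(units: int, clock_mhz: float) -> float:
    """Billions of pixels or texels per second."""
    return units * clock_mhz / 1000

CLOCK_MHZ = 1700  # hypothetical shared clock for comparison

cards = {
    # name: (ROPs, TMUs)
    "GTX 1660 Ti": (48, 96),
    "RTX 2060": (48, 120),
}

for name, (rops, tmus) in cards.items():
    print(f"{name}: {fillrate_g(rops, CLOCK_MHZ):.1f} GPix/s, "
          f"{fillrate_g(tmus, CLOCK_MHZ):.1f} GTex/s")

pixel_ratio = cards["RTX 2060"][0] / cards["GTX 1660 Ti"][0]
texel_ratio = cards["RTX 2060"][1] / cards["GTX 1660 Ti"][1]
print(f"RTX 2060 vs GTX 1660 Ti: pixel {pixel_ratio:.2f}x, texel {texel_ratio:.2f}x")
```

Equal ROP counts yield a 1.00x pixel ratio, while 120 vs 96 TMUs yields exactly 1.25x for texels, matching the pattern in the charts.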

157 Comments

  • Rudde - Friday, February 22, 2019

    Never mind, the second page explains this well. (Parallel execution of fp16, fp32 and int32)
  • CiccioB - Saturday, February 23, 2019

    Not only that.
    With Turing you also get mesh shading and better support for thread switching, which is an awful technique used on GCN to improve its terrible efficiency, as it has lots of "bubbles" in the pipelines.
    That's the reason you see previous AMD-optimized games that didn't run too well with Pascal work much better with Turing, as the highly threaded technique (the famous AC, which is a bit overused in engines created for the console HW) is not going to constantly stall the SM with useless work such as frequent task switching.
  • AciMars - Saturday, February 23, 2019

    “Worse yet, the space used per SM has gotten worse.” Not true. You know, Turing has separate CUDA cores for INT and FP. That means when Turing has 1536 CUDA cores, it has 1536 INT + 1536 FP cores. So in terms of die size, Turing actually has 2x the CUDA cores compared to Pascal.
  • CiccioB - Monday, February 25, 2019

    Not exactly. The number of CUDA cores is the same; a new independent ALU has simply been added.
    A CUDA core is not only an execution unit; it also includes registers, memory (cache), buses (memory access) and other special execution units (load/store).
    By adding a new integer ALU you don't automatically get double the capacity, as you would by really doubling the number of complete CUDA cores.
  • ballsystemlord - Friday, February 22, 2019

    Here are some spelling and grammar corrections.

    This has proven to be one of NVIDIA's bigger advantages over AMD, an continues to allow them to get away with less memory bandwidth than we'd otherwise expect some of their GPUs to need.
    Missing d as in "and":
    This has proven to be one of NVIDIA's bigger advantages over AMD, and continues to allow them to get away with less memory bandwidth than we'd otherwise expect some of their GPUs to need.
    so we've only seen a handful of games implement (such as Wolfenstein II) implement it thus far.
    Double implement, 1 before ()s and 1 after:
    so we've only seen a handful of games (such as Wolfenstein II) implement it thus far.

    For our games, these results is actually the closest the RX 590 can get to the GTX 1660 Ti,
    Use "are" not "is":
    For our games, these results are actually the closest the RX 590 can get to the GTX 1660 Ti,

    This test offers a slew of additional tests - many of which use behind the scenes or in our earlier architectural analysis - but for now we'll stick to simple pixel and texel fillrates.
    Missing "we" (I suspect that the sentence should be reconstructed without the "-"s, but I'm not that good.):
    This test offers a slew of additional tests - many of which we use behind the scenes or in our earlier architectural analysis - but for now we'll stick to simple pixel and texel fillrates.

    "Looking at temperatures, there are no big surprises here. EVGA seems to have tuned their card for cooling, and as a result the large, 2.75-slot card reports some of the lowest numbers in our charts, including a 67C under FurMark when the card is capped at the reference spec GTX 1660 Ti's 120W limit."
    I think this could be clarified, as there are 2 EVGA cards in the charts and the one at 67C is not explicitly labeled as EVGA.

    Thanks
  • Ryan Smith - Saturday, February 23, 2019

    Thanks!
  • boozed - Friday, February 22, 2019

    The model numbers have become quite confusing
  • Yojimbo - Saturday, February 23, 2019

    I don't think they are confusing; 16 is between 10 and 20, plus the RTX is extra differentiation. In fact, if NVIDIA had some cards in the 20 series with RTX capability and some cards in the 20 series without RTX capability, even if some were 'GTX' and some were 'RTX', that would be far more confusing. Putting the non-RTX Turing cards in their own series is a way of avoiding confusion. But if they actually come out with an "1180", as some rumors floating around say, that would be very confusing.
  • haukionkannel - Saturday, February 23, 2019

    It will be interesting to see next year.
    RTX 3050 and GTX 2650 Ti for the weaker version, if we get one new RTX card family... Hmm... that could work if they keep the naming. 2021: RTX 3040 and GTX 2640 Ti...
  • CiccioB - Thursday, February 28, 2019

    Next generation, all cards will have enough RT and tensor cores enabled.
