Overclocking: 4.0 GHz for 500W

Who said that a 250W processor should not be overclocked? AMD prides itself on being a processor manufacturer that offers every consumer processor as a multiplier-unlocked part, and on using a soldered thermal interface material to assist with thermal dissipation. This 2990WX has an X in the name, so let the overclocking begin!

Actually, confession time: we did not have much time for overclocking. This processor has a 3.0 GHz base frequency and a 4.2 GHz turbo frequency. In an air-conditioned room using the 500W Enermax Liqtech cooler, when running all cores under POV-Ray, we observed each core running at around 3150 MHz, which is barely above the base frequency. The first thing I did was set the all-core turbo to 4.2 GHz, the same as the single-core turbo frequency. That was a bust.

However, the next stage of my overclocking escapades surprised me. I set the CPU to a 40x multiplier in the BIOS, for 4.0 GHz on all the cores, all the time. I did not adjust the voltage: it was kept at auto, leaving the ASUS motherboard to figure it out. Lo and behold, it performed flawlessly through our testing suite at 4.0 GHz. I was shocked.

All I did for this overclock was turn a setting from ‘auto’ to ‘40’, and it breezed through almost every test I threw at it. I say almost every test – our Prime95 power testing failed. But our POV-Ray power testing, which draws more power, worked. Every benchmark in the suite worked. Thermals were high (in the 70s), but the cooler could take it, and with good reason too.

At full load in our POV-Ray test, the processor was listed as consuming 500W. The cooler is rated for 500W. At one point we saw 511W. The draw was split between 440W for the cores (or 13.8W per core) and 63W for the non-core (IF, IO, IMC), which equates to only 12.5% of the full power consumption. It answers the question from our Infinity Fabric power page: if you want the interconnect to be less of the overall power draw, overclock!
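The split quoted above can be checked with a few lines of arithmetic. This is only a sketch: the 440W and 63W figures come from the text, the 32-core count is the 2990WX's published spec rather than something stated on this page, and the function and variable names are made up for illustration.

```python
# Figures quoted in the text for the POV-Ray full-load run at 4.0 GHz.
CORE_POWER_W = 440.0    # power attributed to the cores
UNCORE_POWER_W = 63.0   # power attributed to the non-core (IF, IO, IMC)
CORE_COUNT = 32         # assumption: 2990WX core count from the product spec


def power_breakdown(core_w, uncore_w, cores):
    """Return total power, per-core power, and the non-core share of the total."""
    total_w = core_w + uncore_w
    return {
        "total_w": total_w,
        "per_core_w": core_w / cores,
        "uncore_share": uncore_w / total_w,
    }


breakdown = power_breakdown(CORE_POWER_W, UNCORE_POWER_W, CORE_COUNT)
print(f"total:         {breakdown['total_w']:.0f} W")
print(f"per core:      {breakdown['per_core_w']:.2f} W")
print(f"non-core part: {breakdown['uncore_share']:.1%}")
```

The two components sum to 503W, which sits between the 500W reading and the 511W peak mentioned above, so the percentages are relative to that sum rather than to the peak.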

We also tried 4.1 GHz, and that seemed to work as well, although we did not get a full benchmark run out of it before having to pack the system up. As stated above, 4.2 GHz was a no-go, even when increasing the voltage. With tweaking (and the right cooling), it could be possible. For anyone wanting to push here, chilled water might be the way to go.

Performance at 4.0 GHz

So if the all-core frequency was 3125 MHz, an overclock to 4000 MHz all-core should give a 28% performance increase, right? Here are some of the key tests from our suite.

[Charts: AppTimer: GIMP 2.10.4; Blender 2.79b bmw27_cpu Benchmark; POV-Ray 3.7.1 Benchmark; WinRAR 5.60b3; PCMark10 Extended Score; Agisoft Photoscan 1.3.3, Complex Test]

Overclocking the 2990WX is a mixed bag: it does really well in some tests, but still sits behind the 2950X in others due to the bi-modal nature of the cores. In the tests where it already wins, it pushes out a lot more: Blender is up 19% in throughput, POV-Ray is up 19%, 3DPM is up 19%. In the other tests, it either catches back up to the 2950X (Photoscan) or still lags behind (app loading, WinRAR).
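To put the 19% gains next to the 28% that the clock bump alone would suggest, here is a rough sketch. It assumes performance would ideally scale linearly with all-core frequency, which is a simplification and not something measured here; the helper names are hypothetical.

```python
# Frequencies and gains quoted in the text.
STOCK_ALL_CORE_MHZ = 3125   # stock all-core frequency used for the 28% estimate
OC_ALL_CORE_MHZ = 4000      # all-core overclock
OBSERVED_GAIN = 0.19        # Blender, POV-Ray and 3DPM uplift reported above


def ideal_gain(stock_mhz, oc_mhz):
    """Speedup expected if performance scaled perfectly with clock (assumption)."""
    return oc_mhz / stock_mhz - 1.0


def scaling_efficiency(observed, ideal):
    """Fraction of the ideal clock-for-clock gain that actually showed up."""
    return observed / ideal


ideal = ideal_gain(STOCK_ALL_CORE_MHZ, OC_ALL_CORE_MHZ)
efficiency = scaling_efficiency(OBSERVED_GAIN, ideal)
print(f"ideal gain:    {ideal:.0%}")
print(f"observed gain: {OBSERVED_GAIN:.0%}")
print(f"efficiency:    {efficiency:.0%}")
```

Under that linear assumption, the best-scaling tests deliver roughly two thirds of the frequency increase as throughput.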

Overclocking is not the cure-all for the performance issues on the 2990WX, but it certainly does help.

171 Comments

  • T1beriu - Monday, August 13, 2018 - link

    > We confirmed this with AMD, but for the most part the scheduler will load up the cores that are directly attached to memory first, before using the other cores. [...]

    It seems that Tomshardware says the opposite:

    >AMD continues working with Microsoft to route threads to the die with direct-attached memory first, and then spill remaining threads over to the compute dies. Unfortunately, the scheduler currently treats all dies as equal, operating in Round Robin mode. [...] According to AMD, Microsoft has not committed to a timeline for updating its scheduler.
  • Ian Cutress - Monday, August 13, 2018 - link

    Yeah, Paul and I were discussing this. It is a round robin mode, but it's weighted based on available resources, thermal performance, proximity of busy threads, etc.
  • JoeyJoJo123 - Monday, August 13, 2018 - link

    Maybe just user error, but all the article pages between Test Setup and Comparison Results to Going up Against Epyc just have the text "Still writing...". I'm unsure if the article is actually still being written and was supposed to be published in this partial manner, or if possibly something was lost between writing and upload.

    In any case, kind of crazy how the infinity fabric is consuming so much power. The cores look super-efficient, but if the uncore can get efficiency improvements, that can help the Zen architecture stay even more efficient under load. Intel's uncore consumes a fraction of the wattage, but doesn't scale as well for multiple threads.
  • Ian Cutress - Monday, August 13, 2018 - link

    Still being written. See my comment at the top. Unfortunately travel back and forth from UK to SF bit me over the weekend and I lost a couple of days testing, along with having to take a full benchmark set up with me to SF to test in the hotel room.
  • JoeyJoJo123 - Monday, August 13, 2018 - link

    I understand, take your rest. You don't need to reply to me, I actually saw the reason after I posted.
  • compilerdev2 - Monday, August 13, 2018 - link

    Hi Ian,
    I have some questions about the Chromium compilation benchmark, since I was hoping to get the 2990WX for compiling large C++ apps. What version of Chromium is used? Is the compiler being used Clang-CL or Visual C++? Is the build in debug or release (optimized) mode? If it's release mode with Visual C++, does it use LTCG? (link-time code generation, the equivalent of LTO of gcc/clang). For example, if the build is Visual C++ LTCG, the entire code optimization, code generation and linking is by default limited to 4 threads. Thanks!
  • Ian Cutress - Monday, August 13, 2018 - link

    It's the standard Windows walkthrough available online. So we use a build of Chrome 62 (it was relevant when we pulled), VC++, build in release. It's done in the command line via ninja, and yes it does use LTCG.

    Instructions are here. They might be updated a little from when I wrote the benchmark. Our test is automated to keep consistency.

    https://chromium.googlesource.com/chromium/src/+/m...
  • compilerdev2 - Monday, August 13, 2018 - link

    With LTCG those strange results make sense - it's spending a lot of time on just 4 threads - actually the majority of the time is on one thread for the Chromium case. It hits some current limitations of the VC++ compiler regarding CPU/memory usage that make scaling worse for Chromium (but not for smaller programs or with non-LTCG builds). Increasing the number of threads from the default of 4 is possible, but will not help here. The frontend (parsing) work is well parallelized by Ninja, which is probably the reason why the Threadrippers do end up ahead of the faster single-core Intel CPUs. It would be interesting to see the benchmarks without LTCG, or even better, more compilation benchmarks, since these CPUs are really great for C/C++/Rust programmers.
  • Nexus-7 - Monday, August 13, 2018 - link

    Cool write-up on the uncore power usage! I especially enjoyed that part of the article.
  • johnny_boy - Monday, August 13, 2018 - link

    The Phoronix articles are more telling for the sort of workloads a 64 thread count would be used for.
