Core Overclocking

After G80 launched (the first NVIDIA GPU to employ a separate clock domain for shaders), any core clock increase silently raised the shader clock as well. At first this made sense: NVIDIA only exposed core and memory clock adjustments, and the end user could not adjust the shader clock directly. Of course, we went to some trouble back then to try our hand at BIOS flashing for shader overclocking. Even after NVIDIA finally exposed a separate shader adjustment, core clock and shader clock remained tied to some degree.

Since the middle of last year, NVIDIA's driver-based clock speed adjustments have been "unlinked," meaning that the shader clock is no longer affected by the core clock. This certainly makes things a lot easier for us, and we'll start by testing core clock speed adjustment.

The maximum core clock we could hit on our reference GTX 275 was 702 MHz. Try as we might, we just could not get it stable beyond that speed, but it's still a good enough overclock to get a good handle on scaling. We know some people have GTX 275 parts that will get up toward 750 MHz, so it is possible to get more speed out of this chip. Still, we have an 11% increase in core clock speed, which should net us some decent results.
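For reference, the 11% figure can be reproduced with a few lines of Python. This is only an illustrative sketch: the 633 MHz baseline is an assumption taken from NVIDIA's reference GTX 275 specification rather than from this page, and `overclock_pct` is a hypothetical helper name.

```python
# Percentage overclock relative to a baseline clock.
STOCK_CORE_MHZ = 633.0  # assumption: reference GTX 275 core clock, not stated on this page
OC_CORE_MHZ = 702.0     # highest stable core clock reported above


def overclock_pct(stock_mhz: float, oc_mhz: float) -> float:
    """Return the clock increase over stock as a percentage."""
    return (oc_mhz / stock_mhz - 1.0) * 100.0


core_gain = overclock_pct(STOCK_CORE_MHZ, OC_CORE_MHZ)
best_case_gain = overclock_pct(STOCK_CORE_MHZ, 750.0)  # the ~750 MHz parts mentioned above

print(f"702 MHz core: +{core_gain:.1f}% over stock")
print(f"750 MHz core: +{best_case_gain:.1f}% over stock")
```

Under that assumed baseline, 702 MHz works out to just under 11%, and a 750 MHz part would be roughly 18% over stock.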




[Graph: performance scaling from core overclocking at 1680x1050, 1920x1200, and 2560x1600]


Call of Duty edges up toward the theoretical maximum but drops off at 2560x1600, which is much more resource intensive. Interestingly, most of the other games see more benefit at the highest resolution we test, gaining over 5% there but generally between 2 and 5 percent at lower resolutions. FarCry 2 and Fallout 3 seem not to gain as much benefit from core overclocking as our other tests.

It could be that we aren't seeing numbers closer to the theoretical maximums because there is a bottleneck either in memory or in the shader hardware. This makes analysis a little more complex than with the AMD part, as there are more interrelated factors. Some aspects of a game could be accelerated, but if a significant amount of work is going on elsewhere, we'll still be waiting on one of the other subsystems.
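One rough way to picture this bottleneck argument is an Amdahl's-law-style model, sketched below in Python. This is not the method used for the measurements here. It assumes frame time splits cleanly into a part that scales with core clock and a part that does not, whereas real GPUs overlap work across subsystems. The function names and the 633 MHz stock clock are assumptions made for illustration.

```python
def frame_speedup(core_bound_fraction: float, clock_ratio: float) -> float:
    """Overall speedup if only `core_bound_fraction` of frame time scales with core clock."""
    return 1.0 / ((1.0 - core_bound_fraction) + core_bound_fraction / clock_ratio)


def implied_core_bound_fraction(observed_speedup: float, clock_ratio: float) -> float:
    """Invert the model: fraction of frame time that would explain an observed speedup."""
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / clock_ratio)


# Assumption: 633 MHz reference core clock; 702 MHz is the overclock from the text.
CLOCK_RATIO = 702.0 / 633.0

for gain_pct in (2.0, 5.0):  # the range of gains described for lower resolutions
    fraction = implied_core_bound_fraction(1.0 + gain_pct / 100.0, CLOCK_RATIO)
    print(f"{gain_pct:.0f}% gain -> about {fraction:.0%} of frame time core-bound")
```

Under this simplified model, gains of 2 to 5 percent would correspond to somewhere between a fifth and a half of frame time being limited by the core clock, with the rest waiting on memory or shader hardware.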

Let's move on to the last independent aspect of the chip and then bring it all together.


43 Comments


  • Hrel - Friday, June 5, 2009 - link

    Wow, I guess the guys who programmed WAW and Race Driver did a REALLY crappy job at resource allocation; 30 percent compared to about 8 percent from Left 4 Dead; pretty terrible programming.
  • MonsterSound - Friday, June 5, 2009 - link

    I too like the 'change-in-place' resolution graphs, but have to agree that they would be better if the scale was consistent.

    As far as the 702mhz OC on your 275, that seems like a weak attempt. The retail evga 275 ftw model for example has been binned as an overclocker and stock speed is 713mhz. My MSI 275 FrozrOC is running at 735mhz right now. I can't seem to find mention of which models of the 275 you were testing with, but obviously not the fastest.
    respectfully,...
  • Anonymous Freak - Thursday, June 4, 2009 - link

    While I love the 'change-in-place' resolution graphs, they really need to be consistent. Leave games in the same location vertically; and keep the same scale horizontally. That way I can tell at an instant glance what the difference is. I don't like having the range switch from 0-15 to 0-7 to 0-10, plus changing the order of the games, when I click the different resolutions!

    After all, the only difference that matters on the graphs is the one the individual bars represent. So why go changing the other aspects? Yes, it's "pretty" to have the longest bar the same length, and to always have the graph sorted longest-on-top; but it makes the graph less readable.

    For the few graphs that have a bunch of values clustered near each other, plus one or two outliers, just have the outliers run off the edge. For example, in most of your one-variable graphs, a range of 0-10% would be sufficient. Just make sure that for a given resolution set, the range is the same.
  • yacoub - Thursday, June 4, 2009 - link

    This article completely kicks butt! It includes everything I'd want to see in charts, including both % gains and the actual FPS numbers versus other cards, and all with the three most important resolutions.

    Very, very good article. Please keep up this level of quality - the data and the depth really answer all the major questions readers and enthusiasts would have.
  • chizow - Thursday, June 4, 2009 - link

    Nice job Derek, I've been lobbying for a comparison like this since G80 but nice to see a thorough comparison of the different clock domains and impact on performance.

    As I suggested in some of your multi-GPU round-up articles, it'd be nice to see similar using CPU clockspeed scaling with a few different types of CPU, say a single i7, a C2Q 9650 and a PII 955 for example, then test with a fast single GPU and observe performance difference at different clockspeeds.

    It'd also be interesting to see some comparisons between different GPUs, say 260 to 275 to 280/285 at the same clockspeeds to measure the impact of actual physical differences between the GPU versions.
  • spunlex - Thursday, June 4, 2009 - link

    It looks like a stock GTX 275 beats the 280 in almost every benchmark even at stock speed. Does anyone have any explanation as to why this is happening??

    I guess GTX 280 sales will be dropping quite a bit now
  • PrinceGaz - Thursday, June 4, 2009 - link

    This whole idea of the three separate overclocks (core, shader, memory) being able to simultaneously provide almost their full percentage increase to any single result cannot possibly be right.

    Imagine you take the situation where a card is overclocked by 10% throughout (instead of 11%, 14%, 18% like you did). Core up 10%. Shaders up 10%. Memory up 10%. Going from your numbers, that would probably have given you about a 20% performance increase in two of the games! Do you really expect us to believe a graphics-card running 10% faster, can give a 20% performance boost to the overall framerate?

    How does magically making Core and Shader separate overclocks allow them to work together to nearly double their effect? If it worked that way, you could split the card up into twenty separate individually overclockable parts, overclock them all by 10%, and end up with something giving over 3x the performance-- all from a 10% overclock :p

    Something else must be happening in addition to what you are doing, and my first priority would be to check the actual speeds the card is running at using a third-party utility which reports not the speed the clocks have been set to, but the actual speed the hardware is running at (I believe RivaTuner does that in real-time in its hardware-monitor charts).
  • DerekWilson - Thursday, June 4, 2009 - link

    I used RivaTuner to check the clock speeds. I made very sure things were running at exactly the speeds I specified. At some clocks, the hardware would sort of "round" to the next available clock speed, but the clocks I chose all actually reflect what is going on in hardware.

    I do see what you are saying, but it doesn't work either the way you think it should or the way that you claim my logic would lead it to be. Extrapolating the math I used (which I believe I made clear was not a useful judge of what to expect, but an extreme upper bound that is not achievable) is one thing, but that isn't what is actually "happening" and I don't believe I stated that it was.

    Like I said, it is impossible for the hardware to achieve the full theoretical benefit from each of its overclocked subsystems as this would imply that performance was fully limited by each subsystem, which is just not possible.

    If I was confusing on that point then I do apologize.

    Here's what I know, though: 1) the reported clock speeds are the clock speeds the hardware was actually running at and 2) the performance numbers are definitely correct.

    I fully realize I didn't do a good job of explaining why the two above points are both true ... mostly because I have no idea why.

    I tried to paint the picture that what actually happened was not impossible, while (I thought) making it clear that I don't actually know what causes the observed effect.
  • Kibbles - Thursday, June 4, 2009 - link

    Great article. I especially liked the 3 linked graphs. One question though. I've been wondering how much power the latest graphics cards use when you underclock them to the lowest possible while idling, or does the hardware do it automatically? For example, I have my 2D mode on my 8800gtx set to only 200mhz core/shader/memory using nibitor. Or would it matter?
  • DerekWilson - Thursday, June 4, 2009 - link

    All the current gen cards do significantly underclock and undervolt themselves in 2D mode. They also turn off parts of the chip not in use.

    I believe you can set the clocks lower, but the big deal is the voltage as power is proportional to frequency but proportional to the square of voltage. I don't /think/ it would make that much difference in 2D mode, but then it's been years since I tried doing something like that.
