Updates to Skylake Discrete Graphics Performance: PCIe Optimizations Incoming
by Ian Cutress on September 8, 2015 5:00 AM EST
In our initial review of the two 6th Generation Intel Skylake-K processors launched on August 5th, the i7-6700K and the i5-6600K, our comparative analysis against previous generations of Intel processors was, for the most part, positive. On the whole, clock-for-clock performance was a marginal increase over previous generations, but the cumulative effect of several generations of upgrades, plus the gains available to those that overclock, gave a substantial reason for those in CPU limited workloads to upgrade (along with benefits on the chipset and DRAM side as well). However, one element of the equation was puzzling at the time – the performance of games using discrete graphics cards was marginally lower with the new platform compared to older platforms when looking at average frame rates.
Discrete Graphics Performance: Before
During our testing, it is not uncommon for two platforms that perform similarly to differ within a reasonable margin of error, often ±1%, due to variations in pre-initialised cache structures, or in the case of games like GRID that rely on a random sequence to provide the end-result numbers. Despite this, we noticed that Skylake-K showed a consistent drop in our discrete GPU testing, often around the -1% to -3% mark but sometimes as low as -5% or -7%, when we compared it to both Intel’s 5th Generation (Broadwell) and 4th Generation (Haswell). Other websites such as The Tech Report also noted these results, placing Broadwell’s numbers at the top of the stack (if only marginally). Some commentary at the time focused on Broadwell’s use of eDRAM in the desktop components, which can aid performance despite a frequency deficit, although given our analysis of the eDRAM in Broadwell as a victim cache rather than a transparent DRAM cache it seems less likely that this is the case, plus we also now have new information coming post launch about this issue. But if we remove Broadwell as a special case, it was still concerning that the i7-6700K lagged behind the i7-4770K despite being higher in frequency and clock-for-clock performance.
Before it came time to publish our Skylake review, we performed our initial analysis and ended up with our results. Whenever the results are worse than expected, we typically raise any anomalies with the manufacturer to see if they can account for them (or if something doesn’t seem to be configured properly). So we passed on our data to Intel as well as ASUS due to our setup at the time, and did not hear anything back for a number of weeks except the odd whisper of ‘we are looking into it’. Then, in our meeting with Intel at the Intel Developer Forum in mid-August, an Intel processor engineer said that they were still working on it internally, but from their testing it seemed that one of the registers controlling an internal frequency was not being set properly during start-up – as in not being set to Intel’s recommended value.
Another couple of weeks later, we were contacted by ASUS who shed a lot more light on the issue. The register in question is called the FCLK (or ‘f-clock’), which controls some of the cross-frequency compensation mechanisms between the ring interconnect of the CPU, the System Agent, and the PEG (PCI Express Graphics). Basically this means it is to do with data from the processor to the GPUs. So when data is handed from one end to another, this element of the processor manages the data buffers to allow that cross boundary migration in a lossless way. This is a ratio frequency setting which is tied directly to the base frequency of the processor (the BCLK, typically 100 MHz), and can be set at 4x, 8x or 10x for 400 MHz, 800 MHz or 1000 MHz respectively.
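The ratio arithmetic described above can be sketched in a few lines. This is only an illustration of the multiplication, not firmware code: the function name `fclk_mhz` and the constant `FCLK_RATIOS` are made up for this example, and the only facts taken from the article are the three ratios and the typical 100 MHz BCLK.

```python
# Illustrative sketch only: names here are hypothetical, not an Intel or BIOS API.
# The article states FCLK is a ratio of BCLK and can be set to 4x, 8x or 10x.
FCLK_RATIOS = (4, 8, 10)


def fclk_mhz(ratio: int, bclk_mhz: float = 100.0) -> float:
    """Return the FCLK in MHz for a given ratio and base clock."""
    if ratio not in FCLK_RATIOS:
        raise ValueError(f"unsupported FCLK ratio: {ratio}x")
    return ratio * bclk_mhz


if __name__ == "__main__":
    # At the typical 100 MHz BCLK, list the FCLK for each supported ratio.
    for ratio in FCLK_RATIOS:
        print(f"{ratio}x -> {fclk_mhz(ratio):.0f} MHz")
```

Because the result scales directly with BCLK, any base clock overclock moves the FCLK as well, which is why the setting matters for overclockers.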
The default value of the FCLK is at 800 MHz for both mobile and desktop Skylake processors, and it is this value that all the motherboard manufacturers have validated their systems on – such as overclocking and margins due to external environmental factors. However, the Intel recommended value for desktops, as dictated in their ‘tuning guide’ for motherboard manufacturers was 1000 MHz, or the 10x ratio setting. The recommended value for laptops is still the 8x ratio setting.
So going back to the Skylake-K launch on the 5th of August – it is our understanding that Intel moved the launch of these processors from IDF (mid-August) to Gamescom to coincide with their push towards a gaming focused platform. So despite the fact that between Gamescom and IDF the only people who really had these processors were other media and a few system integrators selling pre-built systems, everything had to be ready to go at that time. But at this time, the 10x ratio setting in Intel’s microcode (MRC) was not functioning as expected when motherboard manufacturers tried to initialise it during start-up. As a result, the ‘default’ value was used universally.
Discrete Graphics Performance: After
Fast forward to mid-August, and firmware update 1168 from Intel now allows motherboard manufacturers to implement the 10x setting for FCLK at POST. This means that the motherboard manufacturers now have to implement that firmware into their BIOS packages and request that all owners upgrade in order to benefit from this change.
From what we are being told by ASUS, they will have it enabled by default (at stock) on version 0801 on the Z170-A, with 090x versions of the BIOS providing a manual option inside the Tweakers’ Paradise sub-menu. ASRock by comparison, on the 1.70 BIOS for the Extreme7+, has an option to adjust the FCLK in the CPU configuration menu, but sets 800 MHz as default and requires adjusting to 1 GHz to make the change. For motherboard manufacturers, this new change (if they want to implement it by default) requires a complete verification process to make sure everything else in the system works, all PCIe cards are properly validated and added to their QVLs, and also overclocking margins are still as advertised. That being said though, we have been told to be wary of exact benefits from the new firmware, as some internal testing has shown not that big a jump in most instances.
On the overclocking side, if a user leaves the FCLK setting at auto but initiates a base frequency overclock (from 100 MHz to 120 MHz), then the start-up sequence on the motherboard should be able to take this into account and move from the 10x ratio to the 8x ratio, giving 8 * 120 = 960 MHz rather than 10 * 120 = 1200 MHz, keeping it closer to the 1000 MHz value. This can be overridden of course, but our sources say that FCLK can be adjusted to around 1400 MHz before it starts to fail, meaning that this ‘test at start-up’ procedure has to take the BCLK into account.
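One way to picture the automatic behaviour described above is a selection rule that picks the ratio landing closest to 1000 MHz while staying under the point where FCLK is said to fail. This is a guess at the logic, not how any vendor's BIOS actually implements it: the function `pick_fclk_ratio`, the "closest to target" rule, and the use of 1400 MHz as a hard ceiling are all assumptions for illustration (the article only says failure begins at around that figure).

```python
# Hypothetical sketch of an 'auto' FCLK ratio choice; not real firmware logic.
FCLK_RATIOS = (4, 8, 10)       # ratios named in the article
TARGET_FCLK_MHZ = 1000.0       # Intel's recommended desktop value per the article
APPROX_FAILURE_MHZ = 1400.0    # "around 1400 MHz" per the article; treated as a ceiling here


def pick_fclk_ratio(bclk_mhz: float,
                    target_mhz: float = TARGET_FCLK_MHZ,
                    ceiling_mhz: float = APPROX_FAILURE_MHZ) -> int:
    """Pick the ratio whose FCLK is closest to the target and below the ceiling."""
    candidates = [r for r in FCLK_RATIOS if r * bclk_mhz < ceiling_mhz]
    if not candidates:
        raise ValueError(f"no FCLK ratio is below the ceiling at BCLK {bclk_mhz} MHz")
    # Assumed rule: minimise the distance to the recommended frequency.
    return min(candidates, key=lambda r: abs(r * bclk_mhz - target_mhz))


if __name__ == "__main__":
    for bclk in (100.0, 120.0):
        ratio = pick_fclk_ratio(bclk)
        print(f"BCLK {bclk:.0f} MHz -> {ratio}x = {ratio * bclk:.0f} MHz")
```

Under these assumptions, a 100 MHz BCLK selects 10x and a 120 MHz BCLK selects 8x, matching the example in the text. A real implementation would also need to account for voltage and per-board validation, which this sketch ignores.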
Interestingly enough, this register that adjusts the FCLK ratio can be probed and changed at run-time as well, in the middle of the operating system. That lends itself to some interesting dilemmas if software detects its presence and tries to manually adjust it when a system is BCLK overclocked. We might see some software adjust this automatically (look out for general performance increase claims on Skylake only), so our hope here is that the software is also able to probe the BCLK and find the most appropriate ratio to avoid instability. Obviously this matters more for those on motherboards that still run at 8x or who have manually set their own ratio.
But does it make much of a difference? We ran our GPU suite with the processor at stock, on the new BIOS, and with the FCLK set explicitly to 1 GHz. The quick answer to the question is yes – it makes a subtle difference.
*For full disclosure, our initial review contained erroneous results on the GTX 770 and Shadow of Mordor due to an unknown reason. For our retest, this benchmark at 1080p Ultra and 4K Ultra was re-run at 800 MHz FCLK and our numbers in Bench are updated. The change in this benchmark result does not affect our conclusion in the initial review or this secondary set of testing.
The overall results showed an increase in frame rates by 1.3% from both the Haswell (i7-4770K) and Broadwell (i7-5775C) processors. The key takeaway is that in almost every scenario where performance was worse (except the GTX 980 against the i7-5775C), the new FCLK setting makes a change. Across the board, moving from an 800 MHz value on FCLK to 1 GHz gave an increase no matter what the discrete graphics test. Previously where our individual benchmarks ranged from zero change through -1%, -2% down to -7%, most results move up slightly, usually under 1%, but some get a boost as high as 5%.
So this raises a number of questions.
Q: Firstly, does this mean our initial review results are now invalid?
A: It depends what you mean by testing the stock processor. Is it how it performs out of the box, or is it how it performs to Intel’s ‘recommended’ tuning profile? As mentioned above, it seems like motherboard manufacturers will act differently on the FCLK issue: some might enable it by default and others might require the user to implement the change. The fact that 800 MHz is the main setting for mobile platforms (Skylake-Y and Skylake-U, perhaps even Skylake-H) means little on the desktop, but we might be in a period of transition for motherboards as the cycle progresses. At this time, our review as posted should still be the performance out of the box, but in time everyone should migrate to the 1 GHz setting for desktop. I will add the data into Bench to act as a comparison between the two, and in time retire the older set of data.
Q: Is Broadwell still the ‘preferred’ processor by a number of journalists in terms of performance in gaming due to the eDRAM, given that this minor change produces a different result?
A: From AnandTech’s perspective, this does not change much on our side – the Skylake platform still offers access to new features such as DDR4, the Z170 chipset with increased PCIe storage, USB 3.1 controllers on most boards >$150 and a cumulative generational increase in performance over the last few years. Going back to our architecture deep-dive, it also affords better performance in certain benchmarks such as Hybrid x265 than Broadwell due to its ability to keep more load/store operations in flight. There will be benchmarks that enjoy the eDRAM, such as WinRAR and a couple of our Linux-Bench server tests, and the fact that the Broadwell with eDRAM competes with a slower frequency is an interesting exercise in cache implications for performance. But for discrete gaming it is pretty much par for the course. Arguably, the i7-6700K should be easier to get hold of over time compared to the i7-5775C as well.
Q: Does anything change in the CPU benchmarks/performance?
A: As long as the workload doesn't touch the PCIe bus/routing, there is no difference in operation.
Q: I am about to invest in a Skylake desktop/I have a Skylake desktop. What should I do?
A: If your system is working fine, it might be best to leave it as is for now, at least until all the motherboard manufacturers have had a chance to go through a number of updates and tweak the setting to their satisfaction. If you need to be on the bleeding edge and feel like updating your BIOS, do so and explicitly adjust the FCLK to 1 GHz, rather than leaving it on automatic. However, remember that this value is tied to the BCLK (base frequency), and treat this setting like an overclock. This means running a usual range of stability benchmarks. We tried the setting on a couple of motherboards, and it is still a little rough (crashes in a benchmark or two from time to time), which I imagine is down to the manufacturers needing to fine tune this setting (either internal voltages, or skew balancing). At this point in time we are under the impression that every Skylake-S processor should be able to run at this higher ratio when at stock speeds. As always, in overclocked environments, your mileage may vary.
Q: Will this affect the Skylake Xeons?
A: At this point, we do not know. However, given how early we are in the Skylake launch cycle, and that Intel has stated a Q4/Q1 release for the E3 v5 Xeons, we would expect it to be a non-issue when they are launched.