Building a Big CPU

It turns out that how "big" you can make a microprocessor (how wide, meaning how many execution units, and how deep, meaning how many stages in the pipeline) is limited first by the accuracy of your branch predictor. The branch predictor single-handedly determines how many instructions you can have "in flight" (active in the pipeline) before the CPU mispredicts a branch and has to flush the pipeline and restart down the correct path. In theory, your CPU should be no bigger than it needs to be; it should accommodate no more instructions than can be sent down the pipe before a mispredict occurs.
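To put rough numbers on that relationship, here is a back-of-the-envelope sketch. It assumes one branch every five instructions on average; that figure is a placeholder chosen for illustration, not a measurement of the K7 or K8, and the function name is made up for this example.

```python
def instructions_between_mispredicts(accuracy: float, branch_every: float = 5.0) -> float:
    """Average number of instructions issued between two mispredicted branches.

    accuracy     -- fraction of branches predicted correctly, in [0, 1)
    branch_every -- assumed average instructions per branch (illustrative only)
    """
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    # One mispredict every 1 / (1 - accuracy) branches, each branch_every instructions apart.
    return branch_every / (1.0 - accuracy)


for acc in (0.90, 0.95, 0.98):
    n = instructions_between_mispredicts(acc)
    print(f"{acc:.0%} accurate: about {n:.0f} instructions per mispredict")
```

Under these assumptions, each step up in accuracy stretches the run of useful instructions between mispredicts considerably, which is the room a wider or deeper design needs.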

With the K8, AMD improved on the branch prediction unit of the K7. The global history counter is now four times the size of the one in the K7. Despite its name, the global history counter is a large array of 2-bit counters (each counts from 0 to 3) that determine whether a particular branch is predicted as taken. When a branch instruction is reached, the branch prediction unit takes part of the address of the instruction (sometimes after performing some logical operations on it) and uses that as an index into the global history counter; the index selects which counter to examine and later update. If that counter's value is greater than or equal to 2, the branch is predicted as "taken"; otherwise it is predicted as "not taken". Once the branch resolves, the counter is incremented by one if the branch was actually taken and decremented by one if it was not (remember that the counter has set limits: it can't be decremented past 0 or incremented past 3, as it is only a 2-bit counter). A branch that is almost always taken therefore settles at 3, and a single not-taken outcome (a loop exit, for example) is not enough to flip its prediction.
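The mechanism can be sketched in a few lines. This is a generic table of 2-bit saturating counters, not AMD's actual design: the class name, the table size, the starting counter value and the use of the low address bits as the index are all assumptions made for illustration. The counter moves up when the branch is taken and down when it is not, which is how 2-bit predictors are conventionally described.

```python
class CounterTable:
    """Table of 2-bit saturating counters indexed by low bits of the branch address."""

    def __init__(self, size: int = 16):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.size = size
        self.counters = [1] * size  # assumed starting state: weakly "not taken"

    def index(self, address: int) -> int:
        # Hypothetical index function: keep only the low bits of the address.
        return address & (self.size - 1)

    def predict(self, address: int) -> bool:
        # 2 or 3 means "taken", 0 or 1 means "not taken".
        return self.counters[self.index(address)] >= 2

    def update(self, address: int, taken: bool) -> None:
        # Move toward 3 on a taken branch, toward 0 otherwise, saturating at both ends.
        i = self.index(address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def run(table: CounterTable, trace) -> int:
    """Feed (address, taken) pairs through the table; return the number of mispredicts."""
    misses = 0
    for address, taken in trace:
        if table.predict(address) != taken:
            misses += 1
        table.update(address, taken)
    return misses


# A loop branch at a made-up address: taken nine times, then falls through, ten times over.
loop_branch = 0x4010
trace = [(loop_branch, i % 10 != 9) for i in range(100)]
print(run(CounterTable(size=16), trace), "mispredicts out of", len(trace))
```

In this trace the predictor misses once while warming up and once at each loop exit. The exit drops the counter from 3 to 2, so the next iteration is still predicted as taken.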

The problem with this approach is that if two branch instructions map to the same index, they share one counter, and each branch's outcomes push that counter around regardless of what the other branch does; this is known as interference. The larger your global history counter is (the more 2-bit counters it is composed of), the less likely interference is to occur, and the less interference you have, the more accurate your branch predictor will be.
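Interference is easy to reproduce in a toy simulation. The sketch below is not a model of the K7 or K8: the addresses, the table sizes, the starting counter value and the modulo index function are all invented for illustration. It alternates two branches with opposite behavior, first through a table small enough that they collide, then through a larger one where they do not.

```python
def mispredicts(trace, table_size: int) -> int:
    """Count mispredicts for (address, taken) pairs using table_size
    2-bit saturating counters indexed by address modulo table_size."""
    counters = [1] * table_size       # assumed starting state: weakly "not taken"
    misses = 0
    for address, taken in trace:
        i = address % table_size      # hypothetical index function
        if (counters[i] >= 2) != taken:
            misses += 1
        # Saturating update: up on taken, down on not taken.
        counters[i] = min(3, counters[i] + 1) if taken else max(0, counters[i] - 1)
    return misses


branch_a, branch_b = 0x1004, 0x1014   # made-up addresses that differ only in bit 4
trace = []
for _ in range(50):
    trace.append((branch_a, True))    # branch A is always taken
    trace.append((branch_b, False))   # branch B is never taken

small = mispredicts(trace, 16)   # both addresses map to index 4: one shared counter
large = mispredicts(trace, 64)   # indices 4 and 20: each branch gets its own counter
print("16 counters:", small, "mispredicts")
print("64 counters:", large, "mispredicts")
```

With the shared counter, each branch undoes the other's update, so the counter bounces between 1 and 2 and every prediction in this trace is wrong. With separate counters, only the very first prediction of branch A misses. This is a deliberate worst case; real code collides far less neatly.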

There are some other improvements to the K8's branch predictor, and all of them result in more accurate branch prediction overall. Going back to our original statement that the accuracy of a microprocessor's branch predictor determines how big we should build our CPU: by outfitting the K8 with a more accurate branch predictor, AMD enabled the K8 to be a bigger CPU than the K7. But bigger how?
