An Early Christmas present from AMD: More Registers

In our coverage of the Opteron we focused primarily on the major architectural enhancements the K8 core enjoyed over the K7 (Athlon XP): the on-die memory controller, the improved branch predictor and more robust TLBs. For details on what these improvements do and why they matter, we'll direct you back to our Opteron coverage; the same information applies to the Athlon 64, as we are talking about the same fundamental core.

What we didn't spend much time talking about in our Opteron coverage was the benefit of additional registers, a benefit that is enabled in 64-bit mode. To understand why this is a benefit let's first discuss the role registers play in a microprocessor.

Although we think of main memory and cache as a CPU's storage areas, the often overlooked yet very important storage areas that we don't talk about are registers. Registers are individual storage locations that can hold numbers; these numbers can be values to add together, they can be memory addresses where the CPU can find the next piece of information it will need or they can be temporary storage for the outcome of one operation. For example, in the following equation:

A = 2 + 4

The number 2, the number 4 and the resulting number 6 will all be stored in registers, with each number taking up one register. These high speed storage locations are located very close to the processor's functional units (the ALUs, FPUs, etc.) and are fixed in size. In a 32-bit x86 processor like the Athlon XP or Pentium 4, the majority of registers are 32 bits wide, meaning each can store a single 32-bit value. In 32-bit mode, the Athlon 64's general purpose registers are treated as being 32 bits wide, just like in its predecessor. However, in 64-bit mode all of the general purpose registers (GPRs) become 64 bits wide, and we gain twice as many GPRs (16 instead of 8). Why are more registers important and why haven't AMD or Intel added more registers in the past? Let's answer these two questions next.

Take the example of A = 2 + 4 from before; in a microprocessor with at least 3 registers, this operation could be carried out successfully without ever running out of registers. Internal to the microprocessor, the operation would be carried out something like this:

Store "2" in Register 1
Store "4" in Register 2
Store Register 1 + Register 2 in Register 3

After the operation has been carried out, all three values are still available in registers, so if we wanted to add 2 to the answer, the processor would simply add register 1 and register 3.
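The three-register case can be sketched in a few lines of Python. This is only a toy model, not how a real CPU is programmed: the dictionary `regs` and the names `r1`, `r2`, `r3` and `plus_two` are made up for illustration.

```python
# Toy model of a CPU with three registers; no main memory is needed.
regs = {"r1": 0, "r2": 0, "r3": 0}

regs["r1"] = 2                        # Store "2" in Register 1
regs["r2"] = 4                        # Store "4" in Register 2
regs["r3"] = regs["r1"] + regs["r2"]  # Store Register 1 + Register 2 in Register 3

# Adding 2 to the answer needs no memory access either:
# the 2 is still sitting in register 1 and the answer in register 3.
plus_two = regs["r1"] + regs["r3"]

print(regs["r3"], plus_two)  # 6 8
```

Every value the calculation needs is already in a register, so nothing ever has to be fetched from memory.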

If the microprocessor only had 2 registers, however, and we ever needed to use the values 2 or 4 again, something would have to go to main memory: either one of the operands is saved there before being overwritten by the resulting value of A, or the result itself is written out. Taking the second option, things would change in the following manner:

Store "2" in Register 1
Store "4" in Register 2
Store Register 1 + Register 2 in a location in main memory

Here you can see that there is now an additional memory access that wasn't there before. What we haven't even taken into account is that the address of the location in main memory where the CPU will store the result also has to be placed in a register, so that the CPU knows where to tell the load/store unit to send the data. If we wanted to use that result for anything, the CPU would first have to evict a piece of data from one of the occupied registers and put it in main memory, and then go to main memory to retrieve the result and load it into the freed register. As you can see, the number of memory accesses increases tremendously, and the more memory accesses you have, the longer your CPU has to wait in order to get work done - thus you lose performance. Simple enough? Now here's where things get a little more complicated: why don't we just keep on adding more registers?
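To make the cost concrete, here is a rough Python sketch of the two-register case that counts trips to main memory. It is a simplification under stated assumptions: the `Memory` class and the labels `"A"` and `"saved_r2"` are hypothetical, and the sketch ignores the register needed to hold the memory address, so a real machine would fare even worse.

```python
class Memory:
    """Toy main memory that counts every load and store."""

    def __init__(self):
        self.cells = {}
        self.accesses = 0

    def store(self, addr, value):
        self.cells[addr] = value
        self.accesses += 1

    def load(self, addr):
        self.accesses += 1
        return self.cells[addr]


mem = Memory()

r1 = 2                     # Store "2" in Register 1
r2 = 4                     # Store "4" in Register 2
mem.store("A", r1 + r2)    # no free register, so the result goes to memory

# Now add 2 to the answer. Register 1 still holds the 2, but the
# answer has to come back from memory and needs a register to land in.
mem.store("saved_r2", r2)  # evict the 4 so it is not lost
r2 = mem.load("A")         # fetch the result back into Register 2
r2 = r1 + r2               # Register 2 now holds A + 2

print(r2, mem.accesses)    # 8 3
```

The same work took zero memory accesses with three registers and three with two, and each of those accesses is far slower than reading a register.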

The beauty of the x86 Instruction Set Architecture (ISA) is that there are close to two decades of software that will run on even today's x86 microprocessors. One way this sort of backwards compatibility is maintained is by keeping the ISA the same from one microprocessor generation to the next; while this doesn't include things like functional units, cache sizes, or anything of that nature, it does include the number and names of registers. When a program is compiled to be run on an x86 CPU, the compiler knows that the architecture has 8 general purpose registers, and when translating the programmer's code into machine code that the CPU can understand it references only those 8 general purpose registers. If Intel were to have 10 general purpose registers, anything compiled to use all 10 would not be able to run on an AMD CPU, as the extra 2 general purpose registers would not be found on the AMD processor.

Microprocessor designers have gotten around this by introducing a technique known as register renaming, which makes only the allowed number of registers visible to software while the hardware maps those names onto a larger pool of internal registers and juggles data around without going to main memory. Register renaming does fix a large percentage of the issues associated with register conflicts, particularly the false dependencies that arise when the same few register names are reused over and over. It cannot, however, remove the loads and stores a compiler has already emitted because it ran out of visible registers and had to start swapping to main memory; in those cases we simply need more registers.
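One way to picture renaming is as a lookup table from the 8 visible register names to a larger pool of internal registers. The Python sketch below is a loose illustration only: the `Renamer` class is hypothetical, the pool size of 16 is an arbitrary assumption rather than the real figure for any CPU, and the sketch never returns internal registers to the free list the way real hardware does.

```python
class Renamer:
    """Maps visible (architectural) register names to internal registers."""

    def __init__(self, arch_names, num_physical):
        self.free = list(range(num_physical))  # unused internal registers
        # Each visible name starts out bound to its own internal register.
        self.table = {name: self.free.pop(0) for name in arch_names}

    def rename(self, dest, sources):
        # Sources read whichever internal register the name currently maps to.
        phys_sources = [self.table[s] for s in sources]
        # Every write gets a fresh internal register, so a later reuse of
        # the same visible name does not disturb the earlier value.
        phys_dest = self.free.pop(0)
        self.table[dest] = phys_dest
        return phys_dest, phys_sources


X86_GPRS = ["eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"]
renamer = Renamer(X86_GPRS, num_physical=16)  # 16 is an assumed pool size

# Two unrelated calculations that the compiler had to squeeze into eax:
first = renamer.rename("eax", ["ebx", "ecx"])   # eax = ebx + ecx
second = renamer.rename("eax", ["edx", "esi"])  # eax = edx + esi

print(first, second)  # (8, [1, 2]) (9, [3, 4])
```

Both instructions write "eax" as far as software is concerned, yet internally they land in different registers, so the second does not have to wait for the first to finish with its value.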

When AMD introduced their AMD64 architecture, they had a unique opportunity on their hands. Because no other x86 processor would be able to run 64-bit code anyway, they decided to double the number of general purpose and SSE/SSE2 registers that were made available in 64-bit mode. Since AMD didn't have to worry about compatibility, doubling the register count in 64-bit mode wasn't really a problem, and the majority of the performance increases you will see for 64-bit applications on the desktop will be due to the additional registers.
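As a back-of-the-envelope illustration of why doubling the visible registers can matter, the Python sketch below counts how often a value has to be fetched from memory when a loop keeps cycling through 12 values. Everything about it is an assumption made for illustration: the 12-value working set is invented, and the least-recently-used eviction stands in for a compiler's register allocator, which works quite differently.

```python
from collections import OrderedDict


def count_memory_loads(accesses, num_regs):
    """Count loads from memory for a register file holding num_regs values.

    When the registers are full, the least recently used value is evicted.
    """
    regs = OrderedDict()  # ordered from least to most recently used
    loads = 0
    for value in accesses:
        if value in regs:
            regs.move_to_end(value)       # already in a register
        else:
            loads += 1                    # must be fetched from memory
            if len(regs) >= num_regs:
                regs.popitem(last=False)  # evict the least recently used
            regs[value] = True
    return loads


# Hypothetical loop that touches the same 12 values, 10 times over.
working_set = list(range(12)) * 10

loads_8 = count_memory_loads(working_set, 8)    # 32-bit mode: 8 GPRs
loads_16 = count_memory_loads(working_set, 16)  # 64-bit mode: 16 GPRs

print(loads_8, loads_16)  # 120 12
```

In this contrived case the working set fits in 16 registers but not in 8, so the smaller register file reloads a value on every single access. Real code is rarely this extreme, but the direction of the effect is the same.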

What is important to note is that although AMD has increased the number of visible registers in 64-bit mode, the number of internal registers for renaming has not increased - most likely because of cost/performance constraints.




  • Anonymous User - Wednesday, September 24, 2003

    (to #87)

    xbit labs did a pretty decent review and used "performance" platforms for the CPUs but left out VIA chipsets which people are saying are faster than nVidia for the AMD64 tests. Their conclusion was unbiased pointing out pros and cons of each processor type - I especially like the closing statement of how the current manufacturing processes are getting tapped out and it will be up to the new 90 nanometer process to get increased performance.

    Tom's Hardware used the absolute best platforms under their optimal settings (i.e. the latest motherboards including both from nVidia and VIA for AMD, the latest optimized drivers, 4 x 256 DDR for Intel vs. 2 x 512) the way real enthusiasts would set their platforms up. Tom's conclusion tends to lean towards Intel with the P4 3.2 EE winning more tests than the Athlon FX - he did update his benchmarks and took out most of the overclocked P4 scores and I still count the P4 3.2 EE winning 26 benchmarks vs the Athlon FX winning 15.

    HardOCP used an Intel Bonanza motherboard which doesn't really allow the P4s to perform at their best IMO - lower memory timings cause the Intel motherboard to perform slower than the 875 boards from ASUS and Abit. Their conclusion was that the new AMD chips are pretty good but AMD is still in a tight spot.

    Extremetech also used the latest optimized platforms and also included nVidia and VIA chipsets. Their conclusion was pretty unbiased and left it to the reader to make their own choice.
  • Anonymous User - Wednesday, September 24, 2003

    I still can't understand *why* people are hyping the emergency edition P4. You can't buy the chip and won't be able to for at least another two or more months. By the time the chip comes out it's already old news since the FX53 and Athlon64 3400+ will have already begun shipping with Prescott not too long after (well dunno about Prescott since those comments at IDF to the tune of a 3.2Ghz P4EE outperforming a 3.2Ghz Prescott don't seem good to me). Not to mention the fact that even against the current *available* processors it can't beat the FX51 in overall performance. What exactly is the good thing about this chip ATM?
  • Anonymous User - Wednesday, September 24, 2003

    He was talking about the manufacturers when talking about credibility. Look at PM Forum part 2 just for an example, every single one of them considers AMD still being a niche player even with AMD64 and Intel being the technology leader. Don't forget that these AMD64 processors were supposed to come out over a year ago, but they weren't able to deliver at all. If they had, this would be a much different market than it is now. You should know by now that what fanboys believe to be the trends in the marketplace is completely different from reality.
  • Anonymous User - Wednesday, September 24, 2003



    What on earth did Anand mean about AMD's loss of credibility? According to who? I mean, whatever you make of the P rating, who could possibly think that the XP wasn't a credible processor (even if the accuracy of the ratings exhibited slippage over time) or that the 64 wouldn't be a major improvement?

    What was he trying to say by droning on and on about lost credibility? I really have no idea. Who was going around saying AMD wasn't "credible"? And what does that mean?
  • Anonymous User - Wednesday, September 24, 2003

    (this is #80)..

    I just read the X-bit labs review and it was a very long and in depth review. I think you guys should go and read it and give me your impressions about it.


  • Anonymous User - Wednesday, September 24, 2003

    This is the tiniest athlon64 test I've ever seen. Not only is it way too small to make a fair projection of amd's capabilities but you've not tested the P4 EE ... I think that this must be the worst review ever made by you guys...
  • Anonymous User - Wednesday, September 24, 2003


    (this is #80) There are fanboys on both sides of the fence here. I think I tried to be as close to unbiased as I could, though I still think Toms benchies always seem skewed in a bad way. And in all fairness, if he oc'd his P4ee to 3.6, why not OC the AMD chips? Seems he's playing favorites to me..

    The smart people always buy what is the best deal, regardless of the manufacturer (though almost everyone has a brand of something they won't buy again, regardless of product (clothing, food, etc..)).

    I currently have a P4 2.0a OC'd to 2.4 (took a new mobo to get there..the soyo just didn't cut it), so my soon to be here 2500+ and Nforce2 mobo may surprise the intel fanboys, but then I am not a fanboy. I was going to go with a 2.4c and Abit IS7-G but there was about a $120 price difference so I opted for a more economical route..(and if I can get 3200+ performance out of a $94 CPU that won't hurt either).. :-)

    Anyway, I agree the fanboys on both sides need to read the facts and stop the bashing. It gets old fast and only shows ignorance.

    I think the P4ee and FX are just too much for the mainstream/general public. VERY few people (well that I know anyway) have $800 just to blow on a CPU, plus any additional hardware needed (mobo, ram, etc) on a whim, so to me it seems the test will be how the 64 compares to the Prescott. I also think AMD needs to drop the 64 price, but maybe that's just me.

    And like it or not fanboys, there will almost always be tests where one CPU always wins over the other so take the tests with a grain of salt (this does apply to both sides of the fence).

    When tax time rolls around in March or so if the 64 is price competitive to the P5/Prescott and performance competitive I just may go that route. I imagine by then the new P5/Prescotts (not the soon to be out socket 478 variety) will be out and needing a new mobo also, so a mobo purchase looks to be in my future anyway.

    But if the Prescott blows away the 64 and is priced similarly then that's where my money will go.

    But again, like I stated previously, we do not want a "one choice" situation. That just bodes poorly for us, the consumer. So we have to think that AMD will do well. If not, the future will be bleak for us (remember we are only this far along due to the XP/P3-P4 battle that raged well over 2 years (correct?))..

    Otherwise we might just now be getting the infamous P4 CPU....

    ..Just something to think about..



  • Anonymous User - Wednesday, September 24, 2003

    It is very difficult to evaluate the performance or value of the Athlon 64 chips. It is not unlikely that most cpu demanding apps like video processing and 3d rendering will support 64 bit once more developers and users move over to Athlon 64. But until then there is a draw between P4 3.2+ and FX-51. A stable motherboard is at least as important as the CPU.
  • Anonymous User - Wednesday, September 24, 2003

    Wow, this is the most fanboys I have ever seen in one place. How nerdy do you have to be to be protective of your precious CPU brand you stand behind?

    I just buy what's the best deal at the time, and both the new AMD and Intel CPUs are damn fast and close in performance. But the fact is the Athlon 64 is out now, the new Intel CPU isn't.

    Also you can look forward to future increased performance out of the athlon 64 to sweeten the pot. If I were to pick I'd definitely get the A64 over the P4EE.

    That being said, I'm buying a p4 3ghz next month, and I currently have an Athlon XP 2500+.

    You fanboys need to just stfu already, jesus.
  • Anonymous User - Wednesday, September 24, 2003

    Enough of this pressithot shite! Yes i sound like a fanboy which i am but i'm not sounding like one year we will see "Athens" @ 0.9nm with a Dual-channel DDR-|| controller and of course an improved Hypertransport Bus..possibly HTB2 and i tell ya..not even tejas will keep up @ WIN 64-bit mode.
