An Early Christmas Present from AMD: More Registers

In our coverage of the Opteron we focused primarily on the major architectural enhancements the K8 core enjoys over the K7 (Athlon XP): the on-die memory controller, improved branch predictor and more robust TLBs. For details on what these improvements do and why they matter, we'll direct you back to our Opteron coverage; the same information applies to the Athlon 64, as we are talking about the same fundamental core.

What we didn't spend much time on in our Opteron coverage was the benefit of additional registers, a benefit that is only available in 64-bit mode. To understand why this is a benefit, let's first discuss the role registers play in a microprocessor.

Although we usually think of main memory and cache as a CPU's storage areas, registers are an often overlooked yet very important form of storage. Registers are individual storage locations that hold numbers: values to add together, memory addresses where the CPU can find the next piece of information it needs, or temporary storage for the outcome of an operation. For example, in the following equation:

A = 2 + 4

The number 2, the number 4 and the resulting number 6 will all be stored in registers, with each number taking up one register. These high-speed storage locations sit very close to the processor's functional units (the ALUs, FPUs, etc.) and are fixed in size. In a 32-bit x86 processor like the Athlon XP or Pentium 4, the majority of registers are 32 bits wide, meaning each can store a single 32-bit value. In 32-bit mode, the Athlon 64's general purpose registers (GPRs) are treated as 32 bits wide, just like its predecessor's. In 64-bit mode, however, all of the GPRs become 64 bits wide and we gain twice as many of them. Why are more registers important, and why haven't AMD or Intel added more registers in the past? Let's answer these two questions next.

Take the example of A = 2 + 4 from before; in a microprocessor with at least 3 registers, this operation can be carried out without ever running out of registers. Internal to the microprocessor, the operation would be carried out something like this:

Store "2" in Register 1
Store "4" in Register 2
Store Register 1 + Register 2 in Register 3

After the operation has been carried out, all three values remain available in registers, so if we wanted to add 2 to the answer, the processor would simply add register 1 and register 3.
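The three-register sequence above can be sketched in a few lines of Python. This is only a toy model: the dictionary keys `R1`, `R2` and `R3` are made-up names for illustration, not real x86 register names.

```python
# Toy register file with three slots (hypothetical names, not x86 registers).
registers = {"R1": 0, "R2": 0, "R3": 0}

registers["R1"] = 2                                   # Store "2" in Register 1
registers["R2"] = 4                                   # Store "4" in Register 2
registers["R3"] = registers["R1"] + registers["R2"]   # Store R1 + R2 in Register 3 (this is A)

# All three values are still held in registers, so adding 2 to the answer
# is just another register-to-register add with no trip to main memory.
answer_plus_two = registers["R1"] + registers["R3"]

print(registers)
print(answer_plus_two)
```

Nothing in the sketch touches memory, which is the point: with enough registers the whole calculation stays inside the register file.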

If the microprocessor had only 2 registers, however, there would be no free register left for the result. To keep the values 2 and 4 around for later use, something would have to be written to main memory; here, the resulting value of A goes to memory instead of overwriting one of the operands. Things would change in the following manner:

Store "2" in Register 1
Store "4" in Register 2
Store Register 1 + Register 2 in a location in main memory

Here you can see that there is now an additional memory access that wasn't there before. What we haven't even taken into account is that the address of that location in main memory also has to be placed in a register, so that the CPU can tell the load/store unit where to send the data. If we wanted to use the result for anything, the CPU would first have to evict a piece of data from one of the occupied registers to main memory and then load the result from main memory into the freed register. As you can see, the number of memory accesses climbs quickly, and the more memory accesses you have, the longer your CPU has to wait before it can get work done; thus you lose performance. Simple enough? Now here's where things get a little more complicated: why don't we just keep adding more registers?
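Before turning to that question, the cost of running out of registers can be made concrete with a rough sketch. The `ToyMachine` class and its `write`/`read` methods are hypothetical names invented for this illustration, and the model ignores the extra register needed to hold the memory address.

```python
class ToyMachine:
    """Hypothetical machine that counts main-memory accesses."""

    def __init__(self, num_registers):
        self.regs = [None] * num_registers
        self.memory = {}
        self.memory_accesses = 0

    def write(self, address, value):
        # Store a value to main memory (one access).
        self.memory[address] = value
        self.memory_accesses += 1

    def read(self, address):
        # Load a value from main memory (one access).
        self.memory_accesses += 1
        return self.memory[address]


def with_three_registers():
    m = ToyMachine(3)
    m.regs[0] = 2
    m.regs[1] = 4
    m.regs[2] = m.regs[0] + m.regs[1]       # A stays in a register
    return m.regs[0] + m.regs[2], m.memory_accesses


def with_two_registers():
    m = ToyMachine(2)
    m.regs[0] = 2
    m.regs[1] = 4
    m.write("A", m.regs[0] + m.regs[1])     # no free register: A goes to memory
    m.write("spill", m.regs[1])             # evict the 4 to make room
    m.regs[1] = m.read("A")                 # bring A back into a register
    return m.regs[0] + m.regs[1], m.memory_accesses


print(with_three_registers())
print(with_two_registers())
```

Both functions compute A + 2, but in this model the two-register version needs three memory accesses where the three-register version needs none.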

The beauty of the x86 Instruction Set Architecture (ISA) is that close to two decades of software will run on even today's x86 microprocessors. One way this backwards compatibility is maintained is by keeping the ISA the same from one microprocessor generation to the next. The ISA doesn't cover things like functional units or cache sizes, but it does include the number and names of registers. When a program is compiled for an x86 CPU, the compiler knows that the architecture has 8 general purpose registers, and when it translates the programmer's code into machine code it references only those 8. If Intel were to add 2 more general purpose registers, for a total of 10, code compiled to use them would not run on an AMD CPU, because the extra 2 registers would not exist on the AMD processor.

Microprocessor designers have gotten around this with a technique known as register renaming: only the architecturally defined registers are visible to software, but the hardware can map them onto a larger pool of internal registers to juggle data around without going to main memory. Register renaming fixes a large percentage of the issues associated with register conflicts, where a CPU simply runs out of registers and must start swapping to main memory. However, there are some cases where we simply need more registers.
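As a loose illustration of the renaming idea, the sketch below maps software-visible register names onto a larger pool of numbered internal registers. It is a simplified model of the general technique, not a description of the K8's actual rename hardware; the `rename` function, the pool size of 8 and the four-instruction program are all assumptions made for the example.

```python
def rename(instructions, num_physical):
    """Rewrite (dest, sources) pairs from visible names to internal register numbers."""
    free = list(range(num_physical))   # internal registers not yet handed out
    table = {}                         # visible name -> internal register holding it
    renamed = []
    for dest, sources in instructions:
        # Sources read whichever internal register currently holds that name.
        phys_sources = [table[s] for s in sources]
        # Every write is given a fresh internal register, so an older value
        # stored under the same visible name is not overwritten.
        # (Simplification: registers are never recycled here.)
        phys_dest = free.pop(0)
        table[dest] = phys_dest
        renamed.append((phys_dest, phys_sources))
    return renamed


program = [
    ("eax", []),        # eax = first value
    ("ebx", ["eax"]),   # ebx = something computed from eax
    ("eax", []),        # eax = second value (the name eax is reused)
    ("ecx", ["eax"]),   # ecx = something computed from the new eax
]

print(rename(program, 8))
```

The two writes to `eax` land in different internal registers, so the second pair of instructions does not have to wait for the first pair to finish with the old value, even though software only ever sees one `eax`.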

When AMD introduced the AMD64 architecture, it had a unique opportunity on its hands. Because no other x86 processor would be able to run 64-bit code anyway, AMD decided to double the number of general purpose and SSE/SSE2 registers available in 64-bit mode. Since there was no existing 64-bit x86 software to stay compatible with, doubling the register count in 64-bit mode wasn't really a problem, and the majority of the performance increases you will see for 64-bit applications on the desktop will be due to the additional registers.
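To get a feel for why doubling the visible register count can matter, here is a small sketch that counts how often a value has to be fetched from memory because it is not currently in a register. The register counts of 8 and 16 come from the discussion above; everything else is an assumption for illustration, including the `count_reloads` function, the least-recently-used eviction policy (real compilers allocate registers quite differently) and the workload of 12 values touched in a loop.

```python
from collections import OrderedDict


def count_reloads(accesses, num_registers):
    """Count accesses that miss the register file under LRU eviction."""
    regs = OrderedDict()   # value name -> None, ordered least to most recently used
    reloads = 0
    for name in accesses:
        if name in regs:
            regs.move_to_end(name)             # already in a register
        else:
            reloads += 1                       # must come from memory
            if len(regs) == num_registers:
                regs.popitem(last=False)       # evict the least recently used value
            regs[name] = None
    return reloads


working_set = [f"v{i}" for i in range(12)]     # 12 values used by the loop
accesses = working_set * 10                    # the loop body runs 10 times

print(count_reloads(accesses, 8))    # 8 visible registers
print(count_reloads(accesses, 16))   # 16 visible registers
```

In this contrived case the 12 values never fit in 8 registers, so every one of the 120 accesses goes to memory, while with 16 registers only the first 12 do. Real code will rarely show a gap this extreme, but the direction of the effect is the same.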

It is important to note that although AMD has increased the number of visible registers in 64-bit mode, the number of internal registers used for renaming has not increased, most likely because of cost/performance constraints.


122 Comments

  • Anonymous User - Monday, September 29, 2003 - link

    Where can I buy the P4 Extreme Edition? Anand says it will be available in a month or two. Many websites say that it will be available in Q1 '04.

    Anand, why the hell did you include the P4 Emergency Edition in your review when it is not available for purchase? Nor is it clear when it will be available for purchase.

    Athlon64 3200+ and FX-51 ARE available for purchase. So it does not make sense to do a comparison with the NOT YET available P4 Emergency Edition.

  • Anonymous User - Monday, September 29, 2003 - link

    #102 - that's eight quarters of losses in a row and this quarter just about to end will make 9. It is projected that AMD will not be back into the black until at least Q4 2004 - but that was before they bought into that Flash business that is also losing money.
  • Anonymous User - Sunday, September 28, 2003 - link

    #102...I don't see AMD laughing all the way to the bank any time soon. Last time I checked they lost money for over a year and a half. That's 6 quarters in a row of losing money! Translation: If they don't stop giving away tons of performance for nothing, they'll die (2500+ Bartons for $85?? Sheesh). If you think prices are high now, wait till AMD dies. Tell me what you think of prices then :)
  • Anonymous User - Sunday, September 28, 2003 - link

    It's almost comic to see people saying prescott is that good. It won't be cranking up the speed as the P4 did. It's already at 104watts of heat. Their "PLANNED" fix is only to get it to 95 watts at launch. To crank this cpu up at all is going to require water cooling. Jeez, AthlonFX is only running around 60watts. Translation: easy to jack up the clock whenever the process gets the kinks worked out. The biggest issue is ALREADY solved (heat). Intel has a long way to go (look at the roadmap, not many releases for Prescott in '04). AMD should have NO trouble keeping up and topping them in my book if desired. AMD has taken the lead and won't likely give it back for a while (save the stupid INTEL benchmarks by bapco and their other shill ZD). Check all review sites but anandtech and toms, take a trip to www.aceshardware.com for 15 games, 11 of which are victories for athlon, many by LARGE MARGINS, up to 43% faster in age of mythology! Two of the P4EE victories were 1% (margin of error that is). Look at how many are double digit victories for AthlonFX (again, this is against a chip that won't be available for 2 months at best).

    Lets also not forget that AMD is in the drivers seat now with 64bit. Intel is USING AMD's instruction set! Intel is now in AMD's old boat. You will never be FASTER than the guy that wrote the actual instructions, when you are trying to reverse engineer them, emulate them and get them to work. How is Intel going to cheat with 64bit now, when AMD has control of the instruction set? There won't be an SSE2 cheat in the next batch of cheater benchmarks (you can't leave out AMD's 64bit instructions as the pricks did with sse2 this time).

    I can't even believe that Anand did NOT turn on SSE/SSE2 in AthlonFX. Claiming he doesn't like to change benchmarks. Newsflash moron: Turning on the chips SSE does NOT change the benchmark. It merely recognizes the chip. Isn't this the FIRST thing you would do if you bought this chip? Go get a free download FROM microsoft that fixes their "ID INTEL ONLY" bug? Don't forget you SAID back in an old athlon article where you DID use this patch, that you would ONLY use it in the future if it was PUBLICLY released (and not a patch amd was just handing out to reviewers). Well, it's PUBLICLY released, and NOT by AMD. Microsoft officially supports this patch, which BTW doesn't change ONE BIT of scripting in the benchmark. It merely tells the benchmark, "hey, I've got SSE also, turn it on cheaters!". Look at the scores here:
    http://www.sudhian.com/showdocs.cfm?aid=434&pi...

    Patched scores 64.3, while unpatched scores 52. NOTE: this only turns on SSE in ONE app included in this benchmark. Windows Media Encoder. I've got one question for bapco. Why does media encoder count for so much in their silly benchmark? AMD gains 20% in this ENTIRE benchmark just by turning on SSE in WME? WTF? More cheating.

    One question for Anandtech: Why are you using such a cheating benchmark? What happened to CSA Research's benchmark that you loved until Intel told you to stop using it? Don't forget INTEL hired the guy to write that benchmark. They just didn't like his results so fired him...LOL. When you tell the truth Intel gets pissed. If you're going to use this benchmark, do what ALL public users will do and turn on SSE. You said you'd use it if it was PUBLICLY available (this was in regards to last year's version, but your statement should still hold true). AMD is no longer handing out patches to reviewers. MICROSOFT IS! Also silly in Anand's benches is the use of lightwave after it's already included in ZD's benchmarks. Where are the lightwave SUNSET benchmarks? Oh that's right, Intel didn't write that one, and they told you to use the other ones (held a meeting on sept8th just to get that straight, what you should use and should NOT use...Looks like anand was listening to Intel). Look at the studiomax benches elsewhere. Check Aceshardware again, for the STUDIO PC benchmark for 3dstudiomax. Daily use, by a company that USES the software, instead of a special written script from Intel. Hmm...AMD wins? Check out REAL TIME RAYTRACING at aces. Hmm, amd wins? Mojo world, hmm...AMD wins? Looks like we better rethink this "workstation and rendering is better on Intel" crap eh? Check out Cinebench 2003's Cinema 4D benchmarks here:
    http://www.tech-report.com/reviews/2003q3/athlon64...

    Look at those shading scores...Hmm, AMD wins in some cinema 4D benchmarks? Well, I guess there's a DIFFERENT side to the coin for all of anands claimed P4 victories in RENDERING. You should stop making blanket statements when you run so few rendering benchmarks (and all Intel's favorites at that, favorite scripts that is, with favorite filters). Clearly it depends on WHAT you are doing, as to WHO is the fastest chip for X rendering job eh?

    Telling people to NOT buy AthlonFX is stupid too. What are you suggesting they buy then? A P4 that is a DEAD END SOCKET? Isn't this the same as AthlonFX51? Which AMD has said they will produce through end of 2004 (but that remains to be seen, I'd think they will make something else work in the boards first in 939 or something). It's the fastest chip on the block, but you shouldn't buy it. Cool, what should I buy right now?...ROFL.

    AMD has totally lost their credibility? WTF? When did that happen? I'm a reseller and I must have missed this I guess? They can't get it back? I wasn't aware they lost credibility in the first place. How much did Intel pay you to say this crap? Did AMD chips suddenly get incompatible, or all die at once last week and I didn't hear about it? Don't complain about price either. If a chip CLEARLY smokes the competition, why should the winning company give the chips away at rock bottom prices? More BS. AMD had to price low when compatibility was an issue, and speed was an issue in some cases. Neither of these are issues today. They win in the majority of benchmarks (especially gaming!), they should charge accordingly. Markets set prices, companies don't. If they fly off the shelves at the current price, then it's a price the market can bear isn't it? IF they sit on the shelf, clearly AMD will lower them. But I fail to see why you should lower your prices until the market tells you it won't pay what you're asking for the chip. Never heard of supply and demand eh? I price everything I sell, at whatever the top dollar is that I feel I can get. And adjust as needed after that. No business does any different. No business purposely low-balls themselves until FORCED to by market conditions. If AMD wasn't here, Intel would be still charging us $1000 for a top end cpu, and you'd still be telling us to buy them if we want the best chip on the block. Why is AMD different? AMD clearly wins, sells the chip at a relative price, (P4EE will be priced slightly HIGHER mind you and still loses) and you complain and say don't buy it? What's the excuse for the double standard? You've slowly joined tomshardware on the Intel side (I can't even believe tom's said "ID Software makes Unreal Tourney and it isn't optimized for AthlonFX"...LOL. Uh, Epic makes it fool, and it was the first game announced FOR athlon64...ROFL). Why are you still using Quake3 (which toms still loves too...LOL)? This benchmark is 3+yrs old! No CURRENT Q3 based benchmark shows a 50+fps lead for Intel. Nobody I know even plays Q3 today. Why use it? Where are the wolfenstein, or jedi2 benches (much more relevant if you're going to use a Q3 based game at all)? Why would you use a 3yr old benchmark when there are clearly more relevant benchmarks USING THE SAME ENGINE available to show today's performance? Was this just to slight AMD, because of their CLEAR victory in Unreal 2003, when showing botmatch? Which is where all games will lean AMD's way. If you are going online, prepare to have a slower gaming experience when a lot of people get in the game, no matter what game you're playing. TRUE CPU power kicks in then, and there are no tricks to pull to save you. Q3 just likes bandwidth, which is not an issue when online. CPU power is an issue online. When in a single player game the cpu is not your problem (at most you are calculating for 3 or 4 enemies on your screen and nothing happening elsewhere in the game). Online it IS your problem, as you calculate for everyone everywhere. Hence the stomping Intel receives in botmatch. This is also the case in strategy games and sims (racing etc), where calc's are made for TONS of units or a lot of AI. Same story, Intel gets killed. Hence the story with Age of mythology, Grand Prix4, Battlefield 1942, wolf enemy territory (hey isn't this a Q3 based game?..AMD kicks Intel A$$ in it! Oops, that's right, we should only be benching the 3yr old versions..Nobody plays WolF enemy territory, battlefield etc...Who plays new games? We all want those old benchmarks..LOL) and Medieval at aceshardware. The only one I can't figure out is Civ3. Hmm. With all that though, remember, you shouldn't buy this KILLER CPU...ROFL. Bring on the Prescott chip (at 104watts I need a heater this winter in oregon...hehe).

    I digress..
  • Anonymous User - Friday, September 26, 2003 - link

    I'm also no fanboy (isn't it sad that on forums you have to start making your point by making sure everyone understands that you are not a fanboy?), but to be honest I don't think that Intel was ever truly leading in terms of CPUs. They did have the "performance crown" for the past few months, but AMD is still much better value for money. I know for a fact that I'd prefer to have a CPU that's 5% slower if it means I save myself enough money to have a twice as fast graphics card :)

    Besides, the times before the 2800+ were just taking the piss, as AMD CPUs were both cheaper AND faster :)
  • Anonymous User - Friday, September 26, 2003 - link

    Dudes, u gotta be kidding me.. isn't it obvious that Intel is winning and that AMD is just catching up. I'm no Intel or AMD fanboy, I like good competition cause it makes better/cheaper products for me...=-).. AMD has just made a proc that can keep up with the P4 and beat it in some applications but Intel is already planning past the P4 and AMD has just caught up.. This is AMD's best VS. Intel's old.. I think AMD will be behind a few months once again.
  • Anonymous User - Friday, September 26, 2003 - link

    The German TomsHardWare site has just revised its Athlon 64 review as they found they were being fooled by Intel as a marketing trick... I'm sure Tomshardware.com will be updated soon with the same info.

    Basically the P4 EE isn't available until Christmas or springtime, which essentially makes any comparison with the Athlon 64 or FX completely useless. The Athlon is available NOW, Intel's chip is still 60-90 days away...
  • Anonymous User - Thursday, September 25, 2003 - link

    AMD's chip seems to offer an incremental improvement in performance (like Intel's offering). All this hype for products which offer moderate to good improvements in speed? Pretty anti-climactic. Let's have some real innovation here...something that actually increases the usefulness of a computer, not just ticking benchmarks up to the next level and claiming it's a revolution. Users with no sense of the value of money eat the marketing up and blow hundreds of dollars on the latest cpu while these companies laugh all the way to the bank.
