An Early Christmas Present from AMD: More Registers

In our coverage of the Opteron we focused primarily on the major architectural enhancements the K8 core enjoyed over the K7 (Athlon XP) - the on-die memory controller, improved branch predictor and more robust TLBs. For details on exactly what these improvements do and why they matter, we'll direct you back to our Opteron coverage; the same information applies to the Athlon 64, as we are talking about the same fundamental core.

What we didn't spend much time talking about in our Opteron coverage was the benefit of additional registers, which are only available in 64-bit mode. To understand why this is a benefit, let's first discuss the role registers play in a microprocessor.

Although we think of main memory and cache as a CPU's storage areas, registers are an often overlooked yet very important form of storage. Registers are individual storage locations that can hold numbers; these numbers can be values to add together, memory addresses where the CPU can find the next piece of information it will need, or the temporary result of an operation. For example, in the following equation:

A = 2 + 4

The number 2, the number 4 and the resulting number 6 will all be stored in registers, with each number taking up one register. These high-speed storage locations are located very close to the processor's functional units (the ALUs, FPUs, etc.) and are fixed in size. In a 32-bit x86 processor like the Athlon XP or Pentium 4, the majority of registers are 32 bits wide, meaning each can store a single 32-bit value. In 32-bit mode, the Athlon 64's general purpose registers are treated as being 32 bits wide, just like in its predecessor. However, in 64-bit mode all of the general purpose registers (GPRs) become 64 bits wide, and the number of GPRs doubles. Why are more registers important and why haven't AMD or Intel added more registers in the past? Let's answer these two questions next.

Take the example of A = 2 + 4 from before; in a microprocessor with at least 3 registers, this operation could be carried out without ever running out of registers. Internal to the microprocessor, the operation would be carried out something like this:

Store "2" in Register 1
Store "4" in Register 2
Store Register 1 + Register 2 in Register 3

After the operation has been carried out, all three values remain available, so if we wanted to add 2 to the answer, the processor would simply add Register 1 and Register 3.
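The three steps above can be sketched in a few lines of Python. This is only a toy model: the `registers` list and the `add` helper are hypothetical names made up for illustration, not a description of how any real CPU is implemented.

```python
# Toy model of a CPU with three registers (hypothetical, for illustration only).
registers = [0, 0, 0]  # Register 1, 2 and 3 live at indices 0, 1 and 2


def add(dst, src_a, src_b):
    """Store registers[src_a] + registers[src_b] in registers[dst]."""
    registers[dst] = registers[src_a] + registers[src_b]


registers[0] = 2   # Store "2" in Register 1
registers[1] = 4   # Store "4" in Register 2
add(2, 0, 1)       # Store Register 1 + Register 2 in Register 3

# All three values are still in registers, so adding 2 to the answer
# needs no memory access: just add Register 1 and Register 3.
answer_plus_two = registers[0] + registers[2]
print(registers, answer_plus_two)
```

Nothing in this sketch ever touches main memory, which is the whole point of having enough registers for the job.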

If the microprocessor had only 2 registers, however, there would be no free register for the result: either the result goes to main memory, or one of the values 2 or 4 has to be saved to main memory before being overwritten by the resulting value of A. Taking the first option, things would change in the following manner:

Store "2" in Register 1
Store "4" in Register 2
Store Register 1 + Register 2 in a location in main memory
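The two-register case can be sketched the same way, this time counting trips to main memory. The `TwoRegisterMachine` class, its methods and the `"A"` and `"spill"` labels are all hypothetical names chosen for illustration; the sketch also shows what it costs to bring the result back into a register later.

```python
class TwoRegisterMachine:
    """Toy machine: two registers, a main memory, and an access counter."""

    def __init__(self):
        self.registers = [0, 0]
        self.memory = {}
        self.memory_accesses = 0

    def store(self, address, value):
        """Write a value to main memory and count the access."""
        self.memory[address] = value
        self.memory_accesses += 1

    def load(self, address):
        """Read a value from main memory and count the access."""
        self.memory_accesses += 1
        return self.memory[address]


cpu = TwoRegisterMachine()
cpu.registers[0] = 2                                   # Store "2" in Register 1
cpu.registers[1] = 4                                   # Store "4" in Register 2
cpu.store("A", cpu.registers[0] + cpu.registers[1])    # result goes to main memory

# Using the result later: evict one register to memory, then reload A.
cpu.store("spill", cpu.registers[1])                   # save the "4" before overwriting it
cpu.registers[1] = cpu.load("A")                       # bring the result back in
print(cpu.memory_accesses)
```

In this toy model the two-register machine makes three memory accesses for work that the three-register version handles with none.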

Here you can see that there is now an additional memory access that wasn't there before. What we haven't even taken into account is that the address of that main memory location also has to be placed in a register, so that the CPU knows where to tell the load/store unit to send the data. If we wanted to use the result for anything, the CPU would first have to evict a piece of data from one of the occupied registers to main memory, and then load the result from main memory into the freed register. As you can see, the number of memory accesses increases tremendously, and the more memory accesses you have, the longer your CPU has to wait in order to get work done - thus you lose performance.

Simple enough? Now here's where things get a little more complicated: why don't we just keep on adding more registers?

The beauty of the x86 Instruction Set Architecture (ISA) is that there are close to two decades of software that will run on even today's x86 microprocessors. One way this sort of backwards compatibility is maintained is by keeping the ISA the same from one microprocessor generation to the next. The ISA doesn't cover things like functional units or cache sizes, but it does include the number and names of registers. When a program is compiled to run on an x86 CPU, the compiler knows that the architecture has 8 general purpose registers, and when translating the programmer's code into machine code that the CPU can understand, it references only those 8 general purpose registers. If Intel were to add 2 general purpose registers for a total of 10, anything compiled to use them would not be able to run on an AMD CPU, as the extra 2 general purpose registers would not be found on the AMD processor.

Microprocessor designers have gotten around this by introducing a technique known as register renaming, which makes only the allowed number of registers visible to software while letting the hardware use additional internal registers to juggle data around without going to main memory. Register renaming fixes a large percentage of the issues associated with register conflicts, where a CPU simply runs out of registers and must start swapping to main memory; however, there are some cases where we simply need more registers.
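A minimal sketch of the renaming idea follows. The `RenameTable` class, the pool size of 6 and the register names used here are assumptions made for illustration; real hardware also has to free physical registers once old values are no longer needed, which this toy omits.

```python
class RenameTable:
    """Toy register renamer: maps a few visible registers onto a larger physical pool."""

    def __init__(self, visible_names, physical_count):
        self.free = list(range(physical_count))   # unused physical registers
        self.mapping = {}                         # visible name -> physical index
        for name in visible_names:
            self.mapping[name] = self.free.pop(0)

    def read(self, name):
        """Return the physical register currently holding this visible register."""
        return self.mapping[name]

    def write(self, name):
        """Allocate a fresh physical register for a new value of `name`."""
        physical = self.free.pop(0)
        self.mapping[name] = physical
        return physical


# Software only ever sees "eax" and "ebx"; the hardware has 6 registers to play with.
table = RenameTable(["eax", "ebx"], physical_count=6)
first = table.write("eax")    # one value written to eax
second = table.write("eax")   # a later, unrelated value written to eax
print(first, second)
```

The two writes to the same visible register land in different physical registers, so the older value does not have to be overwritten or pushed out to memory while earlier instructions may still need it.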

When AMD introduced their AMD64 architecture, they had a unique opportunity on their hands. Because no other x86 processor would be able to run 64-bit code anyway, they decided to double the number of general purpose and SSE/SSE2 registers that were made available in 64-bit mode. Since AMD didn't have to worry about compatibility in 64-bit mode, doubling the register count there wasn't really a problem, and the majority of the performance increases you will see for 64-bit applications on the desktop will be due to the additional registers.
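To get a rough feel for why doubling the visible register count can matter, here is a toy model of register pressure. The `count_memory_fetches` function, the least-recently-used eviction rule and the 12-value workload are all assumptions made for illustration; real compilers allocate registers very differently, and not every visible register is free for general use.

```python
from collections import OrderedDict


def count_memory_fetches(accesses, register_count):
    """Count how often a value must be fetched from memory in a toy model.

    Values live in `register_count` registers; when all are full, the least
    recently used value is evicted to make room for the new one.
    """
    registers = OrderedDict()
    fetches = 0
    for value in accesses:
        if value in registers:
            registers.move_to_end(value)      # already in a register
            continue
        fetches += 1                          # must come from memory
        if len(registers) == register_count:
            registers.popitem(last=False)     # evict least recently used
        registers[value] = True
    return fetches


# Hypothetical workload: a loop that touches 12 values on each of 10 passes.
workload = list(range(12)) * 10
print(count_memory_fetches(workload, 8), count_memory_fetches(workload, 16))
```

With 8 registers this particular workload goes to memory on every one of its 120 accesses, while with 16 registers it only fetches each of the 12 values once. Real code is rarely this extreme, but the sketch shows how a working set that just misses fitting in the register file can be disproportionately expensive.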

What is important to note is that although AMD has increased the number of visible registers in 64-bit mode, the number of internal registers available for renaming has not increased - most likely because of cost/performance constraints.

121 Comments

  • Anonymous User - Wednesday, September 24, 2003

    Well I have read the Anand, Toms, and Tech Report reviews (about to read a few more)..

    Anand - I do have to agree that P4ee tests shouldn't have been left out, that just doesn't look fair to the readers and does appear to show some bias. Also most of the other sites seem to have the via chipset ahead of the nvidia.

    Tom's review was very in depth but I have to question why he chose to include OC'd P4EE scores? I see no other reason other than to have the P4EEs at the top of every possible chart. I have always been a little leery of Tom's CPU reviews. I have never read so many reviews (on one site) where AMD loses so often and by such a large margin in practically every single test. I have read CPU reviews from several sites and on the other sites the XPs seem to fare quite a bit better.

    And doesn't it seem like he includes OC'd P4 scores in almost every single CPU review? I could be wrong here..anyone?

    And in Tom's own words several of the tests were Intel optimized, so shouldn't the P4ee win regardless (as is usually the case with almost any application specific optimization)..

    Tech Report's review seemed like it was pretty good. They ran several tests and both the P4ee and the 64/fx fared pretty well.

    I am looking forward to reading more reviews today.

    I DO have to agree on one thing here though : AMD in the past has always done well due to its pricing structure. The low end has almost always been very affordable and very competitive, and that's where they acquired most of their user base.

    To totally go against that makes sense in a financial way for AMD, but not for the customer.

    With the looming Prescott on the horizon I am curious to see how things turn out.

    When tax time rolls around I will be upgrading my CPU and Motherboard (and ram "if" necessary), and I hope it isn't a one sided decision as far as who I go with..(heck, I just purchased an Nforce 2 motherboard (Soltek SL-75FRN2-RL) and retail Barton 2500+ for $184 shipped from newegg (will be here thursday), with hope of hitting 3200+ speeds (several seem to have had luck with it)..so it isn't like I am an intel zealot or anything).

    I just hope the 64 line scales well and can keep up with the Prescott. If the Prescott performs as well as the P4ee things will be difficult for AMD. I hope they do well as I am interested in the 64 and if it is a good choice at tax time it will get my money.

    NO ONE (well in the general public) wants to pay $700-800 for a cpu, over $300 for MORE memory and $160-200 for a new motherboard when a $700-800 P4ee cpu performs almost as well (in some tests as it did lose some to the fx51)..if you have a socket 478 motherboard with the correct chipset that is.

    But even then going the P4ee route you can more than likely still use your current ddr ram (anyone looking to buy a $700-800 cpu more than likely has adequate memory) and a motherboard can be had for less than $100.

    I REALLY hope AMD does well, if for nothing more than the sole purpose of having more than one choice (we DO NOT want that people). Like I said I have interests in the new AMD cpus and tax time is about 6 months away, so it gives AMD time to get things rolling.

    But, if the Prescott performs just as well if not better (totally up in the air, as we have NO benchmarks or real specs (CPU speed, etc) of any worth) and costs the same if not less, the battle will be a very hard one for AMD...and my money isn't brand loyal (intel fanboys take note)..

    Isn't the soon to come Prescott (not the initial launch version) supposed to be a new socket type, or am I on something? :-P

    I have probably forgotten something I wanted to say, but I'll post again if I do.

    Peace

    Kevin

    legionosh@msn.com
  • Anonymous User - Wednesday, September 24, 2003

    Hi Anand,

    Just wanted to say that I'm a bit disappointed in your review. Not much mention of the hardware config, using nforce3 boards with a problem, and the conclusion based on comments like '(on pricing) which is a mistake for a company that has lost so much credibility'. Um, maybe in your eyes, but let's focus on the facts next time, rather than perceived credibility. I don't feel that AMD has lost credibility on the basis of benchmarks, in fact, they seem to be far more upstanding than their competitor in this regard. In any case, the A64 is shipping and beats its competitor in most benchmarks (based on results from just about everywhere except Tom's Hardware). As well, the A64 3200+ is about 50% cheaper than Intel's comparable offering, and I expect that AMD will continue to offer less expensive and better products than Intel as pricing changes - there is a long history of this situation. Your comments seem particularly off the mark when this example is considered.

    I don't purchase CPUs based on a company's credibility, I buy them based on stability, performance, and architecture (ie. how long is the platform going to be around), in that order of priority. I don't feel like Anandtech helped me make a decision with the tests run or conclusion drawn. I'm sorry to see such a worthy site as yours stumbling.

    Regards,

    Mark
  • Anonymous User - Wednesday, September 24, 2003

    The Intel/AMD fanbois don't have anything on the NVidia/ATi ones I can tell ya, but it's still true that all fanbois are dumbest...

    AMD's Athlon success has been built on having better bang for buck than Intel. If they cannot offer this advantage then it seems likely that they will suffer, regardless of what the enthusiast market does.

    Regardless of who has the faster chip what counts is that AMD are competitive - it's the only thing that is going to keep Intel honest on pricing.
  • Anonymous User - Wednesday, September 24, 2003

    The reason Intel chose 2MB L3 cache for the P4 EE was stated on several other review sites as "vertex buffers for many games reside neatly in 2MB of cache. Secondarily, a full frame of video at D1 resolution requires just a little more than 1MB of cache" so I'm wondering if Intel's next generation Pentium M with the 2MB L2 cache will be the next awesome gaming chip? Oh heck - just ask Intel to make the Prescott EE version with 2 MB of L2 cache and skip the L3.
  • Anonymous User - Wednesday, September 24, 2003

    If you have to pay for a new system, you might as well pay for the fastest http://www.go-l.com/miva/merchant.mv?Screen=PROD&a... This thing will probably hit a cool 4 GHz with the FSB cranked up to 250 MHz (x 16) - so the answer to the Athlon 64 3400+ and even FX 52/53 already exists.
  • Anonymous User - Wednesday, September 24, 2003

    "40-bit of physically addressable memory - or ~137GB" ???????

    40 bits gives 1024GB NOT "~137GB"...
    I thought that anand people can at least convert between binary and decimal systems...
  • Anonymous User - Wednesday, September 24, 2003

    #73 Pick a sentence and stick with it!
  • Anonymous User - Wednesday, September 24, 2003

    Agreed "BIASED" - furthermore it's more than suspicious they "forgot" to show us some P4EE-results in certain tests. It suggests P4EE was better than the whole AMD-branch, but AMD pays them more :)
  • Anonymous User - Wednesday, September 24, 2003

    The Athlon 64 3200+ ($417) is definitely the most interesting offering. With the exception of Ghost Recon and Enemy Territory, it outperforms its direct competitor, which is about 50% more expensive. Intel will lower the price next month from $637 to $417, but until then the Athlon 64 is a bargain for the enthusiast (and AMD will probably adapt prices too).

    The Athlon 64 FX-51 is indeed the fastest desktop processor right now as the Pentium 4 EE is not really available to the enthusiast. The large L3-cache of Pentium 4 EE gives it an advantage in applications like 3D Animation, but in games the Athlon 64 FX-51 is overall the fastest processor. However, the high price tag plus the fact that you have to buy buffered RAM makes the Athlon FX-51 less interesting from a price/performance perspective.

    We can't help it but geeks as we are we also like to look at the architecture. From an architectural point of view, the Athlon 64 shines: all the rough edges of the K7 architecture have been perfected, and the Athlon 64 architecture is - despite still being based on ancient x86 - a very balanced and elegant design. The rough K7 diamond has been cut and polished and shines brightly now, especially when you look at how well this CPU scales with higher frequencies. We will show you more in our next review.

    One thing that could justify the rather high system cost of an Athlon 64 FX based PC is the extra memory space and performance in Windows 64. Windows 64 is not ready yet, though. NVIDIA OpenGL Drivers, for example, do not seem to support hardware acceleration and few applications have been ported so far as the OS is in a beta phase. The future of AMD64 is a bit murky: many companies want to support the Opteron and Athlon 64 as a 32 bit chip, but have "a wait and see attitude" when it comes to porting their applications to 64-bit.

    There are so many 64-bit roads that Intel may take, and therefore it is very hard to predict what future AMD64 has. Intel and HP are very committed to the Itanium, and the performance and industry acceptance of the Itanium are finally taking off. So we definitely can forget the scenario where Intel will ditch IA-64 for some form of x86-64, even though it is very likely that Prescott has some 64-bit functionality hidden away.

    The most likely scenario is that Intel will try to push the Itanium towards the gigantic dual processor market more quickly, at the expense of the Xeon. While Madison and McKinley were typically CPUs for scientific and large database applications (backend of 3-tier model), Deerfield is already destined to find a place in front end (application servers like webserver etc.) and blade market (HPC).

    When the Itanium family finally begins to replace the Xeon in both the workstation and server market, Intel can proceed with extending x86 to 64-bit as well and try to pull the plug on AMD64, because at that point the Itanium will no longer be so vulnerable to poor ISV support. Introducing a form of Intel x86-64 in the coming months would trample the Itanium sapling just at a time when it shows promise to grow faster.

    Essentially, AMD has a few years to gather enough support and marketshare. AMD will have to do better than ever before, but the first steps in the right direction have been taken.

    For the moment, the future of AMD64 is no concern to the average user. The Athlon 64 (non-FX) line gives you excellent 32-bit performance for a decent price, and maybe even more importantly it is a much safer CPU. Replacing or inserting an AMD CPU is no longer a risky endeavour. Computer shops and enthusiasts, in particular, will appreciate this.
  • Anonymous User - Wednesday, September 24, 2003

    Wow... to tell you guys the truth, after all the hype, all the promises, and all the amd fans uttering "AMD 64" like they're praying... I expected a bigger performance difference between the P4EE and the AMD64.. Like the AMD64 was supposed to crush it... but looks like the p4ee keeps up with it just fine. hmm.. so all I'm saying is it doesn't live up to the hype, but it's fast, and it deserves props for that.
