Athlon MP Technology

As you will remember from our original story on the Athlon 4, a number of improvements give the Palomino core a modest but noticeable performance advantage over its predecessor.  Because the Athlon MP uses the same Palomino core as the Athlon 4, those same improvements are present in this processor as well.  The one thing the Athlon MP won't advertise as a feature is PowerNow!, though the processor does support it.

The Athlon MP is the first non-mobile AMD processor to bring a full implementation of Intel's Streaming SIMD Extensions (SSE) to the table.  This allows the Athlon MP to run code optimized for either the 3DNow! or the SSE instruction set, although that doesn't necessarily mean it can run SSE-optimized code as fast as a Pentium III or Pentium 4.  AMD calls the Athlon MP's combined 3DNow! + SSE support 3DNow! Professional.  AMD will eventually include full SSE2 compliance in its Hammer line of CPUs.

The second improvement the Athlon MP offers over the Athlon is its improved data prefetch mechanism.  This feature allows the Athlon MP to automatically use otherwise idle FSB bandwidth to fetch data it predicts it will need, before any instruction actually requests that data.  This makes the Athlon MP more dependent on a high-speed FSB and memory bus, and it also accounts for the majority of the Athlon MP's performance advantage over the Athlon.  As we've noticed in the Pentium 4's performance characteristics, data prefetch can help in applications that require a great deal of bandwidth and have easily predictable memory accesses, such as video editing or, more relevant to this article, 3D rendering and database serving.  Data prefetch is particularly useful in the case of the Athlon MP because its chipset platform offers a considerable amount of FSB bandwidth, which is more easily consumed with data prefetch enabled, but more on that later.
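To illustrate why prefetch rewards "easily predictable memory accesses," here is a minimal sketch in Python.  It assumes a simple next-line prefetcher and a cache that never evicts.  Both are our own simplifications, not a description of how the Palomino's prefetch logic actually works, and the function name and traces are made up for the example.

```python
def demand_misses(line_trace, prefetch):
    """Count demand misses over a trace of cache-line numbers.

    Toy model (an assumption, not AMD's design): the cache never evicts,
    and the prefetcher simply fetches line n+1 whenever line n is touched.
    """
    cached = set()
    misses = 0
    for line in line_trace:
        if line not in cached:
            misses += 1          # the CPU had to wait for this line
            cached.add(line)
        if prefetch:
            cached.add(line + 1)  # spend spare bus bandwidth on a guess
    return misses


# A predictable, streaming access pattern: lines 0, 1, 2, ...
sequential = list(range(1000))
# A pattern this simple predictor always guesses wrong: lines 0, 2, 4, ...
strided = list(range(0, 2000, 2))

for name, trace in (("sequential", sequential), ("strided", strided)):
    print(name,
          "misses without prefetch:", demand_misses(trace, False),
          "with prefetch:", demand_misses(trace, True))
```

In this toy model the streaming trace drops to a single miss with prefetch enabled, while the strided trace sees no benefit at all and every prefetch is wasted bus traffic.  A real prefetcher is smarter about which patterns it follows, but the same trade-off applies: it only pays off when the guess is right and there is spare bandwidth to spend.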

The third improvement offered by the Athlon MP is a set of three enhancements to the processor's Translation Look-aside Buffers (TLBs).  As taken from AMD’s tech docs on the Palomino core, the three TLB enhancements are:

1. The L1 Data TLB increases from 32 to 40 entries
2. Both the L2 Instruction TLB and L2 Data TLB use an exclusive architecture
3. TLB entries can be speculatively reloaded

As you will remember from our initial story on the Athlon 4 processor, the task of the TLB is to cache translated memory addresses.  This translation process is necessary for the CPU to gain access to the data stored in main memory, and by caching the translated addresses, it becomes much quicker to find data in main memory. 

The first improvement comes from increasing the number of entries in the L1 Data TLB.  A larger L1 Data TLB has a higher hit rate (the probability of finding the translation the CPU needs in the TLB).  You will also remember that the Pentium III has an L1 Data TLB with significantly more entries than even the new 40-entry TLB on the Athlon MP.
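A rough sketch of how entry count affects hit rate follows.  It assumes a fully associative TLB with least-recently-used replacement and a deliberately chosen worst-case access pattern; the article's sources don't specify the Athlon MP's replacement policy or associativity, so treat those, along with the 36-page working set, as illustrative assumptions.

```python
from collections import OrderedDict


def tlb_hit_rate(entries, page_trace):
    """Hit rate of a toy fully associative, LRU-replaced TLB.

    The replacement policy and associativity are assumptions made for
    illustration; they are not taken from AMD's documentation.
    """
    tlb = OrderedDict()  # page number -> cached translation (placeholder)
    hits = 0
    total = 0
    for page in page_trace:
        total += 1
        if page in tlb:
            hits += 1
            tlb.move_to_end(page)        # mark as most recently used
        else:
            if len(tlb) >= entries:
                tlb.popitem(last=False)  # evict the least recently used
            tlb[page] = True
    return hits / total if total else 0.0


# Hypothetical workload: loop over the same 36 pages 100 times.
trace = [page for _ in range(100) for page in range(36)]

print("32 entries:", tlb_hit_rate(32, trace))
print("40 entries:", tlb_hit_rate(40, trace))
```

This trace is a pathological case for LRU: a working set slightly larger than the TLB causes every access to miss at 32 entries, while 40 entries hold the whole set and miss only on the first pass.  Real workloads fall somewhere in between, so the 8 extra entries should be expected to yield a far smaller gain than this contrived example suggests.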

The next Athlon MP TLB enhancement comes from moving the L2 TLBs to an exclusive architecture.  This means that entries held in the L1 TLBs are not duplicated in the L2 TLBs, which frees up space in the L2 TLBs to store even more translated addresses.  The downside to this exclusive architecture is a latency penalty, since the addresses aren't duplicated in the L2 TLBs.
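The capacity benefit of an exclusive arrangement can be sketched with a small two-level model.  The class below is our own illustration, assuming LRU replacement at both levels and toy sizes of 4 and 8 entries; none of these details come from AMD's documentation.

```python
from collections import OrderedDict


def _insert(table, capacity, page):
    """Insert page into an LRU table; return the evicted page, if any."""
    victim = None
    if len(table) >= capacity:
        victim, _ = table.popitem(last=False)
    table[page] = True
    return victim


class TwoLevelTLB:
    """Toy two-level TLB (hypothetical model, LRU at both levels)."""

    def __init__(self, l1_entries, l2_entries, exclusive):
        self.l1, self.l2 = OrderedDict(), OrderedDict()
        self.l1_entries, self.l2_entries = l1_entries, l2_entries
        self.exclusive = exclusive

    def access(self, page):
        if page in self.l1:
            self.l1.move_to_end(page)
            return "l1"
        if page in self.l2:
            result = "l2"
            if self.exclusive:
                del self.l2[page]          # entry moves up, not copied
            else:
                self.l2.move_to_end(page)
        else:
            result = "miss"
            if not self.exclusive:
                # Inclusive: L2 keeps a copy of everything in L1.
                evicted = _insert(self.l2, self.l2_entries, page)
                if evicted is not None:
                    self.l1.pop(evicted, None)
        victim = _insert(self.l1, self.l1_entries, page)
        if self.exclusive and victim is not None:
            # Exclusive: the L1 victim is demoted into L2.
            _insert(self.l2, self.l2_entries, victim)
        return result

    def unique_translations(self):
        return len(set(self.l1) | set(self.l2))


def unique_after_walk(exclusive, pages=20):
    tlb = TwoLevelTLB(l1_entries=4, l2_entries=8, exclusive=exclusive)
    for page in range(pages):
        tlb.access(page)
    return tlb.unique_translations()


print("inclusive:", unique_after_walk(False))
print("exclusive:", unique_after_walk(True))
```

After touching 20 distinct pages, the inclusive model holds 8 unique translations (the size of its L2), while the exclusive model holds 12 (L1 plus L2).  The extra bookkeeping in the exclusive path, where entries are moved between levels rather than copied, is one way to picture where added latency could come from.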

The final improvement is that the TLB entries can be speculatively reloaded.  This means that in the event that an address is not found in the TLB, the address can be loaded into the TLB before the instruction that requested the address is finished executing.  On older Athlon cores, this was not possible, resulting in a bit of a performance hit in this situation.  According to AMD, this situation is usually observed in "high-end software applications." 

In fact, AMD states that the TLB enhancements of the Athlon MP are most useful in these "high-end software applications."  Hopefully, we will see whether or not they are correct with our benchmarks, which are composed of a number of very high-end tests.
