Original Link: http://www.anandtech.com/show/3377



AMD's Shanghai ...
is really shaping up well. As I mentioned in my CeBIT coverage, several people told us that they have already been testing it. Shanghai is an evolutionary improvement over AMD's Barcelona: it includes several IPC improvements and a 6 MB instead of a 2 MB L3 cache. In 2009, AMD plans to improve performance with a better IOMMU. A new RAS feature called "L3 Cache index disable" will also be available. We could not get more information about this feature, which sounds more like a way to cripple performance than a RAS feature...
 
According to IDC, AMD's overall market share in the server CPU space did not decrease in 2007 (about 13%). AMD's market share grew in the low-budget 1-socket server market (from 9% to 14%). It also increased slightly in the lucrative 4-socket market (from 37% to about 42%), but decreased significantly in the high-volume 2-socket market (from 14% to 11%).
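The segment shifts are easier to compare when absolute changes (percentage points) are separated from relative changes. Below is a minimal sketch using the IDC figures quoted above; the helper names are hypothetical, and the 4-socket figure is approximate.

```python
# AMD server CPU market share per segment in 2007, as quoted from IDC above:
# (before, after) in percent. The 4-socket "after" value is approximate.
shares = {
    "1-socket": (9, 14),
    "2-socket": (14, 11),
    "4-socket": (37, 42),
}

def point_change(before: int, after: int) -> int:
    """Absolute change in market share, in percentage points."""
    return after - before

def relative_change(before: int, after: int) -> float:
    """Change relative to the starting share, in percent."""
    return 100.0 * (after - before) / before

for segment, (before, after) in shares.items():
    points = point_change(before, after)
    rel = relative_change(before, after)
    print(f"{segment}: {before}% -> {after}% ({points:+d} points, {rel:+.1f}% relative)")
```

The relative view shows why the 2-socket drop matters: losing 3 points from a 14% base means losing roughly a fifth of AMD's share in that segment.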
 
AMD's third-generation Opteron, now available in the B3 revision, will launch this quarter at 2.3 GHz, slightly more conservative than the newly launched Phenom (2.4 GHz, 95 W). A 125 W SE version (2360SE) at 2.5 GHz will launch late this quarter. The low-power version stays at 1.9 GHz, which is a bit disappointing...
 
Low Voltage Xeon
... Intel is launching the L5420, a low-power Xeon at 2.5 GHz. This CPU has a 50 W TDP, thus at most 12.5 W per core, and consumes only 16 W (4 W per core) when idle. It consumes as little power as the previous 65 nm L5335, but performs about 30% better in, for example, POV-Ray, SunGard and Cinebench. Now that Intel has introduced the 5100 chipset, which uses DDR2 instead of FB-DIMMs, AMD has also lost the power advantage of DDR2 memory. It seems that AMD will have to cede the performance/watt crown until Shanghai is up and running.
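The per-core figures follow from dividing the package power by the four cores. A minimal sketch of that arithmetic; the even split is a simplifying assumption, since part of the package power goes to shared logic rather than to the cores themselves.

```python
# Per-core power for the quad-core L5420, using the figures quoted above.
# Assumption: package power is split evenly over the cores; in reality the
# shared L2 caches and bus interface also draw part of it.
CORES = 4
TDP_W = 50.0   # thermal design power (a maximum, hence "at most" per core)
IDLE_W = 16.0  # idle power as quoted

def per_core(package_watts: float, cores: int = CORES) -> float:
    """Evenly divide a package power figure over the cores."""
    return package_watts / cores

print(f"TDP per core:  {per_core(TDP_W):.1f} W")   # 12.5 W
print(f"Idle per core: {per_core(IDLE_W):.1f} W")  # 4.0 W
```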
 
Nehalem 
Nehalem has three memory channels, which can run independently or in lockstep. Independent mode is of course the best setting for performance in almost all cases.
 
But the most interesting news is Nehalem's new TLB architecture. You might remember that we wrote that the TLB architecture can really make a difference when you run a lot of virtual machines on top of your server CPU. The table below lists the number of entries in each TLB, with the page size in brackets. Remember that currently all 32-bit operating systems use 4 KB pages, while most 64-bit operating systems (Linux and Windows) can use either 4 KB or 2 MB pages. For memory-intensive applications, 2 MB pages will become more and more popular (see for example the SPECjbb2005 submissions).
 
TLB Architecture   AMD Barcelona            Intel Penryn            Intel Nehalem
L1 Instructions    48 (4 KB), 48 (2 MB)     128 (4 KB), 8 (2 MB)    ?
L1 Data            48 (4 KB), 48 (2 MB)     16 (4 KB), 16 (2 MB)    ?
L2                 512 (4 KB), 128 (2 MB)   256 (4 KB), 32 (2 MB)   512 (4 KB), 64? (2 MB)
L2 holds           Data + instructions      Data only               Data + instructions
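One way to read this table is in terms of TLB reach: the number of entries multiplied by the page size, i.e. how much memory the TLB can map before a miss forces a page walk. A minimal sketch using the L2 TLB row; the dictionary layout is illustrative, and Nehalem's 2 MB entry count is marked "?" in the table, so it is treated as unconfirmed.

```python
# TLB reach = entries x page size. Entry counts come from the L2 TLB row of
# the table above; Nehalem's 64 entries for 2 MB pages is unconfirmed ("64?").
KB = 1024
MB = 1024 * KB

l2_tlb = {
    "AMD Barcelona": {4 * KB: 512, 2 * MB: 128},
    "Intel Penryn": {4 * KB: 256, 2 * MB: 32},
    "Intel Nehalem": {4 * KB: 512, 2 * MB: 64},  # unconfirmed value
}

def reach_bytes(entries: int, page_size: int) -> int:
    """Memory covered by `entries` TLB entries of `page_size` bytes each."""
    return entries * page_size

for cpu, sizes in l2_tlb.items():
    small = reach_bytes(sizes[4 * KB], 4 * KB) // MB
    large = reach_bytes(sizes[2 * MB], 2 * MB) // MB
    print(f"{cpu}: {small} MB with 4 KB pages, {large} MB with 2 MB pages")
```

With 4 KB pages, all three L2 TLBs map only 1 to 2 MB; with 2 MB pages the reach grows to tens or hundreds of MB, which is why large pages matter for memory-intensive workloads.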
 
It will be interesting to see what TLB architecture AMD's 45 nm Opteron (Shanghai) will have. Remember that while Penryn's TLBs might be more than enough for running one machine, with EPT or NPT the TLB is split among a lot of virtual machines. VMware and Dell, for example, report that on average no fewer than 8 virtual machines run on their 2-socket servers, and 12 to 20 virtual machines per server are no exception. If the TLB is big enough, NPT (also called RVI by AMD) and EPT can offer up to a 20% performance increase.
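To get a feel for how quickly the TLB runs out when many virtual machines share it, here is a rough sketch that divides a 512-entry L2 TLB evenly over the VM counts mentioned above. The even split is a simplifying assumption: with tagged TLBs, entries from different VMs coexist and compete dynamically rather than being partitioned.

```python
# Rough illustration: average L2 TLB entries per VM under an assumed even
# split of a 512-entry TLB, and the memory those entries map with 4 KB pages.
# Real tagged TLBs do not partition entries like this; it is only a
# back-of-the-envelope estimate.
TLB_ENTRIES = 512
PAGE_KB = 4

def entries_per_vm(tlb_entries: int, vms: int) -> float:
    """Average number of TLB entries per VM under an even split."""
    return tlb_entries / vms

for vms in (1, 8, 12, 20):
    share = entries_per_vm(TLB_ENTRIES, vms)
    reach_kb = share * PAGE_KB
    print(f"{vms:2d} VMs: {share:6.1f} entries/VM, ~{reach_kb:.0f} KB reach per VM")
```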
