AMD's Shanghai ...
is really shaping up well. As I told you in my CeBIT coverage, several people told us that they have already been testing it. Shanghai is an evolutionary improvement over AMD's Barcelona and includes several IPC improvements and a 6 MB L3 cache instead of 2 MB. In 2009, AMD plans to improve performance with a better IOMMU. A new RAS feature called "L3 Cache index disable" will also be available. We could not get more information about this feature, which sounds more like performance crippling than a RAS feature...
According to IDC, AMD's overall market share in the server CPU space did not decrease in 2007 (about 13 percent). AMD's market share grew in the low-budget 1 socket server market (from 9% to 14%). It also increased slightly in the lucrative 4 socket market (from 37% to about 42%) but decreased significantly in the high-volume 2 socket market (from 14% to 11%).
AMD's third generation Opteron, now available in the B3 revision, will be launched this quarter at 2.3 GHz, slightly more conservative than the newly launched Phenom (2.4 GHz, 95 W). A 125 W SE version (2360SE) at 2.5 GHz will be launched late this quarter. The low power version stays at 1.9 GHz, which is a bit disappointing...
Low Voltage Xeon
... As Intel launches the L5420, a low power Xeon at 2.5 GHz. This CPU has a TDP of 50 W, so less than 12.5 W per core, and consumes only 16 W (4 W per core) when idle. It consumes as little power as the previous 65 nm L5335, but performs about 30% better in, for example, POV-Ray, SunGard and Cinebench. Since Intel introduced the 5100 chipset, AMD has also lost the advantage of DDR2's lower power consumption. It seems that AMD will lose the performance/watt crown until Shanghai is up and running.
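As a quick sanity check on the per-core figures above, here is a minimal Python sketch. It assumes the L5420 is a quad-core part (which the 16 W to 4 W per core figure implies); the function and constant names are purely illustrative.

```python
# Assumption: the L5420 has four cores, as implied by "16 W (4 W per core)".
CORES = 4


def watts_per_core(total_watts, cores=CORES):
    """Split a package-level power figure evenly across the cores."""
    return total_watts / cores


tdp_per_core = watts_per_core(50.0)   # TDP is an upper bound, hence "less than"
idle_per_core = watts_per_core(16.0)  # idle power quoted in the text

print(f"TDP budget per core: {tdp_per_core} W")
print(f"Idle power per core: {idle_per_core} W")
```

Keep in mind that TDP is a ceiling for the whole package, so the even split is only a rough upper bound per core.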
Nehalem can access 3 memory channels, which can be run as independent or lockstep. Independent is of course the setting for best performance in almost all cases.
But the most interesting news is the new TLB architecture of Nehalem. You might remember that we wrote that the TLB architecture can really make a difference when you run a lot of virtual machines on top of your server CPU. Below you see the number of entries in the TLB, with the page size between brackets. Remember that currently all 32 bit OS make use of 4 KB size, but that most 64 bit OS (Linux and windows) can use 4 KB or 2MB page size. 2 MB pages will become more and more popular for memory intensive applications (see for example the SPECjbb2005 submissions).
TLB Architecture   AMD Barcelona     Intel Penryn      Intel Nehalem
L1 Instructions    48 (4 KB)         128 (4 KB)
                   48 (2 MB)         8 (2 MB)
L1 Data            48 (4 KB)         16 (4 KB)
                   48 (2 MB)         16 (2 MB)
L2                 512 (4 KB)        256 (4 KB)        512 (4 KB)
                   128 (2 MB)        32 (2 MB)         64? (2 MB)
                   Data + instruc.   Data only         Data + instruc.
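To put those entry counts in perspective, a rough way to compare TLBs is their "reach": the number of entries multiplied by the page size. The Python sketch below is only an illustration. It treats the largest 4 KB / 2 MB figures listed for each CPU as the second-level TLB and leaves out Nehalem's uncertain 2 MB count; all names are made up for this example.

```python
KB = 1024
MB = 1024 * KB

PAGE_BYTES = {"4 KB": 4 * KB, "2 MB": 2 * MB}

# Second-level TLB entry counts as read from the table above (an assumption
# about how the figures map to each CPU; Nehalem's "64?" entry is omitted).
L2_TLB_ENTRIES = {
    "AMD Barcelona": {"4 KB": 512, "2 MB": 128},
    "Intel Penryn": {"4 KB": 256, "2 MB": 32},
    "Intel Nehalem": {"4 KB": 512},
}


def tlb_reach_bytes(entries, page_bytes):
    """Memory covered by a TLB when every entry maps one page."""
    return entries * page_bytes


def reach_in_mb(tlbs):
    """Return {cpu: {page size: reach in MB}} for the given entry counts."""
    return {
        cpu: {
            page: tlb_reach_bytes(count, PAGE_BYTES[page]) // MB
            for page, count in sizes.items()
        }
        for cpu, sizes in tlbs.items()
    }


for cpu, reach in reach_in_mb(L2_TLB_ENTRIES).items():
    for page, mb in reach.items():
        print(f"{cpu}: {page} pages cover {mb} MB")
```

The gap between the 4 KB and 2 MB rows is the reason large pages matter for memory intensive applications: the same number of entries covers 512 times more memory.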
It will be interesting to see what TLB architecture AMD's 45 nm Opteron (Shanghai) will have. Remember that while the Penryn TLBs might be more than enough for running one machine, with EPT or NPT the TLB is split among a lot of virtual machines. VMware and Dell, for example, report that on average no fewer than 8 virtual machines run on top of their 2 socket servers, and 12 to 20 virtual machines per server are no exception. If the TLB is big enough, NPT (also called RVI by AMD) and EPT can offer up to a 20% performance increase.
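To get a feel for how quickly a TLB shrinks when it is shared, here is a deliberately simplified Python sketch. It assumes the entries are divided evenly among the virtual machines, which real hardware does not do (entries compete dynamically), so treat the numbers as a back-of-the-envelope average only. The function name is hypothetical.

```python
def entries_per_vm(total_entries, vm_count):
    """Average TLB entries per VM, assuming an even split (a simplification)."""
    if vm_count <= 0:
        raise ValueError("vm_count must be positive")
    return total_entries // vm_count


# VM counts mentioned in the text: 8 on average, 12 to 20 not unusual.
for vms in (8, 12, 20):
    share_512 = entries_per_vm(512, vms)  # a 512-entry TLB (4 KB pages)
    share_256 = entries_per_vm(256, vms)  # a 256-entry TLB (4 KB pages)
    print(f"{vms} VMs: {share_512} entries each of 512, {share_256} each of 256")
```

With 20 virtual machines, even a 512-entry TLB leaves only a couple of dozen entries per machine on average, which is why the size of the TLB matters so much for NPT and EPT.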

  • mvrx - Friday, April 4, 2008

    IBM is very far along developing optical interconnects for chips that are as fast or faster than today's copper interconnects... So picture this..

    Say your processor has 4 optical connectors that connect to optical switches... you buy pure cache modules that interconnect via that same optical system.. Adding another switch allows you to stack and stack and stack.. Think about an 8 core chip that can interconnect with up to 16 * 32MB L2 or L3 cache modules...

    Or you keep plugging more 8 core chips, and more 8,12,16,32MB cache modules into the clustered CPUs...

    IMO, this is one of the directions IBM is going to take the POWER line.. and hopefully even the CELL B.E.
  • mino - Monday, March 31, 2008

    Please be so kind as to test your new modifications in Opera, as the new moving ad is not possible to close (no close button) and renders across the published content in the middle of the screen.

    Also, login to post comments stopped working..

    Sorry to post here, I do not have an account on the forums.
  • Visual - Monday, March 31, 2008

    seconded, i get a nasty floating thingy that i can not close, and i am with firefox+noscript ( scripts allowed, all the others blocked)
  • chiadog - Monday, March 31, 2008

    Thanks for removing the Ad. Not only was it annoying and intrusive, it wasn't very relevant either. Why do we need to know about Shanghai the city?
  • thesix - Saturday, March 29, 2008

    Barcelona is a city name, so is Shanghai (one of the best cities in China). What is 'Shangai'?
  • Nehemoth - Sunday, March 30, 2008

    Well, as Barcelona is a Spanish city, Shangai is how you write Shanghai in Spanish.

    :- )
  • JohanAnandtech - Sunday, March 30, 2008

    Shangai was nothing more than a typo... Fixed
  • ThaHeretic - Thursday, March 27, 2008

    "Remember that currently all 32 bit OS make use of 4 KB size, but that most 64 bit OS (Linux and windows) can use 4 KB or 2MB page size."

    That's actually quite false. 32-bit OSes have been able to use hugepages (>= 2MB page size) for many years. I'm relatively sure that ability was added along with PAE to the P6 core, a la Pentium Pro back in the day. Linux itself has had hugepage support since the 2.0.x kernel series. I've used hugepages for Oracle databases for at least 5 years now.

    Now granted, you get more benefit from the larger memory footprints which 64-bit CPUs give you access to, but that doesn't mean you can't use hugepages in 32-bit environments. There's no 64-bit requirement.

    What I'm really waiting on is x86 supporting page sizes > 2MB. Itanium does up to 256MB pages I believe, and Alphas long ago supported 64MB pages. I'm deploying SunFire x4600s with 8GB of RAM per core, and hugepages are an ENORMOUS performance benefit.
  • wordsworm - Sunday, March 30, 2008

    A key point that you're missing about the differences in 32 and 64 bit computing is that despite the fact that *some* OSes have used PAE, it does so with a performance hit, potential problems with drivers, etc. In the case where software such as what Adobe pumps out, the total number of pages that can be open cannot exceed 2GB. With 64 bit computing, you can have numerous pages open each of which consume a maximum of 2GB. So, you could easily have three pages open each consuming 2GB for a total of 6GB on your system. You couldn't do the same with 32 bit without using PAE.

    Furthermore, with PAE, no single application could use more than 4GB of RAM. AWE allowed applications to exceed 4GB, but not simultaneously.

    64 bit is the way to go if you're going past 3-4GB of RAM. It's a shame that MS created a 32 bit Vista.
  • BikeDude - Sunday, April 6, 2008

    32-bit XP has PAE enabled by default by SP2.

    ...and practically, although you limit memory hungry processes to 4GB (well, half that is normally reserved by the kernel, so it is actually 2GB), your cache manager can still make use of the rest. Adobe advocates using systems with more memory than 4GB, since file I/O is improved (Photoshop enables caching for their swap-files in case the user has lots of memory).

    That said, I fully agree that 64-bit is where it is happening. We've pushed the 32-bit design to the limit and it is time for everyone to move on...
