Conclusion

The aggressive pricing puts the expensive quad socket systems with the Xeon MP and Opteron 8xx(x) under fire. Some customers will still prefer the slightly better RAS features of the latter, but let's be honest: a large part of the market will be quite happy with the more than decent RAS features of the dual socket Intel platform. The S5000PSL for example supports memory sparing and mirroring aside from the obligatory ECC RAM.

The introduction of the new Xeon quad core should still have a big impact, and it is only the beginning. In Q2 2005, we saw the introduction of the dual core Opteron, and less than two years later the number of cores on one socket has doubled again. The increase in multi-core power is outpacing the naturally growing demands of software. The introduction of AMD's new "Barcelona" quad core and Intel's "Tigerton" will make the current high-end systems (8-32 socket) retreat to an ever shrinking market niche.


The Dual core Xeon MP "Tulsa" looks pretty "fat" compared to the quad core Xeon E5345

To the financial analysts, CRM, ERP and Java server people, the new quad core Xeon E53xx is close to irresistible. You can get four cores for the price of two, or up to eight (!) cores in a relatively cheap dual socket server. We observed at least a 40% performance increase compared to probably the best dual core CPU of today: the Xeon 5160.

For the people looking for a 3D rendering workstation, your usage model will determine whether the Xeon 5160 or the Xeon E5345 is the best solution. You get better animation and 3D manipulation performance (mostly single threaded) and better rendering performance at resolutions lower than High Definition with the Xeon 5160. 3D render servers are better off with the Quad Xeon E53xx but only if they have to render at 720p or full HD (1080p) resolutions.

The past 6 months have been excellent for Intel: after regaining the performance crown in the dual socket server market, there is now also a very viable, low priced alternative to the more expensive quad Opteron based systems. However, it is not all bad news for AMD. The current quad core might be good for Intel's yields, time to market, and production costs, but it does have a weakness. The quad core Xeon's scaling is very mediocre, despite a high performance chipset. The current 5000P chipset has a large 16MB snoop filter, reads speculatively to decrease memory latency, and has a whole other bag of clever tricks to get more performance out of the platform. Despite all this and a 2x4MB L2 cache setup, the quad core Xeon scales worse than the relatively old quad Opteron platform.
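To make "mediocre scaling" concrete, here is a minimal sketch of how scaling efficiency can be computed from a measured speedup. The function name and the choice of metric are our own illustration, not part of the benchmark suite. The only input taken from this article is the "at least 40%" gain of the eight core E5345 system over the four core Xeon 5160 system; the calculation deliberately ignores the clock speed difference between the two CPUs, so treat the result as a rough indication only.

```python
def scaling_efficiency(speedup: float, core_ratio: float) -> float:
    """Return the fraction of ideal (linear) scaling that was achieved.

    speedup    -- measured throughput of the larger system divided by
                  the throughput of the smaller one
    core_ratio -- core count of the larger system divided by the core
                  count of the smaller one
    """
    if speedup <= 0 or core_ratio <= 0:
        raise ValueError("speedup and core_ratio must be positive")
    return speedup / core_ratio


if __name__ == "__main__":
    # Illustrative input: a 40% gain (1.40x) when going from 4 cores
    # (dual Xeon 5160) to 8 cores (dual Xeon E5345). Clock speed
    # differences are ignored here, which is a simplifying assumption.
    efficiency = scaling_efficiency(speedup=1.40, core_ratio=8 / 4)
    print(f"Scaling efficiency: {efficiency:.0%}")
```

Perfect linear scaling would give 100%. Doubling the cores for a 40% gain works out to 70% under these assumptions.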

Let us summarize:

AMD Quad Opteron Platform
  • Advantages:
    • Still the best performing FP platform: highest rendering performance
    • Scales better than comparable Intel platform
  • Cons:
    • Expensive 8xx(x) CPUs and expensive platform (motherboard)
    • (Slightly) lower integer performance than E5345
    • Lower performance/Watt than Xeon E5345

Intel Quad Xeon MP Platform
  • Advantages:
    • Better RAS than other platforms
    • Good integer performance thanks to huge L3 cache
  • Cons:
    • Expensive MP CPUs, especially compared to Xeon E5345, and very expensive platform (motherboard, memory boards)
    • Pretty bad FP/rendering performance
    • Very high latency memory subsystem and L3 cache (bad HPC performance)
    • Bad Performance/Watt, compared to Xeon E53xx and Opteron

Intel Dual Xeon Platform / Clovertown
  • Advantages:
    • Quad socket performance...
    • ...for a very low dual socket price in CRM, SAP, financial analysis and Java server workloads
    • Excellent rendering performance at high resolutions (>=720p)
    • In some cases, a simple upgrade for Xeon 51xx.
  • Cons:
    • Mediocre scaling in many applications
    • Slightly higher power consumption but little or no performance gain compared to Xeon 5160 in flow modeling, 3D rendering (lower resolutions), structural simulation, MySQL and TPC.

A look into the future

Quite some time ago, Pat Gelsinger of Intel showed a CPU that was called "Clovertown MP". Clovertown MP does not exist (anymore) according to all Intel representatives we talked to. So is Tigerton the new Clovertown MP? It does seem to have two dual core dies just like Clovertown and runs at the same maximum clock speed as Clovertown (2.66GHz), so it is very likely that Tigerton is very similar to or even a rebadged Clovertown MP. Another indication is the Clarksboro chipset, which has four DIBs, a gigantic 64MB snoop filter, and other features designed to tackle the scaling problems that we noticed. We are not sure that it will be enough.

It is quite possible, assuming that AMD executes well, that AMD will keep the advantage in the four socket server market with its new Barcelona core in 2007. Its current platform already scales well, and AMD has made a lot of improvements that help scaling. The upcoming Barcelona core has one L3 cache per four cores (less cache coherency traffic), faster and more HT ports, and so on. There are certainly interesting times ahead... But a bird in the hand is worth two in the bush, so until AMD's quad core Opteron actually ships, Intel has the most attractive dual socket platform.


15 Comments


  • Antinomy - Wednesday, March 7, 2007 - link

    A great review, very interesting.
    But there are a few things to mention. There is a mistake in the Cinebench results: in the overall table the uni Clovertown system got 1272 points, but in the next table (per core performance) it got 1169. The result was swapped with that of the Xeon 7130. Also, a comment about the scalability extrapolation: the scalability of the 2.33 Clovertown can hardly be compared with the 3.0 dual Woodcrest because the systems are organized differently. These motherboards have two independent FSBs, which means that the two Woodcrests get twice the peak memory bandwidth. This must have some influence on the result. Also, the 4 channel memory mode provides a 5% increase in real bandwidth versus 2 channel, so we can't say that these applications do not suffer from a lack of memory bandwidth.
    It would be interesting to see a test of a uni Woodcrest system and a test of a Woodcrest system (both uni and dual) at the same frequency as the Clovertown. Kentsfield\Conroe systems (even though they aren't server parts) would also be nice to look at because of their more efficient use of memory bandwidth and the FSB.
  • afuruhed - Thursday, December 28, 2006 - link

    We are getting more Clovertowns. There is a chart at pantor.com (http://www.pantor.com/software.html) that indicates that some applications benefit a lot. The FIX protocol (http://en.wikipedia.org/wiki/FIX_protocol) is a technical specification for electronic communication of trade-related messages (financial markets).
  • henriks - Thursday, December 28, 2006 - link

    Agree with other responses - good article!

    Some comments on the jbb results page:

    You state that JRockit is (only) available for x86-64 and Itanium. x86 and Sparc should be added to this list.

    The JRockit configuration you're using enables a single-spaced GC. In that configuration, performance is tied to heap size (larger heap means fewer GC events). Increasing the heap size to 3 GB - as for the Sun benchmark results - would increase performance slightly but in particular give much better scalability when you increase the number of warehouses to large numbers.

    It looks like you have not enabled large pages in the OS. Doing this would give a large performance boost and help scalability regardless of chip or JVM vendor.

    Astute readers may note that your results are lower than the published results on www.spec.org. Apart from OS and possibly BIOS tuning, the reason is that the most recent results use a newer JRockit version (not yet available for public download). This new version improves performance on this benchmark by 20-30% on x86 chips - Intel *and* AMD - with the largest positive effect on high-bin chips from the respective vendors. The effect on other Java applications varies from zero to a lot.

    Cheers!

    Henrik, JRockit team
  • dropadrop - Wednesday, December 27, 2006 - link

    Considering how much we just paid for some DL585's compared to DL380's, I think the performance is pretty impressive. There is still something the DL380's (and most other two socket servers) can't do, and that is hosting 64GB or more RAM.

    I mainly take care of VMware servers, and there the amount of memory becomes a bottleneck long before the processors, at least in most setups. I don't think I'd have a lot of use for eight cores unless I got a minimum of 32GB of RAM, probably 64.
  • rowcroft - Thursday, December 28, 2006 - link

    I've run into the same challenge when planning for the quads. My take is that I'm getting dual quads for half the price of quad dual cores. With ESX 3's HA functionality I can group the host servers and get the 32GB of ram with double the cores and have host based redundancy for critical vm's.
  • mino - Thursday, December 28, 2006 - link

    there is another thing DL380 lacks: no drop-in analog to Barcelona on the horizon...
  • Justin Case - Wednesday, December 27, 2006 - link

    Finally a good article at AT, written by someone who knows what he's talking about. Meaningful benchmarks, meaningful comments, and conclusions that make sense. If only some Johanness could rub off on other AT writers...
  • hans007 - Wednesday, December 27, 2006 - link

    I think an alternative to, say, a dual dual core AMD, even as a server or workstation, is a quad core socket 775 CPU. I know the lower 3xxx series Xeons are made for this (and are exactly the same as Core 2 Duo), so you could do a comparison of 2 AMD dual cores vs a single 775 quad with ECC DDR2 etc.
  • mino - Thursday, December 28, 2006 - link

    Check QuadFX vs. Kentsfield reviews.

    With ECC both results will be a bit lower but the comparison remains.

    A small hint: NO ONE tested QuadFX as a DB server against Kentsfield....

    Guess what: Quad FX is cheaper and would rule the roost on server-like tasks.
  • ltcommanderdata - Wednesday, December 27, 2006 - link

    Well it's nice to finally see a review of the E5345, although I was hoping for more detailed power consumption numbers. The performance benchmarks were very detailed though, which was great.

    Thought I would point out a few errors I noticed as I was flipping through. First, on page 2, in the Cache2Cache Latency chart the 201 for the Xeon DP 5060 that is placed in the "Same die, same package" row should be in the "Different die, same package" row. Dempsey uses a dual die approach like Presler and Clovertown, as opposed to a single die approach like Smithfield and Paxville DP. And on the last page, in the conclusion, you mentioned Clarksboro having "four DIBs", which implies 8 FSBs. I believe that should read two DIBs, or really a Quad Independent Bus (QIB), since I'm pretty sure it only has 4 FSBs. (On a side note, Intel slides showed those 4 FSBs clocked at 1066MHz, which is really disappointing. Hopefully, now that Clovertown turns out to come in 1333MHz versions instead of only the 1066MHz versions that were first announced, Tigerton (and therefore Clarksboro), which is based on Clovertown, will also have 1333MHz versions.)
