HPC and Encryption Benchmarks

Just a few days prior to today's launch, we were able to get access to the benchmark numbers that Intel and AMD produced for LS-DYNA (crash simulation) and for Fluent (fluid dynamics) from Ansys. The first benchmark is the Ansys Fluent Truck_14m benchmark.

Ansys Fluent Truck_14m

The next one is LS-DYNA “Neon refined revised”.

LS-DYNA Neon refined revised

In both cases, the combination of four memory channels and 12 cores per CPU seems to pay off: AMD beats Intel again in the HPC benchmarks, although the advantage is small.

Next we ran Sisoft Sandra 2010's encryption benchmark. Do remember that this is a completely synthetic benchmark. A 100% encryption performance advantage might translate into only a very small performance advantage in a real-world application. For example, the code run on a website might include only a small portion of encryption code.
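To put rough numbers on how a large synthetic gain shrinks in a full application, here is a minimal sketch based on Amdahl's law. The function name `overall_speedup` and the encryption shares used below (5%, 10% and 20% of total run time) are illustrative assumptions, not measurements.

```python
def overall_speedup(fraction_accelerated: float, local_speedup: float) -> float:
    """Amdahl's law: whole-application speedup when only part of the
    run time benefits from a faster code path.

    fraction_accelerated -- share of total run time spent in the
                            accelerated code (0.0 to 1.0), an assumed value
    local_speedup        -- how many times faster that code becomes
    """
    if not 0.0 <= fraction_accelerated <= 1.0:
        raise ValueError("fraction_accelerated must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    unaffected = 1.0 - fraction_accelerated
    accelerated = fraction_accelerated / local_speedup
    return 1.0 / (unaffected + accelerated)


if __name__ == "__main__":
    # Hypothetical shares of run time spent on encryption.
    for fraction in (0.05, 0.10, 0.20):
        # A 100% advantage means the encryption code runs 2x as fast.
        gain = overall_speedup(fraction, 2.0)
        print(f"{fraction:.0%} encryption, 2x faster: {gain:.3f}x overall")
```

Under these assumptions, an application that spends 10% of its time on encryption gains only about 5% overall from a 100% faster encryption path.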

Sisoft Sandra Encryption benchmark: AES

Sisoft Sandra Encryption benchmark: SHA

Once the Xeon X5670's AES instructions can do their work, encryption is lightning fast. Here the new Xeon is 19 times faster than its older brother and 9 times faster than the best Opteron. Because encryption can easily be broken up into smaller parts, it scales extremely well. The result is that the CPUs with the most threads, the Xeon X5670 and Opteron 6174, easily outperform their older brothers in cryptographic hash functions.


58 Comments

  • zarjad - Friday, April 02, 2010 - link

    I understand that HT can be disabled in BIOS and that some benchmarks don't like HT.
  • elnexus - Wednesday, April 21, 2010 - link

    I can report that one of my customers, performing intensive image processing, found that DISABLING hyper-threading on a Nehalem-based workstation actually IMPROVED performance considerably.

    It seems that certain applications don't like hyper-threading, while others do. I always recommend that my customers perform sensitivity analyses on their computing tasks with HT on and off, and then use whichever is best.
  • tracerburnout - Wednesday, March 31, 2010 - link

    How is it possible that Intel's Xeon X5670 rig returns 19k+ for a score while AMD's magny-cours returns only 2k+?? I only question the results of this benchmark chart because Intel's Xeon X5570 rig returns only around 1k. How can a X5670 be 19x faster than a X5570?? And I doubt the same is true for the magny-cours by being just 10.5% of what the X5670 can do.

    (is there an extra '0' by accident in there?)



    tracerburnout
    proud supporter of AMD, with a few Intel rigs for Linux only
  • JohanAnandtech - Thursday, April 01, 2010 - link

    No, it is just that Sisoft uses the new AES instructions of Westmere. It is a forward-looking benchmark which tests only a small part of a larger website code base. So that 19x faster will probably result in 10 to 20% of the complete website being 19x faster. So the real performance impact will be a lot smaller. It is interesting though to see how much faster these dedicated SIMD instructions are on these kinds of workloads.
  • alpha754293 - Thursday, April 01, 2010 - link

    If you guys need help with setting up or running the Fluent/LS-DYNA benchmarks let me know.

    I see that you don't really spend as much time writing or tweaking it as you do with some of the other programs, and that to me is a little concerning only because I don't think that it is showing the true potential of these processors if you run it straight out-of-the-box (especially with Fluent).

    Fluent tends to have a LOT of iterations, but it also tends to short-stroke the CPU (i.e. the time required to complete all of the calculations necessary is less than 1 second and therefore doesn't make full use of the computational ability).

    Also, the parallelization method (MPICH2 vs. HP MPI) makes a difference in the results.

    You want to make sure that the CPUs are fully loaded for a period of time such that at each iteration, there should be a noticeable dwell time AT 100% CPU load. Otherwise, it won't really demonstrate the computational ability.

    With LS-DYNA, it also makes a difference whether it's SMP parallelization or MPP parallelization as well.
  • k_sarnath - Friday, April 02, 2010 - link

    The most baffling part is how Linux could engage 12-CPUs much better than Windows. I am obviously curious about the OS platform for other tests.. Similarly MS SQL was able to scale well on multi-cores... In this context, I am not sure how we can look at the performance numbers... A badly scaling app or OS could show the 12-core one in bad light.
  • OneEng - Saturday, April 03, 2010 - link

    Hi Johan,

    I have followed your articles from the early days at Ace's and have a good respect for the technical accuracy of your articles.

    It appears that the X5570 scaling between 4 and 8 cores has very little gain in the Oracle Calling Circle benchmark. Furthermore, the 24 cores of MC at 2.2Ghz are way behind. Westmere appears to do quite well, but really should not be able to best 8 cores in the X5570 with all else being equal.

    I have heard some state that the benchmark is thread bound to a low number of threads (don't know if I am buying this), but surely something fishy is going on here.

    It appears that there is either a real world application limit to core scaling on certain types of Oracle database applications (if there are, could you please explain what features an app has when these limits appear), or that the benchmark is flawed in some way.

    I have a good amount of experience in Oracle applications and have usually found that more cores and more memory make Oracle happy. My experience seems at odds with your latest benchmarks.

    Any feedback would be appreciated .... Thanks!
  • JohanAnandtech - Tuesday, April 06, 2010 - link

    I am starting to suspect the same. I am going to dissect the benchmark soon to see what is up. It is not disk related, or at least that is surely not our biggest problem. Our benchmark might not be far from the truth though; I think Oracle really likes the big L3-cache of the Westmere CPU.

    If you have other ideas, mail at johanATthiswebsiteP
  • heliosblitz2 - Wednesday, April 07, 2010 - link

    You wrote
    Test-Setup:
    Xeon Server 1: ASUS RS700-E6/RS4 barebone
    Dual Intel Xeon "Gainestown" X5570 2.93GHz, Dual Intel Xeon “Westmere” X5670 2.93 GHz
    6x4GB (24GB) ECC Registered DDR3-1333

    "Also notice that the new Xeon 5600 handles DDR3-1333 a lot more efficiently. We measured 15% higher bandwidth from exactly the same DDR3-1333 DIMMs compared to the older Xeon 5570."

    That is not exactly the reason, I think.
    The reason is that you populated the second memory bank in both setups.
    Intel specification:
    Westmere 1333 MHz CPUs run at 1333 MHz with the second bank populated, while
    Nehalem 1333 MHz CPUs run at 1066 MHz with the second bank populated.

    That could be updated.

    Compare tech docs on Intel site: datasheet Xeon 5500 Part 2 and datasheet Xeon 5600 Part 2

    Arnold.
  • gonerogue - Saturday, April 10, 2010 - link

    The Viper is a V10 and most certainly not a traditional muscle car ;)
