STARS Euler3D CFD

The STARS Euler3D CFD benchmark became popular thanks to Scott of Techreport.com. It is a computational fluid dynamics (CFD) benchmark based on the STARS Euler3D structural analysis routines developed at CASELab, the Computational AeroServoElasticity Laboratory at Oklahoma State University. Since Scott has used the benchmark for years, we felt it was a good place to start our HPC benchmarking adventure: we could check whether our results were in the right ballpark.

The benchmark is downloadable and described in great detail here. The benchmark score is reported as a CFD cycle frequency in Hertz, with higher results being better.
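
Because the score is a frequency, a faster machine simply completes more solver cycles per second. Here is a minimal sketch of that conversion. It assumes the score is just completed cycles divided by elapsed wall-clock time, and `run_one_cycle` is a hypothetical stand-in, not Euler3D's actual solver:

```python
import time


def cycle_frequency_hz(cycles_completed, elapsed_seconds):
    """Convert completed solver cycles and wall-clock time into a score in Hz."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return cycles_completed / elapsed_seconds


def time_cycles(run_one_cycle, n_cycles):
    """Time n_cycles calls of a hypothetical solver-cycle function; return Hz."""
    start = time.perf_counter()
    for _ in range(n_cycles):
        run_one_cycle()
    elapsed = time.perf_counter() - start
    return cycle_frequency_hz(n_cycles, elapsed)


# A CPU-bound stand-in for one CFD cycle (purely illustrative).
print(f"{time_cycles(lambda: sum(range(100_000)), 20):.1f} Hz")
```

For example, 100 cycles in 20 seconds would score 5 Hz. Doubling throughput doubles the score, which is why higher is better.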

STARS Euler3D CFD: maximum score

The Xeon E7 scales quite nicely, provided that Hyper-Threading is disabled. The benchmark can take advantage of Hyper-Threading, as the dual Xeon system shows. However, all threads work on the same data grid, so the more threads there are, the more lock contention rears its ugly head. Here's a more detailed look at how performance scales with the number of threads:

With Hyper-Threading enabled, the dual Xeon X5670 performs worse than the non-HT setup until we run more than 12 threads, which is the number of physical cores in that system. Beyond that point HT offers a decent performance boost (17%). The benchmark, however, does not scale well enough to take advantage of the 80 threads of the HT-enabled quad E7-4870. Hyper-Threading improves resource utilization, but that does not outweigh the overhead of running 80 threads. Once we pass 40 threads on the E7-4870, performance levels off and even drops.
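
To make the contention argument concrete, here is a toy Python sketch. It is not Euler3D's actual code. Several threads update one shared grid, and every sweep takes the same global lock even though each thread touches only its own cells, so the sweeps are serialized. A failed non-blocking `acquire` is counted as a contended acquisition. The grid layout, the sweep count, and the `update_shared_grid` name are all hypothetical. Because of CPython's global interpreter lock, the sketch shows the structure of the problem rather than realistic HPC timings.

```python
import threading


def update_shared_grid(grid, n_threads, sweeps=200):
    """Hypothetical toy: all threads update one grid under a single global lock."""
    lock = threading.Lock()
    stats = {"acquires": 0, "contended": 0}

    def worker(tid):
        for _ in range(sweeps):
            # Try to take the lock without waiting; failure means another
            # thread currently holds it, i.e. this acquisition is contended.
            waited = not lock.acquire(blocking=False)
            if waited:
                lock.acquire()  # block until the lock becomes free
            try:
                # Counters are only touched while holding the lock.
                stats["acquires"] += 1
                if waited:
                    stats["contended"] += 1
                # Each thread updates its own strided subset of cells,
                # yet the single lock still serializes every sweep.
                for i in range(tid, len(grid), n_threads):
                    grid[i] += 1.0
            finally:
                lock.release()

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats


grid = [0.0] * 1024
print(update_shared_grid(grid, n_threads=8))
```

Adding threads tends to raise the contended count, since every extra thread is one more competitor for the same lock. Finer-grained locking or partitioning the grid are the usual ways around that.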

Of course, you are probably more interested in the other server results. What happened to the Opteron scores? Why is the 48-core Opteron system five times slower than the 40-core Xeon E7? Let's investigate further.
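
Normalizing per core makes the gap look even larger. The following is a naive sketch. It assumes throughput splits evenly across cores and ignores clock speeds and Hyper-Threading, and the function name is hypothetical:

```python
def per_core_speed_ratio(overall_ratio, cores_faster, cores_slower):
    """Per-core speed of the faster system relative to the slower one,
    assuming each system's throughput is spread evenly over its cores."""
    return overall_ratio * cores_slower / cores_faster


# Xeon E7: 40 cores, five times faster overall; Opteron: 48 cores.
print(per_core_speed_ratio(5.0, cores_faster=40, cores_slower=48))
```

Under that assumption each Xeon core does the work of six Opteron cores (5 × 48 / 40), far more than the architectural differences would explain. That is what makes the result worth investigating.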

52 Comments

  • proteus7 - Tuesday, October 11, 2011

    STREAM triad on a 4S Xeon E7 should hit about 65GB/s, unless your memory or UEFI/BIOS options are misconfigured. Firmware settings can make a HUGE difference on these systems.
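
For reference, STREAM's triad kernel computes a[i] = b[i] + q*c[i]. STREAM credits it with 24 bytes of memory traffic per element for 8-byte doubles: two reads and one write, with write-allocate traffic not counted. The following pure-Python sketch only illustrates that accounting. It is far too slow to measure real memory bandwidth, and the function names are hypothetical:

```python
def triad(b, c, q):
    """STREAM triad kernel: a[i] = b[i] + q * c[i]."""
    return [bi + q * ci for bi, ci in zip(b, c)]


def triad_bandwidth_gb_s(n_elements, seconds, bytes_per_element=8):
    """Bandwidth as STREAM counts it for triad: read b and c, write a."""
    bytes_moved = 3 * bytes_per_element * n_elements
    return bytes_moved / seconds / 1e9


print(triad([1.0, 2.0], [3.0, 4.0], 2.0))
print(triad_bandwidth_gb_s(10**8, 0.05))
```

At the 65GB/s quoted above, a triad over 10^8 doubles (2.4 GB of counted traffic) would take roughly 37 ms.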

    Did you:
    Enable Hemisphere mode?
    Disable HT?
    If running Windows, I assume it was Server 2008 R2 SP1?
    If running Windows, realize that only certain applications, compiled with specific flags, can use more than 64 cores (kgroup0)? Not an issue if HT was off.
    Enable prefetch modes in firmware?
    Ensure system firmware was set to max performance, not power-saving modes?
    If running Windows, set power options to the max performance profile? (The default power profile on servers drops performance substantially for short-burst benchmarks.)
    TPC-E is also a great benchmark to run (it needs some SSD storage/Fusion-io); HPCC/Linpack are good for HPC testing.
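
The arithmetic behind the 64-core point: Windows Server 2008 R2 organizes logical processors into processor groups of at most 64, and by default a process is confined to a single group unless the application is group-aware. Here is a minimal sketch, with a hypothetical function name, that computes the minimum number of groups for a 4S Xeon E7-4870 (10 cores per socket). Windows may divide processors differently along NUMA-node boundaries, so treat the result as a lower bound:

```python
import math

MAX_LOGICAL_PER_GROUP = 64  # Windows processor groups hold at most 64 logical CPUs


def min_processor_groups(sockets, cores_per_socket, threads_per_core):
    """Return (logical processor count, minimum number of processor groups)."""
    logical = sockets * cores_per_socket * threads_per_core
    return logical, math.ceil(logical / MAX_LOGICAL_PER_GROUP)


print(min_processor_groups(4, 10, 2))  # HT on
print(min_processor_groups(4, 10, 1))  # HT off
```

With HT on, the 80 logical processors need at least two groups, so an application that is not group-aware sees only part of the machine. With HT off, all 40 cores fit in one group.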
  • pventi - Monday, October 31, 2011

    As you can read in the icc manual, when running on non-Intel processors the non-temporal prefetches are not included in the final machine code. This alone means the Opteron could be up to 27% faster.

    Another reason it's slower is that the "standard" hardware configuration of the Opteron throttles the DRAM prefetchers when under load.
    Under Linux this behaviour can be changed from the shell, which should add another 5-10% in performance.

    So this benchmark should show a roughly 30% higher score for the Opteron.

    www.metarstation.com

    Best Regards
    Pierdamiano