OpenFOAM

Computational fluid dynamics (CFD) is a very important part of the HPC world. Several readers told us that we should look into OpenFOAM, and my lab was able to work with the professionals at Actiflow. Actiflow specializes in combining aerodynamics and product design; calculating aerodynamics involves CFD software, and Actiflow uses OpenFOAM to accomplish this. To give you an idea of what these skilled engineers can do: they worked with Ferrari to improve the underbody airflow of the Ferrari 599 and increase its downforce.

We were allowed to use one of their test cases as a benchmark, but we are not allowed to discuss the specific solver. All tests were done with OpenFOAM 2.2.1 and Open MPI 1.6.3.

Many CFD calculations do not scale well on clusters unless you use a low-latency interconnect such as InfiniBand. InfiniBand switches are quite expensive, and even then there are limits to scaling. Unfortunately, we do not have an InfiniBand switch in the lab. Although its latency is higher than InfiniBand's, we do have a good 10G Ethernet infrastructure, which performs rather well. So we can compare our newest Xeon server with a basic cluster.
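To make the latency point concrete, below is a minimal MPI ping-pong sketch in C++, our own illustration rather than part of the Actiflow case or of OpenFOAM itself. Run with one rank on each of two nodes, it measures one-way message latency; the gap you would see between 10G Ethernet and InfiniBand is exactly the overhead a tightly coupled CFD solver pays on every iteration. The file name and hostnames are hypothetical.

```cpp
// pingpong.cpp - minimal MPI latency probe (illustrative sketch only).
// Build:  mpicxx -O2 pingpong.cpp -o pingpong
// Run:    mpirun -np 2 -host node1,node2 ./pingpong   (hostnames hypothetical)
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 10000;
    char byte = 0;
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; ++i) {
        if (rank == 0) {        // rank 0 sends first, then waits for the echo
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) { // rank 1 echoes every message straight back
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();
    if (rank == 0) // each iteration is a round trip, so divide by 2 * iters
        printf("one-way latency: %.2f us\n", (t1 - t0) / (2.0 * iters) * 1e6);
    MPI_Finalize();
    return 0;
}
```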

We also found AVX code inside OpenFOAM 2.2.1, so we assume this is one of the cases where AVX improves floating-point performance. To understand this real-world test case better, we'll start with a single-threaded benchmark.
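For readers who have not seen vector code before, here is the kind of arithmetic AVX accelerates: four double-precision operations per instruction. This is a generic y += a*x loop written with AVX intrinsics as a sketch; it is emphatically not OpenFOAM's actual source, just an illustration of why AVX can speed up FP-heavy solvers.

```cpp
// axpy_avx.cpp - generic illustration of AVX double-precision FP code;
// NOT OpenFOAM source.  Build (GCC/Clang): g++ -O2 -mavx axpy_avx.cpp
#include <immintrin.h>
#include <cstdio>

// y[i] += a * x[i], processing 4 doubles per AVX iteration.
static void axpy_avx(double a, const double* x, double* y, int n) {
    __m256d va = _mm256_set1_pd(a);   // broadcast a into all 4 lanes
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m256d vx = _mm256_loadu_pd(x + i);
        __m256d vy = _mm256_loadu_pd(y + i);
        // Plain AVX (pre-Haswell) has no FMA, so use mul + add.
        vy = _mm256_add_pd(vy, _mm256_mul_pd(va, vx));
        _mm256_storeu_pd(y + i, vy);
    }
    for (; i < n; ++i) y[i] += a * x[i];  // scalar tail for leftover elements
}

int main() {
    double x[8] = {1, 2, 3, 4, 5, 6, 7, 8}, y[8] = {0};
    axpy_avx(0.5, x, y, 8);
    printf("%g %g\n", y[0], y[7]);  // prints: 0.5 4
    return 0;
}
```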

Actiflow OpenFOAM – One Thread

As this is AVX code, the usual clock speed "rules" change. A 2.3GHz Xeon E5 v3 can fall back to 1.9GHz when running AVX code, but it may also boost to 3.3GHz if the thermals allow it. The Xeon E5-2695 v3 has less TDP headroom and as a result performs slightly slower than the Xeon E5-2699 v3. Still, neither can beat the Xeon E5-2667 v3 in single-threaded HPC performance: the latter is the better chip for this workload, as it guarantees 2.7GHz and can boost up to 3.5GHz. As the previous-generation Xeons also support AVX and run between 2.7 and 3.3GHz, they keep up with the Xeon E5-2667 v3.

Of course, most HPC code is multi-threaded these days. We next ran OpenFOAM with one thread per physical core, which is about 5% faster than running with one thread per logical core (likely because AVX-heavy code saturates the FP units, leaving little for a second thread on the same core to use).
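If you want to reproduce that pinning scheme, the first step is knowing how many physical cores (as opposed to logical, Hyper-Threaded CPUs) a node exposes. Here is a small sketch using the hwloc library showing one portable way to ask; hwloc is our assumption for illustration, not something the OpenFOAM run scripts require. With Open MPI you would then launch that many ranks and bind them, for example with the 1.6-series --bind-to-core option.

```cpp
// count_cores.cpp - sketch: physical cores vs. logical CPUs via hwloc,
// so a launcher can start one worker per physical core (hwloc assumed
// installed; not part of our benchmark harness).
// Build: g++ count_cores.cpp -lhwloc
#include <hwloc.h>
#include <cstdio>

int main() {
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);   // discover the machine's topology
    int cores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE); // physical
    int pus   = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU);   // logical
    printf("physical cores: %d, logical CPUs: %d\n", cores, pus);
    printf("suggested worker count: %d\n", cores); // one per physical core
    hwloc_topology_destroy(topo);
    return 0;
}
```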

Actiflow OpenFOAM

If you work professionally with OpenFOAM, it clearly pays off to understand what a given CPU offers. If money does not matter much, the Xeon E5-2699 v3 does what it has to do, which is to beat everybody else, despite the fact that OpenFOAM does not scale that well beyond a certain point.

To give you an idea of what we're seeing, with 16 threads on the Xeon E5-2699 v3 we were already running at 30 runs per hour. Despite the fact that our workload is already a pretty heavy one (>600K cells), it is clear you need a larger mesh to really push the best Xeons of today to their full potential: split across dozens of threads, 600K cells leaves each core only a small subdomain, so communication and synchronization overhead quickly catch up with the compute.

A less expensive option is the Xeon E5-2667 v3, but the real winner here is the Xeon E5-2650L v3, which costs a full $1000 per CPU less than the Xeon E5-2695 v3 and consumes quite a bit less power, as we will see on the next page.

Comments

  • bsd228 - Friday, September 12, 2014 - link

    Now go price memory for M-class Sun servers... even small upgrades are five figures, and going four years back, a mid-sized M4000-type server was going to cost you around $100K with moderate amounts of memory.

    And take up a large portion of the rack. Whereas you can stick two of these 18-core guys in a 1U server and have 10 of them (180 cores) for around the same sort of money.

    Big iron still has its place, but the economics will always be lousy.
  • platinumjsi - Tuesday, September 9, 2014 - link

    ASRock is selling boards with DDR3 support; any idea how that works?

    http://www.asrockrack.com/general/productdetail.as...
  • TiGr1982 - Tuesday, September 9, 2014 - link

    Well... ASRock is generally famous for "marrying" different generations of hardware.
    But here, since this is about DDR RAM, which is governed by the CPU itself (the memory controller is inside the CPU), my only guess is that the Xeon E5 v3 may have a dual-mode memory controller (supporting either DDR4 or DDR3), much like the Phenom II back in 2009-2011, which supported either DDR2 or DDR3 depending on the board you plugged it into.

    If so, then the performance of the E5 v3 with DDR3 will probably be somewhat inferior to DDR4.
  • alpha754293 - Tuesday, September 9, 2014 - link

    No LS-DYNA runs? And yes, for HPC applications you actually CAN have too many cores: you can't keep all the cores pegged with work, so you end up with a lot of data migration between cores, which is bad, since moving data around means you're not doing any useful work ON the data.

    And how you decompose the domain (for both LS-DYNA and CFD) makes a HUGE difference in total runtime performance.
  • JohanAnandtech - Tuesday, September 9, 2014 - link

    No, I hope to get that one done in the more Windows/ESXi-oriented review.
  • Klimax - Tuesday, September 9, 2014 - link

    Nice review. Next stop: Windows Server. (And MS-SQL..)
  • JohanAnandtech - Tuesday, September 9, 2014 - link

    Agreed. PCIe flash and SQL Server look like a nice combination to test these new Xeons with.
  • TiGr1982 - Tuesday, September 9, 2014 - link

    Xeon 5500 series (Nehalem-EP): up to 4 cores (45 nm)
    Xeon 5600 series (Westmere-EP): up to 6 cores (32 nm)
    Xeon E5 v1 (Sandy Bridge-EP): up to 8 cores (32 nm)
    Xeon E5 v2 (Ivy Bridge-EP): up to 12 cores (22 nm)
    Xeon E5 v3 (Haswell-EP): up to 18 cores (22 nm)

    So, in this progression, the core count increases by 50% (1.5x) almost every generation.

    So, what's gonna be next:

    Xeon E5 v4 (Broadwell-EP): up to 27 cores (14 nm) ?

    Maybe four rows with 5 cores and one row with 7 cores (4 x 5 + 7 = 27) ?
  • wallysb01 - Wednesday, September 10, 2014 - link

    My money is on 24 cores.
  • SuperVeloce - Tuesday, September 9, 2014 - link

    What's the story with the 2637 v3? Only 4 cores, and the same frequency and $1K price as the 6-core 2637 v2? By far the most pointless CPU on the list.
