HPC: Fluid Dynamics with OpenFOAM

Computational Fluid Dynamics is a very important part of the HPC world. Several readers told us that we should look into OpenFOAM, and my lab was able to work with the professionals of Actiflow. Actiflow specializes in combining aerodynamics and product design. Calculating aerodynamics involves the use of CFD software, and Actiflow uses OpenFOAM  to accomplish this. To give you an idea what these skilled engineers can do, they worked with Ferrari to improve the underbody airflow of the Ferrari 599 and increase its downforce.

We were allowed to use one of their test cases as a benchmark, however we are not allowed to discuss the specific solver. All tests were done on OpenFOAM 2.2.1 and openmpi-1.6.3. The reason why we still run with OpenFOAM 2.2.1 is that our current test case does not work well with higher versions.

We also found AVX code inside OpenFoam 2.2.1, so we assume that this is one of the cases where AVX improves FP performance. 

OpenFOAM

As this is AVX code, the clock speed of our Xeon processors can be lower than Intel's official specifications, and turbo boost speeds are also lower. Despite the fact that on Broadwell the only cores that reduce their clock when running AVX code are the AVX-active cores themseves (the others can continue at higher speeds), OpenFOAM does not run appreciably faster on the top of the line Xeon E5 v4 than it did on the E5 v3.

It is not as if OpenFOAM does not scale: 22% more cores delivers 13% higher performance (E5-2699v4 vs E5-2695v4). No, our first impression is that the new Xeon v4 needs to lower the clockspeed more than the old one. The official specifications tell us that both the Xeon E5-2699 v4 and v3 should run AVX code at up to 2.6 GHz with all cores enabled. The reality is however that Broadwell runs at a lower clock on average. 

Spark Benchmarking NAMD
Comments Locked

112 Comments

View All Comments

  • SkipPerk - Friday, April 8, 2016 - link

    "Anyone putting Microsoft on bare hardware these days is nuts"

    This brother is speakin the truth!
  • warreo - Thursday, March 31, 2016 - link

    Can someone clarify this line for me?

    "The average performance increase versus the Xeon E5-2690 is 3%, and the Broadwell cores get a boost of no less than 19%."

    Does that mean IPC increase is 19% for Broadwell, offset by ~16% decline in clockspeed to get to 3% average performance increase? But that doesn't make sense to me as a 3.8ghz (E5-2690) to 3.6ghz (E5-2699 v4) is only 5% decline in max clockspeed?
  • ShieTar - Thursday, March 31, 2016 - link

    I understood it as "the -Ofast setting boosts Broadwell by 19%", so with the -O2 setting it was actually 16% slower than the 2690.

    And I think the AT-Theory based on the original measurements is that the 3.6GHz boost are not even held for a significant amount of time, so that Broadwell in reality comes with an even worse decline in clock speed.
  • warreo - Thursday, March 31, 2016 - link

    Your interpretation makes much more sense than mine, but still doesn't quite add up. The improvement from using -Ofast vs. -O2 is 13% on average, and the lowest improvement is 4% on the xalancbmk, well below the "no less than 19%" quoted by Johan.

    Perhaps the rest of the disparity is normalizing for sustained clock speeds as you suspect? Johan is that correct?
  • Ryan Smith - Thursday, March 31, 2016 - link

    I've reworded that passage to make it clearer. But ShieTar's interpretation was basically correct.

    "Switching from -O2 to -Ofast improves Broadwell-EP's absolute performance by over 19%. Meanwhile the relative performance advantage versus the Xeon E5-2690 averages 3%. "
  • JohanAnandtech - Thursday, March 31, 2016 - link

    That means that the -ofast has much more effect on the Broadwell. I mean by that that -ofast is 19% faster than -o2 on Broadwell, while it is 3% faster on Sandy Bridge. I assume that the older the architecture, the better the compiler is able to optimize it without special tricks.
  • warreo - Friday, April 1, 2016 - link

    Thanks for the clarification. Loved the review, great work Johan!
  • Pinn - Thursday, March 31, 2016 - link

    I'm still happy I went with the 6 core x99 over the 8 core. Massive core count is nice to see available, but I don't see the true value. Looks like you have to do the same rough math to see if the clock speed reduction is worth the core count.
  • Oxford Guy - Tuesday, April 5, 2016 - link

    Why would there be "true value" for six and not for eight?
  • Pinn - Wednesday, April 6, 2016 - link

    Single threaded workloads.

Log in

Don't have an account? Sign up now