Testing the Opteron HPC Remedy

The results of memory node interleaving are pretty spectacular, at least in terms of improving Opteron performance.

Once we disable NUMA (that is, enable memory node interleaving), our Opteron server scales properly. Performance triples when we run the benchmark with 48 threads. So memory interleaving does the trick, but since it also increases the traffic between the CPU nodes, we decided to test with HT assist (a 1MB snoop filter) on and off.

Stars Euler 3D CFD: maximum score revisited

Notice how much this benchmark relies on the CPU interconnects: when we disable HT assist but leave interleaving on, we lose more than 25% performance. HT assist avoids many unnecessary broadcasts on the HT interconnects. We also tested the Xeon E7 with memory node interleaving (4-way), but this did not improve or decrease performance in any substantial way.

There's even more good news for the Opteron: the score on Cinebench R11.5 rendering improved from 25 (NUMA) to 26.3 (memory node interleaving). It's hardly spectacular, but that's still a nice, free 5% performance boost, assuming you're running workloads that will benefit.
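
For readers who want to check the arithmetic behind that figure, here is a minimal sketch in Python. The helper name `relative_gain` is ours and purely illustrative; the two scores are the Cinebench R11.5 results quoted above.

```python
def relative_gain(before: float, after: float) -> float:
    """Return the improvement of `after` over `before` as a percentage."""
    return (after - before) / before * 100.0


# Cinebench R11.5 scores quoted in the text
numa_score = 25.0          # NUMA
interleaved_score = 26.3   # memory node interleaving

gain = relative_gain(numa_score, interleaved_score)
print(f"Node interleaving gain: {gain:.1f}%")
```

The result works out to roughly 5%, in line with the boost mentioned above.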


52 Comments


  • MrSpadge - Friday, September 30, 2011 - link

    Agreed - performance of a single i7 2600 can be hard to beat, depending on the application. My Matlab code uses all physical cores through the Intel Math Kernel Library, yet is ~30% slower on 2 x X5570 (which is about the difference in clock speed, incidentally).

    MrS
  • JohanAnandtech - Friday, September 30, 2011 - link

    http://www.anandtech.com/show/4486/server-renderin...

    The Core i7 970 3.2 GHz is included. But indeed, it has been some time since we have used Backburner.

    Is this the kind of bench you are looking for?
    http://www.anandtech.com/show/2240/7

    Backburner scales extremely well, so I suspect that especially the Quad MC Dell is a very good choice compared to a workstation.
  • JoeKan - Friday, September 30, 2011 - link

    Yes - the backburner test is it. Although I use different rendering software, that test would be appropriate as the visualization rendering can properly represent real life usage and can stress the hardware at the same time.

    The test linked uses frames 20-29. I'd like to see a longer frame sequence.

    The reason I asked that a workstation be used as a base reference is that it gives us, the readers, a point of reference to compare against. I define a workstation as a single CPU box anyone can build with off-the-shelf components, like an i7-2600K or an i7-970 - a performance CPU in the $300+ to $600 range. That allows one to compare performance on a per $ basis.

    Not a true 'workstation' as it does not use a Xeon, but it gives the ability to compare 'performance' to 'performance per buck' basis.

    By using a $1000+ class CPU for comparison the 'bang for the buck' comparison is distorted.
  • xxtypersxx - Friday, September 30, 2011 - link

    I love reading about the high end server hardware, it's like F1 compared to road cars.

    As for benchmarks, may I suggest the linux x64 Folding at Home client? We know it scales past at least 128 cores without issue and as many of us that fold are running server hardware anyway, it will attract a new audience to the reviews.
  • rehm - Friday, September 30, 2011 - link

    Hello,
    for CFD benchmarking you could also consider the code OpenFOAM. It scales very well and is gaining a lot of interest in industry and academia. Memory behaviour should be comparable to Fluent and it can be compiled with gcc and icc.

    Regards
  • JohanAnandtech - Friday, September 30, 2011 - link

    Very nice suggestion... but is there a sample solution/ benchmark we can measure? It is a bit hard for a hardware reviewer to come up with very specialized realworld tests :-).
  • ozztheforester - Friday, September 30, 2011 - link

    I am currently using a bunch of 2600Ks for rendering. In the past I used some dual Xeon setups, but found them extremely inefficient in terms of cost/performance ratio. Can you please let us know the cost and power consumption of this system?

    currently getting around 8.72 points on cinebench 11.5 on a 2600k pc @4.5ghz which is consuming less than 200 watts at full load and costing a bit less than 800usd

    also I would suggest using vray for multi thread benchmarks
  • sicofante - Friday, September 30, 2011 - link

    Why didn't you set up a scene in Maya or Softimage and then render it with Mental Ray? THAT would be a professional test, Cinebench is not.

    BTW, no matter how powerful, these Xeon E7 systems are a no-go for studios. They are plainly uneconomical. You can have a much more sensible setup by putting ordinary Xeons or overclocked Core i7s in many racks, i.e., a rendering farm.

    (Note: I build rendering farms for studios. Since 3D rendering grows almost linearly with frequency, what matters in the end is Euros/GHz, that is normalized GHz)
  • Phynaz - Friday, September 30, 2011 - link

    What studio renders on overclocked desktop cpu's?
  • confusis - Friday, September 30, 2011 - link

    My studio does. We can't yet step up to a higher end multi-socket rendering server (finances, start-up company) so we make do with Phenom II x4's. A desktop box is good value for money at our end of the company scale. Once we grow we'll be looking at Interlagos however
