Measuring Real-World Power Consumption

The Equal Workload (EWL) version of vApus FOS is very similar to our previous vApus Mark II "Real-world Power" test. To create a real-world “equal workload” scenario, we throttle the number of users in each VM to a point where you typically get somewhere between 20% and 80% CPU load on a modern dual CPU server. The amount of requests is the same for each system, hence "equal workload".

The CPU load is typically around 30-50%, with peaks up to 65% (for more info see here). At the end of the test, we get to a low 10%, which is ideal for the machine to boost to higher CPU clocks (Turbo) and race to idle. We use the "Balanced" power policy and enable C-states as the current ESXi settings make poor use of the C6 capabilities of the latest Opterons and Xeons.

vApus FOS EWL Power consumption

We cannot say "mission accomplished", but AMD has made significant progress. 12% to 20% better performance while decreasing the power consumption by 6% to 8% is pretty good. The 95W TDP Xeons are still the performance per Watt champs though. Still, it looks like the Opteron is a decent alternative for some. Power consumption is about 12-13% higher (6376 vs E5-2660), but the performance per dollar is slightly better.

vApusMark FOS SAP S&D
Comments Locked

55 Comments

View All Comments

  • Sivar - Wednesday, February 20, 2013 - link

    Please go away. You don't add any new information to the discussion.

    Your writing is of a teenager who knows nothing of processor architecture, the brilliant engineers at both AMD and Intel, or the competitive landscape.

    You present no data, only misinformed opinion. You reduce the quality of this discussion, and have shown no interest in improving your knowledge.
  • JamesAnthony - Wednesday, February 20, 2013 - link

    In the article it mentions you were using the E5-2660 CPU (8 core 2.2 GHz) 95W, in a Dell PowerEdge R720 server

    It may have been a lot more useful to also have included the E5-2680 (8 core 2.7 GHz) and the E5-2690 (8 Core 2.9 GHz) as while they are 130W parts, they are ones that are often used in the PowerEdge R720 and from what we find in a lot of server sales the higher performance ones are very popular for transactional database servers and payment processing servers.

    If you want to go head to head on Intel's top part vs AMD's top part, then it would seem it should be the E5-2690 vs 6386 SE
  • JohanAnandtech - Wednesday, February 20, 2013 - link

    We all know that when you want top performance, Intel is the way to go. So I don't really see the point, even AMD will tell you that the 6376 and 6380 are their most competitive parts.. It is pretty obvious that the E5-2690 2.9 GHz will be faster and consume less than a 6386SE. I don't think our readers really need to see numbers on that.

    And I really doubt that the E5-2690 are sold that much. Most reports say that the top bins with the highest TDP are less than 5% of the total sales.
  • lwatcdr - Wednesday, February 20, 2013 - link

    Wow this is about the most gibberish I have seen in a post ever.
    Good heavens you are an idiot.
    Let's just tear this post bits so this person will NEVER post on here again.
    1"No, it's worth per dollar that you have paid to buy Intel based servers. Intel is more reliable because it has Hyperthreading so you can reduce the latencies that will occur in every workloads."
    Hyperthreading has nothing to do with reliability. So that was a waste of bandwidth.
    "Unlike AMD's engineers who can not design a microprocessor properly. It was AMD's own fault why AMD did not have money like Intel"
    My I introduce you to Titan http://www.olcf.ornl.gov/titan/ The worlds most powerful computer and powered by AMD cpus. AKA yea I think that AMD can actually do pretty well at designing CPUs so this part of your post is also pure manure.
    "Look 99% Bank's in the world uses Intel based ATM as Intel processor can send information without any error." And here we can see that you understand nothing about digital theory or communications. Again a waste of bandwidth.
    "That is why IBM itself does not use Power based processors for its ATM machine because its CEO has admitted that its engineers are not capable to design a lower power processor. So, IBM uses Intel as the standard processor to exchange information between ATM machine to server, so every digits that sent will come in exact same digits when it has been received."
    The IBM power line is for high end systems not for ATM machines. Odds are good that many banks use Power based system for handling ATM transactions. IBM uses Intel or AMD because it is cheap and you can get standard boards. As to the every digit sent nonsense. IT IS DIGITAL you MORON. The communications links have error checking and correction not the CPUs. Please NEVER WASTE OUR TIME AGAIN, YOU KNOW NOTHING OF VALUE ON THIS SUBJECT.
  • toyotabedzrock - Wednesday, February 20, 2013 - link

    Something is wrong with the LZMA benchmarks.

    Can you do a realworld test? There are scripts out there to do this.

    LZMA is built around the idea that decompression is supposed to be much faster than compression.
  • JohanAnandtech - Wednesday, February 20, 2013 - link

    From the 7zip manual:

    "The benchmark shows a rating in MIPS (million instructions per second). The rating value is calculated from the measured speed, and it is normalized with results of Intel Core 2 CPU with multi-threading option switched off. "

    So that is the reason why the compression MIPS values are in the same order as the decompression. The decompression "MB/s" values are indeed about 10x and more higher than compression.
  • Oldboy1948 - Thursday, February 21, 2013 - link

    It is an interesting bench and if cache and memory are fast decompress and compress will be very close. It looks better for Bulldozer in this:
    http://www.7-cpu.com/

    ARM has a long way to go if it will be a server one day.
  • extide - Wednesday, February 20, 2013 - link

    Can we PLEASE get folding@home benches?! musky on the hardocp forums has come up with a system where you can run repeatable benchmarks. Myself as well as many others would really love to see F@H benches on systems like this!
  • JohanAnandtech - Wednesday, February 20, 2013 - link

    Ok, Link? :-)
  • alpha754293 - Wednesday, February 20, 2013 - link

    Because of the way that the current Opteron architecture is (1 FPU per module), did you run with the number of LS-DYNA processes equal to the number of FPUs on chip or did you run it based on per "core" (i.e. 2 processes per module)?

Log in

Don't have an account? Sign up now