Nehalem EX Confusion

One of the reasons the Xeon X7560 did not show its full potential at launch was a small error in the firmware of the Dell R810 testing platform, which caused the memory subsystem to underperform. As a result, some of the bandwidth-sensitive benchmarks, including many HPC applications, were not performing optimally. Intel claimed that a dual-CPU configuration should be able to reach 39GB/s and a quad-CPU configuration up to 70GB/s. We could not reach those STREAM numbers, as we test with our somewhat older STREAM binary, as described here; using the same binary as before allows us to compare our findings with all our previous measurements.

We reran our STREAM benchmarks on the new QSSC-S4R server system.

[Chart: STREAM TRIAD on 64-bit Linux, maximum threads; the asterisk marks the new measurements.]

The new results tell us that the available memory bandwidth is about 21% higher (29GB/s) than what we previously measured on the Dell R810 (24GB/s). That means many of the benchmark results published at the launch of the Xeon 7500 on the Dell R810, especially the HPC ones, were too low. The Xeon X7560 will not be able to beat the quad Opteron 6174 when it comes to raw bandwidth, but it is far from a bandwidth-starved platform.
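
For readers who want to get a feel for what this test measures: the TRIAD kernel is nothing more than a scaled vector add over arrays far larger than the caches. Below is a minimal OpenMP sketch of such a kernel; the array size, scalar, and compile flags are illustrative assumptions, not the exact parameters of our older STREAM binary.

```c
/* Minimal STREAM-style TRIAD sketch (illustrative, not the binary used above).
   Compile with: gcc -O3 -fopenmp triad.c -o triad                            */
#include <stdio.h>
#include <omp.h>

#define N 20000000L                 /* three 160MB arrays, far larger than any cache */

static double a[N], b[N], c[N];

int main(void)
{
    const double scalar = 3.0;
    long i;

    /* Touch the arrays first so all pages are mapped before timing starts. */
    #pragma omp parallel for
    for (i = 0; i < N; i++) {
        a[i] = 0.0;
        b[i] = 2.0;
        c[i] = 1.0;
    }

    double start = omp_get_wtime();

    /* TRIAD kernel: two loads and one store per element. */
    #pragma omp parallel for
    for (i = 0; i < N; i++)
        a[i] = b[i] + scalar * c[i];

    double elapsed = omp_get_wtime() - start;

    /* Three arrays of doubles cross the memory bus during the timed loop. */
    double gbytes = 3.0 * (double)N * sizeof(double) / 1e9;
    printf("TRIAD: %.2f GB/s (checksum %.1f)\n", gbytes / elapsed, a[N / 2]);
    return 0;
}
```

Because each iteration reads two doubles and writes one, the reported bandwidth is simply three arrays' worth of data divided by the elapsed time, which is why the score scales with memory channels and DIMM speed rather than with CPU clocks.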

Comments

  • Ratman6161 - Wednesday, August 11, 2010 - link

    Many products are licensed on a per-CPU basis. For Microsoft anyway, what they actually count is the number of sockets. For example, SQL Server Enterprise retails for $25K per CPU, so an old 4-socket system with single cores would be 4 x $25K = $100K. A quad-socket system with quad-core CPUs would be a total of 16 cores, but the pricing would still be 4 sockets x $25K = $100K. It used to be that Oracle had a complex formula for figuring this, but I think they have now also gone to the simpler method of just counting sockets (though their enterprise edition is $47.5K).

    If you are using VMware, they also charge per socket (last I knew), so two dual-socket systems would cost the same as a single 4-socket system. The thing is, though, you need to have at least two boxes in order to enable the high availability (i.e. automatic failover) functionality.
  • Stuka87 - Wednesday, August 11, 2010 - link

    For VMware they have a few pricing structures. You can be charged per physical socket, or you can get an unlimited socket license (which is what we have, running on seven R910s). You just need to figure out if you really need the top-tier license.
  • semo - Tuesday, August 10, 2010 - link

    "Did I mention that there is more than 72GHz of computing power in there?"

    Is this eBay?
  • Devo2007 - Tuesday, August 10, 2010 - link

    I was going to comment on the same thing.

    1) A dual-core 2GHz CPU does not equal "4GHz of computing power" - unless somehow you were achieving an exact doubling of performance (which is extremely rare, if it exists at all).

    2) Even if there was a workload that did show a full doubling of performance, performance isn't measured in MHz & GHz. A dual-core 2GHz Intel processor does not perform the same as a 2GHz AMD CPU.

    More proof that the quality of content on AT is dropping. :(
  • mino - Wednesday, August 11, 2010 - link

    You seem to know very little about the (40-year-old!) virtualization market.
    It flourishes by *commoditising* processing power.

    While clearly meant as a joke, that statement of Johan's is much closer to the truth than most market "research" reports on x86.
  • JohanAnandtech - Wednesday, August 11, 2010 - link

    Exactly. ESX resource management lets you reserve CPU power in GHz. So for ESX, two 2.26 GHz cores are indeed a 4.5 GHz resource.
  • duploxxx - Thursday, August 12, 2010 - link

    Sure, you can count resources together as much as you want... virtually. But in the end a single process is still only able to use the maximum GHz a single core can offer; it just finishes its request faster on a higher-clocked core. That is exactly why those Nehalem and Gulftown parts still hold up against the huge core count of Magny-Cours.
  • maeveth - Tuesday, August 10, 2010 - link

    I have nothing at all against AnandTech's recent articles on virtualization; however, so far all of them have only looked at virtualization from a compute-density point of view.

    I am currently the administrator of a VMware environment used for development work, and I run into I/O bottlenecks FAR before I ever run into a compute bottleneck. In fact, computational power is pretty much the LAST bottleneck I run into. My environment currently holds just short of 300 VMs, with a mix of OSes, and we peak at approximately 10-12K IOPS.

    From my experience, you always have to look at potential performance in a virtual environment from a much broader perspective. Every bottleneck affects the others in subtle ways. For example, a memory bottleneck, whether host- or guest-based, will further impact your I/O subsystem (though you should aim not to have to swap at all). In my opinion, your storage backend is the single most important factor when determining large-scale-out performance in a virtualized environment.

    My environment has never once run into a CPU bottleneck. I use IBM x3650/x3650 M2 servers with dual quad-core Xeons; the M2s use X5570s specifically.

    While I agree that having an impressive number of "GHz" in your environment is kinda fun, it hardly says anything about how that environment will perform in the real world. Granted, it is all highly subject to workload patterns.

    I also want to make it clear that I understand that testing on such a scale is extremely cost prohibitive. As such, I am sure AnandTech, Johan specifically, is doing the best he can with the resources he is given. I just wanted to throw my knowledge out there.

    @ELC
    Yes, software licensing is a huge factor when purchasing ESX servers. ESX is licensed per socket. It's a balancing act that depends on your workload, however. A top-end ESX license costs about $5500/year per socket.
  • mino - Wednesday, August 11, 2010 - link

    However, IMO storage performance analysis is pretty much beyond AT's budget ballpark by an order of magnitude (or two).

    There is a reason this space is so happily "virtualized" by storage vendors AND customers to a "simple" IOPS number.
    It is a science of its own, often closer to black (empirical) magic than to deterministic rules...

    Johan,
    on the other hand, nothing prevents you from mentioning this sad fact:

    Except for edge cases, a good virtualization solution is built from the ground up with
    1. SLAs
    2. the storage solution
    3. licensing considerations
    4. everything else (like the processing architecture) dictated by the previous three
  • JohanAnandtech - Wednesday, August 11, 2010 - link

    I can only agree, of course: in most cases the storage solution is the main bottleneck. However, this is also a result of the fact that most storage solutions out there are not exactly speed demons; many consist of overengineered (and overpriced) software running on outdated hardware. But things are changing quickly now. HP, for example, seems to recognize that a storage solution is very similar to a server running specialized software. There is more: with a bit of luck, Hitachi and Intel will bring some real competition to the table (currently STEC has almost a monopoly on enterprise SSDs). So your number 2 is going to tumble down :-).
