Original Link: http://www.anandtech.com/show/3648/xeon-7500-dell-r810
High-End x86: The Nehalem EX Xeon 7500 and Dell R810 by Johan De Gelas on April 12, 2010 6:00 PM EST
The stellar performance of the Xeons based on the Nehalem and Westmere architectures had a dark side for Intel: it cast a big shadow on Intel's top-of-the-line Xeon 7400 series. You can talk about RAS features all you want, but when a dual-CPU configuration outperforms quad-CPU configurations of your top-of-the-line CPU, something is wrong. And if this happens in the applications the latter is supposed to excel in, something is very wrong. SAP, OLTP, and other high-end server workloads are supposed to run better on the most expensive Xeon, not on the "popular" Xeon. Even worse, AMD's six-core 8000 series has been outperforming Intel's Xeon X7460 by a large margin for several months now. Servers with four twelve-core Opterons will start to pop up in the shops of several tier-one OEMs any moment now, so Intel's new Xeon EX faces a serious challenge.
Intel emphasizes that its Xeon 7500 series plays in a higher league than the competition from Austin. The mission of the X7560 is to beat the RISC chips. That is not a bad strategy, as RISC server buyers are used to paying a lot more for their servers. For example, a very basic IBM Power 7 configuration starts at $34000. Intel created an octal-core, 16-thread giant based on the successful Nehalem architecture. To fit in with the other RISC monsters, the CPU also comes with a massive 24MB L3 cache and a bucketload of RAS features.
On the lower end of the targeted high-end server market, the market where x86 traditionally did well, Intel is going to face fierce competition. AMD's latest 2.2GHz twelve-core Opteron 6174 comes with a price tag of $1165, regardless of whether the server features two or four sockets. Intel, however, expects server manufacturers to cough up $3692 for a 2.26GHz X7560. It is clear that the two competitors are targeting different markets.
AMD is going after the cost-conscious HPC/virtualization market, offering the best price/performance and performance/watt. Intel has no intention of competing on price/performance. It targets the higher-end market where software license costs matter more than the hardware, where downtime is so costly that people are willing to pay a premium for extra reliability features, and/or where the performance demands are extremely high. Intel's objective is to offer better performance than the RISC vendors, with similar RAS features, at a lower price point. A single Xeon 7500 series machine can scale up to 64 cores (8x8), 128 threads, and 512GB of RAM, so scalability should be quite impressive. For those who need RAS features but have no need for extreme performance, Intel offers the Xeon 6500 series. In this article we take a closer look at one of the most affordable Xeon 7500/6500 servers, the Dell R810.
Intel claims no less than 20 new RAS features for the new Xeon, most of them borrowed from the Itanium. Some of the RAS features are for the most paranoid of IT professionals. Let's face it, who has experienced a server crash that was caused by a bad CPU? For each CPU failure there must be a million failures caused by buggy software. So we are not too concerned if a competing CPU lacks "hot physical CPU board" swapping, and it is reasonable to think that most IT professionals—even those with mission critical applications—will agree. The most paranoid people usually have the highest budgets, as the mission critical applications they manage could cost them their job if they go down. Not to mention that the company they work for might lose millions of dollars. So those people tend to favor a very long list of reliability features.
All ironic remarks about paranoid people aside, most of these RAS features make a lot of sense even for the "down to earth" people, the rest of us. Memory fails a lot more often than CPUs. According to Google research, 8% of DIMMs see one correctable error per year, and 0.22% have uncorrectable errors. These machines can have up to half a terabyte (!) of RAM, and with 32 to 64 DIMMs an uncorrectable error is quite conceivable. So it is no surprise that most of the RAS features try to cope with failing DRAM chips. Also, as the number of VMs you consolidate on one machine increases, so does the risk of a bad VM bringing the complete host machine down.
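The Google figures make the risk easy to quantify. A quick back-of-the-envelope calculation (ours, assuming independent DIMM failures, which is a simplification) shows why 32 to 64 DIMMs change the picture:

```python
# Back-of-the-envelope check of the Google DRAM numbers: the chance
# that at least one DIMM in a server develops an uncorrectable error
# within a year, using the quoted 0.22% per-DIMM yearly rate and
# assuming independent failures (our simplifying assumption).
def p_any_uncorrectable(dimms, p_per_dimm=0.0022):
    return 1 - (1 - p_per_dimm) ** dimms

for n in (8, 32, 64):
    print(f"{n} DIMMs: {p_any_uncorrectable(n) * 100:.1f}% per year")
```

At 64 DIMMs the yearly chance of at least one uncorrectable error climbs to roughly 13%, which is exactly why scrubbing and containment features start to matter at this scale.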
The idea behind the Machine Check Architecture is that errors in memory and the L3 cache are detected before they are actually "used" by the running software. A firmware-based memory scrubber constantly checks ("patrols") for unrecoverable errors, errors that ECC cannot correct. Those errors will make the (ESX) hypervisor create a purple screen, which is in most cases much worse than the famous blue screen, to make sure your data does not get corrupted.
With MCA in hardware and support in both firmware and the hypervisor, data errors are transmitted to the hypervisor's error handler before they cause havoc. The memory location is placed in quarantine (poisoned data containment) and the CPU will not use that address again. The software handler can then retry the read, and as a result the hypervisor keeps running. This recovery mechanism can of course only work if the error is caused by an occasional glitch and not by bad hardware.
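As a mental model, the containment flow can be sketched as follows. This is purely illustrative logic in Python; the function names are ours, and the real handlers live in firmware and the hypervisor kernel:

```python
# Illustrative sketch of MCA-style poisoned-data containment: the
# hardware flags a bad line, the error handler quarantines the page
# and retries from a clean source. Names and flow are ours, not
# VMware's or Intel's actual implementation.

poisoned_pages = set()

def read_memory(page, data, hw_error=False):
    if hw_error or page in poisoned_pages:
        # MCA raises a recoverable exception instead of letting the
        # corrupt data propagate into the running software.
        poisoned_pages.add(page)  # never hand out this address again
        return retry_from_clean_copy(page)
    return data

def retry_from_clean_copy(page):
    # Only works for transient glitches: re-fetch from a clean
    # source. A truly failing chip would keep tripping the handler
    # and force the page offline permanently.
    return f"clean copy of page {page}"
```

The key design point is the quarantine set: once poisoned, an address is never trusted again, so a single transient glitch cannot silently corrupt data twice.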
So the basic idea behind these increased reliability features is that the more memory you have, the higher the chance that an occasional glitch occurs, and thus the handier features like demand and patrol scrubbing and recovery from a single DRAM device failure become. You will need something better than simple ECC. The same is true for QPI: as the number of Nehalem EX CPUs and the speed of the QPI links increase, the chances of bad addresses or bad data increase as well.
The Uncore Power of the Nehalem EX
Feeding an octal-core CPU is no easy feat. You cannot just add a bit of cache and call it a day, so much attention was paid to the uncore. When you need to feed eight fast cores, L3 cache bandwidth is critical. Intel used a 32-byte-wide system of dual counter-rotating rings and eight separate 3MB banks to make sure that the L3 cache can deliver up to 200GB/s at a low 21ns load-to-use latency. The Last Level Cache also acts as a snoop filter to make sure that cache coherency traffic does not kill performance scaling.
An 8-port router regulates the traffic between the memory controllers, the caches, and the QPI links. It adds 18ns of latency and should in theory be capable of routing 120GB/s. Each memory controller has two serial interfaces (SMIs) working in lockstep with memory buffers (SMBs), which have the same task as the AMBs on FB-DIMMs (Fully Buffered DIMMs). The DIMMs send their bits out in parallel (64 per DIMM) and the memory buffer has to serialize the data before sending it to the memory controller. This allows Intel to give each CPU four memory channels. If the memory interface weren't serial, the boards would be incredibly complex, as hundreds of parallel lines would be necessary.
Each SMI link can deliver 6.4GB/s in each direction, or 12.8GB/s of total full-duplex bandwidth. Each SMB has two DDR3-1066 memory channels, which can deliver 17GB/s half duplex. To transform this 17GB/s half-duplex data stream into a full-duplex SMI stream, the SMB needs about 10W at most (TDP). In practice, this means that each SMB dissipates about 7W, hence the small black fans that you will see on the Dell motherboard later.
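The bandwidth figures follow directly from the link and channel widths. A quick sanity check (our arithmetic, assuming a standard 8-byte DDR3 channel; the exact SMI framing overhead is Intel's detail):

```python
# Sanity-check the SMB bandwidth numbers quoted above, assuming a
# standard 8-byte-wide DDR3 channel (our assumption).
ddr3_1066_per_channel = 1066e6 * 8 / 1e9   # ~8.5 GB/s per channel
smb_dram_side = 2 * ddr3_1066_per_channel  # two channels -> ~17 GB/s half duplex
smi_per_direction = 6.4                    # GB/s each way on the serial link
print(f"DRAM side: {smb_dram_side:.1f} GB/s half duplex, "
      f"SMI side: {smi_per_direction} GB/s per direction")
```

The SMB's job, in other words, is to repackage a wide, half-duplex 17GB/s DRAM stream into a narrow serial link running full duplex, and that conversion is where its power budget goes.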
So each CPU has two memory controllers, each connecting over two SMI links to two SMBs, and each SMB drives two channels with two DIMMs each. Thus, each CPU supports sixteen registered DDR3 DIMMs at 1066MHz. By limiting each DDR channel to two DIMMs, the system can support quad-rank DIMMs. So in total, a carefully designed quad-Xeon 7500 server can contain up to 64 DIMMs. As each DIMM can be a quad-rank 16GB DIMM, a quad-CPU configuration can contain up to 1TB of RAM. So Intel's Nehalem EX platform offers high bandwidth and enormous memory capacity. The flip side of the coin is increased latency and, compared to the total system, a bit of power consumed by the SMBs.
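The capacity arithmetic is easy to retrace (using the 64-slot total and the 16GB quad-rank DIMM the text cites as the maximum):

```python
# Maximum memory of a quad Xeon 7500 server: 16 DIMM slots per
# socket (2 DIMMs per channel, behind the SMBs), populated with
# 16GB quad-rank DDR3 DIMMs, the largest the platform accepts
# per the text above.
cpus = 4
dimms_per_cpu = 16
dimm_size_gb = 16

total_dimms = cpus * dimms_per_cpu               # 64 DIMM slots
capacity_tb = total_dimms * dimm_size_gb / 1024  # 1.0 TB
print(f"{total_dimms} DIMM slots -> {capacity_tb:.1f} TB maximum")
```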
AMD Opteron and Intel Xeon SKUs
The Intel slide below gives you an overview of the available SKUs.
Only the top Xeon X7560 gets the massive 24MB L3 cache. The two top CPUs, the X7560 and X7550, have eight cores and can raise their clock speed by 400MHz (via Turbo Boost) if several cores are idle. That is quite handy: without a virtualization layer, single-threaded tasks happen quite often, even on an eight-socket, 64-core machine. Speeding those tasks up by nearly 20% can save some valuable time. When all cores are busy but not at 100% load, the CPU will probably be capable of running one speed bin higher. For example, Intel's performance engineers in Portland report that the SAP benchmark, not exactly a low-CPU-load workload, runs about 3% faster with Turbo Boost.
For many Windows 2008 users, Turbo Boost is effectively going to be disabled: we learned through experimentation that the most commonly used power plan, "balanced", does not use Turbo Boost. Turbo Boost is only available with the "high performance" setting. Use that power plan and Windows 2008 R2 uses the highest clock possible. Look at the picture below.
Our current Linux build, SUSE SLES 11 (kernel 2.6.27, SMP, x86-64), does not have that problem. The most aggressive performance plan, "low latency computing", sets the Nehalem EX X7560 at 2.26GHz when running idle, not at 2.66GHz. Let us check out the pricing.
|Intel Xeon model||Cores||TDP||Speed (GHz)||Price||AMD Opteron model||Cores||ACP/TDP||GHz||Price|
The different strategies of Intel and AMD become tangible when you look at the price list. Intel wants "RISC-like" prices for its best CPUs, with four CPUs costing $8000 to $12000. The markets that demand these reliability features for running expensive applications will not worry about this. But if those reliability features are not at the top of the checklist and price/performance is, AMD's aggressive pricing is very attractive. AMD's cores might be slower, but AMD offers more cores at higher clock speeds for a lower price. Four of the best Opteron 6100s will set you back $4000 to $5500.
The low power versions of the Xeon Nehalem EX are unattractive. For example, the TDP of the L7545 is only 10W lower than the E7540 but Intel demands a $700 premium. The fact that the Nehalem EX is still a 45nm chip seems to have limited the options for low power chips. Of course, the demand for lower power chips in quad-CPU machines is low, albeit growing.
The Xeon 6500 series is no bargain either. Limited to two CPUs, it lacks the scalability of the 7500 series; the only thing you get in exchange is a meager $300 price cut. The 6500 series does make sense, as these CPUs can use up to 32 DIMMs and have all the reliability features of their bigger brothers. But Intel missed a chance here: many people are RAM limited, not CPU limited, when virtualizing, and not all of them are willing to pay a premium for reliability features. These people will probably turn to AMD.
It all boils down to two questions: how much memory do you need, and how much are you willing to pay for reliability features? The more memory and the more reliability you demand, the more the 7500 series makes sense; ERP and database applications are prime candidates. For virtualization, you have two options. Some of you might prefer fewer, more reliable machines. Others may leverage the fact that HA (High Availability) is pretty easy with modern virtualization platforms, go for servers with fewer RAS features, and get availability through HA software instead. Our first bet is that the low pricing of the AMD quad-CPU servers will seduce a lot of system administrators to go for the latter. Let us know what you prefer and why.
Dell R810 and Intel Nehalem EX Platform
Dell no longer seems to focus on cost alone, but also on offering innovative features. In the new Dell servers we find two very interesting features that set them apart from the pack. The first one is a new redundant SD-card module for embedded hypervisors. This module is similar to previous embedded hypervisor SD card solutions, but adds mirroring to the feature set. You cannot install any form of Windows on it as Windows refuses to be installed on a "removable USB device". 1GB might be enough for some Linux installations, but the enterprise versions require more than 1GB most of the time. So it only seems fit for an ESXi hypervisor.
The server has no fewer than 32 DIMM slots, which are available even if you install only two CPUs. This second innovation is called "FlexMem Bridge" technology. You can see in the picture below that only two aluminum heatsinks with copper heatpipes are installed. The other two are rather simple black heatsinks.
When we remove those black heatsinks, we find the pass-through chip.
And below is the pass-through chip seen from the underside.
Having as many DIMMs available with two CPUs installed as with four is pretty cool, though there are some limitations you should be aware of. The FlexMem Bridge is in fact a pass-through for the second memory controller, as you can see below.
This should add a little bit of latency, but more importantly it means that in a four-CPU configuration, the R810 uses only one memory controller per CPU. The same is true for the M910, the blade server version. The result is that the quad-CPU configuration has only half the bandwidth of a server like the Dell R910 which gives each CPU two memory controllers.
The Dell R810 is simply not meant to be the highest performing Xeon 7500 server out there. The reality is that a significant number of buyers out there will shrug their shoulders when reading that 32 Nehalem cores are not running at full speed. Those buyers view the two extra CPUs as an unnecessary cost to obtain their real goal: getting a server with a copious amount of memory. If you are consolidating lightly loaded but critical web servers, firewalls, software routers, LDAP, DNS and other infrastructure applications, chances are you will never even need 16 cores to power them.
With the R810 Dell created the "entry-level" server for the Xeon 7500 market. It offers the reliability features of Intel's newest Xeon, 32 DIMMs slots, and excellent expansion options with six PCIe slots.
So the natural processor configuration for the Dell R810 is a dual Xeon 6500 setup. When we specced a system with two 2GHz E6540s, 128GB (32x4GB), and redundant PSUs, we ended up with a price tag of $14400. For reference, a similar R710 with two Xeon E5540s and 128GB arrived at $11400. The latter system has to use sixteen 8GB DIMMs, which raises the price quite a bit. Still, the $3000 difference is acceptable, as the R810 delivers more expansion possibilities and is in a different class when it comes to reliability. For RISC buyers, a fully equipped system with these reliability features in the $14-20k range must sound cheap.
Quad Opteron 6100 systems will offer up to 48 DIMM slots. At the time of writing, we could not find server systems based on quad Opterons. It is clear that these systems will be cheaper, but an in-depth analysis of how reliability features influence these "massive memory" systems is necessary to make any reality-based conclusion. For now, we can state that the Dell R810 is making the Xeon 7500 market more accessible.
Benchmark Methods and Systems
Our methods and configurations were identical to our previous review. The only system added was the Dell R810:
Dell R810 Configuration:
Dual Xeon X7560 2.26GHz
Dell 05W7DG Motherboard with Intel ICH10R Southbridge (BIOS version: 0.3.2)
128GB (32 x 4GB) of DDR3-1066 (HMT151R7BFR8C Hynix)
NIC: quad Broadcom BCM5709C NetXtreme II GigE (1GB)
Xeon Server 1: ASUS RS700-E6/RS4 barebone
Dual Intel Xeon "Gainestown" X5570 2.93GHz, Dual Intel Xeon "Westmere" X5670 2.93GHz
6x4GB (24GB) ECC Registered DDR3-1333
NIC: Intel 82574L PCI-EGBit LAN
PSU: Delta Electronics DPS-770 AB 770W
Opteron Server 1 (Dual CPU): AMD Magny-Cours Reference system (desktop case)
Dual AMD Opteron 6174 2.2 GHz
AMD Dinar motherboard with AMD SR5690 Chipset & SB750 Southbridge
8x 4 GB (32 GB) ECC Registered DDR3-1333
NIC: Broadcom Corporation NetXtreme II BCM5709 Gigabit
PSU: 1200W PSU
Opteron Server 2 (Dual CPU): Supermicro A+ Server 1021M-UR+V
Dual Opteron 2435 "Istanbul" 2.6GHz
Dual Opteron 2389 2.9GHz
32GB (8x4GB) DDR2-800
PSU: 650W Cold Watt HE Power Solutions CWA2-0650-10-SM01-1
vApus/Oracle Calling Circle Client Configuration
First client (Tile one)
Intel Core 2 Quad Q9550 2.83 GHz
4GB (2x2GB) Kingston DDR2-667
NIC: Intel PRO/1000
Second client (Tile two)
Single Xeon X3470 2.93GHz
Intel 3420 chipset
8GB (4 x 2GB) 1066MHz DDR3
Our benchmarking is relatively limited. We have gone from typically 12 to 16 threads per server system to 48 and 64 thread systems in less than a year! The sharp increase in available threads is making an in-depth analysis of our benchmarks necessary. Our current choices of Oracle Calling Circle and vApus Mark I are being improved to measure the full potential of these high-thread servers. So the number of benchmarks performed by our own lab is rather limited. This situation should improve soon.
Understanding the Performance Numbers
A good analysis of the memory subsystems helps to understand the strengths and weaknesses of these server systems. We still use our older stream binary. This binary was compiled by Alf Birger Rustad using v2.4 of Pathscale's C-compiler. It is a multi-threaded, 64-bit Linux Stream binary. The following compiler switches were used:
-Ofast -lm -static -mp
We ran the stream benchmark on SUSE SLES 11. The stream benchmark produces four numbers: copy, scale, add, triad. Triad is the most relevant in our opinion; it is a mix of the other three.
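For reference, the triad kernel itself is trivial: a(i) = b(i) + scalar * c(i) over arrays far larger than the caches. A minimal Python rendition (illustrative only; the real benchmark is compiled, multi-threaded C or Fortran, which is why we use the Pathscale binary):

```python
import time

def stream_triad(n=1_000_000, scalar=3.0):
    # STREAM triad: a[i] = b[i] + scalar * c[i] over arrays much
    # larger than any cache. Pure Python shows the access pattern,
    # not the bandwidth a compiled, multi-threaded binary reaches.
    b = [1.0] * n
    c = [2.0] * n
    t0 = time.perf_counter()
    a = [bi + scalar * ci for bi, ci in zip(b, c)]
    elapsed = time.perf_counter() - t0
    gb_moved = 3 * n * 8 / 1e9  # three 8-byte streams per element
    return a, gb_moved / elapsed

a, bw = stream_triad()
print(f"apparent bandwidth: {bw:.2f} GB/s (interpreter overhead dominates)")
```

STREAM counts three 8-byte accesses per element (read b, read c, write a), which is how the GB/s figures in this section are derived from the measured loop time.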
The Xeon X7560 fails to impress. Intel's engineers expected 36GB/s with the best optimizations. Their own gcc-compiled binary (-O3 -fopenmp -static) achieves 25 to 29GB/s, in the same range as our Pathscale-compiled binary.
It is interesting to note that single-threaded bandwidth is mediocre at best: we got only 5GB/s with DDR3-1066. Even the six-core Opteron with DDR2-800 can reach over 8GB/s, while the newest Opteron DDR3 memory controller achieves 9.5GB/s with DDR3-1333, almost twice as much as the Xeon 7500 series. The best single-threaded performance comes out of the Xeon 5600 memory controller: 12GB/s with DDR3-1333. Intel clearly had to sacrifice some bandwidth to achieve the enormous memory capacity (64 slots and 1TB without "extensions"). Let's look at latency.
|CPU||Speed (GHz)||L1 (clocks)||L2 (clocks)||L3 (clocks)||Memory (ns)|
The L3 cache latency of our Xeon X7560 is very impressive, considering that we are talking about a 24MB L3. Memory latency clearly suffers from the serial-buffer-parallel DRAM transitions. We also did a cache bandwidth test with SiSoft Sandra 2010.
|CPU||Speed (GHz)||L1 CPU (GB/s)||L2 CPU (GB/s)||L3 (GB/s)|
The most interesting number here is the L3 cache bandwidth, since all cores must access the L3 and it matters for almost all applications. The throughput of the L1 and L2 caches is mostly important for the few embarrassingly parallel applications. And here we see that the extra engineering in the Nehalem EX uncore pays off: it clearly has the fastest L3 cache. The Opterons are second fastest, but the exclusive nature of their L3 caches may require quite a bit more bandwidth. In a nutshell: the Xeon 7500 comes with probably the best L3 cache on the market, but its memory subsystem is quite a bit slower than that of other server CPU systems.
Decision Support benchmark: Nieuws.be
|Decision Support benchmark Nieuws.be|
|Operating System||Windows 2008 Enterprise RTM (64 bit)|
|Software||SQL Server 2008 Enterprise x64 (64 bit)|
|Benchmark software||vApus + real-world "Nieuws.be" Database|
|Database Size||> 100GB|
The Flemish/Dutch Nieuws.be site is one of the newest web 2.0 websites, launched in 2008. It gathers news from many different sources and allows the reader a personalized view on all the news. The Nieuws.be site sits on top of a large database—more than 100GB and growing. This database consists of a few hundred separate tables, which have been carefully optimized by our lab (the Sizing Server Lab).
99% of the loads on the database are selects, and about 5% of them are stored procedures. Network traffic is 6.5MB/s average and 14MB/s at the most, so our Gigabit connection still has a lot of headroom. Disk Queue Length (DQL) is at 2 in the first round of tests, but we only report the results of the next rounds where the database is in a steady state. We measured a DQL close to 0 during these tests, so there is no tangible intervention of the hard disks.
Attention: since our twelve-core Opteron review, we have been using a new, heavier log. As the Nieuws.be application became more popular and more complex, the database has grown and the queries have become more complex too. The results are no longer comparable to previous results: the trends are similar, but the absolute scores are much lower.
First, you may notice that our dual Xeon X5670 scores 6% higher than what we reported in our twelve-core Opteron review. As we changed the benchmark and tested a lot of configurations in less than a week (before the launch of the Magny-Cours Opteron), we made an error: we tested with the power option set to "balanced", which lowered the score of the Xeon X5670. We have now retested with the "high performance" power setting, as we did on the Opterons.
Our data mining benchmark scales well with more cores, so the performance delivered by the X7560 is a bit lower than expected. The high memory latency and relatively low bandwidth per core might slow the octal-core Xeon down.
SAP S&D 2-Tier
|SAP S&D 2-Tier|
|Operating System||Windows 2008 Enterprise Edition|
|Software||SAP ERP 6.0 Enhancement package 4|
|Benchmark software||Industry Standard benchmark version 2009|
|Typical error margin||Very low|
The SAP SD (Sales and Distribution, 2-tier internet configuration) benchmark is an interesting benchmark, as it is a real-world client-server application. We decided to look at SAP's benchmark database. The results below all run on Windows 2003 Enterprise Edition and an MS SQL Server 2005 database (both 64-bit). Every "2-tier Sales & Distribution" benchmark was performed with SAP's latest ERP 6 enhancement package 4. These results are not comparable with any benchmark performed before 2009; the new "2009" version of the benchmark obtains scores that are 25% lower. We analyzed the SAP benchmark in-depth in one of our previous server-oriented articles. The profile of the benchmark has remained the same:
- Very parallel resulting in excellent scaling
- Low to medium IPC, mostly due to "branchy" code
- Somewhat limited by memory bandwidth
- Likes large caches (memory latency!)
- Very sensitive to sync ("cache coherency") latency
Since we gather the benchmark data from the SAP site, we have to work with what we have found so far. A quad Xeon X7560 outperforms an octal-CPU Opteron 8435 (2.6GHz) setup by a small margin (3%). A quad Opteron 6176 at 2.3GHz should score about 48-50k. That is competitive performance, but this market will probably prefer the Xeon platform, as price is less of an issue and reliability features are at the top of the checklist. The Power 7 servers outperform the Nehalem EX CPUs, but the top models (3.55GHz) cost around $100k.
Virtualization and Consolidation
VMmark—which we discussed in detail here—tries to measure typical consolidation workloads: a combination of a light mail server, database, fileserver, and website with a somewhat heavier Java application. One VM is just sitting idle, representative of workloads that have to be online but which perform very little work (for example, a domain controller). In short, VMmark goes for the scenario where you want to consolidate lots and lots of smaller apps on one physical server.
Very little VMmark benchmark data has been available so far, but it is obvious that this is the favorite playing ground of the Xeon 7500. It outperforms an octal 2.8GHz Opteron setup by a large margin. Granted, the octal Opterons scale pretty badly in most applications, but VMmark is not one of them. It is reasonable to expect that a quad twelve-core Opteron 6100 setup will outperform the older, higher-clocked octal six-core Opterons in many applications, including SAP, OLTP, and data mining benchmarks. After all, the communication between the cores has vastly improved. But VMmark runs many small independent applications, which usually run on the same node, so the chances are slim that the quad Opteron 6100 will come even close to the quad Xeon X7560.
vApus Mark I: Performance-Critical Virtualized Applications
As we've discussed previously, our vApus Mark I benchmark is due for a major overhaul. We found out that the 24 cores of the Opteron 6172 were not at the expected 85-95% CPU load, and thus the numbers reported were below the potential of the twelve-core Opteron. To get an idea of where the Xeon X7560 would land, we disabled Hyper-Threading, as our test is capable of stressing 16 cores/threads easily. The dual Xeon X7560 was about 5% slower than the Xeon X5670 with Hyper-Threading enabled, and about 13% faster than the dual octal-core Opteron 6136 2.4GHz. Considering that we found performance to be about 15% higher with Hyper-Threading, we estimate that the dual Xeon X7560 at 2.26GHz is about 10% faster than a Xeon X5670 at 2.93GHz, and about 29% faster than the octal 2.4GHz Opteron 6136. So core for core, clock for clock, the Xeon X7560 probably has in the neighborhood of a 30% performance advantage over the Opteron. Once vApus Mark II is ready, we'll provide more accurate numbers.
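The estimate chain above is easy to retrace. In rough numbers (ours, carrying the measured ratios through the assumed 15% Hyper-Threading gain):

```python
# Retracing our rough vApus Mark I estimate: with Hyper-Threading
# off, the dual X7560 measured 5% below a dual X5670 (HT on) and
# 13% above a dual Opteron 6136; we assume HT itself adds ~15%.
x7560_noht_vs_x5670 = 0.95
x7560_noht_vs_6136 = 1.13
ht_gain = 1.15

vs_x5670 = x7560_noht_vs_x5670 * ht_gain  # ~1.09 -> roughly 10% faster
vs_6136 = x7560_noht_vs_6136 * ht_gain    # ~1.30 -> roughly 30% faster
print(f"vs X5670: {vs_x5670:.2f}x, vs Opteron 6136: {vs_6136:.2f}x")
```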
However, that is not enough to win the price/performance or performance/watt comparison. An octal-core Xeon X7560 costs four times as much, and the server consumes a lot more power than a comparable (clock speed, core count) Opteron 6136 system.
At this point in time, performance/watt comparisons are impossible. The current AMD Opteron 6100 systems available to reviewers are very basic reference systems: a motherboard, server CPUs, and low-RPM desktop fans. The Xeon X7560 systems available are fully featured quad-socket systems with remote management cards, SAS/SATA backplanes, an extra daughterboard for PCIe expansion slots, and so on. A decent power comparison can only happen when we receive a similar OEM Opteron system. We still expect that the Xeon X7560 systems will need more power. For example, a midrange server will come with eight SMBs, good for about 60W of extra power consumption; high-end quad systems will have 16 SMBs, probably good for 120W. Add to that the fact that most Xeon 7500 parts are in the 130W TDP range (Opterons: 115W TDP) and that the chipset consumes slightly more too, and you can see why the Xeon X7560 will likely require more power.
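The SMB overhead estimate is simple arithmetic on the per-buffer figure quoted earlier (we use 7.5W, between the ~7W typical and 10W TDP numbers; the exact draw depends on load):

```python
# Ballpark extra power from the SMBs alone, using ~7.5W per buffer
# (our midpoint between the ~7W typical and 10W TDP figures quoted
# earlier in the article).
per_smb_watt = 7.5
for smbs in (8, 16):
    print(f"{smbs} SMBs: ~{smbs * per_smb_watt:.0f}W extra")
```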
We have now started our tests on the quad Xeon X7560, the natural "habitat" for this Xeon if you want to measure performance. The dual Xeon Nehalem EX is not meant to be a top performer, but instead enables servers with large amounts of memory protected by a battery of reliability features. We have to admit that we do not have the tests at this time to check Intel's RAS claims. But Google's research shows that we should not underestimate soft errors, and VMware's enthusiasm for MCA tells us that we should take this quite seriously. Microsoft has pledged to support this in Windows 2008 R2, and Red Hat and Novell are supporting MCA in their next releases too. Software support will be here pretty soon.
From that point of view, the Dell R810 makes a lot of sense: running close to a hundred VMs on a 256GB system with only ECC for protection seems risky, as one bad VM can take down a complete host system. High availability at the software level, as found in VMware's vSphere, is fine, but 100 failing VMs can wreak havoc on the performance of your virtualized cluster. So in the "workloads needing large amounts of memory and availability" market, we don't see any real competition for the newest Intel Xeon when running ERP, OLTP, or virtualization loads. Dell has made these kinds of high-reliability servers more accessible with the R810, and for that we have to congratulate them. It's too bad that Intel does not play along.
We feel Intel has missed an opportunity with the pricing of the Xeon 6500 series, which is high for a dual-CPU-capable server processor. Intel could make it easier for Dell to bring "mainframe RAS" to IT departments with smaller budgets. Right now, two server CPUs (X6550) can easily make up 35% of the cost of an R810 server, which is luckily still an affordable server. Those with Oracle or IBM software licenses don't care, but the rest of us do.
The situation is very different if we look at the quad Xeon servers. The Dell R910 in the midrange and the IBM X3950 servers in the high-end really bring the x86 server to new heights. For $50,000 it is possible to spec a 32-core quad Xeon X7560 system with 512GB of RAM. For comparison, for that kind of money, you'll get 16 Power 7 cores at 3GHz and only 64GB of RAM in an IBM Power 750 system. The Power 7 might still be the fastest server around, but the Xeon 7500 servers are a real bargain in this market.
The Xeon 7500 is not for the HPC and bandwidth-craving crowd; AMD has a better and, above all, cheaper alternative there. Likewise, the 7500 does not offer the price/performance or performance/watt that the popular dual-CPU servers offer. And there is one key market where we prefer the AMD Opteron 6100: data mining. AMD's twelve-core Opteron performs great here, and a very rare memory glitch should not be a disaster for data mining: you just restart your reporting query, and many people work on a copy of the production database anyway.
But for the rest, the Xeon 7500 series does what it is supposed to do. It out-scales the other Xeons in key applications such as ERP, OLTP, and heavy virtualization scenarios while offering RISC-like, or should we say Itanium-like, reliability. And looking at the published benchmarks, it is a winner in the Java benchmarks too. The Xeon 7500 is the most attractive expandable/MP Xeon so far, and the first one that can really threaten the best RISC CPUs in their home market.
I would like to thank Tijl Deneut of the Sizing Servers Lab (Dutch) for his help with the benchmarking.