Original Link: http://www.anandtech.com/show/2201
Intel Clovertown: Quad Core for the Masses
by Jason Clark & Ross Whitehead on March 30, 2007 12:15 AM EST
The age of multi-core is upon us, and the game of who has the highest clock speed has turned into who has the most cores (at least for now). Intel released Clovertown in Q4 of 2006, a bit ahead of its originally scheduled 2007 launch date. Obviously, the reason for the early launch was at least partially to ensure they were the first to market with quad core, ahead of rival AMD.
Clovertown is targeted at dual socket servers, typically in a 1-2U form factor. It launched with speeds up to 2.66 GHz, with 3.0 GHz on the horizon. Intel has also recently launched low voltage parts, which are rated at 50W and are clocked at 1.86 and 1.60 GHz.
So, what applications could benefit from eight cores? Today, the obvious choice is virtualization, although database servers, Exchange servers, and compute clusters would also be good candidates. Virtualization is the primary target for Clovertown; a rack of ESX servers running on 2U Clovertown boxes would consolidate a significant number of business applications in a relatively small footprint.
Last year, at an IBM technical conference, one of their senior technical representatives said the following: "In the coming years, the operating systems we use today will be merely applications running in a single operating system". Although you could say that's true today, it's only the beginning of what is going to be a complete shift in the traditional way we approach and think about "servers". Virtualization is growing at an exponential rate, and the shift to multi-core is only going to accelerate that growth.
Although a significant portion of Clovertown systems will be deployed in virtualized environments, some will be used in the more traditional single purpose server scenarios. However, there's something to keep in mind if you plan to throw eight cores at your database server or any other server that is I/O intensive. You have now increased your processing power at least twofold relative to a dual core configuration, and ensuring that your I/O subsystem is capable of keeping up with that extra processing power may be difficult. As you will read later in the article, we ran into significant issues with our test suite with eight cores and our I/O subsystem.
Architecture & Roadmap
It's no secret that Clovertown isn't what the purists would call a "true quad core" architecture; it is two Woodcrest processors joined together in a single package. Does it matter? In our opinion, no. Clovertown performs very well, as you will see later in the article.
Clovertown is going to be with us for most of 2007 until the release of Penryn, which is essentially a die shrink to 45nm. It is doubtful we will see a "true" quad core Intel part until the next generation architecture, code-named Nehalem, is released in 2008. Below is the most recent roadmap we have for the server platform. The part marked "Future Processor" in the Xeon DP Platform and UP Platform is Nehalem. You can read more about Nehalem and Penryn in our recent article on that subject.
Clovertown at its heart is two Woodcrest parts connected together on a single package. Each pair of cores shares a single 4MB L2 cache, just like Woodcrest, and each pair of cores shares a single 1066/1333 MHz pipe. For most Woodcrest systems, Clovertown will be a drop-in replacement after a BIOS upgrade. We tested the Clovertown in a spare Supermicro board we had in the lab, and had no issues upgrading it from dual core to quad core. For a more in-depth analysis of the Clovertown architecture, check out Johan's very thorough write-up on Clovertown.
What do you compare Clovertown to? Since there are no other quad core solutions to compare it to, we were stuck with how to compare it. Do you compare it only to existing dual socket options, Woodcrest and Opteron? Do you compare it only to existing eight way options, quad socket Opteron? There is no perfect answer, but we decided that comparing it to the previous Intel solution, Woodcrest, allowed us to explore scalability of the quad core architecture vs. the dual core architecture.
We also decided to include quad socket Opteron numbers for reference. We recognize that comparing a quad socket server to a dual socket server is a bit like comparing apples and oranges, but we decided to provide the results regardless until we see K10 and can do a proper comparison of quad core technologies. Let's not lose track that we are comparing two different technologies with totally different cost structures and power consumption profiles.
Another problem we had was the additional processing power that Clovertown provided with two sockets and eight cores. We found that we could no longer run our previous benchmarks, Dell DVD Store and our Forums Benchmark, as we did not have enough I/O throughput to handle the additional processing power. In our lab we have a Promise VTrak J300s which is a 12 disk SAS chassis, but we found that using 12 disks was not enough for our old benchmarks. We estimated we needed approximately 36-48 disks to be able to continue running our OLTP benchmarks. We were not able to "obtain" the required chassis and spindles so we decided to change our benchmark suite.
We did consider several SSD flash based disks, but it seems there are more "announced" SSD flash based drives than there are "shipping" drives. Until we can significantly increase the I/O capacity in our lab we will no longer be running OLTP based benchmarks. Our preference to increase our I/O capacity would be an SSD solution, as it would not require spinning dozens of drives in several chassis but... neither is available to us at this time.
Quest Software Benchmark Factory
We mentioned that the benchmarks we previously used were no longer useful, as we did not have the I/O capacity required to support them. We went looking for alternative benchmarks, and stumbled upon Benchmark Factory from Quest Software. Below is a description of the product and the benchmarks we used in this article.
Benchmark Factory for Databases is a performance and code scalability testing tool that simulates users and transactions on the database and replays a production or synthetic workload in non-production environments. This enables organizations to validate database scalability as user loads increase, application changes are made, and platform changes are implemented. Benchmark Factory is available for Oracle, SQL Server, DB2, Sybase, MySQL and other databases via ODBC & Native connectivity.
Benchmark Factory provides many tests you can run, and has a very nice and customizable metric reporting engine. We decided to run the AS3AP test, and the Scalable Hardware CPU, Reads, and Mixed tests. Here is what Quest's help file says about these tests:
The AS3AP benchmark is an American National Standards Institute (ANSI) Structured Query Language (SQL) relational database benchmark. The AS3AP benchmark provides the following features:
- Tests database processing power
- Built-in scalability and portability that tests a broad range of database systems
- Minimizes effort in implementing and running benchmark tests
- Provides a uniform metric and straightforward interpretation of benchmark results
The Scalable Hardware benchmark measures relational database systems. This benchmark is a subset of the AS3AP benchmark and tests the following:
- Any combination of the above three entities
The Clovertown system was supplied by Intel, a standard white box Starlake based platform. We outfitted the system with 8x2GB FB-DIMM 667MHz modules from Micron (MT18HTF25672FDY-667). A single SATA drive was used for the OS, and the system was powered by the OEM supply it came with. We would have liked to use the same power supply for both systems; however, the Clovertown board requires a special 4-pin cable that the Enermax supply doesn't have. Note also that we were not able to get samples of the fastest currently shipping Clovertown processors, and as such the Intel quad core configuration is not the best performing Intel CPU available. (2.66 and now 3.00 GHz versions are available.)
Quad Socket Opteron System
Our contacts over at Tyan provided us with their Thunder n4250QE, which can support up to eight sockets with the M4985 add on module. We outfitted the Thunder with 16x1GB Kingston KVR667D2D8P5/1G 667MHz DDR2 modules. A single SATA drive was used for the OS, and we powered the board with an Enermax Galaxy 1000W.
RAID Storage for both systems
LSI Logic 8480E MegaRaid Controller
Promise VTRAK J300s SAS Chassis
12 x 146GB Fujitsu 15,000 RPM SAS Drives configured in RAID 0
Windows 2003 Enterprise SP1 x64
SQL 2005 Enterprise x64 SP1
Quest Benchmark Factory v5.0 Build 296
Dual Opteron 2220 using a Tyan S3992 motherboard with 8GB of memory
Dual Woodcrest 3.0GHz using a Tyan S2696 motherboard with 8GB of memory
Dual Clovertown 1.86 using a SuperMicro X7DBE+ motherboard with 2GB of memory
We'd like to thank Claudia Martinez over at Kingston, Kelly Sasson over at Micron, Randy Saucedo over at Quest Software, and a big thanks to Frank Chang and our friends over at Tyan for supplying the motherboard for the quad socket Opteron and for helping us out with an additional test client motherboard.
The 2.33 GHz Clovertown shows a 32% scaling over the 3.0 GHz Woodcrest, not too bad considering the drop in clock from 3.0 GHz to 2.33 GHz. It is interesting to note that the performance of the 1.86 GHz Clovertown is comparable to the 3.0 GHz Woodcrest, which again shows very strong scaling with double the cores and almost half the clock. Ignoring the apples to oranges comparison, the quad Opteron is able to outperform the Clovertown 2.33 GHz by as much as 21%.
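To put the 32% figure in context, here is a rough sketch of how far it sits from a naive ceiling. The "ideal" model below (throughput proportional to cores times clock) and the function names are our own illustrative assumptions, not part of the benchmark; real workloads also depend on cache, bus, and I/O behavior.

```python
def ideal_speedup(cores_new, ghz_new, cores_old, ghz_old):
    """Naive upper bound: assumes throughput scales linearly with cores x clock."""
    return (cores_new * ghz_new) / (cores_old * ghz_old)


def scaling_efficiency(measured_speedup, ideal):
    """Fraction of the naive ceiling that was actually achieved."""
    return measured_speedup / ideal


# Dual socket Clovertown 2.33 GHz (8 cores) vs. dual socket Woodcrest 3.0 GHz (4 cores)
ideal = ideal_speedup(8, 2.33, 4, 3.0)
measured = 1.32  # the peak 32% lead quoted in the text
efficiency = scaling_efficiency(measured, ideal)

print(f"ideal speedup:      {ideal:.2f}x")
print(f"measured speedup:   {measured:.2f}x")
print(f"scaling efficiency: {efficiency:.0%}")
```

Under these assumptions, the 2.33 GHz part captures most, but not all, of what doubling the cores at a lower clock could theoretically deliver.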
It's no surprise that Woodcrest ramps up the CPU utilization the quickest. What is surprising is that the Opteron is not able to go higher than 86% utilization. We can only speculate that this high transaction benchmark has uncovered a bottleneck in the Opteron architecture.
AS3AP Power Numbers
All of the Intel products are 80W parts, and there are only two sockets on each platform. Both Intel configurations exhibit very similar power consumption profiles. On the other hand, the Opteron is a 95W part and there are four sockets on the 8220 platform, thus it uses more power.
With the power of the Intel products being almost identical, the 2.33 GHz Clovertown is able to translate its 32% lead in performance over Woodcrest into a 33% lead in Performance/Watt.
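As a quick sanity check on how those two percentages relate, the sketch below treats Performance/Watt as transactions per second divided by watts. The helper names are hypothetical, and because both figures are quoted peak ("as much as") values that may come from different load points, the result is only indicative.

```python
def perf_per_watt(transactions_per_sec, watts):
    """Performance/Watt as used in this sketch: throughput divided by power draw."""
    return transactions_per_sec / watts


def implied_power_ratio(perf_ratio, perf_per_watt_ratio):
    """If ppw = perf / power, then power_ratio = perf_ratio / ppw_ratio."""
    return perf_ratio / perf_per_watt_ratio


# Clovertown 2.33 GHz relative to Woodcrest 3.0 GHz, using the figures quoted above
power_ratio = implied_power_ratio(perf_ratio=1.32, perf_per_watt_ratio=1.33)

print(f"implied Clovertown/Woodcrest power ratio: {power_ratio:.3f}")
```

A ratio this close to 1.0 is consistent with the observation that the two Intel platforms draw almost identical power.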
Scalable Hardware CPU Transactions/sec
In this test, the 2.33 GHz Clovertown is able to outperform Woodcrest by as much as 51%. We also see that the 1.86 GHz Clovertown is as much as 19% slower than the 2.33 GHz part, which demonstrates very good scaling considering the clock is 20% slower. The Opteron is not as dominant in this benchmark but still averages a 10% lead.
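The clock comparison between the two Clovertown parts can be checked with a few lines of arithmetic. This is a minimal sketch; the function and variable names are ours, and the 19% figure is the peak deficit quoted above rather than an average.

```python
def relative_deficit(slower, faster):
    """How much smaller `slower` is than `faster`, as a fraction of `faster`."""
    return 1.0 - slower / faster


clock_deficit = relative_deficit(1.86, 2.33)  # clock speeds in GHz
perf_deficit = 0.19                           # peak performance gap from the text

# Ratio near 1.0 means performance tracks clock speed almost linearly.
clock_sensitivity = perf_deficit / clock_deficit

print(f"clock deficit:       {clock_deficit:.1%}")
print(f"performance deficit: {perf_deficit:.1%}")
print(f"clock sensitivity:   {clock_sensitivity:.2f}")
```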
Again, Woodcrest ramps up very quickly, and then we see the other three with very similar CPU utilization profiles.
Scalable Hardware CPU Power Consumption
This test produces a similar power consumption graph with the Intel products all together and consistent and the Opteron a little hungrier due to having four sockets/processors instead of two, and 95W/socket instead of 80W/socket.
In this graph we see that Clovertown is very close to Woodcrest for the first three load points, but after that Woodcrest runs out of steam and Clovertown is able to translate its 51% higher Transactions/Second rate into a 51% higher Performance/Watt rating.
Scalable Hardware Mixed Transactions/sec
In this test we see the 2.33 GHz Clovertown and Opteron performing almost identically, with the 1.86 GHz Clovertown and Woodcrest falling behind by as much as 20% and 36%, respectively.
We get similar results as other tests with Woodcrest ramping first and the others similarly grouped.
Scalable Hardware Mixed Power Consumption
Again, we get similar power consumption results to the previous tests.
Clovertown continues to demonstrate excellent scaling over Woodcrest by as much as 42%.
Scalable Hardware Reads Transactions/sec
The results of this test produce a different profile. We see that the Clovertown products are similar, except for the 24% difference in performance. The 2.33 GHz Clovertown is able to outperform Woodcrest by as much as 42%, again nice scaling considering the clock difference. It is interesting to note that the Opteron took a dive at load point 3; we are not sure why this is, but all three executions were within 1.7% of each other.
Again we see Woodcrest ramping CPU the fastest, and also see a CPU dip at load point 3 for the Opteron which matches the performance drop we saw in the Transactions/Second graph.
Scalable Hardware Power Consumption
The power consumption results continue to be similar to the previous tests.
Here, the 2.33 GHz Clovertown is able to produce a Performance/Watt result that is as much as 42% higher than Woodcrest and 26% higher than the 1.86 GHz Clovertown.
As we stated earlier on, it was difficult to review Clovertown without having K10 (Barcelona) or other competing quad core technology. Intel was first to market with quad core in a single package, and with very solid performance and power consumption figures. The purists were skeptical about Intel taking their approach to quad core, and we believe Intel has proved them wrong. In a recent article over at The Register, an AMD Executive VP apparently said, "If I could do something different, I wish we would have immediately done an MCM - two dual cores and call it a quad core - because, I guess, the market sucks it up."
The next 6-9 months are going to be very interesting with the arrival of Barcelona and Penryn. It looks like K10 will be first to market and we already know it will be delivered with maximum clocks in the 2.1 GHz - 2.3 GHz range. Intel just announced price cuts and availability of 3.0 GHz Clovertown CPUs. As for Penryn, how much of a clock bump, extra cache, and power savings is Intel going to be able to realize from the shrink to 45nm? If nothing else, it should allow Intel to deliver its higher clock CPUs with more cache in TDP envelopes lower than the existing 120W for the 2.66 GHz Clovertown.
As we said earlier this week in our Penryn coverage, it's too early to say for sure which parts will come out on top. AMD is stating that their native quad core solution will be faster clock for clock than anything Intel has at the same time. That's a great marketing statement, but with clock estimates of 2.30GHz for Barcelona and 3.20GHz for Penryn, such statements don't really tell us much. In order to remain at parity when comparing the top performing parts, AMD will apparently need to be about 40% faster clock for clock, and they haven't put forth that claim yet. Needless to say, we will be eagerly awaiting actual hardware for testing from both companies, and very likely we will end up with a situation where each platform will have certain applications where it excels.
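The "about 40%" figure follows directly from the two estimated clocks. Here is a minimal sketch of that arithmetic; the function name is hypothetical, and it assumes performance is simply per-clock throughput times clock speed, which ignores core counts, cache sizes, and platform differences.

```python
def required_per_clock_advantage(rival_ghz, own_ghz):
    """Per-clock lead needed to match a rival that runs at a higher clock.

    Assumes performance = per-clock throughput x clock speed.
    """
    return rival_ghz / own_ghz - 1.0


# Estimated top clocks from the text: Penryn at 3.20 GHz, Barcelona at 2.30 GHz
advantage = required_per_clock_advantage(rival_ghz=3.20, own_ghz=2.30)

print(f"per-clock advantage needed for parity: {advantage:.0%}")
```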