Original Link: http://www.anandtech.com/show/2330



The clock has "ticked" and Intel has released a refresh to the quad-core Xeon line-up, code-named Harpertown. AMD has also finally released their quad-core Opteron, code-named Barcelona. Intel is on what they like to call a tick-tock release cycle of processors. Every "tick" is a refresh of the current architecture, and a "tock" represents a new architecture. AMD doesn't seem to be on any pattern of release cycles, and the Barcelona launch is a bit late and not as well organized as some of their previous product launches.


Harpertown will launch with clock speeds all the way up to 3.16GHz, and Intel will also ship two low voltage parts (2.3GHz and 2.6GHz). The rumor mill speculates that Intel may be able to reach 3.4GHz with the new 45nm process shrink. Barcelona, on the other hand, is launching at 2.0GHz, with speeds down to 1.7GHz. There will be three low voltage Barcelona parts at launch: 1.7GHz, 1.8GHz and 1.9GHz. Frankly, it's more than a bit disappointing that AMD wasn't able to launch at higher clock speeds; however, AMD plans to have higher-clocked parts toward year-end that will require only a few more watts to run.

For quite some time now Intel has been living the high life in the quad-core arena, even though both AMD and the media criticized it for gluing two dual-core processors together to create its quad-core product line. AMD has lost market share to Intel over the past couple of years, mostly due to the success of Intel's current Core architecture. One does wonder whether AMD sat too long on the Opteron before making headway on a new design or moving more quickly to quad-core. Yes, there was work happening, including an aborted architecture, but when you're fighting the reigning heavyweight, such mistakes can be costly. AMD has obviously had a rough year financially, but hopefully it is on the mend and Barcelona is the beginning of an upswing.

We've already looked at Barcelona in several previous articles, but Harpertown is the new kid on the block this week. That being the case, we'll start with a closer look at Intel's latest addition to their lineup.



What's new with the Harpertown Xeon

Although Harpertown represents a "tick" (a minor update, in Intel's nomenclature), a lot has changed. Harpertown not only includes a variety of micro-architecture changes, but it is also based on a 45nm manufacturing process. You'll notice that most of the tweaks Intel has introduced focus on keeping the processor from going out to main memory. Below are the main highlights of what's new:

45nm

The new Xeon is a 45nm part, which lowers power consumption, shrinks the die (or fits more transistors into the same area), and helps Intel reach higher clock speeds. Harpertown will top out at 3.2GHz at launch, but higher clock speeds are rumored to follow. For an in-depth look at Intel's 45nm process, see our earlier article on the subject.

1600MHz FSB

With the new Stoakley platform, the front side bus (FSB) now tops out at 1600MHz. This faster bus should help Intel hold off its bus bottleneck woes a while longer, until QuickPath (Intel's point-to-point interconnect, which will arrive alongside an on-die memory controller) makes its debut.
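The FSB is 64 bits (8 bytes) wide, and the "1600MHz" rating is really a transfer rate of 1600 million transfers per second, so peak bandwidth per bus is simple arithmetic. The sketch below compares it with the 1333MT/s bus used by earlier Xeons such as the X5365. It ignores protocol overhead, so these are theoretical peaks only, and the helper name `fsb_peak_gb_per_s` is our own.

```python
# Peak theoretical FSB bandwidth: transfers per second times bus width.
# Assumes the 64-bit (8-byte) data bus of Core-era Xeons; overhead is ignored.
BUS_WIDTH_BYTES = 8


def fsb_peak_gb_per_s(mega_transfers_per_s: int) -> float:
    """Peak bandwidth of one front side bus in GB/s (10**9 bytes per second)."""
    return mega_transfers_per_s * 1e6 * BUS_WIDTH_BYTES / 1e9


for rating in (1333, 1600):
    print(f"FSB-{rating}: {fsb_peak_gb_per_s(rating):.1f} GB/s per bus")
```

Moving from 1333 to 1600 MT/s raises the per-bus ceiling by about 20%, which is the headroom the platform buys until an integrated memory controller arrives.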

12MB L2

Each pair of cores shares 6MB of L2, bringing the total L2 cache to 12MB. Again, this lets Intel stay out of main memory as long as possible, which should increase performance.

New SSE4 Instructions

Harpertown includes the SSE4.1 subset of Intel's Streaming SIMD Extensions 4 (SSE4), 47 new instructions that Intel describes as the largest instruction set addition since the original SSE Instruction Set Architecture (ISA).
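To give a flavor of what SSE4.1 adds, the sketch below models the semantics of one of its instructions, DPPS (a dot product of four packed single-precision floats, exposed in C as the `_mm_dp_ps` intrinsic). This is a pure-Python illustration of the lane masking only, not how you would use the instruction in practice; single-precision rounding is not modeled, and the function name `dpps` is our own.

```python
from typing import Sequence


def dpps(a: Sequence[float], b: Sequence[float], imm8: int) -> list[float]:
    """Model of SSE4.1 DPPS: conditional dot product of four packed floats.

    Bits 4-7 of imm8 select which lane products enter the sum.
    Bits 0-3 select which destination lanes receive the sum; the rest get 0.0.
    """
    products = [a[i] * b[i] if imm8 & (1 << (4 + i)) else 0.0 for i in range(4)]
    # The hardware adds the products pairwise before the final add.
    total = (products[0] + products[1]) + (products[2] + products[3])
    return [total if imm8 & (1 << i) else 0.0 for i in range(4)]


# 0xF1: multiply all four lanes, write the sum to lane 0 only.
print(dpps([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], 0xF1))
```

On real hardware this whole computation is a single instruction, which is the kind of work SSE4.1 moves out of multi-instruction shuffle-and-add sequences.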

High-K Process Technology

In order to extend Moore's Law, Intel's new transistors use a gate stack that combines a high-k gate dielectric with metal gate electrodes. This increases transistor switching speed and sharply reduces gate leakage, which lowers power consumption and lets Intel continue to deliver faster processors that consume less power.




AMD Quad-Core Opteron (Barcelona)

The new quad-core Opteron from AMD is the first x86 quad-core processor to put all four cores on a single die. Barcelona is based on a 65nm fabrication process, and rumor has it AMD will move to a 45nm process late next year. Beyond putting four cores on one die, the new quad-core Opteron features several micro-architectural changes from the previous dual-core Socket-F platform. Below are the main highlights of the quad-core Opteron.

Independent Dynamic Core Technology

AMD has always been strong in performance per watt, and with this new technology AMD can alter the frequency of each core individually. This should allow AMD to control power more tightly, decreasing overall power consumption and lowering TCO.
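Why per-core frequency control saves power can be sketched with the standard CMOS dynamic-power approximation, P = C·V²·f per core. The constants below are hypothetical placeholders, not Barcelona specifications, and the sketch holds voltage fixed, so it only captures the frequency term; any voltage reduction would save more.

```python
# Toy model: dynamic power P = C * V**2 * f, summed over cores.
# Both constants are hypothetical placeholders, not measured Barcelona values.
EFFECTIVE_CAPACITANCE_F = 5.0e-9  # assumed switched capacitance per core
CORE_VOLTAGE_V = 1.2              # assumed, and held fixed for every core


def dynamic_power_w(core_freqs_ghz: list[float]) -> float:
    """Total dynamic power in watts for cores running at the given frequencies."""
    return sum(
        EFFECTIVE_CAPACITANCE_F * CORE_VOLTAGE_V**2 * f_ghz * 1e9
        for f_ghz in core_freqs_ghz
    )


all_fast = dynamic_power_w([2.0, 2.0, 2.0, 2.0])
one_busy = dynamic_power_w([2.0, 1.0, 1.0, 1.0])  # three lightly loaded cores slowed
print(f"all cores at 2.0GHz:                {all_fast:.1f} W")
print(f"one at 2.0GHz, three at 1.0GHz:     {one_busy:.1f} W")
print(f"relative dynamic power:             {one_busy / all_fast:.1%}")
```

With a single shared frequency, the three lightly loaded cores would have to run at 2.0GHz along with the busy one; in this toy model, letting them drop independently cuts dynamic power to 62.5% of the all-fast case.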

AMD CoolCore Technology

This technology determines which parts of the die (core logic, memory controller logic, or both) an application actually needs, and can cut power to unused transistor areas to reduce power consumption and heat generation.

SSE128

The previous generation Opteron's SSE execution units were 64 bits wide, so each 128-bit SSE operation was split into two halves and took two clock cycles. Barcelona's units are 128 bits wide and can execute a 128-bit SSE operation in a single clock cycle.
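A simplified throughput model makes the change concrete: a 128-bit operation needs two passes through a 64-bit-wide unit, but only one through a 128-bit-wide unit. The sketch assumes a single fully pipelined unit and ignores latency, scheduling, and the multiple FP pipelines real cores have; the function name is hypothetical.

```python
import math


def cycles_for_sse_ops(num_ops: int, datapath_bits: int, op_bits: int = 128) -> int:
    """Issue cycles for num_ops vector operations on one execution unit.

    Each operation occupies the unit for one cycle per datapath-wide chunk.
    Pipelining depth and latency are deliberately ignored.
    """
    passes_per_op = math.ceil(op_bits / datapath_bits)
    return num_ops * passes_per_op


print(cycles_for_sse_ops(1000, datapath_bits=64))   # 64-bit units (earlier Opteron)
print(cycles_for_sse_ops(1000, datapath_bits=128))  # 128-bit units (Barcelona)
```

Under these assumptions, peak 128-bit SSE throughput per clock doubles, which is why the change matters most for vectorized floating-point code.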

2MB L3 Shared Cache

In order to keep up with multi-threaded applications, AMD added 2MB of L3 cache that all four cores share. The cache breakdown for Barcelona is as follows: 128KB of L1 cache (64KB each for data and instructions) and 512KB of L2 cache per core, plus the 2MB shared L3.
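Tallying those figures, with 64KB each of L1 instruction and data cache per core, gives the raw on-die cache capacity of a Barcelona chip. This is a simple sum; it does not model how the levels overlap or how data moves between them.

```python
# Barcelona cache sizes in KB: 64KB L1 instruction + 64KB L1 data per core,
# 512KB L2 per core, and a 2MB L3 shared by all four cores.
L1_PER_CORE_KB = 64 + 64
L2_PER_CORE_KB = 512
L3_SHARED_KB = 2 * 1024
CORES = 4

per_core_private_kb = L1_PER_CORE_KB + L2_PER_CORE_KB
total_kb = CORES * per_core_private_kb + L3_SHARED_KB

print(f"private cache per core: {per_core_private_kb} KB")
print(f"total on-die cache: {total_kb} KB ({total_kb / 1024:.1f} MB)")
```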

Drop-in to existing Socket-F

Barcelona is based on the previous generation Socket-F platform, allowing most modern Socket-F servers (with a BIOS update) to drop in a quad-core Barcelona part. We can attest to this, as our Tyan S3992 board received a BIOS upgrade and was off to the races with Barcelona.




Quest Software Benchmark Factory

As we mentioned previously, the benchmarks we used in earlier articles are no longer useful, because we did not have the I/O capacity required to support them. We went looking for alternative benchmarks and stumbled upon Benchmark Factory from Quest Software. Below is a description of the product and the benchmarks we used in this article.

Benchmark Factory for Databases is a performance and code scalability testing tool that simulates users and transactions on the database and replays a production or synthetic workload in non-production environments. This enables organizations to validate database scalability as user loads increase, application changes are made, and platform changes are implemented. Benchmark Factory is available for Oracle, SQL Server, DB2, Sybase, MySQL and other databases via ODBC and Native connectivity.

Benchmark Factory provides many tests you can run, and has a very nice and customizable metric reporting engine. We decided to run the AS3AP test, and the Scalable Hardware CPU, Reads, and Mixed tests. Here is what Quest's help file says about these tests:

AS3AP

The AS3AP benchmark is an American National Standards Institute (ANSI) Structured Query Language (SQL) relational database benchmark. The AS3AP benchmark provides the following features:
  • Tests database processing power
  • Built-in scalability and portability that tests a broad range of database systems
  • Minimizes effort in implementing and running benchmark tests
  • Provides a uniform metric and straightforward interpretation of benchmark results
Systems tested with the AS3AP benchmark must support common data types and provide a complete relational interface with basic integrity, consistency, and recovery mechanisms. The AS3AP benchmark can test systems ranging from a single-user microcomputer Database Management System (DBMS) to a high-performance parallel or distributed database.

Scalable Hardware

The Scalable Hardware benchmark measures relational database systems. This benchmark is a subset of the AS3AP benchmark and tests the following:
  • CPU
  • Disk
  • Network
It can also test any combination of the above three components.

We run three iterations of each load point, and then average the results. We also monitor deviations to ensure they are within an acceptable range. We like to see a max deviation of +/- 3%.
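The averaging and acceptance check described above can be sketched as follows. The article does not say whether deviation is measured from the mean or between the fastest and slowest runs; this sketch assumes deviation from the mean, and the function name and sample numbers are hypothetical.

```python
from statistics import mean

MAX_DEVIATION = 0.03  # the +/- 3% acceptance threshold


def average_load_point(runs: list[float]) -> float:
    """Average the iterations of one load point.

    Raises ValueError if any run strays more than MAX_DEVIATION from the
    mean, signalling that the load point should be rerun.
    """
    avg = mean(runs)
    worst = max(abs(r - avg) / avg for r in runs)
    if worst > MAX_DEVIATION:
        raise ValueError(f"deviation {worst:.1%} exceeds +/-{MAX_DEVIATION:.0%}")
    return avg


# Hypothetical transactions/second from three iterations of one load point.
print(average_load_point([1010.0, 990.0, 1000.0]))
```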

Choosing the contenders

In previous articles, we've been asked to explain why we chose the parts we did. For this article, Intel sent us 3.0GHz Harpertown CPUs. We requested 3.0GHz Clovertown CPUs, which are 120W TDP parts, so that we could do a clock-for-clock comparison of Harpertown and Clovertown. We also tried to get 2.66GHz or 2.5GHz Harpertown CPUs, which would have been the closest cost match for the Opteron 2350, but none were available. We managed to acquire two of AMD's newest Opteron 2350s, and we requested 3.0GHz Opteron 2222 CPUs, which are the highest-clocked parts in the 95W TDP envelope. We did review results for the 3.2GHz Opteron 2224 SE (119W TDP), but its performance was only marginally better than the 2222's and its performance/watt was consistently lower, so we concluded it was of less interest for this article.



AMD System

The AMD system uses two 2.0GHz Barcelona (Opteron 2350) or two 3.0GHz Santa Rosa (Opteron 2222) processors mounted on a Tyan S3992 main board, with 8x1GB of DDR2-667 OEM memory. Internal cooling consists of five 3.5" fans and two CPU fans. Internal storage is provided by one WD1600YD hard drive, which is where the OS is installed.

Intel System

The Intel system uses two 3.0GHz Harpertown (Xeon E5472) or two 3.0GHz Clovertown (Xeon X5365) processors mounted on a Supermicro X7DWN+ main board, with 4x2GB 800MHz OEM FB-DIMMs. Internal cooling consists of three 3.5" fans. Internal storage once again comes from one WD1600YD hard drive with the OS installed.


Harpertown System

Barcelona System

RAID Storage
LSI Logic 8480E MegaRaid Controller
Promise VTRAK J300s SAS Chassis
12 x 146GB Fujitsu 15,000 RPM SAS Drives configured in RAID 0

Operating System/Software
Windows 2003 Enterprise x64 SP2
SQL 2005 Enterprise x64 SP2



AS3AP


This graph shows that at idle the Intel systems use approximately 28% more power than the AMD systems. This is most likely due in large part to Intel's use of FB-DIMMs, and the gap should only grow as more memory is added.
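Why the gap should widen with more memory can be sketched with a toy linear model. Each FB-DIMM carries an Advanced Memory Buffer (AMB) that draws power even at idle, so per-DIMM power is higher than for registered DDR2. Every wattage below is a hypothetical placeholder, not a measurement from this review, and the model uses the same DIMM count and the same non-memory base power for both platforms, unlike our actual test systems.

```python
# Hypothetical placeholder wattages: NOT measurements from this review.
BASE_IDLE_W = 200.0          # assumed platform idle power excluding memory
WATTS_PER_FBDIMM = 5.0       # assumed idle draw per FB-DIMM, AMB included
WATTS_PER_DDR2_DIMM = 2.0    # assumed idle draw per registered DDR2 DIMM


def idle_power_w(dimms: int, watts_per_dimm: float) -> float:
    """Idle power for a platform with the given DIMM count (toy linear model)."""
    return BASE_IDLE_W + dimms * watts_per_dimm


for dimms in (4, 8, 16):
    fb = idle_power_w(dimms, WATTS_PER_FBDIMM)
    ddr2 = idle_power_w(dimms, WATTS_PER_DDR2_DIMM)
    print(f"{dimms:2d} DIMMs: FB-DIMM {fb:.0f} W, DDR2 {ddr2:.0f} W, gap {fb / ddr2 - 1:.1%}")
```

Because the per-DIMM difference is multiplied by the DIMM count, both the absolute and relative gap grow as memory is added.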


For the first two load points it is very close but AMD is able to lead by a small margin. Once we hit load point three the Xeon E5472 is able to take the lead with the X5365 close behind. You can clearly see the improvement of Harpertown over Clovertown here as Harpertown is as much as 13% faster than an equivalently clocked Clovertown. Harpertown is also able to outpace Barcelona by as much as 27%. You can also see that Barcelona is able to outperform dual-core Opteron by as much as 38%, despite the much slower clock speed.
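The article does not spell out how "as much as X% faster" is computed. A natural reading, assumed in the sketch below, is (A / B − 1) at whichever load point shows the largest gap. The function name and throughput numbers are hypothetical, not data from these tests.

```python
def max_lead_pct(a: list[float], b: list[float]) -> float:
    """Largest per-load-point advantage of system a over system b, in percent.

    Computed as (a / b - 1) * 100 at each load point; a negative result
    means a never leads b.
    """
    return max((x / y - 1.0) * 100.0 for x, y in zip(a, b))


# Hypothetical throughput (transactions/s) at three load points.
system_a = [100.0, 200.0, 300.0]
system_b = [100.0, 190.0, 250.0]
print(f"A leads B by as much as {max_lead_pct(system_a, system_b):.0f}%")
```

Note that the ratio is not symmetric: A being 20% faster than B means B is about 17% slower than A, so the direction of the comparison matters.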


All quad-core processors exhibit similar CPU utilization for all load points. Again, we see the improvements of Harpertown as it consistently uses less CPU than Clovertown.


Keep in mind that the Xeon X5365 parts are 120W TDP and the Xeon E5472 parts are 80W TDP. Intel is getting much closer to competing with AMD on system power consumption but is still not there. Harpertown still uses as much as 14% more power than AMD.


In performance per watt, the AMD systems take the lead by as much as 17% at the lower load points, where the platforms are very similar in performance. From load point four and above, the Xeon E5472 pulls ahead slightly, ending with a maximum lead of 11%.
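Performance per watt is throughput divided by measured system power at each load point, which is how a platform with lower power draw can lead at light loads and still lose at heavy ones. The sketch below shows that crossover with made-up numbers for two hypothetical systems; it does not reproduce this review's data.

```python
def perf_per_watt(throughput: list[float], watts: list[float]) -> list[float]:
    """Throughput divided by measured system power at each load point."""
    return [t / w for t, w in zip(throughput, watts)]


# Hypothetical transactions/s and system watts at three load points.
eff_a = perf_per_watt([300.0, 600.0, 900.0], [250.0, 280.0, 300.0])
eff_b = perf_per_watt([310.0, 650.0, 1080.0], [300.0, 330.0, 350.0])

for point, (ea, eb) in enumerate(zip(eff_a, eff_b), start=1):
    leader = "A" if ea > eb else "B"
    lead = max(ea, eb) / min(ea, eb) - 1
    print(f"load point {point}: A {ea:.2f} vs B {eb:.2f} TPS/W, {leader} leads by {lead:.0%}")
```

System A draws less power throughout, which wins at light load; system B's higher throughput eventually outweighs its extra watts.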



Scalable CPU


Unlike the AS3AP benchmark where AMD is able to lead at the lower load points, Intel is able to dominate this benchmark. Again we see the improvements which Harpertown brings over Clovertown. The Xeon E5472 is able to lead the Opteron 2350 by as much as 59%. We also see that the quad-core 2350 does provide much more headroom than its dual-core 2222 sibling.


All quad-core systems exhibit similar CPU utilization with the dual-core Opteron ramping significantly quicker.


The Harpertown system is much more competitive on this benchmark with regards to power consumption. In fact, all quad-core parts are within 4-8% of each other.


With Intel's better performance and much closer power consumption on this benchmark, Harpertown is the clear leader at all load points. The Xeon E5472 is able to lead the Opteron 2350 by as much as 46%. Barcelona is competitive with Clovertown, but it has a long way to go to match Harpertown.



Scalable Mixed


Again, Intel is able to dominate the Opterons. Harpertown is noticeably faster than the same clocked Clovertown. Harpertown is also able to lead by as much as 56% over Barcelona.


Unlike in the previous tests, the quad-core CPUs do not show similar CPU utilization here: the Intel quad-core parts use a little more CPU than the AMD quad-core part.


The power results are similar to AS3AP, with the Xeon E5472 drawing more power than the Opterons.


Despite slightly higher power use, the performance advantage results in the Xeon E5472 dominating the results. It is as much as 41% better than the other offerings. Again, Barcelona is comparable to Clovertown in performance/watt.



Scalable Reads


This is yet another benchmark where Intel dominates and the Xeon E5472 performs better than the same clocked Xeon X5365. Harpertown is as much as 39% faster than Barcelona.


Here we see similar results to previous CPU utilization graphs.


Again, Intel is getting closer to AMD power use, but they're not quite there (and likely won't be while using FB-DIMMs).


Harpertown is once more the clear leader here, leading Barcelona by as much as 22%. In this test, however, Barcelona is able to surpass Clovertown's efficiency.



Conclusion

Finally we had an opportunity to pit two quad-core parts from the CPU giants against each other and see who has the better part. The question is, what makes a better processor? Is it how quickly it can accomplish a given workload? Is it how much performance it offers over how much it costs? Is it how much performance it offers over how much power it consumes? The answer is more than likely all of the above in some proportion.

Performance

Intel has made some successful changes to the quad-core Xeon that have helped it achieve as much as a 56% performance lead over the 2.0GHz Barcelona part. Of course, this is mostly due to Harpertown's 1GHz clock speed advantage, and the various micro-architecture tweaks surely help fill in the rest. It's clear that AMD has potential with Barcelona, and it will be extremely interesting to see where it ends up as clock speeds ramp. With 2.5GHz parts due out before the end of the year, the difference between AMD and Intel may not be all that great, barring any other announcements.
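A quick worked example shows how much of the 56% lead the clock gap explains, using only figures from this review: 3.0GHz versus 2.0GHz and the 56% maximum lead. The what-if for a 2.5GHz Barcelona assumes throughput scales linearly with clock speed, which is optimistic (memory and I/O rarely scale that way), so treat it as a best case for AMD rather than a prediction.

```python
# Figures from this review: the largest Harpertown lead and the two clocks.
max_lead = 1.56               # Xeon E5472 throughput / Opteron 2350 throughput
intel_ghz, amd_ghz = 3.0, 2.0

clock_ratio = intel_ghz / amd_ghz          # Intel's 1.5x clock advantage
per_clock_lead = max_lead / clock_ratio    # lead left after normalizing for clock
print(f"per-clock advantage: {per_clock_lead - 1:.0%}")

# Optimistic what-if: Barcelona throughput scales linearly with clock speed.
future_amd_ghz = 2.5
lead_vs_future = max_lead / (future_amd_ghz / amd_ghz)
print(f"lead over a linearly scaled 2.5GHz Barcelona: {lead_vs_future - 1:.0%}")
```

Under that assumption, roughly 4% of the lead remains per clock, and a 2.5GHz part would narrow the maximum gap to about 25%.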

Performance / Watt

AMD has always been extremely strong in performance/watt, especially at lower load levels and even more so at idle. Barcelona uses the least power at idle and comes close to the new Harpertown parts on AS3AP; however, thanks to its 1GHz clock advantage, Intel takes the lead on every other benchmark, particularly at higher loads. Again, AMD needs to ramp clock speed in order to compete with Intel, and it looks like that will happen over the next few months. The question is, will it come soon enough to start winning back market share?

Price

While Barcelona is still difficult to get hold of, the Opteron 2350 is expected to cost around $400. Harpertown is brand new, so we can't yet find retail prices, but the E5472 is expected to cost around $1000 when the 1600FSB parts launch. The new Harpertown E5430 (2.66GHz) is expected to cost close to $450, while the E5420 (2.5GHz) should cost closer to $320. FB-DIMMs carry a slight price premium over registered DDR2-667 ECC memory, but these days RAM prices are pretty much comparable. The bottom line is that for 2S systems, AMD appears to have a small pricing advantage at the low end (at least until Intel cuts prices). However, considering the overall cost of a well-equipped 2S server or workstation, saving a few hundred dollars for equivalent performance may not be enough to sway purchasing decisions.
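To put "a few hundred dollars" in perspective, the sketch below doubles each expected CPU price for a two-socket system and adds a hypothetical $5,000 for everything else (chassis, board, memory, disks). That figure is a placeholder, not a quote, and the helper name `cpu_cost` is our own.

```python
# Expected per-CPU prices quoted above, in USD.
CPU_PRICES = {
    "Opteron 2350": 400,
    "Xeon E5420": 320,
    "Xeon E5430": 450,
    "Xeon E5472": 1000,
}
SOCKETS = 2
OTHER_COMPONENTS_USD = 5000  # hypothetical cost of the rest of a 2S system


def cpu_cost(model: str) -> int:
    """CPU cost for a fully populated two-socket system."""
    return SOCKETS * CPU_PRICES[model]


for model in CPU_PRICES:
    cpus = cpu_cost(model)
    system = OTHER_COMPONENTS_USD + cpus
    print(f"{model}: CPUs ${cpus}, system ${system}, CPU share {cpus / system:.0%}")
```

With these assumptions, the CPU price difference between an Opteron 2350 pair and an E5430 pair is $100, a small fraction of the total system cost.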

Two weeks ago, AMD's standing in the IT world was definitely in question. Barcelona may not be the knockout punch that many were hoping for, but it definitely makes them far more competitive. The fact that Barcelona is a drop-in replacement for existing Socket-F systems certainly doesn't hurt, although we could say the same thing about Harpertown and existing Intel Core systems. There is of course one area where AMD still does have an advantage: 4-way and higher server configurations, where their Direct Connect topology has some distinct advantages that may not be overcome for quite some time. All we need now is to see how fast AMD can ramp up production and availability of Barcelona, and how far they are able to push clock speeds. It will still be difficult for them to gain market share, but at least they should be able to stop the bleeding and hopefully return to profitability.

Update: For those looking for more details and wondering why certain other chips aren't included, at the time testing was conducted we did not have any of the faster 2.5GHz Barcelona chips (or the slower Harpertowns). That situation has been remedied in terms of AMD's CPUs, and we will have some update articles looking at how the faster Barcelona compares with other processors. Stay tuned.
