Introduction

The age of multi-core is upon us, and the game of who has the highest clock speed has turned into who has the most cores (at least for now). Intel released Clovertown in Q4 of 2006, a bit ahead of its originally scheduled 2007 launch date. Obviously, the reason for the early launch was at least partially to ensure they were the first to market with quad core, ahead of rival AMD.

Clovertown is targeted at dual socket servers, typically in a 1-2U form factor. It launched with speeds up to 2.66 GHz, with 3.0 GHz on the horizon. Intel has also recently launched low voltage parts, which are rated at 50W and are clocked at 1.86 and 1.60 GHz.

So, what applications could benefit from eight cores? Today, the obvious choice is virtualization, although database servers, Exchange servers, and compute clusters would also be good candidates. Virtualization is the primary target for Clovertown; a rack of ESX servers running on 2U Clovertown boxes would consolidate a significant number of business applications in a relatively small footprint.

Last year, at an IBM technical conference, one of their senior technical representatives said the following: "In the coming years, the operating systems we use today will be merely applications running in a single operating system". Although you could say that's true today, it's only the beginning of what is going to be a complete shift in the traditional way we approach and think about "servers". Virtualization is growing at an exponential rate, and the shift to multi-core is only going to accelerate that growth.

Although a significant portion of Clovertown systems will be deployed in virtualized environments, some will be used in the more traditional single purpose server scenarios. However, there's something to keep in mind if you plan to throw eight cores at your database server or any other server that is I/O intensive. You have now increased your processing power at least twofold relative to a dual core configuration, and ensuring that your I/O subsystem is capable of keeping up with that extra processing power may be difficult. As you will read later in the article, we ran into significant issues with our test suite with eight cores and our I/O subsystem.

56 Comments

  • Viditor - Saturday, March 31, 2007 - link

    quote:

    Ah, right. I think that's part of what Ross was talking about when he discusses the difficulties in coming up with appropriate tests for these systems.

    In general I've been finding that at 4 cores, K8 and Clovertown run about the same...anything over that goes to AMD. Of course (as Ross points out) a lot of this assumes that the software can actually scale to use the 4 or more cores. For example, MySQL doesn't appear to scale at all...

    We can be fairly certain that Barcelona will easily beat out any of the quad core Intel chips...I say this because based on the tests that you and Johan have done, even if Barcelona used old K8 cores they should beat them. However, things will not stand still for long on this front...
    1. Barcelona is a transitional chip which won't be on the market for long. The "+" socketed K10s start coming out the following quarter with HT3, and the added bandwidth should be a nice boost.
    2. Penryn comes out almost immediately afterwards, and with a 1600 FSB and a much higher clockspeed, it might be able to catch up to the K10s (I think a lot will be determined by what clockspeeds AMD is able to get out of the K10 at 65nm).
    3. The most interesting (and closest) area will be the dual cores (where most of us will be living). Because the FSB bottleneck is nowhere near as bad at dual core level, I suspect that Penryn and K10 will be absolutely neck and neck here. This is where we will absolutely need to see benches...I don't think anyone can predict what will happen in the desktop area until Q3 (and the benchmarks) comes around.

    As to the power section of the review, you guys did a fine job based on what you had to work with. Certainly it has nothing to do with Barcelona (as you say), and my guess is that you guys are absolutely salivating to get a Barcy for just that reason (I know I can't wait for you to get one!).
    The power section is (IMHO) going to be a main event on that chip...I can't wait to see how well the split plane power affects the numbers during benchmarks!
    I would like to put in my vote now...when you get your Barcy, could you do a review that encompasses power for real-world server applications? By that I mean could we see what the power draw is during normal use as well as peak and idle...?

    Cheers, and thanks for the article!
  • ButterFlyEffect78 - Friday, March 30, 2007 - link

    I was also a little confused to see AMD outperforming its Intel counterparts, but then I asked myself how wide the gap will be when the K10 Opteron comes out. And then just imagine one more time having 2x quad K10 in a 4x4 setup...Godly power?
  • JarredWalton - Friday, March 30, 2007 - link

    Remember that the difference is four sockets vs. two sockets. AMD basically gets more bandwidth for every socket (NUMA), so that's why it's not apples to apples. Four dual core Opterons in four sockets is indeed faster in many business benchmarks than two Clovertowns in two sockets. Also remember that we are testing with 2.33 GHz Clovertown and not the 2.66 or 3.0 GHz models, which would easily close the performance gap in the case of the latter.

    Don't forget that four Opteron 8220 chips cost substantially more than two Xeon 5355 chips. $1600 x 4 vs. $1200 x 2. Then again, price differences of a few thousand dollars aren't really a huge deal when we're talking about powerful servers. $25000 vs. $27000? Does it really matter if one is 20% faster?
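The price figures above can be put side by side with a short sketch. It uses only the numbers quoted in the comment; the assignment of $27000 to the faster system and the 20% speedup are the comment's hypothetical, and all variable names are illustrative.

```python
# Figures quoted in the comment above; nothing here is measured data.
opteron_cpus = 4 * 1600   # four Opteron 8220 chips, in dollars
xeon_cpus = 2 * 1200      # two Xeon 5355 chips, in dollars
cpu_premium = opteron_cpus - xeon_cpus

# Hypothetical whole-system prices and speedup from the comment.
cheaper_system = 25_000
faster_system = 27_000
perf_ratio = 1.20         # assumption: the pricier system is 20% faster

price_ratio = faster_system / cheaper_system
perf_per_dollar_gain = perf_ratio / price_ratio - 1

print(f"CPU price premium: ${cpu_premium}")
print(f"System price premium: {price_ratio - 1:.0%}")
print(f"Performance per dollar change: {perf_per_dollar_gain:+.1%}")
```

Under those assumptions the faster box costs 8% more and still comes out roughly 11% ahead on performance per dollar, which is the point the comment is making.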

    One final point is that I've heard Opteron does substantially better in virtualized environments. Running 32 virtual servers on an 8-way Opteron box will apparently easily outperform 8-way Xeon Clovertown. But that's just hearsay - I haven't seen any specific benches outside of some AMD slides.
  • yyrkoon - Saturday, March 31, 2007 - link

    quote:

    Don't forget that four Opteron 8220 chips cost substantially more than two Xeon 5355 chips. $1600 x 4 vs. $1200 x 2. Then again, price differences of a few thousand dollars aren't really a huge deal when we're talking about powerful servers. $25000 vs. $27000? Does it really matter if one is 20% faster?


    Yes, and no. Having worked in a data center, you know these types of systems are often specialized for certain situations. That is why I said majority Intel, and a few high performance AMD. I really don't know what these people are getting riled up about . . .

    quote:

    One final point is that I've heard Opteron does substantially better in virtualized environments. Running 32 virtual servers on an 8-way Opteron box will apparently easily outperform 8-way Xeon Clovertown. But that's just hearsay - I haven't seen any specific benches outside of some AMD slides.


    I follow virtualization fairly closely, but I do not examine every_single_aspect. However, I can tell you right now that AMD at minimum does hold the advantage here, because their CPUs do not require a BIOS that is compatible with said technology; Intel CPUs currently do. As for the performance advantage, this could have to do with the Intel systems having to have their BIOS act as middle man. Also, last I read, FB-DIMMs were slower than DDR2 DIMMs, so perhaps this also plays a factor? Another thing: how many Intel boards out there support 4-8 processors? The reason I ask is that I haven't seen any recently, and this could also play a factor *shrug*
  • TA152H - Friday, March 30, 2007 - link

    It makes one wonder why the processors were compared in the first place. Did you guys throw processors in a hat and then pull them out and decide to benchmark them against each other? Why not throw a Tualatin in there just for kicks?

    OK, all sarcasm aside, does anyone actually think about these articles before they are written? It's not OK to put a disclaimer in to say you're making an unfair comparison, and then make it. I know it seems it is, but a lot of people don't read it, and there's an assumption, however false, that the article will be written with some common sense in it. A lot of people will see the charts, and that's what they'll base their reaction on. It is their fault, to some extent, but still, you know they're going to do it, and they're going to get the false impression, and it's thus your fault for spreading misinformation (it amounts to this, even though it is qualified).

    If you compared on cost, that would be one thing. If you compared in market segment, still fair, if you compared the only quad core to the only quad core, I wouldn't like it, but I'd still think it was at least supportable. But, when you compared a high end 8-way Opteron with a low end Clovertown, you get these reactions from people where they see AMD beating Intel and they consider it significant. Look at your responses, there is no doubt of this.

    I'm not saying Opterons are vastly inferior in every situation, or I should say Opteron based systems, only that this article gives a false impression of that because of how people react to articles. They don't read every little thing, they don't even remember it, and they often walk away with a false impression because of poor choices on the part of the reviewers. People are people, you need to deal with that, however annoying it can be. But, even then, the choices were remarkably poor in that they tell a lot less than one based on closer competitors would have. The best Clovertown versus the best Opteron. The same power envelope. The same cost system. All are better choices.

    I agree with your response, but the problem is, the charts speak a lot louder than responses, and disclaimers. That's why you put charts in, after all, to get attention to convey ideas effectively. When you do this with improper comparisons, I think you can see the inherent illogic in that while at the same time defending it by talking about disclaimers and such. Again, look at your responses here.

    I also think the FB-DIMMS have a lot to do with Intel's relatively poor performance, and I don't think this was emphasized enough. It would be interesting to try to isolate how much of a performance penalty they have, although I don't know if this could be done precisely. Intel seems intent on using them more and more, and I fear they are heading into another RDRAM situation, where it may be a very good technology, but they introduce it too soon where it shows disadvantages and people get a negative impression on it. Obviously, they aren't pushing it the way they did RDRAM, but it seems to come with a much greater performance penalty (the 840 actually performed as well as the 440BX, overall, although the single channel 820 was kind of poor) and the cost is pretty high too, although probably not as bad as RDRAM was.

    One last tidbit of information about virtualization, since it's mentioned in the article. It's kind of ironic that such a poor selling machine had so much advanced technology in it, but the IBM RT PC not only paved the way for RISC based processors, but also had virtualization even back in 1986. AIX ran on top of the VRM (Virtual Resource Manager). The VRM was a complete real-time operating system with virtual memory management and I/O subsystem. With it, you could run any OS on top of it, (in practice, it was only AIX), and in fact several at the same time. In fact, it went even further with the VMI, which had a well-defined interface for things like I/O control, processor allocation functions, etc... I'm not sure what my point is, except that most of the "new" stuff today isn't new at all. Intel was talking about multicores in the early 1990s, in fact. I guess the trace-cache and double pumped ALUs were new, but their end product didn't seem to work that great :P.
  • Jason Clark - Friday, March 30, 2007 - link

    First off, thanks for the feedback. We spent some time considering what to compare the Clovertown to, and ultimately made the decision to compare based on core count. Was it the right decision? We *think* so, but would have rather compared it to an equivalent part. Is it unfair? Yes. Do people skim read and make comments without having read the article? Sure. Would people have freaked out if we compared a Clovertown to a 4-way Socket F configuration? Absolutely.

    The decision becomes one based on levels of "unfair"; either decision would have been unfair, which makes it pretty darn difficult to choose. It's a shame people don't read before commenting, although aren't just about all facets of life full of this? Your comment about comparing cost is a good one, although do you really think, given that people don't read power consumption numbers, that they'd read a cost based graph? (Doubtful).

    The end game is that Intel made the right decision, and Clovertown is a great product because of that. We are as anxious as everyone else to see what happens with K8L, and then Penryn.
  • TA152H - Friday, March 30, 2007 - link

    Jason,

    I think the central premise of your remark is that it's not possible to choose completely equal setups, in this instance, and someone would cry foul regardless of your choices because it was not possible to make such firm selections. I am going to proceed with my response based on this premise, and I apologize in advance if I am misunderstanding you.

    I do agree with what you're saying, but on the other hand I think you could have made it much closer than it was. I don't agree you minimized the "unfair" factor as well as you could. In fact, the Opteron cost more, ran at much higher clock speeds, and used more power. I'm not even going to complain about FB-DIMMs, or the FSB limitations Intel systems have, because they are inherent to that design and I think are completely legitimate. The benefits of using a memory controller on the chipset are obvious enough in certain configurations (it's kind of odd no one ever brings them up, and simply says the FSB is purely bad, but did anyone ever notice how much bigger the caches on Intel's processors are? Hmmmm, maybe the saved space from not having a memory controller on board? How about video cards accessing memory? Do you really want to make them use the memory controller on the CPU, or add redundant logic to do it without it?). I'm not saying the memory controller on the chipset is better, overall, just that it has advantages that are almost never brought up. However, those matter less and less as lithographies shrink and the size of the memory controller becomes less significant.

    OK, back from the digression. I'm saying you should have compared something like a 2.66 GHz Clovertown to a 2.8 GHz Opteron setup. Or taken a lower Opteron to compare with a 2.33 GHz Clovertown. You should stick with the same segment. To put it another way: you might have people that have "x" dollars to spend on a server. So you'd make a valid comparison based on price. It won't work for everyone, but it will for that segment and the others can at least see the logic in it. Or, how about people that have a power bracket they have to go under. The same would apply to them, it would just be a different group (or some would fall into both). Or how about the guy that wants the fastest possible system and has a devil may care attitude towards energy, noise levels, and cost. Your comparison didn't relate well to any group I am aware of. As I mentioned, the Opteron uses more power, is more expensive, while the Clovertown does not represent the best Intel has for that segment.

    So, I'm not talking about adding another chart that says something about being cost based. I'm saying compare valid processors in the first place, based on something that will be useful to a segment, as aforementioned, and create a whole bunch of useful charts instead of creating less useful ones and adding a chart at the end to somehow illustrate why it is less useful. I agree, most people won't read it, or even pay much mind to it if they did. That's why I think it's more important to make an intrinsic change to the review, rather than compare unequal processors and show how they are.

    I'll try to preempt your most obvious response by saying I realize true equality is impossible in something like this. However, I think we can both agree that you could have gotten something closer than what was done. A lot closer, really.
  • JarredWalton - Friday, March 30, 2007 - link

    This is something that almost always comes up in our IT server type reviews. There are numerous facets of this that people never seem to take into consideration. For example, given the cost, there is absolutely no way that we are going to go out and purchase any of this hardware. That means we're basically dependent upon our industry contacts to provide us with appropriate hardware.

    Could we push harder for Intel to ship us different processors? Perhaps, but the simple fact of the matter is that Intel shipped us their 2.33 GHz Clovertown processors and AMD shipped us their 2.8 GHz Opteron processors. Intel also shipped us some of their lower clocked parts, which surprisingly didn't use much less power. Clovertown obviously launched quite a while ago, and Ross has been working on this article for some time, trying to come up with a set of reasonable benchmarks. Should we delay things further in order to try and get additional processors, or should we go with what we have?

    That's the next problem: trying to figure out how to properly benchmark such systems. FB-DIMMs have some advantages over other potential memory configurations, particularly for enterprise situations where massive amounts of RAM are needed. We could almost certainly come up with benchmarks that show Opteron being significantly faster, or go the other way and show Intel being significantly faster -- it's all dependent upon the benchmark(s) we choose to run. I would assume that Ross took quite a bit of time in trying to come up with some representative benchmarks, but no benchmark is perfect for all situations.

    Most of the remaining stuff you mention has been dealt with in previous articles at some point. Continuously repeating the advantages and disadvantages of each platform is redundant, which is why we always refer back to previous/related articles. We've talked about the penalties associated with using FB-DIMMs, we've talked about overall bus bandwidth, but in the end we're stuck with what is currently available, and speculating on how things might be improved by a different architecture is simply that: speculation.

    The final point that people always seem to miss is that price really isn't a factor in high-end server setups like this, at least to a point. In many instances, neither is power consumption. Let's take power as an example:

    In this particular testing, the quad Opteron system generally maxed out at around 500W while the dual Clovertown system maxed out at around 350W. 150W is certainly significant, but in the big scheme of things the power cost is not what's important. Let's just say that the company pays $.10 per kWh, which is reasonable (and probably a bit high). Running 24/7, the total power cost differential in a year's time is a whopping $131.49. If the system is significantly faster, no IT department is really going to care. What they really care about, in some instances, is power density -- performance per watt. A lot of data centers have a maximum amount of power that they can supply (without costly upgrades to the power circuitry), so if they need a large number of servers for something like a supercomputer, they will often want the best performance per watt.
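The arithmetic behind that yearly figure can be checked with a short sketch. It assumes the 150W difference applies around the clock, the $0.10 per kWh rate quoted above, and a 365.25-day year; the function name is illustrative only, and cooling overhead is not modeled.

```python
HOURS_PER_YEAR = 24 * 365.25  # assumption: average year length, leap days included

def annual_power_cost(watts, dollars_per_kwh, hours=HOURS_PER_YEAR):
    """Cost in dollars of drawing `watts` continuously for `hours`."""
    kwh = watts / 1000 * hours
    return kwh * dollars_per_kwh

# Peak draw quoted above: about 500W (quad Opteron) vs. about 350W (dual Clovertown).
delta_watts = 500 - 350
cost = annual_power_cost(delta_watts, 0.10)
print(f"${cost:.2f} per year")
```

With those inputs the result matches the $131.49 quoted above; a different electricity rate or a lower average load would scale it linearly.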

    Going back to price, that can be a very important factor in small to medium business situations. Once you start looking at octal core servers, or even larger configurations, typically prices scale exponentially. Dual socket servers are cheap, and single socket servers are now basically high-end workstations with a few tweaks. The jump from two sockets to four sockets is quite drastic in terms of price, so unless you truly need a lot of power in a single box many companies would end up spending less money if they purchased two dual socket servers instead of one quad socket server. (Unless of course the IT department leads them astray because they want to play with more expensive hardware....) So basically, you buy a $20,000 or more setup because you really need the performance.

    As I mentioned above, looking at the price of these two configurations that we tested, the quad Opteron currently would cost a couple thousand dollars more. If the applications that you run perform significantly better with that configuration, does it really matter that quad Opteron is a bit more expensive? On the other hand, as I just finished explaining, a large cluster might very well prefer slightly slower performance per server and simply choose to run more servers (performance per watt).

    While I am not the author of this article, I take it as a given that most of our articles are necessarily limited in scope. I would never consider anything that we publish to be the absolute definitive end word on any argument. In the world of enterprise servers, I consider the scope of our articles to be even more limited, simply because there are so many different things that businesses might want to do with their servers. The reality is that most businesses have to spend months devising their own testing scenarios in order to determine what option is the best for upgrades. That or they just ask IBM, Dell, HP, etc. and take whatever the vendor recommends. :|
  • TA152H - Friday, March 30, 2007 - link

    Jarred,

    Your remarks about power are off. I'm not sure if you guys are really understanding what I'm talking about, or are arguing just to argue. People see the performance charts, and assume you guys did a decent job of picking appropriate processors without reading the fine print. You didn't, you did a poor job of it, and people oftentimes miss that because they didn't take the time to read it. A lot of remarks are about that. So, many people will read your performance charts and assume they are reasonably comparable parts, when they are not. Taking almost 50% more power isn't a reasonable comparison, sorry. It's a terrible one.

    I'm not at all sure what you're talking about when you bring up the memory and the benchmarks. I had no complaints against them, and you are just stating the obvious when you say you can choose benchmarks that would make each processor look better. Benchmarks, taken alone, are always dangerous. People love things simplified, so they very much cling to them, but they never tell the whole story. So, I agree, but I have no idea why you bring it up.

    With regards to bringing stuff up that you have already, are you saying you've pointed out the advantages of the memory controller on the chipset? I don't see that stuff brought up much at all, and it was unrelated to this article. As I said, I digressed, but the impression I get from the hobbyist sites like this is that they all say the integrated memory controller is so much better and the FSB is perfectly horrible and nothing but a bad thing. It's simply not true, integrated memory controllers have been around a long time, and I almost laugh when I see idiots saying how Intel will be copying AMD in doing this. Like AMD was the first to think of it. Or the point to point stuff. It's so uninformed, it's both comical and annoying. Intel knows all about this, as does every company making designs, and they each make different tradeoffs. Was it a mistake for Intel to wait as long? I would say no, since the P8 walks away from the K8, and Intel obviously found really good uses for their transistor budget rather than put a memory controller on the chip. It's not like they don't know how, they obviously chose not to until the P9. One oddity with Intel chips is the odd number ones almost always suck. 186 anyone? 386 sucked bad, it wasn't any better than the 286 clock normalized, and its claim to fame was running real mode elegantly. Pentium wasn't that bad, but still wasn't great and got super hot, and most of the performance was from making the memory bandwidth four times greater. Pentium 4? Oh my. What were they thinking???? P9 has a long, and bad, history :P.

    The FB-DIMMs have been spoken about, that isn't what I was commenting on. I do think a lot of people are confused, even just comparing one dual core processor, how close the Opteron comes to the P8 based Xeons when they are used to seeing a much greater difference in the desktop parts. It's not just the benchmarks, the FB-DIMMs have serious performance handicaps. I don't think it's mentioned enough, although I agree a full description of it would be superfluous and unnecessary.

    My remarks about price were more in line with making an apples to apples comparison, where you compare things at the processor level so you could see comparative products. Price always matters, always, and always will, even at an end user level. I agree, the cost for four sockets is way too high for most applications, and thus they sell relatively poorly compared to dual socket systems. It's like comparing a Chevrolet Cavalier to a Dodge Viper on a multitude of tests, and then saying how the Viper costs more, and that's the nature of sports cars like that, so we should be OK with it. Bring out the Corvette so we can see a real comparison between the two companies, not lame excuses as to why you chose the wrong processors and how it really doesn't matter. They are just rationalizations, and no one will believe them. Cost does matter. Always has, always will. Why do you think IBM doesn't dominate the PC market anymore? They made great computers, but they cost those extra few hundred dollars more. Over 1000s of machines, it adds up. And even if it didn't, why wouldn't you compare the 2.66 with the 2.8 Opteron, which would have closer cost, and would actually illustrate what was available for those people that really needed that performance! You talk about how they need performance, and don't care about money, and then have a processor that is cheaper and doesn't have as good performance anyway! No contradiction there, huh?

    OK, now I love to argue, so I am putting this last. You should have ended your argument with your accessibility to products, and I would have had nothing to say and that would be that. You are right, you can't buy it and Intel should have sent you better parts, and I would guess you actually did try to get the 2.66 Clovertown. No argument there. It's just the stuff after it that made no sense. But, thanks for the explanation, despite all the arguments I think are invalid, you did make one that I can completely understand and makes sense. Really, Intel is kind of stupid for not sending the better parts and improving their image, and if you can't get them, you can't get them and shouldn't pay for them.
  • JarredWalton - Friday, March 30, 2007 - link

    I worked in an IT department for three years that bought extremely expensive servers that were often completely unnecessary. They just purchased the latest high-end Dell servers and figured that would be great. That's still what a lot of large enterprise businesses do. You mention desktops and IBM... and now who's bringing up irrelevant stuff? This article is only about servers, really, and a limited amount of detail at that.

    I also have no idea how you say my power remarks are "off". In what way? Cost? Importance? The calculations are accurate for the factors I stated. If a system is slower and cheaper and uses more power, it will take ages to overcome the power cost in this particular instance. 150W more on a desktop is a huge deal, when you're talking about systems that probably cost $1500 at most. 150W more on a $20000 server only matters if you are near your power capacity. The company I worked for (at that location) had a power bill of about $50,000 per month. Think they care about $150 per server per year? Well, they only had about a dozen servers, so of course not. A shop that has thousands of servers would care more, but if each server that used less power costs $3000 more, and they only used servers for three years, again it would only be a critical factor if they were nearing their peak power intake.
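The break-even reasoning in that paragraph can be sketched as follows. The $3000 premium, 150W saving, and three-year service life come from the comment; the $0.10 per kWh rate is carried over from the earlier power example, the year length is an assumption, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365.25  # assumption: average year length, leap days included

def annual_saving(watts_saved, dollars_per_kwh):
    """Dollars saved per year by drawing `watts_saved` less, running 24/7."""
    return watts_saved / 1000 * HOURS_PER_YEAR * dollars_per_kwh

def years_to_break_even(price_premium, watts_saved, dollars_per_kwh):
    """Years of continuous operation before power savings repay the premium."""
    return price_premium / annual_saving(watts_saved, dollars_per_kwh)

premium = 3000        # extra purchase cost of the lower-power server
service_years = 3     # how long the shop keeps a server

saved_over_life = annual_saving(150, 0.10) * service_years
break_even = years_to_break_even(premium, 150, 0.10)
print(f"Saved over {service_years} years: ${saved_over_life:.2f}")
print(f"Break-even after {break_even:.1f} years")
```

Under those inputs the power savings recover only a small fraction of the premium within the service life, which is why the comment treats power as decisive only when a site is close to its supply limit.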

    In the end it seems like you don't like our graphs because they don't convey the whole story. If people don't read graphs properly and don't have the correct background, they will not draw the appropriate conclusions. End of line. When we specifically mention 2.66 and 3.0 GHz parts, even though these weren't available for testing, that's about as much as we can do right now. If I were writing this article, I'm certain I would have said a lot more, but as the managing editor I felt the basic information conveyed was enough.

    The fact of the matter is that while Intel is now clearly ahead on the desktop, on servers it's not a cut-and-dried situation. AMD still has areas where they do well, but there are also areas where Intel does well. The two platforms are also wildly different at present, so any comparisons are going to end up with faults, short of testing about 50 configurations. I defend the article because people attack it, and I don't think they're looking at the big picture. Is the article flawed? In some cases, yes. Does it still convey useful information? I certainly think so, and that's why it was published.

    We have no direct financial ties to either company (or we have ties to both companies if you want to look at it from a different perspective), and every reason to avoid skewing results (i.e. sites like us live and die by reputation). Almost every server comparison I've seen in recent years ends up with some of the faults you've pointed out, simply because the hardware is not something review sites can acquire on their own. They either differ on cost, performance segment, features, or some other area. It's the nature of the business.