Intel Woodcrest, AMD's Opteron and Sun's UltraSparc T1: Server CPU Shoot-out
Date: June 7th, 2006
Topic: IT Computing
Manufacturer: Various
Author: Johan De Gelas

Secure Sockets Layer RSA Performance

Secure Web communication is possible through the Secure Sockets Layer (SSL) protocol. Using the command "openssl speed rsa", we can measure the number of RSA private key operations (signs) that a system can perform per second.

While "openssl speed rsa" is sufficient to test the Xeons and Opterons, the Sun T1 can speed up the Rivest Shamir Adleman (RSA) and Digital Signature Algorithm (DSA) encryption and decryption operations needed for SSL processing, thanks to a modular arithmetic unit (MAU) that supports modular exponentiation and multiplication. Each T1 core has a MAU, so one 8-core T1 has 8 MAUs. To make use of those 8 MAUs, you have to run the SSL calculations through the Solaris Cryptographic Framework (SCF). To test the T1 with the MAUs crunching at full speed, we used the command "openssl speed -engine pkcs11 rsa". The Solaris 10 OS also provides in-kernel SSL termination, offering greater security than SSL termination outside the kernel.
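To make concrete what a modular arithmetic unit accelerates: an RSA sign boils down to a modular exponentiation, which in turn decomposes into a long series of modular multiplications. The Python sketch below shows plain left-to-right square-and-multiply. The function name `mod_exp` and the textbook-sized key are illustrative assumptions, not Sun's or OpenSSL's implementation; real keys are 1024 bits or longer, and real libraries use faster techniques.

```python
def mod_exp(base, exponent, modulus):
    """Left-to-right square-and-multiply: base**exponent mod modulus."""
    result = 1
    base %= modulus
    for bit in bin(exponent)[2:]:                # exponent bits, most significant first
        result = (result * result) % modulus     # one modular squaring per bit
        if bit == "1":
            result = (result * base) % modulus   # one extra modular multiply per set bit
    return result

# Toy RSA key for illustration only: n = 61 * 53, and e * d = 1 mod 3120
n, e, d = 3233, 17, 2753
message = 65

signature = mod_exp(message, d, n)     # "sign": exponentiate with the private exponent
recovered = mod_exp(signature, e, n)   # "verify": exponentiate with the public exponent
print(recovered == message)
```

Every pass through the loop is a modular multiplication on numbers as wide as the key, which is the operation the MAU offloads from the general purpose pipeline.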

We included the HP DL585 to see whether 8 cores of complex general purpose CPUs (Opteron 880) can keep up with the 8 MAUs of the Sun T1. If you want to compare Woodcrest and the Opteron, you should check the 2 and 4 concurrency numbers. You can find our 1024-bit numbers in the graph below. One thread per core is optimal; we nevertheless tested the DL585 with up to 16 threads to show that the peak is attained at 8 threads. Likewise, the Xeon Irwindale was tested with 8 threads to show that 4 threads (4 logical cores) is optimal, and so on.
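A sweep over concurrency levels like the one described here could be scripted roughly as follows. This is a sketch under assumptions: the article does not say how the parallel runs were launched, and the use of the `-multi` option of "openssl speed" (which runs several benchmark processes in parallel) is our guess, as is the helper name `speed_command`. Only the "-engine pkcs11" part comes from the text.

```python
import shlex

def speed_command(workers, engine=None):
    """Build an `openssl speed` command line for RSA with `workers` parallel runs."""
    cmd = ["openssl", "speed"]
    if engine is not None:
        cmd += ["-engine", engine]          # "pkcs11" routes the work through the SCF
    if workers > 1:
        cmd += ["-multi", str(workers)]     # assumption: concurrency via -multi
    cmd.append("rsa")
    return cmd

# Print the command for each concurrency level instead of running it,
# so the sketch works on machines without OpenSSL or a pkcs11 engine.
for workers in (1, 2, 4, 8, 16, 32):
    print(shlex.join(speed_command(workers)))

print(shlex.join(speed_command(32, engine="pkcs11")))
```

To actually run a command, the list returned by `speed_command` can be passed to `subprocess.run`.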



Notice that the 8 MAUs of the Sun T1 only come fully into action once we fire off 32 "SSL RSA signing" threads. Once that happens, the little 1 GHz T1 is able to keep up with the massive 2.4 GHz 8-core DL585. Without the MAUs, the T1 is as fast as a 1.8 GHz Xeon Irwindale. It is thus very important to check that your favorite web server works with SCF if you want to run your secure web services on the Sun T2000.

It looks like we've discovered the first "weakness" of the new Core architecture, although it is rather insignificant to most people: decryption and encryption. The Opteron at 2.4 GHz has no trouble keeping up with the 3 GHz Woodcrest. This might be a result of the fact that the Woodcrest can only perform one rotate per cycle, while the Opteron can do 3. Although the RSA algorithm doesn't really use rotations, the hash algorithms needed to sign or encrypt a key do. However, the most important reason is probably that the Opteron can sustain 2 ADC (Add with Carry) instructions per clock cycle, while Woodcrest can only do one. As ADC accounts for about 17% of the instruction mix of the RSA algorithm, this might be enough to negate the extra integer power (memory disambiguation, 4-wide decode ...) that the Woodcrest has.
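Why ADC shows up so often in RSA code: the keys are far wider than a machine register, so every big-number addition is carried out limb by limb, with the carry of one limb feeding the next. The Python sketch below mimics such a carry chain. The 32-bit limb width and the helper names are assumptions made for illustration; the benchmark binary may well use 64-bit limbs.

```python
MASK32 = 0xFFFFFFFF

def add_limbs(a, b):
    """Add two equal-length little-endian lists of 32-bit limbs."""
    out, carry = [], 0
    for x, y in zip(a, b):
        total = x + y + carry         # what one ADC computes: x + y + carry-in
        out.append(total & MASK32)    # the low 32 bits stay in this limb
        carry = total >> 32           # the carry-out feeds the next limb
    return out, carry

def to_limbs(value, count):
    """Split an integer into `count` little-endian 32-bit limbs."""
    return [(value >> (32 * i)) & MASK32 for i in range(count)]

def from_limbs(limbs):
    """Reassemble an integer from little-endian 32-bit limbs."""
    return sum(limb << (32 * i) for i, limb in enumerate(limbs))

# A 1024-bit addition needs 32 limb additions, each depending on the previous carry.
a, b = (1 << 1024) - 1, 1
limbs, carry = add_limbs(to_limbs(a, 32), to_limbs(b, 32))
assert from_limbs(limbs) + (carry << 1024) == a + b
```

Because each step waits for the carry of the previous one, the rate at which a CPU can issue ADC instructions directly limits this kind of loop.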

Also notice that the previous NetBurst architecture, represented by the Xeon Irwindale, does very badly. The reason is that the P4 doesn't have a barrel shifter, a circuit in the chip which can shift or rotate any number in one clock cycle. Without this shifter, rotates and shifts take much longer, resulting in high latency. Most x86 code couldn't care less, but most encryption code makes heavy use of rotates or shifts or both. We also did a quick test with Hyper-Threading on and off. In this case, Hyper-Threading sped up the encryption (signs/s) by 20 to 28%.
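For readers who have not met the operation: a rotate is a shift in which the bits pushed out at one end re-enter at the other. A minimal Python sketch of a 32-bit rotate left follows; the name `rotl32` is ours. Hash functions such as SHA-1 apply this operation in every round, which is why rotate latency matters for them.

```python
def rotl32(value, count):
    """Rotate a 32-bit word left by `count` bits."""
    count %= 32
    value &= 0xFFFFFFFF
    # Bits shifted out on the left come back in on the right.
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF

print(hex(rotl32(0x12345678, 8)))
```

A barrel shifter produces this result for any `count` in a single step, which is the capability the paragraph above says the P4 lacks.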

To end the RSA sign/s benchmark, we'll make a quick comparison between four 2.4 GHz AMD Opteron cores, four 3 GHz Intel Xeon Woodcrest cores, and Sun's T1 with MAU enabled, across different RSA key lengths.

RSA Encryption (Signs/s)

Key length   Opteron 2.4 GHz   Xeon 5160 3 GHz   Sun T1 with MAU
             (4 threads)       (4 threads)       (32 threads)
512 bit      19003             21194             35613
1024 bit     6098              6240              10722
2048 bit     1145              1087              1918
4096 bit     185               164               1


Notice that the hardware acceleration of the T1 does not work beyond 2048-bit keys. Considering that most secure applications use 1024-bit and only a few "high security" ones use 2048-bit, this is not an issue.
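The clock speed argument made earlier can be checked against the table by normalizing the x86 results per core and per GHz. The short Python sketch below does that arithmetic. The figures are copied from the table; the divisor of 4 cores assumes one benchmark thread per core, as described in the text, and the helper name `per_core_ghz` is ours.

```python
# Signs/s copied from the table: (Opteron 2.4 GHz, Xeon 5160 3 GHz), 4 threads each
SIGNS_PER_SEC = {
    512: (19003, 21194),
    1024: (6098, 6240),
    2048: (1145, 1087),
    4096: (185, 164),
}
CLOCK_GHZ = (2.4, 3.0)
CORES = 4  # assumption: one benchmark thread per core

def per_core_ghz(rate, ghz, cores=CORES):
    """Signs per second, per core, per GHz."""
    return rate / (cores * ghz)

for bits, rates in SIGNS_PER_SEC.items():
    opteron, xeon = (per_core_ghz(r, g) for r, g in zip(rates, CLOCK_GHZ))
    print(f"{bits:>4} bit  Opteron {opteron:6.1f}  Xeon {xeon:6.1f}  ratio {opteron / xeon:.2f}")
```

At 1024 bits this works out to roughly 635 signs/s per core per GHz for the Opteron against 520 for the Woodcrest, so clock for clock the Opteron is ahead at every key length in the table.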

When doing verifies as opposed to signs, the server has to authenticate the identity of the client. This is a lot less intensive, so we'll show you the verifies per second numbers at 2048 bits. At 1024 bits, both the Woodcrest and Opteron were able to verify more than 50000 keys per core, and that is a hard limit of the OpenSSL benchmark.
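A rough way to see why verifying is so much cheaper than signing is to count modular multiplications. A verify raises a number to the short public exponent, while a sign uses a private exponent about as long as the key. The Python sketch below counts operations for plain square-and-multiply. The public exponent 65537 is a common choice but is not stated in the article, and the 2048-bit value is only a stand-in with half of its bits set, not a real private key. Real libraries use optimizations that change the absolute numbers.

```python
def modmul_count(exponent):
    """Modular multiplications for plain left-to-right square-and-multiply,
    not counting the trivial operations on the leading bit."""
    squarings = exponent.bit_length() - 1          # one squaring per bit after the first
    multiplies = bin(exponent).count("1") - 1      # one multiply per further set bit
    return squarings + multiplies

public_exponent = 65537                    # assumed: common choice, 2**16 + 1
private_stand_in = int("10" * 1024, 2)     # 2048 bits, every second bit set (illustrative)

verify_ops = modmul_count(public_exponent)
sign_ops = modmul_count(private_stand_in)
print(verify_ops, sign_ops, round(sign_ops / verify_ops))
```

Under these assumptions a sign needs on the order of a hundred times more modular multiplications than a verify, which is consistent with verifies being "a lot less intensive".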



Again, the Opteron takes the lead. The Sun T1, even with the 8 MAUs, is only half as fast as four Opterons or Woodcrests, but this is hardly an issue. Encrypting or signing will slow down a server much more quickly than verifying keys.

Both the verifies/s and signs/s benchmarks are rather synthetic. It is much more realistic to test with a real web server running SSL, and that is what we are currently doing. We followed Sun's instructions to enable RSA hardware acceleration for Apache, but for some reason, the Apache web server is still not making use of the Solaris Cryptographic Framework. So our web server SSL test is a work in progress.


91 Comments - Last by duploxxx, 1331 days ago
Price. by MrKaz, 1343 days ago
How much will it cost?

If Conroe XE 2.9Ghz is 1000$.
Then I assume that this will cost more.

I think it looks good, but it will depend a lot on the final price.

Also does that FBdimm have a premium price over the regular ones?

RE: Price. by zsdersw, 1343 days ago
Umm.. no. Woodcrests won't cost $1000. Xeons have always cost less than the EE chips.

It does look good, but by BaronMatrix, 1343 days ago
why are we running servers with only 4GB RAM? I have that in my desktop. Not to nitpick, but I think you should load up 16GB and rerun the tests. If not, this is a low end test, not HPC. I saw the last Apache comparison and it seems like the benchmark is different. Opteron was winning by 200-400% in those tests. What happened?

RE: It does look good, but by JohanAnandtech, 1343 days ago
Feel free to send me 12 GB of FBDIMMs. And it sure isn't an HPC test, it is a server test.

"I saw the last Apache comparison and it seems like the benchmark is different. Opteron was winning by 200-400% in those tests. What happened? "

A new Intel architecture called "Core" was introduced :-)

RE: It does look good, but by BaronMatrix, 1343 days ago
I didn't say the scores, I said the units in the benchmark. I'm not attacking you. It just stuck out in my head that the units didn't seem to be the same as the last test with Paxville. By saying HPC, I mean apps that use 16GB RAM, like Apache/Linux/Solaris. I'm not saying you purposely couldn't get 12 more GB of RAM but all things being equal 16GB would be a better config for both systems.

I've been looking for that article but couldn't find it.

RE: It does look good, but by JarredWalton, 1343 days ago
Most HPC usage models don't depend on massive amounts of RAM, but rather on data that can be broken down into massively parallel chunks. IBM's BlueGene for example only has 256MB (maybe 512MB?) of RAM per node. When I think of HPC, that's what comes to mind, not 4-way to 16-way servers.

The amount of memory used in these benchmarks is reasonable, since more RAM only really matters if you have data sets that are too large to fit within the memory. Since our server data sets are (I believe) around 1-2GB, having more than 4GB of RAM won't help matters. Database servers are generally designed to have enough RAM to fit the vast majority of the database into memory, at least where possible.

If we had 10-14GB databases, we would likely get lower results (more RAM = higher latency among other things), but the fundamental differences between platforms shouldn't change by more than 10%, and probably closer to 5%. Running larger databases with less memory would alter the benchmarks to the point where they would largely be stressing the I/O of the system - meaning the HDD array. Since HDDs are so much slower than RAM (even 15K SCSI models), enterprise servers try to keep as much of the regularly accessed data in memory as possible.

As for the Paxville article, click on the "IT Computing" link at the top of the website. Paxville is the second article in that list (and it was also linked once or twice within this article). Or here's the direct link.

RE: It does look good, but by BaronMatrix, 1343 days ago
Thx for the link, but the test I was looking at was Apache and showed concurrency tests. At any rate, just don't think I was attacking you. I was curious as to the change in units I noticed.

RE: It does look good, but by JohanAnandtech, 1343 days ago
No problem. Point is your feedback is rather unclear. AFAIK, I haven't tested with Paxville. Maybe you are referring to my T2000 review, where we used a different LAMP test, as I explained in this article. In this article the LAMP server has a lot more PHP and MySQL work.

http://www.anandtech.com/IT/showdoc.aspx?i=2772&p=6
See the first paragraph

And the 4 GB was simply a matter of the fact that Woodcrest had 4 GB of FB DIMM.


Intel is back in the 2S server market by blackbrrd, 1343 days ago
Finally Intel can give AMD some real competition in the two socket server market. This shows why Dell only wanted to go with AMD for 4S and not 2S server systems...

245w vs 374w and a huge performance lead over the previous Intel generation is a huge leap for Intel.

It will be interesting to see how much these systems are going to cost:
1) is the fb-dimm's gonna be expensive?
2) is the cpu's gonna be expensive?
3) is the motherboards gonna be expensive?

For AMD neither the ram nor the motherboards are expensive, so I am curious how this goes..

If anybody thinks I am an Intel fanboy, I have bought in this sequence: Intel, AMD, Intel, Intel, and I would have gotten an AMD instead of an Intel for the last computer, except I wanted a laptop ;)

RE: Intel is back in the 2S server market by JarredWalton, 1343 days ago
For enterprise servers, price isn't usually a critical concern. You often buy what runs your company best, though of course there are plenty of corporations that basically say "Buy the fastest Dell" and leave it at that.

FB-DIMMs should cost slightly more than registered DDR2, but not a huge difference. The CPUs should actually be pretty reasonably priced, at least for the standard models. (There will certainly be models with lots of L3 cache that will cost an arm and a leg, but that's a different target market.)

Motherboards for 2S/4S are always pretty expensive - especially 4S. I would guess Intel's boards will be a bit more expensive than equivalent AMD boards on average, but nothing critical. (Note the "equivalent" - comparing boards with integrated SCSI and 16 DIMM slots to boards that have 6 DIMM slots is not fair, right?)

Most companies will just get complete systems anyway, so the individual component costs are only a factor for small businesses that want to take the time to build and support their own hardware.

Copyright © 1997-2010 AnandTech, Inc. All rights reserved. Terms, Conditions and Privacy Information.