Secure Sockets Layer RSA Performance

Secure Web communication is made possible by the Secure Sockets Layer (SSL) protocol. Using the command "openssl speed rsa", we can measure the number of RSA private-key operations (signs) that a system can perform per second.

While "openssl speed rsa" is sufficient to test the Xeons and Opterons, the Sun T1 can speed up the Rivest Shamir Adleman (RSA) and Digital Signature Algorithm (DSA) operations needed for SSL processing, thanks to a modular arithmetic unit (MAU) that supports modular exponentiation and multiplication. Each T1 core has a MAU, so one 8-core T1 has 8 MAUs. To make use of those 8 MAUs, you have to run the SSL calculations through the Solaris Cryptographic Framework (SCF). To test the T1 with the MAU crunching at full speed, we used the command "openssl speed -engine pkcs11 rsa". The Solaris 10 OS also provides in-kernel SSL termination, offering greater security than SSL termination outside the kernel.
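The core of an RSA sign is one large modular exponentiation, and modular exponentiation in turn boils down to a long chain of modular multiplications, which is the kind of work the MAU takes over. The sketch below shows that decomposition with the classic square-and-multiply method. It is plain Python running on the CPU, not an interface to the MAU, SCF or the pkcs11 engine, and `mod_mul`/`mod_exp` are hypothetical names.

```python
def mod_mul(a: int, b: int, n: int) -> int:
    """One modular multiplication: the building block that gets repeated."""
    return (a * b) % n


def mod_exp(base: int, exponent: int, n: int) -> int:
    """Left-to-right square-and-multiply, built only from mod_mul."""
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    if n == 1:
        return 0
    result = 1
    base %= n
    for bit in bin(exponent)[2:]:               # exponent bits, most significant first
        result = mod_mul(result, result, n)     # square once per bit
        if bit == "1":
            result = mod_mul(result, base, n)   # multiply in the base when the bit is set
    return result


print(mod_exp(4, 13, 497))  # 445, same as pow(4, 13, 497)
```

With a 1024-bit exponent this loop runs about a thousand squarings plus one multiplication per set bit, each on 1024-bit numbers, which is why handing the multiplications to dedicated hardware pays off.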

We included the HP DL585 to see whether 8 cores of complex general-purpose CPUs (Opteron 880) can keep up with the 8 MAUs of the Sun T1. If you want to compare Woodcrest and the Opteron, check the 2 and 4 concurrency numbers. You can find our 1024-bit numbers in the graph below. One thread per core is optimal, so we tested the DL585 with a maximum of 16 threads to show that the peak is reached at 8 threads. Likewise, the Xeon Irwindale was tested with up to 8 threads to show that 4 threads (4 logical cores) is optimal.



Notice that the 8 MAUs of the Sun T1 only come into full action when we fire off 32 "SSL RSA signing" threads, one for each of the T1's 32 hardware threads (8 cores × 4 threads). Once that happens, the little 1 GHz T1 is able to keep up with the massive 2.4 GHz 8-core DL585. Without the MAUs, the T1 is only as fast as a 1.8 GHz Xeon Irwindale. It is thus very important to check that your favorite web server works with SCF if you want to run your secure web services on the Sun T2000.

It looks like we've discovered the first "weakness" of the new Core architecture, though one that is rather insignificant to most people: decryption and encryption. The Opteron at 2.4 GHz has no trouble keeping up with the 3 GHz Woodcrest. This might be because the Woodcrest can only perform one rotate per cycle, while the Opteron can do three. Although the RSA algorithm itself doesn't really use rotations, the hash algorithms used when signing do. However, the most important reason is probably that the Opteron can sustain two ADC (Add with Carry) instructions per clock cycle, while Woodcrest can only do one. As ADC accounts for about 17% of the instruction mix of the RSA algorithm, this might be enough to negate the extra integer power (memory disambiguation, 4-wide decode, ...) that the Woodcrest has.
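RSA libraries store 1024-bit and larger numbers as arrays of machine words ("limbs") and add them word by word, feeding each word's carry into the next addition; on x86 that inner step maps naturally onto ADC. A minimal sketch of that carry chain, assuming 32-bit limbs stored least significant first (`to_limbs`, `from_limbs` and `add_limbs` are hypothetical names, not taken from OpenSSL):

```python
MASK32 = 0xFFFFFFFF  # one 32-bit limb


def to_limbs(x: int, count: int) -> list[int]:
    """Split a non-negative integer into `count` 32-bit limbs, least significant first."""
    return [(x >> (32 * i)) & MASK32 for i in range(count)]


def from_limbs(limbs: list[int]) -> int:
    """Reassemble an integer from little-endian 32-bit limbs."""
    return sum(limb << (32 * i) for i, limb in enumerate(limbs))


def add_limbs(a: list[int], b: list[int]) -> tuple[list[int], int]:
    """Add two equal-length limb arrays, propagating the carry like a chain of ADCs."""
    result = []
    carry = 0
    for x, y in zip(a, b):
        s = x + y + carry           # what one ADC computes: x + y + carry flag
        result.append(s & MASK32)   # low 32 bits are the result word
        carry = s >> 32             # the new carry flag, 0 or 1
    return result, carry


a = to_limbs(2**64 - 1, 2)
b = to_limbs(1, 2)
total, carry_out = add_limbs(a, b)
print(total, carry_out)  # [0, 0] 1: the carry rippled through both limbs
```

Big-number multiplication performs many of these carry-propagating additions, so the rate at which a core can execute ADC matters for RSA.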

Also notice that the previous NetBurst architecture, represented by the Xeon Irwindale, does very badly. The reason is that the P4 doesn't have a barrel shifter, a circuit that can shift or rotate a value by any number of bits in one clock cycle. Without this shifter, rotates and shifts take much longer, resulting in high latency. Most x86 code couldn't care less, but most encryption code makes heavy use of rotates, shifts, or both. We also did a quick test with Hyper-Threading on and off: in this case, Hyper-Threading sped up RSA signing (signs/s) by 20 to 28%.
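For reference, this is what a 32-bit rotate looks like when written out as two shifts and an OR, together with one SHA-1 round (rounds 0 to 19) that uses it twice. On a core with a barrel shifter each rotate can be a single one-cycle instruction; the point above is that the P4 pays a much higher latency for it. A minimal Python sketch; `rotl32` and `sha1_round` are hypothetical helper names, and this is not a complete SHA-1.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits (a ROL instruction on x86)."""
    r &= 31
    return ((x << r) | (x >> (32 - r))) & MASK32


def sha1_round(state, w):
    """One SHA-1 round for rounds 0-19: two rotates plus adds and logic ops."""
    a, b, c, d, e = state
    f = (b & c) | (~b & d)                          # "choose" function
    k = 0x5A827999                                  # round constant for rounds 0-19
    temp = (rotl32(a, 5) + f + e + k + w) & MASK32  # first rotate
    return (temp, a, rotl32(b, 30), c, d)           # second rotate


# SHA-1 initial state, pushed through a single round with an arbitrary message word.
H = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
print([hex(v) for v in sha1_round(H, 0x61626380)])
```

A full SHA-1 block runs 80 such rounds plus rotates in the message schedule, which is why slow rotates hurt hashing so much.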

To end the RSA sign/s benchmark, we'll make a quick comparison across different RSA key lengths between the quad-core AMD Opteron 2.4 GHz, the quad-core Intel Xeon Woodcrest and Sun's T1 with the MAU enabled.

RSA Signing (Signs/s)

Key length  Opteron 2.4 GHz  Xeon 5160 3 GHz  Sun T1 with MAU
            (4 threads)      (4 threads)      (32 threads)
512 bit     19003            21194            35613
1024 bit    6098             6240             10722
2048 bit    1145             1087             1918
4096 bit    185              164              1
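A small Python sketch that takes the numbers from the table and computes how much sign throughput drops each time the key length doubles. With schoolbook multiplication, the cost of a modular exponentiation grows roughly with the cube of the key length, so a doubling could cost up to about 8x. `SIGNS_PER_SEC` and `slowdown_per_doubling` are just illustrative names.

```python
# Signs/s from the table: key length -> (Opteron 2.4 GHz, Xeon 5160 3 GHz, Sun T1 with MAU)
SIGNS_PER_SEC = {
    512: (19003, 21194, 35613),
    1024: (6098, 6240, 10722),
    2048: (1145, 1087, 1918),
    4096: (185, 164, 1),
}
SYSTEMS = ("Opteron 2.4 GHz", "Xeon 5160 3 GHz", "Sun T1 with MAU")


def slowdown_per_doubling(table):
    """For consecutive key lengths, divide signs/s at the shorter by signs/s at the longer."""
    lengths = sorted(table)
    ratios = {}
    for shorter, longer in zip(lengths, lengths[1:]):
        ratios[(shorter, longer)] = tuple(
            fast / slow for fast, slow in zip(table[shorter], table[longer])
        )
    return ratios


for (shorter, longer), ratios in slowdown_per_doubling(SIGNS_PER_SEC).items():
    summary = ", ".join(f"{name} {r:.1f}x" for name, r in zip(SYSTEMS, ratios))
    print(f"{shorter} -> {longer} bit: {summary}")
```

For the Opteron and the Xeon each doubling costs roughly 3x to 7x, while the T1 falls from 1918 to 1 sign/s between 2048 and 4096 bits.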


Notice that the hardware acceleration of the T1 does not work beyond 2048-bit keys: at 4096 bits it manages just 1 sign/s. Considering that most secure applications use 1024-bit keys and only a few "high security" ones use 2048-bit keys, this is not an issue.

When doing verifies as opposed to signs, the server has to authenticate the identity of the client, which only needs the public key. This is a lot less intensive, because RSA public exponents are typically small (65537 is a common choice), so we'll show you the verifies per second at 2048 bits. At 1024 bits, both the Woodcrest and the Opteron were able to perform more than 50,000 verifies per second per core, which is a hard limit of the OpenSSL benchmark.
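Why verifying is so much cheaper can be estimated by counting modular multiplications in square-and-multiply. The sketch assumes the commonly used public exponent 65537 (2^16 + 1) and uses a random 2048-bit number as a stand-in for a private exponent. Real implementations use the Chinese Remainder Theorem and windowing, so treat the counts as rough proportions, not OpenSSL's actual work; `modmul_count` is a hypothetical helper.

```python
import secrets

PUBLIC_EXPONENT = 65537  # assumption: the usual RSA public exponent, 2**16 + 1


def modmul_count(exponent: int) -> int:
    """Modular multiplications used by left-to-right square-and-multiply."""
    squarings = exponent.bit_length() - 1        # one squaring per bit after the first
    multiplies = bin(exponent).count("1") - 1    # one multiply per additional set bit
    return squarings + multiplies


# Hypothetical stand-in for a 2048-bit private exponent (top bit forced to 1).
private_exponent = secrets.randbits(2048) | (1 << 2047)

print("verify:", modmul_count(PUBLIC_EXPONENT))  # 17
print("sign:  ", modmul_count(private_exponent))
```

The sign count is typically a little over 3,000, so in this simplified model a 2048-bit verify needs well over 100 times fewer big multiplications than a sign.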



Again, the Opteron takes the lead. The Sun T1, even with its 8 MAUs, is only about half as fast as the four-core Opteron and Woodcrest systems, but this is hardly an issue: signing loads a server much more heavily than verifying does.

Both the verifies/s and signs/s benchmarks are rather synthetic. It is much more realistic to test with a real web server running SSL, and that is what we are currently doing. We followed Sun's instructions to enable RSA hardware acceleration for Apache, but for some reason the Apache web server is still not making use of the Solaris Cryptographic Framework, so our web server SSL test is still a work in progress.


91 Comments


  • Questar - Thursday, June 08, 2006

    Why? Because AMD got creamed?
  • ashyanbhog - Thursday, June 08, 2006

    And Intel Woodcrest may have fantastic performance when compared to earlier Xeons,

    but Intel is 3 years late to the party; Opteron was here in 2003!

    Also remember, Woodcrest is a brand new design from a PIII base, manufactured on a 65nm process. It has yet to make its debut in the market and be available in volume. And it's indeed nice to see it being compared to a 3-year-old design manufactured on a 90nm process.

    AMD still has two product launches to come this year: the move to DDR2 for Opterons, which should cut some power usage for the total system, AND the introduction of products manufactured on 65nm at the tail end of the year. Will Woodcrest and Conroe still retain their performance margins then? If not, for how many months or weeks has Intel grabbed this "performance crown"?
  • zsdersw - Thursday, June 08, 2006

    Consider the following:

    - If comparisons could be made between new products from both companies (i.e., Woodcrest versus K8L), they would be made. In the game of leapfrog that we have between AMD and Intel, the comparisons will always be between existing tech and new tech. Will you be pointing out how AMD is "late to the party" when they release their new stuff?

    - Making its debut and availability in volume is an issue for both AMD and Intel. It's not a valid point unless you make it across the board.

    - 65nm will allow clock speeds of Opterons/A64s to increase, but Conroe/Woodcrest speeds will be increasing as well.
  • ashyanbhog - Thursday, June 08, 2006

    Not because AMD got creamed!

    A $35 billion turnover company (Intel) is bound to make a comeback one day.

    It's AnandTech's review setup; it's full of holes.

    The MySQL benchmark on dual dual-core Opterons, where they see a 30% drop against the single-core dual-processor numbers, contradicts their own earlier benchmark, where they saw a 10% performance increase.

    http://www.anandtech.com/IT/showdoc.aspx?i=2447&am...

    They also use a substandard MSI motherboard in one of the Opteron systems and fail to mention which system was used for the benchmarks.

    Mistakes like this, genuine or intentional, are rife throughout the review.

    The whole thing looks like the rig was set up to push the performance difference between Woodcrest and Opteron to the max.

    Why would anybody take two months to tweak settings before they publish the review?
  • duploxxx - Thursday, June 08, 2006

    Yeah right, it's a workstation motherboard. It uses an nForce controller, so maybe they rate it as a server board, but it is still a budget board used for workstations, not a real server board or server chipset like they used on the Intel Woodcrest.

    Check servers like the Sun Galaxy and HP DL385; they have AMD chipsets... big difference.
    The nForce has a shared memory bus...
  • zsdersw - Thursday, June 08, 2006

    Yeah, that's one of the 3 Opteron servers. At any rate, the MSI board is a basic server board, but it's still a server board.
  • duploxxx - Thursday, June 08, 2006

    Yeah, they have done 1 real bench with an HP. All other benches were done with the 2 basic MSI boards...

    Still waiting for the Wintel benches.
  • wolaris - Thursday, June 08, 2006

    In corporate environments, no one with any hardware budget at all runs the web server and database on the same machine, as it hurts both performance and reliability. This affects the T1 most, as its low clock speed and simple cores are not meant for database workloads.
    I think that you should run web serving tests using a common, high-performance Opteron DB server and separate web servers, as would be the case in real-world scenarios.
  • MrKaz - Thursday, June 08, 2006

    So the new Intel processor on 65nm, at an already high clock speed of 3.0GHz, is already consuming more power than the older AMD Opteron on 90nm at 2.8GHz with DDR.

    When AMD releases Socket F it will go to DDR2 (less power) and better 90nm samples (lower power). So then the "new" Intel is already getting beaten...

    And were those tests done with Cool'n'Quiet?

    Also don't forget these tests were done with Woodcrest 3.0GHz vs. Opteron 2.2GHz and 2.4GHz, so when AMD releases the 2.8GHz and 3.0GHz with Socket F, the performance lead of Intel will vanish…

    I think the biggest surprise here is how bad the Xeon (P4) was (IS!!), and people keep buying it.
