Secure Sockets Layer RSA Performance

Secure Web communication is made possible by the Secure Sockets Layer (SSL) protocol. Using the command "openssl speed rsa", we can measure the number of RSA private key operations (signs) that a system can perform per second.

While "openssl speed rsa" is sufficient to test the Xeons and Opterons, the Sun T1 can speed up the Rivest Shamir Adleman (RSA) and Digital Signature Algorithm (DSA) encryption and decryption operations needed for SSL processing, thanks to a modular arithmetic unit (MAU) that supports modular exponentiation and multiplication. Each T1 core has a MAU, so one 8-core T1 has 8 MAUs. To make use of those 8 MAUs, you have to run the SSL calculations through the Solaris Cryptographic Framework (SCF). To test the T1 with the MAUs crunching at full speed, we used the command "openssl speed -engine pkcs11 rsa". The Solaris 10 OS also provides in-kernel SSL termination, offering greater security than SSL termination outside the kernel.
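To make the MAU's job concrete, here is a minimal sketch of the modular exponentiation behind an RSA sign, written as plain square-and-multiply. The tiny key (n = 3233, e = 17, d = 2753) is a textbook-sized assumption chosen for readability, not a key from our tests, and real implementations add padding, hashing and further optimizations on top of this.

```python
def mod_exp(base, exponent, modulus):
    """Right-to-left square-and-multiply: base**exponent mod modulus."""
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:
            # Multiply step, taken once for every set bit of the exponent.
            result = (result * base) % modulus
        # Squaring step, taken once for every bit of the exponent.
        base = (base * base) % modulus
        exponent >>= 1
    return result


# Toy key for illustration only (assumption): p = 61, q = 53.
N = 61 * 53   # modulus, 3233
E = 17        # public exponent
D = 2753      # private exponent, E * D = 1 mod (60 * 52)


def toy_sign(message):
    """Sign: raise the message to the private exponent."""
    return mod_exp(message, D, N)


def toy_verify(message, signature):
    """Verify: raise the signature to the public exponent and compare."""
    return mod_exp(signature, E, N) == message
```

Every pass through the loop is one or two modular multiplications, which is exactly the kind of work a modular arithmetic unit takes off the general purpose pipeline.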

We included the HP DL585 to see whether 8 cores of complex general purpose CPUs (Opteron 880) can keep up with the 8 MAUs of the Sun T1. If you want to compare Woodcrest and the Opteron, you should check the numbers at a concurrency of 2 and 4. You can find our 1024-bit numbers in the graph below. One thread per core is optimal; we tested the DL585 with up to 16 threads to show you that the peak is attained at 8 threads. Likewise, the Xeon Irwindale was tested with 8 threads to show you that 4 threads (4 logical cores) is optimal, and so on.
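A concurrency sweep like the one described above could be scripted roughly as follows. This is only a sketch: it builds the command lines without running them, and the use of the "-multi" option of "openssl speed" (which forks parallel worker processes) is an assumption, since the text does not say how the concurrent signing threads were launched. The helper names are hypothetical.

```python
import shlex


def build_speed_command(algorithm="rsa", engine=None, multi=None):
    """Return the argument list for one `openssl speed` run."""
    cmd = ["openssl", "speed"]
    if engine is not None:
        # e.g. "pkcs11" to route the work through an engine, as in the text.
        cmd += ["-engine", engine]
    if multi is not None:
        # Assumption: parallelism via -multi, which forks worker processes.
        cmd += ["-multi", str(multi)]
    cmd.append(algorithm)
    return cmd


def sweep_commands(max_workers, engine=None):
    """Commands for 1, 2, 4, ... parallel workers, up to max_workers."""
    commands = []
    workers = 1
    while workers <= max_workers:
        commands.append(build_speed_command(engine=engine, multi=workers))
        workers *= 2
    return commands


if __name__ == "__main__":
    # Print the sweep for a T1-style run; pass each list to subprocess.run
    # on a machine that actually has the engine configured.
    for cmd in sweep_commands(32, engine="pkcs11"):
        print(shlex.join(cmd))
```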



Notice that the 8 MAUs of the Sun T1 only get into full swing if we fire off 32 "SSL RSA signing" threads. Once that happens, the little 1 GHz T1 is able to keep up with the massive 2.4 GHz 8-core DL585. Without the MAU, the T1 is as fast as a 1.8 GHz Xeon Irwindale. It is thus very important to check that your favorite web server works with SCF if you want to run your secure web services on the Sun T2000.

It looks like we've discovered the first "weakness" of the new Core architecture, albeit one that is insignificant to most people: decryption and encryption. The Opteron at 2.4 GHz has no trouble keeping up with the 3 GHz Woodcrest. This might be because the Woodcrest can only perform one rotate per cycle, while the Opteron can do 3. Although the RSA algorithm doesn't really use rotations, the hash algorithms needed to sign or encrypt a key do. However, the most important reason is probably that the Opteron can sustain 2 ADC (Add with Carry) instructions per clock cycle, while Woodcrest can only do one. As ADC accounts for about 17% of the instruction mix of the RSA algorithm, this might be enough to negate the extra integer power (memory disambiguation, 4-wide decode ...) that the Woodcrest has.
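To see where all those ADC instructions come from, consider how a big RSA number is added on a CPU with 32-bit registers: the number is split into "limbs", and each limb addition has to absorb the carry of the previous one. The sketch below mimics that chain in Python; the 32-bit limb size and the helper names are assumptions for illustration, and production code does this in hand-tuned assembly rather than in a loop like this.

```python
LIMB_BITS = 32
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(value, count):
    """Split an integer into `count` limbs, least significant first."""
    return [(value >> (LIMB_BITS * i)) & LIMB_MASK for i in range(count)]


def from_limbs(limbs):
    """Reassemble an integer from little-endian limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def add_with_carry(a_limbs, b_limbs):
    """Add two equal-length limb arrays; returns (sum_limbs, carry_out)."""
    result = []
    carry = 0
    for a, b in zip(a_limbs, b_limbs):
        # On x86 this line corresponds to one ADC: a + b + carry flag.
        total = a + b + carry
        result.append(total & LIMB_MASK)
        carry = total >> LIMB_BITS
    return result, carry
```

A 1024-bit operand is 32 such limbs, so a single big-number addition is a chain of dependent add-with-carry steps, and a CPU that can retire two of them per cycle has an obvious edge.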

Also notice that the previous NetBurst architecture, represented by the Xeon Irwindale, does very badly. The reason is that the P4 doesn't have a barrel shifter, a circuit in the chip which can shift or rotate any number in one clock cycle. Without this shifter, rotates and shifts take much longer, resulting in high latency. Most x86 code couldn't care less, but most encryption code makes heavy use of rotates, shifts, or both. We also did a quick test with Hyper-Threading on and off. In this case, Hyper-Threading sped up the encryption (signs/s) by 20 to 28%.
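For readers who have never looked at one, this is what a rotate does: bits pushed out on one side re-enter on the other. The sketch below builds a 32-bit rotate-left out of two shifts and an OR, which is also roughly what a CPU without a fast rotate has to emulate. The function name is hypothetical; hash functions such as SHA-1 use rotates like these (by 5 and 30 bits) in every round.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value, count):
    """Rotate a 32-bit value left by `count` bits."""
    count %= 32
    value &= MASK32
    # Bits shifted out at the top come back in at the bottom.
    return ((value << count) | (value >> (32 - count))) & MASK32
```

With a barrel shifter this is a single-cycle operation regardless of the rotate distance; without one, the cost grows, and code that rotates in its inner loop pays for it on every iteration.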

To end the RSA signs/s benchmark, we'll make a quick comparison between a four-core AMD Opteron 2.4 GHz system, a four-core Intel Xeon Woodcrest system, and Sun's T1 with MAU enabled across different RSA bit lengths.

RSA Encryption (Signs/s)

Key length   Opteron 2.4 GHz   Xeon 5160 3 GHz   Sun T1 with MAU
             (4 threads)       (4 threads)       (32 threads)
512 bit      19003             21194             35613
1024 bit     6098              6240              10722
2048 bit     1145              1087              1918
4096 bit     185               164               1


Notice that the hardware acceleration of the T1 does not work beyond 2048-bit keys. Considering that most secure applications use 1024-bit and only a few "high security" ones use 2048-bit, this is not an issue.
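The relative standings in the table are easier to read as ratios. The sketch below simply divides the signs/s figures from the table above; the dictionary keys and the helper name are our own labels, and nothing here is measured beyond what the table already states.

```python
# Signs per second, copied from the table above.
SIGNS_PER_SEC = {
    512:  {"opteron": 19003, "xeon": 21194, "t1_mau": 35613},
    1024: {"opteron": 6098,  "xeon": 6240,  "t1_mau": 10722},
    2048: {"opteron": 1145,  "xeon": 1087,  "t1_mau": 1918},
    4096: {"opteron": 185,   "xeon": 164,   "t1_mau": 1},
}


def speedup(bits, system, baseline):
    """How many times faster `system` is than `baseline` at a key length."""
    row = SIGNS_PER_SEC[bits]
    return row[system] / row[baseline]


if __name__ == "__main__":
    for bits in sorted(SIGNS_PER_SEC):
        t1 = speedup(bits, "t1_mau", "opteron")
        xeon = speedup(bits, "xeon", "opteron")
        print(f"{bits:>4} bit: T1/Opteron = {t1:.2f}, Xeon/Opteron = {xeon:.2f}")
```

Up to 2048 bits the T1 with MAU stays well ahead of four Opteron cores, and at 4096 bits the ratio collapses, which is the acceleration limit noted above.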

With verifies, as opposed to signs, the server has to authenticate the identity of the client. This is a lot less intensive, so we'll show you the verifies per second numbers at 2048 bits. At a 1024-bit key length, both the Woodcrest and Opteron were able to verify more than 50000 keys per core, which is a hard limit of the OpenSSL benchmark.
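Why are verifies so much cheaper than signs? A rough way to see it is to count the modular multiplications that plain square-and-multiply needs for each exponent. The sketch below assumes the commonly used public exponent 65537 and uses a made-up 1024-bit pattern as a stand-in for a private exponent; neither value comes from our test keys, and real libraries use further tricks (CRT, windowing) that change the exact counts.

```python
def modmul_count(exponent):
    """Modular multiplications used by square-and-multiply for `exponent`."""
    if exponent <= 0:
        raise ValueError("exponent must be positive")
    squarings = exponent.bit_length() - 1        # one per bit after the first
    multiplies = bin(exponent).count("1") - 1    # one per extra set bit
    return squarings + multiplies


# Assumption: typical public exponent, 2**16 + 1.
PUBLIC_EXPONENT = 65537

# Hypothetical stand-in for a 1024-bit private exponent (alternating bits).
PRIVATE_EXPONENT_STAND_IN = int("10" * 512, 2)

if __name__ == "__main__":
    print("verify:", modmul_count(PUBLIC_EXPONENT))
    print("sign:  ", modmul_count(PRIVATE_EXPONENT_STAND_IN))
```

The short public exponent needs only a handful of multiplications, while a private exponent as long as the modulus needs well over a thousand, which is why verifies per second run so far ahead of signs per second.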



Again, the Opteron takes the lead. The Sun T1, even with the 8 MAUs, is only half as fast as four Opterons or Woodcrests, but this is hardly an issue. Encrypting or signing will slow down a server much more quickly than verifying keys.

Both the verifies/s and signs/s benchmarks are rather synthetic. It is much more realistic to test with a real web server running SSL, and that is what we are currently doing. We followed Sun's instructions to enable RSA hardware acceleration for Apache, but for some reason, the Apache web server is still not making use of the Solaris Cryptographic Framework. So our Web server SSL test is a work in progress.

91 Comments

  • JohanAnandtech - Saturday, June 10, 2006 - link

    The test you link is running apachebench while testing how fast STATIC html can be sent. Our LAMP test has to run PHP, access the MySQL database, make calculations on that data ... this is called DYNAMIC content.

    If you do not understand why a static HTML page can be served many times faster than a complex one with dynamic content, well...

    You are basically saying that a test is wrong because it doesn't give the same results as another test which tests with different software, different dataset. Duh.
  • BasMSI - Wednesday, June 14, 2006 - link

    I noticed Johan.

    But still, it's stupid to use and publish benchmark results from a test that can't handle/test the systems at their max.
    Come on, get real, it's like testing a Lada and a Ferrari on a track that can't do more than 100 km/h and then state, look how well the Lada keeps up with the Ferrari.

    Also, what's wrong with static HTML tests?
    I see no harm in those, many websites are still static.
    And you used them before to show how fast the Opterons were, so why not again?
    Now we have absolutely nothing to compare or verify....so bogus test-results.
  • BrechtKets - Saturday, June 10, 2006 - link

    quote:

    If you don't know how to setup a server, then stay away from trying to do such.


    Maybe you should check the author of the aces hardware article.

    Also note that those tests were done with apachebench, and the tests now have been done with httperf and autobench...
  • FreakyD - Friday, June 09, 2006 - link

    Dell has released some new servers with the new Intel Woodcrest platform. The pricing is less than for the older Netburst architecture servers... It looks like we'll have a price war on our hands, and of course AMD will end up losing that battle since Intel has lower production costs with higher volume.

    Also interesting to note, the 3.0 GHz Woodcrest Intel processor that was quite competitive in this review is the lowest end processor on the new Dell servers. Their highest end one is a 3.73 GHz part. AMD's highest end dual core server processor is currently 2.6 GHz. So there are additional performance gains for Intel vs AMD in a highest end server processor shootout.

    I'm disappointed that AMD hasn't done more since they released the K8 architecture. AMD has also been slow to release their new server platform with Pacifica enhancements.

    It's too bad that Dell has taken so long to begin using AMD in servers. They've held the performance lead for quite some time. With technology and market leaders changing so fast, they should have been faster to adjust their product lineup.
  • duploxxx - Friday, June 09, 2006 - link

    duh my dear friend.... the dell servers you are pointing to can be checked where? link?
    you are mixing woodcrest that is at max 3000mhz and the dempsey 3.73 both on the same platform. dempsey is still no match for the woodcrest and opterons, so thats normal that the price tag is that low...... and its already dead before it is even launched

    check this review, the dempsey is still wiped out on 90% of all the benches by an old architecture and certainly if you would check the power consumption/performance chart.

    http://www.gamepc.com/labs/view_content.asp?id=xeo...
    the proc cost of intel is certainly not lower than the amd ones... looking at the die size the woodcrest and conroe are bigger

    @anand, those type of benches would be nice on a woodcrest, if you fail to give them now by "any reason" they will be available in the near future by other reviewers. so its always better to be the first :)
  • FreakyD - Friday, June 09, 2006 - link

    Ahh, my mistake, thanks for the correction so nobody else gets the wrong idea. Once again I'm confused by Intel's naming and numbering scheme to not know exactly what's being sold.
  • Aileur - Thursday, June 08, 2006 - link

    This is a sad sad display. And i dont mean the review, i mean everybody bashing this article and each other like their lives depended on it.
    Its a cpu review on a hardware site, try to put it into perspective.

    You read it, you draw your own conclusions if you want to, you go on with your life.
  • ashyanbhog - Friday, June 09, 2006 - link

    Sure our lives dont depend on it,

    but Anandtech was a site you could rely on to get unbiased reviews. I have configured specs for at least 25 machines based on Anandtech reports. Whenever somebody asked which CPU or some other part was better, I would suggest that they search for its review on Anandtech.

    Even in the IDF conroe demo, Anandtech failed to identify some parts of the Intel setup that could have impacted performance, it was only after readers expressed their displeasure that Anandtech did a second review with the updates that should have actually been part of the Intel setup preview

    If this new found low of Anandtech continues, I'll have to choose a different site to base my decisions on.

    Also remember, Intel has previously used and continues to use Anandtech reviews of its processors in its analyst meetings and at other places. As somebody pointed out, even a $0.15 swing in Intel share prices alters its valuation by one billion dollars!!! Intel could buy a handful of review reports by favoring advertising budgets for a fraction of that money.

    Anandtech made my life a little easier by giving unbiased reviews, looks like I'll have to get back to comparing results from a few reviews as I used to do before I discovered Anandtech
  • Slappi - Thursday, June 08, 2006 - link

    The Message is Clear.......

    ....Anand is getting paid by the big Intel.


    Seriously.... you guys should at least TRY to hide your bias.

    I mean months of setting up and you miss a known error that falsely reports extremely low dual OP. numbers?!?


    Woodcrest ROCKS?~?~?

    Something tells me that is gonna come back to bite you one day in the near future.
  • AnandThenMan - Thursday, June 08, 2006 - link

    well ya gotta love this statement:
    quote:

    "In one word: Woodcrest rocks!"


    That's two words LOL