The Slim T1 CPU

It is very unfair of us to compare one of the eight very slim T1 cores to mammoths like the Opteron or the Xeon, which have about 10 to 20 times more transistors. Still, we are curious. We know that Sun sacrificed single-threaded performance on the altar of power consumption, multi-threaded performance and die space. How far did they go? Let us find out with LMBench 3.0a. By the way, you can find much more information about the T1 CPU in our previous article.

First, we check the cache and RAM latency. For fat modern superscalar cores like the Opteron and Xeon, these numbers are extremely important. The T1 CPU is much less sensitive to the latency of the memory subsystem, as long as it has enough threads: when a thread stalls waiting for memory, the core switches to another thread that is ready to run.
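
The latency hiding described above can be illustrated with a deliberately simplified model of fine-grained multithreading. This is a minimal sketch under stated assumptions, not Sun's documented thread scheduler: every thread is assumed to do a fixed number of cycles of useful work (the hypothetical `run_cycles`) and then stall on one memory access, and the core is assumed to issue from any ready thread each cycle. The 105-cycle RAM latency comes from the LMBench table below; the 20-cycle run length is invented for illustration.

```python
def pipeline_utilization(threads: int, run_cycles: float, miss_latency: float) -> float:
    """Fraction of cycles a single-issue core spends doing useful work.

    Simplified model (an assumption, not Sun's documented scheduler): each
    thread executes `run_cycles` of useful work, then stalls for
    `miss_latency` cycles on a memory access. While one thread waits, the
    core issues from another ready thread, so stalls are hidden until all
    threads are waiting at the same time.
    """
    if threads < 1 or run_cycles <= 0 or miss_latency < 0:
        raise ValueError("need threads >= 1, run_cycles > 0, miss_latency >= 0")
    single_thread = run_cycles / (run_cycles + miss_latency)
    return min(1.0, threads * single_thread)


if __name__ == "__main__":
    RAM_LATENCY_T1 = 105  # cycles, from the LMBench table
    RUN_CYCLES = 20       # hypothetical amount of work between memory misses
    for n in (1, 2, 4):   # a T1 core runs up to four threads
        u = pipeline_utilization(n, RUN_CYCLES, RAM_LATENCY_T1)
        print(f"{n} thread(s): {u:.0%} busy")
```

With these made-up numbers, one thread keeps the pipeline busy only 16% of the time, while four threads reach 64%. Real workloads also hit in the L2 cache and contend for shared resources, which this model ignores.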

CPU (LMBench) | OS | Clock (MHz) | L1 (ns) | L1 (cycles) | L2 (ns) | L2 (cycles) | RAM (ns) | RAM (cycles)
Opteron 275 | SunOS 5.10 | 2211 | 1.357 | 3 | 5.436 | 12 | 67.5 | 149
Pentium-M 1.6 GHz | Linux 2.6.15- | 1593 | 1.880 | 3 | 6 | 10 | 92.1 | 147
Sun T1 1 GHz | SunOS 5.10 | 980 | 3.120 | 3 | 22.1 | 22 | 107.5 | 105
Opteron 275 | Linux 2.6.15- | 2209 | 1.357 | 3 | 5 | 12 | 73 | 161
Xeon Irwindale 3.6 GHz | Linux 2.6.15- | 3594 | 1.110 | 4 | 8 | 28 | 48.8 | 175
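
LMBench reports these latencies in nanoseconds; the cycle columns follow from multiplying by the clock frequency (cycles = ns × MHz / 1000). The small sketch below redoes that arithmetic for three rows of the table. Some other rows list latencies rounded to whole nanoseconds (the Xeon's 8 ns L2 would give 29 rather than 28 cycles), so they are left out here.

```python
# Latencies from the LMBench table above: clock (MHz), then L1, L2 and RAM (ns).
LATENCIES_NS = {
    "Opteron 275 (SunOS 5.10)": (2211, 1.357, 5.436, 67.5),
    "Pentium-M 1.6 GHz": (1593, 1.880, 6.0, 92.1),
    "Sun T1 1 GHz": (980, 3.120, 22.1, 107.5),
}


def ns_to_cycles(latency_ns: float, clock_mhz: float) -> int:
    """Convert a latency in nanoseconds to the nearest whole clock cycle.

    One cycle lasts 1000 / clock_mhz ns, so cycles = ns * clock_mhz / 1000.
    """
    return round(latency_ns * clock_mhz / 1000)


for cpu, (mhz, *latencies) in LATENCIES_NS.items():
    cycles = [ns_to_cycles(ns, mhz) for ns in latencies]
    print(f"{cpu}: L1/L2/RAM = {cycles} cycles")
```

For these three rows the results match the cycle columns of the table.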

Sun has clearly favoured low power consumption here. A 3-cycle L1 latency at 1 GHz on a 90 nm process is very conservative. A 22-cycle L2 cache latency is even a bit slow, but again, the thread Gatling gun takes care of that. Measured in clock cycles, the built-in memory controller pays off: RAM latency is about 105 cycles, while even the Pentium-M needs 147 cycles. This helps to keep the average latency (seen from the viewpoint of the CPU) low.
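
The "average latency" the CPU sees can be estimated as a weighted average of the cache and memory latencies. The sketch below is a textbook average-memory-access-time calculation, not something measured for this article: the miss rates (5% of loads missing L1, 20% of those also missing L2) are hypothetical, and only the latencies in cycles come from the LMBench table.

```python
def average_load_latency(l1: float, l2: float, ram: float,
                         l1_miss_rate: float, l2_miss_rate: float) -> float:
    """Average load-to-use latency in cycles for a two-level cache hierarchy.

    l1, l2 and ram are total load-to-use latencies (as LMBench measures
    them), so each level is weighted by the fraction of loads it serves.
    l2_miss_rate is the fraction of L1 misses that also miss in L2.
    """
    served_by_l2 = l1_miss_rate * (1 - l2_miss_rate)
    served_by_ram = l1_miss_rate * l2_miss_rate
    return (1 - l1_miss_rate) * l1 + served_by_l2 * l2 + served_by_ram * ram


# Hypothetical miss rates; latencies in cycles from the table above.
for cpu, (l1, l2, ram) in {"Sun T1": (3, 22, 105),
                           "Opteron 275 (SunOS)": (3, 12, 149)}.items():
    avg = average_load_latency(l1, l2, ram, l1_miss_rate=0.05, l2_miss_rate=0.20)
    print(f"{cpu}: {avg:.2f} cycles on average")
```

With these particular assumptions the two CPUs land close together in cycles, but the outcome shifts quickly with the assumed miss rates, and a cycle on the 2.2 GHz Opteron is less than half as long as one on the T1.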

Let us see if there is some integer crunching power in the little SPARC core.

CPU (LMBench) | OS | Bit (ns) | Add (ns) | Mul (ns) | Div (ns) | Mod (ns)
Opteron 275 | SunOS 5.10 | 0.45 | 0.45 | 1.36 | 18.60 | 19.00
Pentium-M 1.6 GHz | Linux 2.6.15- | 0.63 | 0.63 | 2.51 | 19.50 | 11.50
Sun T1 1 GHz | SunOS 5.10 | 1.01 | 1.00 | 29.10 | 104.00 | 114.00
Opteron 275 | Linux 2.6.15- | 0.45 | 0.45 | 1.36 | 18.60 | 19.00
Xeon Irwindale 3.6 GHz | Linux 2.6.15- | 0.28 | 0.28 | 2.79 | 17.30 | 23.30

The very common ADD instruction executes in one cycle, but a multiply takes no less than about 29 cycles and a divide about 104. Faster multiply and divide units would have taken up much more die space and consumed much more power. Considering that these instructions are very rare in most server workloads, this is a pretty clever trade-off. Update: the Sun documentation tells us 7-11 cycles for multiply and 72 for division.
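
Whether slow multiply and divide units matter depends on how often those instructions run, which a weighted average of cycles per instruction (CPI) makes concrete. In this sketch the instruction mix (1% multiplies, 0.1% divides, everything else single-cycle) is hypothetical, and the costs of 29 and 104 cycles are the LMBench figures above (about one cycle per nanosecond at 980 MHz). It ignores pipelining details and any overlap with other threads.

```python
def average_cpi(mix: dict[str, float], cycles: dict[str, float]) -> float:
    """Weighted average cycles per instruction (CPI) for an instruction mix.

    mix maps an instruction class to its share of executed instructions
    (shares must sum to 1); cycles maps the same class to its latency.
    """
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"mix shares sum to {total}, expected 1.0")
    return sum(share * cycles[op] for op, share in mix.items())


T1_CYCLES = {"single-cycle": 1, "mul": 29, "div": 104}          # LMBench, above
HYPOTHETICAL_MIX = {"single-cycle": 0.989, "mul": 0.01, "div": 0.001}

print(f"CPI with slow mul/div: {average_cpi(HYPOTHETICAL_MIX, T1_CYCLES):.3f}")
```

With this mix the slow units add roughly 0.38 cycles per instruction on top of a CPI of 1; the penalty shrinks in proportion as multiplies and divides become rarer, and plugging in Sun's documented 11 and 72 cycles lowers it further.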

Let us check out what the lonely FPU of the T1, a single unit shared by all eight cores, can do.

CPU (LMBench) | OS | FADD (ns) | FMUL (ns) | FDIV (ns)
Opteron 275 | SunOS 5.10 | 1.80 | 1.80 | 10.90
Pentium-M 1.6 GHz | Linux 2.6.15- | 1.88 | 3.14 | 23.90
Sun T1 1 GHz | SunOS 5.10 | 26.50 | 29.30 | 54.20
Opteron 275 | Linux 2.6.15- | 1.81 | 1.81 | 9.58
Xeon Irwindale 3.6 GHz | Linux 2.6.15- | 1.39 | 1.95 | 12.60

At 26.5 and 29.3 ns, or roughly 26 and 29 cycles at 980 MHz, FADD and FMUL are a little faster than the 40 cycles we first reported, and most of that latency might simply be the time needed to get the data to the FPU of the T1. It is clear that the Sun T1 doesn't like FP code at all.

Comments

  • JackPack - Friday, March 24, 2006

    Pleasant to read as usual, Johan.

    BTW, are they letting you keep the T2000?
    http://blogs.sun.com/roller/page/jonathan?entry=ni...
  • PandaBear - Friday, March 24, 2006

    As a branded server it is a good price, but as the benchmarks have shown, a dual Opteron running Linux both performs better and uses less power. I think people who buy this class of server want support and service (and build quality), and in that case Sun would certainly beat the whitebox builders, no matter how good a dual Opteron is.

    Nonetheless it is a good product for those who demand this kind of quality. Now Intel's solution really looks bad.
  • Calin - Friday, March 24, 2006

    I don't know what you are talking about: if you upped the memory on the Opteron HE system (2 CPUs with 2 cores each) to 32 GB, the power consumption would be almost the same (assuming 6 W per 4 GB of RAM, it would be at 234 W). Close enough to be considered equal, I'd say.
    Also, wouldn't populating all the memory slots on the Opteron decrease its performance a bit? I don't know about the Opteron, but the Athlon 64 decreases its command rate (Help, Johan! :) ) when all the memory channels are filled.
    I agree about the better performance of the Opteron server, but regarding power use, it is the same as Sun's recent offering. Maybe the introduction of the DDR2 Opterons will change the power envelope, but until then, the T1 might have some aces up its sleeve.
  • JohanAnandtech - Friday, March 24, 2006

    You should count on about 4-5 W per 2 GB DIMM. Based on my measurements and a bit of guessing, I think an Opteron HE system with 32 GB would definitely consume more than the T2000, as you also have to count a few watts per memory channel.

    Indeed, fully loaded DIMM channels will probably throttle back to lower speeds. I am not sure about the command rate though (BTW, it increases on the Athlon 64, it does not decrease :-) ), as it is possibly less important with buffered DIMMs.

    About performance, we still have to test a lot of scenarios (JSP, databases). Our impression of the T2000 might still change.
  • Zoomer - Sunday, April 9, 2006

    2xx Opterons use registered RAM, so it's not an issue like with the 1xx Socket 939 chips.
  • Calin - Friday, March 24, 2006

    I just took the difference measured between the 2x Opteron HE with 4 and 8 GB of RAM (192 and 198 W), shown in the table on the last page. I know that even rounding errors might put that anywhere between 4 and 8 W, but anyway, the Opterons won't use less power than the T1.
    Very interesting article, and I eagerly await the sequels :D
