The Slim T1 CPU

It is very unfair of us to compare one of the eight very slim T1 cores to mammoths like the Opteron or the Xeon, which have about 10 to 20 times more transistors. Still, we are curious. We know that Sun sacrificed single-threaded performance on the altar of power consumption, multi-threaded performance and die space. How far did they go? Let us find out with LMBench 3.0a. By the way, you can find much more information about the T1 CPU in our previous article.

First, we check the cache latency and RAM latency. For fat modern superscalar cores like the Opteron and Xeon, these numbers are extremely important. The T1 CPU is less sensitive to the latency of the memory subsystem as long as it has enough threads: whenever a thread stalls waiting for memory, the T1 switches to another thread that is ready to run.
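To make the latency-hiding argument concrete, here is a minimal sketch of a toy utilization model. It is an assumption for illustration, not Sun's documented scheduling behaviour: each thread alternates between a burst of work and a memory stall, and switching threads is free. The function name and the cycle counts in the example are hypothetical.

```python
def core_utilization(threads: int, compute_cycles: float, stall_cycles: float) -> float:
    """Estimate the fraction of cycles in which the core issues useful work.

    Assumed toy model (not Sun's documented behaviour): every thread runs for
    `compute_cycles`, then waits `stall_cycles` on memory, and switching to
    another ready thread costs nothing.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if compute_cycles <= 0 or stall_cycles < 0:
        raise ValueError("compute_cycles must be positive and stall_cycles non-negative")
    busy_fraction_per_thread = compute_cycles / (compute_cycles + stall_cycles)
    # The core cannot be more than 100% busy, however many threads are ready.
    return min(1.0, threads * busy_fraction_per_thread)


if __name__ == "__main__":
    # Hypothetical workload: 35 cycles of work between 105-cycle memory stalls.
    for n in (1, 2, 4):
        print(f"{n} thread(s): {core_utilization(n, 35, 105):.0%} busy")
```

In this model a single thread keeps the core busy only a quarter of the time, while four such threads are enough to saturate it. Real workloads are less regular, so treat the numbers as a rough intuition only.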

CPU (LMBench)          | OS            | Clockspeed (MHz) | L1 (ns) | L1 (cycles) | L2 (ns) | L2 (cycles) | RAM (ns) | RAM (cycles)
Opteron 275            | SunOS 5.10    | 2211             | 1.357   | 3           | 5.436   | 12          | 67.5     | 149
Pentium-M 1.6 GHz      | Linux 2.6.15- | 1593             | 1.880   | 3           | 6       | 10          | 92.1     | 147
Sun T1 1 GHz           | SunOS 5.10    | 980              | 3.120   | 3           | 22.1    | 22          | 107.5    | 105
Opteron 275            | Linux 2.6.15- | 2209             | 1.357   | 3           | 5       | 12          | 73       | 161
Xeon Irwindale 3.6 GHz | Linux 2.6.15- | 3594             | 1.110   | 4           | 8       | 28          | 48.8     | 175

Sun has definitely favoured power consumption here. A 3-cycle L1 latency at 1 GHz on a 90 nm process is very conservative. A 22-cycle L2 cache latency is even a bit slow, but again, the thread Gatling gun takes care of that. The built-in memory controllers pay off: RAM latency is about 105 cycles, while even the Pentium-M needs 147 cycles. This helps to keep the average latency (seen from the viewpoint of the CPU) low.
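The cycle counts in the table follow from the measured nanoseconds and the measured clock speed. The sketch below redoes that arithmetic for three of the rows; the helper name `ns_to_cycles` is ours, the clock column is assumed to be in MHz, and the Pentium-M L1 value "1,880" is read as 1.880 ns with a decimal comma.

```python
def ns_to_cycles(latency_ns: float, clock_mhz: float) -> float:
    """Convert a latency in nanoseconds to clock cycles at the given clock."""
    # cycles = seconds * Hz = (ns * 1e-9) * (MHz * 1e6) = ns * MHz / 1000
    return latency_ns * clock_mhz / 1000.0


# (clock in MHz, L1 ns, L2 ns, RAM ns), copied from the table above.
MEASURED = {
    "Opteron 275 (SunOS 5.10)": (2211, 1.357, 5.436, 67.5),
    "Pentium-M 1.6 GHz": (1593, 1.880, 6.0, 92.1),
    "Sun T1 1 GHz": (980, 3.120, 22.1, 107.5),
}

if __name__ == "__main__":
    for cpu, (mhz, l1_ns, l2_ns, ram_ns) in MEASURED.items():
        cycles = [round(ns_to_cycles(ns, mhz)) for ns in (l1_ns, l2_ns, ram_ns)]
        print(f"{cpu}: L1={cycles[0]}, L2={cycles[1]}, RAM={cycles[2]} cycles")
```

Some rows in the table list nanosecond values that appear to be rounded, so recomputing cycles from them can be off by one compared with the printed figure.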

Let us see if there is some integer crunching power in the little Sparc core.

CPU (LMBench)          | OS            | Bit  | Add  | mul   | div    | mod
Opteron 275            | SunOS 5.10    | 0.45 | 0.45 | 1.36  | 18.60  | 19.00
Pentium-M 1.6 GHz      | Linux 2.6.15- | 0.63 | 0.63 | 2.51  | 19.50  | 11.50
Sun T1 1 GHz           | SunOS 5.10    | 1.01 | 1.00 | 29.10 | 104.00 | 114.00
Opteron 275            | Linux 2.6.15- | 0.45 | 0.45 | 1.36  | 18.60  | 19.00
Xeon Irwindale 3.6 GHz | Linux 2.6.15- | 0.28 | 0.28 | 2.79  | 17.30  | 23.30

The very common ADD instruction is executed in one cycle, but LMBench measures no less than 29 cycles for a multiply and 104 for a divide (at roughly 1 GHz, one nanosecond is about one cycle). Faster multiplication and division would have taken up much more die space and consumed much more power. Considering that those instructions are very rare in most server workloads, this is a pretty clever trade-off. Update: the Sun documentation tells us 7-11 cycles for multiply and 72 for division.
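How much the slow multiply and divide matter depends on how rare they really are. The sketch below weights the measured latencies (assumed to be in nanoseconds, as LMBench reports them) by an instruction mix. The mix is purely hypothetical, and the model is a rough serial one: it ignores pipelining, superscalar overlap and the T1's multithreading.

```python
import math


def average_latency_ns(mix: dict, latency_ns: dict) -> float:
    """Weighted average latency per operation for a given instruction mix."""
    if not math.isclose(sum(mix.values()), 1.0):
        raise ValueError("instruction mix must sum to 1")
    return sum(share * latency_ns[op] for op, share in mix.items())


# Measured latencies from the table above (SunOS 5.10 rows).
T1 = {"add": 1.00, "mul": 29.10, "div": 104.00}
OPTERON = {"add": 0.45, "mul": 1.36, "div": 18.60}

# Hypothetical mix, not taken from the article: 98% simple ops, 1.5% mul, 0.5% div.
HYPOTHETICAL_MIX = {"add": 0.98, "mul": 0.015, "div": 0.005}

if __name__ == "__main__":
    for name, latencies in (("Sun T1", T1), ("Opteron 275", OPTERON)):
        avg = average_latency_ns(HYPOTHETICAL_MIX, latencies)
        print(f"{name}: {avg:.4f} ns per operation on average")
```

Changing the shares in `HYPOTHETICAL_MIX` shows how quickly the expensive operations start to dominate the average once they stop being rare.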

Let us check out what the lonely FPU of the T1 can do.

CPU (LMBench)          | OS            | FADD  | FMUL  | FDIV
Opteron 275            | SunOS 5.10    | 1.80  | 1.80  | 10.90
Pentium-M 1.6 GHz      | Linux 2.6.15- | 1.88  | 3.14  | 23.90
Sun T1 1 GHz           | SunOS 5.10    | 26.50 | 29.30 | 54.20
Opteron 275            | Linux 2.6.15- | 1.81  | 1.81  | 9.58
Xeon Irwindale 3.6 GHz | Linux 2.6.15- | 1.39  | 1.95  | 12.60

FADD and FMUL are a little faster than what we first reported (40 cycles), and the main part of that latency might just consist of getting the data to the FPU of the T1. It is clear that the Sun T1 doesn't like FP code at all.

26 Comments

  • phantasm - Wednesday, April 5, 2006

    While I appreciate the review, especially the performance benchmarks between Solaris and Linux on like hardware, I can't help but feel this article falls short in terms of an enterprise class server review which, undoubtedly, a lot of enterprise class folks will be looking for.

    * Given the enterprise characteristics of the T2000 I would have liked to see a comparison against an HP DL385 and IBM x366.

    * The performance testing should have been done with the standard Opteron processors (versus the HE). The HP DL385 using non-HE processors has nearly the same power and thermal characteristics as the T2000. The DL385 is a 4A 1615 BTU system whereas the T2000 is a 4A 1365 BTU system.

    * The T2000 is deficient in several design areas. It has a tool-less case lid that is easily removable. However, our experience has been that it opens too easily and given the 'embedded kill switch' it immediately shuts off without warning. Closing the case requires slamming the lid shut several times.

    * The T2000 only supports *half height* PCI-E/X cards. This is an issue with using 3rd party cards.

    * Solaris installation has a nifty power savings feature enabled by default. However, rather than throttling CPU speed or fans it simply shuts down to the OK prompt after 30 minutes of a 'threshold' not being met. Luckily this 'feature' can be disabled through the OS.

    * Power button -- I ask any T2000 owner to show me one that doesn't have a blue or black mark from a ball point pen on their power button. Sun really needs to make a more usable power button on these systems.

    * Disk drives -- The disk drives are not labeled with FRU numbers or any indication to size and speed.

    * Installing and configuring Solaris on a T2000 versus Linux on an x86 system will take a factor of 10x longer. Most commonly, this is initially done through a hyperterm access through the remote console. (Painful) Luckily subsequent builds can be done through a jumpstart server.

    * HW RAID Configuration -- This can only be done through the Solaris OS commands.

    I hope Anandtech takes up the former call to begin enterprise class server reviews.
  • JohanAnandtech - Thursday, April 6, 2006

    DL385 will be in our next test.

    All other issues you addressed will definitely be checked and tested.

    That it falls short of a full review is clearly indicated by "first impressions" and it has been made clear several times in the article. Just give us a bit more time to get the issues out of our benchmarks. We had to move all our typical Linux x86 benchmarks to Solaris and the T1 and keep it fair to Sun. This meant that we had to invest massive amounts of time in migrating databases and applications and tuning them.
  • davem330 - Friday, March 24, 2006

    You aren't seeing the same kind of performance that Sun is claiming
    regarding SPECweb2005 because Sun specifically chose workloads
    that make heavy use of SSL.

    Niagara has on-chip SSL acceleration, using a per-core modular
    arithmetic unit.

    BTW, would be nice to get a Linux review on the T2000 :-)
  • blackbrrd - Saturday, March 25, 2006

    Good point about the ssl.

    I can see both SSL and gzip being used quite often, so please include SSL in the benchmarks.

    As mentioned in the article 1-2% of FP operations affect the server quite badly, so I would say that getting one FPU per core would make the cpu a lot better, looking forward to seeing results from the next generation.

    .. but then again, both Intel and AMD will probably have launched quad cores by then...

    Anyway, it's interesting seeing a third contender :)
  • yonzie - Friday, March 24, 2006

    Nice review, a few comments though:

    quote:

    Eight 144-bit DDR DIMM slots allow...

    I think that should have been "184-pin", although you might mean dual channel ECC memory, but if that's the case it's a strange way to write it IMHO.

    No mention of the Pentium M on page 4, but it shows up in benchmarks on page 5 but not further on... Would have been interesting :-(

    quote:

    There are two ways that the T2000 could be useful as a web server. The first one is to use Solaris zoning (a.k.a. "Solaris containers") techniques to run a lot of light/medium web servers in parallel virtual zones. As virtualisation is still something that requires quite a bit of expertise, and we didn't have much experience with Solaris Zones, we decided to test the second scenario.

    And the second scenario is what exactly? ;-) (yeah, I know it's written a few paragraphs later, but...)

    Oh, and more pretty pictures pls ^_^
  • sitheris - Friday, March 24, 2006

    Why not benchmark it on a more intensive application like Oracle 10g?
  • JohanAnandtech - Friday, March 24, 2006

    We are still tuning and making sure our results are 100% accurate. Sounds easy, but it is incredibly complex.
    But they are coming.

    Anyway, no Oracle, we have no support from them so far.
  • JCheng - Friday, March 24, 2006

    By using a cache file you are all but taking MySQL and PHP out of the equation. The vast majority of requests will be filled by simply including the cached content. Can we get another set of results with the caching turned off?
  • ormandj - Friday, March 24, 2006

    I would agree. Not only that, but I sure would like to know what the disk configuration was. Especially reading from a static file, this makes a big difference. Turn off caching and see how it does, that should be interesting!

    Disk configurations please! :)
  • kamper - Friday, March 31, 2006

    No kidding. I thought that php script was pretty dumb. Once a minute you'll get a complete anomaly as a whole load of concurrent requests all detect an out of date file, recalculate it and then try to dump their results at the same time.

    How much time was spent testing each request rate and did you try to make sure each run came across the anomaly in the same way, the same number of times?
