The SUN benchmarks ...

Although we haven't run benchmarks yet, the benchmarks that SUN presents[2] are still interesting. We'll delve deeper once we have our own benchmarks. The power consumption numbers are estimates. We tried to give you both the typical and the maximum values. Some manufacturers give only typical numbers (Intel, IBM) while others only give maximum numbers (AMD), so we had to find other sources and base our estimates upon them.

SPECjbb2005 represents an order-processing application for a wholesale supplier, written in Java.

SPECjbb2005

| System | CPU | Power Dissipation CPUs (Estimated) | Number of cores | Number of active threads | Score | Percentage score |
|---|---|---|---|---|---|---|
| Sun Fire T2000 | 1x 1.2GHz UltraSPARC T1 | 72-79 W | 8 | 32 | 63,378 | 160% |
| Sun Fire X4200 | 2x 2.4GHz DC Opteron | 150-180 W | 4 | 4 | 45,124 | 114% |
| IBM p5 550 | 2x 1.9GHz POWER5+ | 320-360 W | 4 | 8 | 61,789 | 156% |
| IBM xSeries 346 | 2x 2.8GHz DC Xeon | 270-300 W | 4 | 8 | 39,585 | 100% |

The performance of the T1 is simply amazing. Of course, this is an ideal benchmark for the T1, with many Java threads. The POWER5+ system is the only one that comes close, as it can process 8 threads simultaneously just like the T1. But it consumes roughly 4 times more power than the T1.
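The arithmetic behind the "Percentage score" column and the power comparison can be sketched in a few lines of Python. This is only an illustration: the scores and estimated power ranges are copied from the SPECjbb2005 table above, the names `RESULTS`, `percentage_scores` and `power_ratio_range` are hypothetical, and the sketch assumes each power range reads as "typical to maximum".

```python
# Scores and estimated CPU power ranges (watts), copied from the SPECjbb2005 table.
# Assumption: each range is (typical, maximum).
RESULTS = {
    "Sun Fire T2000": {"score": 63378, "watts": (72, 79)},
    "Sun Fire X4200": {"score": 45124, "watts": (150, 180)},
    "IBM p5 550": {"score": 61789, "watts": (320, 360)},
    "IBM xSeries 346": {"score": 39585, "watts": (270, 300)},
}


def percentage_scores(results, baseline):
    """Express every score as a rounded percentage of the baseline system's score."""
    base = results[baseline]["score"]
    return {name: round(100 * r["score"] / base) for name, r in results.items()}


def power_ratio_range(results, system, reference):
    """Return (lowest, highest) ratio of `system` power to `reference` power."""
    sys_lo, sys_hi = results[system]["watts"]
    ref_lo, ref_hi = results[reference]["watts"]
    # Lowest ratio: system at its low figure, reference at its high figure.
    # Highest ratio: system at its high figure, reference at its low figure.
    return sys_lo / ref_hi, sys_hi / ref_lo


if __name__ == "__main__":
    for name, pct in percentage_scores(RESULTS, "IBM xSeries 346").items():
        print(f"{name}: {pct}%")
    low, high = power_ratio_range(RESULTS, "IBM p5 550", "Sun Fire T2000")
    print(f"POWER5+ vs T1 power: {low:.1f}x to {high:.1f}x")
```

Because the power figures are estimates, the ratio is better read as a range than as a single number.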

SPECweb2005 emulates users sending browser requests over broadband Internet connections to a web server. It provides three new workloads: a banking site (HTTPS), an e-commerce site (HTTP/HTTPS mix), and a support site (HTTP). Dynamic content is implemented in PHP and JSP.

SPECweb2005

| System | Processors | Power Dissipation CPUs (Estimated) | Number of cores | Number of active threads | Score | Percentage score |
|---|---|---|---|---|---|---|
| Sun Fire T2000 | 1x 1.2GHz UltraSPARC T1 | 72-79 W | 8 | 32 | 14,001 | 289% |
| IBM p5 550 | 2x 1.9GHz POWER5+ | 320-360 W | 4 | 8 | 7,881 | 162% |
| IBM xSeries 346 | 2x 3.8GHz Xeon | 220-260 W | 4 | 4 | 4,348 | 90% |
| Dell 2850 | 2x 2.8GHz DC Xeon | 260-300 W | 4 | 8 | 4,85 | 100% |

Here, the T1 is by far the best CPU. However, this benchmark is very hard to interpret. For example, back in 2003, I did some benchmarking on a JSP server. Our first results were very weird: a single Xeon performed just as well as a dual Xeon, even though the Gigabit PCI NIC was nowhere near its limit (about 180 Mbit/s). Once we used an Intel NIC, things improved, but the network bottleneck did not disappear until we used a CSA Intel NIC (connected directly to the Northbridge). The benchmark depends more on the quality of the NIC driver, the latency from the NIC to memory (DMA) and, of course, the quality of the NIC chip itself than on the CPU. That being said, it is clear that web servers spawn a lot of threads that do not require much processing unless the traffic is encrypted. So, this is the natural habitat of the T1 CPU. As long as you can make sure that the CPU is the bottleneck, the CPU that can execute the most threads per cycle will win.

SAP 2 Tier is based on the number one ERP software. The database back-end and application run on the same machine.

| System | Processors | Power Dissipation CPUs (Estimated) | Number of cores | Number of active threads | Score | Percentage score |
|---|---|---|---|---|---|---|
| Sun Fire T2000 | 1x 1.2GHz UltraSPARC T1 | 72-79 W | 8 | 32 | 4780 | 97% |
| IBM p5 550 | 2x 1.9GHz POWER5+ | 320-360 W | 4 | 8 | 5020 | 102% |
| HP DL580 | 4x 3.33GHz Xeon MP | 440-520 W | 4 | 8 | 4700 | 96% |
| HP DL385 | 2x 2.2GHz DC Opteron | 140-180 W | 4 | 4 | 4920 | 100% |

SAP 2-tier is a typical example of a benchmark with very low IPC. However, some of the queries are more complex, so the T1 cannot outperform the fatter cores. Still, the performance per watt is unbeatable.
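To make the performance-per-watt point concrete, here is a minimal Python sketch that divides each SAP score by the estimated CPU power. The figures are copied from the SAP table above; the names `SAP_RESULTS`, `score_per_watt` and `rank_by_efficiency` are hypothetical, and the sketch assumes the two ends of each power range are the typical and maximum estimates.

```python
# (system, score, typical watts, maximum watts), copied from the SAP 2-tier table.
# Assumption: the power range is "typical-maximum" and covers the CPUs only.
SAP_RESULTS = [
    ("Sun Fire T2000", 4780, 72, 79),
    ("IBM p5 550", 5020, 320, 360),
    ("HP DL580", 4700, 440, 520),
    ("HP DL385", 4920, 140, 180),
]


def score_per_watt(score, typical_w, max_w):
    """Return (worst case, best case) score per watt for one system."""
    return score / max_w, score / typical_w


def rank_by_efficiency(rows):
    """Sort systems by worst-case score per watt, most efficient first."""
    ranked = []
    for name, score, typical_w, max_w in rows:
        worst, best = score_per_watt(score, typical_w, max_w)
        ranked.append((name, worst, best))
    return sorted(ranked, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, worst, best in rank_by_efficiency(SAP_RESULTS):
        print(f"{name}: {worst:.1f} to {best:.1f} points per watt")
```

Ranking on the worst-case figure is a deliberately conservative choice: a system only leads the list if it stays ahead even at its maximum estimated power draw.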


Unbeatable?

The terms "paradigm shift" and "disruptive technology" have been abused so many times that we don't like to use them. But in the case of the T1 CPU, it would not be an exaggeration to say that it is the herald of a new generation of server CPUs, and that it has disrupted the server market. Single-core, single-threaded CPUs no longer have a chance in this market. Does this also signal the end of superscalar CPUs in the server environment? Is the massive multi-core with scalar cores the future for the entire server world? The SUN UltraSPARC T1 simply wipes the floor with the competition when it comes to performance per Watt. According to this metric, the UltraSPARC T1 is 4 to 12 times better.


Fig 7: The cores of the T1 processor are hardly warmer than the rest of the die. A "fat" core has many more hotspots.

However, we think that there are also opportunities for the fatter cores. The main weaknesses of the T1 are its shallow pipeline and its low clock speed. The need to be compatible with previous SPARCs, and thus the need for the relatively big Register Window system (with 1 cycle access), also limits clock speed. While the competition has bigger cores, it does not need as many cores as the T1. Each superscalar core could make better use of its resources by using Coarse Grained Multi-threading (Montecito), FMT or SMT (POWER5). That should allow these kinds of cores to achieve higher IPC per core. Clock speed can be 2 to 3 times higher, allowing two dual-core "fat" CPUs or one quad-core "fat" CPU to outperform the T1.

These kinds of CPUs consume quite a bit more power, but as long as this extra power usage is not dramatically higher, fat cores might still have a good chance in the market. After all, it is total system power that counts, and large RAID arrays and AC units often represent larger power draws than just the CPU. With the exception of the web server market, power consumption is not the number one priority most of the time, although it is important.

A study sponsored by SUN[3] shows that the best results in commercial server loads are achieved with 4 to 6 threads per core, combined with 2- to 3-way superscalar in-order cores. This is another indication that there is a lot of room for very different multi-core approaches such as Intel's Montecito, IBM Power 6+ and upcoming multi-core Xeons and Opterons. A multi-threaded 64-bit version of Sossaman (31 Watt TDP per two cores) could also threaten the UltraSPARC T1.

In some server-related markets, fat multi-cores might even be preferable. One such market is OLAP databases, where very complex queries are sent by a limited number of users. The response time of the T1 could be rather mediocre there, while a higher clocked CPU with fewer cores could be quite a bit more responsive under these loads. Also, OLAP queries that calculate statistical data will use more FP instructions.

49 Comments

  • thesix - Thursday, December 29, 2005

    "Hypervisor" is a technology used mostly by IBM from mainframe days. Every system vendor can implement this technology in their systems.
  • pmurphy - Thursday, December 29, 2005

    Actually let's start by saying you're missed on aceshardware.. and I do have to wonder how you felt about the oath of allegiance to Intel anandtech requires?

    Ah well, all that aside the most glaring omission with respect to the Niagara II is the fact that it has a full floating point component in each core - meaning that the current floating point limitation will largely go away.

    In addition: you cite (as a lot of other people do too) this 1.2GHz "maximum" as if it had reality - it does not. As issued, the T1 incorporates some design trade-offs that make higher cycle rates impractical, but those are the result of engineering vs. marketing (time and cost) trade-offs, not inherent consequences of the technology. Sun has faster test units running now - with very high end products in the pipeline.
  • defter - Thursday, December 29, 2005

    "Ah well, all that aside the most glaring omission with respect to the Niagara II is the fact that it has a full floating point component in each core - meaning that the current floating point limitation will largely go away."

    Floating point limitation won't go away, 8 FPUs@1.4GHz will just make floating point capabilities of the chip somewhat useful. For comparison, a dual-core Opteron has 6 FPUs@2.4GHz NOW and in 2007 there will be quad-core Opterons (12 FPUs) available.

    As somebody already mentioned, performance/$ is also very important. While T1 is way faster than any other chip, I guess it will cost much more, probably more than 2 high end dual-core Opterons.

    I'm not saying that T1 isn't good. It is, but only in certain tasks.
  • JohanAnandtech - Thursday, December 29, 2005

    I don't think it is a tradition at Anandtech to swear allegiance to Intel, or else they have forgotten to tell me. :-)

    All jokes aside, when I say Intel has the advantage on hardware VT technology and the software support needed, that is solely based on facts. Sun is actively trying to get full support of Xen (VM), and also Linux and FreeBSD OS support, but for the moment T1 is Solaris only if you want good software support.

    AFAIK there is no indication that SUN can go much faster than 1.2 GHz. To let the 4 threads access a 5.7 KB register file in one cycle is probably limiting the clockspeed, and the 6 stage pipeline is another clear indication that this CPU won't clock much higher. SUN counting on 65 nm to increase the clockspeed higher (1.4 GHz and more) is another indication.


  • ravedave - Thursday, December 29, 2005

    When might we expect to see Anandtech benchmarks? 1-2 months?
  • Puddleglum - Thursday, December 29, 2005

    The [2] SUN T1 benchmarks reference link is pointing to a bizarre location at intel.com. The text says sun.com, but the link points to intel.com.

    It should be fixed to point to: http://www.sun.com/servers/coolthreads/t1000/bench...
  • ncage - Thursday, December 29, 2005

    It looks like Sun is back with a vengeance. This thing seems perfect for the server market. I am really surprised that they were able to get their $hit back together. I doubt the single threaded performance on this thing would be that great but, then again, who cares, this thing is a server, not a workstation made for single threaded use. This thing would be perfect for virtualization. I don't know if this is possible for Solaris or maybe vmware/ms virtual server will have this feature in the future but hopefully they will allow you to allocate which core to which virtualization layer that you want. So say you're running 4 OSes and you have 8 cores. You allocate 2 cores to each OS. You notice that 2 of the four have really high CPU utilization. You could then dynamically add one more core to each of the virtualized OSes that had high CPU usage from the ones that had low CPU usage. For those of you who think virtualization isn't a big deal...now wouldn't this be cool.
  • Slaimus - Thursday, December 29, 2005

    Are these benchmarks all running similar TCP/IP stacks? We all know Solaris 10 has a new TCP/IP stack that is much faster than Linux.
  • Puddleglum - Thursday, December 29, 2005

    The benchmarks are from Sun's website (http://www.sun.com/servers/coolthreads/t1000/bench...)
    "SPECjAppServer2004 is the only industry-standard benchmark used for Java Enterprise Edition application servers."

    So, yes, you can assume they're all using the same TCP/IP stack. But, as the article mentions: "Of course, this is an ideal benchmark for the T1 with many java threads."
