Virtualization

Virtualization is an important trend in the server world. Our own experience with it (for example, with VMware ESX Server and MS Virtual Server) shows that it is not completely ready for prime time. As an example, we experienced a crash of the Console OS, the Linux-based OS that controls the virtual layer. There is also no support for 64-bit guest operating systems, the guest OS needs to be binary translated, and so on. All this will change with the introduction of hardware-supported virtualization.

The UltraSPARC T1 supports a hypervisor, which is IBM talk for a Virtual Machine Monitor: the virtual layer that runs under the guest OS. Solaris also has excellent support for containers, or zones. These are software-based partitions [4] in Solaris, and their objective is similar to that of virtualization: high isolation. Each zone can be rebooted individually and created dynamically, and errors in one zone won't affect other zones. This makes the T1 even better suited as a host for dozens of websites serving different clients, as each web server can run in a separate zone on the Solaris OS.

However, when it comes to running different operating systems, Intel has the advantage. VMware is going to introduce several server products that make use of Intel's VT technology, and VMware Workstation, Xen and MS Virtual Server can already use it. (It must be noted that MS Virtual Server is not really a Virtual Machine Monitor like Xen and VMware ESX Server: it needs Windows 2003 or XP to run.) So Intel leads in this arena, while SUN is apparently working hard to get Xen and Linux support for the T1.


Niagara 2

Right now, SUN is definitely a few steps ahead of the competition, and it is not sitting still. The 65 nm Niagara 2 is due in 2007 and will feature a slightly higher clock speed (1.4 GHz and higher) and two pipelines [3] per core instead of one. Combined with 8 threads per core, this should allow the new CPU to achieve nearly twice the IPC per core. The integration will go one step further: x8 PCI Express, a multi-port Gbit Ethernet switch, and more encryption hardware support will be integrated in Niagara 2. The integrated memory controller will also support fully buffered DIMMs.
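
To see why doubling both the pipelines and the threads per core could roughly double per-core IPC, here is a minimal back-of-the-envelope sketch. It is not SUN's model: the function name `core_ipc` and the workload numbers (10 busy cycles followed by a 30-cycle stall) are assumptions chosen purely for illustration. Only the thread and pipeline counts come from the text.

```python
def core_ipc(threads, pipelines, run_cycles, stall_cycles):
    """Estimate sustained instructions per cycle for one multithreaded core.

    Simplified model (an assumption, not a vendor formula): each thread
    issues one instruction per cycle for run_cycles, then stalls (for
    example on a cache miss) for stall_cycles. Other threads fill the gap.
    """
    per_thread_ipc = run_cycles / (run_cycles + stall_cycles)
    # Total demand from all threads, capped by the number of issue pipelines.
    return min(pipelines, threads * per_thread_ipc)


# Hypothetical workload: 10 busy cycles, then a 30-cycle stall.
t1_core = core_ipc(threads=4, pipelines=1, run_cycles=10, stall_cycles=30)
niagara2_core = core_ipc(threads=8, pipelines=2, run_cycles=10, stall_cycles=30)

print(f"T1 core:        {t1_core:.2f} IPC")
print(f"Niagara 2 core: {niagara2_core:.2f} IPC")
print(f"Ratio:          {niagara2_core / t1_core:.2f}x")
```

In this idealized model the ratio is exactly 2. Real workloads share caches and memory bandwidth between threads, which is one reason to expect "nearly" rather than fully twice the IPC.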

Based on the technology in the current T1, SUN seems to be on schedule, and they are creating some very compelling designs. There are certainly many ways to tackle computing problems, and it's good to see some new approaches other than the standard "more cache" and "higher clock speeds" that are so common.

References

[1] Niagara: A 32-Way Multithreaded SPARC Processor - Poonacha Kongetira, Kathirgamar Aingaran, Kunle Olukotun, Sun Microsystems

[2] SUN T1 benchmarks
http://www.sun.com/servers/coolthreads/t1000/benchmarks.jsp

[3] Maximizing CMP Throughput with Mediocre Cores - John D. Davis, James Laudon, Kunle Olukotun

[4] Solaris 10 - What's coming in 2004 - Chris Rijk
http://www.aceshardware.com/read_news.jsp?id=75000449

[5] Niagara, a Torrent of Threads - Chris Rijk
http://www.aceshardware.com/read.jsp?id=65000292

[6] Applications on UltraSPARC T1 Chip Multithreading Systems - Denis Sheahan, UltraSPARC T1 Architecture Group

49 Comments

  • thesix - Friday, December 30, 2005

    If you're talking about POWER5's SMT, currently it provides two HW threads per core:
    http://publib.boulder.ibm.com/infocenter/pseries/i...

    If you look closer at T1, the best one has 8 cores, each core supports four HW threads.
    http://www.sun.com/processors/UltraSPARC-T1/

    SMT and CMT appear to be the same type of technology (at least conceptually), with different names from the two vendors.

    > The very very poor FP performance of T1 is the truth.
    > We have to remind ourselves that it is only an integer CPU. Its FP performance is too terrible.

    OK. Since you have repeated it so many times, I am sure everyone who's reading this will remember, and I do not disagree :-).

    Thanks.
  • Betwon - Friday, December 30, 2005

    We think that CMT and SMT are different.

    For example:
    The P4 630 is an SMT CPU, but not a CMT CPU.
    The Athlon X2 is a CMT CPU, but not an SMT CPU.

    From AnandTech:
    The T1 has no branch prediction, and it has only one-instruction issue per core and 8KB of L1D per core (too little for 4 threads to use).

    The POWER5 has 32KB of L1D per core, which is used by two threads.

    We think that the SMT of the T1 may be OK only if the 4 threads use very little L1D cache (which is impossible in most cases).
  • Betwon - Friday, December 30, 2005

    edit:
    The only explanation for how it improves its (very poor) efficiency is that it uses SMT to hide stall latency (from branch misses, cache misses, etc.).

    But a core has only 8KB of L1 (which will be used by 4 threads), so cache misses will increase. It could even make things worse.
  • Betwon - Friday, December 30, 2005

    edit: The T1 has no branch prediction, and it has only one-instruction issue per core.
  • Brian23 - Friday, December 30, 2005

    Obviously the apps that they used to benchmark in this article run well on this chip. Also, this chip doesn't run Windows. It runs Sun's proprietary operating system. (I forgot what it's called.) Sun will give this new chip software support because they want it to do well.

    I think I read in the article that the chip is backwards compatible with Sun's previous chip designs, meaning a lot of software is already available that will run on the chip.
  • Betwon - Friday, December 30, 2005

    NO!

    The range of apps that parallelize well across 32 threads is too narrow.

    'Having many threads' is not the same as 'parallelizing well across 32 threads'!

    Even if there are 32 threads, when they do not parallelize well this new CPU will waste more than 90% of its potential.

    The efficiency of Itanium (Itanium is capable of 1.3-1.5 IPC) is much better than that of x86 CPUs (0.7-0.9 IPC). Itanium never used OOO logic or long pipelines.
  • Betwon - Friday, December 30, 2005

    The efficiency of Itanium2 is still better than that of IBM's POWER5: an Itanium2 core can retire 6 instructions/cycle, and a POWER5 core can retire 5 instructions/cycle.

    But a core of this new CPU manages only one instruction/cycle.
  • Brian23 - Friday, December 30, 2005

    I think you missed the part where x86 chips spend 400 cycles waiting on memory accesses when the Sun chip just keeps chugging with another thread while the load is happening.
  • Calin - Tuesday, January 3, 2006

    Those 400 cycles are related to the higher clock speed (if your processor ran at half the clock speed, it would wait only 200 cycles). I assume the 400 cycles are based on the Xeon processor (which has a high clock speed and a slower FSB).
  • Betwon - Friday, December 30, 2005

    NO!
    That is not true for all x86 CPUs. The Athlon64 does spend many cycles waiting on memory accesses,
    but the P4 with HT just keeps chugging with another thread while the load is happening.

    Do you understand what I want to say?
