We had the chance to briefly visit Stampede, the first supercomputer based upon the Xeon Phi. It is one of the supercomputers at the Texas Advanced Computing Center (TACC).

Stampede consists of 6400 PowerEdge C8220X and C8220 server sleds. Typically these servers contain two octa-core Xeon E5s, 32 GB of RAM, and one GPU/MIC.

Eight of those server sleds find a home inside the C8000 4U Chassis, together with two power sleds.


Dual-ported Mellanox ConnectX FDR InfiniBand (56 Gb/s) adapters connect all those servers together to form one large supercomputer. In each rack you can find 8 C8000 chassis on average.

Connect 200 racks together and you get the Stampede supercomputer:

The Xeon E5s deliver two petaflops at the moment. When all the Xeon Phi cards are in place, an additional 8 petaflops will be available to researchers on Stampede.
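As a quick sanity check on the two-petaflop figure, here is a back-of-the-envelope peak-FLOPS calculation. It assumes 2.7 GHz Sandy Bridge E5s and 8 double-precision flops per core per cycle; both figures are our assumptions, not numbers from TACC.

```c
#include <stdio.h>

int main(void) {
    /* Assumptions (not from the article): 2.7 GHz clocks and
       8 double-precision flops per core per cycle (AVX). */
    const double nodes           = 6400;  /* C8220/C8220X sleds */
    const double cores_per_node  = 16;    /* two octa-core E5s  */
    const double ghz             = 2.7;
    const double flops_per_cycle = 8;

    double pflops = nodes * cores_per_node * ghz * 1e9 * flops_per_cycle / 1e15;
    printf("Estimated Xeon E5 peak: %.1f PFLOPS\n", pflops);  /* ~2.2 */
    return 0;
}
```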

The Intel Xeon Phi is not a standalone replacement for a GPU; for example, the Xeon Phi has no texture units. As a result, remote visualization is done by 128 NVIDIA Tesla K20 GPUs. The rest of the supercomputer: 272 TB of total memory and 14 PB of disk storage. The complete supercomputer and the necessary cooling will require up to 6 megawatts of power.

Comments

  • Kevin G - Saturday, November 17, 2012 - link

    It is dependent upon the bus encoding. PCI-E 1.0/2.0 use an 8b/10b encoding scheme to handle traffic, while PCI-E 3.0 uses 128b/130b encoding. PCI-E 3.0 only increases the clock speed of the bus by 60%, with the rest of the bandwidth increase stemming from the more efficient encoding scheme. Xeon Phi seems to have kept the PCI-E 1.0/2.0 encoding but supports the higher clock rate of PCI-E 3.0. This appears to be nonstandard, but the LGA 2011 Xeons appear to support this for additional bandwidth.

    Any overhead likely comes from adding full PCI-E 3.0 support in addition to PCI-E 1.0/2.0.
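To make the encoding arithmetic in the comment above concrete, here is a minimal sketch of the per-lane bandwidth math. The PCI-E 2.0 and 3.0 rows use the standard transfer rates; the "Phi hypothesis" row is the commenter's scenario (a 3.0 clock with the older 8b/10b encoding), not a published spec.

```c
#include <stdio.h>

/* Per-lane PCI-E bandwidth = transfer rate x encoding efficiency. */
int main(void) {
    struct { const char *name; double gtps, efficiency; } cfg[] = {
        { "PCI-E 2.0 (5 GT/s, 8b/10b)",      5.0, 8.0 / 10.0 },
        { "PCI-E 3.0 (8 GT/s, 128b/130b)",   8.0, 128.0 / 130.0 },
        { "Phi hypothesis (8 GT/s, 8b/10b)", 8.0, 8.0 / 10.0 },
    };
    for (int i = 0; i < 3; i++) {
        double gbps = cfg[i].gtps * cfg[i].efficiency; /* usable Gb/s */
        printf("%-35s %.2f Gb/s (%.0f MB/s) per lane\n",
               cfg[i].name, gbps, gbps * 1000.0 / 8.0);
    }
    return 0;
}
```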
  • mayankleoboy1 - Thursday, November 15, 2012 - link

    Assuming you can buy a single Xeon Phi card, can it work in desktop motherboards and processors?
    Can it work with AMD processors? Can it work in tandem with NVIDIA and ATI GPUs?
  • Joschka77 - Thursday, November 15, 2012 - link

    I think the answers would be: yes, yes, no! ;-)
  • Jaybus - Thursday, November 15, 2012 - link

    No, it is yes, yes, and yes. The Stampede also uses 128 NVIDIA Tesla K20 GPUs, as stated in the article.
  • Kevin G - Saturday, November 17, 2012 - link

    That's within the cluster, not necessarily in the same host system. I strongly suspect that the visualization nodes featuring K20 GPUs are isolated from the Xeon Phi nodes.
  • maximumGPU - Thursday, November 15, 2012 - link

    Can't deny that OpenMP code that automatically runs faster on the Phi would be a great solution for those looking for a speedup without the cost and time of porting code to GPUs. There is certainly a market for these cards.
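For readers unfamiliar with the model, this is a minimal sketch of the kind of loop-level OpenMP code the comment above means: the same pragma-annotated source that runs on a host Xeon can be recompiled for the Phi's many cores without a GPU port. The array names and sizes are purely illustrative.

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    enum { N = 1 << 20 };
    static double a[N], b[N], c[N];

    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* One pragma parallelizes the loop across all available cores;
       no CUDA/OpenCL rewrite of the kernel is needed. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[42] = %.1f, threads available: %d\n",
           c[42], omp_get_max_threads());
    return 0;
}
```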
  • creed3020 - Thursday, November 15, 2012 - link

    Johan,

    The numbers you give for the configuration of the units don't add up.

    Eight of the compute sleds plus two PSU sleds cannot fit into a 4U unit. Judging by the photo, it appears that the vertical arrangement of nodes goes something like this: {Compute, Compute, PSU, PSU, Compute, Compute} per 4U. It appears this leads to a total of 5 chassis per cluster within the rack.

    This is then compounded with two distinct clusters per rack, each with its own InfiniBand switch plus a regular Ethernet switch. This makes for a total of 10 C8000 chassis per rack.

    This makes sense when considering a 48U rack: 22U per cluster x 2 clusters per rack = 44U, with a few spare slots at the top and middle. I count two at the top and one in the middle.
  • llninja1 - Thursday, November 15, 2012 - link

    I think the author got some Dell facts mixed up. Looking at Dell.com, you can fit 8 compute sleds in, but those compute sleds are half-width and don't contain the necessary double-wide PCIe slots to accommodate a Xeon Phi card. So in a single 4U unit, you are correct: it can only hold 4 compute sleds and two power units, as depicted in the Stampede picture.
  • creed3020 - Friday, November 16, 2012 - link

    Thanks for the explanation. I didn't go over to the Dell site, but that would explain that a slimmer sled is possible if you don't have to stick one of these huge two-slot Xeon Phi cards in.

    It makes me wonder how big a difference there is in total PFLOPS/rack when configured with the half-width sleds vs. the full-width sleds with Phi.
  • mfilipow - Thursday, November 15, 2012 - link

    "Eight of those server sleds find a home inside the C8000 4U Chassis, together with two power sleds." did you write - but on the photo I only see 4x computing sleds plus 2x power. Where are the other 4?!
