Visual Inspection

I knew server boards were large, but coming from the ATX and E-ATX standards, this thing is huge.  It measures 330mm x 305mm (13” x 12”), which corresponds to the SSI EEB specification for server motherboards.  This is the exact size of an official E-ATX motherboard (despite a small amount of loose definition), but to put the icing on the cake, the mounting holes for the motherboard are different to the normal ATX standards.  A large case like the Rosewill Blackhawk-Ultra supports ATX, SSI CEB, XL-ATX, E-ATX and HPTX, up to 13.6” x 15”, but not SSI EEB.  Thus drilling extra holes for standoffs may be required.

Unlike the SR-X or Z9PE-D8 WS, the GA-7PESH1 supports two memory modules per channel for all channels on board.  In terms of specifications this means support for up to 128 GB UDIMM (i.e. regular DDR3), 128 GB UDIMM ECC, and 512 GB RDIMM ECC.  Due to the nature of the design, only 1066-1600 MHz is supported, but the GA-7PESH1 supports 1600 MHz when all slots are populated.  For our testing, Kingston has kindly supplied us with 8x4GB of their 1600 C11 ECC memory.

As with the majority of server boards, stability and longevity are top priorities.  This means no overclocking, and Gigabyte can safely place a six phase power delivery on each CPU – it also helps that all SB-E Xeons are multiplier locked and there is no word of unlocked CPUs being released any time soon.  As we look at the board, standards dictate that the CPU on the right is designated as the first CPU.  Each CPU has access to a single fan header, and specifications for coolers are fairly loose in both the x and the y directions, limited only by memory population and the max z-height of the case or chassis the board is being placed into.  As with all dual CPU motherboards, each CPU needs its own power connector, and we find them at the top of the board behind the memory slots and at opposite ends.  These power connectors sit much further from the sockets than on a normal motherboard, but it seems that placement at the edge of the board took priority.  In between the two CPU power connectors is a standard 24-pin ATX power connector.

One of the main differences I note coming from a consumer motherboard background is the sheer number of available connectors and headers on such a server motherboard.  For example, the SATA ports have to be enabled by moving the jumpers on the other side of the chipset.  The chipset heatsink is small and basic – there is no need for a large heatsink as the general placement for such a board would be in a server environment where noise is not particularly an issue if there are plenty of Delta fans to help airflow.

On the bottom right of the board we get a pair of SATA ports and three mini-SAS connections.  These are all perpendicular to the board, but are actually in the way of a second GPU being installed in a ‘normal’ motherboard way.  Users wishing to use the second PCIe x8 slot on board may look into PCIe risers to avoid this situation.  The heatsink on the right of this image covers up an LSI RAID chip, allowing the mSAS drives to be hardware RAIDed.

As per normal operation on a C602 DP board, the PCIe slots are taken from the PEG of one CPU.  On some other boards, it is possible to interweave all the PCIe lanes from both CPUs, but it becomes difficult when organizing communication between the GPUs on different CPUs.  From top to bottom we get an x8 (@x4), x16, x8 (@x4), x16 (@x8), x4(@x1).  It seems odd to offer these longer slots at lower speed ratings, but all of the slots are Gen 3.0 capable except the x4(@x1).  The lanes may have been held back to maintain data coherency.
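To put the slot layout into numbers, here is a rough sketch of the theoretical per-direction bandwidth of each slot.  The slot list is taken from the paragraph above; the per-lane figures are the standard signalling rates and encoding overheads for PCIe 2.0 (5 GT/s, 8b/10b) and PCIe 3.0 (8 GT/s, 128b/130b).  The function and variable names are made up for illustration, and real-world throughput will be lower once protocol overhead is included.

```python
# Per-lane signalling rate in GT/s and line-encoding efficiency per generation.
PCIE_GENERATIONS = {
    2: (5.0, 8 / 10),     # PCIe 2.0: 5 GT/s with 8b/10b encoding
    3: (8.0, 128 / 130),  # PCIe 3.0: 8 GT/s with 128b/130b encoding
}

# (physical width, electrical lanes, generation), top to bottom as listed
# in the text above.
SLOTS = [
    (8, 4, 3),
    (16, 16, 3),
    (8, 4, 3),
    (16, 8, 3),
    (4, 1, 2),
]


def lane_bandwidth_gbs(generation: int) -> float:
    """Theoretical one-direction bandwidth of a single lane in GB/s."""
    rate_gt, efficiency = PCIE_GENERATIONS[generation]
    # GT/s times encoding efficiency gives usable Gbit/s; divide by 8 for GB/s.
    return rate_gt * efficiency / 8


def slot_bandwidth_gbs(lanes: int, generation: int) -> float:
    """Theoretical one-direction bandwidth of a slot in GB/s."""
    return lanes * lane_bandwidth_gbs(generation)


if __name__ == "__main__":
    for physical, lanes, generation in SLOTS:
        bandwidth = slot_bandwidth_gbs(lanes, generation)
        print(f"x{physical} (@x{lanes}), Gen {generation}.0: {bandwidth:.2f} GB/s")
```

The point of the sketch is simply that a physically long slot wired at x4 offers a quarter of the bandwidth of the full x16 slot, which matters when choosing where to place a GPU or RAID card.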

To those unfamiliar with server boards, of note is the connector just to the right of center of the picture above.  This is the equivalent of the front panel connection on an ATX motherboard.  At almost double the width it has a lot more options, and where to put your cables is not printed on the PCB – like in the old days we get the manual out to see what is what.

On the far left we have an ASPEED AST2300 chip, which has multiple functions.  On one hand it is an onboard 2D graphics chip which powers the VGA port via its ARM926EJ (ARM9) core at 400 MHz.  On the other, it is an advanced PCIe graphics and remote management processor, supporting dual NICs, two COM ports, monitoring functions and embedded memory.  Further round this section we find a removable BIOS chip, a COM header, diagnostic headers for internal functions, and a USB 2.0 header.

The rear IO is very bare compared to what we are normally used to.  From left to right is a serial port, the VGA port, two gigabit Ethernet NICs (Intel I350), four USB 2.0 ports, the KVM server management port, and an ID Switch button for unit identification.  There is no audio here, no power/reset buttons, and no two-digit debug LED.  It made for some rather entertaining/hair removing scenarios when things did not go smoothly during testing.

Board Features

Gigabyte GA-7PESH1

Price: Contact:
    17358 Railroad St.
    City of Industry
    CA 91748
    +1-626-854-9338
Size: SSI EEB
CPU Interface: LGA 2011
Chipset: Intel C602
Memory Slots: Sixteen DDR3 DIMM slots supporting:
    128GB (UDIMM) @ 1.5V
    512GB (RDIMM) @ 1.5V
    128GB DDR3L @ 1.35V
    Quad Channel Architecture
    ECC RDIMM for 800-1600 MHz
    Non-ECC UDIMM for 800-1600 MHz
Video Outputs: VGA via ASPEED 2300
Onboard LAN: 2 x Intel I350 supporting up to 1000 Mbps
Onboard Audio: None
Expansion Slots:
    1 x PCIe 3.0 x16
    1 x PCIe 3.0 x16 (@ x8)
    2 x PCIe 3.0 x8 (@ x4)
    1 x PCIe 2.0 x4 (@ x1)
Onboard SATA/RAID:
    2 x SATA 6 Gbps, Supporting RAID 0,1
    2 x mini-SAS 6 Gbps, Supporting RAID 0,1
    1 x mini-SAS 3 Gbps, Supporting RAID 0,1
USB: 6 x USB 2.0 (Chipset) [4 back panel, 2 onboard]
Onboard:
    2 x SATA 6 Gbps
    2 x mSAS 6 Gbps
    1 x mSAS 3 Gbps
    1 x USB 2.0 Header
    4 x Fan Headers
    1 x PSMI header
    1 x TPM header
    1 x SKU KEY header
Power Connectors:
    1 x 24-pin ATX Power Connector
    2 x 8-pin CPU Power Connector
Fan Headers:
    2 x CPU (4-pin)
    2 x SYS (4-pin, 3-pin)
IO Panel:
    1 x Serial Port
    1 x VGA
    2 x Intel I350 NIC
    4 x USB 2.0
    1 x KVM NIC
    1 x ID Switch
Warranty Period: Refer to Sales
Product Page: Link

Without a direct competitor to this board on hand, there is little we can compare such a motherboard to.  At this level, server grade Intel NICs should be standard, and this board can take 8GB non-ECC memory sticks or 32GB ECC memory sticks, for a maximum of 512 GB.  If your matrix solvers are yearning for memory, then this motherboard can support it.
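The capacity figures follow directly from the slot count.  As a quick sanity check, the sketch below multiplies out the topology described in this review (two sockets, quad channel memory, two modules per channel) against the 8GB and 32GB stick sizes quoted above.  The constant and function names are purely illustrative.

```python
# Memory topology as described in the review: dual socket, quad channel
# per socket, two DIMMs per channel.
SOCKETS = 2
CHANNELS_PER_SOCKET = 4
DIMMS_PER_CHANNEL = 2


def total_slots() -> int:
    """Number of DIMM slots on the board."""
    return SOCKETS * CHANNELS_PER_SOCKET * DIMMS_PER_CHANNEL


def max_capacity_gb(dimm_size_gb: int) -> int:
    """Capacity in GB with every slot filled by sticks of the given size."""
    return total_slots() * dimm_size_gb


if __name__ == "__main__":
    print(f"DIMM slots: {total_slots()}")
    for label, size_gb in (("UDIMM", 8), ("RDIMM", 32)):
        print(f"{label}: {size_gb} GB sticks -> {max_capacity_gb(size_gb)} GB")
```

With 8GB sticks this gives the 128 GB UDIMM limit, and with 32GB sticks the 512 GB RDIMM limit listed in the table.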


64 Comments


  • nevertell - Sunday, January 06, 2013

    The K version may not, but the standard i7-3770 does in fact support VT-D, TXT and ECC memory from the get go. VT-D also has to be supported by the motherboard, which may be problematic on consumer motherboards. I have an i5-2400 myself, and VT-D is a pain to set up. To this day I still haven't found out whether it is that I am unable to set up Xen properly or just that my cheap motherboard won't support VT-D, to properly assign a video card to a virtual machine.
  • KAlmquist - Sunday, January 06, 2013

    The 3770K lacks those features, but that doesn't invalidate my point.

    Using ECC memory improves system availability, and likely decreases the probability of undetected errors resulting in incorrect computations. If these are important to you, then you should be thinking about full double or triple redundancy. Why not buy three 3770K based systems and run the same simulation on all of them? Most of the time you will get identical results on all three systems, but on rare occasions one of the systems will die during the run. No problem; you have the simulation results from the other two systems. On even rarer occasions, one of the systems will produce an incorrect result due to an undetected bit error. Again no problem; you take the results from the two simulations that agree.

    With full redundancy it doesn't matter where in the system the error occurs because full redundancy addresses faults anywhere in the system. This makes it superior to ECC memory, which only addresses faults in the memory subsystem. So the only reason to go with ECC memory instead of full triple redundancy is if the ECC memory approach costs less. Based on the numbers I posted, you aren't going to get a lower cost based on hardware costs alone. Possibly you could get there by including administrative costs and the like.

    I'm not saying that the system Ian tested wouldn't make sense under *any* circumstances. My point is that the system has a poor price performance ratio, so it only makes sense when a lot of things are working in its favor.

    The second feature you mention is VT-D, which makes it more efficient to emulate device hardware in virtual machines. I don't have any benchmarks, but my guess is that the performance improvement from VT-D is fairly small. In any case, if you want VT-D you can buy the 3770 rather than the 3770K. You can't overclock the 3770, but my comments about the 3770K offering "similar performance" were based primarily on the performance of the 3770K at stock frequency. If you assume that everyone is going to take the time to find an optimal overclock for their CPU, then the E5-2690 (which cannot be overclocked) looks even worse.

    I suppose it's off topic to debate the merits of "trusted execution technology" here, so I will simply note that if for whatever reason you want a processor that supports it, the solution is the same as for VT-D: get the 3770 instead of the 3770K.
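The triple redundancy scheme described in this comment amounts to a majority vote over the outputs of independent runs.  A minimal sketch of that idea follows, assuming each run's output has been reduced to a comparable value such as a checksum, and that a crashed run is recorded as `None`.  The function name and the use of `None` for a dead system are hypothetical choices, and the approach only works if the simulation is deterministic enough to give bit-identical results on healthy machines.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_result(results: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Return the value agreed on by a strict majority of all systems.

    Each entry is one system's result, or None if that system died during
    the run. Returns None when no value reaches a strict majority.
    """
    finished = [r for r in results if r is not None]
    if not finished:
        return None
    value, count = Counter(finished).most_common(1)[0]
    # Strict majority of all systems, including the ones that died.
    return value if count * 2 > len(results) else None


if __name__ == "__main__":
    # One system died, the other two agree.
    print(majority_result(["abc123", None, "abc123"]))
    # One system produced a corrupted result, the other two agree.
    print(majority_result(["abc123", "ffff00", "abc123"]))
    # A death and a disagreement together: no trustworthy answer.
    print(majority_result(["abc123", "ffff00", None]))
```

Note that the last case returns nothing usable: with three systems the scheme tolerates one fault per run, not two.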
  • Kevin G - Saturday, January 05, 2013

    A very well written article that sticks toward its purpose: scientific computing. Really pleased to see articles like this on the site even if I have a few minor quibbles.

    On page 2 "To those unfamiliar with server boards, of note is the connector just to the right of center of the picture above." is either oddly worded to describe the front panel connector at the bottom of the board (which is indeed right of center but not in the center of the picture) or describing a connector that isn't even documented in the manual. For clarification I'm looking at the connector just right of the top PCI-E 16x slot (above and to the left of the battery). Actually, what is that connector labeled as? I've seen it on other Xeon boards but have never seen it used.

    The last paragraph on page 2 omits the possibility of unbuffered ECC memory and implies the usage of unbuffered non-ECC memory. I haven't found confirmation that this board can accept unbuffered, non-ECC memory (as opposed to the possibility of an ECC requirement, as some server vendors enforce).

    A couple of notes on the little processor talk on page 6. Dealing with cache thrashing between L3 and L2 is possible but when dealing with a high number of threads general coherency becomes a bigger factor. The overhead is beginning to exceed the benefit of having the additional hardware to run them. If you're lucky to be dealing with an algorithm that doesn't need such coherency overhead, then chances are it is very ideal for GPU compute (and memory capacity isn't a factor). A minor nicety would have been to see some more testing without Hyperthreading on the i7-3770k, i7-3930k, and i7-3960X to better indicate scaling with/without Hyperthreading. I suspect that those single socket processors would have been able to show some small gains with Hyperthreading where the dual socket system did not.

    An extension to the L2/L3 cache talk on page 6 is the move to dual sockets and NUMA. There is a performance penalty due to latency for having one thread access memory that is found on a remote socket. Memory mirroring between sockets can eliminate that remote penalty while increasing RAS but at the cost of halving effective memory capacity. The manual isn't clear if mirroring mode or the lockstep mode is across different sockets (it can be done across memory channels as well).

    I also would have loved to have heard some comparisons with the Gigabyte GA-X79S-UP5. While the name implies an X79 chipset, it uses the C606 chipset. It'll support ECC memory with socket 2011 Xeons and plenty of overclocking features (for the daring). Comparing the GA-7PESH1 to the GA-X79S-UP5 would have been able to answer if the move to dual sockets would have been worth the extra cost.
  • Hakon - Saturday, January 05, 2013

    Somehow does read like an anonymous peer review :-)
  • Kevin G - Sunday, January 06, 2013

    A little bit. :)

    Part of my criticism isn't about the article itself but rather the general state of massively multithreaded hardware and software. The hardware portion is quickly running into software limitations that were never expected to be reached in the professional space. A decade ago who thought that a scientist could purchase a 240 simultaneous thread processor that would fit on a mere expansion card? In some cases we don't reach Amdahl's Law before hitting an artificial barrier due to scheduling or coherency overhead.

    I just noticed that the system was using Win 7 Professional, which has a limit of 64 concurrent threads per process. A quad socket LGA 2011 config would actually be at the very limit of what Windows 7 (or rather 2008 R2, since Professional only scales to two sockets) can handle. OpenMP can handle more than 64 concurrent threads, but on Windows it has to submit to this limitation.
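For readers who have not met Amdahl's Law, it gives the best-case speedup from n threads when only a fraction p of the work can run in parallel: speedup = 1 / ((1 - p) + p / n).  The sketch below evaluates it for a few thread counts.  The 95% parallel fraction is an arbitrary illustrative value, not a measurement from this review, and the function name is made up for the example.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Best-case speedup under Amdahl's Law.

    parallel_fraction is the share of the runtime that scales with threads
    (between 0 and 1); the rest is treated as strictly serial.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


if __name__ == "__main__":
    p = 0.95  # illustrative assumption only
    for n in (8, 16, 32, 64):
        print(f"{n:>2} threads: {amdahl_speedup(p, n):.2f}x")
    # With an unlimited thread count the speedup approaches 1 / (1 - p).
    print(f"upper bound: {1.0 / (1.0 - p):.1f}x")
```

This is only the theoretical ceiling.  The scheduling and coherency overheads mentioned in the comment push real results below it.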
  • psyq321 - Sunday, January 06, 2013

    As for the GA-X79S-UP5, clocking features only work for 1P Xeons, which are basically similar to the HEDT i7 (36xx) line. With those, the customer has the advantage of ECC RAM support and still some overclocking headroom.

    Clocking 2P/4P Xeon E5s (sadly, these are the only 8-core parts so far) is next to impossible due to the lack of ICC configuration data allowing changing BCLK ratios. These Xeons can only be bumped by a direct BCLK increase, which is dangerous above a few MHz. At most, 5-6 MHz is feasible as tested on ASUS Z9PE-D8-WS and EVGA SR-X boards.

    Memory overclocking is another matter, completely. I have excellent results with Samsung's 1.35v ("low voltage") ECC RAM. It is not just the cheapest 16 GB ECC option (~$160 for the 16 GB ECC stick last time I checked, I got mine for 140 EUR in Germany 7 months ago), but it is the fastest while still keeping the low voltage. This RAM can be overclocked to 2133 MHz by a simple voltage bump to 1.55v, which is still within Xeon's VSa limits.
  • Kevin G - Sunday, January 06, 2013

    Weird that Intel doesn't provide the ICC configuration data. The 'gear ratio' change is something I'd still expect to change on true X79 boards regardless of processors (I can see Intel crippling this on C600 series). Then again, I've heard some weird situations with LGA 2011 Xeons in desktop boards. There are some scattered reports of unlocked chips but as the internet goes there is a lot of speculation and rumor but little real confirmation.

    Those Samsung 16 GB ECC sticks are registered? I thought that the GA-X79S-UP5 didn't support registered DIMMs.

    As for the ability to overclock those low voltage DIMMs, not really surprised as they've historically been impressive in that regard. I have some older 4 GB 1.35v DDR3-1333 rated sticks that can go to 1866 MHz at 1.5v. :) The timings had to be changed but still impressive.
  • PEPCK - Saturday, January 05, 2013

    Worth noting that the three miniSAS connectors yield 8 SAS and 4 SATA connectors in the specification table.
  • krumme - Sunday, January 06, 2013

    For this article Ian gets the über nerds Gold Award, only given once in a century
  • lowenz - Sunday, January 06, 2013

    A brilliant article.

    More of these, please.
