The Next Generation: Winterfell

With the PSU and hard disks removed from the server, the only items left in a Freedom chassis were the motherboard, the fans, and one boot drive. When Facebook's engineers repackaged these components in a smaller form factor, they created Winterfell, a compute node similar to a Supermicro twin node, except that in ORv1 three nodes can be placed side by side on a shelf. One Winterfell node is 2 OU high and consists of a modified Windmill motherboard, a bus bar connector, and a midplane connecting the motherboard to the power cabling and fans. The motherboard is equipped with a slot for a PCIe riser card – which can hold a full-size x16 PCIe card and a half-size x8 card – and an x8 PCIe mezzanine connector for network interfaces. Further connectivity options include both a regular SATA and an mSATA connector for a boot drive.


Winterfell nodes, fitted here with an optional GPU
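
For quick reference, the node layout described above can be condensed into a small data structure. This is an informal summary only; the field names are my own shorthand, not Open Compute specification identifiers.

```python
# Winterfell node at a glance, as described in the text above. Keys and
# values are informal shorthand, not Open Compute specification identifiers.

WINTERFELL_NODE = {
    "height": "2 OU",
    "nodes_per_orv1_shelf": 3,
    "motherboard": "modified Windmill",
    "power_input": "bus bar connector via midplane",
    "pcie_riser": ["full-size x16", "half-size x8"],
    "mezzanine": "x8 PCIe (network interface)",
    "boot_drive_options": ["SATA", "mSATA"],
}

if __name__ == "__main__":
    for field, value in WINTERFELL_NODE.items():
        print(f"{field:>22}: {value}")
```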

But Facebook's ever-advancing quest for more efficiency found another target. After an ORv1 deployment in Altoona, it became apparent that three power zones with three bus bars each could deliver far more power than was actually needed, so the engineers used that insight to design the successor, the aptly named Open Rack v2. Open Rack v2 uses only two power zones instead of three, and Facebook's implementation has just one bus bar segment per power zone, bringing further cost reductions (though the PDU built into the rack is still able to power three). The placement of the power shelves relative to the bus bars was also reconsidered: they now sit in the middle of each power zone, because feeding the bus bar from the bottom incurs a voltage drop by the time power reaches the topmost server. ORv2 also allows for more ToR machinery by increasing the top bay height to 3 OU.

Open Rack v2, with power zones on the left. ORv2 filled with Cubby chassis on the right. The chassis marked by the orange rectangle fit 12 nodes. (Image Courtesy Facebook)
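
To get a feel for why the power shelf moved to the middle of a power zone, here is a minimal sketch of the bus bar voltage drop. The node count, per-node current and bus bar resistance are illustrative assumptions, not published Open Rack figures.

```python
# Minimal sketch of bus bar voltage drop for a bottom-fed versus a centre-fed
# power zone. Node count, per-node current and bus bar resistance below are
# illustrative assumptions, not published Open Rack figures.

NODES = 15              # server slots in one power zone (assumed)
AMPS_PER_NODE = 30.0    # ~360 W per node at 12 V (assumed)
R_SEGMENT = 50e-6       # bus bar resistance per slot pitch, in ohms (assumed)

def drop_at_node(node: int, feed: int) -> float:
    """I*R drop from the feed point to a given node slot.

    Each bus bar segment between the feed and the node carries the summed
    current of every node sitting beyond that segment, away from the feed.
    """
    lo, hi = sorted((feed, node))
    drop = 0.0
    for seg in range(lo, hi):           # segment between slot seg and seg + 1
        if feed <= seg:
            carried = NODES - seg - 1   # nodes above this segment draw through it
        else:
            carried = seg + 1           # nodes below this segment draw through it
        drop += carried * AMPS_PER_NODE * R_SEGMENT
    return drop

def worst_case_drop(feed: int) -> float:
    return max(drop_at_node(n, feed) for n in range(NODES))

print(f"fed from the bottom: {worst_case_drop(0):.3f} V worst-case drop")
print(f"fed from the middle: {worst_case_drop(NODES // 2):.3f} V worst-case drop")
```

With these made-up numbers, feeding the bus bar from the middle of the zone cuts the worst-case drop to roughly a quarter of the bottom-fed value, which is the effect the ORv2 layout is after.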

The change in power distribution made the original Winterfell design incompatible, as the bus bars for the edge nodes are now missing, and so Project Cubby saw the light of day. Cubby is very similar to a Supermicro TwinServer chassis, but instead of having PSUs built in, it plugs into the bus bar and wires three internal power receptacles, one for each node. The Winterfell design also needed to be shortened depth-wise, so the midplane was removed.

Another cut in upfront costs came from using three 3.3 kW power supplies (in a 2+1 redundant configuration) per power zone instead of six. With the number of power zones reduced to two, each power zone can deliver up to 6.3 kW in ORv2. This freed up space on the power shelf, so the engineers decided to do away with the separate battery cabinet that used to sit next to the racks. The bottom half of the power shelf now houses three Battery Backup Units (BBUs). Each BBU contains a matrix of Li-ion 18650 cells, the same type found in a Tesla's battery pack, and can deliver 3.6 kW to bridge the gap until the generators kick in. Each BBU is paired with a PSU; when that PSU drops out, its BBU takes over until power is restored.
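
As a quick sanity check on those numbers, the sketch below works out what a 2+1 redundant shelf can deliver and whether a paired BBU can carry a failed PSU's share. The PSU and BBU ratings come from the text; the derating factor is purely an assumption to show why the usable figure lands a little below 2 x 3.3 kW.

```python
# Back-of-the-envelope check on the ORv2 power shelf. PSU and BBU ratings are
# taken from the text above; the derating factor is my own assumption to show
# why the usable figure sits a touch below 2 x 3.3 kW.

PSU_RATING_KW = 3.3     # per power supply
PSUS_PER_SHELF = 3      # 2 + 1 redundancy: one PSU is held in reserve
BBU_RATING_KW = 3.6     # per battery backup unit
DERATING = 0.95         # assumed margin for conversion losses (hypothetical)

def usable_zone_power_kw() -> float:
    """Power a zone can deliver with one PSU failed (N+1 redundancy)."""
    return (PSUS_PER_SHELF - 1) * PSU_RATING_KW * DERATING

def bbu_bridges_psu() -> bool:
    """Each BBU is paired with one PSU and must carry that PSU's share."""
    return BBU_RATING_KW >= PSU_RATING_KW

print(f"usable power per zone: {usable_zone_power_kw():.2f} kW")  # ~6.27 kW
print(f"BBU can bridge a PSU drop-out: {bbu_bridges_psu()}")
```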


BBU Schematic (Image Courtesy Facebook)

To summarize, by moving beyond the bog-standard EIA 19" rack and integrating the servers and battery backup more tightly with the rack itself, Facebook was able to reduce costs and add capacity for an additional rack per row.

Comments

  • Black Obsidian - Tuesday, April 28, 2015 - link

    I've always hoped for more in-depth coverage of the OpenCompute initiative, and this article is absolutely fantastic. It's great to see a company like Facebook innovating and contributing to the standard just as much as (if not more than) the traditional hardware OEMs.
  • ats - Tuesday, April 28, 2015 - link

    You missed the best part of the MS OCS v2 in your description: support for up to 8 M.2 x4 PCIe 3.0 drives!
  • nmm - Tuesday, April 28, 2015 - link

    I have always wondered why they bother with a bunch of little PSU's within each system or rack to convert AC power to DC. Wouldn't it make more sense to just provide DC power to the entire room/facility, then use less expensive hardware with no inverter to convert it to the needed voltages near each device? This type of configuration would get along better with battery backups as well, allowing systems to run much longer on battery by avoiding the double conversion between the battery and server.
  • extide - Tuesday, April 28, 2015 - link

    The problem with doing a datacenter-wide power distribution is that at only 12v, to power hundreds of servers you would need to provide thousands of amps, and it is essentially impossible to do that efficiently. Basically the way FB is doing it is the way to go -- you keep the 12v current to reasonable levels and only have to pass that high current a reasonable distance. Remember 6 kW at 12v is already 500A !! And that's just for HALF of a rack.
  • tspacie - Tuesday, April 28, 2015 - link

    Telcos have done this at -48VDC for a while. I wonder whether data center power consumption got too high to support this, or maybe the big data centers just don't have the same continuous uptime requirements?
    Anyway, love the article.
  • Notmyusualid - Wednesday, April 29, 2015 - link

    Indeed.

    In the submarine cable industry (your internet backbone), ALL our equipment is -48v DC. Even down to routers / switches (which are fitted with DC power modules, rather than your normal 100 - 250v AC units one expects to see).

    Only the management servers run from AC power (not my decision), and the converters that charge the DC plant.

    But 'extide' has a valid point - the lower voltage and higher currents require huge cabling. Once an electrical contractor dropped a piece of metal conduit from high up onto the copper 'bus bars' in the DC plant. Need I describe the fireworks that resulted?
  • toyotabedzrock - Wednesday, April 29, 2015 - link

    48 v allows 4 times the power at a given amperage.
    12vdc doesn't like to travel far and at the needed amperage would require too much expensive copper.

    I think a pair of square wave pulsed DC at higher voltage could allow them to just use a transformer and some capacitors for the power supply shelf. The pulses would have to be directly opposing each other.
  • Jaybus - Tuesday, April 28, 2015 - link

    That depends. The low voltage DC requires a high current, and so correspondingly high line loss. Line loss is proportional to the square of the current, so the 5V "rail" will have more than 4x the line loss of the 12V "rail", and the 3.3V rail will be high current and so high line loss. It is probably NOT more efficient than a modern PS. But what it does do is move the heat generating conversion process outside of the chassis, and more importantly, frees up considerable space inside the chassis.
  • Menno vl - Wednesday, April 29, 2015 - link

    There is already a lot of things going on in this direction. See http://www.emergealliance.org/
    and especially their 380V DC white paper.
    Going DC all the way, but at a higher voltage to keep the demand for cables reasonable. Switching 48VDC to 12VDC or whatever you need requires very similar technology to switching 380VDC to 12VDC. Of course the safety hazards are different, and it is similar when compared to mixing AC and DC, which is a LOT of trouble.
  • Casper42 - Monday, May 4, 2015 - link

    Indeed, HP already makes 277VAC and 380VDC Power Supplies for both the Blades and Rackmounts.

    277VAC is apparently what you get when you split 480VAC 3-phase into individual phases.
