Facebook Technology Overview

Facebook had 22 million active users in the middle of 2007; fast forward to 2011 and the site has 800 million active users, with 400 million of them logging in every day. Facebook has grown exponentially, to say the least! Coping with this kind of exceptional growth while offering a reliable and cost-effective service requires out-of-the-box thinking. Typical high-end, brute-force, ultra-redundant software and hardware platforms (for example, Oracle RAC databases running on top of a few IBM Power 795 systems) won’t do: they're too complicated, too power hungry, and most importantly far too expensive for such extreme scaling.

Facebook first focused on thoroughly optimizing its software architecture, which we will cover briefly. Next, Facebook's engineers decided to build their own servers to minimize the power consumption and cost of their server infrastructure. Facebook Engineering then open sourced these designs to the community; you can download the specifications and mechanical CAD designs at the Open Compute site.

The Facebook Open Compute server design is ambitious: “The result is a data center full of vanity free servers which is 38% more efficient and 24% less expensive to build and run than other state-of-the-art data centers.” Even better, Facebook Engineering sent two of these Open Compute servers to our lab for testing, allowing us to see how they compare to other solutions on the market.

As a competing solution, we have an HP DL380 G7 in the lab. Recall from our last server clash that the HP DL380 G7 was one of the most power-efficient servers of 2010. Can a server "targeted at the cloud" and designed by Facebook Engineering beat one of the best and most popular general-purpose servers? That is the question we'll answer in this article.

62 Comments

  • FunBunny2 - Saturday, November 05, 2011

    RDBMS is numbered only for those who've no idea that what they think is "new" is just their grandpappy's olde COBOL crap. Just because you're so young, so inexperienced, and so stupid that you can't model data intelligently doesn't mean you've got it right. But if you're in love with getting paid by LOC metrics, then Back to the Future is what you want. Remember, Facebook and Twitter and such are just toys; they ain't serious.
  • Ceencee - Wednesday, November 09, 2011

    This is completely false. RDBMSs have their place, but they are also an extremely inefficient way to access large amounts of data, as ACID compliance hamstrings many of the operations.

    As someone who would consider himself an Oracle expert, I would say that NoSQL databases like Cassandra and HBase are really exciting technology for the future.
  • Starfireaw11 - Thursday, November 03, 2011

    I can see how the OpenCompute compares well to a DL380 G7 in terms of performance vs. power consumption, and it may compare well on price (those details aren't readily available), but what the OpenCompute has going for it is that it has been stripped of unneeded components, fitted with efficient fans, and matched to efficient power supplies. From what I have seen and done in and around datacenters, these are exactly the objectives of a blade-based system, where you can have large, efficient power supplies, large fans, and missing or shared devices that are non-critical. I would like to see this article modified to include a comparison against a blade-based solution of equivalent specification to see how that stacks up - if you can swing it, use a fully populated blade chassis and average out the results over the number of blades. Blades also have the advantage of allowing approximately 14 servers in a 9 RU space - approximately 70 servers per 45 RU rack, vs. the 30-odd of the OpenCompute.

    Whenever I need to put equipment into a datacenter, the important specifications are performance, cost price, power efficiency, size, weight, and heat. Whenever a large number of servers is required, blades always stack up well, possibly with the exception of weight where a datacenter has floor-loading limits, but they do compare well on weight against equivalently performing non-blade servers (such as 28 RU of DL380 G7s).
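The density figures in the comment above can be checked with a few lines of arithmetic. This is only a rough sketch: the function name is made up for illustration, it assumes that only whole chassis fit in a rack, and it ignores any rack units lost to switches or power distribution. The 2 RU height used for the DL380 G7 follows from the commenter's "28 RU" for a 14-server equivalent.

```python
def servers_per_rack(rack_units: int, chassis_units: int, servers_per_chassis: int) -> int:
    """Count the servers that fit when only whole chassis can be installed."""
    whole_chassis = rack_units // chassis_units
    return whole_chassis * servers_per_chassis


RACK_UNITS = 45  # rack height used in the comment

# Blade chassis: roughly 14 servers in 9 RU (commenter's figures).
blades = servers_per_rack(RACK_UNITS, chassis_units=9, servers_per_chassis=14)

# DL380 G7: one server per 2 RU.
dl380_g7 = servers_per_rack(RACK_UNITS, chassis_units=2, servers_per_chassis=1)

print(f"blades per 45 RU rack:   {blades}")
print(f"DL380 G7 per 45 RU rack: {dl380_g7}")
```

Under these assumptions the blade figure matches the commenter's estimate of about 70 servers per rack; whether a facility can power and cool that many servers in one rack is a separate question.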
  • Doby - Saturday, November 05, 2011

    I think blades could be favorable, at least if you take into account the infrastructure reduction, such as networking ports. Thing is, if you look at the HP products that are available, there are better alternatives.

    HP, as the specific example, has a product called the SL6500. It's a second-generation product specifically designed for these types of environments, and meant to compete with exactly the type of system that Facebook created. A comparable use case would be an 8-node configuration, which would take up 4U of rack space and could run off 2-4 PSUs shared between the nodes. Additionally, it has a shared redundant fan configuration that uses larger, more efficient fans to cool the chassis. It's like blades, but doesn't have any shared networking, is made specifically to be lighter and cheaper, and has options for lower-cost nodes.

    The DL380 has a few things working against it in this comparison, from hot-swappable drives to enterprise-class onboard management (iLO, not just basic BMC), redundant fans, scalable power infrastructure, 6 PCI-E slots, an onboard high-performance RAID controller, 4 NICs, simplified serviceability with single-tool and/or toolless servicing, and even a display for component failures.

    The SL6500 would be able to have very basic nodes, with non-hot-swap SATA drives, basic SATA RAID functionality, dual NICs, and features much more in line with the Facebook system. Sure, it wouldn't be as specific to Facebook's needs, but it would be a more interesting comparison, as it would compare two systems designed for similar roles: not a general enterprise compute node against a purpose-built scale-out system, but two scale-out platforms.
  • Ceencee - Wednesday, November 09, 2011

    The SL6500 chassis with SL160s G6 servers seems to be a good solution for storage-level nodes. Wonder if Facebook will release a storage node spec next?
  • Penti - Saturday, November 05, 2011

    You also have different cooling requirements. Obviously Google's or Facebook's option isn't about maximum density per rack, but they are also not using any traditional hot aisle/cold aisle setup. Nor will all datacenters be able to handle your 20-30kW rack in terms of cooling requirements and power.
  • rikmorgan - Thursday, November 03, 2011

    The idle power chart shows the HP at 160 W and the Open Compute at 118 W. That's a 42 W saving, not 32 W.
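The correction above is easy to verify. The sketch below uses only the two idle figures quoted in the comment; the function name and the percentage output are illustrative additions, not something taken from the article's charts.

```python
from typing import Tuple


def idle_savings(baseline_watts: float, candidate_watts: float) -> Tuple[float, float]:
    """Return (watts saved, savings as a percentage of the baseline)."""
    saved = baseline_watts - candidate_watts
    return saved, 100.0 * saved / baseline_watts


# Idle power as quoted in the comment: HP 160 W, Open Compute 118 W.
saved_w, saved_pct = idle_savings(160, 118)
print(f"{saved_w:.0f} W saved ({saved_pct:.2f}% of the HP's idle power)")
```

With these inputs the difference is 42 W, as the commenter points out.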
  • jhh - Saturday, November 05, 2011

    I'm not sure how much the benchmarks depend on network bandwidth, but Facebook certainly uses a lot of it. Using SR-IOV-based NICs and supporting drivers allows the VM to access virtual NIC hardware directly, without having to go through the hypervisor. But not all NICs are built equal: many of them do not support SR-IOV, and those that do may not have drivers that support it in older kernels such as the one in CentOS 5.6. Unfortunately, since most gigabit NICs were designed before SR-IOV, most of them don't support it. We have great difficulty getting hardware vendors to say whether they provide SR-IOV-capable hardware or Linux drivers. The newer 10G NICs tend to support SR-IOV, but whether the server needs more than 1G is unclear, and the 10G NICs are more expensive and use more power.
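On reasonably recent Linux kernels, one way to see whether a NIC and its driver expose SR-IOV is to look for the `sriov_totalvfs` attribute under the device's sysfs directory. The sketch below assumes that attribute is available; it will not be on old kernels such as the one shipped with CentOS 5.6, so a missing attribute does not prove the hardware lacks SR-IOV. The function name and the `sysfs_root` parameter are illustrative, not part of any library.

```python
from pathlib import Path
from typing import Optional


def sriov_total_vfs(interface: str, sysfs_root: Path = Path("/sys")) -> Optional[int]:
    """Return the number of virtual functions the NIC advertises, or None if
    the kernel/driver does not expose the sriov_totalvfs attribute."""
    attr = sysfs_root / "class" / "net" / interface / "device" / "sriov_totalvfs"
    try:
        return int(attr.read_text().strip())
    except (OSError, ValueError):
        # Attribute missing (no SR-IOV, old kernel, virtual interface) or unreadable.
        return None


if __name__ == "__main__":
    net_dir = Path("/sys/class/net")
    names = sorted(p.name for p in net_dir.iterdir()) if net_dir.is_dir() else []
    for name in names:
        total = sriov_total_vfs(name)
        if total is None:
            print(f"{name}: no SR-IOV attribute exposed")
        else:
            print(f"{name}: up to {total} virtual functions")
```

Passing a different `sysfs_root` makes the function easy to try against a copied or mocked directory tree instead of a live system.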
  • CPU-Hog - Sunday, November 06, 2011

    Good comparison of the servers; however, I couldn't help but think how much better it would be if we ran the actual workloads that Facebook etc. plan to run in the datacenter vs. these enterprise workloads. How about running Memcached, Hadoop, HipHop, etc., which are the key workloads the OpenCompute servers are designed to run well?

    Many of these workloads need large I/O and memory rather than high compute. It would also be interesting to then use the same benchmarks to compare future servers based on technology from newbies like Calxeda, SeaMicro, and AppliedMicro.

    Xeon- and Opteron-based servers vs. ARM- and Atom-based servers. Now that battle of the old guard vs. the upstarts will be worth seeing.
  • trochevs - Wednesday, November 09, 2011

    Johan,
    Thank you for an excellent article. I love to read about cutting-edge technology. Keep up the good work. But I noticed something that nobody in the comments has mentioned yet. In the last paragraph:

    "... being inspired by open source software (think ..., ..., iOS, ...)."
    iOS is open source software?! When did this happen?
