Cloud = x86 and open source

From a high-level perspective, the basic architecture of Facebook is not that different from other high performance web services.

However, Facebook is the poster child of the new generation of Cloud applications. It's hugely popular and very interactive, and as such it requires much more scalability and availability than your average website that mostly serves up information.

The "Cloud Application" generation did not turn to the classic high-end redundant platforms with heavy relational database management systems. Instead, a combination of x86 scale-out clusters, open source web software, and "NoSQL" data stores is the foundation that Facebook, Twitter, Google, and others build upon.

Facebook has, however, improved several pieces of the open source software puzzle to make them better suited to extreme scalability. Facebook chose PHP for its presentation layer because it is simple to learn, write, and read. The downside is that PHP is very CPU and memory intensive.

According to Facebook’s own numbers, PHP is about 39 times slower than C++ code, so it was clear that Facebook had to solve this problem first. The traditional approach is to rewrite the most performance-critical parts in C++ as PHP extensions, but Facebook tried a different solution: its engineers developed HipHop, a source code transformer. HipHop transforms the PHP source code into faster C++ code and compiles it with g++.

The next piece in the Facebook puzzle is Memcached, an in-RAM object caching system. Memcached is distributed, which means a single memcached cache can span many servers: the "cache" is in fact a collection of smaller caches. It essentially reclaims unused RAM that your operating system would otherwise spend on less efficient file system caching. These “cache nodes” do not sync or broadcast, and as a result the memory cache is very scalable.
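To illustrate why nodes that never sync can still act as one cache, here is a minimal Python sketch in which the client alone decides which node owns a key. Everything in it is an assumption made for illustration: `CacheNode` and `ShardedCache` are hypothetical names, each node is a plain in-process dictionary rather than a real memcached server, and the simple hash-modulo placement stands in for whatever key distribution a real client library uses (many use consistent hashing instead).

```python
import zlib


class CacheNode:
    """Hypothetical stand-in for one memcached server: a plain dict in RAM."""

    def __init__(self, name):
        self.name = name
        self.store = {}


class ShardedCache:
    """Client-side view of the cluster. Nodes never talk to each other;
    the client hashes the key to pick exactly one node per key."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def _node_for(self, key):
        # Assumption: simple hash-modulo placement, chosen for brevity.
        index = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return self.nodes[index]

    def set(self, key, value):
        self._node_for(key).store[key] = value

    def get(self, key):
        # Returns None on a cache miss.
        return self._node_for(key).store.get(key)


nodes = [CacheNode(f"cache-{i}") for i in range(4)]
cache = ShardedCache(nodes)
for user_id in range(1000):
    cache.set(f"user:{user_id}", {"id": user_id})

# Each key lives on exactly one node, so no synchronization is needed.
for node in nodes:
    print(node.name, len(node.store))
```

Because each key lives on exactly one node, adding capacity means adding nodes rather than coordinating them. With plain modulo placement, though, changing the node count remaps most keys.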

Facebook quickly became the world's largest user of memcached and improved memcached vastly. They ported it to 64-bit, lowered TCP memory usage, distributed network processing over multiple cores (instead of one), and so on. Facebook mostly uses memcached to alleviate database load.
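A common way to use a cache to take load off a database is the "cache-aside" pattern: look in the cache first and query the database only on a miss. The Python sketch below shows the general idea and is not Facebook's actual code. `FakeDatabase`, `get_profile`, and the `profile:` key format are hypothetical, and a plain dictionary stands in for memcached.

```python
class FakeDatabase:
    """Hypothetical database stand-in that counts how often it is queried."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def load_profile(self, user_id):
        self.queries += 1
        return self.rows.get(user_id)


def get_profile(user_id, cache, db):
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    key = f"profile:{user_id}"
    profile = cache.get(key)
    if profile is None:
        profile = db.load_profile(user_id)
        if profile is not None:
            cache[key] = profile  # populate the cache for later readers
    return profile


db = FakeDatabase({1: {"name": "Alice"}})
profile_cache = {}  # assumption: a dict standing in for memcached

first = get_profile(1, profile_cache, db)   # miss: goes to the database
second = get_profile(1, profile_cache, db)  # hit: served from the cache
print(first, second, "database queries:", db.queries)
```

The second lookup never reaches the database. A real deployment would also need to invalidate or expire cached entries when the underlying data changes, which this sketch leaves out.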


67 Comments


  • twhittet - Thursday, November 03, 2011 - link

    I would assume cost is also a major factor. Why pay for so many features you don't need? Manufacturing costs should be lower if they actually build these in bulk. Reply
  • jamdev12 - Thursday, November 03, 2011 - link

    I would definitely have to agree with you on this notion. HP servers are pretty expensive once you take into account 3-year warranties and 24/7 replacement options, so going with an Open Compute server is a nice alternative to the "I can do everything" server. Better to do one thing well and efficiently than to do many things poorly. Reply
  • haplo602 - Friday, November 04, 2011 - link

    this is an option for somebody with a custom built infrastructure and dedicated DC services. however a general purpose server CANNOT do without.

    since the server category is different (general purpose vs custom built) the HP one does well (I'd say even excellent).
    Reply
  • HollyDOL - Thursday, November 03, 2011 - link

    I would be quite interested how they determined Java and C# are 2/3x slower than C++. Since it seems pretty non-corresponding with reality to me. I have seen a few tests C++ vs. Java and the differences were in matter of %. As well as C# in my experience does the same jobs little bit faster than Java and the benchmark results generally confirm it.
    few links:

    http://blog.cfelde.com/2010/06/c-vs-java-performan...
    http://reverseblade.blogspot.com/2009/02/c-versus-...
    Reply
  • setzer - Thursday, November 03, 2011 - link

    I'm guessing they are comparing their algorithms and I hope they are good programmers for all the languages they tested otherwise the tests don't mean anything. Reply
  • Taft12 - Thursday, November 03, 2011 - link

    I'm not surprised that part of the article would lead to programming language holy wars, but general benchmarks are utterly useless for Facebook. They should (and surely do) care only about performance of the compiled code and hardware platforms that run the site. Reply
  • bji - Thursday, November 03, 2011 - link

    It's illogical to suggest that an interpreted language like Java or C# could ever approach C++ in speed when the same level of optimization is applied to each.

    In my experience, the least optimized C++ code can sometimes be approximated in performance by the best optimized Java code, depending on the task in question.

    Of course, once you spend time optimizing the C++ code then there is no way for Java to keep up.

    I have never used C# but I expect the result for it would be very similar to Java due to the similar mechanics of the language implementation.

    That being said, in many situations raw speed is not the most important factor, and Java and C# can have significant advantages in terms of mechanism of deployment, programmer productivity, etc, that can make those languages very much the best choice in some situations; which is why they are, in fact, used in those situations in which their advantages are best exploited and their weaknesses are least important.

    I think that Ruby takes the last paragraph even further; Ruby is so ungodly slow that it has to make up for it by allowing extreme productivity gains, and I expect that it must (I've never programmed in it to any significant extent), otherwise it wouldn't have any niche at all.
    Reply
  • data003 - Thursday, November 03, 2011 - link

    While I've lurked this site for many years I just created an account to correct this erroneous bit of fail above.

    1. C# and Java are not interpreted languages. They are compiled at runtime into machine code.

    2. The C# JIT compiler can actually produce more efficient machine code than a compiled C++ binary.

    Since you have never used C# and clearly don't understand how it works, I'd suggest you refrain from commenting on it.
    Reply
  • Jaybus - Friday, November 04, 2011 - link

    I agree that in some cases a JIT compiler can produce more efficient code, particularly when the application lends itself to runtime optimizations, however that is far from typical. Usually, for a single process, the JIT code, once compiled, will be reasonably close, though the static C/C++ code has the edge.

    But that is for the typical case. Facebook is not a typical case. Each web server is constantly starting many, many short-lived processes. Each process must start up its own copy of the code. This is where JIT fails badly to ahead-of-time compilation. It isn't the execution speed of the code after the JIT gets it compiled. The problem is the startup delay. Even with caching, the bytecode still must be compiled at least once for each new process, which in Facebook's case is millions of times. There is no such delay with ahead-of-time compilation. Therefore, Java and C# have no chance of competing in Facebook's environment.
    Reply
  • erwinerwinerwin - Thursday, November 03, 2011 - link

    I wonder whether the power savings justify creating new hardware with a green power architecture, plus the cost of a custom-built power supply running on 270 volts, if it only saves about 10-20 percent of average power consumption. Why not, say, make a corporate deal with the best power/performance server producer on the market and modify its servers with water cooling (for example)? Reply
