Finding a Good Fit

The previous benchmarks have shown that the first Calxeda server is not for the general IT market. As the slide below shows, Calxeda targets four kinds of workloads:

  • Web applications
  • Middle-tier applications
  • Offline analytics
  • Storage and file serving

 

For applications such as Memcached, the 1.4GHz ECX-1000 lacks bandwidth and memory capacity. Once a Cortex-A15-based server is available, this can change quickly: performance should improve significantly, and the amount of memory per CPU can be quadrupled to 16GB.

We have not tested this yet, but our own experience tells us that the majority of "scale out" applications are out of reach. In the financial and risk modeling world especially, top performance and ultra-low response times are prioritized.

Calxeda-based Boston servers are already making inroads as storage servers. There is little doubt that a low-power processing unit makes a lot of sense in a storage server.

That leaves the question of whether Calxeda's latest server can make it in the web server and content delivery world. Calxeda claims 5W per server node, and no more than 250W for the complete server chassis with 24 server nodes. That's pretty cool, but there is currently another solution. Two octal-core Xeon E5s deliver no fewer than 32 threads running on top of 16 very potent cores. Add a virtualization layer and you get tens of servers. The only limitation is typically the amount of RAM.
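The power claims quoted above can be put on a per-node basis with a little arithmetic. The sketch below uses only the three figures from Calxeda's claim (5W per node, a 250W ceiling for the chassis, 24 nodes). The variable names are ours, and the "remainder" is simply the difference between the two claims, not a measured breakdown of where the power goes.

```python
# Figures claimed by Calxeda, as quoted in the text above.
NODE_POWER_W = 5.0        # claimed power per server node
CHASSIS_LIMIT_W = 250.0   # claimed ceiling for the complete chassis
NODES = 24                # server nodes in the chassis

# Power attributed to the server nodes themselves.
nodes_total_w = NODE_POWER_W * NODES

# What the chassis ceiling leaves for everything that is not a node.
# This is just the gap between the two claims, not a measurement.
remainder_w = CHASSIS_LIMIT_W - nodes_total_w

# The chassis ceiling spread evenly over all nodes.
per_node_budget_w = CHASSIS_LIMIT_W / NODES

print(f"nodes: {nodes_total_w:.0f} W, remainder: {remainder_w:.0f} W, "
      f"chassis budget per node: {per_node_budget_w:.1f} W")
```

In other words, if the chassis really did draw its 250W ceiling, each node would account for a bit more than 10W at the wall, roughly double the per-node claim.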

So assume you are a hosting provider. Which server do you use as your building block? You've got two choices:

The standard one is the Intel Xeon E5 server. Its advantage is excellent performance whenever you need it, whether your application scales well with more threads or not. The Xeon can address up to 384GB of affordable RAM (16GB DIMMs). If that's not enough, 768GB is possible with more expensive LR-DIMMs.

Those are impressive specs, but what if most of your customers just want to host medium-sized web sites, sites that are rich in content but rather low on processing requirements? Can the Boston Viridis server attract such users with much lower power consumption? How far can you go with slicing and dicing the Xeon's monstrous performance into small virtual pieces? We decided to find out.
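Since RAM is typically what limits the number of VMs, the "slicing and dicing" question is partly a matter of memory arithmetic. The sketch below is a rough upper bound under stated assumptions, not a sizing guide. It takes the 384GB and 768GB capacities and the 16GB DIMM size from the text, infers 4GB per Calxeda node from the remark that 16GB would be a quadrupling, and assumes the LR-DIMM configuration uses the same number of DIMMs. The helper `vms_by_ram` and its `reserved_gb` parameter are hypothetical names introduced for illustration.

```python
XEON_RAM_GB = 384          # capacity with 16GB DIMMs, from the text
DIMM_GB = 16               # DIMM size, from the text
XEON_RAM_LRDIMM_GB = 768   # capacity with LR-DIMMs, from the text

# The text says a Cortex-A15 node could quadruple memory to 16GB,
# which implies 4GB per node today.
NODE_RAM_GB = 16 // 4

# Number of 16GB DIMMs needed to reach 384GB.
dimm_count = XEON_RAM_GB // DIMM_GB

# Assumption: the 768GB configuration uses the same number of DIMMs.
lrdimm_gb = XEON_RAM_LRDIMM_GB // dimm_count


def vms_by_ram(total_gb, vm_gb, reserved_gb=0):
    """Upper bound on the VM count when RAM is the only limit.

    reserved_gb is a hypothetical allowance for the hypervisor itself.
    """
    return (total_gb - reserved_gb) // vm_gb


print(dimm_count, lrdimm_gb)
print(vms_by_ram(XEON_RAM_GB, NODE_RAM_GB))
print(vms_by_ram(XEON_RAM_LRDIMM_GB, NODE_RAM_GB))
```

By this estimate, RAM alone would allow far more node-sized VMs than the 24 nodes in a Viridis chassis, so whether the Xeon's CPU performance holds up when divided that finely is the more interesting question.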

 


99 Comments

  • Gigaplex - Tuesday, March 12, 2013 - link

    I wouldn't call that a spectacular performance per watt ratio. It's a bit faster than the Xeon under a cherry picked benchmark (much slower under others), and is only marginally lower power. Best case it's an 80% improvement over Sandy Bridge with regards to performance per watt, and Atom wasn't represented. Considering all the hype, I was expecting something a little more... exciting. Ignoring Ivy Bridge improvements, Haswell isn't far off.
  • spronkey - Tuesday, March 12, 2013 - link

    Yeah... I agree. It also only seems to really come into its own in high concurrency. The Xeons idle quite similarly in terms of power - what happens if you compare it to more Xeon cores? It seems like on a per core basis, Intel still has the advantage on both fronts?
  • spronkey - Tuesday, March 12, 2013 - link

    I would also point out that the A15 has already been compared against Sandy and Ivy cores and come up short in performance per watt; so I'm very interested to see what the next step for these ARM node servers is.
  • JohanAnandtech - Wednesday, March 13, 2013 - link

    I warned against the hype in the first sentences. :-) ARM CPUs are still rather weak and not a good match for most applications. However, the fact that we could actually find a case where they do a lot better than the current Xeon systems was surprising to me.
  • wsw1982 - Wednesday, April 3, 2013 - link

    No, it should not surprise anyone, considering how picky the use case is. I mean, I do think you can find a use case where the ARM11 outperforms the Xeon, e.g. serving 1 web request per hour :)
  • LogOver - Tuesday, March 12, 2013 - link

    24 servers ran inside 24 VMs on the Xeon server, while for the ARM server you used the 24 physical server nodes... Hmm... That does not seem like an apples-to-apples comparison to me. Why not compare, for example, 16 physical nodes on both the Xeon and ARM servers?
  • haplo602 - Wednesday, March 13, 2013 - link

    And how do you slice the Xeon server into 16 physical nodes ? It does not support any kind of HW partitioning that I am aware of. On the other hand the Calxeda machine is a cluster by design. If you try 16 Xeon nodes you'll go through the roof with power.
  • Colin1497 - Wednesday, March 13, 2013 - link

    I think the question is this:

    Was 24 VMs optimal for the Xeon? Since we're virtualizing the Xeon, why 24? Just because you had 24 ARM nodes? Would the Xeon have done better with 4 VMs? Or 16? Or 1000? 24 seems arbitrary.
  • JohanAnandtech - Wednesday, March 13, 2013 - link

    We tested with 16 VMs, as I briefly mentioned in the conclusion. The 2650L did 170 responses/s per VM, or about 40% better. Total throughput was 2.7K/s, versus 2.9K/s with 24 VMs. The flexibility that the Xeon has to reduce the number of VMs if higher throughput is necessary is definitely an advantage, but the performance numbers are not that different with different VM configs.
  • Kurge - Wednesday, March 13, 2013 - link

    How about with 0 VM's? Just run it on the metal.
