A Closer Look at the Server Node

We’ve arrived at the heart of the server node: the SoC. Calxeda licensed ARM IP and built its own SoC around it, dubbed the Calxeda EnergyCore ECX-1000 SoC. This version is produced by TSMC at 40nm and runs at 1.1GHz to 1.4GHz.

Let’s start with a familiar block on the SoC (black): the external I/O controller. The chip has a SATA 2.0 controller capable of 3Gb/s, a General Purpose Media Controller (GPMC) providing SD and eMMC access, a PCIe controller, and an Ethernet controller providing up to 10Gbit speeds. PCIe connectivity cannot be used in this system, but Calxeda can produce custom "motherboard" designs that let customers attach PCIe cards on request.

Another component we have to introduce before arriving at the actual processor is the EnergyCore Management Engine (ECME). This is an SoC in its own right, not unlike the BMC you’d find in conventional servers. The ECME, powered by a Cortex-M3, provides firmware management and sensor readouts, and controls the processor. In true BMC fashion, it can be controlled via an IPMI command set, currently implemented in Calxeda’s own version of ipmitool. If you want to shell into a node, you can use the ECME’s Serial-over-LAN feature, but it does not provide any KVM-like environment; there simply is no (mouse-controlled) graphical interface.
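To make the management path concrete, here is a minimal sketch of driving a node's ECME with standard ipmitool-style commands. The node address and credentials are hypothetical, and since Calxeda ships its own ipmitool build, the exact options available may differ from stock ipmitool:

```python
def ipmi_cmd(host, user, password, *args):
    """Build an ipmitool invocation targeting a node's ECME over LAN."""
    return ["ipmitool", "-I", "lanplus",
            "-H", host, "-U", user, "-P", password] + list(args)

# Power on the CPU complex of one node (address/credentials are made up):
power_on = ipmi_cmd("10.0.0.101", "admin", "admin", "chassis", "power", "on")

# Attach to that node's console via the ECME's Serial-over-LAN feature:
sol = ipmi_cmd("10.0.0.101", "admin", "admin", "sol", "activate")

print(" ".join(power_on))
print(" ".join(sol))
```

`chassis power` and `sol activate` are standard IPMI operations; anything beyond that (sensor names, OEM extensions) would depend on Calxeda's firmware.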

The Processor Complex

With four 32-bit Cortex-A9 cores, each with its own 32 KB instruction and 32 KB data L1 caches, the processor block is somewhat similar to what we find inside modern smartphones. One difference is that this SoC contains a 4MB ECC-enabled L2 cache, while most smartphone SoCs have a 1MB L2 cache.

These four Cortex-A9 cores operate between 1.1GHz and 1.4GHz and come with NEON extensions for SIMD processing, a dedicated FPU, and ARM's "TrustZone" technology, which partitions the system into secure and non-secure worlds. The Cortex-A9 can decode two instructions per clock and dispatch up to four. This compares well with the Atom (2/2) but is of course nowhere near the current Xeon "Sandy Bridge" E5 (4/5 decode, 6 issue). The real kicker for this SoC, however, is its power usage, which Calxeda claims to be as low as 5W for the whole server node under load at 1.1GHz and only 0.5W when idling.
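Those two figures bracket the node's power envelope, so a simple interpolation gives a feel for what a rack of them would draw. The node counts and utilization profile below are illustrative assumptions; only the 5W/0.5W endpoints come from Calxeda's claims:

```python
LOAD_W = 5.0   # whole server node, fully loaded at 1.1GHz (Calxeda's claim)
IDLE_W = 0.5   # whole server node, idle (Calxeda's claim)

def cluster_power(nodes, utilization):
    """Average draw in watts, linearly interpolating each node
    between the claimed idle and full-load figures."""
    per_node = IDLE_W + (LOAD_W - IDLE_W) * utilization
    return nodes * per_node

# 48 nodes at 30% average utilization:
print(cluster_power(48, 0.30))  # 48 * (0.5 + 4.5 * 0.3) = 88.8 W
```

Real draw is unlikely to be perfectly linear in utilization, but even this crude model shows why the density pitch is credible: dozens of nodes fit in the power budget of a single conventional server.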

The Fabric

The last block in the SoC is the EC Fabric Switch, an 8x8 crossbar switch that links to five XAUI ports. These external links are used to connect to the rest of the fabric (adjacent server nodes and the SFPs) or to connect SATA 2 ports. The OS running on each server node sees two 10Gbit Ethernet interfaces.

Since Calxeda advertises scale-out as one of the major features of its offering, it has created fast, high-capacity links between the nodes. The fabric has a number of link topology options and specific optimizations to provide speed when needed or save power when the application does not need high bandwidth. For example, the links of the fabric can be set to operate at 1, 2.5, 5, or 10Gb/s.
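The power-saving idea behind the selectable rates can be sketched as a simple policy: run each link at the lowest supported rate that still meets its bandwidth demand. The supported rates come from the article; the selection policy itself is an illustrative assumption, not Calxeda's documented algorithm:

```python
LINK_SPEEDS_GBPS = [1, 2.5, 5, 10]  # rates supported by the EC fabric links

def pick_link_speed(demand_gbps):
    """Return the lowest supported link rate that satisfies the demand,
    saturating at the top rate if demand exceeds 10Gb/s."""
    for speed in LINK_SPEEDS_GBPS:
        if speed >= demand_gbps:
            return speed
    return LINK_SPEEDS_GBPS[-1]

print(pick_link_speed(0.3))  # a lightly used link can idle along at 1Gb/s
print(pick_link_speed(3.0))  # -> 5, the next rate up from 2.5
```

In practice the fabric would make this decision per link and re-evaluate it as traffic changes, which is how it trades bandwidth for power without involving the host CPUs.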

A big plus for this approach is that you do not need expensive 10Gbit top-of-rack switches to link up the nodes; instead you just plug a cable between two boxes and the fabric spans across them. Note that this is not the same as a virtualized switch, where the CPU is busy handling the layer-2 traffic: the fabric switch is a physical, distributed layer-2 switch that operates completely autonomously, and the CPU complex does not even need to be powered on for it to work.

Comments

  • thenewguy617 - Wednesday, March 13, 2013 - link

    I would like to see the results with the website running on bare metal. I would like to believe you when you say the virtualization overhead is minimal, but I don't.
    Also, did you include the power used by the switch? As we scale the Xeon cluster we will add a lot of cost and power in the network, whereas the Calxeda fabric should scale for free.
  • thebeastie - Thursday, March 14, 2013 - link

    I think a lot of you are missing the main point, or the future potential, of this server technology. And that is that Intel likes to make an absolute minimum of $50 per CPU it sells; on server CPUs it's more like $300.

    These ARM CPUs are being sold at around $10 a CPU.
    Sure, Calxeda has done the hard yards making such a server and wants a lot of money for it. BUT once these ARM servers are priced in the context of their actual CPU costs, it's going to be the biggest bomb dropped on Intel's server profits in history.
  • Silma - Thursday, March 14, 2013 - link

    Assuming you are right and ARM is becoming so important that it can't be ignored, what's to prevent Intel from producing and selling ARM chips itself? In fact, what's to prevent Intel from producing the best ARM SoCs, given that it arguably has the best fabs?
    There are rumors that Apple is asking Intel to produce processors for them; this would certainly be very interesting if it proves to be true.
  • thebeastie - Friday, March 15, 2013 - link

    Intel would sooner look at other businesses than produce SoCs/CPUs for $10 each; whether they are x86 or ARM based doesn't matter in the face of such high portability of code.
  • Metaluna - Friday, March 15, 2013 - link

    The problem is that ARM cores are pretty much a commodity, so ARM SoC pricing is inevitably going to end up as a race to the bottom. This could make it difficult for Intel to sustain the kind of margins it needs to keep its superior process R&D efforts going. Or at least, it would need to use its high-margin parts to subsidize R&D for the commodity stuff, which could get tricky given the overall slowing of the market for higher-end processors. I think this is what's happening with the supposed Apple deal. There have been reports that they have excess capacity at 22nm right now, so it makes sense to use it. And since Apple only sells its processors as part of its phones and tablets, it doesn't directly compete with x86 on the open market.

    Of course, all the other fabs are operating under the same cost constraints, so there would be an overall slower pace of process improvements (which is happening anyway as we get closer to the absolute limits at <10nm).
  • wsw1982 - Wednesday, April 3, 2013 - link

    And those companies run into the same bottom. What can they do to even out their R&D costs, put the server chip into a mobile phone?
  • Krysto - Monday, March 18, 2013 - link

    Yup. This is actually Intel's biggest threat by far. It's not the technical competition (even though Intel's Atom servers don't seem nearly as competitive as these upcoming ARM servers), but the biggest problem by far for them will be that they will have to compete with the dozen or so ARM server companies on price, while having more or less the same performance.

    THAT is what will kill Intel in the long term. Intel is not a company built to last on Atom-like profits (which will get even lower once the ARM servers flood the market). And they can forget about their juicy Core profits in a couple of years.
  • wsw1982 - Wednesday, April 3, 2013 - link

    So your argument is that because the ARM solution is more expensive than the Intel solution now, it must be cheaper than the Intel solution in the future? Mobile ARM chips are cheap, but so are Intel's mobile chips.
  • Silma - Thursday, March 14, 2013 - link

    A $1300 difference per server, that's a lot of electricity you have to spare to justify the cost, especially as it is better than Xeon servers only in a few chosen benchmarks.

    Can't see how this is interesting in a production environment.
    It's more for testing/experimenting, I guess.
  • Wilco1 - Thursday, March 14, 2013 - link

    The savings are more than just electricity cost, you also save on cooling costs and can pack your server room more densely. If you do a TCO calculation over several years it might well turn out to be cheaper overall.

    This is the first ARM server solution, so it's partly to get the software working and test the market. However, I was surprised at how competitive it already is, especially when you realize they are using a relatively slow 40nm Cortex-A9. The 2nd generation, using a 28nm A15, will be out in about 6 months; if they manage to double performance per core at similar cost and power, it will look even better.
