In recent months much has been made of the potential incursion of ARM into Intel’s tightly held server markets, and for good reason. ARM’s general focus on SoCs for consumer devices like phones and tablets has treated the company and its partners well over the past few years, but with continued upwards and outwards growth of the ARM ecosystem, ARM and its partners have been looking to expand into new markets. With that in mind they have turned their eyes towards servers, a somewhat small but very lucrative market that offers much greater profitability than the cutthroat consumer space.

ARM’s leading-edge partners like Calxeda have already been toying with the concept, creating a new class of microserver that uses many ARM cores to build a high density, highly threaded server with weak per-thread performance but strong overall performance, an ideal setup for shared hosting and other subsets of server workloads. Of course ARM’s existing 32bit ARMv7 designs can only go so far, leading to ARM taking a more direct shot across Intel’s bow with the announcement of their new 64bit ARMv8 ISA and parts such as Cortex A57. ARM and their partners believe in the potential of the microserver concept, and ARMv8 will be the ISA that lets them seriously chase the concept. But before they can get that far they must face the 800lb gorilla of the server world: Intel.

Intel for their part jealously guards the server market, looking to hold onto those profitable markets that have driven Intel’s own growth and phenomenal profits. Though Intel’s primary focus has continued to be on their Core architecture derived server parts, the company has also indirectly flirted with the concept of microservers, with ex-customer (and now AMD subsidiary) SeaMicro building one of the first server businesses based around Intel’s Atom processors. SeaMicro may be gone now, but Intel has continued to work on the technology, and with ARM drumming up interest in microservers ahead of their entrance to the market next year, Intel will be making the first move.

To that end, today Intel is launching the Atom S1200 series, Intel’s first Atom processors designed specifically for the server market. Previously going by the codename Centerton, the Atom S1200 series is based around Intel’s existing 32nm Saltwell architecture, utilizing Intel’s low-power SoC-ready CPU cores in a new design better suited for the server market.



Atom S1200 Series, aka Centerton

Centerton is for the most part very similar to Intel’s existing 32nm Cedarview Atoms, operating with a pair of Saltwell cores at between 1.6GHz and 2GHz depending on the specific SKU. The key difference however is that Centerton supports a bevy of server-grade features that the consumer-focused Cedarview did not; Centerton adds support for Intel’s VT virtualization technology, 4 more PCIe lanes (for a total of 8), and most importantly support for ECC memory. Though Intel has not confirmed it, these additions, coupled with the fact that Centerton uses a new socket (FCBGA1283), lead us to believe that Centerton is a new Atom design rather than just a server-branded version of Cedarview. In any case Centerton represents a big step up for Intel in the server market by finally offering an Atom-level processor with the ECC support that server vendors need to offer server-grade reliability.

The Atom S1200 family will be composed of three parts: the S1220, the S1240, and the S1260.

Intel Atom Lineup
Model               S1220       S1240       S1260       D2700
Codename            Centerton   Centerton   Centerton   Cedarview
Core/Thread Count   2/4         2/4         2/4         2/4
Frequency           1.6GHz      1.6GHz      2.0GHz      2.13GHz
L2 Cache            1MB         1MB         1MB         1MB
Max Memory          8GB         8GB         8GB         4GB
Supported Memory    DDR3-1333   DDR3-1333   DDR3-1333   DDR3-1066
ECC                 Yes         Yes         Yes         No
VT-x                Yes         Yes         Yes         No
PCIe 2.0 Lanes      8           8           8           4
TDP                 8.1W        6.1W        8.5W        10W

As implied by the model number, the S1220 will be Intel’s entry-level part, clocked at 1.6GHz with an 8.1W TDP. Beyond that the family splits a bit. The S1240 will be Intel’s lowest-power part and the part most directly designed to take on ARM designs, clocked at 1.6GHz like the S1220 but operating a full 2W lower at 6.1W. Finally the S1260 will be Intel’s top-performance part, operating 25% faster at 2GHz with the highest TDP at 8.5W. All of these parts are shipping today; Intel hasn’t given us per-model prices, but pricing starts at $54, presumably for the S1220.
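To make the relationships between the three SKUs concrete, here is a minimal Python sketch that recomputes the clock and TDP differences from the figures quoted in the paragraph above. The helper names are our own, and the GHz-per-watt figure is only a crude proxy: TDP is a thermal design limit, not a measurement of power draw or performance.

```python
# Clock (GHz) and TDP (W) as quoted in the paragraph above.
SKUS = {
    "S1220": {"ghz": 1.6, "tdp_w": 8.1},
    "S1240": {"ghz": 1.6, "tdp_w": 6.1},
    "S1260": {"ghz": 2.0, "tdp_w": 8.5},
}


def clock_uplift_pct(base, other):
    """Percentage clock speed advantage of `other` over `base`."""
    return (SKUS[other]["ghz"] / SKUS[base]["ghz"] - 1.0) * 100.0


def tdp_delta_w(base, other):
    """TDP of `other` minus TDP of `base`, in watts."""
    return SKUS[other]["tdp_w"] - SKUS[base]["tdp_w"]


def ghz_per_watt(name):
    """Crude proxy only: rated clock divided by rated TDP."""
    return SKUS[name]["ghz"] / SKUS[name]["tdp_w"]


if __name__ == "__main__":
    print(f"S1260 vs S1220 clock: {clock_uplift_pct('S1220', 'S1260'):+.0f}%")
    print(f"S1240 vs S1220 TDP:   {tdp_delta_w('S1220', 'S1240'):+.1f}W")
    for name in SKUS:
        print(f"{name}: {ghz_per_watt(name):.3f} GHz/W")
```

By this rough measure the S1240 comes out ahead of its siblings, which fits its positioning as the part aimed most directly at ARM.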

For their part Intel is looking to head off any ARM incursion into servers not only by being the first CPU vendor to release micro/high-density server CPUs with 64bit and ECC support, but also by leveraging the existing x86 software ecosystem and software compatibility with their Xeon processors. This also means that Intel has been able to tap their existing partner network, having secured design wins from Dell, HP, Supermicro, and others. For Intel’s customers this makes the S1200 effectively a continuation of Intel’s existing technology, which can make adoption easier than having to jump to a completely new platform with ARM.

As far as performance goes it will be some time until we have a good idea of how ARM-based processors stack up against the S1200 series, but Intel and HP have already released some generalized performance data comparing the S1200 series to Xeons. As expected, performance is going to be heavily dependent on the nature of the workload, with the S1200 designed for and excelling at heavily threaded, simple tasks, while coming up short in lightly threaded scenarios that need bigger, faster cores. Given the relatively low pricing of these processors, it will be in Intel’s interests to ensure that they are complementary to their existing Xeon processors and not significantly competitive.

Ultimately there is clear customer interest in servers designed to efficiently handle highly-threaded/low-intensity workloads, and with the Atom S1200 series Intel finally has a server-grade product capable of meeting the needs of that market. At the same time microservers are just one segment of the complete server market, and for the foreseeable future Intel’s traditional Xeon processors will remain Intel’s biggest source of server revenue; but with the ever-increasing emphasis on power efficiency this is not a market Intel could have afforded to pass up.

Meanwhile by launching first Intel will get to set the stage for the micro/high-density server market. The Atom S1200 series is far more important for Intel than just a defense against an ARM incursion into the server market, but at the end of the day that may just be the most important role it plays. ARM has shown that they are a capable competitor, and in turn Intel will need to show why they are called the 800lb gorilla of CPUs.

2013: Avoton and Beyond

Wrapping things up, along with the announcement of the Atom S1200 series, Intel also released a very general roadmap of where they intend to take their new server CPU segment. Centerton is not just a one-off product, but rather the first product in a new range of server CPUs.

In 2013 Intel will release Avoton, Centerton’s successor, which will be based on Intel’s 22nm process. Based on Intel’s previous roadmaps we know that 22nm is also supposed to coincide with the launch of Intel’s new Silvermont architecture, so it’s reasonable to assume that Avoton will be Intel’s Silvermont-based processor for servers. Processors based on the ARMv8 ISA are not expected to launch until late in the year, so it’s possible that ARM and its partners will be going up against Avoton rather than Centerton/S1200.

Beyond that, Intel is also planning a 14nm successor to Avoton in 2014. If Intel finds success in the S1200 series, then this will set up the S1200 series and its successors to be the second prong of Intel’s server CPU offerings, similar to the relationship between Core and Atom today in the consumer space. With the forthcoming release of Haswell we have seen signs that Intel is intending to push Core into some of the space currently occupied by Atom, but all the same this two-pronged approach has worked well enough for Intel’s consumer CPUs, and is something Intel is clearly going to try to replicate in the server space.

35 Comments

  • eanazag - Friday, December 14, 2012

    I was thinking that but...

    $54 per CPU for the cheapest is $540 and 60-80 Watts for 20 cores/40 threads. A suck ass GPU, so no GPU workload. More sockets on a mobo mean a more expensive motherboard.

    It may still make sense to roll with a single Xeon because of the IPC performance. We need system power numbers and a little more details like memory DIMM size (like laptop SODIMM or full sized DIMMs).

    I really think it makes more sense for low compute need, dispersed systems. I would snatch one up for a small NAS or home proxy server. UPS needs are lower with this vs. a Xeon.

    What would kick ass is to get these on a PCIe add-on card. And make a desktop blade server. A poor man's blade system for small business.

    I would rather have seen an AMD Bobcat variant with ECC.
  • bsd228 - Wednesday, December 12, 2012

    As Haswell promises to drop power draws to the 15Watt range as soon as next year, I don't know if there is a long term point to atom/arm for the server realm. But there is a density option.

    A 6W atom cpu + msata SSD + single NIC system on a board would allow for a blade chassis of incredible density without putting out the heat of a fusion reactor. One use case would be offering colo/web services to people who don't want to share. A single rack could have hundreds to close to perhaps a thousand of these and still be at only 10KW.

    But I have a hard time seeing a use case for singular units that would sell significantly. Certainly there would be a few good spots where saving another 5-10 watts is helpful, but enough to make it worth everyone's time to develop/support?
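The density estimate in the comment above can be checked with a quick back-of-the-envelope calculation. This is a minimal sketch: the per-node wattages are hypothetical figures chosen for illustration (a roughly 6W CPU plus an allowance for SSD, NIC, and power-conversion losses), not measurements, and the function name is our own.

```python
def nodes_per_rack(rack_budget_w, node_w):
    """Whole nodes that fit within a rack power budget, ignoring space and cooling."""
    if node_w <= 0:
        raise ValueError("node_w must be positive")
    return int(rack_budget_w // node_w)


if __name__ == "__main__":
    RACK_BUDGET_W = 10_000  # the 10KW figure from the comment
    # Hypothetical whole-node power draws in watts (assumptions, not measured).
    for node_w in (10, 15, 20):
        print(f"{node_w}W per node -> {nodes_per_rack(RACK_BUDGET_W, node_w)} nodes")
```

Under these assumptions a 10KW rack holds between 500 and 1000 nodes, consistent with the "hundreds to close to perhaps a thousand" range suggested above.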
  • Krysto - Thursday, December 13, 2012

    These are server parts. Basically the "same" Atom that goes into mobile devices may have 2-4W, but has 8W for servers. The same would go for the Haswell part of Xeon. It would be a lot more power hungry.
  • DanNeely - Thursday, December 13, 2012

    Intel has a 17W dual core IVB Xeon; the same TDP as their coolest mobile IVB parts at the time it was launched. They also have a 45W quad core Xeon; the same TDP as all but one of their mobile quadcore chips. At the moment they only offer a single Xeon at each power point; but if the demand is there nothing stops them from offering more choices in the future. The dual core Xeon is actually faster than any of the 17W mobile parts available. Granted there's probably binning involved; but as long as they don't dominate demand Intel can skim off the top of production to badge as Xeon to absorb the higher TDP of ECC at all but the lowest base levels.

    http://ark.intel.com/products/65735
    http://ark.intel.com/products/65728
  • name99 - Saturday, January 05, 2013

    On the one hand, Intel doesn't sell Xeons at $54...
    If a Xeon is the right solution for your task, buy a Xeon --- but expect to pay for it. Just like, if a POWER7 is the right solution for your task, buy it --- but you will pay for it.

    The point of Atom (or server ARM) is not to replace Xeon, it is to be a cheaper alternative for those (frequent) cases where a Xeon is way more CPU than is needed, as I described above.

    The REAL issue you all should be focussing on is not the quality of these CPUs (as measured by things like CPI), it is the quality of the memory system.

    At the memory controller level, who is doing the better job of predicting which pages should be opened or closed?

    Who is using advanced academic (not yet commercial) techniques like Idle Rank Prediction (the idea here is when you have cache write back, you don't write immediately, you store the write in a buffer and use a prediction mechanism to predict when the RAM is likely to be idle, so your write does not block a read).

    There are even fancier things you can do like much more sophisticated rank/bank mapping. Don't just extract bits from the address. If you have, say, 32 banks available, don't just extract bits from the address for chip-routing, use a divide-by-31. This gives you 31 virtual banks which will have their addresses effectively scrambled over 32 physical banks, so that long stride sequential accesses don't keep hitting the same bank (just like Cray was doing back in the 1970s).

    Even at the basic level, does ARM have something equivalent to Intel's SMI/SMB? (This splits the memory controller into a part on the main CPU, doing the real thinking, connected via a custom very high speed bus to a baby memory controller sitting right next to the actual DIMMs. The point of this is that the physical path from the DIMMs to the baby memory controller is made as short as possible, which means the DIMM bus can be run at higher speeds. The long run from the DIMMs to the CPU goes over the custom bus, which has separate read and write lanes and so can be way faster.)
    And if not, what is their plan here? One solution (IMHO) would be for ARM to get heavily into packaging, and to encourage that their server chips are all sold with DRAM stacked PoP on the chip.

    The point is, the days of decoupled CPU and memory are gone --- you cannot drive a decoupled bus fast enough. ARM needs a server solution to this fact which plays to ARM's strengths, not a mindless copying of the way Intel and AMD and IBM handle this (which reflects their rather different markets and strengths).
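The bank-mapping idea in the comment above can be illustrated with a small simulation. This is a sketch under stated assumptions, not a description of any shipping memory controller: the 64-byte line size is assumed, the function names are our own, and the prime mapping here simply uses the residue modulo 31 as the bank number (leaving one of the 32 banks unused), since the comment does not spell out how the 31 virtual banks would be spread across all 32 physical ones.

```python
NUM_BANKS = 32    # physical banks, as in the comment's example
PRIME = 31        # prime modulus, as in the comment's example
LINE_BYTES = 64   # assumed cache-line size


def bank_by_bits(addr):
    """Conventional mapping: low bits of the line index select the bank."""
    return (addr // LINE_BYTES) % NUM_BANKS


def bank_by_prime(addr):
    """Prime-modulus mapping: the line index modulo 31 selects the bank."""
    return (addr // LINE_BYTES) % PRIME


def banks_touched(mapper, stride_bytes, accesses=1024):
    """Number of distinct banks hit by a fixed-stride access stream."""
    return len({mapper(i * stride_bytes) for i in range(accesses)})


if __name__ == "__main__":
    for stride in (64, 1024, 2048, 4096):
        bits = banks_touched(bank_by_bits, stride)
        prime = banks_touched(bank_by_prime, stride)
        print(f"stride {stride:5d}B: bit-select {bits:2d} banks, mod-31 {prime:2d} banks")
```

With bit selection, a 2048-byte stride lands every access on the same bank, while the modulo-31 mapping spreads the same stream across all 31 banks it uses. The tradeoff is that a modulo by a non-power-of-two costs more in hardware than simply extracting address bits.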
  • name99 - Saturday, January 05, 2013

    "I'm confused by Atom (or ARM) for servers though...what are they good for?"

    SOME computers do a substantial amount of computation. You want those computers to have powerful execute cores.

    But OTHER computers mainly route data from disks to memory to the network. Computers like this have little need for a powerful execute core, since most of their time is spent waiting on memory. They are best served by having a sophisticated memory controller (which can predict accurately which DRAM pages to open or close) hooked up to a number of independent threads of execution, each of which can just pause while it waits on RAM.

    Atom and ARM in servers target this latter market.
    IBM does so as well, through a very interesting design. Hooked up to a massively powerful memory system (way beyond what we are talking here), a POWER7 core consists of two sub cores, each of which can run two threads. For compute-intensive tasks, a core runs one thread which uses the resources of both subcores (so it has two integer units, two load-store units, two FPUs available to it). For memory intensive tasks, each subcore runs two threads, so you have four threads per core, and you're back in the world I described above --- each thread spends most of its time waiting, and the memory system is 100% busy serving up data to each waiting thread.

    You will NEVER understand the design of modern CPUs until you realize that computation is cheap, memory access is expensive, so everything is twisted around making your memory work as hard as possible; and if that means your logic is idle 90% of the time, well, that's life.
  • SunLord - Wednesday, December 12, 2012

    So this is basically meant for low load servers that don't have to do much real work ever and instead of letting a powerful Xeon server waste power while they idle along under very low load you use a low power atom server at higher load levels?

    It does make sense as long as the load never changes or increases much; then an atom will be fine, but a xeon server would be able to scale far better across all sorts of different workloads if there are any possible future changes. I guess I can see a place at the very low end or specialized projects where things won't change much.

    I'd really like the list of the "light" server apps by name where an atom can beat a xeon, as the hp graphic implies with its performance per watt as things scale out, but I have to wonder how the per dollar aspect comes into play when you start scaling up with multiple servers and what loads/apps scale out across multiple atom servers vs xeon servers.
  • name99 - Saturday, January 05, 2013

    "It does make sense as long as the load never changes or increases much then an atom will be fine but a xeon server would be able to scale far better across all sorts of different workloads if there any possible future changes"

    (a) Don't throw around words like "scaling" without thinking what they mean. If your scaling dimensions are along the axes of disk or network, the CPU doesn't matter much.
    If your workload is dominated by memory performance, then faster CPU doesn't matter much.

    (b) The BIG boys (Apple, Google, MS, Facebook, etc) aren't buying servers on the idea that "well, let's get something fast and we'll figure out ways to use it". They have very definite roles that are played by different machines --- some are compute servers, some are cache servers, some are network servers, some are IO servers. These ARM and Atom CPUs are targeted at all these roles except compute server.
  • DuckieHo - Wednesday, December 12, 2012

    Why only 8GB if they already did the work for 64-bit support? Support for ECC registered 2x16GB would make it compelling for MemCached, NoSQL, and other distributed memory-capacity intensive workloads.
  • Kevin G - Wednesday, December 12, 2012

    The memory controller likely does not support registered memory which would limit DIMM size to 8 GB. Being single channel that limits the memory controller to two DIMM slots. That does raise the question of why it does not support two 8 GB unbuffered ECC DIMMs for a total of 16 GB.
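The capacity question raised above comes down to simple multiplication. As a minimal sketch, taking the comment's assumptions at face value (a single memory channel with two DIMM slots, neither of which is confirmed by Intel in the article) and using a hypothetical helper name:

```python
def max_memory_gb(channels, dimms_per_channel, dimm_gb):
    """Maximum capacity if every slot holds a DIMM of the given size."""
    return channels * dimms_per_channel * dimm_gb


if __name__ == "__main__":
    # Assumed topology from the comment: 1 channel, 2 slots.
    for dimm_gb in (4, 8):
        total = max_memory_gb(channels=1, dimms_per_channel=2, dimm_gb=dimm_gb)
        print(f"2 x {dimm_gb}GB DIMMs -> {total}GB")
```

Under that assumed topology, the 8GB limit in the spec table corresponds to two 4GB DIMMs, and two 8GB DIMMs would give the 16GB the comment asks about.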
