Following on from the SoC disclosure at Hot Chips, Qualcomm has this week announced the formal launch of its new Centriq 2400 family of Arm-based SoCs for cloud applications. The top processor is a 48-core, Arm v8-compliant design made using Samsung’s 10LPE FinFET process, with 18 billion transistors in a 398mm2 design. The cores are 64-bit only, and are grouped into duplexes – pairs of cores with a shared 512KB of L2 cache, and the top end design will also have 60 MB of L3 cache. The full design has 6 channels of DDR4 (Supporting up to 768 GB) with 32 PCIe Gen 3.0 lanes, support for Arm Trustzone, and all within a TDP of 120W and for $1995.

We covered the design of Centriq extensively in our Hot Chips overview, including the microarchitecture, security and new power features. What we didn’t know were the exact configurations, L3 cache sizes, and a few other minor details. One key metric that semiconductor professionals are interested in is the confirmation of using Samsung’s 10LPE process, which Qualcomm states gave them 18 billion transistors in a 398mm2 die (45.2MTr/mm2). This was compared to Intel’s Skylake XCC chip on 14nm (37.5MTr/mm2, from an Intel talk), but we should also add in Huawei’s Kirin 970 on TSMC 10nm (55MTr/mm2). Today Qualcomm is releasing all this information, along with a more detailed block diagram of the chip.

The chip has 24 duplexes, essentially grouped into sets of four. Connecting them all is a bi-directional segmented ring bus, with a mid-silicon bypass to speed up cross-core transfers. This ring bus is set with 250 GBps of aggregate bandwidth. Shown in the diagram are 12 segments of L3 cache, which means these are shipped with 5 MB each (although there may be more than 5 MB in a block for yield redundancy). This gives a metric of 1.25 MB of L3 cache per core, and for the SKUs below 48 cores the cache is scaled accordingly. Qualcomm also integrates its inline memory bandwidth compression to enhance the workflow, and provides a cache quality of service model (as explained in our initial coverage). Each of the six memory controllers supports a channel of DDR4-2667, with support up to 768GB of memory and a peak aggregate bandwidth of 128 GB/s.

Qualcomm Centriq 2400 Series Centriq 2460 Centriq 2452 Centriq 2434
Cores 48 46 40
Base Frequency 2.2 GHz 2.2 GHz 2.3 GHz
Turbo Frequency 2.6 GHz 2.6 GHz 2.5 GHz
L3 Cache 60.0 MB 57.5 MB 50 MB
DDR4 6-Channel, DDR4-2667
PCIe 32 PCIe 3.0
TDP 120 W 120 W 110 W
Price $1995 $1373 $888

Starting with the chips on offer, Qualcomm will initially provide three different configurations, starting with 40 cores at 2.3 GHz (2.5 GHz turbo), up to 46 and 48 cores both at 2.2 GHz (2.6 GHz turbo). All three chips are somewhat equal, binned depending on active duplexes and cache, with $1995 set for the top SKU. Qualcomm is aiming to attack current x86 cloud server markets on three metrics: performance per watt, overall performance, and cost. In that regard it offered three distinct comparisons, one for each chip:

  • Centriq 2460 (48-core, 2.2-2.6 GHz, 120W) vs Xeon Platinum 8180 (28-core, 2.5-3.8 GHz, 205W)
  • Centriq 2452 (46-core, 2.2-2.6 GHz, 120W) vs Xeon Gold 6152 (22-core, 2.1-3.7 GHz, 140W)
  • Centriq 2434 (40-core, 2.3-2.5 GHz, 110W) vs Xeon Silver 4116 (12-core, 2.1-3.0 GHz, 85W)

Qualcomm provided some SPECint_rate2006 comparisons between the chips, showing Centriq either matching or winning in performance per thread, beating in performance per watt, and up to 4x in performance per dollar. It should be noted that the data for the Intel chips were interpolated from other Xeon chips, except the 8180. Those numbers can be found in our gallery below. 

One interesting bit of data from the launch was the power consumption results provided. As a server or cloud CPU scales to more cores, there will undoubtedly be situations where not all the cores are always drawing power, either due to how the algorithm works or the system is waiting on data. Normally the TDP values are given as a measure of power consumption, despite the actual definition of thermal dissipation requirements – a 120W chip does not always draw 120W, in other words. To this end, Qualcomm provided the average power consumption of the 120W Centriq 2460 while running SPECint_rate2006.

It shows a median power consumption of 65W, peaking just below 100W for hmmer and h264ref. The other interesting point is the 8W idle power, which is indicated as for only when C1 is enabled. With all idle states enabled, Qualcomm claims under 4W for the full SoC. Qualcomm was keen to point out that this includes the IO on the SoC, which requires a separate chipset on an Intel platform.

Any time an Arm chip comes into the enterprise space, thoughts immediately turn to high-performance, and Qualcomm is keen here to point out that while performant, their main goal is to cloud services and hyper-scale, such as scale-out situations, micro-services, containers, and instance-based implementations. At the launch in San Diego, they rolled out quotes from Alibaba, Google, HPE, and Microsoft, all of whom are working closely with Qualcomm for deployment. Demonstrations at the launch event included NoSQL, cloud automation, data analytics with Apache Spark, deep learning, network virtualization, video and image processing, compute-based bioinformatics, OpenStack, and neural networks.

On the software side, Qualcomm is working with a variety of partners to enable and optimize their software stacks for the Falkor design. At Hot Chips, Qualcomm also stated that there are plans in the works to support Windows Server, based on work done with their Snapdragon on Arm initiative, although this seemed to be missing from the presentation. 

Also as a teaser, Qualcomm gave the name of its next-generation enterprise processor. The next design will be called the Qualcomm Firetail, using Saphira cores. (Qualcomm has already trademarked both of those names).

Qualcomm Centriq is now shipping (for revenue) to key customers. We should be on the list for review samples when they become available.

Related Reading



View All Comments

  • HStewart - Friday, November 10, 2017 - link

    I have almost 7 years in OS development and understand Microprocessors a lot - basically todays high processors instructions take use many RISC level instructions to do one of higher level functions.

    I would say the closest Intel Server chip in comparison to this is the 16 Core Atom C3958

    it is a relatively new processor for servers and runs at 31W is designed compete with ARM servers.
  • HStewart - Friday, November 10, 2017 - link

    One more thing, the following link shows the 22 core Xeon beats the Qualcomm 40 core
  • Wilco1 - Friday, November 10, 2017 - link

    Actually the Centric was still a little faster at half the power and a quarter of the cost... So which is the better one in your opinion??? Reply
  • Wilco1 - Friday, November 10, 2017 - link

    You're kidding right? Try building a few large applications for both x64 and AArch64 and compare codesize and static instruction counts. You will be educated.

    And that slow Atom? That has difficulty keeping up with mobile phone cores - it's slower than the 2-way Cortex-A73 and runs at a 30% lower frequency. And it has just 16 cores while it would need well over 100 to get similar performance...
  • Samus - Saturday, November 11, 2017 - link

    Considering the 40 Core model is less than half the price of the 48 core model, wtf is right. Perhaps dual CPU motherboards will surface. And if you were building multiple servers just build one or two additional servers to make up for the 8 core deficiency and save a ton of money overall.

    The pricing is truely insane.
  • Rocket321 - Sunday, November 12, 2017 - link

    These are Enterprise chips,where $1k isn't considered a lot of money. Take a look at Xenon pricing for reference. Or, how much do you think 768GBs of ram costs? Reply
  • andychow - Tuesday, November 14, 2017 - link

    Depends if you are paying a per core or per cpu license. If your cost is $42k/year per cpu, paying an extra grand for an 20% more performance doesn't matter. I'm guessing there is aggressive silicon selection to keep the same power draw. Reply
  • MrSpadge - Friday, November 10, 2017 - link

    Qualcomm clearly put some effort into the SPEC comparison numbers and did nothing outright wrong, as far as I can see. However, the method involves at least one pretty rough step: simply halving the numbers for 2-socket systems. But we know CPUs never scale perfectly from 1 to 2 sockets. There's also the price comparison, where they choose the more expensive dual socket CPUs. The 24 core single socket Epyc might be a far better comparison.

    Source for some of that:
  • edzieba - Friday, November 10, 2017 - link

    The biggest question is the same one as all the past put-a-bunch-of-ARM-cores-on-a-die chips: does anyone actually WANT them? Reply
  • LauRoman - Friday, November 10, 2017 - link

    That's not the actual big issue. How many people can code, adapt code and/or debug code written for it. How does JS or implementations of it scale with the cores, how does it handle databases, how about storage IO and a lot stuff that might be more important than core count. Reply

Log in

Don't have an account? Sign up now