Qualcomm Launches 48-core Centriq for $1995: Arm Servers for Cloud Native Applications
by Ian Cutress on November 10, 2017 6:30 AM EST
Following on from the SoC disclosure at Hot Chips, Qualcomm has this week announced the formal launch of its new Centriq 2400 family of Arm-based SoCs for cloud applications. The top processor is a 48-core, Armv8-compliant design made using Samsung’s 10LPE FinFET process, with 18 billion transistors in a 398 mm² die. The cores are 64-bit only and are grouped into duplexes, pairs of cores sharing 512 KB of L2 cache, and the top-end design also has 60 MB of L3 cache. The full design has six channels of DDR4 (supporting up to 768 GB) and 32 PCIe 3.0 lanes, supports Arm TrustZone, and fits within a TDP of 120 W, all for $1995.
We covered the design of Centriq extensively in our Hot Chips overview, including the microarchitecture, security and new power features. What we didn’t know were the exact configurations, L3 cache sizes, and a few other minor details. One detail that semiconductor professionals will be interested in is the confirmation of Samsung’s 10LPE process, which Qualcomm states gave them 18 billion transistors in a 398 mm² die (45.2 MTr/mm²). This was compared to Intel’s Skylake XCC chip on 14nm (37.5 MTr/mm², from an Intel talk), but we should also add in Huawei’s Kirin 970 on TSMC 10nm (55 MTr/mm²). Today Qualcomm is releasing all this information, along with a more detailed block diagram of the chip.
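As a quick sanity check on the density figure, the arithmetic can be sketched in a few lines of Python. The function name is ours, chosen for illustration, and the inputs are Qualcomm's stated transistor count and die area; the 37.5 MTr/mm² Skylake XCC figure is taken from the text as-is rather than derived.

```python
def density_mtr_per_mm2(transistors: float, die_area_mm2: float) -> float:
    """Transistor density in millions of transistors per square millimetre."""
    return transistors / die_area_mm2 / 1e6

# Qualcomm's stated figures for the top Centriq 2400 die
centriq_density = density_mtr_per_mm2(18e9, 398)

# Quoted in the article (from an Intel talk), not computed here
skylake_xcc_density = 37.5

print(f"Centriq 2400: {centriq_density:.1f} MTr/mm^2")  # 45.2
print(f"Ratio vs Skylake XCC: {centriq_density / skylake_xcc_density:.2f}x")
```

Density comparisons across foundries are only a rough guide, since each vendor counts transistors and reports die area in its own way.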
The chip has 24 duplexes, essentially grouped into sets of four. Connecting them all is a bi-directional segmented ring bus, with a mid-silicon bypass to speed up cross-core transfers. The ring bus offers 250 GB/s of aggregate bandwidth. Shown in the diagram are 12 segments of L3 cache, which means each ships with 5 MB enabled (although there may be more than 5 MB in a block for yield redundancy). This works out to 1.25 MB of L3 cache per core, and for the SKUs below 48 cores the cache is scaled accordingly. Qualcomm also integrates its inline memory bandwidth compression to improve effective memory bandwidth, and provides a cache quality of service model (as explained in our initial coverage). Each of the six memory controllers supports a channel of DDR4-2667, with support for up to 768 GB of memory and a peak aggregate bandwidth of 128 GB/s.
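The cache and bandwidth numbers above follow from simple arithmetic, sketched below in Python. One assumption is ours rather than Qualcomm's: each DDR4 channel is treated as a standard 64-bit (8-byte) data path, which is what makes six channels of DDR4-2667 come out at roughly 128 GB/s.

```python
# Figures from the article
L3_SEGMENTS = 12
MB_PER_SEGMENT = 5
CORES = 48
CHANNELS = 6
TRANSFERS_PER_SEC = 2667e6   # DDR4-2667: 2667 million transfers per second

# Assumption: standard 64-bit DDR4 channel, i.e. 8 bytes per transfer
BYTES_PER_TRANSFER = 8

l3_total_mb = L3_SEGMENTS * MB_PER_SEGMENT
l3_per_core_mb = l3_total_mb / CORES
peak_gb_s = CHANNELS * TRANSFERS_PER_SEC * BYTES_PER_TRANSFER / 1e9

print(f"L3 total: {l3_total_mb} MB, per core: {l3_per_core_mb} MB")  # 60 MB, 1.25 MB
print(f"Peak DRAM bandwidth: {peak_gb_s:.1f} GB/s")                  # 128.0 GB/s
```

This is a theoretical peak; sustained bandwidth in real workloads will be lower.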
Qualcomm Centriq 2400 Series
SKU | Centriq 2460 | Centriq 2452 | Centriq 2434 |
Cores | 48 | 46 | 40 |
Base Frequency | 2.2 GHz | 2.2 GHz | 2.3 GHz |
Turbo Frequency | 2.6 GHz | 2.6 GHz | 2.5 GHz |
L3 Cache | 60.0 MB | 57.5 MB | 50.0 MB |
DDR4 | 6-Channel, DDR4-2667 | 6-Channel, DDR4-2667 | 6-Channel, DDR4-2667 |
PCIe | 32 PCIe 3.0 | 32 PCIe 3.0 | 32 PCIe 3.0 |
TDP | 120 W | 120 W | 110 W |
Price | $1995 | $1373 | $888 |
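To put the table's pricing in per-core terms, here is a small Python sketch using only the core counts, cache sizes and list prices from the table. The dictionary layout and variable names are ours, for illustration only.

```python
# Core count, L3 cache and list price per SKU, copied from the table
skus = {
    "Centriq 2460": {"cores": 48, "l3_mb": 60.0, "price_usd": 1995},
    "Centriq 2452": {"cores": 46, "l3_mb": 57.5, "price_usd": 1373},
    "Centriq 2434": {"cores": 40, "l3_mb": 50.0, "price_usd": 888},
}

price_per_core = {name: s["price_usd"] / s["cores"] for name, s in skus.items()}
l3_per_core = {name: s["l3_mb"] / s["cores"] for name, s in skus.items()}

for name in skus:
    print(f"{name}: ${price_per_core[name]:.2f} per core, "
          f"{l3_per_core[name]:.2f} MB L3 per core")
```

The L3 allocation stays at 1.25 MB per core on every SKU, while the price per core drops from roughly $42 on the 2460 to roughly $22 on the 2434.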
Qualcomm will initially offer three configurations, starting with 40 cores at 2.3 GHz (2.5 GHz turbo) and going up to 46 and 48 cores, both at 2.2 GHz (2.6 GHz turbo). All three chips are broadly the same design, binned by the number of active duplexes and the amount of cache, with $1995 set for the top SKU. Qualcomm is aiming to attack current x86 cloud server markets on three metrics: performance per watt, overall performance, and cost. In that regard it offered three distinct comparisons, one for each chip:
- Centriq 2460 (48-core, 2.2-2.6 GHz, 120W) vs Xeon Platinum 8180 (28-core, 2.5-3.8 GHz, 205W)
- Centriq 2452 (46-core, 2.2-2.6 GHz, 120W) vs Xeon Gold 6152 (22-core, 2.1-3.7 GHz, 140W)
- Centriq 2434 (40-core, 2.3-2.5 GHz, 110W) vs Xeon Silver 4116 (12-core, 2.1-3.0 GHz, 85W)
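One rough way to read these pairings is TDP per core, sketched below in Python from the core counts and TDPs listed above. This is our own arithmetic, not a Qualcomm metric, and it comes with caveats: the cores are not equivalent, the Xeons run two threads per core with Hyper-Threading, and TDP is a thermal design figure rather than measured power draw.

```python
# (name, cores, TDP in watts), exactly as listed in the three comparisons
pairs = [
    (("Centriq 2460", 48, 120), ("Xeon Platinum 8180", 28, 205)),
    (("Centriq 2452", 46, 120), ("Xeon Gold 6152", 22, 140)),
    (("Centriq 2434", 40, 110), ("Xeon Silver 4116", 12, 85)),
]

def tdp_per_core(cores: int, tdp_w: float) -> float:
    """Rated TDP divided by physical core count, in watts per core."""
    return tdp_w / cores

for (q_name, q_cores, q_tdp), (i_name, i_cores, i_tdp) in pairs:
    print(f"{q_name}: {tdp_per_core(q_cores, q_tdp):.2f} W/core vs "
          f"{i_name}: {tdp_per_core(i_cores, i_tdp):.2f} W/core")
```

On these numbers each Centriq part is rated at 2.5 to 2.75 W per core, against roughly 6.4 to 7.3 W per core for the Xeons it was compared with.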
Qualcomm provided some SPECint_rate2006 comparisons between the chips, showing Centriq either matching or winning in performance per thread, ahead in performance per watt, and up to 4x ahead in performance per dollar. It should be noted that the data for the Intel chips, except the 8180, were interpolated from other Xeon chips. Those numbers can be found in our gallery below.
One interesting bit of data from the launch was the set of power consumption results. As a server or cloud CPU scales to more cores, there will undoubtedly be situations where not all the cores are drawing power, either because of how the algorithm works or because the system is waiting on data. TDP values are normally quoted as a measure of power consumption, even though TDP is actually defined as a thermal dissipation requirement; in other words, a 120W chip does not always draw 120W. To this end, Qualcomm provided the average power consumption of the 120W Centriq 2460 while running SPECint_rate2006.
It shows a median power consumption of 65W, peaking just below 100W for hmmer and h264ref. The other interesting point is the 8W idle power, which is listed as applying when only the C1 idle state is enabled. With all idle states enabled, Qualcomm claims under 4W for the full SoC. Qualcomm was keen to point out that this includes the IO on the SoC, which requires a separate chipset on an Intel platform.
Any time an Arm chip comes into the enterprise space, thoughts immediately turn to high performance. Qualcomm is keen to point out that while Centriq is performant, its main target is cloud services and hyperscale: scale-out deployments, microservices, containers, and instance-based implementations. At the launch in San Diego, the company rolled out quotes from Alibaba, Google, HPE, and Microsoft, all of whom are working closely with Qualcomm on deployment. Demonstrations at the launch event included NoSQL, cloud automation, data analytics with Apache Spark, deep learning, network virtualization, video and image processing, compute-based bioinformatics, OpenStack, and neural networks.
On the software side, Qualcomm is working with a variety of partners to enable and optimize their software stacks for the Falkor design. At Hot Chips, Qualcomm also stated that there are plans in the works to support Windows Server, based on work done with their Snapdragon on Arm initiative, although this seemed to be missing from the presentation.
As a teaser, Qualcomm also gave the name of its next-generation enterprise processor. The next design will be called the Qualcomm Firetail, using Saphira cores. (Qualcomm has already trademarked both of those names.)
Qualcomm Centriq is now shipping (for revenue) to key customers. We should be on the list for review samples when they become available.
37 Comments
LemmingOverlord - Friday, November 10, 2017
@Ian, is Johan testing it?
Ian Cutress - Friday, November 10, 2017
He hasn't got one yet. We should be near the top of the list when they send them out to press though.
IGTrading - Friday, November 10, 2017
HotHardware already got some tests on their own platform, CloudFlare.
The results are impressive considering that Intel's CPU use exactly 200% of the power used by Qualcomm's new chip to achieve less performance.
For almost all benches, Qualcomm wins while using half the power and having a lower processor price.
Intel doesn't look good at all in CloudFlare.
IGTrading - Friday, November 10, 2017
Link : https://hothardware.com/news/qualcomm-centriq-2400...
cekim - Friday, November 10, 2017
Um, that article indicates they used "threads" not cores for the xeon side:
"Xeon processors from Intel. In the Intel corner, we have the Grantley platform (Broadwell) using two 10-core Xeon processors with Hyper-Threading enabled (40 cores) and Purley (Skylake) using two 12-core Xeon processors with Hyper-Threading (48 cores)."
Garbage test - eagerly awaiting some real data.
Notmyusualid - Friday, November 10, 2017
Yep, ddriver 2.0.
Wilco1 - Friday, November 10, 2017
Wait - are you saying that switching hyperthreading off will improve the Xeon throughput?
cekim - Tuesday, November 14, 2017
No, but threads do not provide the same throughput as an additional real core. 16 threads != 16 cores in terms of compute power. That the ARM still beat it in terms of power consumption though suggests this could get interesting, but its a garbage benchmark for lack of controls or evident understanding of the hardware in question.
Krysto - Friday, November 10, 2017
I think calling it a $2,000 chip is a mistake, because clearly the $2,000 version is a poor value compared to the others. For only 2 fewer cores, you can get it for $600 less. For 8 fewer cores, you can get it for $1100 less.
The 2452 seems like the best value by far. You get 40 cores compared to 16c/32t for AMD EPYC 7301 (which costs about $800, too), and I'm guessing the performance should be at least as good overall, if not like +50% better.
As for Intel, no contest. Intel's 16 core Xeons start at around $2,000, so perf/$ should be at least 2-3x in Qualcomm's favor.
Krysto - Friday, November 10, 2017
Err. I meant the 2432 version is the best value. I wish chip companies would stop using such confusing codenames (likely on purpose). Anyways, I'd say this is a pretty good start from Qualcomm. If they stick to using the cutting edge process from Samsung or TSMC, rather than wait like 2 years, as their Arm server competitors were doing before, I think they can become a decent competitor in the server chip market. Also, the next-generation should be built using TSMC's 7nm process, so Qualcomm may have an even more competitive chip then. Plus, they would have gotten a chance to learn what the market is actually looking for and better optimize the chip for their customers' wishes.
Also, for anyone wondering why I compared the cores this way, I was kind of basing it on Cloudflare's review of it. Google "Cloudflare Arm takes wing" to check out their review, as Anandtech doesn't allow links here.