Sub-NUMA Clustering

When a platform like Xeon D targets markets outside the consumer space, it can sometimes be difficult to determine which consumer or enterprise features make it into the product. For example, Intel’s Sub-NUMA Clustering (SNC, an upgraded version of Cluster-on-Die) is used in the Xeon Scalable enterprise processors but not in the consumer-focused Core-X processors, despite both being built on the same silicon.

SNC stems from the physical layout of the processor: the 18-core design is actually an x/y grid of nodes, in this case what we believe is a 5x5 arrangement. A node can be a core, a memory controller, a PCIe root complex, other IO, and so on. When data needs to move from one node to another, it travels through the mesh topology along what should be the quickest path available, depending on other node-to-node traffic. Some of the nodes are duplicated, such as the PCIe x16 root complexes or the memory controllers: the four memory controllers are split into pairs, each pair in its own node, with the two nodes sitting at opposite ends of the die. For example, here is the Skylake-SP 18-core layout:

When a system accesses main memory, that memory is normally presented as a unified space: the latency to reach any of the data is treated as the same. Due to the physical layout of the die, however, data held in the memory attached to the controller closest to a given core in the mesh is (on average) quicker for that core to access. What SNC does is divide the silicon at a firmware level into two ‘clusters’, with each cluster preferring to work with the cores, nodes, and memory controllers within its own cluster. Nothing stops a core from reaching outside its own cluster, but to offer the best latency (sometimes at the expense of peak bandwidth), it is best for each core/node to be limited in this way. Xeon D customers can typically enable SNC in the BIOS of their system, or arrange with their OEM to have it enabled by default.

Why is SNC not available on consumer platforms? The benefits and drawbacks of SNC have very little effect on consumer workloads: most users are not striving to minimise their 99th percentile latency figures, whereas server environments often are. Also, to get the best out of SNC, software typically has to be written with it in mind, much as in a multi-socket environment.
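
To give a sense of what ‘written for it’ means in practice, below is a minimal sketch of NUMA-aware allocation using Linux’s libnuma (our own illustration, not something from Intel): with SNC enabled, the firmware exposes each cluster to the operating system as its own NUMA node, so an application can keep a buffer on the node local to the thread that touches it.

```c
/* Minimal sketch: keep a buffer on the NUMA node local to the calling
 * thread. With SNC enabled, each cluster shows up as a separate NUMA
 * node, so the same libnuma calls apply within a single socket.
 * Build with: gcc snc_alloc.c -o snc_alloc -lnuma
 */
#define _GNU_SOURCE
#include <numa.h>
#include <sched.h>
#include <stdio.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    int cpu  = sched_getcpu();          /* core this thread is running on */
    int node = numa_node_of_cpu(cpu);   /* its local (SNC) cluster/node   */
    printf("CPU %d is on NUMA node %d of %d\n",
           cpu, node, numa_num_configured_nodes());

    /* Allocate 64 MB on the local node so accesses stay inside the cluster. */
    size_t len = 64UL << 20;
    void *buf = numa_alloc_onnode(len, node);
    if (buf == NULL) {
        perror("numa_alloc_onnode");
        return 1;
    }

    /* ... work on buf from threads pinned to this cluster ... */

    numa_free(buf, len);
    return 0;
}
```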

Intel Speed Shift

The other feature we were interested to see make the jump was Intel’s Speed Shift. This technology allows the processor to respond more quickly to requests for turbo, whether from a high-power state or from idle. In the standard model, software issues work, the operating system interprets the demand, checks with the firmware to see which power state it is allowed to request, and then asks the processor for that power state. Speed Shift hands control back to the processor, allowing it to gauge the frequency and density of the instructions arriving at the core and apply a turbo frequency much more quickly.

In previous presentations, Intel has stated that this technology cuts the time it takes the processor to go from idle to peak turbo from around 100 milliseconds down to roughly 25-30 milliseconds. We confirmed that, with a suitable OS and hypervisor, Speed Shift will also be enabled on the Xeon D-2100 series platform.
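
Under the hood, Speed Shift is exposed as hardware-controlled P-states (HWP), and a CPU advertises support through CPUID leaf 06H. As a quick check (our own illustration, not something from Intel’s briefing), a few lines with the GCC/Clang <cpuid.h> helper will report whether a given part has it:

```c
/* Sketch: report whether the CPU advertises HWP (the hardware mechanism
 * behind Speed Shift) via CPUID leaf 06H. Assumes an x86 compiler that
 * provides <cpuid.h>, i.e. GCC or Clang.
 */
#include <cpuid.h>
#include <stdio.h>

int main(void) {
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0x06, &eax, &ebx, &ecx, &edx)) {
        fprintf(stderr, "CPUID leaf 06H not supported\n");
        return 1;
    }

    /* CPUID.06H:EAX bit 1 = Turbo Boost, bit 7 = HWP (Speed Shift). */
    printf("Turbo Boost       : %s\n", (eax & (1u << 1)) ? "yes" : "no");
    printf("HWP / Speed Shift : %s\n", (eax & (1u << 7)) ? "yes" : "no");
    return 0;
}
```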

You can read our analysis of Speed Shift on Skylake here:

https://www.anandtech.com/show/9751/examining-intel-skylake-speed-shift-more-responsive-processors

Virtualization

Some of the key ‘edge’ markets that Intel is targeting with the Xeon D-2100 series require virtualization. In our briefing, Intel did not spend much time discussing this part of the product, but did confirm that the latest implementation of VT-x and hardware virtualization technologies is in play. We were told that due to the upgrades over the previous generation of Xeon D, the new platform ‘enables greater VM density for VNF functions, such as Virtual Evolved Packet Core (vEPC), Virtual Content Delivery Network (vCDN), Virtual GiLAN (vGiLAN), Virtualized Radio Access Network (vRAN), and Virtual Broadband Base Unit (vBBU)’.
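
For completeness, whether a part exposes VT-x to a hypervisor can be read straight from CPUID leaf 01H (ECX bit 5 is the VMX flag); the short check below is our own illustration rather than anything Intel showed in the briefing.

```c
/* Sketch: check the VMX (VT-x) capability bit, CPUID.01H:ECX bit 5.
 * Note this only says the silicon supports it; the feature can still be
 * disabled in firmware. Assumes GCC/Clang's <cpuid.h>.
 */
#include <cpuid.h>
#include <stdio.h>

int main(void) {
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0x01, &eax, &ebx, &ecx, &edx)) {
        fprintf(stderr, "CPUID leaf 01H not supported\n");
        return 1;
    }

    printf("VT-x (VMX): %s\n", (ecx & (1u << 5)) ? "supported" : "not supported");
    return 0;
}
```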

We were able to confirm that, as on the enterprise platforms, each core can adjust its frequency independently of the others, so in multi-user environments, if one user is blasting AVX-512 instructions, the frequency of the other cores can still be maintained. The same likely applies to L3 cache management, so that ‘noisy neighbors’ cannot crowd out L3 use. This is less of a problem now that the L3 is a victim cache, but for some customers it can still be an issue.
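
Intel did not name the mechanism in our briefing, but on Linux the usual tool for fencing off L3 capacity between tenants is Cache Allocation Technology (part of Intel’s Resource Director Technology), driven through the resctrl filesystem. The sketch below is purely illustrative: the group name ‘noisy’, the 4-way mask, and the assumption of a single L3 cache domain are ours, and the usable mask width varies by part.

```c
/* Illustrative sketch: confine a 'noisy neighbor' process to a slice of
 * the L3 using Cache Allocation Technology via the Linux resctrl interface.
 * Assumes resctrl is supported and mounted (as root):
 *   mount -t resctrl resctrl /sys/fs/resctrl
 * The group name and the mask 0x00f (4 ways on cache domain 0) are
 * example values only.
 */
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

static int write_file(const char *path, const char *text) {
    FILE *f = fopen(path, "w");
    if (f == NULL) { perror(path); return -1; }
    int ok = (fputs(text, f) >= 0);
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}

int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid-to-confine>\n", argv[0]);
        return 1;
    }

    /* Create a resource group for the noisy tenant. */
    if (mkdir("/sys/fs/resctrl/noisy", 0755) != 0 && errno != EEXIST) {
        perror("mkdir /sys/fs/resctrl/noisy");
        return 1;
    }

    /* Limit the group to four L3 ways on cache domain 0. */
    if (write_file("/sys/fs/resctrl/noisy/schemata", "L3:0=00f\n") != 0)
        return 1;

    /* Move the offending process into the group. */
    char pid_line[64];
    snprintf(pid_line, sizeof pid_line, "%s\n", argv[1]);
    return write_file("/sys/fs/resctrl/noisy/tasks", pid_line) != 0;
}
```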

Availability

Intel stated that it has over a dozen partners, both OEMs and large-scale system integrators, already working with the new D-2100 series, ready for product roll-out through 2018. Certain early end-point customers (think the large-scale cloud providers and CDNs) have already had silicon for some time, while it will be rolled out to everyone else in due course through Intel’s partners.

Intel did confirm that it has a sampling program in play for press like AnandTech, so I’m pushing for Johan and Ganesh to get some hands-on time, as we did with the previous generation.

Naming

The last generation, the Xeon D-1500 series, was given the code name ‘Broadwell-DE’. By that token, this generation, the Xeon D-2100 series, is based on Skylake and so should be ‘Skylake-DE’. However, references to ‘Skylake-D’ have shown up online as an alternative, perhaps to keep these code names down to a single letter. It isn’t to be confused with Skylake for the consumer desktop, which is usually called Skylake-S. Nice and simple.

Related Reading

Migrating from Broadwell to Skylake-SP
Xeon D-2100 Motherboards Appearing: ASRock Rack D2100D8UM

Comments

  • Elstar - Wednesday, February 7, 2018 - link

    I'm not sure what/who the target market is for the D-2191. The core count says "high end", but the TDP, base frequency, DDR frequency, and unique lack of integrated Ethernet is weird. It feels more like an "embedded Xeon-W" than a "Xeon-D".
  • IntelUser2000 - Wednesday, February 7, 2018 - link

    Here's what one article had to say:

    "Looking back to the previous generation, Facebook utilized Mellanox multi-host adapters along with a custom version of the original Xeon D to lower networking costs and improve performance. We suspect that Intel is keenly aware of this and that is a part of the reason for that de-feature move."
  • Elstar - Wednesday, February 7, 2018 - link

    That explains it. And after a few quick searches, I found Open Compute Project PDFs that explain the setup where integrated networking would be pointless. Thanks!
  • Lakados - Wednesday, February 7, 2018 - link

    Always read the fine print:
    Benchmark results were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown". Implementation of these updates may make these results inapplicable to your device or system.

    While I can see uses for these, until I see how they run with the patches in place this announcement is garbage.
  • pavag - Wednesday, February 7, 2018 - link

    So, you pay $2400 for Meltdown and Spectre?
  • Hurr Durr - Thursday, February 8, 2018 - link

    You've been paying for it for 20 years now without a single peep. You'll buy your Mossad processor and you will like it, goy.
  • prisonerX - Friday, February 9, 2018 - link

    It's strange, I had to change to my AMD system to type "Palestinian genocide/Apartheid" it wouldn't work on my i5 box.
  • Hurr Durr - Saturday, February 10, 2018 - link

    My i5 box always tries to inject something about toxic masculinity and opressive whiteness into every text I type in Word!
  • none12345 - Thursday, February 8, 2018 - link

    Showcasing benchmark results without applying critical patches seems wrong on every level.
  • prisonerX - Friday, February 9, 2018 - link

    Just subtract 30% and you've got it.
