Living On The Edge: Intel Launches Xeon D-2100 Series SoCs
by Ian Cutress on February 7, 2018 9:00 AM EST
When platforms like Xeon D come into existence, targeting markets that aren’t consumer focused, it can be difficult to determine which consumer or enterprise features make it into the product. For example, Intel’s Sub-NUMA Clustering (SNC, an upgraded version of Cluster-On-Die) is used in the Xeon Scalable enterprise processors but not on the consumer focused Core-X processors, despite both being built on the same silicon underneath.
SNC is a technology that follows from the processor design: within an 18-core processor design, there is actually an x/y arrangement of nodes, in this case we think a 5x5 arrangement. A node can be a core, or it can hold memory controllers, a PCIe root complex, other IO, and so on. When data needs to be transferred from one node to another, it goes through the mesh topology in what should be the quickest way possible, depending on other node-to-node traffic. Some of the nodes are duplicated, for example the PCIe x16 root complex nodes, or the memory controllers: where there are four memory controllers, they are split into pairs, each pair in a separate node, and the two nodes sit at opposite ends of the silicon design. For example, here is the Skylake-SP 18-core layout:
When a system accesses main memory, that memory is normally treated as a unified space, as if the latency to reach any of the data were the same. However, due to the physical design of the chip, if the data is held in the memory closest to that core in the mesh grid, it is quicker to access (on average). What SNC does is divide the silicon at a firmware level into two ‘clusters’, with each cluster having a preference for working with the cores, nodes, and memory controllers within its own cluster. There is nothing stopping a core going outside its own cluster, but to offer the best latency (sometimes at the expense of peak bandwidth), it is best for each core/node to be limited in this way. Xeon D customers can typically enable SNC in the BIOS of their system, or arrange with their OEM to have it enabled by default.
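To put a rough number on why the nearest memory controller is quicker on average, here is a toy model in Python. Everything in it is an assumption for illustration: the 5x5 grid is only our guess at the layout, the two memory-controller positions are hypothetical, every other node is treated as a core, and the cost of a transfer is taken to be the Manhattan hop count between nodes. It is not a description of Intel's actual routing.

```python
from itertools import product
from statistics import mean

# Hypothetical 5x5 mesh; a node is addressed as (column, row).
COLS, ROWS = 5, 5

# Assumed placement: one memory-controller node at each side of the die.
MEM_LEFT = (0, 2)
MEM_RIGHT = (4, 2)

# For illustration only, treat every remaining node as a core.
CORES = [n for n in product(range(COLS), range(ROWS))
         if n not in (MEM_LEFT, MEM_RIGHT)]


def hops(a, b):
    """Manhattan distance between two mesh nodes (assumed transfer cost)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def avg_hops_unified(cores):
    """Unified memory: an access is equally likely to hit either controller."""
    return mean((hops(c, MEM_LEFT) + hops(c, MEM_RIGHT)) / 2 for c in cores)


def avg_hops_clustered(cores):
    """SNC-like preference: each core uses whichever controller is nearer."""
    return mean(min(hops(c, MEM_LEFT), hops(c, MEM_RIGHT)) for c in cores)


if __name__ == "__main__":
    print(f"unified:   {avg_hops_unified(CORES):.2f} hops on average")
    print(f"clustered: {avg_hops_clustered(CORES):.2f} hops on average")
```

In this toy layout the clustered average comes out at about 2.2 hops against about 3.3 for the unified case. The model ignores traffic, bandwidth and the cost of going outside the cluster, so it only illustrates the direction of the effect, not its size.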
The reason why SNC is not available in consumer platforms? The benefits/drawbacks of SNC have very little effect on consumer workloads. In most cases users are not striving to minimise their 99th percentile latency figures, while server environments do need to. Also, to get the best out of SNC, software typically has to be written for it, similar to a multi-socket environment.
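As a sketch of what ‘written for it’ can look like in practice: with SNC enabled, each cluster is typically presented to the operating system as its own NUMA node, so the same tools used for multi-socket systems apply. The Python below assumes a Linux system that exposes its topology under `/sys/devices/system/node`; the helper names (`parse_cpulist`, `numa_nodes`, `pin_to_node`) are our own, not part of any Intel or kernel API.

```python
import os
from pathlib import Path

NODE_ROOT = Path("/sys/devices/system/node")  # Linux sysfs NUMA topology


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8-11' into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def numa_nodes(root=NODE_ROOT):
    """Map NUMA node id -> CPU ids; returns {} if the sysfs tree is absent."""
    nodes = {}
    if not root.is_dir():
        return nodes
    for entry in root.glob("node[0-9]*"):
        cpulist = entry / "cpulist"
        if cpulist.is_file():
            nodes[int(entry.name[4:])] = parse_cpulist(cpulist.read_text())
    return nodes


def pin_to_node(node_id, nodes):
    """Restrict the calling process to the CPUs of one NUMA node (Linux only)."""
    cpus = nodes.get(node_id)
    if not cpus:
        raise ValueError(f"no CPUs known for node {node_id}")
    os.sched_setaffinity(0, cpus)
    return cpus


if __name__ == "__main__":
    for node_id, cpus in sorted(numa_nodes().items()):
        print(f"node {node_id}: CPUs {cpus}")
```

Pinning threads this way only controls where code runs. Keeping the data in the same cluster additionally relies on the kernel's memory placement policy or an explicit binding through a tool such as `numactl`.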
The other feature we wanted to see make the jump was Intel’s SpeedShift. This technology allows the processor to respond more quickly to turbo mode requests, either while in its high-power state or from idle. In the standard arrangement, when a high-performance power state is needed, the software sends instructions which the operating system interprets, the operating system then checks with the firmware which power state it can ask for, and only then requests that power state from the processor. SpeedShift hands control back to the processor, allowing it to interpret the frequency and density of the instructions coming into the core and implement a turbo frequency much more quickly.
In previous presentations, Intel has stated that this technology drops the time that the processor moves out of idle into peak turbo from 100 milliseconds down to around 25-30 milliseconds. We confirmed that for suitable OS and hypervisor technologies, SpeedShift will also be enabled on the Xeon D-2100 series platform.
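To get a feel for what a shorter ramp means for a short burst of work, here is a back-of-the-envelope model in Python. It assumes the clock rises linearly from an idle frequency to peak turbo over the ramp time, which is a simplification of our own, and the 1.0 GHz and 3.0 GHz figures and the 150 million cycle burst are hypothetical values, not Xeon D-2100 specifications.

```python
import math


def time_to_finish(cycles, f_idle_hz, f_turbo_hz, ramp_s):
    """Seconds to retire `cycles` if the clock ramps linearly from idle to turbo."""
    slope = (f_turbo_hz - f_idle_hz) / ramp_s  # Hz gained per second
    if slope == 0:
        return cycles / f_turbo_hz  # no ramp at all: constant clock
    # Cycles retired by the time the ramp completes (area under the ramp).
    cycles_in_ramp = f_idle_hz * ramp_s + 0.5 * slope * ramp_s ** 2
    if cycles >= cycles_in_ramp:
        # The ramp finishes first; the remainder runs at full turbo.
        return ramp_s + (cycles - cycles_in_ramp) / f_turbo_hz
    # Work ends mid-ramp: solve 0.5*slope*t^2 + f_idle*t - cycles = 0 for t > 0.
    disc = f_idle_hz ** 2 + 2.0 * slope * cycles
    return (-f_idle_hz + math.sqrt(disc)) / slope


if __name__ == "__main__":
    burst = 150e6  # hypothetical burst of work, in cycles
    for ramp in (0.100, 0.030):
        t = time_to_finish(burst, 1.0e9, 3.0e9, ramp)
        print(f"ramp {ramp * 1000:.0f} ms -> burst done in {t * 1000:.1f} ms")
```

Under these assumptions the burst finishes noticeably sooner with the 30 ms ramp than with the 100 ms one, and the gap matters most for work that is short enough to end before the slower ramp has completed.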
You can read our analysis of Speed Shift on Skylake here:
Some of the key ‘edge’ markets that Intel is targeting with the Xeon D-2100 series require virtualization. In our briefing, Intel did not spend much time discussing this part of the product, but did confirm that the latest implementation of VT-x and hardware virtualization technologies is in play. We were told that due to the upgrades over the previous generation of Xeon D, the new platform ‘enables greater VM density for VNF functions, such as Virtual Evolved Packet Core (vEPC), Virtual Content Delivery Network (vCDN), Virtual GiLAN (vGiLAN), Virtualized Radio Access Network (vRAN), and Virtual Broadband Base Unit (vBBU)’.
We were able to confirm that, similar to the enterprise platforms, each core can adjust its frequency independently of the other cores, so in multi-user environments if one user is blasting AVX-512 instructions, the frequency of the other cores can still be maintained. The same likely applies to L3 cache management, so that ‘noisy neighbors’ cannot crowd L3 use. This situation is less of a problem now that the L3 cache is a victim cache, but for some customers it can still be an issue.
Intel stated that it has over a dozen partners, both OEMs and large-scale system integrators, already working with the new D-2100 series ready for product roll-out over 2018. Certain early end-point customers (think the large-scale cloud providers and CDNs) have already had silicon for some time, while it will be rolled out to everyone else in due course through Intel’s partners.
The last generation, the Xeon D-1500 series, was tentatively given the code name ‘Broadwell-DE’. By that token, this generation of Xeon D-2100 is based on Skylake, so should be ‘Skylake-DE’. However, references to Skylake-D as an alternative have shown up online, perhaps to keep these code names down to one letter. This isn’t to be confused with Skylake for consumer desktop use, which is usually called Skylake-S. Nice and simple.
- The Intel Xeon D-1500 Review: Performance Per Watt Server SoC Champion
- Evaluating Xeon D-1500 on the Supermicro SYS-5028D-TN4T
- Intel Announces Xeon D-1500 Network Series SoCs with QuickAssist
- ASRock Rack Launches Xeon D Motherboards
- New GIGABYTE Server Motherboards Show Xeon D Round 2
- Skylake-D Creeps Out on Intel’s Price List