Sharing Cache and Memory Resources

In a virtualized environment, the hosted VMs share both the CPU caches and the overall DRAM bandwidth. One cache-hungry application can quickly hog most of the shared L3 cache, and a bandwidth-intensive one can do the same with the shared memory bandwidth. Such VMs create the "noisy neighbor" problem. That is bad news for anyone consolidating a lot of VMs on a Xeon server, but it is a complete showstopper for telco and other scenarios where service providers want to guarantee "Quality of Service" (QoS) and thus predictable latency. For Intel this is an important scenario to address, as the telco market is one of the few markets where the Xeons still have some room to grow. Many telco applications still run on proprietary boxes, which makes virtualization a tantalizing option if Intel can deliver the necessary latency.

Haswell already had some features to monitor cache usage, which in turn allowed you to identify the noisy neighbors. However, the "Resource Director Technology" (RDT) of Broadwell can do a lot more.

RDT can not only monitor L3 cache usage and memory bandwidth, it can also allocate L3 cache space on a per-thread, per-process, or per-VM basis. Threads are assigned a Resource Monitoring ID (RMID). Eight of these RMIDs are available per core/cache slice. Sixteen different classes of service can be assigned to an RMID: higher-priority threads or applications can get a higher class, and thus a larger portion of the L3 cache.
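To make the allocation idea concrete, here is a minimal sketch of how an L3 cache could be carved up between classes of service. It assumes that each class is described by a bitmask of cache ways and that the set bits must be contiguous, which is how Intel's Cache Allocation Technology is generally described. The 20-way cache size, the class names, and the helper functions `contiguous_mask` and `partition_cache` are illustrative assumptions, not anything taken from Intel's tools.

```python
def contiguous_mask(first_way: int, n_ways: int) -> int:
    """Return a bitmask with n_ways consecutive bits set, starting at first_way."""
    if n_ways < 1:
        raise ValueError("a class of service needs at least one cache way")
    return ((1 << n_ways) - 1) << first_way


def partition_cache(total_ways: int, shares: dict) -> dict:
    """Give each named class a disjoint, contiguous slice of the cache ways."""
    if sum(shares.values()) > total_ways:
        raise ValueError("requested shares exceed the number of cache ways")
    masks = {}
    next_way = 0
    for name, n_ways in shares.items():
        masks[name] = contiguous_mask(next_way, n_ways)
        next_way += n_ways
    return masks


# Hypothetical split of an assumed 20-way L3 cache between three classes.
masks = partition_cache(20, {"latency_critical": 12, "best_effort": 6, "noisy": 2})
for name, mask in masks.items():
    print(f"{name:17s} 0x{mask:05x}")
```

In this sketch the classes do not overlap, so the "noisy" class can never evict lines belonging to the latency-critical one. Real deployments may prefer overlapping masks so that spare capacity is shared rather than left idle.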

Intel has already demonstrated an application that uses the new RDT model-specific registers (MSRs) to read out memory bandwidth and L3 cache consumption at different levels.
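As a rough illustration of what such a monitoring tool has to do, the sketch below turns two raw counter samples into a bandwidth figure. It assumes the counter layout documented for Intel's monitoring counter MSR (bit 63 flags an error, bit 62 flags unavailable data, the remaining bits hold the count) and an upscaling factor, in bytes per count, that the CPU reports via CPUID. The sample values, the 65536-byte factor, and the function names are made up for the example, and counter wraparound is not handled.

```python
ERROR_BIT = 1 << 63        # assumed: set when the RMID/event selection is invalid
UNAVAILABLE_BIT = 1 << 62  # assumed: set when no data is available yet
DATA_MASK = (1 << 62) - 1  # remaining low bits carry the counter value


def decode_counter(raw: int):
    """Return the counter value, or None if the sample is flagged as unusable."""
    if raw & (ERROR_BIT | UNAVAILABLE_BIT):
        return None
    return raw & DATA_MASK


def bandwidth_mb_per_s(raw_before: int, raw_after: int,
                       bytes_per_count: int, seconds: float):
    """Convert two raw samples taken `seconds` apart into MB/s (1 MB = 1e6 bytes)."""
    before = decode_counter(raw_before)
    after = decode_counter(raw_after)
    if before is None or after is None:
        return None
    return (after - before) * bytes_per_count / seconds / 1e6


# Made-up samples: 2000 counts in one second with an assumed 65536-byte factor.
rate = bandwidth_mb_per_s(1_000, 3_000, 65_536, 1.0)
print(f"{rate:.3f} MB/s")
```

A real tool would read the raw samples per RMID from the hardware (for example through the operating system's MSR or resource-control interface) and repeat the calculation for every monitored thread or VM.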


112 Comments

  • PowerOfFacts - Thursday, June 23, 2016 - link

    And now Oracle marketing speaks. Their HammerDB results are bogus. Oracle continues to cite socket results when the majority of the world has moved on to per core results. They cite the results from a 32 core HammerDB then compare it to a 1 chip (1/2 of 1 socket) POWER8 because Phil has a hard-on for how "HE" believes IBM has packaged the processor and similarly chooses an Intel configuration to ensure "THEY" get the result they want. Phil & Oracle appear to always speak with forked tongue.
  • patrickjp93 - Sunday, April 3, 2016 - link

    "Best" only at specific scale-up workloads. There's a reason Sparc is not particularly popular for clusters and supercomputing (and it's NOT software compatibility). It sucks at a lot of workloads when compared to x86. As for the SAP benchmarks, that's to be expected since x86 doesn't yet support transactional memories. That changes with Skylake Purley though.
  • Brutalizer - Wednesday, April 6, 2016 - link

    In these 25ish benchmarks, the SPARC M7 is 2-3x faster on all kinds of workloads, not just some specific scale up workloads. The reason SPARC M7 is not popular for clusters (supercomputers are clusters) is not because of low raw compute performance, it is because of cost and wattage. The M7 is much more expensive than x86, and draws much more power. I guess somewhere 250 watt or so? M7 are in big enterprise servers, some have water cooling, etc. Whereas clusters have many cheap nodes, with no water cooling.

    Clusters can have x86 because the highest wattage x86 cpu, uses 140 watt or so. Not more. So it would be feasible to use 140 watt cpus in clusters. But not 250 watt cpus, they draw too much power.

    For instance, the IBM Blue Gene supercomputer that held spot no. 5 in the Top500 for a couple of years used 850 MHz PowerPC CPUs, when everyone else used 2.4 GHz x86 or so. The 850 MHz CPU doesn't use a lot of power, so that is the reason it was used in Blue Gene, not because it was faster (it wasn't). A large supercomputer can draw 10 megawatts, and that costs very much. Power is a huge issue in supercomputers. SPARC M7 draws too much power to be useful in a large cluster, and costs too much.

    If we talk about raw compute power for SPARC M7, it reaches 1200 SPECint2006, whereas E5-2699v3 reaches 715 SPECint2006. Not really 2-3x faster, but still much faster.
    In SPECfp2006, the M7 reaches 832, whereas the E5-2699v3 reach 474.
    https://blogs.oracle.com/BestPerf/entry/201510_spe...

    So, as you can see yourself, the SPARC M7 is faster on scale-up business workloads (it was designed for that type of workloads) and also faster on raw compute power. And faster in everything in between. Just look at the wide diversity among these 25 ish benchmarks.
  • Brutalizer - Wednesday, April 6, 2016 - link

    BTW, do you really expect a 150 watt x86 cpu, to outperform a 250 watt SPARC M7 cpu? Have you seen benchmarks where they compare 250 watt graphics card vs a 150 watt graphics card? Which GPU do you think is faster? Do you expect a 150 watt GPU to outperform a 250 watt gpu?

    The SPARC M7 has 50% more cores, twice the cpu cache, twice the GHz, twice the Wattage, twice the RAM bandwidth, twice the nr of transistors (10 billions) - and you are surprised it is 2-3x faster than x86?

    BTW, the SPARC M7 has stronger cores than x86. If you look at all these benchmarks, typically one M7 with 32 cores, is faster than two E5-2699v3 with 2x18 = 36 cores. This must mean that one SPARC M7 core, packs more punch than a E5-2699v3 core, because 32 SPARC cores are faster than 36 x86 cores in all benchmarks.
  • adamod - Friday, June 3, 2016 - link

    I know this is an old post, but I am confused (this isn't something I have learned much about yet) and I am hoping you can help. If the SPARC has 2 to 3x the performance and is 250 W compared to 140 W, then wouldn't that make it MORE efficient? And if you need two 2699s to compare to a SPARC M7, then wouldn't that be 280 W, more than the 250 W of the SPARC? I realize there are other factors here, but this doesn't make sense to me. Also, yes, there are graphics cards that are lower wattage and perform better. I am an AMD fan, but Nvidia has had some faster cards with better performance in the past. I have an R9 280X, a mid-grade card rated at, I believe, 225 W; kinda crazy when it can get beaten by 17 W Nvidia cards.
  • tqth - Sunday, April 3, 2016 - link

    The SPARC and POWER servers are for people with unlimited pockets, for whom compactness and reliability are worth the premium. If you have to ask how much it costs, you probably can't afford it.
    Xeons are commodity hardware where you can purchase the best bang for your buck.
    They are not aiming at the same market. Most software wouldn't even work on both systems.
    Besides, benchmarks are worthless - unless the performance of the specific software is tested. And that's rare.
  • PowerOfFacts - Thursday, June 23, 2016 - link

    Depends on which Xeon processors you are referring to. The latest Broadwell EP & EX chips can cost over $7K each. That is well on par with, if not exceeding, POWER8 chips and definitely more than OpenPOWER chips. Times are changing. Intel has milked their clients for a long time, feeding them the marketing line of open, commodity & low cost. They are no longer open, buying up the ecosystem and integrating it into the silicon; what exactly does commodity mean anyway; and as far as low cost goes ... as I just said, pretty salty.
  • yuhong - Thursday, March 31, 2016 - link

    64GB LR-DIMMs will probably not come out at reasonable prices until 8Gbit DDR4 is more mainstream.
  • iwod - Thursday, March 31, 2016 - link

    I thought Samsung announced a 128GB DIMM with some type of 3D / TSV RAM.
  • Casper42 - Thursday, March 31, 2016 - link

    Not shipping just yet though.
    Should be sometime this year though.
