System Performance: Multi-Tasking

One of the key drivers of advancements in computing systems is multi-tasking. On mobile devices, multi-tasking is fairly lightweight: a background email check while the user is playing a mobile game is a typical case. To optimize the user experience in such scenarios, mobile SoC manufacturers started integrating heterogeneous CPU cores, pairing high-performance cores for demanding workloads with frugal cores that trade performance for lower power consumption and smaller die area. This trend is now slowly making its way into the desktop PC space.

Multi-tasking in typical PC usage is much more demanding than on phones and tablets. Desktop OSes allow users to launch and use a large number of demanding programs simultaneously. Responsiveness is dictated largely by how the OS scheduler moves different tasks to the background. Intel's Alder Lake processors work closely with the Windows 11 thread scheduler to optimize performance in these cases. Keeping these aspects in mind, the evaluation of multi-tasking performance is an interesting subject to tackle.

We have augmented our systems benchmarking suite to quantitatively analyze the multi-tasking performance of various platforms. The evaluation involves triggering a VLC transcoding task that converts a 24fps AVC video with 1716 frames at 3840x1714 (the 4K version of the Blender Project's 'Tears of Steel') into a 1080p HEVC version, in a loop. VLC internally uses the x265 encoder, and the settings are configured to saturate CPU usage across all cores. The transcoding rate is monitored continuously. One complete transcoding pass is allowed to finish before the first multi-tasking workload, the PCMark 10 Extended benchmark suite, is started. The graphs below present a comparative view of the PCMark 10 scores recorded with the transcoding load active. Alongside them are the scores from a normal run without any concurrent load, and a graph presenting the resulting loss in performance.
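The article does not spell out how the "loss in performance" figures are derived. A minimal sketch, assuming the loss is simply the relative drop of the loaded score from the unloaded score (the function name `perf_loss_pct` and the sample scores are hypothetical), looks like this:

```python
def perf_loss_pct(unloaded_score: float, loaded_score: float) -> float:
    """Relative drop (in percent) of a benchmark score under concurrent load.

    Assumption: "performance loss" is taken here to be
    100 * (1 - loaded / unloaded); the article does not state its formula.
    A negative result means the loaded run scored higher than the unloaded one.
    """
    if unloaded_score <= 0:
        raise ValueError("unloaded_score must be positive")
    return 100.0 * (1.0 - loaded_score / unloaded_score)


if __name__ == "__main__":
    # Hypothetical scores for illustration only; not taken from the article's graphs.
    print(f"{perf_loss_pct(5000, 4000):.1f}% loss")
```

Expressing the loss as a percentage of the unloaded score keeps systems with very different absolute scores comparable on one axis.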

UL PCMark 10 Load Testing - Digital Content Creation Scores

UL PCMark 10 Load Testing - Productivity Scores

UL PCMark 10 Load Testing - Essentials Scores

UL PCMark 10 Load Testing - Gaming Scores

UL PCMark 10 Load Testing - Overall Scores

Except for the gaming workload, which is pretty much unaffected by the CPU load, the sheer number of cores in the Renoir-based systems helps them salvage good scores in the presence of concurrent loading.

Following the completion of the PCMark 10 benchmark, a short delay is introduced before running Principled Technologies' WebXPRT4 on MS Edge. Similar to the PCMark 10 results presentation, the graph below shows the scores recorded with the transcoding load active. Available for comparison are the scores obtained with the CPU dedicated to the benchmark, and a measure of the performance loss.

Principled Technologies WebXPRT4 Load Testing Scores (MS Edge)

The Tiger Lake-U systems are the top performers here, both in terms of raw scores and in minimizing the performance loss.

The final workload tested as part of the multitasking evaluation routine is CINEBENCH R23.

3D Rendering - CINEBENCH R23 Load Testing - Single Thread Score

3D Rendering - CINEBENCH R23 Load Testing - Multiple Thread Score

The large number of cores in the Renoir systems allows them to shuffle tasks around in such a way that the performance loss for rendering workloads is minimized.

After the completion of all the workloads, we let the transcoding routine run to completion. The transcoding rates (in frames per second) monitored throughout the above evaluation routine for select systems are tabulated below.

VLC Transcoding Rate (Multi-Tasking Test) - Frames per Second

System | Enc. Pass #1 | PCMark 10 | WebXPRT4 | Cinebench | Enc. Pass #2
--- | --- | --- | --- | --- | ---
ASRock 4X4 BOX-4800U (Ryzen 7 4800U) | 1.6366 | 1.5167 | 1.4080 | 1.5505 | 1.6073
Intel NUC11TNBi5 (Akasa Newton TN) (Core i5-1135G7) | 0.8662 | 0.7773 | 0.7275 | 0.7773 | 0.8722
ASRock NUC BOX-1165G7 (Core i7-1165G7) | 0.8409 | 0.8004 | 0.7230 | 0.7534 | 0.8854

As expected, the transcoding rates in all systems drop with simultaneous loading. The key numbers to note in the above table are the first and second encoding passes for the Newton build. With the second pass actually showing a slight advantage, it is clear that there is no throttling at play for this particular workload and duration.
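To put the table into relative terms, the sketch below (a hypothetical helper script, not part of the benchmark suite) expresses each phase's rate as a percentage of the first encoding pass and estimates how long one pass takes. It assumes each column is the average rate over that phase and that a pass covers all 1716 frames of the source clip.

```python
# Transcoding rates (frames per second) copied from the table above.
PHASES = ["Enc. Pass #1", "PCMark 10", "WebXPRT4", "Cinebench", "Enc. Pass #2"]
RATES = {
    "ASRock 4X4 BOX-4800U": [1.6366, 1.5167, 1.4080, 1.5505, 1.6073],
    "Intel NUC11TNBi5 (Akasa Newton TN)": [0.8662, 0.7773, 0.7275, 0.7773, 0.8722],
    "ASRock NUC BOX-1165G7": [0.8409, 0.8004, 0.7230, 0.7534, 0.8854],
}
TOTAL_FRAMES = 1716  # frames in the 4K 'Tears of Steel' source clip


def relative_to_first_pass(rates):
    """Each phase's rate as a percentage of the first (unloaded) encoding pass."""
    base = rates[0]
    return [100.0 * r / base for r in rates]


def pass_duration_s(rate_fps):
    """Approximate seconds per full pass, assuming a constant average rate."""
    return TOTAL_FRAMES / rate_fps


for name, rates in RATES.items():
    rel = relative_to_first_pass(rates)
    summary = ", ".join(f"{phase} {pct:.1f}%" for phase, pct in zip(PHASES, rel))
    minutes = pass_duration_s(rates[0]) / 60
    print(f"{name}: {summary} (pass #1 takes ~{minutes:.1f} min)")
```

On the Newton build, the second pass works out to roughly 0.7% faster than the first, consistent with the absence of throttling noted above.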

18 Comments

  • meacupla - Sunday, July 24, 2022

    At that point, you may just want to use a white noise maker
  • abufrejoval - Sunday, July 24, 2022

    Thanks, that's what I've been hearing, too!

    And in a way that's what I've been thinking without hinting it explicitly to Ryan1981: Getting yourself tuned to zero noise is both very expensive and counter-productive.

    Humanity has operated on communal and external noises for hundreds of thousands of years: a large part of our brain is designed to do nothing but discriminate between sounds that indicate danger and those that don't. A total absence of sound only has your brain increase the sensitivity of your receptors to the point where minute sounds become a bother.

    Instead of making electronics completely silent, we should have them emit a soft snore or other comforting noises akin to humans being human.

    There is an auditorium in the midst of Gibraltar's rock that offers a level of silence no recording studio can match. Anyone left alone in there is bound to develop tinnitus, as the brain keeps increasing the sensitivity of your in-ear "DSPs" until they reach the "social noise" baseline set by evolution.
  • abufrejoval - Saturday, July 23, 2022

    Your previous Akasa tests had me hoping that I’d be able to silence any NUC, if a passive Mini-ITX based solution, like the one I’d been using for Gemini Lake Atoms, wasn’t going to be available.

    I had sampled a NUC or Brix once before and was quite shocked at the nervous noise it generated: the fan gave you an audio variant of a CPU graph that you couldn’t just click away. And at top load, it was an unacceptable howler.

    I wanted something with a bit more punch than an Atom, but a similar idle power and obviously a notebook SoC should be able to do that. But the only way to get that stationary and at a reasonable price (with a full complement of RAM) was to get a NUC.

    When I came across a NUC8i7BEK with the “double sized” Iris 655 48EU iGPU for only €300, my resistance to the form factor melted away and I gave it a try, even if the primary use case (a Linux based HCI server) had zero use for a GPU. After all, you never know if it might be recycled as a desktop later, and I was just curious to see how this “Apple spec” SoC would perform.

    It turned out that the key to making it unnoticeable was to ensure that the fan would never rev beyond 3200rpm, and for that I had to ensure that PL2 would never last longer than 10 seconds nor exceed 50 Watts, while a PL1 of 15 Watts ensured low fan revs even for a power virus.
    I had just ordered another, when I saw a hexa-core i7-10700U based NUC (with a very ordinary 24EU iGPU) going for just €50 extra. So I cancelled and got that one instead. It turned out much more difficult to tame, because Intel was desperate to wring performance leadership out of 14nm in a tiny NUC and only Watts can get you there. I managed again, playing with the PL1/PL2/TAU to get a system rather good for those sprints where the Atoms were trying my patience, yet with a low-enough power and noise footprint to operate 24x7 as a server.

    Half a year later in February 2021 I landed a fresh Tiger Lake NUC11PAHi7, that’s played hard to get ever since. But mine is a Panther Canyon variant, evidently consumer optimized, with a completely different layout of ports for which Akasa doesn’t build a chassis. I don’t know if Intel already made these differentiations in earlier generations, but it’s rather annoying when only the number of models increases, not their availability.

    Again, that Tiger could also be tamed to unnoticeability via the excellent control Intel’s NUCs offer in the BIOS. Of course, even better would be a set of CLI tools which allow you to adjust these things from Linux…

    In terms of snappiness, none of them needs to hide, because at least for a couple of seconds they will all clock to 4.5 GHz or more and match any desktop. For brutal workloads I have other machines with 16 or 18 cores and 140-150 Watts of TDP made tolerable via lots of giant Noctua fans and coolers.

    While there is no noticeable difference in scalar performance between the NUC8 and NUC10, the two extra cores on the NUC10 i7-10700U will obviously deliver a bit of extra punch until TAU runs out. But the Tiger Lake annihilates their value with better IPC: with its four cores it matches pretty exactly the six cores of its predecessor on any parallelized workload while the single core performance is on par with a Zen 3 at the same clocks.

    The “double sized” Iris 655 with its 128MB of eDRAM on the NUC8 turned out to be a paper tiger, effectively adding only 50% of extra power vs. a normal 24EU UHD iGPU at the expense of quite a bit of silicon real-estate and production complexity. If Intel were to sell “Apple spec” chips only, I doubt they’d be nearly as profitable. The list price of an i7-8565U is $409 while the list price of an i7-8559U is only $22 higher. They are close to identical on the CPU side, but the GT3e extra die area and the 128MB eDRAM chip must have cost a pretty penny! I still own a notebook with an i5-6267U, a dual-core Skylake variant of GT3e where the CPU cores were probably the smallest piece of the chip’s silicon pie.

    Really astounding was how badly it got beaten by the 96EU Tiger Lake Xe iGPU, which doesn’t have eDRAM for extra bandwidth: that one scaled rather nicely to 4x 24EU performance, beating my Zen 3 based 5800U APUs in most benchmarks, just as you describe.

    I don’t really know where that performance is coming from, because DRAM bandwidth is very similar across the board and only around 40GB/s. All my NUCs run with 64GB, and while the timings may have gone from DDR4-2400 (NUC8) to DDR4-3200 (NUC11), that’s just adding wait states on these low power devices.

    I love running Google Maps in 3D globe view on Chrome derived browsers at 4k, because it really shows what this low power hardware is capable of with perhaps the most efficient 3D pipeline on the planet: it puts Microsoft’s best flight simulator to shame on an RTX 2080ti!

    It proves the main issue is software, not hardware. But existing real-world games are no fun on these boxes, even the Tiger Lake needs another power of 10 to become reasonably attractive at 4k.

    Another aftermarket NUC solution would evidently be one that adds beefy active cooling, say a Noctua NH-L9i or even a Noctua NH-L9x65. Obviously these chips could sustain 65 Watts with proper cooling and then deliver quite reasonable desktop performance in only a slightly bigger form factor.

    BTW: for my use as µ-servers I've added TB3 based 10Gbase-T NICs so the NVMe based SSDs contributing bricks to the Gluster file system don't get slowed down to unacceptable levels.

    I'd have preferred to make do with TB3 based networking via direct connect cables, but fell afoul of the fact that Thunderbolt ports don't have MACs and will randomly generate them on every boot or plug event. It's the software.... again!
  • xane - Sunday, July 24, 2022

    Interesting to see continued development, but for me nothing beats Cirrus7 cases from Germany. I do understand it's subjective, though.
  • Hixbot - Tuesday, July 26, 2022

    Ganesh, I've been politely asking you to add noise testing to your mini-PC tests for the last couple of years. Noise is a very important characteristic for home theater PCs.
    Here we are with a fanless offering with some obvious thermal compromises, but your other reviews don't highlight noise at load and therefore cannot be compared.
  • ganeshts - Tuesday, July 26, 2022

    If there is any noise / electrical coil whine, or anything of that sort, I do make a mention of it in the concluding section (like I did in the Zotac ZBOX CI660 nano).

    Other than that, the ambient noise / noise floor is too high in the environment where these systems are tested for a sound meter to pick up anything at all from them.
  • kepstin - Wednesday, August 24, 2022

    You should really consider retiring/updating that Gimp application startup benchmark… The multithreaded scaling being weird is actually a bug where it's doing extra redundant work that it shouldn't have been, and has been fixed (or at least worked around) in newer versions.
  • storapa - Thursday, September 1, 2022

    Had an old NUC3 with the old version of the Akasa Newton. Worked like a charm for years, until the board died (Google results suggest it was a common problem with the NUC3, not the case).

    But note that the Kensington "lock" doesn't add any security, as you can remove the entire backplate with 4 screws.
