System Performance: Multi-Tasking

One of the key drivers of advancements in computing systems is multi-tasking. On mobile devices, this is quite lightweight - cases such as background email checks while the user is playing a mobile game are quite common. Towards optimizing user experience in those types of scenarios, mobile SoC manufacturers started integrating heterogeneous CPU cores - some with high performance for demanding workloads, while others were frugal in terms of both power consumption / die area and performance. This trend is now slowly making its way into the desktop PC space.

Multi-tasking in typical PC usage is much more demanding compared to phones and tablets. Desktop OSes allow users to launch and utilize a large number of demanding programs simultaneously. Responsiveness is dictated largely by the OS scheduler allowing different tasks to move to the background. Intel's Alder Lake processors work closely with the Windows 11 thread scheduler to optimize performance in these cases. Keeping these aspects in mind, the evaluation of multi-tasking performance is an interesting subject to tackle.

We have augmented our systems benchmarking suite to quantitatively analyze the multi-tasking performance of various platforms. The evaluation involves triggering an ffmpeg transcoding task that converts 1716 3840x1714 frames encoded as a 24fps AVC video (Blender Project's 'Tears of Steel' 4K version) into a 1080p HEVC version in a loop. The transcoding rate is monitored continuously. One full transcoding pass is allowed to finish before starting the first multi-tasking workload - the PCMark 10 Extended bench suite. A comparative view of the PCMark 10 scores for various scenarios is presented in the graphs below. Also available for concurrent viewing are scores in the normal case where the benchmark was processed without any concurrent load, and a graph presenting the loss in performance.
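As a rough illustration of a looped transcode with continuous rate monitoring, the Python sketch below launches ffmpeg with its `-progress` option and collects the `fps=` values it reports. The file names, the `libx265` encoder, the `scale=1920:-2` filter, and the helper names are assumptions made for illustration; this is not the exact command line used in our suite.

```python
import subprocess


def build_command(src, dst):
    """Build an ffmpeg command line (assumed settings, not the suite's own)."""
    return [
        "ffmpeg", "-y", "-nostats",
        "-i", src,
        "-vf", "scale=1920:-2",      # assumed downscale; -2 keeps the height even
        "-c:v", "libx265", "-an",    # assumed software HEVC encode, audio dropped
        "-progress", "pipe:1",       # machine-readable key=value progress on stdout
        dst,
    ]


def parse_fps(line):
    """Return the rate from a progress line such as 'fps=9.60', else None."""
    key, _, value = line.strip().partition("=")
    if key != "fps":
        return None
    try:
        return float(value)
    except ValueError:
        return None


def transcode_once(src, dst):
    """Run one transcoding pass and return the fps samples ffmpeg reported."""
    samples = []
    proc = subprocess.Popen(
        build_command(src, dst),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    for line in proc.stdout:
        fps = parse_fps(line)
        if fps is not None:
            samples.append(fps)
    proc.wait()
    return samples


def transcode_loop(src, dst, keep_running):
    """Repeat passes while keep_running() is true; returns all fps samples."""
    samples = []
    while keep_running():
        samples.extend(transcode_once(src, dst))
    return samples
```

Here `keep_running` is a hypothetical callback that the harness would flip to false once the last foreground workload has finished.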

UL PCMark 10 Load Testing - Digital Content Creation Scores

UL PCMark 10 Load Testing - Productivity Scores

UL PCMark 10 Load Testing - Essentials Scores

UL PCMark 10 Load Testing - Gaming Scores

UL PCMark 10 Load Testing - Overall Scores

Following the completion of the PCMark 10 benchmark, a short delay is introduced prior to the processing of Principled Technologies WebXPRT4 on MS Edge. Similar to the PCMark 10 results presentation, the graph below shows the scores recorded with the transcoding load active. Available for comparison are the scores recorded with the CPU dedicated to the benchmark, and a measure of the performance loss.

Principled Technologies WebXPRT4 Load Testing Scores (MS Edge)

The final workload tested as part of the multitasking evaluation routine is CINEBENCH R23.

3D Rendering - CINEBENCH R23 Load Testing - Single Thread Score

3D Rendering - CINEBENCH R23 Load Testing - Multiple Thread Score

After the completion of all the workloads, we let the transcoding routine run to completion. The monitored transcoding rate throughout the above evaluation routine (in terms of frames per second) is graphed below.
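A log of the monitored rate can be reduced to a per-segment summary (minimum, average, maximum) fairly easily. The sketch below is one way to do it, assuming the samples are stored as `(time_in_seconds, fps)` pairs and that the start and end time of each task segment is known; both the data layout and the function name are hypothetical.

```python
from statistics import mean


def summarize_segments(samples, segments):
    """Summarize fps samples per task segment.

    samples:  list of (time_in_seconds, fps) pairs
    segments: list of (name, start_seconds, end_seconds), end exclusive
    Returns {name: (minimum, average, maximum)} for segments with samples.
    """
    summary = {}
    for name, start, end in segments:
        rates = [fps for t, fps in samples if start <= t < end]
        if rates:
            summary[name] = (min(rates), mean(rates), max(rates))
    return summary


if __name__ == "__main__":
    # Made-up samples and boundaries, purely to show the call shape.
    demo_samples = [(0, 2.0), (1, 4.0), (2, 0.0), (3, 8.0)]
    demo_segments = [("Transcode Start Pass", 0, 2), ("PCMark 10", 2, 4)]
    for name, stats in summarize_segments(demo_samples, demo_segments).items():
        print(name, stats)
```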

ffmpeg Transcoding Rate and Processor Usage

Across all the different workloads, the ASRock Industrial NUC(S) BOX systems show a significant drop in performance compared to similar UCFF systems. This leads one to suspect that Thread Director is simply not able to allocate threads appropriately on these systems. Whether this is related to a BIOS configuration is something for the company to look into.

ASRock NUCS BOX-1360P/D4 ffmpeg Transcoding Rate (Multi-Tasking Test)

Task Segment            Transcoding Rate (FPS)
                        Minimum   Average   Maximum
Transcode Start Pass    2         9.6       43.5
PCMark 10               0         8.37      31.5
WebXPRT 4               2.5       9.18      18
Cinebench R23           0.5       8.4       29.5
Transcode End Pass      2         9.51      30.5

ASRock NUCS BOX-1360P/D4 (In-Band ECC) ffmpeg Transcoding Rate (Multi-Tasking Test)

Task Segment            Transcoding Rate (FPS)
                        Minimum   Average   Maximum
Transcode Start Pass    1.5       9.28      39.5
PCMark 10               0         8.03      27.5
WebXPRT 4               2         8.9       17.5
Cinebench R23           0.5       8.16      27
Transcode End Pass      1.5       9.21      29

On the positive side, the drop in transcoding frame rate for the NUCS BOX configurations is not as heavy as what was seen for other systems.
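To put a number on that drop, the average rates from the two tables above can be compared against the first transcoding pass. The short sketch below does this; using the 'Transcode Start Pass' average as the baseline is our assumption for illustration, and the function name is hypothetical.

```python
# Average transcoding rates (FPS) copied from the two tables above.
AVERAGE_FPS = {
    "NUCS BOX-1360P/D4": {
        "Transcode Start Pass": 9.6,
        "PCMark 10": 8.37,
        "WebXPRT 4": 9.18,
        "Cinebench R23": 8.4,
        "Transcode End Pass": 9.51,
    },
    "NUCS BOX-1360P/D4 (In-Band ECC)": {
        "Transcode Start Pass": 9.28,
        "PCMark 10": 8.03,
        "WebXPRT 4": 8.9,
        "Cinebench R23": 8.16,
        "Transcode End Pass": 9.21,
    },
}


def drop_vs_baseline(averages, baseline="Transcode Start Pass"):
    """Percentage drop of each segment's average rate relative to the baseline."""
    base = averages[baseline]
    return {
        segment: (base - fps) / base * 100.0
        for segment, fps in averages.items()
        if segment != baseline
    }


if __name__ == "__main__":
    for system, averages in AVERAGE_FPS.items():
        print(system)
        for segment, drop in drop_vs_baseline(averages).items():
            print(f"  {segment}: {drop:.1f}% below the first pass")
```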

30 Comments

  • drajitshnew - Sunday, January 29, 2023 - link

    The in band ECC is an absolutely brilliant idea for systems with 64 GB or more. It is unfortunate that windows does not support it.
  • Samus - Sunday, January 29, 2023 - link

    My understanding is this doesn't need support at the software level. This is still "hardware ECC" and OS-independent.
  • Samus - Sunday, January 29, 2023 - link

    Oh, I see what you are saying. About how Windows will handle an error. In AT's memtest run the test triggered a stop interrupt presumably as it didn't know how to handle the error. I see what you are getting at with Windows.
  • bernstein - Monday, January 30, 2023 - link

    it's more likely, that chrome mandates ecc support, while with windows intel pushes ecc as $$$ feature
  • sjkpublic@gmail.com - Monday, February 13, 2023 - link

    This competes with laptops. Please expand on why ECC is coming up?
  • mode_13h - Tuesday, February 14, 2023 - link

    > Please expand on why ECC is coming up?

    This is sold as an industrial mini-PC. For something like that, reliability is key. Memory errors are one potential source of reliability problems, and ECC is an effective measure to compensate (short-term) and flag for replacement (long-term) any defective memory modules or boards.

    The lore behind ECC is that it protects against cosmic rays, but I've only personally seen ECC errors that seem tied to flaky or failing hardware. It's worthwhile even for that purpose, alone.
  • TLindgren - Sunday, January 29, 2023 - link

    It needs to be noted that SECDED over 512 bits is FAR less powerful in handling errors than SECDED over 64 bits like regular ECC (or SECDED over 32 bits using DDR5 ECC sticks). They could have instead emulated the SECDED over each 64-bit chunk, but then the extra reserved memory would have needed to be 8GB instead of 2GB, and the performance penalty likely would have been significantly worse.
    SECDED means it's guaranteed to correct one incorrect bit (SEC) and detect two incorrect bits (DED), no warranties for what happens with more incorrect bits but there's a decent statistical chance it'll detect them (but no chance it'll fix them).
    Obviously getting two or even three+ faulty bits in the same "group" is far more likely over 512-bit compared to 64-bit, in fact it's my understanding that it'll likely happen most of the time given how memory sticks are constructed!
    It's still useful because it'll detect a certain percentage of the multi-bit errors so you will often? get told that you have faulty memory (except this doesn't seem to work) before things crash, which means you know you need to fix the hardware, but the "correct bits" part is unlikely to save you because at least some of the time it'll get multiple wrong bits in the burst. I suspect they would have been better off with just giving up on correcting and aiming for "detect as many bit errors as we can" (probably 3-4 guaranteed bits detected with the 16 bits of extra data per 512 bits they chose).
    It's definitely better than no ECC *if* the software support gets improved a bit, but is in no way comparable to "real" ECC. OTOH, it's not priced as that either but it needs to be pointed out because some people will sell it as if it is.
  • ganeshts - Monday, January 30, 2023 - link

    Taken standalone, your arguments are completely sound.

    However, in the bigger picture, you should note that newer memory technologies include link ECC to protect the high-speed communication link between the SoC and the external memory, AND, the DRAM DIMMs themselves implement transparent ECC for the stored data.

    Overall, even mission-critical requirements like ASIL / ISO26262 (for automotive safety) can be met with the requisite FIT (failure-in-time) rate using SECDED protection for 512-bit blocks *assuming those other protection mechanisms are also in place*.

    In-band ECC is also used on Tegra for such embedded applications [ https://twitter.com/never_released/status/13559704... ; I can't seem to dig up the original documentation, but remember this was heavily discussed when the Tegra feature was made public ].
  • ganeshts - Monday, January 30, 2023 - link

    (Correction: DRAM DIMMs -> The memory chips)
  • mode_13h - Tuesday, February 14, 2023 - link

    > you should note that newer memory technologies include link ECC to protect the high-speed communication link between the SoC and the external memory

    Are you saying the system you reviewed also supports traditional out-of-band ECC? Why wasn't that mentioned in the review? If not, then your point would seem to be moot.

    I also don't see the point of using in-band ECC atop OOB ECC. Anything that OOB ECC can't correct doesn't seem like it's going to be correctable by in-band ECC.
