CPU Performance: Office and Web

Our previous set of ‘office’ benchmarks has often been a mix of science and synthetics, so this time we wanted to keep our office section purely focused on real-world performance.

Agisoft Photoscan 1.3.3: link

Photoscan carries over from our previous benchmark scripts, but is updated to the 1.3.3 Pro version. As this benchmark has evolved into a variable threaded workload with many segments, features such as Speed Shift or XFR on the latest processors come into play.

The concept of Photoscan is to translate many 2D images into a 3D model - so the more detailed the images, and the more of them you have, the better the final 3D model is in both spatial accuracy and texturing accuracy. The algorithm has four stages, with some parts of each stage being single-threaded and others multi-threaded, along with some cache/memory dependency as well. For the more variably threaded parts of the workload, features such as Speed Shift and XFR are able to take advantage of CPU stalls or downtime, giving sizeable speedups on newer microarchitectures.

For the update to version 1.3.3, the Agisoft software now supports command line operation. Agisoft provided us with a set of new images for this version of the test, and a python script to run it. We’ve modified the script slightly by changing some quality settings for the sake of the benchmark suite length, as well as adjusting how the final timing data is recorded. The python script dumps the results file in the format of our choosing. For our test we obtain the time for each stage of the benchmark, as well as the overall time.
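To give a sense of the structure, a minimal sketch of such a harness is below. It assumes the Python module bundled with the Pro edition; the stage calls, quality settings, image paths, and output format here are illustrative placeholders rather than the exact script Agisoft provided.

```python
# Sketch of a per-stage timing harness for PhotoScan 1.3 Pro, run via the
# bundled interpreter (e.g. photoscan.exe -r benchmark.py). All settings
# and paths are illustrative placeholders, not our suite's exact values.
import glob
import time
import PhotoScan  # Python module bundled with the Pro edition

doc = PhotoScan.app.document
chunk = doc.addChunk()
chunk.addPhotos(glob.glob("images/*.jpg"))  # placeholder image set

timings = {}

def timed(name, func, *args, **kwargs):
    """Run one stage of the pipeline and record its wall-clock time."""
    start = time.time()
    func(*args, **kwargs)
    timings[name] = time.time() - start

timed("Align", chunk.matchPhotos, accuracy=PhotoScan.HighAccuracy)
timed("Reconstruct", chunk.alignCameras)
timed("Dense cloud", chunk.buildDenseCloud, quality=PhotoScan.MediumQuality)
timed("Model", chunk.buildModel, surface=PhotoScan.Arbitrary)

# Dump per-stage times plus the overall time in a simple parseable format.
with open("photoscan_results.txt", "w") as out:
    for name, secondsds in ():
        pass  # (placeholder removed)
    for name, seconds in timings.items():
        out.write("%s: %.1f s\n" % (name, seconds))
    out.write("Total: %.1f s\n" % sum(timings.values()))
```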

The final result is a table that looks like this:

(1-1) Agisoft Photoscan 1.3, Complex Test

As explained in the power tests, the 4800U with double the cores wins out here, and due to the vector pressure it also wins on power efficiency. There’s still a sizeable uplift from Ice Lake to Tiger Lake at 15 W, although 28 W is needed for a really substantial jump.

Mozilla Kraken 1.1

Kraken is a 2010 benchmark from Mozilla that runs a series of JavaScript tests. These are a little more involved than earlier JS benchmarks, covering artificial intelligence, audio manipulation, image manipulation, JSON parsing, and cryptographic functions. The benchmark starts with an initial download of data for the audio and imaging tests, and then runs through ten times to give a timed result.

Automation involves loading the webpage that runs the test directly and letting it complete. All CPUs finish the test in under a couple of minutes, so we use that as the end point, then copy the page contents into the clipboard and parse the result. Each individual run within the test takes from half a second to a few seconds on most CPUs.
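For readers who want to approximate this flow at home, a rough sketch with Selenium is below; the URL, fixed wait, and result regex are our assumptions about the hosted page, not the exact in-house harness.

```python
# Rough outline of the Kraken 1.1 flow using Selenium. The URL, the wait
# time, and the result regex are assumptions, not our exact harness.
import re
import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://krakenbenchmark.mozilla.org/kraken-1.1/driver.html")

time.sleep(120)  # every CPU here finishes well inside a couple of minutes

# The results page reports a total time in milliseconds; pull it from the text.
match = re.search(r"Total:\s*([\d.]+)\s*ms", driver.page_source)
if match:
    print("Kraken 1.1 total: %s ms" % match.group(1))
driver.quit()
```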

(7-1) Kraken 1.1 Web Test

Both Tiger Lake results are very fast, showing little difference between the power modes. Intel pushes ahead of AMD here, and shows a sizable jump over Ice Lake.

Google Octane 2.0

Our second test is also JavaScript based, but uses a much wider variety of newer JS techniques, such as object-oriented programming, kernel simulation, object creation/destruction, garbage collection, array manipulation, compiler latency, and code execution.

Octane was developed after the discontinuation of other tests, with the goal of being more web-like than previous tests. It has been a popular benchmark, making it an obvious target for optimizations in the JavaScript engines. Ultimately it was retired in early 2017 due to this, although it is still widely used as a tool to determine general CPU performance in a number of web tasks.

Octane’s automation is a little different from the others: there is no URL that starts the benchmark automatically. The benchmark page is opened, but the user has to either click the ‘start’ button or open the console and initiate the JavaScript required to run the test. The test also does not show an obvious end-point, but it does aim for a fixed run time on each processor. This is similar to some of our other tests that loop for a fixed time before ending, although that approach breaks down if the first loop runs beyond the fixed time, as the loop still has to finish. For Octane, we have set the window to 75 seconds per run, and we loop the whole test four times.
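A minimal sketch of that loop is below. The page URL and element IDs are assumptions about the benchmark page's markup, labelled as such, rather than a confirmed part of our harness.

```python
# Sketch of driving Octane 2.0 in a loop from script. The URL and element
# IDs ('main-banner') are assumptions about the page markup.
import time
from selenium import webdriver

OCTANE_URL = "https://chromium.github.io/octane/"  # mirror of the retired test

driver = webdriver.Chrome()
scores = []
for run in range(4):                     # we loop the whole test four times
    driver.get(OCTANE_URL)
    # Start the benchmark from script, as no URL auto-runs it.
    driver.execute_script("document.getElementById('main-banner').click();")
    time.sleep(75)                       # fixed 75-second window per run
    banner = driver.execute_script(
        "return document.getElementById('main-banner').textContent;")
    scores.append(banner)                # e.g. 'Octane Score: 48000'
driver.quit()
print(scores)
```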

(7-2) Google Octane 2.0 Web Test

The Tiger Lake system reaches new records in Octane. If there’s anything this system is fast at, it is web workloads.

Speedometer 2: JavaScript Frameworks

Our newest web test is Speedometer 2, which runs a series of JavaScript frameworks through three simple tasks: build a list, enable each item in the list, and remove the list. All the frameworks implement the same visual cues, but obviously approach them from different coding angles.

Our test goes through the list of frameworks and produces a final score in ‘rpm’, one of the benchmark’s internal metrics. Rather than use the main interface, we go to the admin interface through the about page and manage the results there. This involves saving the webpage when the test is complete and parsing the final result.

We repeat the benchmark for a dozen loops, taking the average of the last five.
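The post-processing is simple enough to sketch in a few lines of Python; the saved-page filenames and the result regex below are assumptions about how the pages were saved, not the exact format of our harness.

```python
# Sketch: pull the final 'rpm' score out of each saved results page and
# average the last five of the twelve loops. Filenames and the regex are
# assumptions about the saved output.
import glob
import re

scores = []
for page in sorted(glob.glob("speedometer_run_*.html")):
    with open(page, encoding="utf-8") as f:
        match = re.search(r"([\d.]+)\s*rpm", f.read())
    if match:
        scores.append(float(match.group(1)))

last_five = scores[-5:]
print("Speedometer 2.0: %.1f rpm" % (sum(last_five) / len(last_five)))
```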

(7-3) Speedometer 2.0 Web Test

Again, another good win for Tiger Lake.

Comments

  • huangcjz - Thursday, September 17, 2020 - link

    No, you can say "have to hand" as in something which is available. E.g. "Do you have the presentation to hand?"
  • 29a - Thursday, September 17, 2020 - link

    Wouldn't a non-Iris chip be a fairer comparison to Renoir?
  • Kamen Rider Blade - Thursday, September 17, 2020 - link

    AMD's 4800U has a 25 watt mode; Hardware Unboxed tested it against Intel.

    Why didn't you test it and put those results in the chart?

    Why does this biased reviewing give one side 15 watt and 28 watt scores, yet AMD isn't allowed to show 25 watt scores?

    What are you afraid of when comparing like for like?
  • IanCutress - Thursday, September 17, 2020 - link

    For us, the 15W to 15W results were the focal point. 28W is there to show a max Intel and look at scaling. Also, the number of 4800U devices at 25W is minimal.

    Not only that, I'm on holiday. I had to spend two days out, while in this lovely cottage in the countryside, to write 18k words, rather than spend time with my family. I had 4 days with the TGL laptop, and 8 days notice in advance to prepare before the deadline. Just me with a couple of pages from Andrei, no-one else. Still posted the review 30 minutes late, while writing it in a pub as my family had lunch. Had to take the AMD laptop with me to test, and it turns out downloading Borderlands 3 in the middle of nowhere is a bad idea.

    Not only that, I've been finishing up other projects last week. I do what I can in the time I have. This review is 21k words and more detailed than anything else out there done by a single person currently in the middle of a vacation. If you have further complaints, our publisher's link is at the bottom of the webpage. Or roll your own. What are you afraid of? I stand by my results and my work ethic.
  • PixyMisa - Thursday, September 17, 2020 - link

    I really appreciate the effort. The individual SPEC results are vastly more useful than (for example) a single Geekbench score.
  • Spunjji - Friday, September 18, 2020 - link

    I can second that - I appreciate seeing a breakdown of the strengths/weaknesses of each core design.
  • Kamen Rider Blade - Friday, September 18, 2020 - link

    We appreciate your hard work; I do watch your YT channel Tech Tech Potato. That being said, if you knew about this issue of not comparing like for like, then just omit the 28 W scores from the Intel machine and focus on Intel's 15W vs AMD's 15W.

    Why even include the 28W on the chart? You know how this makes you and Anandtech look, right? The issue of bias towards or against any entity could've been easily avoided if you had "like for like" scores across the board. That's part of what Steve from Gamers Nexus and many of us enthusiasts see as "Bias Marketing" or "Paid Shilling" to manipulate results one way or another. Many people can easily interpret your data of not showing "like for like" in many wrong ways when they have no context for it.

    If you didn't want to test AMD's 25 watt scores, nobody would care; just don't bring up Intel's equivalent 28 watt scores. A lot of the more casual readers won't look at the details, and they can easily misinterpret things. I'd prefer that your good name not get dragged through the mud when a simple omission of certain benchmark figures would avoid it. I know you wouldn't deliberately do that to show bias towards one entity or another, but will other folks know that?
  • Spunjji - Friday, September 18, 2020 - link

    Presenting the figures he has isn't bias. Bias would be proclaiming Intel to be the winner without noting the discrepancy, or specifically choosing tests to play to the strength of one architecture.

    As it is, the Lenovo device doesn't do a 25W mode, so you're asking him to add a full extra device's worth of testing to an already long review. That's a bit much.

    If you take a look at the 65W APU results and compare them, you'll see a familiar story for Renoir - there's not actually a whole lot of extra gas in the tank to be exploited by a marginally higher TDP. It performs spectacularly well at 15W, and that's that.
  • Kamen Rider Blade - Friday, September 18, 2020 - link

    You can literally just omit the 65W APU; it has no relevance on that chart.

    Ok, if that Lenovo laptop doesn't offer a 25W mode, fine. Maybe Hardware Unboxed got a different model of laptop for the 4800U. Then don't present Intel's 28W mode.

    That's how people misunderstand things: when there is a deliberate omission of information, or extra information that the other side doesn't have. The lack of pure like-for-like causes issues.
  • Spunjji - Saturday, September 19, 2020 - link

    You're *demanding bias*. They had the Intel device with a 28W mode, 28W figures are a big part of the TGL proposition, so they tested it and labelled it all appropriately. That isn't bias.

    The "lack of pure like for like" only causes issues if you don't really pay attention to what the article says about what they had and how they tested it.
