Updating AnandTech’s 2013 Mobile Benchmark Suite (RFC)
by Jarred Walton on January 29, 2013 9:45 PM EST
If it seems like just last year that we updated our mobile benchmark suite, that’s because it was. We’re going to be keeping some elements of the testing, but with the release of Windows 8 we’re looking to adjust other areas. This is also a request for input (RFC = Request for Comments, if you didn’t know) from our readers on benchmarks they would like us to run—or not run—specifically with regards to laptops and notebooks.
We used most of the following tests with the Acer S7 review, but we’re still early enough in the game that we can change things up if needed. We can’t promise we’ll use every requested benchmark, in part because there’s only so much time you can spend benchmarking before you’re basically generating similar data points with different applications, and in part because ease of benchmarking and repeatability are major factors. Still, if you have any specific recommendations or requests, we’ll definitely look at them.
General Performance Benchmarks
We’re going to be keeping most of the same general performance benchmarks as last year. PCMark 7, despite some question as to how useful the results really are, is at least a general performance suite that’s easy to run. (As a side note, SYSmark 2012 basically requires a fresh OS install to run properly, plus wiping and reinstalling the OS after running, which makes it prohibitively time consuming for laptop testing where every unit comes with varying degrees of customization to the OS that may or may not allow SYSmark to run.) We’re dropping PCMark Vantage this year, mostly because it’s redundant; if Futuremark comes out with a new version of PCMark, we’ll likely add that as well.
At least for the near term, we’re also including results for TouchXPRT from Principled Technologies; this is a “light” benchmark suite designed more for tablets than laptops (at least in our opinion), but it does provide a few results separate from a monolithic suite like PCMark 7. We’ll also include results from WebXPRT for the time being, though again it seems more tablet-centric. We don’t really have any other good general performance benchmarking suites, so beyond those we’ll return once again to the ubiquitous Cinebench 11.5 and x264 HD. We’re updating to x264 HD 5.x, however, which does change the encoding somewhat, and if a version of x264 comes out with updated encoding support (e.g. for CUDA, OpenCL, and/or Quick Sync) we’ll likely switch to that when appropriate. We’re still looking for a good OpenCL benchmark or two; WinZip sort of qualifies, but unfortunately we’ve found in testing that 7-zip tends to beat it on file size, compression time, or both depending on the settings and files we use.
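The WinZip vs. 7-zip comparison above really comes down to measuring two axes per compressor: output size and elapsed time. As a rough illustration of how such a comparison can be scripted (this is not our actual methodology, and the sample data is synthetic), here’s a sketch using Python’s standard-library compressors:

```python
import lzma
import time
import zlib

def measure(name, compress, data):
    """Run one compressor over the data and return (name, output size, seconds)."""
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    return name, len(out), elapsed

# Synthetic, highly compressible test corpus (a real test would use a
# fixed set of documents, images, and binaries).
data = b"AnandTech mobile benchmark corpus line\n" * 20000

results = [
    measure("zlib (DEFLATE, level 9)", lambda d: zlib.compress(d, 9), data),
    measure("lzma (XZ, preset 6)", lambda d: lzma.compress(d, preset=6), data),
]

for name, size, secs in results:
    print(f"{name}: {size} bytes in {secs:.3f}s")
```

The same harness shape applies to external tools like 7-zip or WinZip: time the invocation, record the archive size, and repeat over a fixed corpus so the runs are comparable.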
On the graphics side of the equation, there doesn’t seem to be a need to benchmark every single laptop on our gaming suite—how many times do we need to see how an Ultrabook with the same CPU and iGPU runs (or doesn’t run) games?—so we’ll continue using 3DMark as a “rough estimate” of graphics performance. As with PCMark, we’re dropping the Vantage version, but we’ll continue to use 3DMark06 and 3DMark 11, and we’ll add the new version “when it’s done”. We’re considering the inclusion of another 3D benchmark, CatZilla (aka AllBenchmark 1.0 Beta19), at the “Cat” and “Tiger” settings, but we’d like to hear feedback on whether it makes sense or not.
Finally, we’ll continue to provide analysis of display quality, and this is something we really hope to see improve in 2013. Apple has thrown down the gauntlet with their pre-calibrated MacBook, iPhone, iPad, and iMac offerings; if anyone comes out with a laptop that charges Apple prices but can’t actually match Apple in areas like the display, touchpad, and overall quality, you can bet we’ll call them on the carpet. Either be better than Apple and charge the same, or match Apple and charge less, or charge a lot less and don’t try to compete with Apple (which is a dead-end race to the bottom, so let’s try to at least have a few laptops that eschew this path).
Battery Life Testing
As detailed in the Acer S7 review, we’re now ramping up the “difficulty” of our battery life testing. The short story is that we feel anything less than our previous Internet surfing test is too light to truly represent how people use their laptops, so we’re making that our Light test. For the Medium test, we’ll be increasing the frequency of page loads on our Internet test (from every 60 seconds down to every 12 seconds) and adding in playback of MP3 files. The Heavy test is designed not as a “worst-case battery life” test but rather as a “reasonable but still strenuous” use case for battery power; we use the same Internet test as in the Medium test but add in looped playback of a 12Mbps 1080p H.264 video with a constant FTP download from a local server running at ~8Mbps (FileZilla Server with two simultaneous downloads and a cap of 500KBps, downloading a list of large movie files).
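For those curious how the web portion’s timing works out, here’s a minimal sketch (purely illustrative; our actual test harness is not public) of the page-load schedule implied by the Light and Medium intervals:

```python
def schedule_page_loads(interval_s, duration_s):
    """Return the offsets (in seconds) at which pages would be loaded
    during a battery rundown of the given duration."""
    return list(range(0, duration_s, interval_s))

# Light test: one page load every 60 seconds; Medium: every 12 seconds.
light = schedule_page_loads(60, 3600)   # one hour of runtime
medium = schedule_page_loads(12, 3600)

print(len(light), len(medium))  # 60 vs. 300 loads per hour
```

In other words, the Medium test hits the network (and wakes the CPU) five times as often as the Light test, before even counting the MP3 playback running alongside it.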
Other aspects of our battery testing also warrant clarification. For one, we continue to disable certain “advanced” features like Intel’s Display Power Saving Technology (which can adjust contrast, brightness, color depth, and other items in order to reduce power use). The idea seems nice, but it basically sacrifices image quality for battery life, and since other graphics solutions are not using these “tricks” we’re leaving it disabled. We also disable refresh rate switching, for similar reasons—testing 40Hz on some laptops and 60Hz on others isn’t really apples-to-apples. Finally, we’re also moving from 100 nits brightness to 200 nits brightness for all the battery life testing, and the WiFi and audio will remain active (volume at 30% with headphones connected).
Gaming Benchmarks
In truth, this is the one area with the most room for debate. Keep in mind that when testing notebooks, we’re not solely focused on GPU performance most of the time (even with gaming notebooks); the gaming tests are only a subset of all the benchmarks we run. We’ll try to overlap with our desktop GPU testing where possible, but we’ll continue to use 1366x768 ~Medium as our Value setting, 1600x900 ~High as our Mainstream setting, and 1920x1080 ~Max for our Enthusiast setting. Beyond the settings, however, is the question of which games to include.
Ideally, we’d like to have popular games that also tend to be strenuous on the graphics (and possibly the CPU as well). A game or benchmark that is extremely demanding of your graphics hardware but that few people actually play isn’t relevant, and likewise a game that’s extremely popular but doesn’t require much from your hardware (e.g. Minecraft) is only useful for testing low-end GPUs. We would also like to include representatives of all the major genres—first person shooter/action, role-playing, strategy, and simulation—with the end goal of having ten or fewer titles (and for laptops eight seems like a good number). Ease of benchmarking is also a factor; we can run FRAPS on any game, but ideally a game with a built-in benchmark is both easier to test and produces more reliable/repeatable results. Frankly, at this point we don’t have all that many titles that we’re really set on including, but here’s the short list.
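On the repeatability point: whether the numbers come from FRAPS frametime logs or a built-in benchmark, what ultimately gets reported is the average FPS per run and how tightly repeat runs cluster. A quick sketch (the frametime data here is hypothetical, not a real measurement) of that reduction:

```python
from statistics import mean, pstdev

def avg_fps(frame_times_ms):
    """Average FPS over a run = total frames / total elapsed time."""
    total_s = sum(frame_times_ms) / 1000.0
    return len(frame_times_ms) / total_s

# Hypothetical frametime logs (ms) from three repeat runs of the same test.
runs = [[16.7] * 600, [16.9] * 600, [16.5] * 600]
fps_per_run = [avg_fps(r) for r in runs]

# A benchmark is "repeatable" when the spread across runs is small
# relative to the mean.
print(f"mean={mean(fps_per_run):.1f} fps, spread={pstdev(fps_per_run):.2f}")
```

Built-in benchmarks tend to shrink that run-to-run spread because every run replays an identical scene; FRAPS over live gameplay leaves more of it to the tester’s consistency.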
The Elder Scrolls V: Skyrim: We’ve been using this title since it came out, and while it may not be the most demanding game out there, it is popular and it’s also more demanding (and scalable) than most other RPGs that come to mind. For example, Mass Effect 3 generally has lower-quality (and DX9-only) graphics and doesn’t require as much from your hardware, and The Witcher 2 has three settings: High, Very High, and Extreme (not really, but it doesn’t scale well to lower-performance hardware). Skyrim tends to hit both the CPU and GPU quite hard, and even with the high-resolution texture pack it can still end up CPU limited on some mobile chips. Regardless of our concerns, however, we can’t think of a good RPG replacement, so our intention is to keep Skyrim for another year.
Far Cry 3: This is an AMD-promoted title, which basically means they committed some resources to helping with the game’s development and/or advertising. In theory, that means it should run better on AMD hardware, but as we’ve seen in the past that’s not always the case. This is a first-person shooter that has received good reviews, and it’s a sequel in a popular franchise with a reputation for punishing GPUs, making it a good choice. It doesn’t have a built-in benchmark, so we’ll use FRAPS on this one.
Sleeping Dogs: This is another AMD-promoted title. This is a sandbox shooter/action game with a built-in benchmark, making it a good choice. Yes, right now that's two for AMD and none for NVIDIA, but that will likely change with the final list.
Sadly, that’s all we’re willing to commit to at this point, as all of the other games under consideration have concerns. MMORPGs tend to be a bit too variable, depending on server load and other aspects, so we’re leaving out games like Guild Wars 2, Rift, etc. For simulation/racing games, DiRT: Showdown feels like a step back from DiRT 3 and even DiRT 2; the graphics are more demanding, yes, but the game just isn’t that fun (IMO and according to most reviews). That means we’re still in search of a good racing game; Need for Speed: Most Wanted is a possibility, but we’re open to other suggestions.
Other titles we’re considering but not committed to include Assassin’s Creed III, Hitman: Absolution, and DmC: Devil May Cry; if you have any strong feelings for or against the use of those titles, let us know. Crysis 3 will hopefully make the grade this time, as long as there’s no funny business at launch or with the updates (e.g. no DX11 initially, and then when it was added the tessellation was so extreme that it heavily favored NVIDIA hardware, even though much of the tessellation was being done on flat surfaces). Finally, we’re also looking for a viable strategy game; Civ5 and Total War: Shogun 2 could make a return, or there are games like Orcs Must Die 2 and XCOM: Enemy Unknown, but we’re not sure if either meets the “popular and strenuous” criteria, so we may just hold off until StarCraft II: Heart of the Swarm comes out (and since that game’s on “Blizzard time”, it could be 2014 before it’s done, though tentatively it’s looking like March; hopefully it will be able to use more than 1.5 CPU cores this time).
As stated at the beginning, this is a request for comments and input as much as a list of our plans for the coming year. If you have any strong feelings one way or the other on these benchmarks, now is the time to be heard. We’d love to be able to accommodate every request, but obviously there are time constraints that must be met, so tests that are widely used and relevant are going to be more important than esoteric tests that only a few select people use. We also have multiple laptop reviewers (Dustin, Jarred, and occasionally Vivek and Anand), so the easier it is to come up with a repeatable benchmark scenario the better. Remember: these tests are for laptops and notebooks, so while it would be nice to do something like a compilation benchmark, those can often take many hours just to get the right files installed on a system, which is why we’ve shied away from such tests so far. But if you can convince us of the utility of a benchmark, we’ll be happy to give it a shot.
Comments
HibyPrime1 - Tuesday, January 29, 2013
They're trying to get away from JS benchmarks on mobile; I don't think introducing them on another platform is a good idea. Not to mention JS performance on any non-Atom/Bobcat laptop is more than adequate.
The problem is JS on full voltage x86 CPUs is basically just a browser benchmark, not a hardware benchmark.
I'd like to see Cinebench 10 come back. It's by far the most reliable indicator of single-threaded performance I've seen: you see almost perfectly linear scaling with clock speed in a given architecture, and comparisons across architectures almost always show what you would expect.
11.5 is just as reliable as far as I can tell. The problem is 11.5 numbers aren't comparable to 10, and for the most part only Cinebench 10 numbers exist for the older processors that people are looking to upgrade from.
mayankleoboy1 - Wednesday, January 30, 2013
If you plan to use any JS benchmark, use Google's Octane only. It is the most comprehensive and most real-world benchmark, as agreed by both Mozilla and Google developers.
Kraken and SunSpider are obsolete and already optimized for in every major browser.
RoboHornet and RoboHornet Pro are a joke, which Google and Mozilla avoid.
Blacksn0w - Tuesday, January 29, 2013
I would like to see a battery test for mobile gaming—really any game would probably do—just to give an indication of how long I would be able to game disconnected from the mains, e.g. during a long commute, airplane, or train ride.
Really no other background tasks should be run at the same time, as the scenario probably excludes access to any reasonably fast internet connection.
QChronoD - Wednesday, January 30, 2013
I agree that vanilla Minecraft isn't the most demanding of games, but adding a few graphics mods can bring most systems to their knees. It's still probably one of the more popular OpenGL games, and the fact that it's running on Java makes it a good test for single-core performance.
OptiFine, GLSL Shaders, and a 128x pack drop the performance on my i7-920 & GF560 from ~80 fps to ~30 fps. I would imagine that a more extreme shader (adding godrays, motion blur, and DOF) and a 256x or 512x pack would start to stress a 680 or 7970.
JarredWalton - Wednesday, January 30, 2013
We've discussed this in the past, but basically we don't benchmark heavily modded games -- for one, mods are seldom optimized well for all platforms/configurations, and for two it just opens up a huge can of worms. Besides, the number of people playing Minecraft with tons of demanding mods is pretty trivial in comparison to the number playing largely stock Minecraft.
ltcommanderdata - Wednesday, January 30, 2013
I suggest Max Payne 3 as a RAGE engine proxy for GTA V, which should be of interest to a lot of people. Despite all the angst over whether GTA V is coming to PC, if history is any indication, since it'll be launching for consoles in the spring, it'll be out for PC before the end of the year.
Unigine Heaven could be of some use as a multiplatform benchmark supporting Windows, Mac, and Linux and testing the status of OpenGL drivers vs DirectX drivers in Windows.
F1 2012 is the latest EGO engine game, and its system requirements are slightly higher than DiRT Showdown's.
Bioshock Infinite could be something to look forward to.
For OpenCL benchmarks, are the new OpenCL filters in Photoshop CS6 suitable as a benchmark?
riddler9 - Wednesday, January 30, 2013
UE3's Epic Citadel was just released and it has a benchmarking mode too. It looks like a hardcore gaming benchmark.
JarredWalton - Wednesday, January 30, 2013
Erm... are you talking about the version for iOS and Android? Because this article is specifically about laptop/desktop testing. Plus, it sounds like only the Android version will have benchmarking support -- I've heard Apple isn't all that keen on authorizing apps with benchmarking ability, though perhaps that's just an urban legend.
dananski - Wednesday, January 30, 2013
To be fair to riddler9, I often get confused by the use of the word 'mobile' on this site to refer to devices such as laptops, as opposed to 'mobile phones'. I suppose it's a British thing - "Can I nab yer mobile fer a minute? Gotta ring me nan."
Reading the article before posting helps though ;-)
karasaj - Wednesday, January 30, 2013
I know you briefly mentioned it, but please add Heart of the Swarm when it's out! If you're curious about the best way to test it, throwing an army against a maxed army (2v2?) in the late game would be best. It's a better indication than anything from single player (I don't remember how you did it last time, though).