Real World Performance at 3 GHz

For our generational testing, we took each of the four main processors in this test and adjusted their CPU frequencies in the BIOS to 3 GHz. This was achieved through a 30x multiplier and 100 MHz base frequency, which for each processor is a reduction from the stock speeds. We set each CPU to perform at 3 GHz only to fix the frequency, and ran the memory in each case at the maximum supported frequency by the processor. Some benchmarks in the generational tests will probe the memory, and an upgrade in the memory controller to support higher frequencies (officially) than an older processor is, a generational upgrade, as important as the core or cache performance.

AMD CPUs
  µArch /
Core
Cores Base
Turbo
TDP DDR3 L1 (I)
Cache
L1 (D)
Cache
L2
Cache
Athlon
X4 845
Excavator
Carrizo
4 3500
3800
65 W 2133 192KB
3-way
128KB
8-way
2 MB
16-way
 
Athlon
X4 860K
Steamroller
Kaveri
4 3700
4000
95 W 1866 192KB
3-way
64KB
4-way
4 MB
16-way
 
Athlon
X4 760K
Piledriver.v2
Richland
4 3800
4100
100 W 1866 128KB
2-way
64KB
4-way
4 MB
16-way
 
Athlon
X4 750K
Piledriver
Trinity
4 3400
4000
100 W 1866 128KB
2-way
64KB
4-way
4 MB
16-way

Speaking of cache, as mentioned at the beginning of this review, the Athlon X4 845 has a significant advantage in the L1 cache layout, affording a 2x size L1 data cache along with a move from 4-way to 8-way associativity. Each of these methods, as a broad rule of thumb, typically decreases the cache miss rate by a factor of 1.414 (square root of 2x). Combined should see a factor two decrease in cache misses overall, and this will affect a number of benchmarks when we compare each processor at a fixed frequency. On the other side of the equation, the L2 cache for the X4 845 is half that of the X4 860K, meaning that if the data is not in the L1, it is less likely to be in the L2, which will add additional latency.

Dolphin Benchmark: link

Many emulators are often bound by single thread CPU performance, and general reports tended to suggest that Haswell provided a significant boost to emulator performance. This benchmark runs a Wii program that raytraces a complex 3D scene inside the Dolphin Wii emulator. Performance on this benchmark is a good proxy of the speed of Dolphin CPU emulation, which is an intensive single core task using most aspects of a CPU. Results are given in minutes, where the Wii itself scores 17.53 minutes.

Dolphin Emulation Benchmark

Emulation takes cues from a high IPC and base frequency, however for our generational testing it is all about the microarchitecture. The Carrizo has a 9% advantage here over the Kaveri.

WinRAR 5.0.1: link

Our WinRAR test from 2013 is updated to the latest version of WinRAR at the start of 2014. We compress a set of 2867 files across 320 folders totaling 1.52 GB in size – 95% of these files are small typical website files, and the rest (90% of the size) are small 30 second 720p videos.

WinRAR 5.01, 2867 files, 1.52 GB

WinRAR enjoys memory bandwidth with its variable workload, and seemingly the Kaveri has a strong showing here. The Carrizo only has 2MB of L2 cache, which most likely puts it at a disadvantage.

3D Particle Movement v2

The second version of this benchmark is similar to the first, however it has been re-written in VS2012 with one major difference: the code has been written to address the issue of false sharing. If data required by multiple threads, say four, is in the same cache line, the software cannot read the cache line once and split the data to each thread - instead it will read four times in a serial fashion. The new software splits the data to new cache lines so reads can be parallelized and stalls minimized. As v2 is fairly new, we are still gathering data and results are currently limited.

3D Particle Movement v2.0 beta-1

We saw this in our laptop Carrizo testing: if we adjust the software to avoid false sharing (which decreases performance), the Excavator microarchitecture pulls a significant lead in 3DPMv2. Part of this is most likely down to the larger L1 data cache as well.

Web Benchmarks

On the lower end processors, general usability is a big factor of experience, especially as we move into the HTML5 era of web browsing. 

WebXPRT 2013

WebXPRT

This benchmark can be memory intensive, as it draws various graphs and applies filters to pictures, among other things. The lower L2 cache hurts here.

Google Octane v2

Google Octane v2

In contrast, Octane attempts to stay as close to the execution ports as possible, and the Carrizo cores take an 18% lead over Kaveri.

Benchmark Overview Performance at 3 GHz: Office
Comments Locked

131 Comments

View All Comments

  • lefty2 - Thursday, July 14, 2016 - link

    I'm predicting Bristol Ridge will be just as bad a failure as Carrizo. I.e. the few design wins will only have single DIMM memory and be universally unavailable, buried somewhere in a dark corner of the OEM's website. It's a pity, because both SoCs are very good in their own right.
  • nandnandnand - Thursday, July 14, 2016 - link

    If it's not Zen, it can be thrown straight in the garbage.
  • Samus - Friday, July 15, 2016 - link

    I still rock a few Kaveri desktops and they are incredibly powerful for the price. The 860K is half the cost of a comparable Intel chip, which supporting faster memory and a lower cost platform.

    Carizo on the desktop is an anomaly. I'd like to see what it could do with 4MB cache (would require an entirely new die)
  • Lolimaster - Saturday, July 16, 2016 - link

    They were nice in 2014.

    We should have a nice 20nm 768SP APU in 2015 with a full L2 cache Excavator and fully mature 896SP 20nm early this year.

    Remember the A8 3870K? That APU was a damn monster only hold back from being godly cause of their sub 3Ghz cpu speed, what we had after?

    400SP VLIW5 2011 --> 384 VLIW4 2012 --> 384VLIW4 2013 --> 512SP GCN 2015 --> 512SP GCN 2016

    Intel improved way faster (non "e" + edram igp's are near A8 level from being utter trash when the A8 3850 was release).
  • The_Countess - Tuesday, July 19, 2016 - link

    yes being able to thrown in a extra billion transistors compared to AMD (1.7 vs 0.75 billion transistors for a quad core with GPU) because of 14nm really does help intel along a lot.

    but as nobody has been able to make a 20nm class process for anything but flash and ram besides intel, AMD's hands were tied. there is nothing AMD could have done to change that.
  • BlueBlazer - Friday, July 15, 2016 - link

    Formula for failure: FM2 socket (with limited CPU upgradeability), only PCI Express x8 lanes available (which can bottleneck GPUs), and only "4 cores" (which performs more like 2C/4T Core i3 processor).
  • neblogai - Friday, July 15, 2016 - link

    Bristol Ridge is not FM2; PCI-E x8 can not bottleneck midrange GPUs; ultra low power mobile APU also sold as desktop chip is not a failure, just additional revenue
  • BlueBlazer - Friday, July 15, 2016 - link

    The results in the article shows otherwise, where AMD's Bristol Ridge was slower in most gaming tests, despite having better performance in some applications. Both FM2 and FM2+ are still the same (legacy) socket. AMD will be probably selling these chips at a loss. Note that these are the same (large) dies as Carrizo chips, and at 250mm^2 coupled with low prices typically meant razor thin margins or none at all.
  • silverblue - Friday, July 15, 2016 - link

    That L2 cache is probably making more difference than you realise.
  • evolucion8 - Saturday, July 16, 2016 - link

    The PCI-E is busted, even at PCI E 2.0 @ 4X, it barely makes a difference on the Fury X and the GTX 980 Ti.

Log in

Don't have an account? Sign up now