Core: Out of Order and Execution

After Prefetch, Cache and Decode comes Order and Execution. Without rehashing the in-order vs. out-of-order debate, a design with more execution ports and a larger out-of-order reorder buffer can typically sustain a higher number of instructions per clock, as long as the scheduling logic is smart, data can continuously be fed, and all the execution ports can be used each cycle. Whether a super-sized core is actually beneficial to day-to-day workloads in 2016 is an interesting point to discuss; back in 2006, during the Core era, it certainly provided significant benefits.

As Johan did back in the original piece, let’s start with semi-equivalent microarchitecture diagrams for Core vs. K8:

[Microarchitecture diagram: Intel Core]

[Microarchitecture diagram: AMD K8]

For anyone versed in x86 design, three differences immediately stand out when comparing the two. First is the reorder buffer: 96 entries for Intel compared to 72 for AMD. Second is the scheduler arrangement: AMD uses split 24-entry INT and 36-entry FP schedulers fed from the ‘Instruction Control Unit’, whereas Intel has a single combined 32-entry ‘reservation station’. Third is the number of SSE ports: Intel has three compared to AMD’s two. Let’s go through these in order.

For reorder buffers, with the right arrangement, bigger is usually better. Make one too big, however, and it costs too much silicon and power, so there is a fine line to walk. The bigger the buffer, the smaller the marginal benefit of each extra entry, too. The goal of the buffer is to push decoded instructions that are ready to work to the front of the queue, while making sure order-dependent instructions stay in their required sequence. Executing independent operations as soon as they are ready, while prefetch gathers data for instructions still waiting in the buffer, allows latency and bandwidth issues to be hidden. (Large buffers are also key to simultaneous multithreading, which we’ll discuss in a bit as it is not present in Core 2 Duo.) However, once the buffer is already feeding the peak number of instructions to the ports every cycle, making it larger brings diminishing returns; at that point the design has to add ports instead, depending on the power/silicon budget.
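The mechanism can be sketched with a toy model. Everything here is illustrative: the window sizes, the 10-cycle load latency, and the 3-port issue width are made-up numbers, not real Core or K8 timings.

```python
# Toy out-of-order model: a window of decoded instructions issues the
# oldest *ready* ops each cycle (dependencies satisfied), while
# retirement stays in program order.

def run(program, window_size, ports):
    """program: list of (dep_index_or_None, latency). Returns total cycles."""
    n = len(program)
    done_at = [None] * n          # cycle in which each instruction finishes
    head = 0                      # oldest not-yet-retired instruction
    cycle = 0
    while head < n:
        cycle += 1
        # Instructions currently inside the reorder window.
        issued = 0
        for i in range(head, min(head + window_size, n)):
            if issued == ports:
                break
            dep, lat = program[i]
            ready = dep is None or (done_at[dep] is not None
                                    and done_at[dep] < cycle)
            if done_at[i] is None and ready:
                done_at[i] = cycle + lat - 1
                issued += 1
        # Retire, in order, anything whose result is complete.
        while head < n and done_at[head] is not None and done_at[head] <= cycle:
            head += 1
    return cycle

# A long-latency "load" (10 cycles) feeding a dependent op, plus
# independent work that a big enough window can slide past the stall.
prog = [(None, 10), (0, 1)] + [(None, 1)] * 12

print(run(prog, window_size=2, ports=3))    # small window stalls behind the load
print(run(prog, window_size=16, ports=3))   # big window hides the load latency
```

The point of the toy: with the same ports and the same program, only the larger window lets the independent work fill the cycles the load would otherwise waste, which is exactly the latency-hiding described above.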

For the scheduler arrangements, split and unified schedulers for FP and INT each have upsides and downsides. The main benefit of split schedulers is total entry count: here AMD gets 60 entries (24 INT + 36 FP) compared to Intel’s 32. A combined scheduler, however, allows better utilization: because any of its entries can feed any port, one half can never sit full while the other has idle slots, a situation the split design cannot avoid since ports are not shared between its schedulers.

The SSE difference between the two architectures is exacerbated by what we’ve already discussed: macro-op fusion. The Intel Core microarchitecture has three SSE units to AMD’s two, and fusion also lets certain packed SSE operations execute as one instruction rather than two. Two of Intel’s units are symmetric, and all three sport 128-bit execution rather than the 64-bit datapaths on K8. This means K8 has to crack a 128-bit operation into two 64-bit instructions, whereas Core can absorb a 128-bit instruction in one go. As a result, Core can outperform K8 on 128-bit SSE on many levels, and for 64-bit double-precision FP SSE, Core can do four DP operations per cycle where the Athlon 64 can do three.
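The datapath-width argument can be worked through as back-of-envelope arithmetic. This assumes, hypothetically, that each design can keep all of its SSE units busy every cycle, ignoring instruction mix and port-symmetry restrictions:

```python
# Cracking cost: a 64-bit datapath needs two passes per 128-bit SSE op.
SSE_WIDTH_BITS = 128

def uops_per_sse_op(exec_width_bits):
    return SSE_WIDTH_BITS // exec_width_bits

core_uops = uops_per_sse_op(128)   # 1 op per packed instruction on Core
k8_uops   = uops_per_sse_op(64)    # 2 ops per packed instruction on K8

# Cycles to chew through 1000 packed 128-bit SSE instructions, assuming
# full issue every cycle on three 128-bit units vs two 64-bit units.
ops = 1000
core_cycles = ops * core_uops / 3
k8_cycles   = ops * k8_uops / 2
print(core_cycles, k8_cycles)   # Core finishes in roughly a third the cycles
```

Under these (generous) assumptions the width and port-count advantages multiply, which is why the article can say Core beats K8 on 128-bit SSE "on many levels".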

One other metric not shown on the diagram is branch prediction. Core can sustain one branch prediction per cycle, compared to one every two cycles on previous Intel microarchitectures. Here Intel was catching up to AMD, which already supported one per cycle.
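Why prediction throughput matters for a wide core: it places a hard ceiling on sustainable IPC. A small sketch, assuming (hypothetically) that one instruction in five is a branch and that nothing else stalls:

```python
# A core cannot sustain more instructions per cycle than its predictor
# can cover branches for: IPC <= predictions_per_cycle / branch_fraction.

def ipc_ceiling(predictions_per_cycle, branch_fraction=0.20):
    return predictions_per_cycle / branch_fraction

print(ipc_ceiling(0.5))   # one prediction per two cycles: IPC capped at 2.5
print(ipc_ceiling(1.0))   # one per cycle (Core, K8): the ceiling doubles to 5.0
```

With only one prediction every two cycles, the rest of a wide front end would routinely sit waiting on the predictor, so moving to one per cycle was a prerequisite for the wider Core design.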

158 Comments

  • Hrel - Thursday, July 28, 2016 - link

    10 years to double single core performance, damn. Honestly thought Sandy Bridge was a bigger improvement than that. Only 4 times faster in multi-core too.

    Glad to see my 4570S is still basically top of the line. Kinda hard to believe my 3 year old computer is still bleeding edge, but I guess that's how little room for improvement there is now that Moore's law is done.

    Guess if Windows 11 brings back normal functionality to the OS and removes "apps" entirely I'll have to upgrade to a DX12 capable card. But I honestly don't think that's gonna happen.

    I really have no idea what I'm gonna do OS wise. Like, I'm sure my computers won't hold up forever. But Windows 10 is unusable and Linux doesn't have proper support still.

Computer industry, once a bastion of capitalism and free markets, rife with options and competition, has now become truly monopolistic. Guess I'm just lamenting the old days, but at the same time I am truly wondering how I'll handle my computing needs in 5 years. Windows 10 is totally unacceptable.
  • Michael Bay - Thursday, July 28, 2016 - link

    I like how desperate you anti-10 shills are getting.
    More!
  • Namisecond - Thursday, July 28, 2016 - link

    I do not think that word means what you think it means...
  • TormDK - Thursday, July 28, 2016 - link

    You are right - there is not going to be a Windows 11, and Microsoft is not moving away from "apps".

So you seem stuck between a rock and a hard place if you don't want to go to Linux or a variant, and don't want to remain in the Microsoft ecosystem.
  • mkaibear - Thursday, July 28, 2016 - link

    >Windows 10 is unusable

    Now, just because you're not capable of using it doesn't mean everyone else is incapable. There are a variety of remedial computer courses available, why not have a word with your local college?
  • AnnonymousCoward - Thursday, July 28, 2016 - link

    4570S isn't basically top of the line. It and the i5 are 65W TDP. The latest 91W i7 is easily 33% faster. Just run the benchmark in CPU-Z to see how you compare.
  • BrokenCrayons - Thursday, July 28, 2016 - link

Linux Mint has been my primary OS since early 2013. I've been tinkering with various distros starting with Slackware in the late 1990s as an alternative to Windows. I'm not entirely sure what you mean by "doesn't have proper support" but I don't encourage people to make a full conversion to leave Windows behind just because the current user interface isn't familiar.

    There's a lot more you have to figure out when you switch from Windows to Linux than you'd need to learn if going from say Windows 7 to Windows 10 and the transition isn't easy. My suggestion is to purchase a second hand business class laptop like a Dell Latitude or HP Probook being careful to avoid AMD GPUs in doing so and try out a few different mainstream distros. Don't invest a lot of money into it and be prepared to sift through forums seeking out answers to questions you might have about how to make your daily chores work under a very different OS.

Even now, I still keep Windows around for certain games I'm fond of but don't want to muck around with in Wine to make work. Steam's Linux-friendly list has gotten a lot longer in the past couple of years thanks to Valve pushing Linux for the Steam Box, and I think by the time Windows 7 is no longer supported by Microsoft, I'll be perfectly happy leaving Windows completely behind.

    That said, 10 is a good OS at its core. The UI doesn't appeal to everyone and it most certainly is collecting and sending a lot of data about what you do back to Microsoft, but it does work well enough if your computing needs are in line with the average home user (web browsing, video streaming, gaming...those modest sorts of things). Linux can and does all those things, but differently using programs that are unfamiliar...oh and GIMP sucks compared to Photoshop. Just about every time I need to edit an image in Linux, I get this urge to succumb to the Get Windows 10 nagware and let Microsoft go full Big Brother on my computing....then I come to my senses.
  • Michael Bay - Thursday, July 28, 2016 - link

    GIMP is not the only, ahem, "windows ecosystem alternative" that is a total piece of crap on loonixes. Anything outside of the browser window sucks, which tends to happen when your code maintainers are all dotheads and/or 14 years old.
  • Arnulf - Thursday, July 28, 2016 - link

    I finally relegated my E6400-based system from its role as my primary computer and bought a new one (6700K, 950 Pro SSD, 32 GB RAM) a couple of weeks ago.

    While the new one is certainly faster at certain tasks the biggest advantage for me is significantly lower power consumption (30W idle, 90W under load versus 90W idle and 160-180W under load for the old one) and consequently less noise and less heat generation.

    Core2 has aged well for me, especially after I added a Samsung 830 to the system.
  • Demon-Xanth - Thursday, July 28, 2016 - link

    I still run an i5-750, NVMe is pretty much the only reason I want to upgrade at all.
