Core: It’s all in the Prefetch

In a simple CPU design, instructions are decoded in the core and data is fetched from the caches. In a perfect world, such as the Mill architecture, the data and instructions are ready to go in the lowest level cache at all times. This allows for the lowest latency and removes a potential bottleneck. Real life is not that rosy: it all comes down to how well the core can predict what data it will need, and whether it has enough time to pull that data down to the lowest level of cache it can before it is needed. Ideally it predicts the correct data and does not interfere with memory sensitive programs. This is Prefetch.
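
To make the idea concrete, here is a minimal toy model of a cache with an optional next-line prefetcher. It is a sketch only and not Core's actual algorithm: the 64-byte line size, the 32-line fully associative LRU cache, and the names `ToyCache` and `run_sequential` are all assumptions made for illustration. The model also ignores timing, so it assumes every prefetch arrives before the data is needed.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed cache line size for this toy model


class ToyCache:
    """Tiny fully associative LRU cache with an optional next-line prefetcher."""

    def __init__(self, num_lines, prefetch=False):
        self.num_lines = num_lines
        self.prefetch = prefetch
        self.lines = OrderedDict()  # line number -> True, ordered by recency
        self.hits = 0
        self.misses = 0

    def _fill(self, line):
        # Insert or refresh a line, evicting the least recently used one if full.
        if line in self.lines:
            self.lines.move_to_end(line)
            return
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)
        self.lines[line] = True

    def access(self, address):
        # A demand access from the running program.
        line = address // LINE_BYTES
        if line in self.lines:
            self.hits += 1
            self.lines.move_to_end(line)
        else:
            self.misses += 1
            self._fill(line)
        if self.prefetch:
            # Guess that the program will want the next sequential line soon.
            self._fill(line + 1)


def run_sequential(prefetch, num_bytes=64 * 1024, step=8):
    """Walk through memory in order and return (hits, misses)."""
    cache = ToyCache(num_lines=32, prefetch=prefetch)
    for address in range(0, num_bytes, step):
        cache.access(address)
    return cache.hits, cache.misses


if __name__ == "__main__":
    print("no prefetch  :", run_sequential(prefetch=False))
    print("with prefetch:", run_sequential(prefetch=True))
```

On a purely sequential walk, the version without prefetching misses once per cache line, while the prefetching version only misses on the very first access. Real workloads are rarely this friendly, which is why the quality of the prediction matters so much.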

The Core microarchitecture added multiple prefetchers to the design and improved the prefetch algorithms, to a degree not seen before on a consumer core. For each core there are two data prefetchers and one instruction prefetcher, plus another couple for the shared L2 cache. That’s a total of eight for a dual core CPU, all with instructions not to interfere with ‘on-demand’ bandwidth from running software.

Another element of prefetching is the tag lookup used for cache indexing. Both the data prefetchers and running software need to do this, so to avoid adding latency to the running program, the data prefetchers use the store port for their lookups. As a general rule (at least at the time), loads happen twice as often as stores, meaning that the store port is generally more ‘free’ to be used for tag lookup by the prefetchers. Stores aren’t critical for most performance metrics unless the system processes them so slowly that they back up the pipeline, and in most cases the rest of the core will be doing other work regardless. The cache/memory sub-system is in control of committing the store through the caches, so as long as this happens eventually the process works out.
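
The reasoning behind using the store port can be put into rough numbers. The sketch below is back-of-the-envelope arithmetic rather than a model of the real pipeline: it assumes one load port and one store port, and the helper `free_port_slots` and its default of one memory operation per cycle are hypothetical choices for illustration. Only the 2:1 load-to-store ratio comes from the text above.

```python
from fractions import Fraction


def free_port_slots(loads_per_store=2, memory_ops_per_cycle=Fraction(1)):
    """Return the fraction of cycles each port sits idle.

    Assumes a single load port and a single store port (an assumption),
    with demand traffic split loads_per_store : 1 between them.
    """
    total = loads_per_store + 1
    load_busy = memory_ops_per_cycle * Fraction(loads_per_store, total)
    store_busy = memory_ops_per_cycle * Fraction(1, total)
    return {
        "load_port_free": 1 - load_busy,
        "store_port_free": 1 - store_busy,
    }


if __name__ == "__main__":
    for port, free in free_port_slots().items():
        print(port, free)
```

Under these assumptions the store port is idle for two thirds of cycles while the load port is idle for only one third, so prefetcher tag lookups placed on the store port are less likely to collide with demand traffic.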

Core: More Cache Please

Without access to a low latency data and instruction store, having a fast core is almost worthless. The most expensive SRAMs sit closest to the execution ports, but are also the smallest due to physical design limitations. As a result, we get a nested cache system where the data you need should be in the lowest level possible, and accesses to higher levels of cache are slightly further away. Any time spent waiting for data to complete a CPU instruction is time lost unless the core has some way of hiding that wait, so large fast caches are ideal. The Core design, compared with both the previous Netburst family and AMD’s K8 ‘Hammer’ microarchitecture, tried to swat a fly with a Buick.

Core provided a 4 MB Level 2 cache shared between two cores, with a 12-14 cycle access time. This allows each core to use more than 2 MB of L2 if needed, something Presler did not allow. Each core also has a 3-cycle 32 KB instruction + 32 KB data L1 cache, compared to the super small caches on Netburst, and also supports 256 entries in the L1 data TLB, compared to 8. Both the L1 and L2 are accessible by a 256-bit interface, giving good bandwidth to the core.
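
To see why these latencies matter, a rough average access time can be worked out from the figures above. This is a sketch under stated assumptions: the 3-cycle L1 and 14-cycle L2 latencies come from the text, but the hit rates and the 200-cycle main memory latency are placeholder values chosen for illustration, not measurements, and `average_access_cycles` is a hypothetical helper.

```python
def average_access_cycles(l1_hit_rate, l2_hit_rate,
                          l1_cycles=3, l2_cycles=14, memory_cycles=200):
    """Average cycles per access for a simple two-level cache hierarchy.

    l1_cycles and l2_cycles follow the figures quoted in the article;
    memory_cycles and both hit rates are assumptions for illustration.
    """
    l1_miss_rate = 1.0 - l1_hit_rate
    l2_miss_rate = 1.0 - l2_hit_rate
    # Every access pays the L1 latency; L1 misses also pay for L2,
    # and L2 misses additionally pay for main memory.
    return l1_cycles + l1_miss_rate * (l2_cycles + l2_miss_rate * memory_cycles)


if __name__ == "__main__":
    # Assumed hit rates: 95% in L1, 90% of the remainder in L2.
    print(average_access_cycles(l1_hit_rate=0.95, l2_hit_rate=0.90))
```

With those assumed hit rates the average works out to roughly 4.7 cycles per access, and most of the extra cost over the 3-cycle L1 comes from the small share of accesses that fall through to main memory. That is the case for a large shared L2.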

Note that AMD’s K8 still has a few advantages over Core. The 2-way 64KB L1 caches on AMD’s K8 have a slightly better hit rate than the 8-way 32KB L1 caches on Core, with a similar latency. AMD’s K8 also used an on-die memory controller, lowering memory latency significantly, even though the faster FSB of Intel Core (relative to Netburst) lowered Core’s own latency. As stated in our microarchitecture overview at the time, the Athlon 64 X2’s memory advantage had gotten smaller, but a key element to the story is that these advantages were negated by other memory sub-system metrics, such as prefetching. Measured by ScienceMark, the Core microarchitecture’s L1 cache delivers 2x the bandwidth of the Athlon’s, and its L2 cache is about 2.5x faster.

158 Comments

  • Hrel - Thursday, July 28, 2016 - link

    10 years to double single core performance, damn. Honestly thought Sandy Bridge was a bigger improvement than that. Only 4 times faster in multi-core too.

    Glad to see my 4570S is still basically top of the line. Kinda hard to believe my 3 year old computer is still bleeding edge, but I guess that's how little room for improvement there is now that Moore's law is done.

    Guess if Windows 11 brings back normal functionality to the OS and removes "apps" entirely I'll have to upgrade to a DX12 capable card. But I honestly don't think that's gonna happen.

    I really have no idea what I'm gonna do OS wise. Like, I'm sure my computers won't hold up forever. But Windows 10 is unusable and Linux doesn't have proper support still.

    Computer industry, once a bastion of capitalism and free markets, rife with options and competition is now become truly monastic. Guess I'm just lamenting the old days, but at the same time I am truly wondering how I'll handle my computing needs in 5 years. Windows 10 is totally unacceptable.
  • Michael Bay - Thursday, July 28, 2016 - link

    I like how desperate you anti-10 shills are getting.
    More!
  • Namisecond - Thursday, July 28, 2016 - link

    I do not think that word means what you think it means...
  • TormDK - Thursday, July 28, 2016 - link

    You are right - there is not going to be a Windows 11, and Microsoft is not moving away from "apps".

    So you seem stuck between a rock and a hard place if you don't want to go on Linux or a variant, and don't want to remain in the Microsoft ecosystem.
  • mkaibear - Thursday, July 28, 2016 - link

    >Windows 10 is unusable

    Now, just because you're not capable of using it doesn't mean everyone else is incapable. There are a variety of remedial computer courses available, why not have a word with your local college?
  • AnnonymousCoward - Thursday, July 28, 2016 - link

    4570S isn't basically top of the line. It and the i5 are 65W TDP. The latest 91W i7 is easily 33% faster. Just run the benchmark in CPU-Z to see how you compare.
  • BrokenCrayons - Thursday, July 28, 2016 - link

    Linux Mint has been my primary OS since early 2013. I've been tinkering with various distros starting with Slackware in the late 1990s as an alternative to Windows. I'm not entirely sure what you mean by "doesn't have proper support" but I don't encourage people to make a full conversion to leave Windows behind just because the current user interface isn't familiar.

    There's a lot more you have to figure out when you switch from Windows to Linux than you'd need to learn if going from say Windows 7 to Windows 10 and the transition isn't easy. My suggestion is to purchase a second hand business class laptop like a Dell Latitude or HP Probook being careful to avoid AMD GPUs in doing so and try out a few different mainstream distros. Don't invest a lot of money into it and be prepared to sift through forums seeking out answers to questions you might have about how to make your daily chores work under a very different OS.

    Even now, I still keep Windows around for certain games I'm fond of but don't want to muck around with in Wine to make work. Steam's Linux-friendly list had gotten a lot longer in the past couple of years thanks to Valve pushing Linux for the Steam Box and I think by the time Windows 7 is no longer supported by Microsoft, I'll be perfectly happy leaving Windows completely behind.

    That said, 10 is a good OS at its core. The UI doesn't appeal to everyone and it most certainly is collecting and sending a lot of data about what you do back to Microsoft, but it does work well enough if your computing needs are in line with the average home user (web browsing, video streaming, gaming...those modest sorts of things). Linux can and does all those things, but differently using programs that are unfamiliar...oh and GIMP sucks compared to Photoshop. Just about every time I need to edit an image in Linux, I get this urge to succumb to the Get Windows 10 nagware and let Microsoft go full Big Brother on my computing....then I come to my senses.
  • Michael Bay - Thursday, July 28, 2016 - link

    GIMP is not the only, ahem, "windows ecosystem alternative" that is a total piece of crap on loonixes. Anything outside of the browser window sucks, which tends to happen when your code maintainers are all dotheads and/or 14 years old.
  • Arnulf - Thursday, July 28, 2016 - link

    I finally relegated my E6400-based system from its role as my primary computer and bought a new one (6700K, 950 Pro SSD, 32 GB RAM) a couple of weeks ago.

    While the new one is certainly faster at certain tasks the biggest advantage for me is significantly lower power consumption (30W idle, 90W under load versus 90W idle and 160-180W under load for the old one) and consequently less noise and less heat generation.

    Core2 has aged well for me, especially after I added a Samsung 830 to the system.
  • Demon-Xanth - Thursday, July 28, 2016 - link

    I still run an i5-750, NVMe is pretty much the only reason I want to upgrade at all.
