Nehalem - Everything You Need to Know about Intel's New Architectureby Anand Lal Shimpi on November 3, 2008 1:00 PM EST
- Posted in
Improved Loop Stream Detection
Core 2 featured a Loop Stream Detector (LSD), the point of this logic was to detect when the CPU was executing a loop in software, stop predicting branches (and potentially incorrectly predicting the last branch of the loop) and simply stream instructions out of the LSD:
The traditional prediction pipeline
The LSD active on Penryn
The branch prediction and fetch hardware could be disabled and in current Core 2 CPUs you could hold up to 18 instructions in the LSD and simply stream them over and over again intro the decode engine until the loop was completed or you ran out of instructions in the LSD.
The LSD active on Nehalem
In Nehalem, the LSD is moved behind the decoder and now caches decoded micro-ops. If a loop is detected, the branch prediction, fetch and decode hardware can now all be powered down and the LSD can stream directly into the re-order buffer. Nehalem can cache 28 micro-ops in its LSD, which actually works out to more “instructions” than what Core 2 could do.