Under The Hood of Celeron D

For an in-depth look at what's different with the new Celeron, the first 11 or so pages of our Pentium 4 E (Prescott) launch article do an excellent job of covering the bases. For a quick summary, here's a look at the major changes inside the Prescott core:
  • 90nm Strained Silicon Process - more, faster transistors in less space
  • 31 Pipeline Stages - for clock speed ramping
  • Improved Branch Predictor - helps avoid pipeline stall
  • Improved Scheduler - helps avoid doing unnecessary work
  • Improved Execution Core - added integer multiply and fast shift to ALU
  • Larger, Slower Caches - higher latency caches for speed and size scaling
  • SSE3 - 13 new instructions
The Celeron D also gets a frontside bus (FSB) speed increase, from 400MHz to 533MHz.

Even with the ominous 31-stage pipeline and higher-latency caches, we get better performance with the new Celeron D. So, how does all this stack up to make Prescott a better Celeron than Northwood? Well, let's take it step by step.

First of all, Prescott's 16KB L1 data cache has a significant impact on the Celeron. Northwood-based Celerons have only 8KB of L1 data cache. With 8KB more data stored on die "closer" (in terms of latency) to the processor, we will definitely see more requests hit in L1 and get back to the processor quicker, even though Celeron D has the same cache latency as the Pentium 4 E, which is much higher than Northwood's. Avoiding stalls, and recovering from them quickly, is critical: even though Celeron D has an increased L2 cache, its total on-die cache is still small, and cache misses will occur more often than on the Pentium 4.

When dealing with a processor short on cache and prone to very painful pipeline stalls, improving the average cache hit latency can really help keep extra stalls from happening (even a fast L2 hit takes about 25 cycles to come back on Prescott), and can help refill the pipeline once it's stalled (as more data can get back into the pipeline faster).

These extra 8KB of L1 cache are a much smaller portion of the Pentium 4 E's total cache size. Since the Pentium 4 E has fewer cache misses than Celeron D (it has four times the L2 cache), the larger L1 cache doesn't have as much opportunity to shine.
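The L1 argument can be made concrete with the standard average memory access time (AMAT) formula for a two-level cache. The sketch below is a toy model, not a simulation of either chip: apart from the roughly 25-cycle L2 hit mentioned above, every latency and miss rate is an assumed round number. It shows why cutting L1 misses saves more time when the L2 behind it misses more often.

```python
# Toy AMAT model. Only the ~25-cycle L2 hit latency comes from the article;
# all other latencies and miss rates are assumptions for illustration.

def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_latency):
    """Average memory access time (core cycles) for an L1 + L2 hierarchy."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency)

L1_HIT = 4      # assumed L1 load latency, same for both chips
L2_HIT = 25     # ~25-cycle L2 hit, as quoted for Prescott
MEM = 300       # assumed main-memory latency in core cycles

L1_MISS_8KB = 0.10   # assumed L1 miss rate with an 8KB L1
L1_MISS_16KB = 0.07  # assumed L1 miss rate with a 16KB L1

def l1_savings(l2_miss_rate):
    """Cycles saved per access by growing L1 from 8KB to 16KB."""
    before = amat(L1_HIT, L1_MISS_8KB, L2_HIT, l2_miss_rate, MEM)
    after = amat(L1_HIT, L1_MISS_16KB, L2_HIT, l2_miss_rate, MEM)
    return before - after

# Assumed local L2 miss rates: small Celeron-style L2 vs large P4-style L2.
small_l2_gain = l1_savings(0.30)
large_l2_gain = l1_savings(0.10)
print(f"small L2: saves {small_l2_gain:.2f} cycles per access")
print(f"large L2: saves {large_l2_gain:.2f} cycles per access")
```

With these made-up inputs, the same 8KB-to-16KB L1 change saves about 3.45 cycles per access behind the small L2 but only about 1.65 behind the large one, because each avoided L1 miss also avoids a share of the expensive trips to memory.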

Speaking of L2, the Celeron D has received an increase from 128KB in the current Celeron to 256KB. Even though this is still only a quarter of the (still insufficient) 1MB cache the Pentium 4 E has, we aren't going to see the same kind of performance drop we saw when moving from the Northwood Pentium 4 to the Celeron (which also had a quarter of its big brother's cache). The reason is that the number of cache hits increases rapidly with cache size at first and then hits a point of diminishing returns. The curve is similar to a logarithmic curve (benefits increase rapidly as cache size increases at first, but then level off quickly).

What it comes down to is that doubling a small cache (say, going from 128KB to 256KB) will have a much higher impact on performance (because the number of cache hits increases significantly) than doubling a larger cache (like going from 512KB to 1MB). In other words, the Pentium 4 E gets less benefit from its doubled L2 cache than the Celeron D gets from its own.
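The diminishing-returns argument can be illustrated with a power-law miss-rate model, the empirical "square-root rule" of thumb (miss rate roughly proportional to one over the square root of cache size). This is a swapped-in model rather than the logarithmic curve described above, and the starting miss rate and exponent are assumptions, not measurements of any Celeron or Pentium 4.

```python
# Power-law ("square-root rule") miss model: m(S) = M0 * (S / S0) ** -ALPHA.
# M0, S0 and ALPHA are illustrative assumptions, not measured data.

ALPHA = 0.5          # assumed exponent (sqrt rule of thumb)
M0, S0 = 0.40, 128   # assumed 40% L2 miss rate at 128KB

def miss_rate(size_kb: float) -> float:
    """Modeled L2 miss rate for a cache of size_kb kilobytes."""
    return M0 * (size_kb / S0) ** -ALPHA

def doubling_gain(size_kb: float) -> float:
    """Absolute drop in miss rate from doubling a size_kb cache."""
    return miss_rate(size_kb) - miss_rate(2 * size_kb)

for size in (128, 512):
    print(f"{size}KB -> {2 * size}KB: miss rate "
          f"{miss_rate(size):.3f} -> {miss_rate(2 * size):.3f}, "
          f"gain {doubling_gain(size):.3f}")
```

Under these assumptions, going from 128KB to 256KB lowers the modeled miss rate by about 11.7 percentage points, while going from 512KB to 1MB lowers it by only about 5.9 points, exactly half as much. Each doubling cuts the miss rate by the same relative factor (about 29%) in this model; what shrinks is the absolute number of misses eliminated, which is what drives the performance difference.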

While we're on the subject of caches and memory, the 533MHz frontside bus gets data from memory to the processor faster in case of a cache miss. This is very important in the low-cache environment of the Celeron world. Unfortunately, since we couldn't raise the multiplier, we couldn't run our 2.8GHz Celeron D 335 at 28x100 (a 400MHz FSB) instead of its stock 21x133 to see just what kind of impact bus speed has on the new processor.
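For a rough sense of what the faster bus buys, the arithmetic below relies on two properties of the Pentium 4 family bus (64 bits wide, four transfers per bus clock) and deliberately ignores DRAM latency and chipset overhead; the 64-byte transfer is an assumed example size, not a claim about the processor's line size. It compares the stock 21x133 setting of the 2.8GHz Celeron D 335 with the 28x100 setting we could not test.

```python
# Back-of-the-envelope FSB math for a quad-pumped, 64-bit front-side bus.
# Ignores DRAM latency and chipset overhead; the 64-byte transfer is assumed.

BUS_BYTES = 8        # 64-bit data bus
PUMP = 4             # four transfers per bus clock ("533MHz" = 4 x 133.33MHz)
CORE_GHZ = 2.8       # Celeron D 335 core clock

def peak_gb_per_s(bus_mhz: float) -> float:
    """Peak bus bandwidth in GB/s (decimal) for a given base bus clock."""
    return bus_mhz * PUMP * BUS_BYTES / 1000

def transfer_core_cycles(bus_mhz: float, nbytes: int = 64) -> float:
    """Core cycles spent only moving nbytes across the bus."""
    transfers = nbytes / BUS_BYTES
    # million transfers/s -> microseconds, then x1000 -> nanoseconds
    ns = transfers / (bus_mhz * PUMP) * 1000
    return ns * CORE_GHZ

settings = [("21x133 (533MHz FSB)", 400 / 3),   # 133.33MHz base clock
            ("28x100 (400MHz FSB)", 100)]
for label, bus_mhz in settings:
    print(f"{label}: {peak_gb_per_s(bus_mhz):.2f} GB/s peak, "
          f"{transfer_core_cycles(bus_mhz):.0f} core cycles per 64 bytes")
```

That works out to about 4.27GB/s versus 3.2GB/s of peak bandwidth, and roughly 42 versus 56 core cycles just to move 64 bytes, on top of whatever latency the memory itself adds.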

The enhancements Intel made to branch prediction and scheduling round out the factors that help make Prescott an excellent Celeron core. Since we're working with a small L2 cache, it is especially important to work with good data and avoid stalls for reasons other than cache misses, and Northwood is at a disadvantage to Prescott here. Better branch prediction helps avoid filling the cache with data from a mispredicted branch, and for the same reason helps avert unnecessary bubbles in the pipeline. Better scheduling also means more efficient use of the data available to the processor. Adding an integer multiply and fast shift/rotate to Prescott also helps the Celeron D maintain a high level of efficiency, but this really shouldn't have any greater impact on Celeron D than on Pentium 4.
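The branch-prediction point can be framed with a simple stall model: extra cycles per instruction from mispredictions equal the branch fraction times the mispredict rate times the flush penalty. The 31-stage depth is from the text and the 20-stage figure is Northwood's NetBurst pipeline; treating the penalty as equal to the pipeline depth, the 20% branch mix and the 5% mispredict rate are assumptions for illustration only.

```python
# Toy misprediction-stall model:
#   stall CPI = branch_fraction * mispredict_rate * flush_penalty
# Depths: 20 (Northwood) and 31 (Prescott). Penalty == depth, the branch
# fraction and the mispredict rates are all illustrative assumptions.

BRANCH_FRACTION = 0.20      # assumed: one instruction in five is a branch
NORTHWOOD_DEPTH = 20
PRESCOTT_DEPTH = 31

def mispredict_cpi(mispredict_rate: float, penalty_cycles: float) -> float:
    """Extra cycles per instruction lost to branch mispredictions."""
    return BRANCH_FRACTION * mispredict_rate * penalty_cycles

northwood = mispredict_cpi(0.05, NORTHWOOD_DEPTH)      # assumed 5% mispredicts
prescott_same = mispredict_cpi(0.05, PRESCOTT_DEPTH)   # same predictor, deeper pipe

# Mispredict rate Prescott needs just to match Northwood's stall CPI.
break_even_rate = 0.05 * NORTHWOOD_DEPTH / PRESCOTT_DEPTH

print(f"Northwood stall CPI:        {northwood:.3f}")
print(f"Prescott, same predictor:   {prescott_same:.3f}")
print(f"break-even mispredict rate: {break_even_rate:.2%}")
```

With these inputs, the deeper pipeline alone raises the stall cost from 0.20 to 0.31 cycles per instruction, and the predictor would have to cut mispredictions from 5% to about 3.2% (about 35% fewer) just to break even. Any improvement beyond that is a net gain.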

It all comes down to being resilient and efficient. Northwood is very dependent on its L2 cache size. The enhancements Intel made to Prescott to offset the large negative impact of adding so many pipeline stages really benefit the processor when it is starved for data: Prescott has to be more careful not to stall just to keep up with the current Pentium 4 line. As a result, the Celeron D copes better with a tight L2 cache budget, and these enhancements help even more when paired with a larger L2 cache than the Northwood-derived Celeron has.

54 Comments

  • Marlin1975 - Thursday, June 24, 2004

    Don't forget they were comparing an AMD chip that sells for 20% or more less. And also, the Sempron is AMD's new low-end line.
    Let's see how Celeron handles the Sempron :)
  • SDA - Thursday, June 24, 2004

    The hell? An XP 2200+ beating a 2500+ in compilation? I think you might need to rerun that one.. the 2500+ is clocked higher (only 33MHz higher, sure, but higher), it has more cache, and its FSB is faster. AFAIK, there is NO way in which it is worse than a 2200+, so it should not post worse numbers.
  • Minot - Thursday, June 24, 2004

    When are these going to be available? I'm sure I'd still pick an Athlon XP over the Celeron D line, but for competition's sake, it will be good to see a worthy value competitor from Intel in the marketplace.
  • PrinceGaz - Thursday, June 24, 2004

    Yes, Northwood Celerons have only 128K L2 cache while these Prescott Celeron 'D's have 256K.

    You could compare a Celeron D at 20x100 with an original Willamette-core P4 2GHz (as they also had 256K L2 and 400FSB) if you wanted to compare the core architectures while excluding L2 cache and FSB. The gap would probably be a lot narrower.
  • Zebo - Thursday, June 24, 2004

    Typo above: I meant AMD still owns price and performance with a two year old part.:)
  • Illissius - Thursday, June 24, 2004

    Second Yomicron. I was under the impression that Northwood Celerons have only 128KB cache. (Makes sense, considering each has a fourth of its P4 counterpart's cache.)
    Also, iirc there was something of a price parity between Celerons and equivalently rated AXPs, so while these are certainly improvements (and not small ones either), they still fall clearly behind in price/performance (the 2.8GHz usually lost to the 2600+ as well as a few lower models).
  • Zebo - Thursday, June 24, 2004

    AMD will still owns price to performance with their 2 year old parts and even more so with Semiporn. But this is still wonderful news for 2004 beleaguered Intel. Let's see pricing... should be worth $60-$90 starting.
  • Yomicron - Thursday, June 24, 2004

    I think there is a mistake about L2 cache sizes. It says that both the Prescott and Northwood based Celerons have the same amount of L2 cache. However, the Prescott version has 256KB while the desktop Celerons based on the Northwood core only have 128KB.
  • blackarc - Thursday, June 24, 2004

    hmm... if only i could use them in a dual system :D
  • Budman - Thursday, June 24, 2004

    How much does it overclock to??
