Under The Hood of Celeron D

For an in-depth look at what's different with the new Celeron, the first 11 or so pages of our Pentium 4 E (Prescott) launch article do an excellent job of covering the bases. For a quick summary, here's a look at the major changes inside the Prescott core:
  • 90nm Strained Silicon Process - more, faster transistors in less space
  • 31 Pipeline Stages - for clock speed ramping
  • Improved Branch Predictor - helps avoid pipeline stalls
  • Improved Scheduler - helps avoid doing unnecessary work
  • Improved Execution Core - added integer multiply and fast shift to ALU
  • Larger, Slower Caches - higher latency caches for speed and size scaling
  • SSE3 - 13 new instructions
The Celeron D gets an additional bonus of an FSB speed increase from 400MHz to 533MHz as well.

Even with the ominous 31-stage pipeline and higher latency caches, we get better performance with the new Celeron D. So, how does all this stack up to make Prescott a better Celeron than Northwood? Well, let's take it step by step.

First of all, Prescott's 16KB L1 cache has a significant impact on the Celeron. Northwood-based Celerons have only 8KB of L1 cache. With 8KB more on-die data stored "closer" (in terms of latency) to the processor, more accesses will hit in L1 and get to the processor quickly, even though the Celeron D inherits the cache latency of the Pentium 4 E, which is much higher than Northwood's. Getting data back quickly is critical: even though Celeron D has an increased L2 cache, its on-die memory is still small, and cache misses will occur more often than on the Pentium 4.

When dealing with a processor that is short on cache and prone to very painful pipeline stalls, improving the average cache hit latency can really help. It keeps extra stalls from happening (a fast L2 hit will come back in about 25 cycles on Prescott), and it helps refill the pipeline once it's stalled, as more data can get back into the pipeline faster.

This extra 8KB of L1 cache is a much smaller portion of the Pentium 4's total cache size. Since Pentium 4 E has fewer cache misses than Celeron D (it has 4 times the L2 cache), improvements to the L1 cache size don't have as much opportunity to shine.
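
To put rough numbers on the L1 argument above, here is a minimal sketch of the usual average-access-time calculation. Only the 25-cycle L2 hit figure comes from the text; the L1 latency, the memory latency, and every hit rate are made-up values chosen for illustration, not measurements of either chip.

```python
def average_access_cycles(l1_hit_rate, l2_hit_rate,
                          l1_cycles=4, l2_cycles=25, memory_cycles=300):
    """Average cycles per memory access for a two-level cache hierarchy.

    l2_cycles=25 is the L2 hit figure quoted in the article; l1_cycles and
    memory_cycles are illustrative assumptions.
    """
    l1_miss_rate = 1.0 - l1_hit_rate
    l2_miss_rate = 1.0 - l2_hit_rate
    # Every access pays the L1 lookup; L1 misses also pay the L2 lookup;
    # L2 misses additionally pay the trip to main memory.
    return l1_cycles + l1_miss_rate * (l2_cycles + l2_miss_rate * memory_cycles)


# Hypothetical: the larger L1 raises the L1 hit rate from 90% to 93%.
SMALL_L1, LARGE_L1 = 0.90, 0.93
# Hypothetical L2 hit rates: cache-starved Celeron D vs. Pentium 4 E.
CELERON_L2, PENTIUM_L2 = 0.80, 0.95

celeron_saving = (average_access_cycles(SMALL_L1, CELERON_L2)
                  - average_access_cycles(LARGE_L1, CELERON_L2))
pentium_saving = (average_access_cycles(SMALL_L1, PENTIUM_L2)
                  - average_access_cycles(LARGE_L1, PENTIUM_L2))

print(f"Celeron D saves   {celeron_saving:.2f} cycles per access")
print(f"Pentium 4 E saves {pentium_saving:.2f} cycles per access")
```

Under these assumed numbers, the same L1 improvement saves roughly twice as many cycles per access on the chip with the smaller L2, because each L1 miss it avoids was more expensive to begin with.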

Speaking of L2, the Celeron D has received an increase from 128KB in the current Celeron to 256KB. Even though this is still a quarter of the (still insufficient) 1MB cache the Pentium 4 E has, we aren't going to see the same type of performance drop we saw when moving from the Northwood Pentium 4 to Celeron (which also had a quarter of its big brother's cache). The reason is that the number of cache hits increases rapidly as a cache grows and then hits a point of diminishing returns after a certain size. The curve is similar to a logarithmic curve: benefits increase rapidly as cache size increases at first, but then level off quickly.

What it comes down to is that doubling a small cache (say, going from 128KB to 256KB) will have a much higher impact on performance (because the number of cache hits is significantly increased) than doubling a larger cache (like going from 512KB to 1MB). In other words, P4 E gets less benefit from its doubled L2 cache than Celeron D.
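
The diminishing-returns argument can be sketched with a toy model. We assume here that miss rate falls off as a power of cache size (a common rule of thumb, swapped in for the "logarithmic" curve described above); the 20% starting miss rate and the 0.5 exponent are hypothetical, not measured values for either core.

```python
def miss_rate(size_kb, base_miss_rate=0.20, base_size_kb=128, exponent=0.5):
    """Hypothetical L2 miss rate that falls off as a power of cache size."""
    return base_miss_rate * (size_kb / base_size_kb) ** -exponent


def hit_rate_gain(from_kb, to_kb):
    """Extra fraction of accesses that hit after growing the cache."""
    return miss_rate(from_kb) - miss_rate(to_kb)


celeron_gain = hit_rate_gain(128, 256)    # Northwood Celeron -> Celeron D
pentium_gain = hit_rate_gain(512, 1024)   # Northwood P4 -> Pentium 4 E

print(f"128KB -> 256KB: +{celeron_gain:.1%} of accesses now hit")
print(f"512KB -> 1MB:   +{pentium_gain:.1%} of accesses now hit")
```

In this model, each doubling removes the same fraction of the remaining misses, so the doubling that starts from the smaller cache (and the larger miss rate) converts about twice as many accesses into hits.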

While we're on the subject of caches and memory, the 533MHz frontside bus gets data from memory to the processor faster in the case of a cache miss. This is very important in the low-cache environment of the Celeron world. Unfortunately, we couldn't increase our multiplier and run our 2.8GHz Celeron 335 at 28x100 to see just what kind of impact bus speed has on the new processor.
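
For reference, the bus arithmetic behind those numbers can be sketched as follows. This assumes the Pentium 4 style frontside bus is 64 bits wide and quad-pumped (four transfers per base clock), which is why a "400MHz" bus corresponds to the 100MHz base clock in the 28x100 setting; the function names are ours and purely illustrative.

```python
BUS_WIDTH_BYTES = 8        # assumption: 64-bit frontside bus
TRANSFERS_PER_CLOCK = 4    # assumption: quad-pumped bus


def peak_bandwidth_gb_s(base_clock_mhz):
    """Theoretical peak FSB bandwidth in GB/s (1 GB = 10**9 bytes)."""
    transfers_per_second = base_clock_mhz * 1e6 * TRANSFERS_PER_CLOCK
    return transfers_per_second * BUS_WIDTH_BYTES / 1e9


def multiplier(core_mhz, base_clock_mhz):
    """Core multiplier needed to reach core_mhz on a given base clock."""
    return core_mhz / base_clock_mhz


OLD_BASE_MHZ = 100.0        # "400MHz" FSB
NEW_BASE_MHZ = 400.0 / 3.0  # "533MHz" FSB (133.33MHz base clock)

old_bandwidth = peak_bandwidth_gb_s(OLD_BASE_MHZ)
new_bandwidth = peak_bandwidth_gb_s(NEW_BASE_MHZ)
old_multiplier = multiplier(2800, OLD_BASE_MHZ)   # the 28x100 test we wanted
new_multiplier = multiplier(2800, NEW_BASE_MHZ)   # stock 2.8GHz Celeron D

print(f"400MHz FSB: {old_bandwidth:.2f} GB/s peak, {old_multiplier:.0f}x for 2.8GHz")
print(f"533MHz FSB: {new_bandwidth:.2f} GB/s peak, {new_multiplier:.0f}x for 2.8GHz")
```

These are theoretical peaks only; real memory throughput also depends on the chipset and the memory itself.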

The enhancements Intel made to branch prediction and scheduling round out the factors that help make Prescott an excellent Celeron core. Since we're working with a small L2 cache, it is especially important to work with good data and avoid stalls for reasons other than cache misses. Northwood is at a disadvantage to Prescott here. Better branch prediction helps avoid filling the cache with data from a mispredicted branch, and it averts unnecessary bubbles in the pipeline for the same reason. Better scheduling means more efficient use of the data available to the processor as well. Northwood is stuck on these two counts. Adding an integer multiplier and fast shift/rotate to Prescott also helps the Celeron D maintain a high level of efficiency, but this really shouldn't have any greater impact on Celeron D than on Pentium 4.

It all comes down to being resilient and efficient. Northwood is very dependent on its L2 cache size. The enhancements Intel made to Prescott in order to avoid the large negative impact of adding so many pipeline stages really benefit the processor when it is starved for data. Prescott has to be more careful not to stall just to keep up with the current Pentium 4 line. As a result, the Celeron flavor can deal with tighter constraints on L2 cache size, which helps even more when it is paired with a larger cache than the Northwood-derived version.
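
A back-of-the-envelope sketch shows why a longer pipeline has to be "more careful not to stall." It assumes a mispredicted branch costs roughly one full pipeline's worth of cycles, which is a simplification, and it uses the commonly cited 20-stage figure for Northwood (not stated in this section). The branch fraction and misprediction rate are hypothetical.

```python
def mispredict_stall_cycles(branch_fraction, mispredict_rate, pipeline_stages):
    """Average stall cycles per instruction lost to branch mispredictions.

    Simplifying assumption: each misprediction flushes the whole pipeline,
    so its penalty equals the pipeline depth in cycles.
    """
    return branch_fraction * mispredict_rate * pipeline_stages


NORTHWOOD_STAGES = 20   # assumption: commonly cited depth, not from this page
PRESCOTT_STAGES = 31    # from the feature list above
BRANCH_FRACTION = 0.20  # hypothetical: one instruction in five is a branch
MISPREDICT_RATE = 0.05  # hypothetical predictor miss rate

northwood_cost = mispredict_stall_cycles(
    BRANCH_FRACTION, MISPREDICT_RATE, NORTHWOOD_STAGES)
prescott_cost = mispredict_stall_cycles(
    BRANCH_FRACTION, MISPREDICT_RATE, PRESCOTT_STAGES)

# Misprediction rate Prescott would need just to match Northwood's cost.
break_even_rate = MISPREDICT_RATE * NORTHWOOD_STAGES / PRESCOTT_STAGES

print(f"Northwood: {northwood_cost:.2f} stall cycles per instruction")
print(f"Prescott, same predictor: {prescott_cost:.2f} stall cycles per instruction")
print(f"Prescott break-even misprediction rate: {break_even_rate:.2%}")
```

With these assumed inputs, Prescott needs a noticeably lower misprediction rate merely to break even, which is the sense in which its predictor and scheduler improvements are a necessity rather than a luxury.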


51 Comments


  • JeremiahTheGreat - Monday, July 26, 2004

    I bought a Celeron D 320 (2.4Ghz).. running it at 3.2Ghz as we speak! I know.. why would someone buy it to replace a XP2700+.. and that I cannot answer :)
  • Minot - Wednesday, June 30, 2004

    Has anyone seen these processors for sale? I thought we'd see them available for sale by now.
  • Karaktu - Monday, June 28, 2004

    What's funny about all the hype surrounding the Celeron "D" is that it is no different than what some of us have been doing with Mobile Celeron CPUs for months (except the "D" has SSE3).

    Buy a 100MHz FSB Mobile Celeron, crank it up to 200MHz FSB, and you have a CPU that can hold its own.

    I had a for sale thread awhile back that gives you plenty of info:

    http://forums.anandtech.com/messageview.cfm?catid=...

    And a screen shot of a 1.6GHz CPU at 2.13GHz (133FSB)

    http://tschidanet.com/forsalepics/213.jpg

    So maybe this is an instance of Intel paying attention to what the overclockers are doing. Then again, probably not...

    Joe
  • Spacecomber - Saturday, June 26, 2004

    First off, let me say that I'm a long time fan of AnandTech, so my criticisms are hopefully constructive ones.

    It seems to me that this article suffers from taking something of a cookie cutter approach to reviewing these new processors. In other words, it talks about the processor's new architecture and then runs a bunch of benchmarks with an eye to seeing whether the new architecture actually demonstrates "real world" benefits. This is all fine, but I think the review would have been better if the writer had taken a bit more time to think about what possible interests the typical AnandTech reader might have in this chip. While the article successfully shows how the Celeron D is an improvement over the previous P4 based Celeron, and this in itself is newsworthy, it still leaves many AnandTech readers with a number of unanswered questions, as they wonder whether this new processor is really something that they should take an interest in.

    You've already seen and noted many of these questions, such as whether this processor can be easily overclocked and how it performs in comparison to other kinds of processors, such as full blown P4's in roughly the same price range as the top end Celeron D.

    Before actually suggesting some questions for AnandTech staff to think about for a potential follow-up article, let me mention a previous Celeron update that has some similarities to this most recent one: the Tualatin Celeron. If you think a bit about what made this processor so interesting, i.e., new architecture allowed for better performance than its predecessor, backward compatibility (including PII motherboards with an adaptor), and easy overclockability on motherboards supporting frontside bus speeds faster than the default speed for this processor, I think you can better imagine some of the questions that readers will be thinking about with regard to this latest Celeron.

    So, here are my questions. Whether it will overclock has already been asked, but are these new Celerons backward compatible with older chipsets supporting a 533 MHz bus, such as the 850E, E7205, or the 845PE? Does this new Celeron have hyperthreading? How do these new Celerons fit into some sort of a bang for the buck curve, both at their default speeds and overclocked (assuming that they can be overclocked), compared to other processors?

    I hope this is helpful, and I look forward to your future articles.

    Space
  • davidbec - Friday, June 25, 2004

    Since the Celeron D costs about $117, it would only be fair to include the Athlon XP 2800+ in the review, for reasons of price comparison. The reviewer himself expressed his distaste when resellers charge customers to "upgrade" computers from Athlon XP processors to Northwood Celeron.

    Let justice be done. Let your viewers know the truth. Include an Athlon XP 2800+ in the review.

    In addition, the AXP 2600+ is supposed to match the P4 2.6. To be fair to the less informed viewers, include the AXP 2800+ so that Intel's 2.8 chip can be matched with AMD's supposed equivalent, which is the Athlon XP 2800+.

    Otherwise a great review!! Good job.

    D
  • Zebo - Friday, June 25, 2004

    We definitely note the request for heat, overclocking, and Pentium 4 Prescott comparisons ...
    ----------------------
    While you're at it, throw in a $100 air cooled mobile Barton @2600MHz and watch the beating Intel takes.
  • johnsonx - Friday, June 25, 2004

    I'd like to second (or third, or whatever) the call for at least adding a Prescott 2.4A to the benchmark mix. The 2.4A's play in the same pricing ballpark as the higher-clocked Celeron D's, and a certain large chain store often sells a bundle of a P4 2.4A and an ECS i848 board for $120 or $130, depending on the week. That bundle makes the 2.4A cheaper than the cheapest Celeron D (though nothing compared to the XP 2500+ and NForce2 bundles for $70 a few weeks ago!)

    I won't name said store, but just think of the potato-based fat sticks you get with a burger in the drive-thru... (sorry, they're on the west coast and Texas only, though I imagine that other stores in other places offer similar bundles).
  • DerekWilson - Friday, June 25, 2004

    We definitely note the request for heat, overclocking, and Pentium 4 Prescott comparisons ...

    We hear your requests, and will look into our review schedule and see if we have room for a follow up.

    Thanks,
    Derek Wilson
  • Minot - Thursday, June 24, 2004

    Can we get a comparison of a P4 2.4A (Prescott, 1MB L2 cache, 533 MHz FSB) compared to these new Celeron D processors?
  • Pumpkinierre - Thursday, June 24, 2004

    Yeah, there's something more to this than meets the eye. I don't really follow your cache arguments, Derek (and I'm known not to like caches when they are irrelevant). To me, what applied to the P4E applies to the Celeron D. It's a pity you didn't throw in a 533MHz 2.8E in your benchmarks. I predicted the Prescott Celeron would be a good buy, but more on the basis of less heat and better o'clocking. The only conclusion I can come to from all this is that the Prescott core is better than we think but the cache structure is the problem. Else they've changed something in the pipeline architecture of these Celeron Ds, which may have ramifications for later stepping P4Es.
