The 5 Things that Comprise Dothan

There are five basic parts of Dothan that differentiate it from Banias, but unfortunately (just as was the case with Banias), Intel is not very forthcoming with details about Dothan out of a desire to guard its intellectual property. Even a year after the Pentium M's release, we have yet to see any serious competition for it, and Intel wants it to remain that way for as long as possible.

That being said, we will try to be as specific about the details of Dothan as possible, and we'll start with the most obvious: its 90nm process.

90nm process and 2MB L2

Banias was built on Intel's 0.13-micron manufacturing process when that process was at its peak. The tried and true manufacturing process meant that Banias faced no manufacturing delays and could hit its target clock speeds without a problem.

Dothan gets its most noticeable improvements over Banias thanks to the move to Intel's smaller 90nm manufacturing process. This is the same process that's used in the manufacturing of Prescott, which means a couple of things. For starters, it explains why availability of Dothan hasn't been incredible since its launch, as 90nm production is still ramping. The availability problem aside, 90nm gives Dothan the ability to cram almost twice as many transistors onto the chip without increasing the overall die size compared to Banias.

Dothan is now a 140 million transistor chip (up from 77 million in Banias), with those 140 million transistors occupying almost the same die area as Banias: 84 mm² for Dothan, about 1 mm² more than Banias. Almost twice the transistors with no meaningful increase in die size? It's a chip manufacturer's dream. Because the die size is essentially unchanged, yields should not differ between Banias and Dothan (once Intel's 90nm process has truly matured) and it shouldn't cost Intel any more to produce Dothan than it did Banias.
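To put those figures in perspective, here is a quick back-of-the-envelope calculation of transistor density. It is only a sketch using the numbers quoted above; the 83 mm² Banias area is our assumption, derived from the "about 1 mm² smaller" remark, and the variable names are ours.

```python
# Figures quoted in the article.
DOTHAN_TRANSISTORS = 140e6
BANIAS_TRANSISTORS = 77e6
DOTHAN_AREA_MM2 = 84.0
BANIAS_AREA_MM2 = 83.0  # assumption: "about 1 mm^2 smaller" than Dothan


def density(transistors, area_mm2):
    """Return transistors per square millimetre."""
    return transistors / area_mm2


dothan_density = density(DOTHAN_TRANSISTORS, DOTHAN_AREA_MM2)
banias_density = density(BANIAS_TRANSISTORS, BANIAS_AREA_MM2)

transistor_ratio = DOTHAN_TRANSISTORS / BANIAS_TRANSISTORS
density_ratio = dothan_density / banias_density

print(f"Transistor count ratio: {transistor_ratio:.2f}x")
print(f"Banias density: {banias_density / 1e6:.2f} M transistors/mm^2")
print(f"Dothan density: {dothan_density / 1e6:.2f} M transistors/mm^2")
print(f"Density ratio: {density_ratio:.2f}x")
```

Under these assumptions, both the transistor count and the density come out at roughly 1.8 times those of Banias, which is what "almost twice" amounts to.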

The majority of the increase in transistor count is thanks to Dothan's 2MB L2 cache, twice the size of Banias' 1MB cache. The 64KB L1 cache remains the same as in Banias.

We believe that Intel is using the same 90nm SRAM cells from Prescott in Dothan. If so, then the extremely small 84 mm² die is further enabled by the significantly smaller 90nm SRAM cells that Intel developed. However, we are not clear as to how independent Banias and Dothan's SRAM cell design remains from the desktop chips, given their unique power requirements.

Along with a larger L2 cache, Intel has increased how aggressively Dothan prefetches data into its cache in order to take advantage of the extra on-die L2. This is a fairly normal practice that microprocessor designers employ to help improve performance whenever an architecture stays the same but its cache size increases.

The 90nm process will also allow Dothan to scale up in clock speed, thanks in part to Intel's strained silicon technology, something that we're already seeing the fruits of today with its introductory 2GHz clock speed (up from Banias' 1.6GHz intro speed). Dothan will break the 2GHz barrier by the end of 2004. Remember that Intel's design philosophy with Dothan, just like Banias, is to design the chip for a specific power consumption and to leave clock speed scaling mostly up to the manufacturing process to enable.

Dothan's 90nm manufacturing process, in the end, gives it the higher clock speeds and larger L2 cache, which offer some of the more tangible advantages over Banias. Another very important fact to keep in mind is that these are the only major changes to Banias that make up Dothan; unlike Prescott, the pipeline has not been changed at all. Even Intel's Dothan design team views Prescott as a bit of a risky move, to try out significant modifications to the architecture alongside a brand new manufacturing process. Thus, it's no surprise that Dothan remains relatively unchanged architecturally outside of the move to 90nm; the pipeline and L1 cache are identical to Banias.

Micro Ops Fusion

Intel has been deliberately vague about Banias' micro ops fusion, and it continues to be so about the modifications to the micro ops fusion engine in Dothan. All that we are allowed to publish is that Dothan now allows more types of micro ops to be fused. That isn't a bad thing; it would just be nice to know which ones, and what enables Dothan to support the fusing of more micro ops.

Local Branch Prediction Improvements

With Dothan, there have been some improvements to branch prediction in order to reduce power consumption and increase performance. Remember that the fewer branch mispredicts you have, the less power is wasted on refilling the pipeline after a flush.

One of the biggest improvements to Dothan's branch predictors is in its loop detector. Although most don't think of a loop as a branch, all loops either end or begin with some sort of a comparison statement that determines whether the loop should continue to execute (e.g. if i < 10, then keep looping). Loops are normally handled by a static branch predictor that always predicts taken once a loop is detected, and usually the only mispredicts that exist once a loop is detected are at the end of the loop. While this works fine for larger loops (100+ iterations), it does not work so well for extremely small loops (e.g. 5 iterations). What ends up happening is that the 5th, 6th and 7th time around, the predictor will mispredict a taken branch when, actually, the loop is finished with. Mispredicting 3 times for a loop that only runs for 5 iterations does not help branch prediction accuracy, so we have a problem on our hands.

Dothan includes a more sophisticated algorithm in its detection and prediction of branches involving small loops; once again, Intel was purposely vague about exactly what Dothan does that Banias did not, but just know that Dothan has better overall branch predictor performance, thanks to modifications like improved detection of short loops.
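Since Intel won't say what Dothan actually does, the following is purely an illustrative toy model, not Intel's algorithm. It compares an always-taken predictor with a hypothetical predictor that remembers a loop's previous trip count, and it deliberately simplifies things by charging the always-taken predictor only one mispredict per loop run (at the exit). All function names are ours.

```python
def loop_branch_outcomes(trip_count):
    """Outcomes of the loop-closing branch for one run of a loop:
    taken (True) trip_count - 1 times, then not taken (False) once."""
    return [True] * (trip_count - 1) + [False]


def always_taken_mispredicts(trip_count, runs):
    """Static predictor: always guesses that the loop continues."""
    mispredicts = 0
    for _ in range(runs):
        for outcome in loop_branch_outcomes(trip_count):
            if not outcome:  # guessed taken, but the loop exited
                mispredicts += 1
    return mispredicts


def trip_count_mispredicts(trip_count, runs):
    """Hypothetical loop predictor: remembers how many times the branch
    executed during the previous run and predicts the exit at that count."""
    learned = None
    mispredicts = 0
    for _ in range(runs):
        seen = 0
        for outcome in loop_branch_outcomes(trip_count):
            seen += 1
            predicted_taken = True if learned is None else seen < learned
            if predicted_taken != outcome:
                mispredicts += 1
        learned = seen  # remember this run's trip count for next time
    return mispredicts


def mispredict_rate(mispredicts, trip_count, runs):
    """Fraction of loop-branch executions that were mispredicted."""
    return mispredicts / (trip_count * runs)


for n in (5, 100):
    static = always_taken_mispredicts(n, runs=1000)
    learned = trip_count_mispredicts(n, runs=1000)
    print(f"{n:>3} iterations: always-taken {mispredict_rate(static, n, 1000):.1%}, "
          f"trip-count {mispredict_rate(learned, n, 1000):.3%}")
```

Even in this simplified model, the same single mispredict per run costs a 5-iteration loop far more, proportionally, than a 100-iteration loop, which is why short loops are the ones worth special-casing.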

Faster Integer Division

When moving to a smaller manufacturing process, it's often possible to include logic that didn't make the cut originally due to space constraints. Such is the case with Dothan and its integer division performance. Once again, all we know is that integer division is faster on Dothan; we have no idea how much faster or why.

Enhanced Register Access Manager

As we mentioned at the beginning of this article, much of what went into Dothan were tweaks to Banias that couldn't be implemented without pushing the design completion date further out. One such fix that didn't make it to Banias was a workaround to a register access issue that caused the entire pipeline to stall in Banias. The situation was a unique one, where a partial register write followed by a full register read would cause the pipeline to stall. Dothan features a workaround for the problem and there is no longer a performance penalty for performing a partial register access followed by a full register access.
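For readers unfamiliar with the pattern, here is a small sketch of what "a partial register write followed by a full register read" means at the architectural level. It models only the data dependency, not the pipeline or Intel's workaround, and the AL/EAX naming is our own illustrative assumption, since Intel did not say which registers are involved.

```python
MASK32 = 0xFFFFFFFF


def write_low_byte(reg32, value8):
    """Partial write: replace bits 0-7 and keep bits 8-31,
    like writing AL inside EAX (illustrative naming)."""
    return (reg32 & 0xFFFFFF00) | (value8 & 0xFF)


eax = 0x12345678
eax = write_low_byte(eax, 0xAB)  # partial register write
full = eax & MASK32              # full register read

print(hex(full))
```

The full read needs both the old upper 24 bits and the freshly written low byte, so the two have to be merged before the read can proceed; that merge is the dependency behind the pattern described above.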


28 Comments


  • phtbddh - Wednesday, July 21, 2004 - link

    What is the battery life of a Dothan compared to a Banias? I know the Dothan is suppose to be better, but can we see some numbers?
  • tfranzese - Wednesday, July 21, 2004 - link

    Not quite SKiller, a large part of the P4's dominance in media encoding is the high core frequency attributed to such a long pipeline.
  • SKiller - Wednesday, July 21, 2004 - link

    I think the assertion that..

    "With Intel's vision for the future being centered on media encoding and content creation, the Pentium M is the last thing that Intel would want to build their future desktop CPUs around."

    ..may not be correct as by your own admission:

    "Partially constrained by its 400MHz FSB and single channel memory interface, the Pentium M is not the successor to the Pentium 4 that many will make it out to be."

    So all Intel would have to do is up the FSB on a desktop version to improve media encoding and content creation performance and be competitive with P4.
  • mkruer - Wednesday, July 21, 2004 - link

    you know i wonder just how much of the preformance is gained from the 2MB of L2 cache. If I recall from Aceshardware the 2MB is the sweetspot For mico op code, any more, and there is a preformance hit in either direction, Also on a side note. The 90nm Athlon 64 show a ~5% improvement across the board.
  • dvinnen - Wednesday, July 21, 2004 - link

    Yea, I was wondering the same thing. Why not just use a mobile A64 system with a mobile 9600. Acer and emachines make systems with them.
  • alexruiz - Wednesday, July 21, 2004 - link

    Another one: Was that difficult to get an eMachines M68xx for the review? Mobile against mobile.
  • alexruiz - Wednesday, July 21, 2004 - link

    Anand made a huge mistake in the Athlon 64 CPU selection. The mobile [b]A64 3000+ is clocked at 1.8 GHz with a 1MB L2 cache[/b]. He used a desktop 2.0 GHz with 512 K. This will affect the outcome, specially because clock speed matters more cache.

    I knew Dotham was going to give a very good fight, but I didn't expect it to win any gaming application ot Business Winstone. As reference, my M6805 A64 3000+ scores 22.2 and 27.8 in the BW and CCMW tests (7K60 hard drive, so not the same setup)

    A very good review, but we can do better. I still want to see video encoding tests run with a commercial application, preferably 3 (Ulead Video Studio 8, Roxio Videowave 7, Pinnacle 9) and 2 alternative programs for DivX encoding (DVD2AVI and virtualdubmod are suggested. We have seen enough XMPEG from other sites)

    Run some photoedition benchmarks not only with Adobe, but also with Corel Photopaint 11 or Roxio Photosuite.

    AutoCAD is also expected to give an idea of what be attained. SolidWorks or UG would be fantastic, but those 2 are more of a wish.

    How about more scientific or technical programs? Electrical simulators (PSpice for example), FEA (Nastran), MathCAd, Maple, etc.

    More games were expected to be run. Howe about chess programs? How about OSmark, the succesor of COSBI by Van Smith?

    I stressed the use of 2 or more applications that do the same to highlight the fact that software optimization matters a lot and that some myth about a CPU being "the best for that activity" are only myths.

    All in all, Dotham is a potent rival that uncovers some weaknesses in the K7/K8 architecture that were noticeable against the P6 (Pentium II/III) but forgotten against the P7 (Pentium 4): [b]L2 cache performance[/b] and integer performance.

    Regarding battery life keep in mind that the CPU is not the biggest spender in a laptop, the screen is. The K8T800, the most popular chipset for AMF64 laptops is a desktop part, and is quite voracious. Keep those factor when battery life is evaluated.

    I foresee that SOI will give AMD the edge in battery life once they implement power saving caches, the biggest energy conservation feature in the P-M.

    Comments are welcome


    Alex
  • dacaw - Wednesday, July 21, 2004 - link

    Well Dothan looks very much like a copy of a 32-bit AthlonXP to me.

    Comparing it to an Athlon64 makes no sense. Dothan is not 64-bit.

    I bought an AthlonXP Barton mobile 2600 for $99 and it runs barely warm under PowerNow. What could you buy for the price of a Dothan? Maybe 5 top-of-the-line Athlon XPs?

    Let's compare apples to apples and have a review of top-of-the line Dothan to top-of-the-line AthlonXP.

    Oh, and drop those fake synthetic benchmarks. What point are they if they simply "favor" Intel processors (your comment in the review).

    Come on Anand, lets have a review that really means something. Please!
  • Jeff7181 - Wednesday, July 21, 2004 - link

    Can't wait to see battery life tests.
  • mino - Wednesday, July 21, 2004 - link

    Nice review, however it is a shame you didn't include Celeron 2.4 (which could be find in many SLOW notebooks) and also AXP-M 2600+ would be nice. -> this way it would be a complete notebook market review. - The best one.

    I'll love to see bench results of Cely and XP added (by using same desktop platform as you did in case of P4)

    mino
