Presler vs. Smithfield - A Brief Look

Other than the larger L2 cache, Presler as incorporated in the Pentium Extreme Edition 955 provides us with two more enhancements over Smithfield: 1066MHz FSB support and a higher clock speed (3.46GHz).

We wanted to isolate the performance improvement due to the larger L2 cache aside from the other improvements to Presler, so we underclocked our sample and its FSB, and compared it to a Pentium D 820 (2.8GHz). 

Looking at a small subset of our tests, we can get a feel for where you can expect the largest performance gains due simply to the increase in L2 cache size. Remember that L2 access latency on Smithfield was already 27 cycles and Presler's cache isn't any slower, so what we end up measuring is how large an impact a 2MB cache has in some of our benchmarks.

Winstone
             Business Winstone 2004   Multimedia Content Creation Winstone 2004
Presler      19.0                     30.2
Smithfield   18.5                     29.9

Under Business Winstone 2004, we see a boost of just under 3%, thanks to the larger cache size. The biggest improvements that we have seen in Winstone have come from lower latency caches and higher clock speeds, so it's not too much of a surprise to see a minimal impact here. Content Creation Winstone 2004 shows no real performance impact either.

Media Encoding
             3dsmax 7 Composite   DVD Shrink   WME9      H.264   iTunes
Presler      2.03                 9.1m         31.3fps   10.5m   50s
Smithfield   2.05                 8.9m         31.0fps   10.5m   50s

Our 3D rendering, video encoding and audio encoding tests basically all agree with the earlier results - the added cache doesn't really improve performance here, but that's to be expected, given the nature of the applications (and the already quite large 1MB L2 cache to which we are comparing). 

Gaming
             Battlefield 2   Call of Duty 2   Quake 4
Presler      77.3            76.2             130.6
Smithfield   73.0            75.6             125.5

It isn't until we look at some of our 3D gaming tests that we start to see more tangible performance gains. In games, there are some decent performance improvements to be had, ranging from under 1% (Call of Duty 2) to just under 6% (Battlefield 2), thanks to the larger cache alone.
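The percentages quoted for these tests follow directly from the tables. As a rough check, here is a minimal Python sketch of the arithmetic using the gaming scores above. The helper name `percent_gain` is ours, and the sketch assumes that all three scores are higher-is-better (the table does not state units).

```python
def percent_gain(new, old):
    """Relative improvement of `new` over `old`, in percent.

    Assumes a higher-is-better score (an assumption; the table gives no units).
    """
    return (new - old) / old * 100.0


# Gaming scores from the table above: (Presler, Smithfield)
gaming = {
    "Battlefield 2": (77.3, 73.0),
    "Call of Duty 2": (76.2, 75.6),
    "Quake 4": (130.6, 125.5),
}

for title, (presler, smithfield) in gaming.items():
    print(f"{title}: {percent_gain(presler, smithfield):.1f}%")
```

The same helper applied to the Business Winstone 2004 scores (19.0 vs. 18.5) gives roughly 2.7%, which matches the "just under 3%" figure. For lower-is-better results such as encode times, the two arguments would need to be swapped.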

Couple the larger cache with a faster FSB and higher clock speed, and the Pentium Extreme Edition 955 is shaping up to be a decent improvement over its predecessor. 


84 Comments


  • Anand Lal Shimpi - Friday, December 30, 2005

    I had some serious power/overclocking issues with the pre-production board Intel sent for this review. I could overclock the chip and the frequency would go up, but the performance would go down significantly - and the chip wasn't throttling. Intel has a new board on the way to me now, and I'm hoping to be able to do a quick overclocking and power consumption piece before I leave for CES next week.

    Take care,
    Anand
  • Betwon - Friday, December 30, 2005

    quote:


    We tested four different scenarios:

    1. A virus scan + MP3 encode
    2. The first scenario + a Windows Media encode
    3. The second scenario + unzipping files, and
    4. The third scenario + our Splinter Cell: CT benchmark.

    The graph below compares the total time in seconds for all of the timed tasks (everything but Splinter Cell) to complete during the tests:

    AMD Athlon 64 X2 4800+
                                     AVG     LAME    WME     ZIP     Total
    AVG + LAME                       22.9s   13.8s   -       -       36.7s
    AVG + LAME + WME                 35.5s   24.9s   29.5s   -       90.0s
    AVG + LAME + WME + ZIP           41.6s   38.2s   40.9s   56.6s   177.3s
    AVG + LAME + WME + ZIP + SCCT    42.8s   42.2s   46.6s   65.9s   197.5s

    Intel Pentium EE 955 (no HT)
                                     AVG     LAME    WME     ZIP     Total
    AVG + LAME                       24.8s   13.7s   -       -       38.5s
    AVG + LAME + WME                 39.2s   22.5s   32.0s   -       93.7s
    AVG + LAME + WME + ZIP           47.1s   37.3s   45.0s   62.0s   191.4s
    AVG + LAME + WME + ZIP + SCCT    40.3s   47.7s   58.6s   83.3s   229.9s


    We find that it isn't scientific. Anandtech is wrong.
    You should give the end time of the last completed task, not the sum of each task's time.

    For example: task1 and task2 run at the same time.

    System A spends only 51s to complete task1 and task2.
    task1 -- 50s
    task2 -- 51s

    System B spends 61s to complete task1 and task2.
    task1 -- 20s
    task2 -- 61s

    Correct: System A (51s) is faster than System B (61s).
    Wrong: System A (51s+50s=101s) is slower than System B (20s+61s=81s).
  • tygrus - Tuesday, January 3, 2006

    The problem is that they don't all finish at the same time, and the amount of work done by a running FPS task is ambiguous.

    You could start them all and measure the time taken for all tasks to finish. That's a workload but it can be susceptible to the slowest task being limited by its single thread performance (once all other tasks are finished, SMP underutilised).

    Another way is to use tasks that take longer and run at a measurable and consistent speed.
    Is it possible to:
    * loop the tests with a big enough working set (that ensures repeatable runs);
    * Determine average speed of each sub-test (or runs per hour) while other tasks are running and being monitored;
    * Specify a workload based on how many runs, MB, Frames etc. processed by each;
    * Calculate the equivalent time to do a theoretical workload (be careful of the method).

    Sub-tasks time/speed can be compared to when they were run by themselves (single thread, single active task). This is complicated by HyperThreading and also multi-threaded apps under test. You can work out the efficiency/scaling of running multiple tasks versus one task at a time.

    You could probably rejig the process priorities to get better 'Splinter Cell' performance.
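One way to read the throughput-based approach suggested in the comment above is sketched below in Python. Every number is hypothetical and the function names are illustrative; none of this comes from the review's data. The sketch assumes each sub-test runs at a steady, measurable rate while the other tasks are active.

```python
def equivalent_time_s(workload_units, units_per_second):
    """Seconds needed to process a fixed workload at a measured steady rate."""
    return workload_units / units_per_second


def scaling_efficiency(concurrent_rate, solo_rate):
    """Fraction of its solo throughput that a task keeps under multitasking."""
    return concurrent_rate / solo_rate


# Hypothetical measured rates in units per second: (solo, under multitasking)
rates = {"encode": (50.0, 25.0), "unzip": (10.0, 8.0)}
# Hypothetical fixed workload for each task, in the same units
workload = {"encode": 1000.0, "unzip": 200.0}

for task, (solo, loaded) in rates.items():
    seconds = equivalent_time_s(workload[task], loaded)
    efficiency = scaling_efficiency(loaded, solo)
    print(f"{task}: {seconds:.1f}s for the fixed workload, "
          f"{efficiency:.0%} of solo speed")
```

How the per-task times are then combined into a single figure is still a choice to make, which is presumably the "be careful of the method" caveat.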
  • Viditor - Saturday, December 31, 2005

    Scoring needs to be done on a focused window...
    By doing multiple runs with all of the programs running simultaneously, it's possible to extract a speed value for each of the programs in turn, under those conditions. The cumulative number isn't representative of how long it actually took, but it's more of a "score" on the performance under a given set of conditions.
  • Betwon - Saturday, December 31, 2005

    NO! It is the time (spend time), not the speed value.
    You see:
    24.8s + 13.7s = 38.5s
    42.8s + 42.2s + 46.6s + 65.9s = 197.5s

    Anandtech's way is wrong.
  • Viditor - Saturday, December 31, 2005

    quote:

    It is the time(spend time), not the speed value

    It's a score value...whether it's stated in time or even an arbitrary number scale matters very little. The values are still justified...
  • Betwon - Saturday, December 31, 2005

    You don't know how to test.
    But you still say it is correct.

    We all need an explanation from AnandTech.
  • Viditor - Saturday, December 31, 2005

    quote:

    You don't know how to test


    Then I better get rid of these pesky Diplomas, eh?
    I'll go tear them up right now...:)
  • Betwon - Saturday, December 31, 2005

    I mean: you don't know how AnandTech ran the tests.
    The test method.
    What the data is.

    We only need the explanation from AnandTech, not your guess.

    Because you do not know it!
    You are not AnandTech!
  • Viditor - Saturday, December 31, 2005

    Thank you for the clarification (does anyone have any sticky tape I could borrow? :)
    What we do know is:
    1. All of the tests were started simultaneously..."To find out, we put together a couple of multitasking scenarios aided by a tool that Intel provided us to help all of the applications start at the exact same time"
    2. The 2 ways to measure are: finding out individual times in a multitasking environment (what I think they have done), or producing a batch job (which is what I think you're asking for) and getting a completion time.

    Personally, I think that the former gives us far more useful information...
    However, neither scenario is more scientifically correct than the other.
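The two scoring methods debated in this thread can be compared with a small Python sketch. It uses the hypothetical System A and System B timings from the example earlier in the comments, and it assumes that all tasks start at the same moment, so that each task's time is also its finish time. The function names are illustrative only.

```python
def summed_time(task_times):
    """Sum of the individual task times (the 'Total' column style of score)."""
    return sum(task_times)


def wall_clock_time(task_times):
    """Time until the last task finishes, assuming all tasks start at t=0."""
    return max(task_times)


# Hypothetical per-task times in seconds, taken from the example in the thread
systems = {"System A": [50, 51], "System B": [20, 61]}

for name, times in systems.items():
    print(f"{name}: sum = {summed_time(times)}s, "
          f"last task done at {wall_clock_time(times)}s")
```

By the summed score System B comes out ahead (81s vs. 101s), while by wall-clock completion System A does (51s vs. 61s). The two numbers answer different questions, which is why the commenters can look at the same data and disagree.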
