Twice the Cache - 17% Higher Latency

Both the Pentium 4 6xx and the new Extreme Edition share the same core, meaning they also have the same L2 cache. When Intel first launched Prescott, we noticed that cache latencies went up tremendously in the move to the new architecture. Some increase was to be expected, as one tradeoff of a larger cache is that it takes longer to find and access data. So when we heard that Intel was moving to a 2MB L2 cache with the 6xx series, we wondered how much slower the cache would get.

First, we wanted to confirm that L1 cache latency stayed the same, and it did: 4 cycles for the new Prescott 2M-based core, just as with the original Prescott:

Processor | Cachemem L1 Latency | ScienceMark L1 Latency
AMD Athlon 64 | 3 cycles | 3 cycles
Intel Pentium 4 (Northwood) | 1 cycle | 2 cycles
Intel Pentium 4 (Prescott) | 4 cycles | 4 cycles
Intel Pentium 4 (Prescott 2M) | 4 cycles | 4 cycles
Intel Pentium M | 3 cycles | 3 cycles

Next up was L2 cache latency. In our review of the Pentium M processor on the desktop, we discovered that its 10-cycle L2 cache was responsible for its solid performance in applications that are not "media rich" (e.g. office applications, OS performance). The original Prescott had a 23-cycle L2 cache; with a 2MB cache, latency has gone up to 27 cycles:

Processor | Cachemem L2 Latency | ScienceMark L2 Latency
AMD Athlon 64 | 17 cycles | 18 cycles
Intel Pentium 4 (Northwood) | 16 cycles | 16 cycles
Intel Pentium 4 (Prescott) | 23 cycles | 23 cycles
Intel Pentium 4 (Prescott 2M) | 27 cycles | 27 cycles
Intel Pentium M | 10 cycles | 10 cycles

Although we're talking about "only" 4 extra cycles, that is about 17% longer to access data in the L2 cache (27 vs. 23 cycles, or roughly 7.5ns vs. 6.4ns at 3.6GHz). Given Prescott's extremely long pipeline, a 17% increase in L2 cache latency will only make the downsides of that pipeline harder to hide. Also keep in mind that the only architectural change here is a larger L2 cache, so none of the usual tricks for hiding memory latency have been improved in the new Pentium 4.
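To put the 4-cycle difference in absolute terms, the short sketch below converts the measured L2 latencies into nanoseconds at 3.6GHz and computes the relative increase. It only divides cycles by clock frequency; the function and variable names are our own.

```python
def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency in core clock cycles to nanoseconds."""
    # One cycle at f GHz lasts 1/f nanoseconds.
    return cycles / clock_ghz

CLOCK_GHZ = 3.6
prescott_l2 = 23     # cycles, original Prescott (1MB L2)
prescott_2m_l2 = 27  # cycles, Prescott 2M (2MB L2)

old_ns = cycles_to_ns(prescott_l2, CLOCK_GHZ)
new_ns = cycles_to_ns(prescott_2m_l2, CLOCK_GHZ)
# Relative increase in cycles; identical in nanoseconds since the clock is the same.
increase_pct = (prescott_2m_l2 - prescott_l2) / prescott_l2 * 100

print(f"Prescott L2:    {old_ns:.2f} ns")
print(f"Prescott 2M L2: {new_ns:.2f} ns")
print(f"Increase:       {increase_pct:.1f}%")
```

Because both parts run at the same clock, the percentage (about 17.4%) does not depend on frequency; in absolute terms each L2 hit costs roughly 1.1ns more.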

What Intel is counting on is that the higher hit rate of a cache twice the size will outweigh the 17% longer L2 access time. Did Intel make the right bet? To find out, we took the new Pentium 4 660 (3.6GHz, 2MB L2) and compared it to the old Pentium 4 560 (3.6GHz, 1MB L2) with all other variables held constant. Let's see how much of an impact the extra megabyte of cache has in the real world.
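Intel's bet can be framed with the textbook average memory access time (AMAT) model: for requests that reach L2, AMAT = L2 hit time + L2 miss rate × memory penalty. The sketch below asks how far the L2 miss rate must fall for a 27-cycle 2MB cache to break even with a 23-cycle 1MB cache. The 400-cycle memory penalty and the 10% miss rate for the 1MB part are hypothetical placeholders, not measurements from this review.

```python
def amat(hit_cycles: float, miss_rate: float, miss_penalty_cycles: float) -> float:
    """Average access time for requests that reach L2 (textbook AMAT model)."""
    return hit_cycles + miss_rate * miss_penalty_cycles

def break_even_miss_rate(old_hit: float, old_miss_rate: float,
                         new_hit: float, miss_penalty: float) -> float:
    """Miss rate the larger cache needs so that its AMAT equals the smaller cache's."""
    # Solve new_hit + m * penalty == old_hit + old_miss_rate * penalty for m.
    return old_miss_rate - (new_hit - old_hit) / miss_penalty

# Hypothetical inputs: 10% L2 miss rate with 1MB, 400-cycle trip to main memory.
MISS_PENALTY = 400.0
OLD_MISS_RATE = 0.10

needed = break_even_miss_rate(23, OLD_MISS_RATE, 27, MISS_PENALTY)
print(f"2MB cache must miss less than {needed:.1%} of the time to come out ahead")

old = amat(23, OLD_MISS_RATE, MISS_PENALTY)
for new_miss_rate in (0.10, 0.09, 0.05):
    new = amat(27, new_miss_rate, MISS_PENALTY)
    print(f"2MB miss rate {new_miss_rate:.0%}: 1MB {old:.0f} cycles, 2MB {new:.0f} cycles")
```

With these placeholder numbers, cutting the miss rate from 10% to 9% merely breaks even. The larger cache wins big only when the extra megabyte captures a workload's working set, and loses slightly when the data already fit in 1MB, which is consistent with the mix of small gains and small losses in the results.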

In the business category, the added cache pays off a little. SYSMark shows a good improvement in the document creation portion of its tests, while Business Winstone makes a sizable gain (13.0%). Worldbench shows web browsing with Mozilla improving a good bit (8.0%), while the WinRAR compression test and the ACDSee test show a loss in performance. These losses generally indicate tests that are more dependent on latency than on cache hit rate. On the content creation side, running Windows Media Encoder alongside Mozilla yields a larger gain (11.1%) than the Mozilla test on its own. This is likely because the large cache keeps Mozilla's data from being evicted while Windows Media Encoder is working.

On the gaming front, Doom 3 is the only game to show a meaningful improvement (4.7%). The only other application to show a significant gain is SPECviewperf's Maya test, at more than 43%. The huge gain under Maya is likely because its models don't fit in 1MB of cache but do fit in 2MB. This seems to be a case where the test is very bandwidth sensitive rather than latency sensitive: dropping most (if not all) of the data being worked on into the L2 cache gives a program a very large boost in apparent bandwidth.
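The "fits vs. doesn't fit" cliff behind the maya-01 result can be illustrated with a toy cache simulation. The sketch below models a fully associative LRU cache (a simplification, since the Pentium 4's L2 is set-associative) and scans repeatedly over a working set measured in 64-byte lines. The 64-byte line size and the 1.5MB working set are hypothetical stand-ins; we don't know maya-01's actual footprint.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed line size for this toy model
MB = 1024 * 1024

def lru_hit_rate(cache_bytes: int, working_set_bytes: int, passes: int = 5) -> float:
    """Hit rate of a fully associative LRU cache when a working set is scanned repeatedly."""
    capacity = cache_bytes // LINE_BYTES
    lines = working_set_bytes // LINE_BYTES
    cache = OrderedDict()  # line addresses, ordered from least to most recently used
    hits = accesses = 0
    for _ in range(passes):
        for addr in range(lines):
            accesses += 1
            if addr in cache:
                hits += 1
                cache.move_to_end(addr)  # mark as most recently used
            else:
                cache[addr] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict the least recently used line
    return hits / accesses

working_set = int(1.5 * MB)  # hypothetical model size
for size_mb in (1, 2):
    print(f"{size_mb}MB cache: {lru_hit_rate(size_mb * MB, working_set):.0%} hit rate")
```

Under LRU, a cyclic scan that is even slightly larger than the cache misses on every access, while one that fits hits on every pass after the first. That is why doubling capacity can produce a jump like maya-01's rather than a gradual gain.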

As we can see, the unfortunate truth for the 600 series is that most consumer data sets fit into a 1MB cache just fine. From our limited investigation, the added cache does seem to help with multitasking. The more threads that hit memory aggressively, the better the chance of seeing a benefit from the 2MB cache, because less of each thread's data gets evicted from the cache, resulting in fewer pipeline stalls.
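The multitasking effect can be sketched the same way: interleave two programs whose working sets each fit in 1MB on their own but not together. This again uses a toy fully associative LRU model; the 64-byte lines, the 640KB per-thread working sets, and the strict round-robin interleaving are hypothetical, chosen only to show the mechanism.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed line size for this toy model
KB = 1024

def interleaved_hit_rate(cache_bytes: int, per_thread_bytes: int,
                         threads: int = 2, passes: int = 5) -> float:
    """LRU hit rate when threads take turns scanning their own private working sets."""
    capacity = cache_bytes // LINE_BYTES
    lines = per_thread_bytes // LINE_BYTES
    cache = OrderedDict()
    hits = accesses = 0
    for _ in range(passes):
        for i in range(lines):
            for t in range(threads):
                addr = (t, i)  # each thread touches its own, non-shared lines
                accesses += 1
                if addr in cache:
                    hits += 1
                    cache.move_to_end(addr)  # mark as most recently used
                else:
                    cache[addr] = True
                    if len(cache) > capacity:
                        cache.popitem(last=False)  # evict the least recently used line
    return hits / accesses

per_thread = 640 * KB  # hypothetical working set per thread
for size_kb in (1024, 2048):
    alone = interleaved_hit_rate(size_kb * KB, per_thread, threads=1)
    shared = interleaved_hit_rate(size_kb * KB, per_thread, threads=2)
    print(f"{size_kb // 1024}MB cache: one thread {alone:.0%}, two threads {shared:.0%}")
```

Real workloads are far less tidy and real caches are set-associative, so the effect is gradual rather than all-or-nothing. The point is only that co-running threads add their working sets together, which is where the second megabyte can help.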

Unfortunately, the usage models that are the best fit for the 600 series are mostly server and workstation workloads. Streaming data (using or encoding media), games, and most other consumer applications lack the large data sets that really separate the performance of the 1MB and 2MB parts.

Since we've provided the chart below and covered the general impact of the larger cache on Intel's new 600 line, we won't include analysis on the pages with our benchmark data. For those interested in a deeper look at the numbers and performance of all five new parts, graphs of each benchmark are included later in this article.

Impact of L2 Cache Size on Performance (1MB vs. 2MB, 3.60GHz)

Benchmark | 1MB L2 | 2MB L2 | 2MB Performance Advantage

Business/General Use Performance
Business Winstone 2004 | 21.4 | 24.2 | 13.0%
SYSMark 2004 - Communication | 137 | 137 | 0.0%
SYSMark 2004 - Document Creation | 201 | 218 | 8.4%
SYSMark 2004 - Data Analysis | 184 | 186 | 1.0%
Microsoft Office XP with SP-2 | 522 | 520 | 0.3%
Mozilla 1.4 | 459 | 422 | 8.0%
ACD Systems ACDSee PowerPack 5.0 | 547 | 558 | -2.0%
Ahead Software Nero Express 6.0.0.3 | 545 | 550 | -0.9%
WinZip Computing WinZip 8.1 | 412 | 411 | 0.2%
WinRAR | 479 | 469 | -2.0%

Multitasking Content Creation Performance
Content Creation Winstone 2004 | 32.7 | 33.9 | 3.7%
SYSMark 2004 - 3D Creation | 231 | 231 | 0.0%
SYSMark 2004 - 2D Creation | 288 | 279 | -3.1%
SYSMark 2004 - Web Publication | 206 | 203 | -1.0%
Mozilla and Windows Media Encoder | 676 | 601 | 11.1%

Video/Photo Creation & Editing
Adobe Photoshop 7.0.1 | 342 | 342 | 0.0%
Adobe Premiere 6.5 | 461 | 468 | -1.5%
Roxio VideoWave Movie Creator 1.5 | 287 | 276 | 3.8%

Audio/Video Encoding
MusicMatch Jukebox 7.10 | 484 | 470 | 2.9%
DivX Encoding | 55.3 | 55.4 | 0.2%
XviD Encoding | 33.9 | 33.4 | -1.4%
Microsoft Windows Media Encoder 9.0 | 2.57 | 2.56 | -0.3%

Gaming
Doom 3 | 84.6 | 88.6 | 4.7%
UT2004 | 59.3 | 60.4 | 1.9%
Wolfenstein: ET | 97.2 | 95.5 | -1.7%

3D Rendering
Discreet 3dsmax 5.1 (DX) | 268 | 266 | 0.7%
Discreet 3dsmax 5.1 (OGL) | 327 | 329 | -0.6%
SPECapc 3dsmax 6 | 1.64 | 1.62 | -1.1%

Professional 3D
SPECviewperf 8 - 3dsmax-03 | 17.04 | 17.11 | 0.4%
SPECviewperf 8 - catia-01 | 13.87 | 13.57 | -2.2%
SPECviewperf 8 - light-07 | 14.3 | 13.83 | -3.3%
SPECviewperf 8 - maya-01 | 13.12 | 18.85 | 43.7%
SPECviewperf 8 - proe-03 | 16.7 | 16.5 | -1.2%
SPECviewperf 8 - sw-01 | 13.09 | 13.33 | 1.8%
SPECviewperf 8 - ugs-04 | 15.31 | 13.82 | -9.7%
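For readers reproducing the last column, the sketch below computes the 2MB advantage from the two raw results, relative to the 1MB result. Which tests are lower-is-better (timed runs) and which are higher-is-better (scores, frame rates) is our inference from the sign of the published percentages, not something the table states, and our rounding can differ from the table by about 0.1 percentage point.

```python
def advantage_pct(score_1mb: float, score_2mb: float, lower_is_better: bool) -> float:
    """Percentage by which the 2MB part beats the 1MB part, relative to the 1MB result."""
    if lower_is_better:
        # Timed tests: a shorter time on the 2MB part is an improvement.
        return (score_1mb - score_2mb) / score_1mb * 100
    # Scores and frame rates: a higher value on the 2MB part is an improvement.
    return (score_2mb - score_1mb) / score_1mb * 100

# (benchmark, 1MB result, 2MB result, lower_is_better as inferred from the table)
rows = [
    ("Business Winstone 2004", 21.4, 24.2, False),
    ("Mozilla 1.4", 459, 422, True),
    ("ACD Systems ACDSee PowerPack 5.0", 547, 558, True),
    ("Doom 3", 84.6, 88.6, False),
    ("SPECviewperf 8 - maya-01", 13.12, 18.85, False),
    ("SPECviewperf 8 - ugs-04", 15.31, 13.82, False),
]

for name, old, new, lower in rows:
    print(f"{name}: {advantage_pct(old, new, lower):+.1f}%")
```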


71 Comments

  • L3p3rM355i4h - Monday, February 21, 2005

    #30 90nm SOI = lower wattage.
  • Brian23 - Monday, February 21, 2005

    Look at the power consumption graph for the A64. Why is the 3500 Winchester doing so much better than the 3000 and 3200 Winchesters?
  • L3p3rM355i4h - Monday, February 21, 2005

    #28 saw almost the same thing at PCPER too.
  • Aenslead - Monday, February 21, 2005

    I could ALMOST swear I saw the VERY same benchmarks last night @ xbit labs... fancy that.
  • bldckstark - Monday, February 21, 2005

    227 WATTS!!... My daughter has a crayon maker. It uses a 60W light bulb in a plastic box to melt 3 crayons and pours them into a mold. It melts the wax in about 5 minutes. If I buy a P4 I can melt 11.35 crayons at once. It uses 3.78 times as much energy as is necessary to light my computer room. This is not efficient use of resources.
  • L3p3rM355i4h - Monday, February 21, 2005

    sorry to go off topic, but are the forums down or does this terminal suck?
  • LoneWolf15 - Monday, February 21, 2005

    From a price/performance standpoint, I can't see many good reasons to buy a P4 six series, and in many cases, a five series either (exceptions being high-end 3D rendering apps and heavy video encoding). Not just because of the price of the processor (which doesn't seem to net a huge speed increase), but because the increased power draw means a heavier power supply plus more expensive cooling. Compared to the lower power draw of the Athlon 64 CPUs, as well as their lower prices at least at the entry-to-mid level, I think Intel really needs to go back to basics and create a new CPU architecture.
  • mlittl3 - Monday, February 21, 2005

    Okay, I have an addition to my last comment about the Extreme Edition being a scam. I did some calculations that were left out by AnandTech to see if the 3.73EE is truly better than the 3.46EE.

    Everyone knows the differences between the two processors. The 3.73EE has an 8% increase in clock speed and less total cache overall, but 4x the L2 cache, which has lower latency than the L3 cache (the XD-bit and EM64T are also added, but they will not affect performance at all with a 32-bit OS).

    With these added features, the 3.73EE should be better than the 3.46EE especially since the Prescott core is supposed to scale well with clock speed versus the Gallatin/Northwood and the 1066 MHz FSB is supposed to give better performance at higher clock speeds. Well, let's look at the numbers.

    Using AnandTech's results, I calculated the % difference between the two processors. They varied between -10% (worse) and 30% (better). I then added up all the scores (taking the inverse of the lower-is-better scores) and divided them by the introduction price ($999) and the MHz of each processor. Here are the results.

    Performance per $:
    3.46EE - 20.69
    3.73EE - 20.61

    Performance per MHz:
    3.46EE - 5.96
    3.73EE - 5.52

    You can do the calculations yourself by using all the benchmark numbers from the two Extreme Edition CPUs in the review. As you can see, the 3.73EE is worse on a per-dollar and per-MHz basis compared to the 3.46EE (even though the margin is small, it is still worse for the higher-clocked CPU). The Prescott core is a failure IMHO. The 3.73EE is a total scam and the Extreme Edition processors in general are poor performers. Remember these were released just to offset the marketing of AMD FX processors when Intel got wind of them 1.5 years ago. I don't think Intel was ever going to release them and they keep getting worse and worse.

    A scam alert should be issued. Buyer beware!
  • L3p3rM355i4h - Monday, February 21, 2005

    Ho hum, Intel is still stagnating. 227 watts load? Jeezus, that's incredible.
  • mlittl3 - Monday, February 21, 2005

    Just a quick, possible correction.

    I don't know if you meant to or not, but the Prescott vs. Prescott 2M comparison table is missing the Windows Media Creator HD and Visual Studio results.
