Even More Tweaks

Translation Lookaside Buffers, TLBs for short, cache the mappings from virtual addresses to physical memory locations in a system. TLB hit rates are usually quite high, but as programs grow and their memory footprints expand, microprocessor designers generally have to tinker with TLB sizes to keep up. With K8 AMD increased the size of its TLBs over K7, and with Barcelona AMD is repeating the process once more.

Barcelona's TLBs are slightly larger than K8's, and they now include support for 1GB pages, which are useful for database applications and virtualized workloads. AMD also introduced a 128-entry L2 TLB for 2MB pages with Barcelona, once again to help cope with newer programs using larger page sizes. The TLB improvements to Barcelona won't make any sort of tangible impact on desktop applications, but performance should improve in server applications with large memory footprints.
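
To put the 128 entry 2M L2 TLB in perspective, it helps to look at TLB "reach": the amount of memory a TLB can map at once, which is simply the number of entries times the page size. The Python sketch below is our own back-of-the-envelope illustration, not anything from AMD. The function names are made up for this example, and the 4KB comparison assumes a hypothetical TLB with the same 128 entries.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of memory that a TLB with `entries` slots can map at once."""
    return entries * page_size


def pages_needed(working_set: int, page_size: int) -> int:
    """Pages (and so translations) needed to cover a working set, rounded up."""
    return -(-working_set // page_size)


# The 128-entry L2 TLB for 2MB pages mentioned in the article.
reach_2m = tlb_reach(128, 2 * MIB)

# Hypothetical comparison: the same 128 entries holding 4KB pages instead.
reach_4k = tlb_reach(128, 4 * KIB)

print(f"128 x 2MB pages cover {reach_2m // MIB} MB")
print(f"128 x 4KB pages cover {reach_4k // KIB} KB")

# Translations required to map a 1GB working set at each page size.
for name, size in (("4KB", 4 * KIB), ("2MB", 2 * MIB), ("1GB", GIB)):
    print(f"{name} pages: {pages_needed(GIB, size)} translations")
```

The larger the page, the fewer translations a big working set needs, which is why large pages matter most for databases and virtualized workloads.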

When Intel introduced its second Pentium M, codenamed Dothan, one of the enhancements made was a lower integer divide latency. Although details at this time are slim, AMD has indicated that it has moved to reduce integer divide latency in Barcelona as well. We're not sure if the changes implemented are similar in any way to what Intel did with Dothan, but don't expect the performance improvement to be very noticeable in real world applications. It's one of those tweaks that will add up to more efficient execution overall, but not one that's going to give you double digit performance gains across the board.

In another attempt to effectively "widen" Barcelona without committing a significant number of transistors to doing so, AMD took a couple of instructions that were microcoded and turned them into fastpath decode instructions. A microcoded instruction takes significantly longer to decode than an instruction able to go through one of the core's fastpath decoders. CALL and RET-Imm instructions are now fastpath, which is a part of Barcelona's sideband stack optimization enhancements. MOVs from SSE registers to integer registers are now fastpath as well.

While on the topic of instructions, AMD also introduced a few new extensions to its ISA with Barcelona. There are two new bit manipulation instructions: LZCNT and POPCNT. Leading Zero Count (LZCNT) counts the number of leading zeros in an operand, while Population Count (POPCNT) counts the total number of bits set to 1 in an operand. Both of these instructions are targeted at cryptography applications.
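
For readers who want to see exactly what the two instructions compute, here is a small Python reference model. It is only a sketch of the results, not of how the hardware produces them: the function names and the `width` parameter are our own, and flag updates are not modeled. Note that POPCNT counts every set bit in the operand, wherever it sits, and that LZCNT returns the operand width when the input is zero.

```python
def lzcnt(value: int, width: int = 32) -> int:
    """Number of zero bits above the most significant set bit.

    Returns `width` when value is 0, matching LZCNT's defined result
    for a zero source operand.
    """
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the operand width")
    return width - value.bit_length()


def popcnt(value: int, width: int = 32) -> int:
    """Total number of bits set to 1 in the operand."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the operand width")
    return bin(value).count("1")


print(lzcnt(0x00F00000))  # 8 leading zeros in a 32-bit operand
print(popcnt(0xF0F0))     # 8 bits set
print(lzcnt(0))           # 32: all bits are zero
```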

AMD also introduced four new SSE extensions: EXTRQ/INSERTQ, MOVNTSD/MOVNTSS. The first two extensions are mask and shift operations combined into a single instruction, while the latter two are scalar streaming stores (streaming stores that can be done on scalar operands). We may see some of these same instructions included in Penryn and other future Intel processors.
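
The "mask and shift" idea behind EXTRQ and INSERTQ can be sketched in a few lines of Python. This is a generic bit-field extract and insert on a 64-bit value, written under our own assumptions: the function names and the `index`/`length` parameters are illustrative, and the sketch does not model the instructions' actual operand encoding, register layout, or corner cases.

```python
MASK64 = (1 << 64) - 1


def extract_field(value: int, index: int, length: int) -> int:
    """Shift the field starting at bit `index` down to bit 0, then mask it."""
    field_mask = (1 << length) - 1
    return (value >> index) & field_mask


def insert_field(dest: int, src: int, index: int, length: int) -> int:
    """Replace `length` bits of `dest` at bit `index` with the low bits of `src`."""
    field_mask = (1 << length) - 1
    cleared = dest & ~(field_mask << index) & MASK64
    return cleared | ((src & field_mask) << index)


print(hex(extract_field(0xABCD, 4, 8)))       # 0xbc
print(hex(insert_field(0xFFFF, 0x00, 4, 8)))  # 0xf00f
```

Without a combined instruction, each of these takes a separate shift and a separate mask (plus an OR for the insert), which is the work the new extensions fold into a single operation.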

83 Comments

  • agaelebe - Friday, March 02, 2007

    Wow! A lot of discussion in here.
    And, by the way, very interesting article.

    I'm a software engineer from Brazil and I'm planning to change my PC this year.
    I've been using AMD processors since the K6.
    Today I have an XP Mobile 2500+ (@ 2.2GHz), 1GB RAM, 200GB and an AGP 6600GT.
    My PC is not very slow, but I'm thinking of going dual core to speed things up (office applications, web development and some games).
    I can run some of the newest games, but not at high graphics settings.
    I expect that my PC can run C&C 3 (I already ran the demo at 1024 medium; it had some crashes, although it wasn't running slowly).

    So, today I'm thinking of 3 options:
    1) Stay with this computer and wait until AMD launches its new architecture (I intend to go with an average-priced Kuma)

    2) Go with Intel Core 2 Duo (e6300 or e6400). They're not expensive and for games I can easily overclock and gain more performance.

    3) Buy a good AM2 board and a cheap Athlon X2 (3600), wait for the new AMD processors and then change only the processor.

    Here in Brazil the taxes are too high, so I'm planning on buying a PC with these specs:

    - CORE 2 Duo e6300/6400 or X2 3600/3800
    - mid-tier motherboard (
    - 2 x 1gb DDR 800 4-4-4-12
    - 2 x 250 gb
    - X1950pro 256 or 512
    - 500 watt power supply

    So the prices are below:

    e6300 box US$ 300 (same price for a X2 4200+ box)

    X2 3800 box US$ 220

    motherboard: US$ 220

    ram: US$ 400

    video: US$ 450

    DVD: US$ 70

    case: US$ 150

    HDs : US$ 250

    Power: US$ 180

    So I plan to spend about 2000 dollars (sadly, I could buy this same PC in the US for half the price).

    My new PC should not use too much power, so I can leave it turned on all day long (max 150 watts at idle without monitor); otherwise I'll keep my old computer turned on just for downloading stuff.

    So, if someone has an opinion, I'd like to "hear" it. You can suggest other options too, or comment on the specs I'm choosing now.

    I had a Pentium 75 and after that only AMD CPUs... Should I now surrender to the Core 2 Duo, or believe that AMD can really beat it by the end of 2008?

    And thanks for the cooperation and patience.
  • Zebo - Saturday, March 03, 2007

    Athlon 64 AM2s aren't exactly slow, so if you're an AMD fan just get one, like a 3800+ or 3600+, and overclock it. It will be at least 4x faster than what you have now and will accept a K8L Agena core later. It will be cheaper than a C2D by about $50 USD, and you'll also pay little for a GeForce 6100 motherboard, which is only $50 USD. Overall, expect the AM2 system to be about $100 USD cheaper.

    Keep in mind that C2D is 20% faster clock for clock in most apps, so it's not exactly a quantum leap getting a C2D. The gap gets a lot larger when overclocking, since C2Ds overclock higher: 3.2GHz is common on air vs. only 2.8GHz for AM2. So, at the end of the day, a C2D setup can be about 40% faster over most benchmarks. That is getting significant, and it's why enthusiasts are buying C2Ds.
  • agaelebe - Friday, March 02, 2007

    And, as always, sorry for the errors and the not-so-good writing...
  • Kiijibari - Thursday, March 01, 2007

    Hi,

    never heard of that before, does anybody know what it is?
    So far I see 2 pad areas in the die photo, therefore I assume that it would also be 2 interfaces, e.g. x8 PCIe like Sun uses?

    bb

    Kiijibari
  • mino - Friday, March 02, 2007

    It should be some management/coordination stuff (can't remember the name of that bus).
    Every northbridge and CPU has that.
  • davecason - Thursday, March 01, 2007

    Anand,

    Great article! I know it took a lot of time and I wanted you to know I really appreciate your effort. It is the kind of article that keeps me coming back to your site.

    -Dave
  • yyrkoon - Thursday, March 01, 2007

    quote:

    On average, about 1/3 of all instructions in a program end up being loads, thus if you can improve load performance you can generally impact overall application performance pretty significantly.


    Page 5, paragraph 4 'pretty significantly'. Well is it, or is it not it ?

    http://www.wikihow.com/Avoid-Colloquial-%28Informa...

    Aside from my gripe concerning writing style, good article :)
  • trisweb2 - Friday, March 16, 2007

    Usually we criticize writing style based on a whole experience... obviously Anand is one of the best technical review writers on the Internet; if you bother to read his articles more fully perhaps you'd realize that. The colloquial writing sometimes brings it to a more personal level that a reader can better relate to and understand -- it works especially well in this case, where it's a future design, we really don't know how it's going to perform. That he can guess and say "pretty significantly" tells me he understands the uncertainty of the situation, and the language communicates that fact perfectly well. It would be more confusing if he said it would impact performance "significantly" as you want him to, as that would imply that he was more certain than he might actually have been.

    Masters are allowed to bend the rules, and Anand is one, so lay off.
  • yyrkoon - Thursday, March 01, 2007

    *Is it, or is it not*

    /me hangs head in shame
  • baronzemo78 - Thursday, March 01, 2007

    Any rough guess as to how Barcelona will compete with Core2 in gaming? Many articles have shown how Core2 gets you a slight FPS boost in games that aren't graphics card limited. I'm curious how Barcelona will fit in with the overall picture of DX10 cards like G80 and R600.