Nehalem's Media Encoding Performance

We had time to run two of our media encoding tests: the DivX 6.8 and x264 workloads.

DivX 6.8 with Xmpeg

Our DivX test is the same one we've run in our regular CPU reviews: we're simply encoding a 1080p MPEG-2 file in DivX. We are using an unconstrained profile and an encoding preset of 5, with enhanced multithreading enabled.

The DivX test is an important one because it doesn't scale well at all beyond four threads. Any performance advantage Nehalem has here is entirely due to microarchitectural improvements and not influenced by its ability to work on twice as many threads at once.

DivX 6.8 w/ Xmpeg 5.0.3

Clock for clock, Nehalem is nearly 28% faster than Penryn in our DivX test. Even better is when you put this performance in perspective: at 2.66GHz, Nehalem is faster than the fastest Penryn available today, the Core 2 Extreme QX9770 running at 3.2GHz. At 3.2GHz, Nehalem will be fast. The improvements in performance here are entirely due to the faster L2 cache and microarchitectural gains: being able to have more micro-ops in flight and improved unaligned cache accesses give us a significant improvement in video encoding performance.
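To put the clock-for-clock figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes DivX performance scales linearly with clock speed, which is a simplification rather than something measured here, and it rounds the gain to 28%. The function and variable names are hypothetical, chosen only for illustration.

```python
def equivalent_clock_ghz(clock_ghz: float, per_clock_gain: float) -> float:
    """Clock the older chip would need to match the newer one.

    Assumes performance scales linearly with frequency (a simplification).
    """
    return clock_ghz * (1.0 + per_clock_gain)


def break_even_gain(new_clock_ghz: float, old_clock_ghz: float) -> float:
    """Per-clock gain the newer chip needs just to tie the older one."""
    return old_clock_ghz / new_clock_ghz - 1.0


nehalem_clock = 2.66   # GHz, the Nehalem sample tested
qx9770_clock = 3.2     # GHz, Core 2 Extreme QX9770
divx_gain = 0.28       # "nearly 28%" clock for clock, rounded

# Penryn clock that would match a 2.66GHz Nehalem under the linear assumption
penryn_equivalent = equivalent_clock_ghz(nehalem_clock, divx_gain)

# Minimum clock-for-clock advantage needed to catch a 3.2GHz Penryn
needed_gain = break_even_gain(nehalem_clock, qx9770_clock)

print(f"Penryn would need about {penryn_equivalent:.2f}GHz to match")
print(f"Break-even per-clock gain: {needed_gain:.1%}")
```

Under these assumptions, a 28% per-clock advantage more than covers the roughly 20% clock deficit against the QX9770, which is consistent with the result above.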

The last time we saw these sorts of performance gains was when Conroe first launched.

x264 Encoding with AutoMKV

Using AutoMKV we compress the same source file we used in our WME test down to 100MB, but with the x264 codec. We used the 2_Pass_Insane_Quality profile:

x264 w/ AutoMKV 

Encoding performance here went through the roof with Nehalem: a clock for clock boost of 44%. Once more, Nehalem at today's artificially limited, modest clock speed is already faster than any Penryn out today. What Intel did to AMD in 2006, it is doing to itself in 2008. Amazing.

108 Comments

  • Anand Lal Shimpi - Thursday, June 5, 2008

    I really wish I could've turned off hyperthreading :)

    DivX doesn't scale well beyond 4 threads so that's the best benchmark I could run to look at how Nehalem performs when you keep clock speeds and number of threads capped. With a 28% improvement that's at the upper end of what we should expect from Nehalem on average.

    Take care,
    Anand
  • SiliconDoc - Monday, July 28, 2008

    Great answer, explains it to a tee....
    However that leaves myself and I'd bet most of the fans here with not much real world use for 4x4HT ...
    I don't know should we all steal rented DVD's... by re-encoding - only use I know of that might work for the non-connected enduser.
    Not like "folding" is all the rage, they would have to pay me to do their work - especially with all the "power savings" hullaballoo going on in tech.
    That's great, 28% increase, ok...
    So I want it in a 2 core or a single core HT... since that runs everything I do outside the University.
    lol
    I guess the all-core usage, all the time, will hit sometime....
  • Calin - Thursday, June 5, 2008

    First of all, the Hyperthreading in the Pentium 4 line brought at most a 20% or so performance advantage, with a -5% or so at worst. I don't have many reasons just now to think this new Hyperthreading would be vastly different.
    As for the scaling to 8 cores, maybe the scaling was limited due to other issues (latency, interprocessor communication, cache coherency)? Might DivX on this new platform increase in performance going from 4 to 8 cores?
  • bcronce - Thursday, June 5, 2008

    Intel claims that the new HT is improved and gives a 10%-100% increase. The main issue with the P4 is that it had a double-pumped ALU that could process 2 integers per clock. This was great for HT since you could do 2 instructions per clock. The problem came with competition for the FPU, of which there was only 1. This would cause the 2nd thread in the logical CPU to stall, and thread swapping has additional overhead.

    You also run into the issue of L2 cache thrashing. If you have 2 threads trying to monopolize the FPU and also loading large datasets into cache, your cache misses go up while each thread is bottlenecked at the FPU.
  • techkyle - Thursday, June 5, 2008

    I'd like to know what AMD fans are thinking. As one myself, I'm starting to wonder if I'm going to give in and become an Intel fan.

    Intel implementing the IMC:
    I can only say two things. One, it's about time. Two, THIEF!

    Return of Hyper Threading:
    It seems to me that some sort of intelligence must go into the design of multi-core hyper threading. If two intensive tasks are given to the processor, the fastest solution would be simple: devote one core to one thread, a second core to the other. With Hyper Threading back with a multi-core twist, what's stopping one thread from landing on the first core, first virtual, and the second thread on the first core, second virtual?

    Another nail in the coffin:
    AMD can provide no competition to high end Core 2 Quad machines. Even if the K10 line can jump up to par performance with the Core architecture, can they really expect to have a Nehalem competitor ready any time remotely close to Intel's launch? AMD can't afford to keep playing the power efficient and price/performance game.

    AMD is going to be in an even worse position than when it was X2 vs Core 2 if they can't pull something out of their sleeves. Barcelona already isn't clock for clock competitive with the Penryn and now we hear that early Nehalems are 20-40% above Core 2?
    If AMD's next processor flops, is it possible for them to drop desktop and server processors and still be a functioning (not to forget profitable) company? It's no longer a race for the performance crown. It's becoming a race to simply survive.
  • bcronce - Thursday, June 5, 2008

    "Return of Hyper Threading:
    It seems to me that some sort of intelligence must go into the design of multi-core hyper threading. If two intensive tasks are given to the processor, the fastest solution would be simple: devote one core to one thread, a second core to the other. With Hyper Threading back with a multi-core twist, what's stopping one thread from landing on the first core, first virtual, and the second thread on the first core, second virtual?"

    The way Windows lists CPUs is first physicals, then logicals. So in Task Manager the first 4, on a quad core w/ HT, will be your physical CPUs and the last 4 will be your logical ones.

    Windows, by default, will put threads on cpu 1-4 first. It will move threads around to different CPUs if it feels that one is under-taxed and another is way over-taxed.

    Programmers can also force Windows to use different cores for each thread. So, a program can tell Windows to lock all threads to the first 4 CPUs, which will keep them off of the logical ones. You could then allow a thread that manages the worker threads to run on the logical CPUs. You would then be keeping all your hard data-crunching threads from competing with themselves and let the UI/etc. threads take advantage of HT.
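The affinity idea in the comment above can be sketched in Python as follows. This is only an illustration under stated assumptions: it takes the comment's CPU numbering (physical cores first, then logical) at face value, which may not match how a given Windows system actually enumerates processors, and it pins the whole process rather than individual threads. `SetProcessAffinityMask` and `GetCurrentProcess` are real kernel32 calls; the helper names `first_n_cpus_mask` and `pin_process_to_mask` are hypothetical.

```python
import sys


def first_n_cpus_mask(n: int) -> int:
    """Bitmask with bits 0..n-1 set, one bit per CPU as Windows numbers them."""
    if n <= 0:
        raise ValueError("n must be positive")
    return (1 << n) - 1


def pin_process_to_mask(mask: int) -> bool:
    """Restrict the current process to the CPUs in `mask`.

    Windows only; returns False without doing anything on other platforms.
    """
    if sys.platform != "win32":
        return False
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    # Declare handle-sized types explicitly so 64-bit handles are not truncated.
    kernel32.GetCurrentProcess.restype = wintypes.HANDLE
    kernel32.SetProcessAffinityMask.argtypes = [wintypes.HANDLE, ctypes.c_size_t]
    kernel32.SetProcessAffinityMask.restype = wintypes.BOOL
    handle = kernel32.GetCurrentProcess()
    return bool(kernel32.SetProcessAffinityMask(handle, mask))


# ASSUMPTION (from the comment, not verified): CPUs 0-3 are the "physical"
# entries and CPUs 4-7 the "logical" ones on a quad core with HT.
worker_mask = first_n_cpus_mask(4)                    # CPUs 0-3
manager_mask = first_n_cpus_mask(8) & ~worker_mask    # CPUs 4-7

if __name__ == "__main__":
    print(f"worker mask:  {worker_mask:#010b}")
    print(f"manager mask: {manager_mask:#010b}")
    print("pinned:", pin_process_to_mask(worker_mask))
```

A per-thread version would use `SetThreadAffinityMask` instead, and a real program would query the actual core topology rather than rely on a fixed numbering.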
  • Spoelie - Thursday, June 5, 2008

    Supposedly, Shanghai (the 45nm iteration of Phenom) will be around 20% faster clock for clock than Phenom. This is what AMD itself said some time ago, and it has not been verified by an independent source. Judging by current benchmarks, this would put Shanghai at or slightly above Penryn's performance level.

    As such, a very crude estimate is that Shanghai should be as competitive to Nehalem as K8 is to Conroe. Not a very rosy outlook so let's hope this early information is not accurate and AMD can pull something more out of its hat.

    BTW, last I heard is that Bulldozer will come at the 32nm node at the earliest, since the design is supposedly too complex for 45nm. So no instant relief from that corner. AMD will be fighting a harsh battle in the coming years.
  • Calin - Thursday, June 5, 2008

    "Two, THIEF"
    AMD's vector processing is 3DNow!, if I remember correctly. Yet Intel's versions of it are touted on its processors instead (SSE2, SSE3). Now who's the thief?
  • swaaye - Thursday, June 5, 2008

    MMX? Or, MDMX? Who copied who? Nobody, really. SIMD has been around forever.
  • Retratserif - Thursday, June 5, 2008

    I really would not try and think of it as a Fan base. A majority of the OC'er and Benchmarkers use what works for what they are doing. I have owned and water cooled AMD CPU's. It was great at the time.

    Once Conroes came out the door swung the other way. Technology is like the ocean, it comes in waves. All waves die out. Fortunately we as the user are living it up because of Intel's success. At the same time I truly hope that AMD/ATI does something in response to the high power cpu's. If not we get what ever intel wants to give us.

    There will always be two sides to each story. Since Intel is on top and unscathed, they have time to perfect chips before they go mainstream. Same way we have seen the delay in Yorkfields. There was something seriously wrong, and they had time to address it before it was in the hands of thousands of users.

    Ok, I can say I am a fan.... of whatever works the best for what I do. Price/Performance/Practicality. You take what you can afford and make it work harder for every penny you put in it.

    One thing you have to keep in mind. AMD is selling more budget CPU's and integrated/onboard video PC's to large companies like Dell and HP. They are moving more aggressively into typical home PC and mobile use. Intel just does not do very well there atm. With Ati in the pocket and being pretty green on power consumption, you can get a good mobile AMD that will do everything a typical PC user will ever need for 2 years at a good price.
