It Does Multiple Threads Though: The Case for SMT

Despite being 2-issue, it's not always easy to execute two instructions from a single thread in parallel due to data dependencies between the two. Intel's solution to this problem was to enable SMT (Simultaneous Multi-Threading) on Atom (not all models unfortunately) to allow the concurrent execution of up to two threads. Welcome the return of Hyper Threading.

Remember the rule of thumb for power/performance tradeoffs? Intel's decision to enable SMT on Atom was the perfect example of just that. SMT increased power consumption by less than 20% on Atom, however it also yielded a 30 - 50% increase in performance on the in-order core. The decision couldn't be easier.

The Atom has a 32-entry instruction scheduling queue, but when running with SMT enabled each thread has its own 16-entry queue. The scheduler doesn't have to switch between threads each clock, it can do so intelligently, the only limitation is that it can only dispatch 2 ops per clock (since it is a 2-wide machine). If one thread is waiting on data to complete an instruction, on the next clock tick the scheduler can choose to dispatch an op from a separate thread that will hopefully be able to execute.

Making Atom multithreaded made perfect sense from a logical standpoint. The downside to an in-order core is that if there is an instruction that is waiting on data to begin execution the rest of the pipeline stalls while that dependency is resolved. The chances that you'll have two independent instructions from two independent threads both with misses in cache is highly unlikely.

Execution Units

Atom isn't a superwide processor, with an in-order front end and no on-die memory controller it's unlikely that we'll see tremendous instruction throughput. Data dependencies would do a good job of ensuring that tons of execution units remain idle, so Atom's designers did their best to only include the bare minimum when it came to execution units.

There's no dedicated integer multiplier or divider, these functions are shared with the SIMD FP units. There are two SSE units and the scheduler can dispatch either a float or an integer SIMD op to both ports in a given clock.

All of the functional units are 64-bits wide with the exception of supporting full width SIMD integer and single precision FP ADDs.

Return of the CISC: Macro-op Execution Fighting Power Consumption...with a Longer Pipeline?
POST A COMMENT

46 Comments

View All Comments

  • adntaylor - Tuesday, April 08, 2008 - link

    On that chart with price / power, you need to be clearer...

    For price, you show the combined price for CPU + Chipset. For power, you say just the CPU... so 0.65W for the CPU... but you're conveniently ignoring the >2W figure for the chipset!!! This absolutely flatters Intel wherever possible.

    AMD are just as misleading - they describe the Geode LX as "1W" which excludes the non-CPU core parts of the chip (which is an integrated CPU + GMCH)

    Just please be honest - the figures are out there in the Intel datasheets... it takes 10 minutes to check.
    Reply
  • Clauzii - Friday, April 04, 2008 - link

    I still have a PowerVR 4MB addon card, runnung in tandem with a Rage128Pro. Quite a combination w. 15 FPS in Tombraider. Constant(!) 15FPS, that is..

    Amazing what they actually achieved back in 95!
    Reply
  • Clauzii - Friday, April 04, 2008 - link

    Ooops!

    Totally misplaced that. Sorry.
    Reply
  • wimaxltepro - Friday, April 04, 2008 - link

    The Atom represents a shift in processor architecture that is the most dramatic departure for Intel since introduction of x86 processors... the philosophy of how computing itself occurs from centralized processors to distributed processing based on an extension of the popular x86 instruction set.

    The Atom is not about the immediate prospects for the Atom or Nehalem products: we will likely see members of Intel's new product family be used in embedded applications in consumer products and in areas where specialized communications processors are more the rule. While not optimized for use in specific networking applications, the products capitalize on the wide range of support available in IT/Networking to develop common functions that leverage the low cost, low power/processing capability to be used as a common denominator for a wide range of applications.

    Intel has been built on the 'Wintel' architecture: massively integrated chips needed to handle the massively integrated operating systems and applications of Windows (and Apple) environments. The Atom allows migration and broadening out from that architectural motif to a very highly distributed architecture. So, the increased parallelism found in the internal chip architecture is enabling of changes in external system architectures and device applications that go well beyond the typical domain of Intel.. and right into the domain of 'personal wireless broadband' and SDWN, Smart Distributed Wireless broadband Network.

    The decisions about in-order vs. out of-order instruction streams, memory architecture, I/O architecture have been made in light of the broad vision for how computing, networking and, out of hand, how wireless enabled broadband networking including WiMAX will occur. This should be understood for what it represents as a shift in direction for Intel both in response to broad industry shifts and as a trend setting development.
    Reply
  • jtleon - Friday, April 04, 2008 - link

    Thanks to all the flash player ads, etc., a mobile web device will continuously avoid switching to low power states. Thus one could argue that advertising will be carbon footprint enemy of the internet's future. This is already becoming the case for desktop/laptop machines.

    Without such continuous (arguably wasted) consumption of CPU power, then Intel's engineered power management might have a significant impact on the value of the Atom.

    Regards,
    jtleon
    Reply
  • 0WaxMan0 - Friday, April 04, 2008 - link

    I am definatly much impressed and enthused by intels work here, the future looks interesting esp for those of us who like low power cross compatible computing products.

    However I have to point out that a low power modern x86 cpu has allready been done infact 4 years ago with AMD's Geode. While technically much weaker than the Atom and with out any where near the scalability (single core design etc.) the Geode has been available in the same TDP ranges for a good long while. Take a look here http://www.amdboard.com/geode.html">http://www.amdboard.com/geode.html for some old stuff.

    I do hope that the Intel name and hype makes more of an impact than AMD managed.
    Reply
  • whycode - Thursday, April 03, 2008 - link

    Does the TDP quoted include the chipset? Or is that CPU only? Reply
  • IntelUser2000 - Thursday, April 03, 2008 - link

    Anand, the Pentium M does not feature Macro Ops Fusion. Its Core 2 Duo that started Macro Ops Fusion. Reply
  • Anand Lal Shimpi - Thursday, April 03, 2008 - link

    You're correct, I was referencing micro-op fusion. I've made the appropriate correction :)

    Take care,
    Anand
    Reply
  • squito - Wednesday, April 02, 2008 - link

    Am I the only one shocked to see that Poulsbo is a 130nm part... Reply

Log in

Don't have an account? Sign up now