It's not often these days that we see AMD's name attached to new x86 instructions. While AMD was instrumental in creating the AMD64/x86-64 standard for 64-bit processors and the No-eXecute bit for buffer overflow protection, we haven't otherwise heard much out of the company lately. It's been Intel driving new instruction sets, with standards such as SSE3, SSE4, and VT for virtualization (with AMD's own implementation, AMD-V, following).

With AMD's impressive track record in recent years, we're all ears when they announce something new about x86 instructions. That said, what we're looking at today is not a new instruction set for high-performance computing, or a new safety measure. In fact, AMD's proposal doesn't even directly apply to most computer users, who will likely never use these instructions. What AMD is proposing is, to our knowledge, a first for the x86 instruction set: a set of instructions solely for developers.

As part of the newly launched Extensions for Software Parallelism initiative, AMD is making its first move by announcing the Lightweight Profiling Proposal (LWP), a proposed standard for adding hardware and instructions that help developers fine-tune their code and improve performance by profiling their applications. With this addition to the x86 instruction set, AMD is specifically targeting managed code environments and developers producing multithreaded applications, two of the biggest areas of growth in the software industry. AMD believes that these groups stand to benefit the most from LWP given the unique difficulties faced by those two fields.

Although the hardware to come from this proposal is still some time off, we're in a position today to talk about some of the benefits that can be extracted from such hardware, and some of the hurdles in bringing about such a change. Profilers can be extremely powerful tools; special-purpose platforms such as embedded computers and video game consoles have used such hardware and software to squeeze an amazing amount of performance out of what can be very limited hardware. In some ways what AMD is proposing simply brings the PC up to par with other systems, and the proposal itself is a simple one. Nevertheless, the potential performance gains can be huge.

So what is profiling, why does it have such a potential to improve performance, and how does AMD intend to improve profiling? Let's take a look.

11 Comments

  • MadBoris - Thursday, August 16, 2007 - link

    I can't help but wonder what advantages this would have over Intel's existing open source TBB. Threading Building Blocks 2.0 seems to be a pretty robust runtime library to be able to do the hard work covering some of the most difficult things to manage currently...

    Efficient parallelism.
    Automatic load balancing.
    Easier thread management.
    Help with concurrency issues.

    Plus Intel's companion tools are awesome. Expensive but pretty nice.
    TBB is also open source and multiplatform friendly.
    I'm also curious to learn more about real world TBB experiences.

    Anyway, it's good to hear more work is being done on this stuff and different approaches will always help.

    We have a long way to go because as it is, quad cores and above will not really be leveraged like everyone assumes they will be. It will take more than market saturation of multicores for them to be used efficiently. Unless an application is manipulating data streams with an easily splittable workload, like encoding, rendering, compression, etc., you really won't see the type of workload granularity needed for other types of applications to truly leverage multicores. Cores beyond 2 will offer ever-diminishing returns for quite a while in mainstream applications without some advances.

    I hope to hear more about advances on this in future articles because as it is, it seems quad will be somewhat of a wall for us of any real beneficial performance, anything above that will really just serve as a good heater unless it's for a very specific application.
  • Ryan Smith - Thursday, August 16, 2007 - link

    LWP isn't intended to be competition for TBB; rather, it augments it (at least as much as TBB is beneficial on an AMD chip). TBB is compiler and library help, an essential part of extracting maximum performance, but it doesn't include anything as far as application profiling goes. LWP is the final link in that chain: once TBB has taken you as far as it can, you break out profilers and start looking at what your code is doing that could be causing any remaining performance bottlenecks.
  • JumpingJack - Sunday, August 19, 2007 - link

    I think one of the points he is making is that this is really not a methodology or instruction set change that is used for multithreading, rather a HW based profiler which is only useful during product development.

    Profiling is not new, so while AMD is proposing some unique instructions to get a realtime peek at the architectural state (profiling), it does not directly speed up multithreading by some new or novel algorithm. Basically, what AMD is proposing is pretty much already done at a software level to an extent.
  • MadBoris - Sunday, August 19, 2007 - link

    Yeah, while profiling is necessary especially for multithreaded apps for optimizing and finding overhead, stalls, IPC issues, synchronized contention, etc., I'm far more interested in the core issues of leveraging cores for better parallel execution. Rather than a better topical ointment we need to address the core cause. Initially I had thought AMD was doing more than just profiling with the extra architecture, but that was due to my silly skimming.

    I also mentioned TBB because I think it may also be worthy of an article someday; it is an intriguing route to the core problems developers are facing in multithreading in the future. I'm curious how well it works, judging from engineers' feedback, something Anandtech would have access to. Making a thread and throwing it to another core today is all well and good in removing thread contention for primary threads and giving other threads more CPU budget, but lower level multithreading and parallel advantages are currently limited for most types of apps due to inherent limitations. That's the real core issue in multithreading: improving scaling and fully leveraging several cores.

    I'm interested to see what AMD's methods may bring in improved performance and maybe ways to gather a few more tidbits of data. Much still depends on the software element, how it will present the data, and how robust it will be during development time. Who knows, maybe they are putting the horse out first then bringing in the cart next; it would be the right order of things. Having a HW profiler in place first would make it easier to produce and test a HW based multithreading optimizing approach, which would be stellar someday down the road.
  • MadBoris - Thursday, August 16, 2007 - link

    Maybe LWP can lessen overhead w/ HW and even be more precise, as you say. Although there is a good bit you can retrieve in software, so it will be interesting to see what it is I am missing; maybe some deeper level CPU cache usage metrics may be beneficial. Time will tell.
  • yyrkoon - Friday, August 17, 2007 - link

    Dude, do you even know what software profiling *is*? I have no idea what TBB *is*, but I can tell you with 99.9% certainty it is not even remotely related (except perhaps that it is a set of instructions).

    Direct quote from Wikipedia:

    quote:

    A profiler is a performance analysis tool that measures the behavior of a program as it runs, particularly the frequency and duration of function calls. The output is a stream of recorded events (a trace) or a statistical summary of the events observed (a profile). Profilers use a wide variety of techniques to collect data, including hardware interrupts, code instrumentation, operating system hooks, and performance counters. The usage of profilers is called out in the performance engineering process.


    http://en.wikipedia.org/wiki/Performance_analysis
  • MadBoris - Friday, August 17, 2007 - link

    Thank you for the definition. I just finished saying I have used Intel's profilers in profiling applications, so I may know.

    As to TBB, it's a runtime tool for making threaded applications more efficient and easier to produce, not a profiler. I initially thought AMD's approach was going to be more than just profiling, but that's what I get for skimming articles.
  • MadBoris - Thursday, August 16, 2007 - link

    And of course the Intel Threading tools, which work in cooperation with TBB and have been around for a while, are very robust as well. I've used the Thread Checker and Threading Profiler and I like them pretty well, pretty impressive actually, although you can incur some serious overhead with the profilers if you're not careful.
    http://www.intel.com/cd/software/products/asmo-na/...

    The more the merrier, just was wondering what LWP was adding beyond what I was already familiar with on the Intel side.
  • DigitalFreak - Thursday, August 16, 2007 - link

    Unless Microsoft puts the boot to Intel again, I doubt Intel will jump on board. They have too much of a "nowhere but here" mentality.
  • DeepThought86 - Thursday, August 16, 2007 - link

    I think this article could have been written with 30% less words
