One of the main features Intel was promoting at the launch of Haswell was TSX, or Transactional Synchronization eXtensions. In our analysis, Johan explains that TSX enables the CPU to process a series of traditionally locked instructions on a dataset in a multithreaded environment without taking the lock, accepting the risk that the cores might overwrite each other’s shared data. If the series of instructions completes without such a conflict, the code passes through at a quicker rate; if an invalid overwrite happens, the code is aborted and takes the locked route instead. All a developer has to do is link in a TSX library and mark the start and end of the relevant code.
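The try-without-the-lock, fall-back-on-conflict flow described above can be sketched in software. This is only an illustration of the control flow, not TSX itself: real TSX detects conflicts in hardware, whereas this sketch uses a version counter and a brief lock at commit time. The names `VersionedCell` and `update` and the `max_attempts` parameter are hypothetical, chosen for this example.

```python
import threading


class VersionedCell:
    """A shared value plus a version counter used to detect conflicting writes."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self.lock = threading.Lock()  # the traditional "locked route"


def update(cell, fn, max_attempts=3):
    """Apply fn to cell.value optimistically, falling back to the lock.

    Returns "speculative" or "locked" to show which path committed.
    """
    for _ in range(max_attempts):
        seen_version = cell.version
        new_value = fn(cell.value)  # work done without holding the lock
        with cell.lock:  # short commit step (software stand-in for hardware)
            if cell.version == seen_version:
                cell.value = new_value
                cell.version += 1
                return "speculative"
        # Someone else committed in the meantime: discard new_value and retry.
    # Too many conflicts: take the locked route for the whole operation.
    with cell.lock:
        cell.value = fn(cell.value)
        cell.version += 1
        return "locked"
```

With no contention, `update(cell, lambda v: v + 1)` commits on the speculative path; the lock is only held for the whole operation after repeated conflicts, which mirrors the abort-and-take-the-locked-route behaviour described above.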

News coming from Intel’s briefings in Portland last week boils down to an erratum found with the TSX instructions. Tech Report and David Kanter of Real World Technologies are stating that a software developer outside of Intel discovered the erratum through testing, and subsequently Intel has confirmed its existence. While errata are not new (Intel’s E3-1200 v3 Xeon CPUs already have 140 of them), what is interesting is Intel’s response: to push through new microcode to disable TSX entirely. Normally a microcode update would suggest a workaround, but it would seem that this is a fundamental silicon issue that cannot be designed around, or intercepted at an OS or firmware/BIOS level.

Intel has had numerous issues similar to this in the past, such as the FDIV bug, the f00f bug and, more recently, the P67 B2 SATA issues. In each case, the bug was resolved by a new silicon stepping, with certain issues (like FDIV) requiring a recall, similar to recent issues in the car industry. This time there are no recalls; the feature simply gets disabled via a microcode update.

The main focus of TSX is in server applications rather than consumer systems. It was introduced primarily to aid database management and other tools more akin to a server environment, which is reflected in the fact that enthusiast-level consumer CPUs have it disabled (except Devil’s Canyon). Now it will come across as disabled for everyone, including the workstation and server platforms. Intel is indicating that programmers who are working on TSX-enabled code can still develop in the environment, as Intel remains committed to the technology in the long run.
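For readers who want to see whether TSX "comes across as disabled" on their own machine, here is a minimal sketch for Linux. It assumes the kernel lists the TSX feature flags as `rtm` and `hle` in `/proc/cpuinfo`, and that the microcode update hides those flags; the latter is our reading of the situation rather than something Intel has spelled out. The function name `tsx_status` is hypothetical.

```python
def tsx_status(cpuinfo_text):
    """Return TSX-related facts parsed from Linux /proc/cpuinfo-style text."""
    flags = set()
    microcode = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processors
        key = key.strip()
        if key == "flags":
            flags.update(value.split())
        elif key == "microcode" and microcode is None:
            microcode = value.strip()  # revision reported for the first core
    return {"rtm": "rtm" in flags, "hle": "hle" in flags, "microcode": microcode}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(tsx_status(f.read()))
    except OSError:
        print("no /proc/cpuinfo on this system")
```

Comparing the output before and after a BIOS or operating system update would show whether the microcode revision changed and whether the two flags disappeared with it.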

Overall, this issue affects all of the Haswell processors currently in the market, the upcoming Haswell-E processors and the early Broadwell-Y processors under the Core M branding, which are currently in production. The issue was found too late in the day for a fix to be introduced to these platforms, although we might imagine that the next stepping all around will have a suitable fix. Intel states that its internal designs have already addressed the issue.

Intel is recommending that Xeon users who require TSX-enabled code to improve performance should wait until the release of Haswell-EX. This tells us two things about the state of Haswell: for most of the upcoming LGA2011-3 Haswell CPUs, the launch stepping might be the last, and the Haswell-EX CPUs are still being worked on. That being said, if the Haswell-E/EP stepping at launch is not the last one, Intel might not promote the fact, since having the fix for TSX could be a selling point for Broadwell-E/EP down the line.

For those who absolutely need TSX, it is being said that TSX can be re-enabled through the BIOS/firmware menu should the motherboard manufacturer decide to expose it to the user. Reading through Intel’s official errata document, we can confirm this.

We are currently asking Intel what set of circumstances is required to recreate the issue, but the erratum states ‘a complex set of internal timing conditions and system events … may result in unpredictable system behaviour’. There is no word on whether this means an unrecoverable system state or a memory issue, but any issue would not be in the interests of the buyers of Intel’s CPUs who might need TSX: banks, server farms, governments and scientific institutions.

At the current time there is no road map for when the fix will be in place, and no public date for the Haswell-EX CPU launch. It might not make sense for Intel to re-release the desktop Haswell-E/EP CPUs, and in order to distinguish them it might be better to give them all new CPU names. However, the issue should certainly be fixed with Haswell-EX and desktop Broadwell onwards, given that Intel confirms it has addressed the issue internally.

Source: Twitter, Tech Report

 


  • ObstinateMuon - Wednesday, August 13, 2014 - link

    It's unrealistic to never connect to the internet. There needs to be a way for the user to control AMT.
  • jhh - Friday, August 15, 2014 - link

    Linux comes with CPU microcode, so when Intel updates the microcode in Linux, and the new Linux kernel is installed, the new microcode is there. I expect Windows does the same thing. If you never patch your operating system or BIOS, TSX is likely to stay enabled.
  • beginner99 - Wednesday, August 13, 2014 - link

    Got to suck for the ones that bought a 4770 instead of K specifically because of this feature. But then who cares? It wasn't available in most Intel CPUs anyway so almost no consumer software will use it for the next decade. And for servers they get replaced in shorter cycles anyway.
  • willis936 - Wednesday, August 13, 2014 - link

    But hey they got a virtualization extension!
  • nevertell - Wednesday, August 13, 2014 - link

    "But then who cares? ... so almost no consumer software will use it..."
    The people who pay the most care.
  • TheJian - Wednesday, August 13, 2014 - link

    Will we be looking at a recall here at some point? I mean if you sell me something supposed to do X, and then X gets removed, what then? That isn't "as advertised" anymore is it?
  • psyq321 - Wednesday, August 13, 2014 - link

    Depends on the legal situation.

    I doubt Intel would initiate a recall just because the problem is found. In this case, it would be extremely expensive since Intel would need to replace one year of production and, now, including early batches of Xeon EPs as well.

    However, if the legal pressure mounts (lawsuits filed, etc.), they might do it. But I am sure Intel would try to fight this, or limit the exposure only to certain SKUs for which it can be demonstrated that TSX is in use.

    In any case, unlike the FDIV bug which, basically, ruined calculation results and could affect pretty much anybody who was using a Pentium CPU, this bug is less critical since it requires running software that uses TSX (not very common yet, at least not on the desktop/mobile where the biggest volume of Haswells were sold so far) and very specific conditions which are, presumably, hard to reproduce.
  • Gigaplex - Wednesday, August 13, 2014 - link

    If Sony can get away with removing OtherOS, then Intel shouldn't have too many issues dealing with TSX.
  • name99 - Wednesday, August 13, 2014 - link

    Damn!
    I think we have an answer as to why Broadwell desktops and laptops are delayed...

    I'm guessing they believe (probably correctly) that they can ship Broadwell Y without TSX and no-one will much care. Still not clear why the gap between the laptop and quad-core chips ship dates --- maybe other reasons, or maybe they have reason to believe that the problem is probably more easily fixed on dual-core chips.

    Personally I'd score this (if this explanation is correct) as +1 for my earlier explanation for the delay of Broadwell --- a consequence of the insane complexity of x86 becoming unsustainable and causing Intel real harm. Intel's only official comment is
    ‘a complex set of internal timing conditions and system events … may result in unpredictable system behavior’ and, yes, that COULD be a problem on any CPU --- but it's a whole lot more likely to occur (IMHO) on x86.

    It also adds fuel to my argument that Apple is probably losing patience with Intel. As I've said, the Broadwell delays have screwed up a year of their product plans; if TSX on Haswell is broken, that also delays by at least a year the plans I believe they have to introduce an innovative set of parallel programming constructs into Swift which require HW TM.
  • name99 - Wednesday, August 13, 2014 - link

    OK, so on reading further, I see that
    (a) this likely does not affect Apple because (near as I can understand Intel's maze of feature differentiation details) the relevant parts do not ship in any Apple products. So much for that theory. (On the other hand, well done Intel --- clearly the way to get developers to support a feature you expect to charge for/be a differentiator in future is to limit it to a tiny fraction of your chips...)

    (b) in turn this suggests that my theory for this delaying Broadwell is nonsense. Unless Broadwell WAS supposed to have TSX across the laptop and desktop and Intel are still hoping they can get there by delaying a few months, with a plan B of, if necessary, simply launching without the feature?
