Armari Magnetar X64T

Two of the most significant periods of growth in my life have revolved around extracting as much performance out of a piece of hardware as possible. When I sat doing my PhD, in the lab, pumping out CUDA code to run through simulations in minutes instead of months, the onus was on speed – the more you could simulate in a day, the more insights you could get. As an extreme overclocker, it was pushing the silicon to its absolute limit, even if only for a few minutes, that yielded success and the taste of triumph.

Today, as an editor, a technology analyst, and a journalist, the core of ‘getting work done’ is no longer contingent on the highest performance computing – it’s about how carefully I can test, how I can manage relationships with vendors and experts, and then what content I can create (at least in a written sense here at AnandTech). The only ‘performance’ aspect to my work is how many systems I can test in a given time, and that is usually more limited by space, hardware, or other projects needing attention. Despite this, that desire for fast computing has never gone away. No matter if I’m dealing with laptop responsiveness, or distributing files over the network, having access to performance makes things easier (or at least if they’re wrong, I can identify a mistake faster!).

For a number of commercial verticals that demand high performance, the nature of that performance can directly affect throughput. Whether it’s something like rapid prototyping, or 3D/visual effects, or animation rendering, or medical imaging and processing, or scientific simulations, it’s all a question of throughput and data. This is the market Armari is targeting with the Magnetar X64T.

In our testing, much as with the regular non-overclocked Threadripper 3990X, what the Magnetar X64T does well, it does *really* well. The system has been calibrated to handle integer and floating point workloads around that 4.0 GHz all-core frequency, and our thermal/audio analysis shows it to be more than suitable for the workstation market it is going into. The cherry on top is that SPECworkstation 3 world record, beating the high profile OEMs with a nicely built system.

Not only that, but the price is really impressive. Our system came with a Quadro RTX 6000, 256 GB of DDR4, 3TB of PCIe 4.0 storage, a custom 1600W 80PLUS Gold power supply, a custom chassis, and a three year warranty: for $14200 (pre-tax). Just the Threadripper 3990X and the Quadro RTX 6000 together are a base $8000. Add in the other hardware, the custom liquid cooling setup with a custom block and the TRX40 motherboard and 256 GB of high speed memory, with a 3 year warranty and a free checkup/coolant refill, and I suspect the big OEMs will be hard pressed to match the price. Not only that, the equivalent Intel system, using dual 28-core parts, starts easily costing $20k+ before even looking at memory or graphics.

*Update: The $14200 / £10790 price is a special discount for launch during September 2020. 

There are some negative things to highlight, however. For a system that encourages the CPU to draw around twice the power (or more), performance gains for our tests are more in the 30-35% range. The side-effect of overclocking a CPU is that the power efficiency is lower as the processor moves further out of its ideal efficiency range. However, one might argue that matching the performance with other hardware would require multiple systems, which would draw more power overall. Another element will be that this system is limited to 256 GB of non-ECC memory; this is an AMD limitation rather than an Armari limitation, but some of Armari’s customers will no doubt want similar performance but more memory, and probably ECC memory. And to that end, we also get to a potential performance bottleneck – having 64 cores and 128 threads working at this high speed needs a lot of memory bandwidth. Threadripper can only support 4 channels, and at DDR4-3200 that equates to ~100 GB/s (80-85 GB/s real world), leaving less than 2 GB/s per core. In a number of our tests, we saw this to be a limiting factor.
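As a rough back-of-the-envelope check on those numbers, the short Python sketch below reproduces the arithmetic. It assumes the standard 64-bit (8-byte) data width of a DDR4 channel, and it takes "around twice the power" as a power ratio of exactly 2.0, which is the optimistic end of that statement. The function names are purely illustrative.

```python
CORES = 64  # Threadripper 3990X core count

def peak_bandwidth_gbs(channels, mt_per_s, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (decimal units).

    bus_bytes=8 assumes a 64-bit DDR4 channel.
    """
    return channels * mt_per_s * bus_bytes / 1000

def relative_perf_per_watt(perf_gain, power_ratio):
    """Performance per watt relative to stock (1.0 means unchanged)."""
    return (1 + perf_gain) / power_ratio

# Quad-channel DDR4-3200, as on the TRX40 platform
peak = peak_bandwidth_gbs(channels=4, mt_per_s=3200)
per_core_peak = peak / CORES

# Real-world range quoted in the text: 80-85 GB/s
per_core_real = [bw / CORES for bw in (80, 85)]

# 30-35% more performance at an assumed 2.0x power draw
efficiency = [relative_perf_per_watt(g, 2.0) for g in (0.30, 0.35)]

print(f"Peak bandwidth:      {peak:.1f} GB/s")
print(f"Per core (peak):     {per_core_peak:.2f} GB/s")
print(f"Per core (measured): {per_core_real[0]:.2f}-{per_core_real[1]:.2f} GB/s")
print(f"Perf/W vs stock:     {efficiency[0]:.3f}-{efficiency[1]:.3f}")
```

Under these assumptions the theoretical peak works out to 102.4 GB/s, or 1.6 GB/s per core, and the overclocked system delivers roughly two-thirds of the stock performance per watt. If the real power draw is more than double, the efficiency figure drops further.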

Something like AMD's Threadripper Pro solves most of these: more memory support, ECC support, and eight memory channels. However, the overclocking ability would be lost, and for a system like this, where the OC performance is what makes it special, removing it would be the equivalent of ripping out its soul. Ideally AMD would need a product that pairs 8-channel memory and ECC support with an overclockable processor.

All that being said, Armari believes it has built something that its typical customer base will love. It’s a custom super high-performing workstation with a substantial world record that you can buy, and for the visual effects studios in London that need the horsepower, AMD and Armari have it on tap.

As a small aside, I wondered how well the X64T would do in the ‘extreme’ overclocking leaderboards, where hell fears the liquid nitrogen. The best score I obtained for Cinebench R20 was 31006, which would put it 16th on the all-time leaderboard across all R20 submissions ever – the only way to get a higher score with air or liquid cooling would be to use a dual EPYC server. For Cinebench R15, a score of 12406 gives position #12 in the all-time list. This is somewhat insane for a system someone can just buy.

Here are a couple of our Cinebench R20 runs, in under 15 seconds apiece.

The final question is how to get one (if you are interested). The Armari Magnetar X64T-RD1600G3/FWL is already available in the UK and the EU. Armari is in discussions with resellers/distributors in the US, although the warranty arrangement there is slightly different. Alongside the X64T, Armari is preparing a rack-mounted 2U version of the X64T with an IPMI-enabled motherboard to come out later in Q4 – something the larger VFX houses have requested en masse. This is set for global certification, and is pending a North American distributor.


96 Comments


  • KillgoreTrout - Wednesday, September 9, 2020 - link

    Intelol
  • close - Wednesday, September 9, 2020 - link

    This shows some awesome performance but the tradeoff is the limited memory capacity. If you don't need that, great. If you do, then Threadripper is not the best option.
  • twotwotwo - Wednesday, September 9, 2020 - link

    Hmm, so you're saying AnandTech needs a 3995WX or 2x7742 workstation sample? :)
  • close - Wednesday, September 9, 2020 - link

    A stack of them even :). Thing is memory support doesn't make for a more interesting review, doesn't really change any of the bars there. It's a tick box "supports up to 2TB of RAM".

    Memory support is one of the things that makes an otherwise absurdly expensive workstation like the Mac Pro attractive (that and the fact that for whoever needs to stay within that ecosystem the licenses alone probably cost more than a stack of Pros).
  • oleyska - Wednesday, September 9, 2020 - link

    https://www.lenovo.com/no/no/thinkstation-p620

    will probably be able to help.
  • close - Wednesday, September 9, 2020 - link

    The P620 supports up to 512GB of RAM. Generally OK and probably delivers on every other aspect but for those few that need 1.5-2TB of RAM it still wouldn't cut it. For that the go to is usually a Xeon, or EPYC more recently.
  • schujj07 - Wednesday, September 9, 2020 - link

    Remember that Threadripper Pro supports 2TB of RAM in an 8 channel setup. While getting 2TB/socket isn't cheap, it is a possibility.
  • rbanffy - Thursday, September 10, 2020 - link

    I wonder about the impact of the 8-channel config on single-threaded workloads. The 256MB of L3 is already quite ample, to the point where I'm unsure how diminished the returns are.
  • sjerra - Monday, September 28, 2020 - link

    This is my biggest concern and rarely considered or studied in reviews. Design space exploration.
    CAE over many design variations. Hundreds of design variations calculated as much as possible in parallel over the available cores (one core per variation, but each grabbing a slice of the memory). I've tested this on a 7960xe, purposely running it on dual channel and quad channel memory. On dual channel memory, at 12 parallel calculations (so 6 cores/channel) I measured a 46% increase in the calculation time / sample. in quad channel, at 12 parallel calculations (so 3 cores/ channel) I already measured a 30% reduction per calculation. (can anyone explain the worse results for quad channel?)
    Either way, it leads me to conclude that 64 cores with 4 channel memory for this type of workload is a big no go. Something to keep in mind. I'm now spec'ing a dual processor workstation with two lower core count processors and fully populated memory channels (either EPYC (2x32c, 16 channels) or Xeon (2x24c, 12 channels); still deciding).
  • sjerra - Monday, September 28, 2020 - link

    Edit: 30% increase of course.
