ASRock X99 WS-E/10G Software

The software package from ASRock has gone through a slow evolution since Z77 into a single interface for all of ASRock's functionality called A-Tuning. With the overclocking and gaming models the interface is slightly adjusted, but the green tint follows the majority of ASRock's motherboard range. However, the newest element to ASRock's line is the APP Shop. This is essentially ASRock's curated version of the Play Store/Microsoft Store, stocked only with software ASRock feels is suited to its motherboard range.

Currently the selection is fairly limited: Chrome, ASRock's own software programs and a few Asian free-to-play titles. While offering this method of obtaining software is interesting, it does open up a lot of questions. Will there be paid titles? What happens if one element of the store is filled with malware?

The APP Shop also offers a BIOS and Drivers section to update the system, but as yet we have not had it working properly in our testing.

One suggestion has been that this tool will only update the packages it has itself downloaded; there is a separate update tool in A-Tuning.

A-Tuning

The initial interface revolves around overclocking, giving the following three modes for automatic frequency adjustments:

Power Saving puts the CPU into a low power mode (1.2 GHz on the 5960X) and requires the system to be under full CPU load before slowly ramping up the speed over the next 6-10 seconds. This keeps power consumption down, but perhaps at the expense of responsiveness, as the system forgoes that initial burst of high single-core frequency. Standard mode is the default setting, and Performance mode puts the CPU into high frequency mode for any loading. Performance Mode also opens up the Advanced menu:

Here we have a list of Optimized CPU OC Settings similar to those in the BIOS, along with an auto tuning section. Unlike ASUS, there is no set of auto tuning options for adjusting the stress tests or the target CPU temperature, although I would imagine that all the manufacturers might move in that direction at some point in the future.

The tools menu has a lot of space for ASRock to add in future features, but currently houses the ones they do have. XFast RAM allows the system to partition some of the RAM into a RAMDisk while also providing some RAMCache options:

XFast LAN is a customized interface for cFos, allowing users to prioritize certain applications over others:

Personally I find this interface extremely cumbersome, especially if there are a lot of applications to deal with. ASRock could design something with less white space that makes more efficient use of the A-Tuning window, which would make the process a lot easier. There is direct access to cFos via the Advance Setting button:

The software works with all the network ports on board, including the 10GBase-T ones.

Fast Boot enables options relating to UEFI quick booting by disabling certain options until the OS is loaded:

The Online Management Guard (OMG [!]) has been around for several generations of ASRock motherboards now, and offers the user the ability to disable the networking ports during certain times of the day.

ASRock's fan controls in the software now mirror those in the BIOS, giving the user a better sense of what to adjust:

The FAN Test will detect the RPM for a given fan power, although the graph on the left is misnamed: what ASRock calls 'FAN Speed (%)' actually means 'Fan Power (%)', so the user has to do the conversion in their head using the table in the middle. If ASRock were on the ball, they would do the conversion in software, relabel the graph 'Fan Speed (RPM)' and scale the axis from the lowest to the highest measured fan speed. Note that the high fan speeds above are actually the speeds from my liquid cooling pump.

The Dehumidifier tool in the software is identical to that in the BIOS, allowing the system to enable the fans after the system has been shut off in order to equalize the air temperature inside and outside the case. This has benefits in humid environments where the moisture in the air may condense inside the case during a cool night after a hot day.

The USB Key tool allows users to assign a particular USB drive with login data for particular Windows users. This means that users need not remember a long password to log in, and only specified users are allowed to log in. Lose the USB drive, however, and you lose the ability to log in.

One of the newer tools in ASRock's list is the DISK Health Report. This gives the critical information on the drives attached to the system, allowing SSD users to see the remaining life of their drive. This particular drive has been at the heart of my motherboard test beds for almost three years now and is still going strong.

The next tab along the top is the OC Tweaker, featuring the more critical options from the BIOS for manual overclocking along with some options to save overclock profiles. The way this is shown in ASRock’s software is quite user-unfriendly, and I would suggest that the next iteration of the software gives an experienced user an easier way to adjust frequencies and voltages without dealing with sliding bars and scrolling.

The System Info tab gives the hardware monitor page by default, giving information about the frequencies, fan speeds and voltages in the system. Most other manufacturers have a way of recording this data, or seeing it plotted on a graph while running a stress test, but ASRock is behind on this front at this time.

The Hardware Monitor section of System Info is identical to that in the BIOS, showing where hardware is installed, with a mouse-over giving basic details. This is handy for investigating which memory stick, USB or PCIe device is not being detected.

The Live Update tab is, by comparison to MSI's, limited. Although I knew there were updates for the platform when I ran this software, it failed to find the updated drivers. It also does not say how big each download is; if a user is on a limited or slow bandwidth package, having to download 300MB of audio or graphics drivers can be detrimental.

While ASRock’s software package is presented in a good way, and there are a number of helpful tools, there are various aspects here that miss the mark in terms of user experience.

Comments

  • Jammrock - Monday, December 15, 2014 - link

    You can achieve 10Gb speeds (~950MB/s-1.08GB/s real world speeds) on a single point-to-point transfer if you have the right hardware and you know how to configure it. Out-of-the-box...not likely. The following assumes your network hardware is all 10Gb and jumbo frame capable and enabled.

    1. You need a source that can sustain ~1GB/s reads and a destination that can sustain ~1GB/s writes. A couple of high end PCIe SSD cards, RAID'ed SSDs or a RAMdisk can pull it off, and that's about it.

    2. You need a protocol that supports TCP multi-channel. SMB3, when both source and destination are SMB3 capable (Win8+/2012+), does this by default. Multi-threaded FTP can. I think NFS can, but I'm not 100% certain...
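
    As a quick sanity check (a rough sketch; adapter and host names are placeholders), you can confirm from PowerShell that SMB multi-channel is actually in play during a large copy:

    # while a big transfer is running: multiple connections per server/NIC pair = multi-channel is working
    Get-SmbMultichannelConnection

    # the client-side interfaces SMB considers RSS- or RDMA-capable (and their link speeds)
    Get-SmbClientNetworkInterface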

    3. You need RSS (Receive Side Scaling), LSO (Large Send/Segment Offloading), TCP window scaling (auto tuning) and TCP Chimney (for Windows), and optionally RSC (Receive Side Coalescing), set up and configured properly.

    Even modern processors cannot handle 10Gb worth of reads on a single processor core, thus RSS needs to be set up with a minimum of 4 physical processor cores (RSS doesn't work on Hyperthreaded logical cores), possibly 8, depending on the processor, to distribute receive load across multiple processors. You can do this via PowerShell (Windows) with the Set-NetAdapterRss cmdlet.

    # example command for a 4 physical core proc w/ Hyperthreading (0,2,4,6 are physical, 1,3,5,7 are logical....pretty much a rule of thumb)
    Set-NetAdapterRss -Name "<adapter name>" -NumberOfReceiveQueues 4 -BaseProcessorNumber 0 -MaxProcessorNumber 6 -MaxProcessors 4 -Enabled $true

    LSO is set in the NIC drivers and/or PowerShell. This allows Windows/Linux/whatever to create a large packet (say 64KB-1MB) and let the NIC hardware handle segmenting the data to the MSS value. This lowers processor usage on the host and makes the transfer faster since segmenting is faster in hardware and the OS has to do less work.

    RSC is set in Windows or Linux and on the NIC. This does the opposite of LSO. Small chunks are received by the NIC and made into one large packet that is sent to the OS. Lowers processor overhead on the receive side.
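
    Both offloads can be flipped with the in-box NetAdapter cmdlets; a minimal sketch (the adapter name is a placeholder and the driver has to support the feature):

    # LSO speeds up the send side, RSC the receive side
    Enable-NetAdapterLso -Name "<adapter name>"
    Enable-NetAdapterRsc -Name "<adapter name>"

    # confirm the driver actually took the settings
    Get-NetAdapterLso -Name "<adapter name>"
    Get-NetAdapterRsc -Name "<adapter name>"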

    While TCP Chimney gets a bad rap in the 1Gb world, it shines in the 10Gb world. Set it to Automatic in Windows 8+/2012+ and it will only enable on 10Gb networks under certain circumstances.

    TCP window scaling (auto-tuning in the Windows world) is an absolute must. Without it the TCP windows will never grow large enough to sustain high throughput on a 10Gb connection.
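
    Both are global TCP settings; on Windows 8/2012-era systems something like the following netsh commands should cover it:

    netsh int tcp set global chimney=automatic
    netsh int tcp set global autotuninglevel=normal

    # check the resulting global TCP parameters
    netsh int tcp show global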

    4. Enable 9K jumbo frames (some people say no, some say yes...really depends on hardware, so test both ways).
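
    Jumbo frames are a per-adapter advanced property; the exact keyword/value depends on the driver, but a typical sketch looks like this, followed by a don't-fragment ping to prove the path really passes 9K frames:

    # 9014 bytes is the usual "9K" setting; keyword/value strings vary by driver
    Set-NetAdapterAdvancedProperty -Name "<adapter name>" -RegistryKeyword "*JumboPacket" -RegistryValue 9014

    # 8972 = 9000 byte MTU minus 20 (IP header) and 8 (ICMP header) bytes
    ping <destination IP> -f -l 8972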

    5. Use a 50GB file or larger. You need time for the connection to ramp up before you reach max speeds. A 1GB file is way too small to test a 10Gb connection. To create a dummy file in Windows use fsutil: fsutil file createnew E:\Temp\50GBFile.txt 53687091200

    This will normally get you in the 900 MB/s range on modern hardware and fast storage. LSO and TCP Chimney make tx faster. RSS/RSC make rx faster. TCP multi-channel and auto-tuning give you 4-8 fast data streams (one for each RSS queue) on a single line. The end result is real world 10Gb data transfers.

    While 1.25GB/s is the theoretical maximum, that is not the real world max. 1.08GB/s is the fastest I've gone on a single data transfer on 10Gb Ethernet. That was between two servers in the same blade chassis (essentially point-to-point with no switching) using RAM disks. You can't really go much faster than that due to protocol overhead and something called bandwidth delay product.
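
    (To put the bandwidth delay product in rough numbers: the TCP window has to cover bandwidth x round-trip time, so at 10Gb/s with a ~100us RTT - in the range quoted further down this thread - that is 10,000,000,000 b/s x 0.0001 s = 1,000,000 bits, or roughly 125KB in flight per stream, which is exactly why window scaling has to be working.)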
  • Ian Cutress - Monday, December 15, 2014 - link

    Hi Jammrock, I've added a link in the main article to this comment - it is a helpful list of information for sure.

    For some clarification, our VMs were set for RAMDisk-to-RAMDisk operation, but due to only having UDIMMs on hand, the size of our RAMDisks was limited. As we tested internally without a switch, not a lot else was changed in the operation, making it more of an out-of-the-box type of test. There might be scope for ASRock to apply some form of integrated software to help optimise the connection. If possible I might farm out this motherboard to Ganesh for use in future NAS reviews, depending on his requirements.
  • staiaoman - Monday, December 15, 2014 - link

    wow. Such a concise summary of what to do in order to achieve high-speed network transfers...something so excellent shouldn't just be buried in the comments on Anandtech (although if it has to be in the comments of a site, Anand or STH.com are clearly the right places ;-P). Thanks Jammrock!!
  • Hairs_ - Monday, December 15, 2014 - link

    Excellent comment, but it just underlines what a ridiculously niche product this is.

    Anyone running workloads like this surely isn't doing it using build-it-yourself equipment over a home office network?

    While this sort of article is no doubt full of interesting concepts for the reviewer to research, it doesn't help 99% of builders or upgraders out there.

    Where are the budget/midrange Haswell options? Given the fairly stagnant nature of the AMD market, what about an article on long-term reliability? Both things which actually might be of interest to the majority of buyers.

    Nope, another set of ultra-niche motherboard reviews for those spending several hundred dollars.

    The reviews section on Newegg is of more use as a resource at this stage.
  • Harald.1080 - Monday, December 15, 2014 - link

    It's not that complicated.
    We set up 2 Xeon E5 single socket machines with ESXi 5.1, some guests on both machines, an 800€ 10G switch, and as the NAS backup machine a Xeon E3 with 2 Samsung 840 Pros in RAID0 as a fast cache in front of a fast RAID5 disk system. NFS. All 3 machines with Intel single port 10G. Jumbo frames.

    Linux VM guest A to the other host's VM guest B with ramdisk: 1GB/s from the start.
    VMware hosts to NAS (the Xeon E3 NFS system) with SSD cache: 900 MB/s write; without cache: 20 MB/s.

    Finally used Vmdk disk tools to copy snapshotted disks for backup. Faster than file copy.

    I think doing the test on the SAME MACHINE is a bad idea. Interrupt handlers will have a big effect on the results. What about queues?
  • shodanshok - Tuesday, December 16, 2014 - link

    I had a similar experience on two Red Hat 6 boxes using Broadcom's NetXtreme II BCM57810 10 Gb/s chipset. The two boxes are directly connected by a Cat 6e cable, and the 10GBASE-T adapters are used to synchronize two 12x 15K disk arrays (sequential read > 1.2 GB/s).

    RSS is enabled by default, and so are TSO and the like. I manually enabled jumbo frames on both interfaces (9K MTU). Using both netperf and iperf, I recorded ~9.5 Gb/s (1.19 GB/s) on UDP traffic and slightly lower (~9.3 Gb/s) using TCP traffic.
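
    For reference, a typical invocation would be something like this (iperf2 syntax; the IP is a placeholder):

    # on the receiving box (add -u for the UDP run)
    iperf -s

    # on the sending box: 4 parallel TCP streams, 512K window, 30 second run
    iperf -c 192.168.10.2 -P 4 -w 512K -t 30

    # UDP variant, offering ~10 Gb/s
    iperf -c 192.168.10.2 -u -b 10000M -t 30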

    Jumbo frames really made a big difference. A properly working TCP window scaling algorithm is also a must-have (I had two 1 Gb/s NICs with very low DRBD throughput - this was due to bad window scaling decisions from the Linux kernel when using a specific Ethernet chip driver).

    Regards.
  • jbm - Saturday, December 20, 2014 - link

    Yes, the configuration is not easy, and you have to be careful (e.g. if you want to use SMB multichannel over several NICs, you need to have them in separate subnets, and you should make sure that the receive queues for the NICs are not on the same CPU cores). Coincidentally, I configured a couple of servers for Hyper-V at work recently which use Intel 10Gb NICs. With two 10Gb NICs, we get live migration speeds of 2x 9.8Gb/s, so yes - it does work in real life.
  • Daniel Egger - Monday, December 15, 2014 - link

    > The benefits of 10GBase-T outside the data center sound somewhat limited.

    Inside the data center the benefits are even more limited as there's usually no problem running fibre, which is easier to handle, takes up less space, uses less power and allows for more flexibility -- heck, it even costs less! No sane person would ever use 10GBase-T in a datacenter.

    The only place where 10GBase-T /might/ make sense is in a building where one has to have cross-room connectivity but cannot run fibre; but better hope for good Cat.7 wiring and have the calibration protocol ready in case you feel the urge to sue someone because it doesn't work reliably...
  • gsvelto - Monday, December 15, 2014 - link

    There's also another aspect that hasn't been covered by the review: the reason why 10GBase-T is so slow when used by a single user (or when dealing with small transfers, e.g. NFS with small files) is that its latency is *horrible* compared to Direct Attach SFP+. A single hop over an SFP+ link can take as little as 0.3µs, while one should expect at least 2µs per 10GBase-T link, and it can be higher.

    This is mostly due to the physical encoding (which requires the smallest physical frame transferable to be 400 bytes IIRC) and the heavy DSP processing needed to extract the data bits from the signal. Both per-port price and power consumption are also significantly higher.

    In short, if you care about latency or small-packet transfers 10GBase-T is not for you. If you can't afford SFP+ then go for aggregated 1GBase-T links, they'll serve you well, give you lower latency and redundancy as the cherry on top.
  • shodanshok - Tuesday, December 16, 2014 - link

    This is very true, but it really depends on the higher-level protocol you want to use over it.

    IP over Ethernet is *not* engineered for latency. Try to ping your localhost (127.0.0.1) address: on RHEL 6.5 x86-64 running on top of a Xeon E5-2650 v2 (8 cores at 2.6 GHz, with the performance governor selected, no heavy processes running) RTT times are about 0.010 ms, or about 10 usec. One-way sending is about half that, at 5us. Adding 2us is surely significant, but hardly a world-changer.

    This is for a localhost connection with a powerful processor and no other load. On a moderately-loaded, identical machine, the localhost RTT latency increases to ~0.03ms, or 15us one-way. RTT from one machine to another ranges from 0.06ms to 0.1ms, or 30-50us one way. As you can see, the 2-4us imposed by the 10GBase-T encoding/decoding rapidly fades away.

    IP creators and stack writers know that. They integrated TCP window scaling, jumbo frames and similar to overcome that very problem. Typically, when very low latency is needed, some lightweight protocol is used *on top* of these low-latency optical links. Heck, even PCI-E, with its sub-us latency, is often too slow for some kinds of workload. For example, some T-series SPARC CPUs include 10Gb Ethernet links right in the CPU package, using a dedicated low-latency internal bus, but using classical IP schemes on top of these very fast connections will not give you much gain over more pedestrian 10GBase-T Ethernet cards...

    Regards.
