Benchmarking with iPerf3 and ipgen

The iPerf3 tool serves as a quick check that the network link is up and performing close to expectations. As a simple passthrough device, the Supermicro SuperServer E302-9D should achieve line rate for 10G traffic across its various interfaces, with rates dropping as more processing is added in the form of firewalling and NAT. Each tested mode therefore starts with an iPerf3 test, followed by a sweep of various packet sizes with the pkt-gen tool. In both cases, each 10G interface set is tested separately, and then both sets simultaneously. After those two sets of experiments, the L3 forwarding test using ipgen is run from each of the three machines in the test setup. This section discusses only the iPerf3 and ipgen results; the former also includes the IPsec evaluation.

iPerf3

Commands are executed on the source, the sink, and the DUT using the Conductor Python package described in the testing methodology section. The per-mode setup steps on the DUT were covered in the previous section; only the source and sink [Run] phases are described here.

On the sink side, two iPerf3 servers are spawned and then terminated after three minutes. The spawn and timeout prefixes are keywords defined by the Conductor package.
spawn0: cpuset -l 1,2 iperf3 -s -B 172.16.10.2 -p 5201
spawn1: cpuset -l 3,4 iperf3 -s -B 172.16.11.2 -p 5201
timeout180: sleep 180
step3: killall iperf3

On the source side, the first link is evaluated for 30 seconds (35 seconds with the first 5 omitted via -O 5), followed by the second link. In the third iteration, clients for both links are spawned simultaneously.
spawn1: cpuset -l 1,2 iperf3 -c 172.16.10.2 -B 172.16.0.2 -P 4 -O 5 -t 35 --logfile /tmp/.1c.0.txt
timeout45: sleep 45
spawn3: cpuset -l 1,2 iperf3 -c 172.16.11.2 -B 172.16.1.2 -P 4 -O 5 -t 35 --logfile /tmp/.1c.1.txt
timeout46: sleep 45
spawn5: cpuset -l 1,2 iperf3 -c 172.16.10.2 -B 172.16.0.2 -P 4 -O 5 -t 35 --logfile /tmp/.2c.0.txt
spawn6: cpuset -l 3,4 iperf3 -c 172.16.11.2 -B 172.16.1.2 -P 4 -O 5 -t 35 --logfile /tmp/.2c.1.txt

The table below presents the bandwidth numbers obtained in the various modes. The interfaces specified in the headers refer to the ones on the DUT.

Supermicro E302-9D as pfSense Firewall - iPerf3 Benchmark (Gbps)
Mode                    Single Stream               Dual Stream
                        ixl2 - ixl0   ixl3 - ixl1   ixl2 - ixl0   ixl3 - ixl1
Router                  9.40          9.41          8.77          8.67
PF (No Filters)         6.99          6.96          6.50          6.98
PF (Default Ruleset)    5.43          5.81          4.22          5.69
PF (NAT Mode)           7.89          6.99          4.49          6.06

Line rates are obtained in the plain router mode. Enabling packet filtering lowers performance, as expected, with more rules resulting in slightly lower throughput. The NAT mode doesn't exhibit much performance loss compared to the plain PF mode, but multiple simultaneous streams on different interfaces that each require NAT do pull performance down further relative to the PF (No Filters) mode.

IPsec Testing using iPerf3

IPsec testing involves a similar set of scripts, except that only the ixl2 and ixl3 interfaces of the DUT are involved. The table below presents the iPerf3 bandwidth numbers for the tested combinations of encryption and authentication algorithms. Running the iPerf3 server on the DUT itself may understate the achievable performance; however, the comparison against the baseline case under the same conditions remains valid.

Supermicro E302-9D as pfSense Firewall - IPsec iPerf3 Benchmark (Mbps)
Algorithm             Single Stream                                   Dual Stream
                      (Src)ixl2 - (DUT)ixl2   (Src)ixl3 - (DUT)ixl3   (Src)ixl2 - (DUT)ixl2   (Src)ixl3 - (DUT)ixl3
Baseline (No IPsec)   5140                    7450                    3020                    4880
3des-hmac-md5         119                     118                     61.3                    75.2
aes-cbc-sha           374                     373                     236                     238
aes-hmac-sha2-256     377                     376                     235                     212
aes-hmac-sha2-512     433                     430                     259                     280

The above numbers are low compared to the line rate, but closely match the results uploaded to the repository specified in the AsiaBSDCon 2015 network performance evaluation paper for a much more powerful system. Given the SoC's 60W TDP, the passively cooled configuration, and the absence of QuickAssist in this SKU, the numbers are passable. It must also be noted that these are essentially out-of-the-box benchmark numbers, and optimizations could extract more performance from the system (an interesting endeavour for the homelab enthusiast).

L3 Forwarding Test with ipgen

The ipgen L3 forwarding test is executed on a single machine with two of its interfaces connected to the DUT. In the evaluation testbed, the source, the sink, and the conductor all satisfy this condition. The ipgen tool supports scripting a sweep of packet-size and transmission-bandwidth combinations. The script is supplied to the tool using a command of the following form:
ipgen -T ${TxIntf},${TxGatewayIP},${TxSubnet} -R ${RxIntf},${RxGatewayIP},${RxSubnet} -S $ScriptToRun -L $LogFN
where the arguments specify the transmitter interface, the IP of the gateway it connects to, and its subnet, along with a similar set for the receiver interface.

L3 Forwarding Benchmark (ipgen) with the Xeon D-2123IT (Source)

L3 Forwarding Benchmark (ipgen) with the Xeon D-1540 (Sink)

L3 Forwarding Benchmark (ipgen) with the AMD A10 Micro-6700T (Conductor)

Twelve distinct runs were processed: one in each of the four tested modes for each of the three machines connected to the DUT. As mentioned earlier, these numbers are likely limited by the capabilities of the source (as in the case of the Compulab fitlet-XA10-LAN), but the other two machines present some interesting results that corroborate the observations from the iPerf3 and pkt-gen benchmarks. In general, increasing the number of rules noticeably affects performance. Enabling NAT, on the other hand, has no comparably discernible impact relative to other configurations with a similar number of rules to process.

Comments

  • eastcoast_pete - Tuesday, July 28, 2020 - link

    Thanks, interesting review! Might be (partially) my ignorance of the design process, but wouldn't it be better from a thermal perspective to use the case, especially the top part of the housing, directly as a heat sink? The current setup transfers the heat to the inside space of the unit and then relies on passive convection or radiation to dispose of the heat. Not surprised that it gets really toasty in there.
  • DanNeely - Tuesday, July 28, 2020 - link

    From a thermal standpoint yes - if everything is assembled perfectly. With that design, though, you'd need to attach the heat sink to the CPU with screws from below, and remove/reattach it every time you open the case up. This setup allows the heatsink to be semi-permanently attached to the CPU, like in a conventional install.

    You're also mistaken about it relying on passive heat transfer: the top of the case has some large thermal pads that make contact with the tops of the heat sinks. (They're the white stuff on the inside of the lid in the first gallery photo; made slightly confusing by the lid being rotated 180° from the mobo.) Because of the larger contact area and lower peak heat concentration, thermal pads are much less finicky about being pulled apart and slapped together than the TIM between a chip and the heatsink base.
  • Lindegren - Tuesday, July 28, 2020 - link

    Could be solved by having the CPU on the opposite side of the board
  • close - Wednesday, July 29, 2020 - link

    Lower power designs do that quite often. The MoBo is flipped so it faces down, the CPU is on the back side of the MoBo (top side of the system) covered by a thick, finned panel to serve as passive radiator. They probably wanted to save on designing a MoBo with the CPU on the other side.
  • eastcoast_pete - Tuesday, July 28, 2020 - link

    Appreciate the comment on the rotated case; those thermal pads looked oddly out of place. But, as Lindegren's comment pointed out, with the CPU on the opposite side of this (after all, custom) MB, one could have the main heat source (SoC/CPU) facing "up" and all others facing "down".
    For maybe irrational reasons, I just don't like VRMs, SSDs and similar getting so toasty in an always-on piece of networking equipment.
  • YB1064 - Wednesday, July 29, 2020 - link

    Crazy expensive price!
  • Valantar - Wednesday, July 29, 2020 - link

    I think you got tricked by the use of a shot of the motherboard with a standard server heatsink. Look at the teardown shots; this version of the motherboard is paired with a passive heat transfer block with heat pipes which connects directly to the top chassis. No convection involved inside of the chassis. Should be reasonably efficient, though of course the top of the chassis doesn't have that many or that large fins. A layer of heat pipes running across it on the inside would probably have helped.
  • herozeros - Tuesday, July 28, 2020 - link

    Neat review! I was hoping you could offer an opinion on why they elected to not include a SKU without quickassist? So many great router scenarios with some juicy 10G ports, but bottlenecks if you’re trafficking in resource-intensive IPSec connections, no? Thanks!
  • herozeros - Tuesday, July 28, 2020 - link

    Me English are bad, should read “a SKU without Quickassist”
  • GreenReaper - Tuesday, July 28, 2020 - link

    The MSRP of the D-2123IT is $213. All D-2100 CPUs with QAT are >$500:
    https://www.servethehome.com/intel-xeon-d-2100-ser...
    https://ark.intel.com/content/www/us/en/ark/produc...
    And the cheapest of those has a lower all-core turbo, which might bite for consistency.

    It's also the only one with just four cores. Thanks to this it's the only one that hits a 60W TDP.
    Bear in mind internals are already pushing 90C, in what is presumably a reasonably cool location.

    The closest (at 235% the cost) is the 8-core D-2145NT (65W, 1.9GHz base, 2.5GHz all-core turbo).
    Sure, it *could* do more processing, but for most use-cases it won't be better and may be worse. To be sure it wasn't slower, you'd want to step up to D-2146NT; but now it's 80W (and 301% the cost). And the memory is *still* slower in that case (2133 vs 2400). Basically you're looking at rack-mount, or at the very least some kind of active cooling solution - or something that's not running on Intel.

    Power is a big deal here. I use a quad-core D-1521 as a CPU for a relatively large DB-driven site, and it hits ~40W of its 45W TDP. For that you get 2.7GHz all-core, although it's theoretically 2.4-2.7GHz. The D-1541 with twice the cores only gets ~60% of the performance, because it's _actually_ limited by power. So I don't doubt TDP scaling indicates a real difference in usage.

    A lower CPU price also gives SuperMicro significant latitude for profit - or for a big bulk discount.
