Benchmarking with iPerf3 and ipgen

The iPerf3 tool serves as a quick check that the network link is up and running close to expectations. As a simple passthrough device, we expect the Supermicro SuperServer E302-9D to achieve line rates for 10G traffic across its various interfaces, with rates dropping as more processing is added in the form of firewalling and NAT. To that end, each tested mode starts off with an iPerf3 test, followed by a sweep of various packet sizes with the pkt-gen tool. In both cases, each 10G interface set is tested separately, followed by both sets simultaneously. After both sets of experiments, the L3 forwarding test using ipgen is performed from each of the three machines in the test setup. This section discusses only the iPerf3 and ipgen results; the former also includes the IPsec evaluation.

iPerf3

Commands are executed on the source, sink, and DUT using the Conductor Python package described in the testing methodology section. The setup steps on the DUT for each mode were described in the previous section; only the source and sink [Run] phases are described here.

On the sink side, two servers are spawned and terminated after 3 minutes. The spawn and timeout prefixes refer to keywords specified by the Conductor package.
spawn0: cpuset -l 1,2 iperf3 -s -B 172.16.10.2 -p 5201
spawn1: cpuset -l 3,4 iperf3 -s -B 172.16.11.2 -p 5201
timeout180: sleep 180
step3: killall iperf3

On the source side, the first link is evaluated for 30s, followed by the second link. In the third iteration, the tests are spawned off for both links simultaneously.
spawn1: cpuset -l 1,2 iperf3 -c 172.16.10.2 -B 172.16.0.2 -P 4 -O 5 -t 35 --logfile /tmp/.1c.0.txt
timeout45: sleep 45
spawn3: cpuset -l 1,2 iperf3 -c 172.16.11.2 -B 172.16.1.2 -P 4 -O 5 -t 35 --logfile /tmp/.1c.1.txt
timeout46: sleep 45
spawn5: cpuset -l 1,2 iperf3 -c 172.16.10.2 -B 172.16.0.2 -P 4 -O 5 -t 35 --logfile /tmp/.2c.0.txt
spawn6: cpuset -l 3,4 iperf3 -c 172.16.11.2 -B 172.16.1.2 -P 4 -O 5 -t 35 --logfile /tmp/.2c.1.txt
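The per-run bandwidth figures can be pulled out of the text logs written via --logfile. The snippet below is an illustrative sketch: the sample line mimics iPerf3's usual "[SUM] ... receiver" summary format, and the awk field positions are assumptions based on that layout (the real logs from the runs above would be /tmp/.1c.0.txt and friends).

```shell
# Sketch: extract the aggregate receiver bandwidth from an iPerf3 text log.
# The sample line below imitates iPerf3's "[SUM] ... receiver" summary line.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
[SUM]   0.00-35.00  sec  38.3 GBytes  9.40 Gbits/sec                  receiver
EOF

# Grab the bandwidth value and its unit from the [SUM] receiver line.
BW=$(awk '/\[SUM\].*receiver/ { print $6, $7 }' "$LOG")
echo "$BW"
rm -f "$LOG"
```

The same one-liner, pointed at each of the four log files, yields the per-link and simultaneous numbers tabulated below.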

The table below presents the bandwidth numbers obtained in various modes. The interfaces specified in the headers refer to the ones in the DUT.

Supermicro E302-9D as pfSense Firewall - iPerf3 Benchmark (Gbps)

Mode                 | Single Stream           | Dual Stream
                     | ixl2-ixl0 | ixl3-ixl1   | ixl2-ixl0 | ixl3-ixl1
Router               |   9.40    |   9.41      |   8.77    |   8.67
PF (No Filters)      |   6.99    |   6.96      |   6.50    |   6.98
PF (Default Ruleset) |   5.43    |   5.81      |   4.22    |   5.69
PF (NAT Mode)        |   7.89    |   6.99      |   4.49    |   6.06

Line rates are obtained in the plain router mode. Enabling packet filtering lowers the performance, as expected, with more rules resulting in slightly lower performance. The NAT mode doesn't exhibit much performance loss compared to the plain PF mode, but multiple streams on different interfaces requiring NAT at the same time do bring the performance down further relative to the PF (No Filters) mode.
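To put the impact in perspective, the relative drop against the router-mode figure can be computed directly from the single-stream ixl2-ixl0 column of the table above:

```shell
# Percentage throughput drop vs. the 9.40 Gbps router-mode baseline
# (single stream, ixl2-ixl0 column of the table above).
DROPS=$(awk 'BEGIN {
    base = 9.40
    printf "PF (No Filters):      %.1f%%\n", (base - 6.99) / base * 100
    printf "PF (Default Ruleset): %.1f%%\n", (base - 5.43) / base * 100
    printf "PF (NAT Mode):        %.1f%%\n", (base - 7.89) / base * 100
}')
echo "$DROPS"
```

This works out to roughly a quarter of the throughput lost to basic filtering, and over 40% with the default ruleset.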

IPsec Testing using iPerf3

IPsec testing involves a similar set of scripts, except that only the ixl2 and ixl3 interfaces of the DUT are involved. The table below presents the iPerf3 bandwidth numbers for various tested combinations of encryption and authentication algorithms. Running the iPerf3 server on the DUT itself may result in lower-than-actual performance; however, the comparison against the baseline case under similar conditions can still be made.

Supermicro E302-9D as pfSense Firewall - IPsec iPerf3 Benchmark (Mbps)

Algorithm           | Single Stream                             | Dual Stream
                    | (Src)ixl2-(DUT)ixl2 | (Src)ixl3-(DUT)ixl3 | (Src)ixl2-(DUT)ixl2 | (Src)ixl3-(DUT)ixl3
Baseline (No IPsec) |        5140         |        7450         |        3020         |        4880
3des-hmac-md5       |         119         |         118         |        61.3         |        75.2
aes-cbc-sha         |         374         |         373         |         236         |         238
aes-hmac-sha2-256   |         377         |         376         |         235         |         212
aes-hmac-sha2-512   |         433         |         430         |         259         |         280

The above numbers are low compared to the line rate, but closely match the results uploaded to the repository specified in the AsiaBSDCon 2015 network performance evaluation paper for a much more powerful system. Given the SoC's 60W TDP, the passively cooled configuration, and the absence of QuickAssist in this SKU, the numbers are passable. It must also be noted that this is essentially an out-of-the-box benchmark number, and optimizations could extract more performance out of the system (an interesting endeavour for the homelab enthusiast).

L3 Forwarding Test with ipgen

The ipgen L3 forwarding test is executed on a single machine with two of its interfaces connected to the DUT. In the evaluation testbed, this condition is satisfied by the source, the sink, and the conductor. The ipgen tool supports scripting of a sweep of packet size and transmission bandwidth combinations. The script is provided to the tool using a command of the following form:
ipgen -T ${TxIntf},${TxGatewayIP},${TxSubnet} -R ${RxIntf},${RxGatewayIP},${RxSubnet} -S $ScriptToRun -L $LogFN
where the arguments refer to the transmitter interface, the IP of the gateway to which the interface connects, and its subnet specifications, along with a similar set for the receiver interface.
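As a concrete illustration of assembling such a command, the snippet below fills in the placeholders. The interface names, gateway IPs, and file names here are assumptions chosen to match the 172.16.x.x addressing used in the iPerf3 runs above, not the exact testbed values; the assembled command is printed rather than executed.

```shell
# Hypothetical ipgen invocation for one of the test machines: transmit out of
# ixl0 towards a DUT gateway at 172.16.0.1 and receive on ixl1 via 172.16.1.1.
# All names, addresses, and file paths below are illustrative assumptions.
TxIntf=ixl0; TxGatewayIP=172.16.0.1; TxSubnet=172.16.0.0/24
RxIntf=ixl1; RxGatewayIP=172.16.1.1; RxSubnet=172.16.1.0/24
ScriptToRun=sweep.conf
LogFN=/tmp/ipgen.log

CMD="ipgen -T ${TxIntf},${TxGatewayIP},${TxSubnet} -R ${RxIntf},${RxGatewayIP},${RxSubnet} -S ${ScriptToRun} -L ${LogFN}"
echo "$CMD"   # print the assembled command instead of running it here
```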

L3 Forwarding Benchmark (ipgen) with the Xeon D-2123IT (Source)

L3 Forwarding Benchmark (ipgen) with the Xeon D-1540 (Sink)

L3 Forwarding Benchmark (ipgen) with the AMD A10 Micro-6700T (Conductor)

Twelve distinct runs were processed: one in each of the four tested modes for each of the three machines connected to the DUT. As mentioned earlier, these numbers are likely limited by the capabilities of the source (as in the case of the Compulab fitlet-XA10-LAN), but the other two machines present some interesting results that corroborate the observations from the iPerf3 and pkt-gen benchmarks. In general, increasing the number of rules noticeably affects the performance. Enabling NAT, on the other hand, doesn't have as discernible an impact compared to other configurations with a similar number of rules to process.

Comments

  • Jorgp2 - Thursday, July 30, 2020 - link

    Maybe you should learn the difference between a switch and a router first.
  • newyork10023 - Thursday, July 30, 2020 - link

    Why do you people have to troll everywhere you go?
  • Gonemad - Wednesday, July 29, 2020 - link

    Oh boy. I once got Wi-Fi "AC" 5GHz, 5Gbps, and 5G mobile networks mixed up by my mother. It took a while to explain those to her.

    Don't use 10G to mean 10 Gbps, please! HAHAHA.
  • timecop1818 - Wednesday, July 29, 2020 - link

    Fortunately, when Ethernet says 10Gbps, that's what it means.
  • imaheadcase - Wednesday, July 29, 2020 - link

    Put the name Supermicro on it and you know its not for consumers.
  • newyork10023 - Wednesday, July 29, 2020 - link

    The Supermicro manual states that a PCIe card installed is limited to networking (and will require a fan installed). An HBA card can't be installed?
  • abufrejoval - Wednesday, July 29, 2020 - link

    Since I use both pfSense as a firewall and a D-1541 Xeon machine (but not for the firewall) and I share the dream of systems that are practically silent, I feel compelled to add some thoughts:

    I started using pfSense on a passive J1900 Atom board which had dual Gbit on-board and cost less than €100. That worked pretty well until my broadband exceeded 200Mbit/s, mostly because it wasn’t just a firewall, but also added Suricata traffic inspection (tried Snort, too, very similar results).

    And that’s what’s wrong with this article: 10Gbit Xeon-Ds are great when all you do is push packets, but don’t look at them. They are even greater when you terminate SSL connections on them with the QuickAssist variants. They are great when they work together with their bigger CPU brothers, who will then crunch on the logic of the data.

    In the home-appliance context that you allude to, you won’t have ten types of machines to optimally distribute that work. QuickAssist won’t deliver benefits while the CPU will run out of steam far before even a Gbit connection is saturated when you use it just for the front end of the DMZ (firewall/SSL termination/VPN/deep inspection/load-balancing-failover).

    Put proxies, caches or even application servers on them as well, even a single 10Gbit interface may be a total waste.

    I had to resort to an i7-7700T which seems a bit quicker than the D-2123IT at only 35Watts TDP (and much cheaper) to sustain 500Mbit/s download bandwidth with the best gratis Suricata rule set. Judging by CPU load observations it will just about manage the Gbit loads its ports can handle, pretty sure that 2.5/5/10 Gbit will just throttle on inspection load, like the J1900 did at 200Mbit/s.

    I use a D-1541 as an additional compute node in an oVirt 3 node HCI gluster with 3x 2.5Gbit J5005 storage nodes. I can probably go to 6x 2.5Gbit before its 10Gbit NIC becomes a bottleneck.

    The D-1541’s benefit there is lots of RAM and cores, while it’s practically silent with 45 Watts TDP and none of the applications on it require vast amounts of CPU power.

    I am waiting for an 8-core AMD 4000 Pro 35 Watt TDP APU to come as Mini-ITX capable of handling 64 or 128GB of ECC-RAM to replace the Xeon D-1541 and bring the price for such a mini server below that of a laptop with the same ingredients.
  • newyork10023 - Wednesday, July 29, 2020 - link

    With an HBA (were it possible, hence my question), the 10Gbps serves a possible use (storage). Pushing and inspection exceeds x86 limits now. See TNSR for real x86 limits (without inspection).
  • abufrejoval - Wednesday, July 29, 2020 - link

    That would seem to apply to the chassis, not to the mainboard or SoC.
    There is nothing to prevent it from working per se.

    I am pretty sure you can add a 16-port SAS HBA or even NVMeOF card and plenty of external storage, if thermals and power fit. A Mellanox 100Gbit card should be fine electrically, logically etc, even if there is nothing behind to sustain that throughput.

    I've had an Nvidia GTX1070 GPU in the SuperMicro Mini-ITX D-1541 for a while, no problem at all, functionally, even if games still seem to prefer Hertz over cores. Actually, GPU-accelerated machine learning inference was the original use case of that box.
  • newyork10023 - Wednesday, July 29, 2020 - link

    As pointed out, the D2123IT has no QAT, so a QAT accelerator would take up an available PCIe slot. It could push 10G packets then, but not save them or think (AI) on them.
