Packet Processing Benchmarks with pkt-gen

The pkt-gen benchmarks were orchestrated in the same manner as the iPerf3 benchmarks presented in the previous section: commands are executed on the source, sink, and DUT using the Conductor Python package described in the testing methodology section. The setup steps on the DUT for each mode were described in a previous section, so only the source and sink [Run] phases are covered here.

On the sink side, receivers are first spawned for the two interfaces one after the other. After the required wait time, both receivers are then run simultaneously so that the two interfaces are monitored at the same time. The spawn, step, and timeout prefixes below are keywords defined by the Conductor package.

spawn0:sh pkt.gen.recv.sh 1 ix0 [tested-mode]-pg-rx-1c.0.txt
timeout200:sleep 185
step1:killall pkt-gen
spawn1:sh pkt.gen.recv.sh 3 ix2 [tested-mode]-pg-rx-1c.1.txt
timeout201:sleep 185
step2:killall pkt-gen
timeout30:sleep 30
spawn2:sh pkt.gen.recv.sh 1 ix0 [tested-mode]-pg-rx-2c.0.txt
spawn3:sh pkt.gen.recv.sh 3 ix2 [tested-mode]-pg-rx-2c.1.txt
timeout202:sleep 185
step3:killall pkt-gen

The pkt.gen.recv.sh script receives the packets forwarded by the firewall on the specified interface and dumps the resulting statistics to the indicated file.
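
The script itself is not reproduced in the article, but a minimal sketch of what it might look like is given below. The argument order (CPU core, interface, output file) is inferred from the invocations above, while the exact pkt-gen flags and the per-second reporting interval are assumptions.

#!/bin/sh
# pkt.gen.recv.sh - hypothetical reconstruction of the receiver wrapper.
# Usage: sh pkt.gen.recv.sh <cpu-core> <interface> <output-file>

CORE=$1      # CPU core to pin the receiver to
IFACE=$2     # netmap-capable interface on the sink, e.g. ix0
OUTFILE=$3   # file that collects pkt-gen's periodic and final statistics

# -f rx runs pkt-gen as a receiver, -a pins it to the requested core,
# and -T 1000 asks for a statistics report every second.
pkt-gen -i "$IFACE" -f rx -a "$CORE" -T 1000 > "$OUTFILE" 2>&1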

On the source side, the first link is exercised for 30 seconds at each packet size, followed by the second link. In the third iteration, the sweeps are spawned for both links simultaneously.

spawn0:sh pkt.gen.sweep.sh ixl2 172.16.0.2:53 172.16.10.2:53 [ixl2 mac] 1 [tested-mode]-pg-tx-1c.0.txt
timeout200:sleep 185
step1:killall pkt-gen
spawn1:sh pkt.gen.sweep.sh ixl3 172.16.1.2:53 172.16.11.2:53 [ixl3 mac] 3 [tested-mode]-pg-tx-1c.1.txt
timeout201:sleep 185
step2:killall pkt-gen
timeout30:sleep 30
spawn2:sh pkt.gen.sweep.sh ixl2 172.16.0.2:53 172.16.10.2:53 [ixl2 mac] 1 [tested-mode]-pg-tx-2c.0.txt
spawn3:sh pkt.gen.sweep.sh ixl3 172.16.1.2:53 172.16.11.2:53 [ixl3 mac] 3 [tested-mode]-pg-tx-2c.1.txt
timeout202:sleep 185
step3:killall pkt-gen

Here, the pkt.gen.sweep.sh script resident on the source is a wrapper that calls pkt-gen multiple times in series with varying packet sizes. The CPU core allocation and output file name are also passed to this shell script.
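
As with the receiver, this script is not reproduced in the article; the sketch below is one plausible reconstruction. The argument order (interface, source endpoint, destination endpoint, MAC address, CPU core, output file) is inferred from the invocations above, while the set of packet sizes, the use of the MAC argument as the next-hop destination MAC, and the 30-second run control are assumptions.

#!/bin/sh
# pkt.gen.sweep.sh - hypothetical reconstruction of the transmit-sweep wrapper.
# Usage: sh pkt.gen.sweep.sh <iface> <src-ip:port> <dst-ip:port> <mac> <core> <outfile>

IFACE=$1
SRC=$2        # source endpoint, e.g. 172.16.0.2:53
DST=$3        # destination endpoint behind the DUT, e.g. 172.16.10.2:53
MAC=$4        # assumed to be the next-hop (DUT) MAC handed to pkt-gen via -D
CORE=$5       # CPU core to pin the sender to
OUTFILE=$6

# Sweep a set of packet sizes, transmitting for 30 seconds at each size.
# The exact sizes used in the article are not stated.
for SIZE in 64 128 256 512 1024 1518
do
    echo "=== packet size ${SIZE} bytes ===" >> "$OUTFILE"
    pkt-gen -i "$IFACE" -f tx -l "$SIZE" -s "$SRC" -d "$DST" \
        -D "$MAC" -a "$CORE" -w 2 >> "$OUTFILE" 2>&1 &
    PID=$!
    sleep 30                      # let each size run for 30 seconds
    kill -INT "$PID" 2>/dev/null  # SIGINT should make pkt-gen log its final summary
    wait "$PID" 2>/dev/null
done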

Two sets of metrics - the packet rate and the bandwidth - are gleaned from the log files and graphed below. Note that the bandwidth numbers reported by pkt-gen sometimes exceed the line rate, particularly when a couple of samples were missed in the preceding intervals. Despite that obvious discrepancy, the graphs give a good idea of the average bandwidth and packet rate at each packet size as the source tries to saturate the links.
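
As an aside, pulling those numbers out of the logs is straightforward. The sketch below assumes the pkt-gen build prints per-run summary lines containing "Speed:" and "Bandwidth:" fields (the exact wording can vary between netmap versions) and relies on the output file naming used in the commands above.

# Hypothetical post-processing: print the pkt-gen summary lines from every
# transmit log for a given mode. The "Speed:" / "Bandwidth:" wording is an
# assumption about this particular pkt-gen build.
for f in *-pg-tx-*.txt
do
    echo "== $f =="
    grep -E 'Speed:|Bandwidth:' "$f"
done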

pkt-gen Benchmark (Packet Rates in Kpps)

The pfSense installation running on the E302-9D delivers a best-case packet forwarding rate of about 0.6 Mpps per interface, which drops to around 0.35 Mpps in the worst case with a large number of rules and NAT enabled.

pkt-gen Benchmark (Bandwidth in Mbps)

On the bandwidth front, we see a best-case throughput of around 6.5 Gbps. This drops as additional packet-processing steps are enabled, as shown in the graphs above.

34 Comments

  • Jorgp2 - Thursday, July 30, 2020 - link

    Maybe you should learn the difference between a switch and a router first.
  • newyork10023 - Thursday, July 30, 2020 - link

    Why do you people have to troll everywhere you go?
  • Gonemad - Wednesday, July 29, 2020 - link

    Oh boy. My mother once got Wi-Fi "AC" 5GHz, 5Gbps, and 5G mobile networks mixed up. It took a while to explain the differences to her.

    Don't use 10G to mean 10 Gbps, please! HAHAHA.
  • timecop1818 - Wednesday, July 29, 2020 - link

    Fortunately, when Ethernet says 10Gbps, that's what it means.
  • imaheadcase - Wednesday, July 29, 2020 - link

    Put the name Supermicro on it and you know it's not for consumers.
  • newyork10023 - Wednesday, July 29, 2020 - link

    The Supermicro manual states that an installed PCIe card is limited to networking (and will require a fan to be installed). Can an HBA card not be installed?
  • abufrejoval - Wednesday, July 29, 2020 - link

    Since I use both pfSense as a firewall and a D-1541 Xeon machine (but not for the firewall) and I share the dream of systems that are practically silent, I feel compelled to add some thoughts:

    I started using pfSense on a passive J1900 Atom board which had dual Gbit on-board and cost less than €100. That worked pretty well until my broadband exceeded 200Mbit/s, mostly because it wasn’t just a firewall, but also added Suricata traffic inspection (tried Snort, too, very similar results).

    And that’s what’s wrong with this article: 10Gbit Xeon-Ds are great when all you do is push packets without looking at them. They are even greater when you terminate SSL connections on them with the QuickAssist variants. They are great when they work together with their bigger CPU brothers, who will then crunch on the logic of the data.

    In the home-appliance context that you allude to, you won’t have ten types of machines to optimally distribute that work. QuickAssist won’t deliver benefits, and the CPU will run out of steam far before even a Gbit connection is saturated when you use it just for the front end of the DMZ (firewall/SSL termination/VPN/deep inspection/load-balancing failover).

    Put proxies, caches or even application servers on them as well, even a single 10Gbit interface may be a total waste.

    I had to resort to an i7-7700T, which seems a bit quicker than the D-2123IT at only 35 Watts TDP (and much cheaper), to sustain 500Mbit/s download bandwidth with the best gratis Suricata rule set. Judging by CPU load observations, it will just about manage the Gbit loads its ports can handle; I'm pretty sure that 2.5/5/10 Gbit will just throttle on inspection load, like the J1900 did at 200Mbit/s.

    I use a D-1541 as an additional compute node in an oVirt 3 node HCI gluster with 3x 2.5Gbit J5005 storage nodes. I can probably go to 6x 2.5Gbit before its 10Gbit NIC becomes a bottleneck.

    The D-1541’s benefit there is lots of RAM and cores, while it’s practically silent with 45 Watts TDP and none of the applications on it require vast amounts of CPU power.

    I am waiting for an 8-core AMD 4000 Pro 35 Watt TDP APU to come as Mini-ITX capable of handling 64 or 128GB of ECC-RAM to replace the Xeon D-1541 and bring the price for such a mini server below that of a laptop with the same ingredients.
  • newyork10023 - Wednesday, July 29, 2020 - link

    With an HBA (were it possible, hence my question), the 10Gbps serves a possible use (storage). Pushing and inspection exceeds x86 limits now. See TNSR for real x86 limits (without inspection).
  • abufrejoval - Wednesday, July 29, 2020 - link

    That would seem to apply to the chassis, not to the mainboard or SoC.
    There is nothing to prevent it from working per se.

    I am pretty sure you can add a 16-port SAS HBA or even NVMeOF card and plenty of external storage, if thermals and power fit. A Mellanox 100Gbit card should be fine electrically, logically etc, even if there is nothing behind to sustain that throughput.

    I've had an Nvidia GTX1070 GPU in the SuperMicro Mini-ITX D-1541 for a while, with no problems at all functionally, even if games still seem to prefer Hertz over cores. Actually, GPU-accelerated machine learning inference was the original use case for that box.
  • newyork10023 - Wednesday, July 29, 2020 - link

    As pointed out, the D2123IT has no QAT, so a QAT accelerator would take up an available PCIe slot. It could push 10G packets then, but not save them or think (AI) on them.
