Technology behind the Killer NIC

We will not spend several pages and display numerous charts attempting to explain in absolute detail how the networking architecture and technology operate. Instead we will provide a high-level technology overview, which should supply the basic information needed to show why there are advantages in offloading data packet processing from the CPU to a dedicated processing unit. Other technologies such as RDMA and onloading are available, but in the interest of space (and keeping our readers awake) we will not detail these options.

The basic technology the Killer NIC utilizes has been in the corporate server market for a few years. One of the most prevalent technologies, and the one the Killer NIC is based upon, is the TCP/IP Offload Engine (TOE). TOE technology (okay, that phrase deserves a laugh) is basically designed to offload all tasks associated with protocol processing from the main system processor and move them to a TOE network interface card (TNIC). TOE technology also consists of software extensions to existing TCP/IP stacks within the operating system that enable the use of these dedicated hardware data planes for packet processing.

The process required to place data inside TCP/IP packets can consume a significant number of CPU cycles, depending upon the size of the packets and the amount of traffic. Dedicated offload cards have proven very effective in relieving the CPU of TCP/IP packet processing, resulting in greater system performance from the server. Offloading allows the system's CPU to recover lost cycles, so applications that are CPU bound are no longer affected by TCP/IP processing. This technology is very beneficial in a corporate server or datacenter environment where a heavy volume of traffic usually consists of large blocks of data being transferred, but does it really belong on your desktop, where the actual CPU overhead is generally minimal? Before we address this question we need to take a further look at how the typical NIC operates.
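To make that per-packet cost concrete, here is a minimal sketch of the standard RFC 1071 ones'-complement checksum that the host CPU has to compute (or verify) for every segment when the NIC offers no checksum offload. Python is used purely for illustration; a real stack does this in optimized C or in silicon.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum over 16-bit words.
    Illustrative sketch of the per-packet work a host performs
    when checksumming is not offloaded to the NIC."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # add next 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # fold carry back in
    return ~total & 0xFFFF  # ones' complement of the folded sum
```

Note that the loop touches every byte of every packet, which is why hardware checksum support alone already removes a measurable slice of host overhead.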

The standard NIC available today usually processes TCP/IP operations in software, which can create substantial system overhead depending upon the network traffic on the host machine. Typically the areas that create increased system overhead are data copies along with protocol and interrupt processing. When a NIC receives a typical data packet, a series of interactions with the CPU begins in order to handle the data and route it to the appropriate application. The CPU is first notified that there is a data packet waiting; generally the processor will read the packet header and determine the contents of the data payload. It then requests the data payload and, after verifying it, delivers it to the waiting application.
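The interrupt-driven sequence just described can be sketched as simplified driver logic. The 8-byte header layout and the function below are hypothetical stand-ins for the real TCP/IP fields, intended only to show the parse-verify-copy work the CPU performs per packet:

```python
import struct

def host_receive(frame: bytes, app_buffers: dict) -> int:
    """Hypothetical sketch of the host-side work a conventional NIC
    leaves to the CPU: parse the header, verify the payload length,
    then copy the payload into the waiting application's buffer.
    Returns the number of bytes delivered."""
    # Hypothetical 8-byte header: destination port (H), payload
    # length (H), sequence number (I) -- stand-ins for real fields.
    port, length, seq = struct.unpack("!HHI", frame[:8])
    payload = frame[8:8 + length]
    if len(payload) != length:
        raise ValueError("truncated packet")
    # The data copy: payload bytes are moved into the app's buffer.
    app_buffers.setdefault(port, bytearray()).extend(payload)
    return length
```

Every call here corresponds to an interrupt-triggered round trip on the host; the copy on the last line is the receive-side cost discussed below.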

These data packets are buffered or queued on the host system. Depending upon the size and volume of the packets, this constant fetching of information can create additional delays due to memory latencies and/or poor buffer management. The majority of standard desktop NICs also incorporate hardware checksum support and additional software enhancements to help eliminate transmit-data copies. This is advantageous when combined with packet prioritization techniques to control and enhance outbound traffic with intelligent queuing algorithms.

However, these same NICs cannot eliminate the receive-data copy routines that consume the majority of processor cycles in this process. A TNIC performs protocol processing on its dedicated processor before placing the data on the host system. TNICs will generally use zero-copy algorithms to place the packet data directly into the application buffers or memory. This routine bypasses the normal process of handshakes between the processor, NIC, memory, and application resulting in greatly reduced system overhead depending upon the packet size.
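The difference between the copy and zero-copy delivery paths can be illustrated in a few lines. This is a rough analogy only: a real TNIC DMAs packet data straight into application memory in hardware, whereas here a `memoryview` merely stands in for "hand over a view of the buffer instead of duplicating it".

```python
def deliver_with_copy(frame: bytes, header_len: int) -> bytes:
    # Conventional path: slicing a bytes object allocates a new
    # buffer and copies the payload into it before the application
    # ever sees the data.
    return frame[header_len:]

def deliver_zero_copy(frame: bytes, header_len: int) -> memoryview:
    # Zero-copy path: give the application a view into the buffer
    # the hardware already filled -- no payload bytes are moved.
    return memoryview(frame)[header_len:]
```

Both return the same payload; only the second avoids touching every payload byte a second time, which is the saving a TNIC's zero-copy placement targets.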

Most corporate or data center networks deal with large data payloads, typically 8KB up to 64KB in size (though we fully understand this can vary greatly). Our example will involve the receipt of a 32KB application data block, which usually results in thirty or more interrupt-generating events between the host system and a typical NIC. Each of these events is required to buffer the information, segment the data into Ethernet packets, process the incoming acknowledgements, and send the data to the waiting application. This process basically reverses itself if a reply is generated by the application and returned to the sender. The entire exchange can create significant protocol-processing overhead, memory latencies, and interrupt delays on the host system. We need to reiterate that our comments about "significant" system overhead are geared towards a corporate server or datacenter environment and not the typical desktop.

Depending upon the application and network traffic, a TNIC can greatly reduce the network transaction load on the host system by changing the transaction process from one event per Ethernet packet to one event per application network I/O. The 32KB application data block now becomes a single data-path offload process that moves all data packet processing to the TNIC. This eliminates the thirty or so interrupts along with the majority of the system overhead required to process this single block of data. In a data center or corporate server environment with large content delivery requirements to multiple users, the savings in system overhead due to network transactions can have a significant impact. In some instances, replacing a standard NIC in the server with a TNIC has almost the same effect as adding another CPU. That's an impressive savings in cost and power requirements, but once again, is this technology needed on the desktop?
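The event arithmetic behind this example is easy to reconstruct, assuming a 32KB application block and the standard 1460-byte TCP segment payload; the ACK-processing estimate is our own rough assumption, not a vendor figure:

```python
import math

MSS = 1460              # typical TCP payload per Ethernet frame, in bytes
APP_BLOCK = 32 * 1024   # the 32KB application data block from the example

# Ethernet frames needed to carry the block
frames = math.ceil(APP_BLOCK / MSS)

# Rough event count (assumption): one receive event per frame, plus an
# ACK-processing event for roughly every other frame.
nic_events = frames + frames // 2   # lands in the "thirty or more" range

# With full offload the host sees one event per application network I/O.
tnic_events = 1
```

Twenty-three frames plus ACK handling puts a conventional NIC in the thirty-plus event range the article cites, versus a single event once the TNIC handles the entire data path.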

BigFoot Networks believes it is and we will see what they have to say about it and their technology next.

Comments
  • Gary Key - Wednesday, November 1, 2006 - link

    We tested these two cards as part of our Killer NIC testing routine. We did not report the numbers as they did not vary greatly from the NVIDIA 590 SLI NIC solution.
  • Crassus - Wednesday, November 1, 2006 - link

    Sorry, but that plus the server test would have been very useful information I would have liked in the review. Or maybe we can have a different review, sort of a "NIC roundup". If your results are the same across the board, it's a finding worth mentioning as well, isn't it?
  • Frumious1 - Tuesday, October 31, 2006 - link

    I bet it doesn't even beat the onboard NVIDIA NIC. Or rather, it will tie the NVIDIA solution, which means it's equal to the Killer in most situations and fractionally slower in a few games. Maybe it has lower CPU usage when doing gigabit transmits, but high bandwidth with low CPU usage isn't going to matter much for gaming. Not that any NIC related stuff matters much for gaming these days.
  • vaystrem - Tuesday, October 31, 2006 - link

    I agree it's very difficult to test NIC performance, and kudos to AnandTech for trying. But it seems to me that using the card as a server for the games where it saw the most improvement (Fear/CS), and even for those where it didn't, may be more enlightening than exploring the client side of things.
  • Gary Key - Tuesday, October 31, 2006 - link

    We set up one of our test beds as a server for a couple of the games we tested. The performance was actually worse than our NVIDIA NIC (dualnet/teaming) and Intel PRO/1000 PT, and it barely did better than our D-Link PCI NIC in half of the tests. We will not fault the card for its performance since it was specifically designed as a client-side card. This very well could change in the future due to their ability to optimize driver code on the FPGA unit. The article could have gone another five pages with the server and LAN tests that we completed (neither showed any significant differences). It appears from several of the comments that anything over three pages was a waste anyway. ;-)
  • EODetroit - Tuesday, October 31, 2006 - link

    Except according to the PR material the card is made for game clients, not optimized for game servers.
  • VooDooAddict - Tuesday, October 31, 2006 - link

    While the POSSIBILITY of embedded Linux apps is interesting, someone who has the $$ to buy this ... would have the money to put inexpensive PC parts together into a Linux machine. Likely they have spare parts left over from their last upgrade.

    Anyone else think the company name is strangely fitting? "BigFoot" ... Myth and Hype?

    Certainly not saying a nice NIC isn't a good investment ... but at almost $300 ... it's a joke. Drop the embedded Linux, hit the $50 price point and this thing would probably sell like mad to WoW addicts. (Eventually also have a PCIe version.)

    The Ageia physics processor is the same way. Great concept, but the tangible benefits are so minimal for the price. $300 video cards took off because there was a tangible benefit.

    Dropping another $300 into the storage system, monitor, CPU, video card, RAM, or even audio system (surround speakers) would give one a much more immersive experience.

    Someone made the wrong decision to stick with the embedded Linux thing. Seriously, a separate leftover-parts Linux box and a D-Link 4100 router would be a far better way to go.

    So any guesses as to the next $300 (*cough* gimmick *cough*) expenditure to "improve" gaming?

    For those comparing this to SLI/Crossfire: SLI and Crossfire can offer substantial image quality enhancements for people with large-pixel-count LCDs. The ability to run LCDs at native resolution for gaming is a very tangible benefit. Not something everyone agrees is worth the $$, but the benefit is there.
  • VooDooAddict - Tuesday, October 31, 2006 - link

    I was really hoping for something major from this card ... just from the perspective of reliving history.

    Around 10 years ago now (back around '96-'97), when NICs weren't built onboard, spending $60-$70 on a 3Com 3c905 or a server-class Intel NIC would make a big difference in overall system performance when working with the Internet, LAN, and gaming. They gave me big advantages over anyone who just went out and bought a cheapo $20 NE2000-compatible NIC (16-bit ISA even!).

    I'm talking Quake 1, Duke3D, Quake 2, Quake CTF, original Unreal... A single 3dfx VooDooGraphics board + 3c905 = pwnage (back when "pwnage" was still a typo). I was so often accused of cheating by laptop-wielding, software-emulating newbies (wasn't spelt "noob" then).

    ... That above is why I picked up the handle "VooDooAddict"
  • EODetroit - Tuesday, October 31, 2006 - link

    And they deleted it, claiming it needed to be moved from the "Testimonials" to the "General" forum. Whatever, but they didn't actually move my post; they deleted it and replaced it with a post of their own, with the link, plus quoted all the good things said in the AnandTech review and none of the bad. Typical and misleading, but hey, it's their website. They can do what they want. Still, misleading people doesn't endear them to anyone.
  • LoneWolf15 - Tuesday, October 31, 2006 - link

    I admire the amount of engineering that went into this product. It's obvious that the product isn't "snake oil" in the same way that, say, SoftRAM software was back in the day. There's a lot more to this card than just a NIC.

    That said, I don't think it provides enough benefit to justify $279 (unless perhaps you're making $50k+ a year in the PGL). Today's NICs are already pretty well optimized for most situations, plus many mainboard NICs are directly on the PCIe bus, something the Killer NIC can't offer (and as someone pointed out, try doing gigabit Ethernet across a PCI slot; it really isn't feasible, especially if the PCI bus is already shared with other components like a TV tuner or sound card). The Killer NIC's most interesting feature, FNApps, is not useful at the moment, and I'm still concerned that it might pose a security risk through a malformed application (that's assuming someone coded such an app in the first place, considering how little marketshare the Killer NIC is likely to have). Like the Ageia PhysX, at this point in time, I don't see the justification.

    P.S. Is it just me, or does the heatsink "K" look like a Klingon weapon? I'm thinking either Klingon brass knuckles or a hybrid bat'leth. ;)
