Technology behind the Killer NIC

We will not spend several pages and display numerous charts in an attempt to explain in absolute detail how the networking architecture and technology operate. Instead we will provide a high-level technology overview, which should supply the basic information needed to show why there are advantages in offloading data packet processing from the CPU to a dedicated processing unit. Other technologies such as RDMA and onloading are available, but in the interest of space and keeping our readers awake we will not detail those options.

The basic technology the Killer NIC utilizes has been in the corporate server market for a few years. One of the most prevalent technologies, and the one the Killer NIC is based upon, is the TCP/IP Offload Engine (TOE). TOE technology (okay, that phrase deserves a laugh) is basically designed to offload all tasks associated with protocol processing from the main system processor and move them to the TOE network interface card (TNIC). TOE technology also consists of software extensions to existing TCP/IP stacks within the operating system that enable the use of these dedicated hardware data planes for packet processing.
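
To make "protocol processing" concrete, consider one small piece of it: the standard Internet checksum (RFC 1071) that a host stack computes in software for every packet, and that a TOE computes on the card instead. The minimal C sketch below is illustrative only; the 1460-byte buffer simply represents a typical full-size TCP payload, not anything from our testing.

    #include <stdint.h>
    #include <stdio.h>

    /* RFC 1071 Internet checksum: one of the per-packet protocol tasks a
     * host stack computes in software and a TOE computes on the card. */
    static uint16_t internet_checksum(const uint8_t *data, size_t len)
    {
        uint32_t sum = 0;
        while (len > 1) {                 /* sum the data as 16-bit words */
            sum += (uint16_t)((data[0] << 8) | data[1]);
            data += 2;
            len  -= 2;
        }
        if (len == 1)                     /* pad a trailing odd byte      */
            sum += (uint16_t)(data[0] << 8);
        while (sum >> 16)                 /* fold the carries back in     */
            sum = (sum & 0xFFFF) + (sum >> 16);
        return (uint16_t)~sum;
    }

    int main(void)
    {
        uint8_t payload[1460] = {0};      /* a typical full-size TCP payload */
        printf("checksum: 0x%04x\n", internet_checksum(payload, sizeof payload));
        return 0;
    }

Trivial as it looks, this loop runs over every byte of every packet when done in software, which is exactly why NIC vendors moved it into hardware first.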

The process required to place data inside TCP/IP packets can consume a significant amount of CPU cycles, depending upon the size of the packets and the amount of traffic. These dedicated cards have proven very effective at relieving the CPU of TCP/IP packet processing, resulting in greater system performance from the server. The offload allows the system's CPU to recover those lost cycles, so applications that are CPU bound are no longer affected by TCP/IP processing. This technology is very beneficial in a corporate server or datacenter environment where a heavy volume of traffic usually consists of large blocks of data being transferred, but does it really belong on your desktop, where the actual CPU overhead is generally minimal? Before we address this question we need to take a further look at how the typical NIC operates.
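
As a rough illustration of what "significant" can mean, an old industry rule of thumb holds that software TCP/IP processing costs roughly 1 Hz of CPU for every 1 bit/sec of sustained throughput. The figures in this back-of-envelope C sketch are illustrative assumptions, not measurements from our testing:

    #include <stdio.h>

    int main(void)
    {
        double link_bps = 1e9;  /* a saturated gigabit Ethernet link         */
        double cpu_hz   = 3e9;  /* a ~3GHz desktop processor                 */
        double cycles   = link_bps * 1.0; /* rule of thumb: ~1 cycle per bit */
        printf("CPU consumed by software TCP/IP: ~%.0f%%\n",
               100.0 * cycles / cpu_hz);  /* prints ~33% */
        return 0;
    }

A server pushing a full gigabit could, by this estimate, lose a third of a modern CPU to packet processing alone; a desktop gaming session moves a tiny fraction of that traffic.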

The standard NIC available today usually processes TCP/IP operations in software, which can create substantial system overhead depending upon the network traffic on the host machine. Typically the areas that create increased system overhead are data copies along with protocol and interrupt processing. When a NIC receives a typical data packet, a series of interactions with the CPU begins in order to handle the data and route it to the appropriate application. The CPU is first notified that a data packet is waiting; the processor then generally reads the packet header to determine the contents of the data payload, requests the payload, and, after verifying it, delivers it to the waiting application.
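
The sketch below models that interrupt-driven receive sequence in miniature; every function in it is a hypothetical stub of our own invention, not a real driver API:

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative stubs standing in for the four CPU interactions
     * described above. */
    static void read_packet_header(void) { puts("1. CPU reads the packet header");   }
    static void request_payload(void)    { puts("2. CPU requests the data payload"); }
    static bool verify_payload(void)     { puts("3. CPU verifies the payload"); return true; }
    static void deliver_to_app(void)     { puts("4. payload copied to the application"); }

    /* One interrupt fires per received packet; the CPU then performs the
     * header read, payload fetch, verification, and delivery in turn. */
    static void nic_interrupt_handler(void)
    {
        read_packet_header();
        request_payload();
        if (verify_payload())
            deliver_to_app();
    }

    int main(void)
    {
        nic_interrupt_handler();   /* repeated for every packet on the wire */
        return 0;
    }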

These data packets are buffered or queued on the host system. Depending upon the size and volume of the packets this constant fetching of information can create additional delays due to memory latencies and/or poor buffer management. The majority of standard desktop NICs also incorporate hardware checksum support and additional software enhancements to help eliminate transmit-data copies. This is advantageous when combined with packet prioritization techniques to control and enhance outbound traffic with intelligent queuing algorithms.
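
One of the simplest of those queuing algorithms is strict priority: always transmit from the highest-priority queue that has packets waiting. This toy C sketch, with made-up queue contents, shows the idea rather than any particular vendor's implementation:

    #include <stdio.h>

    enum { QUEUES = 3 };                    /* queue 0 = highest priority     */
    static int backlog[QUEUES] = {2, 0, 5}; /* hypothetical packets per queue */

    /* Strict-priority dequeue: always service the highest-priority
     * non-empty queue, so latency-sensitive traffic jumps ahead of bulk. */
    static int next_queue_to_send(void)
    {
        for (int q = 0; q < QUEUES; q++)
            if (backlog[q] > 0)
                return q;
        return -1;                          /* nothing waiting to transmit */
    }

    int main(void)
    {
        printf("transmit from queue %d\n", next_queue_to_send()); /* queue 0 */
        return 0;
    }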

However, these same NICs cannot eliminate the receive-data copy routines that consume the majority of processor cycles in this process. A TNIC performs protocol processing on its own dedicated processor before placing the data on the host system. TNICs will generally use zero-copy algorithms to place the packet data directly into the application's buffers or memory. This bypasses the normal sequence of handshakes between the processor, NIC, memory, and application, resulting in greatly reduced system overhead, with the savings depending upon the packet size.
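
The difference is easy to model: in the copy-based path the host CPU moves every payload byte from the NIC's receive buffer into the application's buffer, while in the zero-copy path the card DMAs the payload straight to its destination and the CPU never touches the data. The buffers in this C sketch are hypothetical:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    enum { PKT = 1460 };              /* one full-size TCP payload    */
    static uint8_t nic_buffer[PKT];   /* driver/kernel receive buffer */
    static uint8_t app_buffer[PKT];   /* the application's own buffer */

    /* Standard NIC: the host CPU copies every payload byte from the
     * kernel-side buffer into the application's buffer. */
    static void copied_receive(void)
    {
        memcpy(app_buffer, nic_buffer, PKT); /* burns CPU cycles and memory bandwidth */
    }

    /* TNIC zero-copy: the card places the payload directly into
     * app_buffer by DMA, so there is nothing for the host CPU to do
     * (modeled here as an empty function). */
    static void zero_copy_receive(void)
    {
    }

    int main(void)
    {
        copied_receive();
        zero_copy_receive();
        puts("copy path: CPU touches every byte; zero-copy path: none");
        return 0;
    }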

Most corporate or data center networks deal with large data payloads, typically ranging from 8KB up to 64KB (though we fully understand this can vary greatly). Our example involves the receipt of a 32KB application data block, which usually results in thirty or more interrupt-generating events between the host system and a typical NIC. Each of these events is required to buffer the information, assemble the data into Ethernet packets, process the incoming acknowledgements, and send the data to the waiting application. This process basically reverses itself if a reply is generated by the application and returned to the sender. The entire sequence can create significant protocol-processing overhead, memory latencies, and interrupt delays on the host system. We need to reiterate that our comments about "significant" system overhead are geared towards a corporate server or datacenter environment and not the typical desktop.
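
The arithmetic behind that thirty-event figure is simple: a 32KB block split into standard 1460-byte TCP segments requires 23 Ethernet frames, and the interrupts for acknowledgement processing and completion notifications push the total to roughly thirty. Here is a worked version of the math:

    #include <stdio.h>

    int main(void)
    {
        int payload_bytes = 32 * 1024; /* the 32KB application data block        */
        int mss           = 1460;      /* typical TCP payload per Ethernet frame */
        int segments = (payload_bytes + mss - 1) / mss;     /* ceiling division  */
        printf("Ethernet frames required: %d\n", segments); /* prints 23         */
        /* Add the interrupts generated by acknowledgement processing and
         * completion notifications and the total climbs to the roughly
         * thirty host-side events cited above. */
        return 0;
    }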

Depending upon the application and network traffic, a TNIC can greatly reduce the network transaction load on the host system by changing the transaction process from one event per Ethernet packet to one event per application network I/O. The 32KB application transfer now becomes a single data-path offload that moves all data packet processing to the TNIC. This eliminates the thirty or so interrupts along with the majority of the system overhead required to process this single block of data. In a data center or corporate server environment with large content delivery requirements to multiple users, the savings in system overhead due to network transactions can have a significant impact. In some instances, replacing a standard NIC in a server with a TNIC has almost the same effect as adding another CPU. That's an impressive savings in cost and power requirements, but once again, is this technology needed on the desktop?

BigFoot Networks believes it is, and we will see what they have to say about it and their technology next.

Comments

  • rqle - Tuesday, October 31, 2006 - link

    I never like to bash a company's product because I've always believed there's a niche market, but at its very best, this product doesn't seem to justify the $300 price. I do believe many would buy it at a lower price.

    My current broadband is 3.0Mbps/512Kbps. I can pay $5 more a month for Verizon FiOS at 15Mbps/2Mbps, and I would rather go that route for the improved network plus other capabilities. As for the side processing function, a cheap $200 computer system (Athlon64 3200+) can do a whole lot more. I'm not saying it's a bad product, it's just priced way too high for me.
  • cornfedone - Tuesday, October 31, 2006 - link

    SOS, DD.

    Just as we've seen with Asian mobos in the last few years, this NIC card is an over-hyped, under-performing POS. Just as mobos from Asus, DFI, Sapphire, Abit, and more have suffered from vcore, BIOS, memory, PCI slot and many other issues, the BigJoke NIC card is more defective goods with zero customer support. I'm sure everyone looks forward to a wiped hard drive image... due to a poorly designed NIC card.

    Sooner or later consumers are gonna wise up and stop buying these defective products. Until they vote with their wallet instead of their penis, unscrupulous companies will continue to ship half baked POS products and fail to provide proper customer support. If consumers stop buying these defective goods, then the company will either correct their problems or go tits up.
  • autoboy - Tuesday, October 31, 2006 - link

    I didn't buy a NIC to make up for my small penis, I bought a Porsche. Chicks don't notice my NIC.

    The K does look pretty cool. Maybe it will fit on my Porsche.
  • Frumious1 - Tuesday, October 31, 2006 - link

    I dare say few if any people have purchased AGEIA or Killer NIC cards. As for your whining about ASUS, DFI, etc. I guess you're one of the people running a budget $50 mobo that can't understand what it's like to really push a system? Or are you the other extreme: you overclocked by 50% or more and are pissed that the system wasn't fully stable?
  • slashbinslashbash - Tuesday, October 31, 2006 - link

    You talk a lot about how it's virtually impossible to test something whose entire performance is based almost entirely on an Internet connection which is inherently variable. That's why you need some more control over the variables in your test. Namely, a LAN.

    For example... have 4 computers, each with exactly the same configs except for the network cards. One of them has the KillerNIC, the rest have different NICs. They are all running Unreal Tournament. You also have one computer set up as the server, on the same Gigabit Ethernet switch as the 4 "player" machines. The level is a small plain square room with no doors, trenches, or any other features. Each of the "players" is running the same script where they have unlimited ammo and a machine gun, running circles around and around, shooting constantly from the minute they respawn. Let the scripts run for 100 hours and capture the framerate and pings on each machine.

    So you have 4 computer players running around in circles in a small square room, shooting each other for 100 hours. Yes, there will still be randomness, but over the course of 100 hours it should cancel out, and this test should be replicable. Any real differences between the NICs would come out over time. Run it again to make sure.

    Or maybe that's not the best way of doing it. I don't even play UT2003 so I don't know what's really possible and what's not, but I've heard of people doing scripts and stuff. Maybe there are better ways of doing it, but you can eliminate the variable of the Internet connection by limiting your testing to a LAN.
  • Gary Key - Wednesday, November 1, 2006 - link

    quote:

    You talk a lot about how it's virtually impossible to test something whose entire performance is based almost entirely on an Internet connection which is inherently variable. That's why you need some more control over the variables in your test. Namely, a LAN.


    We tested over a LAN; the results were not that different. In fact, the NVIDIA NIC and Intel PRO/1000 PT cards had better throughput and latencies the majority of the time. We did not show these results because the card is marketed to improve your online gaming experience. If the card had been marketed as a must-have product to improve your gaming capability on a LAN, then it would have been reviewed as such.

    When we tested on the LAN, the steps you outlined were basically followed from a script viewpoint in order to ensure the variables were kept to a minimum. We did not provide these results; maybe we should have in hindsight. Our final opinion of the card would not have changed.

    Thanks for the comments. :)
  • Frumious1 - Tuesday, October 31, 2006 - link

    The product is targeted at gamers. Look at the marketing material. Now, while a LAN party-goer might get some advantage out of it, there are FAR more people playing games from home using broadband connections. If this only improves performance in a LAN environment (clearly NOT what is being advertised), then it's already a failure. I like what Gary did here: look at real-world testing and let us know how it turned out. Who gives a rip about controlled environments and theoretical performance increases if the reality is that the product basically doesn't help much? What's really funny is that they even show a ping "advantage" in FEAR of 0.40ms maximum and 0.13ms average. WTF!? Like anyone can notice a 0.13ms improvement in ping times! The frame rate improvements might be good (if they were available in many games)... still not $270+ good, though.
  • Bladen - Tuesday, October 31, 2006 - link

    Or as you touched on, do the test for a long time, or with many, many repeats, and let the averages speak for themselves.
  • shoRunner - Tuesday, October 31, 2006 - link

    You pay almost $300 for the ultimate NIC, and it's PCI so it can't even get anywhere near gigabit throughput. AND the CPU utilization isn't even better than an onboard solution. PLEASE. If anyone is truly thinking about buying this, send me a PM; I've got some beautiful oceanfront property in Montana to sell you for pennies on the dollar.
  • mlau - Tuesday, October 31, 2006 - link

    You underestimate this card greatly. This is the ultimate network card for Linux: it could theoretically offload almost all of Linux's network stack (including Linux's advanced filtering/routing capabilities and protocols). It's a firewall-router on a card. IMHO the card is targeted at the wrong crowd (although I understand it somewhat, since gamers are usually stupid enough to buy 2 video cards and other completely unnecessary stuff [AGEIA comes to mind]).
