Technology behind the Killer NIC

We will not spend several pages and numerous charts attempting to explain in absolute detail how the networking architecture and technology operate. Instead we will provide a high-level overview that should convey the basics of why there are advantages to offloading data packet processing from the CPU to a dedicated processing unit. Other technologies such as RDMA and onloading are available, but in the interest of space (and keeping our readers awake) we will not detail those options.

The basic technology the Killer NIC utilizes has been in the corporate server market for a few years. The most prevalent of these technologies, and the one our Killer NIC is based upon, is the TCP/IP Offload Engine (TOE). TOE technology (okay, that phrase deserves a laugh) is designed to offload all tasks associated with protocol processing from the main system processor and move them to the TOE network interface card (TNIC). TOE technology also consists of software extensions to the existing TCP/IP stack within the operating system that enable the use of these dedicated hardware data planes for packet processing.

The process required to place data inside TCP/IP packets can consume a significant number of CPU cycles, depending upon the size of the packets and the amount of traffic. Dedicated TOE cards have proven very effective at relieving the CPU of TCP/IP packet processing, resulting in greater system performance from the server. Offloading allows the system's CPU to recover those cycles, so applications that are CPU bound are no longer affected by TCP/IP processing. This technology is very beneficial in a corporate server or datacenter environment with a heavy volume of traffic that usually consists of large blocks of data being transferred, but does it really belong on your desktop, where the actual CPU overhead is generally minimal? Before we address this question we need to take a closer look at how the typical NIC operates.

The standard NIC available today usually processes TCP/IP operations in software, which can create substantial system overhead depending upon the network traffic on the host machine. Typically the areas that create increased system overhead are data copies along with protocol and interrupt processing. When a NIC receives a typical data packet, a series of interactions with the CPU begins in order to handle the data and route it to the appropriate application. The CPU is first notified that a data packet is waiting; the processor then generally reads the packet header to determine the contents of the data payload, requests the payload, and, after verifying it, delivers it to the waiting application.
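The receive chain above can be sketched as a toy model (the step names and the trivial checksum are our own illustration, not actual driver code):

```python
# Illustrative sketch of the per-packet steps a host CPU performs with a
# conventional software-processed NIC. Every step below involves the CPU.

def checksum(data: bytes) -> int:
    # Toy stand-in for the real Internet checksum
    return sum(data) & 0xFFFF

def receive_packet_software_nic(packet: dict) -> list:
    """Return the list of CPU-involving steps needed to deliver one packet."""
    steps = ["interrupt: NIC signals that a packet is waiting"]
    header = packet["header"]
    steps.append("read header to determine payload and destination app")
    payload = packet["payload"]
    steps.append("copy payload from NIC buffer into host memory")
    if checksum(payload) != header["checksum"]:
        raise ValueError("corrupt packet, request retransmission")
    steps.append("verify payload checksum")
    steps.append("copy payload into the waiting application's buffer")
    return steps

pkt = {"header": {"checksum": checksum(b"hello")}, "payload": b"hello"}
print(len(receive_packet_software_nic(pkt)))   # 5 CPU-visible steps per packet
```

The point is not the individual steps but that every single packet repeats all of them on the host CPU.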

These data packets are buffered or queued on the host system. Depending upon the size and volume of the packets, this constant fetching of information can create additional delays due to memory latencies and/or poor buffer management. The majority of standard desktop NICs also incorporate hardware checksum support and additional software enhancements to help eliminate transmit-data copies. This is advantageous when combined with packet prioritization techniques that control and enhance outbound traffic with intelligent queuing algorithms.
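The outbound prioritization idea can be illustrated with a minimal priority queue (a toy model; the traffic classes and names are our own assumptions):

```python
import heapq

class OutboundQueue:
    """Toy outbound packet queue: lower number = higher priority, and the
    sequence counter keeps FIFO order within the same priority class."""
    def __init__(self):
        self._heap = []
        self._seq = 0
    def enqueue(self, packet, priority):
        heapq.heappush(self._heap, (priority, self._seq, packet))
        self._seq += 1
    def dequeue(self):
        return heapq.heappop(self._heap)[-1]

q = OutboundQueue()
q.enqueue("bulk-transfer frame", priority=2)   # e.g. file copy traffic
q.enqueue("game update frame", priority=0)     # latency-sensitive traffic
print(q.dequeue())   # game update frame goes out first
```

Real queuing disciplines are far more elaborate, but the principle is the same: latency-sensitive frames jump ahead of bulk traffic on the way out.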

However, these same NICs cannot eliminate the receive-data copy routines that consume the majority of processor cycles in this process. A TNIC performs protocol processing on its own dedicated processor before placing the data on the host system. TNICs generally use zero-copy algorithms to place the packet data directly into the application's buffers or memory. This bypasses the normal round of handshakes between the processor, NIC, memory, and application, greatly reducing system overhead depending upon the packet size.
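Python's `memoryview` offers a loose analogy for the difference (this is only an illustration of the copying concept, not how a driver is written):

```python
# Copying path vs. zero-copy path, illustrated with a pretend NIC buffer.
# A memoryview slice references the original buffer instead of duplicating it.

nic_buffer = bytearray(b"HDR!" + b"application payload")

# Copying path: the payload is duplicated twice on its way to the app
kernel_copy = bytes(nic_buffer[4:])      # NIC buffer -> kernel memory
app_copy = bytes(kernel_copy)            # kernel memory -> app buffer

# Zero-copy path: the application is handed a view into the same buffer
app_view = memoryview(nic_buffer)[4:]    # no payload bytes duplicated

assert bytes(app_view) == app_copy       # same data, two fewer copies
```

On real hardware the TNIC's DMA engine places the payload straight into application memory, which is where the receive-side savings come from.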

Most corporate or data center networks deal with large data payloads, typically ranging from 8 KB up to 64 KB (though we fully understand this can vary greatly). Our example involves the receipt of a 32 KB application payload, which usually results in thirty or more interrupt-generating events between the host system and a typical NIC. Each of these events is required to buffer the information, segment the data into Ethernet packets, process the incoming acknowledgements, and send the data to the waiting application. The process basically reverses itself if a reply is generated by the application and returned to the sender. All of this can create significant protocol-processing overhead, memory latencies, and interrupt delays on the host system. We need to reiterate that our comments about "significant" system overhead are geared towards a corporate server or datacenter environment and not the typical desktop.
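The "thirty or more events" figure checks out with some back-of-the-envelope arithmetic (our assumptions: a 1460-byte TCP payload per standard Ethernet frame, roughly one delayed ACK per two frames, and one completion notification):

```python
# Rough event count for receiving a 32 KB payload on a standard NIC.

MSS = 1460                    # TCP payload bytes per Ethernet frame
payload = 32 * 1024           # 32 KB application message

frames = -(-payload // MSS)   # ceiling division: number of data frames
acks = frames // 2            # delayed-ACK processing events
events = frames + acks + 1    # plus one completion notification

print(frames, events)         # 23 35
```

The exact total depends on ACK strategy and driver coalescing, but it lands comfortably in the "thirty or more" range.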

Depending upon the application and network traffic, a TNIC can greatly reduce the network transaction load on the host system by changing the transaction model from one event per Ethernet packet to one event per application network I/O. The 32 KB application transfer now becomes a single data-path offload process that moves all data packet processing to the TNIC. This eliminates the thirty or so interrupts, along with the majority of the system overhead required to process that single payload. In a data center or corporate server environment with large content delivery requirements across multiple users, the savings in system overhead due to network transactions can have a significant impact. In some instances, replacing a standard NIC in a server with a TNIC has almost the same effect as adding another CPU. That's an impressive savings in cost and power requirements, but once again, is this technology needed on the desktop?
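The two transaction models can be contrasted in one small function (a rough model with our own assumptions: 1460-byte frames, one delayed ACK per two frames, one completion event):

```python
def host_events(payload_bytes: int, mss: int = 1460, offloaded: bool = False) -> int:
    """Rough model: a standard NIC costs roughly one host event per Ethernet
    frame plus ACK handling and a completion; a TNIC costs one event per
    application network I/O, regardless of payload size."""
    if offloaded:
        return 1                        # single data-path offload event
    frames = -(-payload_bytes // mss)   # ceiling division
    return frames + frames // 2 + 1     # frames + delayed ACKs + completion

print(host_events(32 * 1024))                  # ~35 events on a standard NIC
print(host_events(32 * 1024, offloaded=True))  # 1 event with a TNIC
```

Multiply that roughly 35-to-1 reduction across thousands of concurrent server connections and the "almost like adding another CPU" claim becomes plausible; on a lightly loaded desktop there is far less to reclaim.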

BigFoot Networks believes it is and we will see what they have to say about it and their technology next.


87 Comments


  • WileCoyote - Friday, November 3, 2006 - link

    I said it's too wordy. That means take out words. Are you people for real?
  • Frumious1 - Tuesday, October 31, 2006 - link

    That post was too short. Three sentences with no real support to back it up. Lamest comment ever.

    Look, sorry you have poor reading skillz, but a lot of us are quite educated and like to know the WHY and HOW besides just the WHAT. My first thought was "how in the hell is this product supposed to do ANYTHING for frame rates!?" Gary answered that by looking into the details more than superficially. Oh, sure, it doesn't really deliver -- a 5% increase on a moderately high-end system is pretty silly for a $270 product -- but that it can affect frame rates at all is surprising to me. I now have hope for Windows Vista, at least on the networking stack side of things.

    What's really odd is that the review is pretty clearly advising people to NOT go out and buy this "killer" product, because it just isn't that great. If I saw a review on AnandTech that bashed a product without any backing material, I'd feel I was reading something at HardOCP. Well, okay, they back up their complaints sometimes, but their testing methodology is worse than suspect. Remember the Core 2 "launch" reviews where they used a midrange GPU config to conclude that it did nothing for gaming performance?

    Anyway, there seem to be quite a few new names on here for this article. Wonder how many are employed by BigFoot? It's like there are a bunch of people bashing Gary for even reviewing the product at all, another group that's bashing him for not liking the card, a third group bashing him for not being able to do a 1 page writeup, and then a few people that think: "nice review; didn't surprise me much with the results, but at least we can now know that Killer can help in certain situations, even if it's overpriced as hell."

I'm sure Gary tested a lot more than is shown in this article. I've conversed with him in the past, and I'd wager he was pulling his hair out over this article. How many times did you write it, Gary? Ten? More? I think you ought to post here and set the record straight, because my bet is that in terms of improving gaming performance the Killer NIC is as good as a NIC is going to get... which means the other NICs from Intel, 3Com, etc. are basically all as good as onboard solutions, if that. There's really not much you can do to truly improve performance of a NIC for online gaming when you're looking at maybe 5ms worst case being added by the hardware and OS.

As I said above in an earlier comment, it would be nice to have more developers like John Carmack around, because he apparently knows how to perform well without lots of extra hardware. Can't say his games are the greatest, but the Doom 3 engine and its networking (client/server architecture) are clearly ahead of most other FPS solutions.
  • WileCoyote - Wednesday, November 1, 2006 - link

    I was trying to keep my comment short, simple, and to the point. I don't need to write a page defending my opinion on something as trivial as a product review. The numbers speak for themselves in this review. And it wasn't so much the quantity but rather the quality - there is little to no structure in the review. That's fact, not opinion. There is a better way to write a review - I've read hundreds of them on Anandtech.

    I guess my Bachelor's Degree in Anthropology from a well-respected U.S. university doesn't count for much and neither does the successful business I created/own which pays a six-figure salary.
  • Gary Key - Wednesday, November 1, 2006 - link

    quote:

    I guess my Bachelor's Degree in Anthropology from a well-respected U.S. university doesn't count for much and neither does the successful business I created/own which pays a six-figure salary.


    Apparently my MBA and the fact I am already retired (at 44 and doing this because I really enjoy it) does not count for very much either. LOL.....

    quote:

    And it wasn't so much the quantity but rather the quality - there is little to no structure in the review.


    Seriously, I am always looking to improve. How would you change the structure of the article?
  • Aikouka - Tuesday, October 31, 2006 - link

Being that I run an FTP server that tends to see a decent amount of internal traffic, it actually sounds like a TNIC (or simply making a dedicated server) could actually be beneficial to me. Although, I sure have no desire to pay $300 for that card when I could easily spend the money on a second video card for SLI purposes or such. Also, I know switching from single to dual-core really helped to offset the issue of FTP uploading on the local intranet. It really won't matter anyway, as when I build a new PC next month, I'm simply setting up my old PC as a dedicated server to offload those annoying tasks.

    Also, Mlau, there are some games where "real estate" matters. World of Warcraft is a great example of this and I'm glad that I play the game in 1650x1080, because in certain situations, there's so much junk on your screen (I may call it junk but it actually helps :P), that you need all the extra room you can get. You may be happy in 1024x768, but that gives you no right to vehemently demean people for wanting to play in higher resolutions, which doing so also provides a better quality picture without wasting resources on Anti-Aliasing. Almost all the time, enjoying a game the way the developers designed/envisioned it can be an enriching experience for the gamer.

Gary, your comment about WoW being limited to 64 FPS... I think you may've left Vertical Sync on :D. I can easily get 90FPS on my dated Athlon 64 X2 4400+ with a GeForce 6800GT OC playing in 1650x1080 with max graphics settings. Albeit, I don't get a constant 90FPS, but it can be easily attainable in non-expansive places. So yeah, if your refresh rate on your LCD/CRT was set to 60Hz, you probably would see your game hover around 60-64FPS or somewhere between that and 30FPS. I know I turned VSync off on mine as I couldn't constantly achieve 60FPS, so the game lowered my FPS to about 30 with VSync turned on. Simply turning it off raised me to an easy 45 minimum with no tearing evident. I know that in the Hillsbrad/Alterac area, I would probably get around 45-60 depending on how far into the distance I could see.
  • Gary Key - Wednesday, November 1, 2006 - link

Blizzard Response: http://forums.worldofwarcraft.com/thread.html;jses...

    quote:

    Gary, your comment about WoW being limited to 64 FPS... I think you may've left Vertical Sync on :D.


    Hi,

    We tried your suggestions during testing and nothing helped. We used both LCD and CRT monitors with vsync off. This was with several different video cards and Core 2 Duo/AM2 X2 processors. The frame rates were always capped to 64 until we switched to a single core processor on either system. We contacted Blizzard directly and they confirmed the dual core bug. The link above has their response on line item 6. Are you using FRAPS to capture the frame rates? If so which version please?

    Thanks!!!
  • goinginstyle - Thursday, November 2, 2006 - link

    I downloaded FRAPS 2.81 and sure enough my 4800+ X2 is capped at 64FPS.
  • otherwise - Tuesday, October 31, 2006 - link

You can get a 10/100 Ethernet card with a TOE dirt cheap, for much less than $300, if you really want one. With most people who actually need a TOE also demanding 10/100/1000 support, there is a glut of 10/100 TOE NICs.
  • dijuremo - Tuesday, October 31, 2006 - link

You will probably get more out of $300 if you get a hardware RAID controller for your system (Areca comes to mind), which will not only provide a speed up in storage but also redundancy for your system. I know it does nothing for your network performance, but it is money better spent, which is my point.

    I considered getting a Killer NIC for my new system, but did not do it because of price, no PCI-Express support (Mobo only has 2 PCI slots used for sound card and HDTV tuner and has 4 PCI Express slots, one used currently for Nvidia 7950GX2), plus performance gains were not that good (I read another review of this card about a month ago elsewhere).

    As for the person saying sli/xfire is useless, you are totally wrong. At 1920x1080 (using the LVM-37w3 LCD monitor - 1080p native) you need sli or xfire to have decent speeds to play games with AA and AF. If you don't play games or play at 1280x1024 or less it does not matter, but above that you really need sli or xfire.

I also agree the article was a bit too long and actually more effective than a Lunesta overdose. Not sure if the writer is trying to avoid what just happened to HEXUS, where they reviewed an Alienware PC and got e-mail back from the company saying they would not get any more hardware because of the bad review...
  • heated snail - Tuesday, October 31, 2006 - link

    I don't mean to be a jerk, and I appreciate any sincere and fact-finding test/review article. However:

    I'm amazed, was this really a review of a basic hardware item? Because instead it reads like a mini-novel about all the difficulty the testers/reviewers had in doing their job. Is it too much to ask for a more verbally efficient writing style? About two paragraphs briefly acknowledging that this product has been hyped in the media, and acknowledging that testing was a little more difficult than usual... then get straight to the tests on page two and conclusions on page three. I can't believe how long it took to read through this whole thing with its very repetitive descriptions and self-references.
