Linux Neophyte Troubleshooting (by Jarred)

I have to give Chris credit: he knows a lot about Linux. In fact, I'm pretty sure he's forgotten more about the subject than I have yet learned! However, being (relatively) new to Linux allows me to provide some insight that he may have glossed over. If you're an experienced Linux user, nothing I say here is likely to help, but for everyone else: before posting this article, I took a stab at setting up my own proxy server. The "simple" process ended up taking a couple of days of on-and-off troubleshooting to get everything working properly. What follows is a brief summary of what I learned/experienced during my Linux proxy crash course.

First, let's start with the hardware. I had a mini-ITX motherboard and parts available, which would have been perfect! Sadly, the board only has a single network adapter and an x16 PCI-E slot, so I looked elsewhere. I ended up piecing together a system from spare parts.

Jarred's Test System

Component          Description
Processor          Intel Core 2 Quad Q6600 (2.40GHz, 65nm, 2x4MB cache, quad-core, 1066FSB, 105W)
Memory             2x2048MB DDR2-800 RAM
Motherboard        ASRock Conroe1333-eSATA2
Hard Drive         300GB Maxtor SATA
Video Card         NVIDIA GeForce 7600GT
Operating System   Arch Linux (64-bit)
Network Cards      Onboard NVIDIA Gigabit (nForce); PCI TRENDnet Gigabit (Realtek 8169)

Obviously, my spare hardware is a bit more potent than what Chris had lying around, and frankly it's complete overkill for this sort of box. On the bright side, it runs 64-bit Linux quite well, and the NVIDIA GPU makes it gaming capable (if you're not too demanding). Getting Arch installed was the easy part, though; configuring things properly took quite a bit more effort.

I followed the directions and… nothing worked. Ugh. Now, I have to put a disclaimer here: I initially used an old Compaq PCI NIC as my secondary network adapter… and I discovered it was non-functional after a while spent troubleshooting. Or at least, it didn't work with Linux and caused the PC to lock up when I tried to load the driver. Good times! So make sure your hardware works properly in advance and you'll save yourself a headache or two. I picked up the TRENDnet Gigabit NIC at a local shop for just $20 and it installed without a hitch.


Old hardware isn't a problem with Linux; broken on the other hand…

As far as configuring Linux, the wikis Chris linked were generally helpful, though they're more detailed than most people will want/need. The "Arch Way" essentially boils down to giving you a fishing pole and some bait and trying to teach you to fish rather than providing you with a nice salmon dinner. Arch has benefits, and you will learn something about Linux (whether you want to or not), but if you're a newbie plan on spending a fair amount of time reading wikis and searching for solutions as you come to grips with the OS.

After Arch was running and I discovered my Compaq NIC was dead, installing the second NIC required a bit of unexpected work. Since it wasn't present during the OS install, the drivers weren't loaded by default. Using lspci, I found my new NIC and determined it was a Realtek 8169 chipset, and a short Google later I had the necessary driver loaded via modprobe r8169. After spending some time reading about ifconfig and trying a few settings, I got the NIC installed and (apparently) functional, so it was time to get squid and shorewall configured. (Note that this would likely have been unnecessary had the NIC been present during the Arch install.)
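The identify-then-modprobe step can be sketched as follows. The lspci line below is hypothetical stand-in output so the snippet runs anywhere; on a real box you would pipe lspci itself (and run modprobe as root).

```shell
# Hypothetical lspci output for a Realtek 8169 card; on a real system use:
#   lspci | grep -i ethernet
lspci_line='02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8169 PCI Gigabit Ethernet Controller (rev 10)'

# Pull out the chipset name to pick the matching kernel module.
chipset=$(echo "$lspci_line" | grep -io 'rtl8169')
echo "$chipset"

# With the chipset known, loading the driver is one command (as root):
#   modprobe r8169
```

On Arch at the time, adding the module name to the MODULES array in /etc/rc.conf made the load persist across reboots.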

While Chris likes the 10.4.20.x network, I prefer the customary 192.168.x.x. Chris listed a global DNS name server of 216.242.0.2, which will work fine (a name server from CiberLynx), but I put in the name servers from my ISP (Comcast). I grabbed this information from the /etc/resolv.conf file, placed there by DHCP from the cable modem. I also wanted to use DHCP as much as possible. The result is that I have my onboard NIC plugged into my cable modem, and the TRENDnet NIC connected to my wireless router. I set a static IP of 192.168.1.1 for the TRENDnet NIC, with DHCP providing IPs from 192.168.1.5 through 192.168.1.250. Really, though, I only need one for the wireless router, which then provides its own DHCP for a different subnet: 192.168.10.x. The good thing about this setup is that I never had to touch the configuration on my wireless router, which has been working fine. I just unplugged it from the cable modem and connected it to the Linux box.
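For reference, the DHCP side of that layout might look roughly like this in /etc/dhcpd.conf. The name-server addresses below are documentation-range placeholders, not my actual Comcast values, and the syntax assumes the ISC dhcpd that Arch shipped at the time:

```
# /etc/dhcpd.conf (sketch) -- replace the placeholder name servers with the
# ones from your ISP (copy them from /etc/resolv.conf).
option domain-name-servers 203.0.113.1, 203.0.113.2;  # placeholders (RFC 5737 range)

subnet 192.168.1.0 netmask 255.255.255.0 {
  range 192.168.1.5 192.168.1.250;   # pool handed out on the LAN side
  option routers 192.168.1.1;        # the TRENDnet NIC's static address
}
```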

Configuring shorewall was simple, but I ended up not getting network access from my Linux box. That was a "works as intended" feature, but I wanted to surf from the Linux box as well. I had to add ACCEPT $FW net tcp www to the /etc/shorewall/rules file to get my local networking back, and I added a line to allow FTP to work as well. Getting squid to work wasn't a problem… after figuring out that Chris forgot the "transparent" option for the http_port setting. I created the directory /home/squidcache for the proxy (mkdir /home/squidcache then chmod 777 /home/squidcache), just because I liked having the cache as a root folder. With everything finally configured properly, I did some testing and found everything worked about as expected. Great! I also installed X, the NVIDIA driver, and the KDE desktop environment as per the Beginner's Guide wiki—useful for editing multiple text files, surfing the web for configuration information, etc. Then I decided to reboot the Linux box to make sure it was truly working without a hitch.
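Pulling those pieces together, the relevant lines look roughly like this. The zone names follow shorewall's stock two-interface sample (net = internet, loc = LAN, $FW = this box), the squid syntax is the 2.x series current at the time, and the cache size figures are illustrative rather than my exact values:

```
# /etc/shorewall/rules -- additions
#ACTION    SOURCE  DEST   PROTO  DEST PORT
ACCEPT     $FW     net    tcp    www      # let the proxy box itself surf
ACCEPT     $FW     net    tcp    ftp      # ...and use FTP
REDIRECT   loc     3128   tcp    www      # push LAN web traffic into squid

# /etc/squid/squid.conf -- the lines that mattered
http_port 3128 transparent               # "transparent" was the missing piece
cache_dir diskd /home/squidcache 1024 16 256
```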

After the reboot, sad to say, I was back to nothing working… locally or via the proxy. Some poking around (using dmesg and ifconfig) eventually led me to the discovery that my NICs had swapped names after the reboot, so the NForce NIC was now eth1 and TRENDnet was eth0. One suggestion I found said that if I put the drivers for my NICs into the MODULES section of rc.conf, I could specify the order. That didn't work, unfortunately, but another option involved creating a file called /etc/udev/rules.d/10-network.rules with two lines to name my NICs. (Get your MAC address via dmesg|grep [network module] or udevadm info -a -p /sys/class/net/[Device: eth0/eth1/wlan0/etc.].) So I added:

SUBSYSTEM=="net", ATTR{address}=="[NVIDIA NForce MAC]", NAME="eth0"
SUBSYSTEM=="net", ATTR{address}=="[TRENDnet MAC]", NAME="eth1"

At this point, everything worked properly, but I did run into a few minor quirks over the next day or so of testing. One problem was that Futuremark's Peacekeeper benchmark stopped working. Troubleshooting by Chris ended up showing that there was a problem with the header being sent from the Futuremark server (Message: "Invalid chunk header" in /var/log/squid/cache.log). Telling squid not to cache that IP/server didn't help, as the malformed header problem persisted, but we were able to work around the issue by modifying the shorewall rules. Now the redirect line reads: REDIRECT loc 3128 tcp www - !service.futuremark.com—in other words, redirect all web traffic except for service.futuremark.com through the proxy.

Wrapping things up, here are the final configuration files that I modified for my particular setup. Providing these files almost certainly goes against the Arch Way, but hopefully having a sample configuration can help a few of you out.

/etc/dhcpd.conf: Put your own ISP name servers in here (from /etc/resolv.conf).
/etc/rc.conf: Specify your network setup, server name, and startup daemons.
/etc/shorewall/rules: The necessary redirect for web traffic to work with your proxy.
/etc/shorewall/shorewall.conf: Only changed the one line to STARTUP_ENABLED=Yes.
/etc/squid/squid.conf: Huge file full of proxy options; here's the short version without comment lines.
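As a taste of what those files contain, here is a sketch of the rc.conf networking section. This is the old Arch initscripts syntax that was current at the time; forcedeth is the nForce driver, and the interface values are examples rather than my literal configuration:

```
# /etc/rc.conf -- network-related pieces (sketch, old initscripts syntax)
MODULES=(forcedeth r8169)
eth0="dhcp"                                    # WAN: onboard nForce NIC, to the cable modem
eth1="eth1 192.168.1.1 netmask 255.255.255.0"  # LAN: TRENDnet NIC, static
INTERFACES=(eth0 eth1)
DAEMONS=(syslog-ng network dhcpd squid shorewall)
```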

Update: It seems my proxy was throttling performance when using "diskd" for the cache directory; changing it to aufs fixed the situation. With diskd, transfers came in intermittent bursts at full Ethernet speed, with other transfers limited to under 500KB/s. We're not sure why this happened, but you may want to check your network transfer rates with iptraf (pacman -S iptraf, then run it and choose the "S" option to view real-time network transfers).
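The fix amounts to a one-word change to the cache_dir line in /etc/squid/squid.conf (the path is from my setup; the size figures here are illustrative, so adjust to taste):

```
# Before: diskd intermittently throttled transfers
#cache_dir diskd /home/squidcache 1024 16 256
# After: aufs restored full-speed transfers
cache_dir aufs /home/squidcache 1024 16 256
```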

So what are the benefits of running the proxy cache? If you run multiple machines (I've got more than a dozen at present, with systems constantly arriving and leaving), the proxy cache means things like Windows Updates won't have to go to the web every time and download several hundred megabytes of data. That same benefit is potentially available for other services (e.g. FTP), and in an ideal world I'd be able to cache the various Steam updates. Sadly, Valve doesn't appear to like that, so all of my systems need to go out to the Valve servers to update. That said, you can manually copy your steamapps folder from one system to another and avoid the downloads. But I digress. The squid proxy can also provide a host of other capabilities, from anti-virus support to web filtering and even limiting access to certain times of the day.

The bottom line is that if you have an old system lying around—certainly my quad-core proxy is overkill, and even a Pentium 4 is more than you actually need—you can definitely benefit. A small ITX box or perhaps even an Atom nettop would be perfect for this sort of thing, but most of those lack the requisite dual NICs. You could try a PCIe NIC with mini-ITX, though it's questionable whether the x1 cards will function properly in a mini-ITX board with a single x16 slot intended for graphics use. Barring that, a uATX setup would work fine. Our only recommendation is that you consider the cost of electricity compared with the hardware. Sure, Linux will run fine on "free" old hardware, but a proxy server will generally need to be up and running 24/7, so you don't want to have a box sucking down 100W (or more) if you can avoid it.

96 Comments

  • eleon - Tuesday, May 11, 2010 - link

I really encourage everyone to use or try Linux, and to reuse old hardware, but this concept is the wrong solution in so many ways.

The main advantage of caching proxies is not to save bandwidth; it is to reduce the downloaded data volume.
    If your problem is that the bandwidth of your internet connection isn't shared fairly enough between your clients and/or applications, you need to think about QoS, not a proxy server.

A rolling release distribution for a router? I use Arch Linux myself on my laptop, and I like the rolling release cycle and the cutting-edge packages on my desktop, but it's really the wrong distribution for an infrastructure box like a router. Your argument that you never have to care about updating anymore is wrong; I would say you have to care/worry every time you update! The advantage of a distribution with stable releases is that you set up the box, and once it's up and running you get only security updates. That means only minor updates and no configuration changes. With a rolling release you get major version updates, and there is a greater chance that your config won't work after updating a package. So there are two scenarios: you update frequently and risk breaking the system (which provides your internet access) every time, or you don't update, and your router/firewall may have serious security issues. So using a rolling-release distro on a router isn't a good idea at all!

Use a PC that needs more than 100W for this? Maybe you should think about investing those energy costs in a faster internet connection instead.

I was thinking about a caching proxy myself, but for a shared 3G connection with a data volume cap of 6GB/month. In that situation a caching proxy can make sense, and you can add something like ziproxy to reduce the transmitted data by compressing images. But one YouTube video produces more traffic than 100 pictures, so what's the point? And squid doesn't cache dynamic content like Flash videos.

So for your problem/goal of a "fast surfing experience" while your family is doing whatever on the internet, your solution is QoS, which can handle this very effectively. Use embedded hardware to be energy efficient, and use a specialized router distribution (OpenWrt, pfSense, ... http://en.wikipedia.org/wiki/List_of_router_or_fir... ) so that you don't spend lots of hours getting it running, which is really inefficient too.

But if your goal is to learn something about Linux, your family proxy project is the way to go! :)
  • Dravic - Tuesday, May 11, 2010 - link

My reply was similar to yours: a QoS solution is what would fit best in this situation, unless you're dealing with usage caps or low-bandwidth service. I've tried this several times over the past ~7 years at home, and the browsing experience was noticeably slower when using a proxy. The extra latency of even a hashed disk lookup of an object is slower than just getting the object over a broadband connection.

But I was told this just wasn't "true"... well, we'll see.

On a saturated link I can see where a proxy would help, because you're not going over the link, but that is the job of QoS. I'd like to see full page load metrics for both types of data retrieval (while the link is saturated and while it's unencumbered).
  • JarredWalton - Tuesday, May 11, 2010 - link

    I'm not sure some of you are on the same page as me. First, my particular setup was done purely for initial testing. As I comment (multiple times), it's complete overkill--both from a hardware performance as well as a power requirement perspective. From the conclusion:

    'Our only recommendation is that you consider the cost of electricity compared with the hardware. Sure, Linux will run fine on "free" old hardware, but a proxy server will generally need to be up and running 24/7, so you don't want to have a box sucking down 100W (or more) if you can avoid it.'

    We're not saying you need to do Arch, or you need high-end hardware. In fact I'm going to try setting up a proxy with a CULV and Atom laptop to see how that works.

    As far as QoS, we never even mentioned that. The point of a caching proxy is to avoid going out to the Internet multiple times for the same data. For me in particular, where I review lots of laptops that need frequent updates, and I have to get new video drivers regularly, the idea of a proxy means that I can speed up the process for quite a few things. I'm not worried about "saving bandwidth" in the way you're discussing, though if you had a plan that charged you for downloading over a certain amount it might be useful. I'm interested in speeding up patching and such.

    Hence, the comments about wishing Steam would work with my proxy... as it stands, I have to manually copy updated files from one PC to another, or else let each download the latest updates manually. L4D2 has had a few 200MB+ updates recently, and I'm sure I've downloaded that on various PCs/laptops at least four times. At 1.5MB/s, it can take a while, especially if I just wanted to play a quick game.

    Everything we discussed in this particular article can easily be applied to Red Hat, Debian, SuSE, Ubuntu, or whatever favorite distro you choose. As a typical non-Linux user, it amazes me how much time people spend arguing over the benefits of their chosen distribution. It's attitudes like that that frighten away potential converts more than anything. Instead of arguing about why one of our specific configurations was bad, why not point out the good?

    Linux can do all this and save time on downloading patches and updates for multiple computers, and you can even get a faster surfing experience on frequently visited sites. You can run it on old or new hardware, and in fact a nettop with a USB adapter might be the ideal way of doing this from a power perspective. And all of this is free, assuming you have the necessary hardware. Pointing out flaws we already list in the article (i.e. the power concern) is a waste of time. I put it as the last sentence figuring that if nothing else, people would read the conclusion and see our discussion of power concerns.
  • michal1980 - Tuesday, May 11, 2010 - link

I get your point, AnandTech gurus. And the article is fine. But it seems like you guys are deaf right now.

For most users, even power users, the question remains: why? What REAL benefits will I see for all this new upkeep?

IMHO, for a home user this proxy is equal to the Killer NIC. It might work, but at the end of the day the money is better spent elsewhere.
  • dezza - Wednesday, May 12, 2010 - link

With an Atom PC or any small form factor PC with at least a 1GHz processor (or whatever, depending on the services you will be running), you're better off combining DHCP and the proxy on one box so you always have one connection open to the gateway/proxy. And instead of auto-detect, explicitly define the proxy on your clients.

    http://www.broadband-help.com/articles/networking/...

    I found this ..

It brings a few points into the light once again. Static content is the only thing that is affected, which is of course a big part, but since many big sites use systems like imageGet()'ers etc. in PHP/ASP and thumbnail() functions, your proxy can't touch this (MC Hammer).

Again, Chris: I respect your article and I agree that Arch Linux is a great distribution (in my case, for a bleeding-edge workstation). I love reading AnandTech's hardware articles as well, and this is the main reason for having it in my feed reader, but I will patiently wait while more of these articles surface so we can give feedback and maybe even offer suggestions or help in forging them. It would be lovely to extend this site with some killer articles on software/programming, etc. I never doubt the quality of your hardware articles, and I think you wrote a decent article. This is no bashing.
  • dezza - Wednesday, May 12, 2010 - link

    http://tools.ietf.org/html/rfc3143

an RFC documenting known HTTP proxy/caching problems ..

Not even at my work, where we have 8,000 clients connected to the internet and using BitTorrent heavily (we have BitTorrent shaping/filtering with encryption support), would we benefit at all from using a proxy.

Also, with a proxy you have to scale the proxy tremendously: with another 1,000 users, the I/O performance of the proxy server drops incredibly.
  • jamyryals - Tuesday, May 11, 2010 - link

    They are linux experts. This means they know too much to actually read the article.

    Jarred, you are on point with the distro v distro comment.
  • eleon - Wednesday, May 12, 2010 - link

    "Do you have a growing family at home slowly eating away at your bandwidth? Maybe you're a web surfing fanatic looking for a little more speed? If you answered yes to either, a caching proxy is for you."

That's the first paragraph of the article, and it's the first thing readers will see. And I really doubt that a caching proxy is the right solution. A caching proxy won't help if one client uses the whole bandwidth with BitTorrent. It will only have a benefit if you have multiple downloads of the same static (HTTP or FTP) content, and that's not the scenario families are dealing with. And if you really have some big updates or service packs, why not download them once and share them between the clients? So this may be a solution for your special needs, but obviously not for a "normal" family. So it's right that you didn't mention QoS, but if someone is eating away your bandwidth, you need QoS!

And you replied to my comment:
    "I'm not worried about "saving bandwidth" in the way you're discussing, though if you had a plan that charged you for downloading over a certain amount it might be useful. I'm interested in speeding up patching and such."
    PLEASE differentiate between "bandwidth" and "transfer volume". I didn't talk about saving bandwidth; I said that a proxy server can be a solution for reducing the transfer volume, which is something completely different. As long as you don't distinguish between these two things, you will never understand what you can do with QoS versus what you can do with a caching proxy.

And I didn't start a discussion about distribution X being better than distribution Y.
    I really love Arch Linux.
    But you said Arch Linux is a good choice for this proxy because it has a rolling release cycle.
    My comment about Arch Linux relates only to that, because it shows that you have no idea about the trade-off between distributions with stable releases (plus security updates) and distributions with a rolling release cycle. And in my opinion it is really irresponsible to recommend a rolling release distro for a router/firewall/proxy. (The reasons for that are in my first post.)

So please don't get me wrong: if this satisfies your needs, that's perfect and I'm happy for you.
    But if someone can answer your questions "Do you have a growing family at home slowly eating away at your bandwidth? Maybe you're a web surfing fanatic looking for a little more speed?" with yes, he or she wouldn't be happy with a caching proxy. It isn't a direct solution for this; it will maybe help in an indirect way in some special situations where you download the same big files via HTTP (or FTP) multiple times.
    So if you answer these questions with "yes," you should consider QoS.

My main concern is that your solution is not effective! You can improve the efficiency with low-power or even embedded hardware and a specialized router distribution that minimizes setup time, so it will be efficient on multiple levels, but that won't change the fact that it is really ineffective at solving the problems of "eaten bandwidth" and a slow web surfing experience.

Running QoS on embedded hardware would be both effective and efficient (and many SOHO and even consumer routers support it out of the box, and if not, many are supported by alternative firmware distributions like OpenWrt, DD-WRT, ...), so if you have these problems/needs, that would probably be the way to go.
  • JarredWalton - Wednesday, May 12, 2010 - link

    But QoS won't give you more speed, it will just prioritize bandwidth. A caching proxy, on the other hand, can actually boost page load speeds a lot (though not always). It's not for everyone, and I suppose part of the problem is I view things from my world while Chris has his own idea on things. Anyway, you're still getting caught up on what is essentially a hook to the article. Read that paragraph this way, and it's just a less dramatic restatement of Chris' paragraph:

    "Do you find your web surfing experience to be slower than you'd like? Do you have lots of PCs and do you frequently download the same file on multiple computers? If so, you might want to consider reading this article about proxy servers and what they can do, because it might be something that will help alleviate some of your bandwidth congestion."

    Call it over-exuberance on the part of the author or whatever. Just because someone likes the idea of proxies and writes an article -- OMG it's on AnandTech so it must be true! -- doesn't mean it's the right solution for every single situation. Given this is a Linux article, I personally thought it was more of an interesting idea that may be useful to some of our readership. I know the caching of Windows Updates is definitely useful for me, even though I have a relatively fast 16Mbit download speed.
  • epi 1:10,000 - Tuesday, May 11, 2010 - link

It would be nice if someone could review a real-time AV-scanning proxy with caching. Has anyone tried SafeSquid, or DansGuardian with squid and ClamAV?
