Introduction to Proxy Servers

Do you have a growing family at home slowly eating away at your bandwidth? Maybe you're a web surfing fanatic looking for a little more speed? If you answered yes to either, a caching proxy is for you. This simple addition to your home network can free up bandwidth by serving repeat requests from a local copy instead of downloading the same content again. These proxies are usually found in the commercial world, but they're just as useful at home. Below is an image of a traditional multi-computer home network.


Traditional Home Network

So what is a caching proxy server? The concept is pretty simple: when a machine on your network requests content from a website, the caching proxy server saves a local copy of that content. When any machine on your network later requests the same data, it is retrieved from your local proxy rather than from the internet. The content can be anything from regular website content to a file you downloaded. For households with multiple computers, the bandwidth savings really add up with patches and multi-computer driver updates. The change to the network configuration is really quite small:


Home Network with Proxy Server
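
To make the idea concrete, here is a minimal Python sketch of the cache-then-serve behavior described above. It is a toy model and not how Squid is implemented: the names TinyCache and fake_fetch are made up for illustration, and it ignores everything a real proxy has to handle, such as expiry, HTTP cache headers, and disk limits.

```python
from typing import Callable, Dict


class TinyCache:
    """Toy model of a shared cache: one stored copy per URL for all clients."""

    def __init__(self, fetch_from_internet: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_internet
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0
        self.bytes_from_internet = 0

    def get(self, url: str) -> bytes:
        # Cache hit: serve the local copy, no internet traffic.
        if url in self._store:
            self.hits += 1
            return self._store[url]
        # Cache miss: download once, keep a copy for later requests.
        self.misses += 1
        body = self._fetch(url)
        self.bytes_from_internet += len(body)
        self._store[url] = body
        return body


def fake_fetch(url: str) -> bytes:
    # Hypothetical stand-in for a real download: always returns 1,000 bytes.
    return b"x" * 1000


if __name__ == "__main__":
    cache = TinyCache(fake_fetch)
    # Three computers on the network request the same file.
    for _client in range(3):
        cache.get("http://example.com/driver-update.exe")
    print(cache.hits, cache.misses, cache.bytes_from_internet)  # prints: 2 1 1000
```

Three requests for the same file cost only one download; the other two are answered locally.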

At this point many are likely asking how much this costs. If you read my previous article, you already know the answer: "It's free and it's on Linux". I suppose I need to preface that last comment with the qualification that you need some old "junky but functional" hardware lying around. There are many different Linux solutions we can deploy to achieve this goal. For this article I have chosen a solution of Arch Linux, Shorewall, and Squid.

We selected Arch Linux because it is a rolling release and has the latest and greatest packages. If you are not familiar with the phrase "rolling release", in Linux it indicates a distribution that keeps you up-to-date with the latest software updates via the package manager. You will never have to re-install or upgrade your server from one release version to the next with this style of distribution. The great part about a rolling release on a proxy/firewall setup is that once it's set up and working correctly, you will not have to go back and completely overhaul the server when a newer distribution update comes out.

Along with the different types of OS and application solutions, there are also multiple ways to set up a caching proxy. My preferred setup is a transparent caching proxy. A transparent proxy does not require you to make any additional changes to the client computers on your network. You use the proxy server as your home gateway, which lets it automatically redirect web traffic to Squid. The second way to use Squid is to point your client machines at the proxy server via the proxy settings in your browser. Although this may be the easiest way to set up a proxy server, it requires you to make changes on every machine that attaches to your network. The table below shows what I selected for my transparent caching proxy server.
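
For comparison, here is a rough sketch of what the second, per-client approach looks like from a program's point of view, using Python's standard urllib. The address 192.168.1.1 is a hypothetical LAN address for the proxy box (substitute your own); 3128 is Squid's default listening port. With a transparent proxy, none of this client-side setup is needed.

```python
import urllib.request

# Hypothetical address of the proxy on the LAN; 3128 is Squid's default port.
PROXY_URL = "http://192.168.1.1:3128"


def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends plain HTTP requests through the given proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    opener = build_proxy_opener(PROXY_URL)
    # This request only succeeds if a proxy is really listening at PROXY_URL.
    with opener.open("http://example.com/", timeout=10) as response:
        print(response.status, len(response.read()))
```

Every browser or program on every machine needs an equivalent setting, which is exactly the per-client work the transparent setup avoids.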

Test Proxy System

Component           Description
Processor           Intel Pentium 4 3.06GHz
                    (3.06GHz, 130nm, 512K cache, Single-core + Hyper-Threading, 70W)
Memory              2x256MB PC800 RDRAM
Motherboard         Asus P4T
Hard Drives         120GB Western Digital SATA
Video Card          ATI Radeon 7000
Operating Systems   Arch Linux (32-bit)
Network Cards       Onboard Intel Gigabit
                    PCI 100Mbit 3Com 3c905C-TX

I could have selected older equipment, but this is what I had lying around the house. As seen in the table, one of the hardware requirements for a transparent proxy is to have two network cards or a dual-port network card. We recommend against using wireless for either of the connections to the proxy server, and a Gigabit Ethernet connection from the proxy to the rest of the network is ideal. (The connection to your broadband link can be 100Mbit without imposing any bottleneck.) Another quick suggestion: If you download a fair number of files, it may be a wise idea to use at least a 120GB HDD. The idea is that the more space you have, the longer you can keep your files stored on your proxy server. With storage being so cheap, you could easily add a 500GB or larger drive for under $100.
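
As a rough back-of-the-envelope check on the disk size suggestion, the sketch below estimates how long a file can stay cached before the disk fills up. It rests on simplifying assumptions that are mine, not measurements: the whole drive is given to the cache, new unique content arrives at a steady rate (the 2GB per day figure is invented for illustration), and the cache's actual replacement policy is ignored.

```python
def retention_days(cache_size_gb: float, new_data_gb_per_day: float) -> float:
    """Estimate how many days of downloads fit in the cache before it is full."""
    if new_data_gb_per_day <= 0:
        raise ValueError("new_data_gb_per_day must be positive")
    return cache_size_gb / new_data_gb_per_day


if __name__ == "__main__":
    # Assumed rate of new, unique cached content; adjust for your household.
    daily_gb = 2.0
    for size_gb in (120, 500):
        days = retention_days(size_gb, daily_gb)
        print(f"{size_gb}GB cache: about {days:.0f} days of downloads")
```

Under those assumptions, moving from a 120GB drive to a 500GB drive stretches retention from roughly two months to more than eight.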

Now that we have our hardware and a good idea of what we want to set up, it's time to get installing. I'll try to keep this portion simple and to the point, although if you have questions later feel free to post a comment.


97 Comments


  • eleon - Tuesday, May 11, 2010 - link

    I really encourage everyone to use or try linux, and to reuse old hardware. but this concept is the wrong solution in so many ways.

    The main advantage of caching proxies is not to save bandwidth, it is to reduce the downloaded data volume.
    If your problem is that the bandwidth of your internet connection isn't shared fairly between your clients and/or applications, you need to think about QoS, not a proxy server.

    a rolling release distribution for a router???? I use Arch Linux myself on my laptop, and I like the rolling release cycle and the cutting edge packages on my desktop. but it's really the wrong distribution for an infrastructure box like a router. your argument that you never have to care about updating anymore is wrong. I would say you have to care/worry every time you are updating! The advantage of a distribution with stable releases is that you set up the box, and once it's up and running you only get security updates. this means only minor updates and no configuration changes. with a rolling release you get major version updates and there is a greater chance that your config isn't working after updating a package. so there are two scenarios: you update frequently and risk breaking the system (which provides your internet access) every time. or you don't update, and your router/firewall may have serious security issues. so using a rolling release distro on a router isn't a good idea at all!

    use a pc that needs more than 100W for this? maybe you should think about investing these energy costs in a faster internet connection?

    I was thinking about a caching proxy myself, but for a shared 3G connection which has a data volume limitation of 6GB/month. in this area a caching proxy can make sense, and you can add something like ziproxy to reduce the transmitted data by compressing the pictures. but one youtube video produces more traffic than 100 pictures. so what's the point, and squid doesn't cache dynamic content like Flash videos.

    so for your problem/goal to have a "fast surfing experience" while your family is doing whatever on the internet, your solution is QoS, which can handle this very effectively. use embedded hardware to be energy efficient, and use a specialized router distribution ( openwrt, pfsense ,... http://en.wikipedia.org/wiki/List_of_router_or_fir... ) so that you don't spend lots of hours to get it running, which is really inefficient too.

    but if your goal is to learn something about linux, your family proxy project is the way to go! :)
  • Dravic - Tuesday, May 11, 2010 - link

    My reply was similar to yours: a QoS solution is what would fit best in this situation, unless you're dealing with usage caps or low bandwidth service. I've tried this several times over the past ~7 years at home and the browsing experience was noticeably slower when using a proxy. The extra latency of even a hashed disk lookup of an object makes it slower than just getting the object on a broadband connection.

    But I was told this just wasn't "true" .. we'll see

    On a saturated link I can see where a proxy would help because you're not going over the link, but that is the job of QoS. I'd like to see FULL page load metrics for both types of data retrieval (while the link is saturated and unencumbered).
  • JarredWalton - Tuesday, May 11, 2010 - link

    I'm not sure some of you are on the same page as me. First, my particular setup was done purely for initial testing. As I comment (multiple times), it's complete overkill--both from a hardware performance as well as a power requirement perspective. From the conclusion:

    'Our only recommendation is that you consider the cost of electricity compared with the hardware. Sure, Linux will run fine on "free" old hardware, but a proxy server will generally need to be up and running 24/7, so you don't want to have a box sucking down 100W (or more) if you can avoid it.'

    We're not saying you need to do Arch, or you need high-end hardware. In fact I'm going to try setting up a proxy with a CULV and Atom laptop to see how that works.

    As far as QoS, we never even mentioned that. The point of a caching proxy is to avoid going out to the Internet multiple times for the same data. For me in particular, where I review lots of laptops that need frequent updates, and I have to get new video drivers regularly, the idea of a proxy means that I can speed up the process for quite a few things. I'm not worried about "saving bandwidth" in the way you're discussing, though if you had a plan that charged you for downloading over a certain amount it might be useful. I'm interested in speeding up patching and such.

    Hence, the comments about wishing Steam would work with my proxy... as it stands, I have to manually copy updated files from one PC to another, or else let each download the latest updates manually. L4D2 has had a few 200MB+ updates recently, and I'm sure I've downloaded that on various PCs/laptops at least four times. At 1.5MB/s, it can take a while, especially if I just wanted to play a quick game.

    Everything we discussed in this particular article can easily be applied to Red Hat, Debian, SuSE, Ubuntu, or whatever favorite distro you choose. As a typical non-Linux user, it amazes me how much time people spend arguing over the benefits of their chosen distribution. It's attitudes like that that frighten away potential converts more than anything. Instead of arguing about why one of our specific configurations was bad, why not point out the good?

    Linux can do all this and save time on downloading patches and updates for multiple computers, and you can even get a faster surfing experience on frequently visited sites. You can run it on old or new hardware, and in fact a nettop with a USB adapter might be the ideal way of doing this from a power perspective. And all of this is free, assuming you have the necessary hardware. Pointing out flaws we already list in the article (i.e. the power concern) is a waste of time. I put it as the last sentence figuring that if nothing else, people would read the conclusion and see our discussion of power concerns.
  • michal1980 - Tuesday, May 11, 2010 - link

    I get your point, AnandTech gurus. And the article is fine. But it seems like you guys are deaf right now.

    For most users, even power users, the question remains, why? What REAL benefits will I see for all this new upkeep?

    IMHO, for a home user this proxy is equal to the Killer NIC. Might work, but the money at the end of the day is better spent elsewhere.
  • dezza - Wednesday, May 12, 2010 - link

    With an Atom PC or any small form factor PC that has at least 1GHz or whatever depending on the services you will be running - You will be better off combining DHCP/Proxy so you have one connection always open to the gateway/proxy .. And instead of auto-detect, explicitly define it ..

    http://www.broadband-help.com/articles/networking/...

    I found this ..

    Brings a few points into the light once again .. Static content is the only thing that is affected, which is of course a big part, but since many big sites use systems like imageGet()'ers etc. in PHP/ASP and thumbnail() functions - Your proxy can't touch this (MC Hammer) ..

    Again .. Chris, I respect your article and I agree that ArchLinux is a great distribution (In my case for bleeding-edge workstation) - I love reading anandtech's hardware articles as well and this is the main reason for having it in my feeder, but I will patiently wait while more of these articles get to the surface so we give feedback and maybe even come with suggestions or help you in forging them .. Would be lovely to extend this site with some killer articles on software/programming etc. I never doubt your quality of hardware articles and I think indeed you wrote a decent article. This is no bashing.
  • dezza - Wednesday, May 12, 2010 - link

    http://tools.ietf.org/html/rfc3143

    another official rfc documenting problems with the proxy ..

    Not even at my work, where we have 8000 clients connected to the internet and using BitTorrent heavily (We have BitTorrent shaping/filtering with encryption support), would we benefit at all from using a proxy.

    Also, with a proxy you will have to scale it tremendously; with another 1000 users the I/O performance of the proxy server drops incredibly ..
  • jamyryals - Tuesday, May 11, 2010 - link

    They are linux experts. This means they know too much to actually read the article.

    Jarred, you are on point with the distro v distro comment.
  • eleon - Wednesday, May 12, 2010 - link

    "Do you have a growing family at home slowly eating away at your bandwidth? Maybe you're a web surfing fanatic looking for a little more speed? If you answered yes to either, a caching proxy is for you."

    That's the first paragraph of this article and that's the first thing readers will see. and I really doubt that a caching proxy is the right solution. A caching proxy won't help if one client uses the whole bandwidth with BitTorrent. It will only have a benefit if you have multiple downloads of the same static (http or ftp) content, and that's not the scenario families are dealing with. And if you really have some big updates or service packs, why not download them only once and share them between the clients? So this may be a solution for your special needs, but obviously not for a "normal" family. So it's right that you didn't mention QoS, but if someone is eating away your bandwidth you need QoS!

    and you replied to my comment:
    "I'm not worried about "saving bandwidth" in the way you're discussing, though if you had a plan that charged you for downloading over a certain amount it might be useful. I'm interested in speeding up patching and such."
    PLEASE differentiate between "bandwidth" and "transfer-volume". I didn't talk about saving bandwidth, I said that a proxy server can be a solution for reducing the transfer-volume, which is something completely different. As long as you don't distinguish between these two things you will never understand what you can do with QoS, and what you can do with a caching proxy.

    and I didn't start a discussion about distribution X being better than distribution Y.
    I really love Arch Linux.
    But you said Arch Linux is a good choice for this proxy, because it has a rolling release cycle.
    My comment about Arch Linux only relates to this, because it shows that you have no idea about the advantages of distributions with stable releases (+security updates) versus distributions with a rolling release cycle. And in my opinion it is really irresponsible to recommend a rolling release distro for a router/firewall/proxy. (the reasons for that are in my first post).

    So please don't get me wrong, if this is satisfying your needs, it's perfect and I'm happy for you.
    but if someone can answer your questions "Do you have a growing family at home slowly eating away at your bandwidth? Maybe you're a web surfing fanatic looking for a little more speed?" with Yes, he or she wouldn't be happy with a caching proxy. It isn't a direct solution for this, it will maybe help in an indirect way in some special situations if you download the same big files by http (ftp) multiple times.
    So if you answer these questions with "Yes" you should consider QoS.

    My main concern is that your solution is not effective! You can improve the efficiency with low-power or even embedded hardware, and a special router distribution which will minimize the setup time, so it will be efficient on multiple levels, but it won't change the fact that it is really ineffective in solving the problems of "eaten bandwidth" and a slow web surfing experience.

    Running QoS on embedded hardware would be effective and efficient. (and many SOHO and even consumer routers support it out of the box, and if not, many are supported by alternative firmware distributions like openwrt, dd-wrt,... ) so if you have these problems/needs, this probably would be the way to go.
  • JarredWalton - Wednesday, May 12, 2010 - link

    But QoS won't give you more speed, it will just prioritize bandwidth. A caching proxy, on the other hand, can actually boost page load speeds a lot (though not always). It's not for everyone, and I suppose part of the problem is I view things from my world while Chris has his own idea on things. Anyway, you're still getting caught up on what is essentially a hook to the article. Read that paragraph this way, and it's just a less dramatic restatement of Chris' paragraph:

    "Do you find your web surfing experience to be slower than you'd like? Do you have lots of PCs and do you frequently download the same file on multiple computers? If so, you might want to consider reading this article about proxy servers and what they can do, because it might be something that will help alleviate some of your bandwidth congestion."

    Call it over-exuberance on the part of the author or whatever. Just because someone likes the idea of proxies and writes an article -- OMG it's on AnandTech so it must be true! -- doesn't mean it's the right solution for every single situation. Given this is a Linux article, I personally thought it was more of an interesting idea that may be useful to some of our readership. I know the caching of Windows Updates is definitely useful for me, even though I have a relatively fast 16Mbit download speed.
  • epi 1:10,000 - Tuesday, May 11, 2010 - link

    It would be nice if someone could review a realtime av scanning proxy w/ caching. Has anyone tried SafeSquid, or dansguard squid w/ clamav?
