Setting up a Web Content Filter (Parental Controls) using DANSGUARDIAN in Debian Linux
Spencer Stirling

Unfortunately there are some who cannot take a little bit of pornography as part of their daily diet. For those matronly types here's a way to protect your kids and basically control their web viewing. When they rebel out from under your thumb and they turn out to be pregnant teenagers with sexual hangups then don't blame me - I warned you.

The best software (OpenSource rocks, as usual) for this sort of content filtering is called Dansguardian. Basically, this software sits on top of a so-called "proxy" to filter all of your web requests. It is highly configurable. Here I'll outline a simple home configuration.

Networks with a dedicated firewall ("Standalone Computer" section is below, but maybe you should just read straight through?)
The most powerful implementation of this type of service is when you are running a network with a dedicated firewall (a dedicated Linux box that has two network cards - one for the inside LAN, and one for the outside world). In this case the "firewall" part of the software links the inside to the outside, but filters out the bad "hacker" traffic from getting in.

Usually, if you want, you can add a so-called "proxy" into the mix. This is a piece of software (sometimes on a different machine on the LAN, or sometimes on the firewall box itself) through which certain incoming and outgoing traffic (like web traffic) can be cached, monitored, and/or censored. Some might argue that this is part of the firewall, but really the purpose of the firewall is to keep out hackers and such. The proxy works at a somewhat "higher level": it actually caches (to speed up the internet), filters, and/or records CONTENT (like maybe your boss wants to spy on which websites you visit).

A good piece of proxy software is called "squid". Squid has various plug-ins that can perform recording, *authentication* (perhaps VERY useful to you), and filtering duties, however for our purposes we are going to add a separate piece of software ON TOP OF SQUID called Dansguardian (so in a way we are going to chain up two proxies in a row, but whatever). I am unsure why Dansguardian needs to daisy-chain with Squid, but it does - so get over it.

Like I said, Dansguardian and Squid are really written to be put on a dedicated machine - separate from the users' machines. As such, it will seem like "administrator" software - annoying to configure and such. This is true, but this also makes such filtering much more powerful than your usual at-home filtering software. What I'm going to say applies to Squid 2.5.12-4 and to Dansguardian 2.8.0.6 (which has a built-in antivirus, Clamav, in case you are firewalling any of those virus-prone Windows machines!)

I must say that I haven't installed Dansguardian/Squid on a separate dedicated firewall since I've only needed to use the software once for a standalone home computer, but here are some general hints how such a system SHOULD be set up. (Update March 2006 - I have now installed the system in this configuration. I only needed to modify the entry "filterip =" to be blank. Otherwise, what I wrote here is correct!)

Usually, web traffic passes to port 80. On the firewall you could trap any traffic coming in on the INTERNAL network card which happens to be destined for port 80 (meaning somebody on your LAN is trying to look up a web page) and redirect it to whichever port Dansguardian listens on (8080 by default). Then Dansguardian sends the request to Squid, which listens on port 3128 by default. Then Squid actually sends the web request out through the EXTERNAL card to port 80. Then it all has to come back through this chain of software BACKWARDS! This type of proxying is called TRANSPARENT proxying. Basically, this forces your users to go through the proxy - they have no choice. Furthermore, they probably won't even know about it. Note that if you use transparent proxying then you will be UNABLE to use the "authentication" part of Squid - i.e. forcing your users to "log in" in order to use the internet. I'm not worried about that here, so I like transparent proxying.

To get you started, here are some IPTABLES commands that you add to your firewalling script. Here $INTIF = network interface for card connected to INTERNAL LAN (e.g. eth0) (please see my Gateway/Firewall HOWTO for an idea about how this fits into a firewalling script):

# Allow port 8080 (Dansguardian) to receive connections
iptables -A INPUT -i $INTIF -p tcp --dport 8080 -j ACCEPT
# Redirect port 80 to Dansguardian (port 8080)
iptables -t nat -A PREROUTING -i $INTIF -p tcp \
   --dport 80 -j REDIRECT --to-ports 8080

As far as configuring Squid is concerned, the relevant configuration file is located at /etc/squid/squid.conf. I would install Squid and Dansguardian on the firewall by apt-getting the packages "squid" and "dansguardian" (easy, huh).

Then, I would edit /etc/squid/squid.conf to reflect the following settings:

http_port 127.0.0.1:3128

This tells Squid to listen on port 3128 ONLY on the local loopback interface (you wouldn't want people out there on that bad internet accessing your Squid server, would you? Also I *think* this will keep users on your LAN from bypassing Dansguardian and going straight to Squid).

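Next you'll need some settings to help Squid understand the redirected data (these "httpd_accel" directives belong to Squid 2.5; Squid 2.6 and later dropped them in favor of a "transparent" option on the http_port line):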

httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on

Finally make sure that Squid is operating as user "proxy" (sometimes the user might be "squid" - this seems to depend upon your distro)

cache_effective_user proxy
cache_effective_group proxy

There is more hell in this file than I care to discuss here. Anyway, somewhere deep in the middle there are some lines that say something LIKE

#acl our_networks src 192.168.200.0/24
#http_access allow our_networks
http_access allow localhost

or something to that effect. The first line defines a so-called "access control list", i.e. this just defines the internal LAN (notice that it's commented out, because we're not going to use it). The second line would allow the local LAN to directly access port 3128 - that's commented out for A REASON. We don't WANT people on the local LAN to be able to bypass Dansguardian and go directly to the Squid server. The third line must be there - you'll need Dansguardian (installed on the local machine) to be able to access the Squid server. I'm probably just being paranoid here because Squid is only listening to localhost anyway, but better safe than sorry.
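
If you'd rather not trust your eyes in that monster of a file, here is a small shell sketch that checks the two things discussed above: that every "http_port" line binds to 127.0.0.1, and that no uncommented "http_access allow our_networks" line has sneaked in. The function name squid_is_loopback_only is made up for illustration (it is not part of Squid), and it only looks at these two settings, so treat it as a sanity check rather than a full audit.

```shell
#!/bin/sh
# Hypothetical helper (NOT part of Squid): succeed only if
#   - at least one http_port line exists and all of them bind to 127.0.0.1
#   - there is no active "http_access allow our_networks" line
# Commented-out lines start with "#..." in field 1, so they never match.
squid_is_loopback_only() {
    conf=${1:-/etc/squid/squid.conf}
    awk '
        $1 == "http_port" {
            seen = 1
            if ($2 !~ /^127\.0\.0\.1:/) bad = 1
        }
        $1 == "http_access" && $2 == "allow" && $3 == "our_networks" { bad = 1 }
        END { exit ((seen && !bad) ? 0 : 1) }
    ' "$conf"
}
```

Something like `squid_is_loopback_only /etc/squid/squid.conf && echo "looks OK"` would then tell you whether the LAN can reach Squid behind Dansguardian's back.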

OK, you can go nuts in there configuring all sorts of handy stuff.

Dansguardian
Now to configure Dansguardian. I'll leave the actual "content" discussion for a later section (since that'll be the same for both setups anyway). The important configuration here is located in "/etc/dansguardian/dansguardian.conf". First set up:

filterip =
filterport = 8080

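Leaving "filterip" blank tells Dansguardian to listen on port 8080 on ALL of the machine's addresses. That is what the firewall setup needs: the redirected traffic arrives on the internal LAN address, not on the loopback address. As long as your firewall drops everything that isn't explicitly accepted, the INPUT rule above (which only accepts port 8080 on the internal card) keeps the outside world away from it.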

Next you need to tell Dansguardian where to find Squid. So set

proxyip = 127.0.0.1
proxyport = 3128

Finally you need to configure the user/group ID under which Dansguardian operates (this also will vary depending upon distro)

daemonuser = 'dansguardian'
daemongroup = 'dansguardian'

That's it! DON'T FORGET: you will need to COMMENT OUT the line that reads

UNCONFIGURED

at the top of /etc/dansguardian/dansguardian.conf for Dansguardian to work. You can go crazy in there with lots of other options (see below for some VERY important ones which will affect how your filter behaves!!!).
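
If you prefer not to open an editor just for that one line, a sed one-liner will do it. This is only a sketch: the function name is mine, and it assumes the marker line starts with the word UNCONFIGURED in the first column, as it does in the file described above. A backup copy with a ".bak" suffix is left next to the original.

```shell
#!/bin/sh
# Hypothetical helper: comment out the UNCONFIGURED marker line in place.
# "&" in the replacement stands for the matched text, so the line becomes
# "#UNCONFIGURED...". The original file is kept as <file>.bak.
enable_dansguardian_conf() {
    conf=${1:-/etc/dansguardian/dansguardian.conf}
    sed -i.bak 's/^UNCONFIGURED/#&/' "$conf"
}
```

Run it as root (`enable_dansguardian_conf`, with no argument it uses the default path) and then restart Dansguardian so that it rereads the file.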

Standalone Computer
If you only have a standalone computer then you can STILL have parental controls using pretty much the same "transparent proxy" setup. Rather than reinvent the wheel, follow the steps EXACTLY as outlined at this link (WOOPS... that link no longer pulls up the relevant article. This is why I should've copied it here. Sorry folks). The only change will be that (at least in Debian) Squid operates by default as user "proxy" and Dansguardian operates as user "dansguardian".
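
Since the linked article is gone, here is a common way to get the same effect on a single machine. To be clear, this is a sketch under my own assumptions and NOT a reconstruction of the lost article. On a standalone box the web requests are generated locally, so they never pass through PREROUTING; instead you redirect them in the nat OUTPUT chain. The catch is that Squid's own outgoing requests also go to port 80, so traffic owned by the Squid user has to be let through first or you get an endless loop. The function name and the IPTABLES variable are invented for illustration, and the "owner" match has to be available in your kernel.

```shell
#!/bin/sh
# Assumptions: Squid runs as user "proxy" (the Debian default) and
# Dansguardian listens on port 8080. Override IPTABLES to dry-run.
IPTABLES=${IPTABLES:-iptables}

redirect_local_web_traffic() {
    squid_user=${1:-proxy}
    filter_port=${2:-8080}

    # 1. Let Squid's own port-80 requests leave the machine untouched.
    $IPTABLES -t nat -A OUTPUT -p tcp --dport 80 \
        -m owner --uid-owner "$squid_user" -j ACCEPT

    # 2. Everything else headed for port 80 goes to Dansguardian instead.
    $IPTABLES -t nat -A OUTPUT -p tcp --dport 80 \
        -j REDIRECT --to-ports "$filter_port"
}
```

The order matters: the ACCEPT rule must come before the REDIRECT rule. Setting `IPTABLES=echo` before calling the function prints the commands instead of running them, which is a cheap way to look before you leap.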

Dansguardian tweaking
Now for the good part - the actual porn rules!!! I'm considering putting a reverse filter on my network and allowing ONLY porn and websites which contain enough swearwords. OK OK... you want the opposite effect. Too bad.

First things first: what I say here will apply to Dansguardian 2.8.0.6 (with the Clamav antivirus extension, about which I'll say nothing for now) from Debian. Most of this will probably apply to you anyway. As I said, the configuration files live in /etc/dansguardian. You will need to download a set of blacklists - try URLBlacklists.com (they say only 1 free download - then you have to pay... I have no idea if that's true. At least you shouldn't need to download new blacklists often - maybe once every half-year or so). Unzip this file in the directory /etc/dansguardian (which should create a further subdirectory "blacklists" which contains the goods).
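
The unpacking step can be sketched as below. I'm assuming the download is a gzipped tarball (the file name in the usage line is only an example) and that it contains a top-level "blacklists" directory, as described above; the function name is made up.

```shell
#!/bin/sh
# Hypothetical helper: unpack a downloaded blacklist tarball so that a
# "blacklists" subdirectory appears under the Dansguardian config directory.
install_blacklists() {
    archive=$1
    dest=${2:-/etc/dansguardian}

    tar -xzf "$archive" -C "$dest" || return 1

    # Complain if the archive did not contain the expected directory.
    if [ ! -d "$dest/blacklists" ]; then
        echo "no blacklists/ directory found in $dest" >&2
        return 1
    fi
}
```

For example, `install_blacklists bigblacklist.tar.gz` (run as root) would leave the lists in /etc/dansguardian/blacklists.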

Now you need to go into the files "bannedsitelist" and "bannedurllist" and uncomment whichever blacklists you want to enforce (don't do them all... seriously, do you have a jewelry addiction?)!!!
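
If you want to script the uncommenting, here is a sketch. It assumes the entries in those files are commented-out lines of the form "#.Include</etc/dansguardian/blacklists/CATEGORY/domains>" (or ".../urls>" in bannedurllist); check your own files first, since the exact paths may differ. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: strip the leading "#" from .Include lines that point
# into one blacklist category. Other categories stay commented out.
# The original file is kept as <file>.bak.
enable_blacklist() {
    listfile=$1
    category=$2
    sed -i.bak \
        "s|^#[[:space:]]*\(\.Include<.*/blacklists/${category}/[^>]*>\)|\1|" \
        "$listfile"
}
```

So `enable_blacklist /etc/dansguardian/bannedsitelist porn` followed by `enable_blacklist /etc/dansguardian/bannedurllist porn` would switch on just that one category in both files.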

Now go into "bannedextensionlist" and decide which extensions you want to allow (as it is the rules are ridiculously STRICT - you should probably comment out many/most of these. I always need to download files like this, and I'm not going to fight a filter).

Here's something that's a bit interesting: the antivirus Clamav actually looks at all of your traffic. You might try to download something, and if it has a virus then Clamav won't download it. However, you will THINK that you have it downloaded - nothing will complain to you. Instead of the file that you were expecting, you will have a file that contains a warning (in html) that the file had a virus. Beware of this if you start downloading .zip files and then find that you can't unzip them for some reason - it's probably just a bogus file full of html code.

Next up: "bannedmimetypelist". Again, you will probably find the rules in there TOO STRICT. Comment out many/most of these. Streaming video/audio is nice.

Now for "bannedphraselist". This is the beauty of Dansguardian over other Web filters. This is the "content" part of the filter, in that it doesn't JUST rely on banned site lists, but rather it tries to check each page for bad phrases. The phrases themselves are found in the /etc/dansguardian/phraselists directory. In each subdirectory there you find both "weighted" files and "banned" files. If something is in a "banned" file (and if you have that file indicated in the /etc/dansguardian/bannedphraselist file) then any page containing any of those phrases will be blocked.

The last file of interest for my simple little setup is the "weightedphraselist" file. This is much like the "bannedphraselist" file - you should indicate any "weighted" phraselists that you want to include. Obviously this sort of filtering is more intelligent - it gives relative weight to certain words and if enough bad stuff comes up then the site is blocked.

Now for an interesting fact: English swearwords don't appear to be included by default in the phraselists!!! You should add them in yourself. For example, in the file /etc/dansguardian/phraselists/badwords/weighted_dutch I added several entries such as

< frak ><90> #except I didn't write frak there

This gives 90 points anytime "frak" is seen (obviously I was too lazy to make a separate "weighted_english" file - I just stuffed them into the weighted_dutch file). As long as you make sure that the "weighted_dutch" file has an entry in /etc/dansguardian/weightedphraselist then these words will be taken into account. Also DON'T just rely on the references that appear by DEFAULT in /etc/dansguardian/weightedphraselist (and /etc/dansguardian/bannedphraselist, for that matter)! There are some real beauties in the "phraselists" directory, for example /etc/dansguardian/phraselists/googlesearches/banned.
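
Adding entries by hand gets old quickly, so here is a sketch of two shell helpers: one appends a weighted entry in the "< phrase ><weight>" format shown above, the other makes sure the master list pulls the file in with a ".Include<...>" line, without adding it twice. Both function names are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: append one weighted phrase, e.g. "< frak ><90>".
add_weighted_phrase() {
    listfile=$1
    phrase=$2
    weight=$3
    printf '< %s ><%s>\n' "$phrase" "$weight" >> "$listfile"
}

# Hypothetical helper: add ".Include<listfile>" to the master list unless
# that exact line is already there (-x whole line, -F no regex).
ensure_included() {
    master=$1
    listfile=$2
    line=".Include<$listfile>"
    grep -qxF -- "$line" "$master" 2>/dev/null ||
        printf '%s\n' "$line" >> "$master"
}
```

With these, a separate English file is no more work than abusing the Dutch one: `add_weighted_phrase /etc/dansguardian/phraselists/badwords/weighted_english frak 90`, then `ensure_included /etc/dansguardian/weightedphraselist /etc/dansguardian/phraselists/badwords/weighted_english`. (The "weighted_english" file name is my own choice here, not something that ships with the package.)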

One thing to keep in mind is that you MUST HAVE the Google search filter ON (can be found at www.google.com in "Preferences"). Otherwise Dansguardian detects that it's not and blocks EVERY search (not just the bad ones).

Now for some final thoughts: in the file /etc/dansguardian/dansguardianf1.conf you will find places where you can set things like "naughtynesslimit" (after a webpage racks up so many points then it's banned). Back in good old /etc/dansguardian/dansguardian.conf (the file you had to play with just to get Dansguardian started!) I like to turn off logging

loglevel=0

and perhaps MORE IMPORTANTLY I like to change the mode by which weighted phraselists ban sites - I like

weightedphrasemode = 1

The default is "2" for the standard Debian install, but I think you'll find that "1" really makes more sense...
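
Going by the comments in the stock dansguardian.conf, mode "1" counts a weighted phrase every time it shows up on a page, while mode "2" counts each phrase only once per page. The toy script below shows the difference. It is a simplified model of my own and NOT Dansguardian's real scoring code: it does plain case-insensitive substring matching and nothing else.

```shell
#!/bin/sh
# Toy model (an assumption, not Dansguardian's algorithm).
# Usage: score_page MODE PAGE_FILE PHRASE WEIGHT [PHRASE WEIGHT ...]
#   MODE 1: every occurrence of a phrase adds its weight
#   MODE 2: each phrase adds its weight at most once per page
score_page() {
    mode=$1
    page=$2
    shift 2
    total=0
    while [ "$#" -ge 2 ]; do
        phrase=$1
        weight=$2
        shift 2
        # Count occurrences; "|| true" keeps a zero-match grep from
        # counting as a failure.
        hits=$( { grep -o -i -F -- "$phrase" "$page" || true; } | wc -l )
        hits=$((hits + 0))   # strip any padding that wc adds
        if [ "$mode" -eq 2 ] && [ "$hits" -gt 1 ]; then
            hits=1
        fi
        total=$((total + hits * weight))
    done
    echo "$total"
}
```

For a page containing "frak this, frak that, darn", with "frak" worth 90 and "darn" worth 10, `score_page 1 page.txt frak 90 darn 10` prints 190 while mode 2 prints 100. Whichever number comes out is what gets compared against "naughtynesslimit", so the mode can easily decide whether a page is blocked.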
