Dansguardian/ConfigFiles

From SME Server
Revision as of 20:13, 30 March 2012 by Steve288 (talk | contribs) (→‎Blacklists: Formatting and few spelling error)

More Dansguardian Config files

Back to Dansguardian wiki page

Blacklists

The general procedure is to locate a suitable blacklist on the Internet, download the tgz file, uncompress it and move it to the /etc/dansguardian/blacklists directory. The SME Server admin user then needs to configure a cron job that regularly runs the download and update script (see below).

A commercial blacklist is available from URLBlacklist.com (a paid list, but the first download is free).

A free blacklist is available from http://www.shallalist.de/ (free for private, personal and non-commercial use; registration is required for commercial use, although it is still free). See the full registration details at http://www.shallalist.de/licence.html. Scripts for automating the shallalist download process are at http://www.shallalist.de/helpers.html

A current (as of March 2012) blacklists.tar.gz is available from http://cri.univ-tlse1.fr/blacklists/download/blacklists.tar.gz. The script below downloads and configures this list.

If you wish to make dansguardian use squidGuard blocking rules and have them updated weekly, add the following to the /etc/cron.weekly/dansguardian file. Please check that the location of the blacklists is still current. If necessary, search Google for "squidGuard blacklists" or "blacklists.tar.gz" to find a current location.

Create cron job

cd /etc/cron.weekly
pico -w dansguardian

Add the following lines

#!/bin/sh
# blacklists update script for dansguardian
# stop if the directory is missing, so nothing is deleted elsewhere
cd /etc/dansguardian || exit 1
rm -f blacklists.tar.gz
wget -qnv http://cri.univ-tlse1.fr/blacklists/download/blacklists.tar.gz -O blacklists.tar.gz
tar -zxf blacklists.tar.gz
chown -R root:root blacklists
chmod -R 640 blacklists
find blacklists -name new\* -exec rm {} \;
rm -f blacklists/README
# restore the execute (search) bit on the top two directory levels
chmod ug+x blacklists
chmod ug+x blacklists/*

Then to save & exit

Ctrl o
Ctrl x

Change permissions on cron job & restart crond

chmod +x dansguardian
service crond restart

The script runs OK when started manually, but integration with Dansguardian has not been fully tested or documented (as of 28 March 2012).


Also refer to this forum post http://forums.contribs.org/index.php?topic=48449.new;topicseen which refers to list sites and an older blacklists update script from an earlier DG Howto. Blacklists were previously available from mesd.k12.or.us, but this site appears non-functional as of 28 March 2012. There was also a blacklist available from dungog.net, originally installed with packages from dungog.net in an earlier Howto, but it appears to be no longer accessible.

Troubleshooting Blacklists

Why are sites not being blacklisted?

When using blacklists, keep in mind how the blacklists, banned lists, exception lists, gray lists and so on interact. If a site is not being blocked even though it appears in a blacklist file, make sure that the path of that blacklist file has been added to, or uncommented (the leading # removed) in, /etc/dansguardian/lists/bannedsitelist or bannedurllist. You must tell DansGuardian which blacklists to use!
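
As a minimal sketch of the uncommenting step, assuming the list file contains commented include lines of the form #.Include</etc/dansguardian/blacklists/ads/domains> (check your own bannedsitelist for the exact form), the hypothetical helper enable_blacklist below removes the leading # for a single category. The function name and the "ads" category are illustrative only.

```shell
# enable_blacklist FILE CATEGORY
# Hypothetical helper: uncomments the .Include line(s) for one blacklist
# category in a DansGuardian list file. Assumes lines look like
#   #.Include</etc/dansguardian/blacklists/ads/domains>
enable_blacklist() {
    list_file=$1
    category=$2
    # Write to a temporary file first so the list is only changed if sed succeeds
    tmp_file=$(mktemp) || return 1
    sed "s|^#\(\.Include<.*/blacklists/${category}/.*>\)|\1|" "$list_file" > "$tmp_file" \
        && cat "$tmp_file" > "$list_file"
    status=$?
    rm -f "$tmp_file"
    return $status
}
```

For example, enable_blacklist /etc/dansguardian/lists/bannedsitelist ads followed by a DansGuardian restart. Lines for other categories are left commented.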

Allow some blacklisted sites / Add my own blacklisted sites

If there is a site you want to allow, or others you do not want to allow, read up on the exception lists, for example. They override the banned lists. If you regularly update the blacklists with the method above, any personal changes made in the blacklist folder /etc/dansguardian/lists/blacklists/* will be lost on the next update. Instead, make changes to the various exception, banned, and gray list files in /etc/dansguardian/lists. Changes there will stay put.
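
A small sketch of making such a personal change safely: the hypothetical helper add_list_entry appends an entry to a list file only if that exact line is not already there, so it can be run repeatedly. The function name and the example domain are assumptions for illustration.

```shell
# add_list_entry FILE ENTRY
# Hypothetical helper: appends ENTRY to a DansGuardian list file unless an
# identical line is already present.
add_list_entry() {
    list_file=$1
    entry=$2
    # -x matches whole lines only, -F treats the entry as a fixed string
    if grep -qxF -- "$entry" "$list_file" 2>/dev/null; then
        return 0
    fi
    printf '%s\n' "$entry" >> "$list_file"
}
```

For example, add_list_entry /etc/dansguardian/lists/exceptionsitelist example.org, followed by a DansGuardian restart.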

Send yourself an email warning message

The script above is straightforward and works well. Try it first if you are setting up a cron job to regularly download new blacklists. The script below is a replacement for those who want a little more functionality. It makes wget write a log file, which can help with troubleshooting if the new blacklist file is not being downloaded, and it emails you if the download fails, so you are informed of possible problems automatically.

First test the email portion to make sure it works, since spam filters may catch the message. Enter the following at a command prompt, substituting your own email address. You should receive an email from your server, which confirms that the mail part of the script works.

echo "See /var/log/blacklists_dl.log on" `uname -n` |/bin/mail -s'Blacklist DL Error' email@someware.com

If the above email test works, it confirms that your email can be sent. Now here is the script. Follow the instructions above for setting up the blacklists, except use this script instead.

#!/bin/sh
# blacklists update script for dansguardian
# Creates wget log and emails if error downloading.
cd /etc/dansguardian || exit 1
rm -f blacklists.tar.gz
# If the following site stops allowing downloads you will need to find another
wget -v http://cri.univ-tlse1.fr/blacklists/download/blacklists.tar.gz -O blacklists.tar.gz \
  -o /var/log/blacklists_dl.log
# If an error occurs during download, send an email via the mail program and exit.
if [ $? -ne 0 ]; then
  echo "See /var/log/blacklists_dl.log on $(uname -n)" | /bin/mail -s 'Blacklist DL Error' email@someware.com
  exit 1
fi
tar -zxf blacklists.tar.gz
chown -R root:root blacklists
chmod -R 640 blacklists
find blacklists -name new\* -exec rm {} \;
rm -f blacklists/README
chmod ug+x blacklists
chmod ug+x blacklists/*
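
A further check that could be added before the tar step, offered only as a sketch: confirm that the downloaded file really is a readable gzip-compressed tar archive before unpacking it over the existing lists. The helper name archive_is_valid is hypothetical and is not part of either script above.

```shell
# archive_is_valid FILE
# Hypothetical helper: returns 0 if FILE is a non-empty tar.gz archive
# whose contents can be listed, and non-zero otherwise.
archive_is_valid() {
    archive=$1
    # A missing or zero-length file is never valid
    [ -s "$archive" ] || return 1
    # Listing the archive fails if it is corrupt or not a tar.gz file
    tar -tzf "$archive" > /dev/null 2>&1
}
```

In the update script this might be used as archive_is_valid blacklists.tar.gz || exit 1 on the line before tar -zxf blacklists.tar.gz.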

dansguardian.conf & dansguardianf1.conf

The only setting that is vital for you to configure in the dansguardian.conf file is the accessdeniedaddress setting. You should set this to the address (not the file path) of your Apache server with the perl access denied reporting script. For most people this will be the same server as squid and DansGuardian. If you really want you can change this address to a normal html static page on any server.

Reporting Level

You can change the reporting level for when a page gets denied. It can say just 'Access Denied', report why, or report why and what the denied phrase is. The last option may be more useful for testing, but the middle option would be more useful in a school environment. Stealth mode logs what would be denied but doesn't do any blocking.

Logging Settings

This setting lets you configure the logging level. You can log nothing, only denied pages, all text-based requests, or all requests. HTTPS requests only get logged when the logging level is set to 3 (all requests).

Log Exception Hits

Log if an exception (user, ip, URL, or phrase) is matched and so the page gets let through. This can be useful for diagnosing why a site gets through the filter.

Log File Format

This setting alters the format of the DansGuardian log file. Please note option 3 (standard log format) is not yet implemented.

Network Settings

These allow you to modify the IP address that DansGuardian is listening on, the port DansGuardian listens on, the IP address of the server running squid as well as the squid port. It is possible to configure the Access Denied reporting page here also.

Content Filtering Settings

Here you can modify the location of the list files. Adjusting these locations is not recommended.

Naughtyness limit

This setting refers to the weighted phrase limit over which the page will be blocked. Each weighted phrase is given a value either positive or negative and the values added up. Phrases to do with good subjects will have negative values, and bad subjects will have positive values. See the weightedphraselist file for examples. As a rough guide, a value of 50 is for young children, 100 for older children, 160 for young adults.
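
The arithmetic can be sketched as follows. This is not DansGuardian's own code, only an illustration of summing signed weights and comparing the total with the limit. The function names are hypothetical, and "greater than" is used for the comparison because the text describes the total exceeding the limit.

```shell
# page_score
# Reads lines of the form "WEIGHT phrase" on stdin and prints the sum of
# the weights. Weights may be positive (bad subjects) or negative (good).
page_score() {
    awk '{ total += $1 } END { print total + 0 }'
}

# exceeds_limit SCORE LIMIT
# Succeeds (returns 0) when SCORE is greater than the naughtyness limit.
exceeds_limit() {
    [ "$1" -gt "$2" ]
}
```

With made-up weights of 40, 30 and -10 the score is 60, which exceeds a limit of 50 but not a limit of 100.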

Show weighted phrases found

If enabled then the phrases found that made up the total which exceeds the naughtyness limit will be logged and, if the reporting level is high enough, reported. The logged message will look like this.

*DENIED* Weighted phrase limit of 50 : 60 ((pink, lips)+(proxy, block)+(proxy, filter)+-main+-transparent+-tumor)\
 GET 115503 60 Proxies, Pornography 1 403 text/css   -

The 50 : 60 is the weight. The first number indicates your default allowable weight, or naughtyness limit. The second number represents the weight of the site that the user went to. In this case the site is blocked because the second number is greater than the allowed limit. The weight is based on the reasons given on the rest of the line.
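
To pull these two numbers out of the log, a sketch such as the following could be used. It assumes the message text "Weighted phrase limit of N : M" appears exactly as in the sample line; the function name weighted_scores is hypothetical.

```shell
# weighted_scores
# Reads DansGuardian access.log lines on stdin and prints "LIMIT SCORE"
# for every line that reports a weighted phrase limit. Other lines are
# skipped.
weighted_scores() {
    sed -n 's/.*Weighted phrase limit of \([0-9][0-9]*\) : \([0-9][0-9]*\).*/\1 \2/p'
}
```

For example, weighted_scores < /var/log/dansguardian/access.log lists the limit and page score of each page blocked by weighted phrases.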

Reverse Lookups for Banned Sites and URLs

If set to on, DansGuardian will look up the forward DNS for an IP URL address and search for both in the banned site and URL lists. This would prevent a user from simply entering the IP for a banned address. It will reduce searching speed somewhat so unless you have a local caching DNS server, leave it off and use the Blanket IP Block option in the bannedsitelist file instead.

Build bannedsitelist and bannedurllist Cache Files

This will compare the date stamp of the list file with the date stamp of the cache file and will recreate as needed. If a bsl or bul .processed file exists, then that will be used instead. It will increase process start speed by 300%. On slow computers this will be significant. Fast computers do not need this option.

POST protection (web upload and forms)

This is for blocking or limiting uploads, not for blocking forms without any file upload. The value is given in kilobytes after MIME encoding and header information.

Username identification methods (used in logging)

The proxyauth option is for when basic proxy authentication is used (obviously no good for transparent proxying). The ntlm option is for when the proxy supports the MS NTLM authentication. This only works with IE5.5 sp1 and later, and has not been implemented yet. The ident option causes DansGuardian to try to connect to an identd server on the computer originating the request.

Forwarded For

This option adds an X-Forwarded-For: <clientIP> to the HTTP request header. This may help solve some problem sites that need to know the source IP.

Max Children

This sets the maximum number of processes to spawn to handle the incoming connections. This will prevent DoS attacks killing the server with too many spawned processes. On large sites you might want to double or triple this number.

Log Connection Handling Errors

This option logs some debug info regarding fork()ing and accept()ing which can usually be ignored. These are logged by syslog. It is safe to leave this setting on or off.


Further customisation

DansGuardian is highly configurable. The source code is available so you have the ultimate in configurability, although most people will be content with modifying the configuration files.

After you have modified any configuration file, to apply the changes you will need to restart DansGuardian.

There are two main configuration files, several banned lists and exception lists. These are all explained below:

exceptionsitelist

This contains a list of domain endings. If one of them is found in the requested URL, DansGuardian will not filter the page. Note that you should not put the http:// or the www. at the beginning of the entries.

exceptioniplist

This contains a list of client IP addresses that should bypass the filtering, for example the IP address of the network administrator's computer.

exceptionmimetypelist

MIME stands for Multi-purpose Internet Mail Extensions. MIME types form a standard way of classifying file types on the Internet. Internet programs such as web servers and browsers all have a list of MIME types, so that they can transfer files of the same type in the same way, no matter what operating system they are working in. If a site does not display properly with Dansguardian, it is possible that the MIME type is not being allowed. Look at the log file /var/log/dansguardian/access.log and find the message regarding the web site you are viewing. If a MIME type is being blocked, you will see something like the following near the end of the line referring to the web site that is not loading properly.

*DENIED* Banned extension: .com GET 0 0 Banned extension 1 403 application/json   -

In this case the MIME type is application/json. If you want to allow this MIME type, add application/json on a single line to /etc/dansguardian/lists/exceptionmimetypelist. This should be done carefully, as you are now allowing this MIME type everywhere. However, it is not uncommon to add MIME types. After any changes, run the command:

/etc/init.d/dansguardian restart
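
To see which MIME types appear on denied requests, a sketch like the following may help. It assumes the MIME type is the second-to-last whitespace-separated field of each log line, as in the sample line above; the function name denied_mime_types is hypothetical.

```shell
# denied_mime_types
# Reads DansGuardian access.log lines on stdin and prints each distinct
# MIME type found on *DENIED* lines, one per line, sorted.
denied_mime_types() {
    # -F makes grep treat the asterisks literally
    grep -F '*DENIED*' | awk 'NF >= 2 { print $(NF-1) }' | sort -u
}
```

For example, denied_mime_types < /var/log/dansguardian/access.log.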

exceptionuserlist

Usernames who will not be filtered (basic authentication or ident must be enabled).

exceptionphraselist

If any of the phrases listed here appear in a web page then the filtering is bypassed. Care should be taken adding phrases to this file as they can easily stop many pages from being blocked. It would be better to put a negative value in the weightedphraselist.

exceptionurllist

URLs listed here identify the parts of sites for which filtering should be switched off.

bannediplist

IP addresses of client machines to disallow web access to. Only put IP addresses here, not host names.

bannedphraselist

This contains a list of banned phrases. The phrases must be enclosed between < and >. DansGuardian is supplied with an example list. You should not use phrases such as <sex>, as this will block sites such as Middlesex University. The phrases can contain spaces. Use them to your advantage. This is the most useful part of DansGuardian and will catch more pages than PICS and URL filtering put together.

Combinations of phrases can also be used. If all the phrases in a combination are found in a page, the page is blocked. Exception phrases are no longer listed in this file - see exceptionphraselist.

banneduserlist

User names that will automatically be denied web access (basic proxy authentication must be enabled).

bannedmimetypelist

This contains a list of banned MIME-types. If a URL request returns a MIME-type that is in this list, DansGuardian will block it. DansGuardian comes with some example MIME-types to deny. This is a good way of blocking inappropriate movies for example. It is obviously unwise to ban the MIME-types text/html or image/*.

bannedextensionlist

This contains a list of banned file extensions. If a URL ends in an extension that is in this list, DansGuardian will block it. DansGuardian comes with some example file extensions to deny. This is a good way of blocking kiddies from downloading those lovely screen savers and hacking tools. You are a fool if you ban the file extension .html, or .jpg etc.

bannedregexpurllist

This contains a list of banned regular expression URLs. For more information on regular expressions, see http://www.opengroup.org/onlinepubs/7908799/xbd/re.html

Regular expressions are a very powerful pattern matching system. This file allows you to match URLs using this method.

bannedsitelist

This file contains a list of banned sites. Entering a domain name here bans the entire site. For banning specific parts of a site, see bannedurllist. You can also set a blanket ban on all sites except those specifically excluded in exceptionsitelist, block sites specified only as an IP address, and include a stock squidGuard blacklists collection. To enable these blacklists, download them from the extras section http://dansguardian.org/?page=extras

Simply put them somewhere appropriate, un-comment the squidGuard blacklists collection lines at the bottom of the bannedsitelist file, and check the paths are correct. For URL blacklists, edit the bannedurllist in a similar way.

bannedurllist

This allows you to block specific parts of a site rather than the whole site. To block an entire site, see bannedsitelist. To enable squidGuard blacklists for URLs, you will need to download the blacklists and edit the squidGuard blacklists collection section at the bottom (as for bannedsitelist above).

weightedphraselist

Each phrase is given a value, either positive or negative, and the values are added up. Phrases to do with good subjects will have negative values, and bad subjects will have positive values. Once the total exceeds the naughtyness limit (set in dansguardian.conf), the page is blocked. See the Naughtyness limit description within the dansguardian.conf section above.

pics

This file allows you to finely tune the PICS filtering. Each PICS section comes with a description of the allowed settings and what they represent. The default settings with DansGuardian are set for youngish children, for example mild profanities and artistic nudity are allowed. PICS filtering can also be totally disabled / enabled using the enablePICS = on | off option.

For more detailed information on PICS ratings, see http://www.w3.org/PICS/

contentregexplist

ICRA

The ICRA section is fairly self-explanatory. A value of 0 means nothing of that category is allowed, whereas a value of 1 allows it. For example,

ICRAnudityartistic = 1

allows nude art. For more in-depth information see http://www.rsac.org/

RSAC

RSAC is an older version of ICRA. The values here range from 0 meaning none allowed, through 2 (the default value), to 4, which allows wanton and gratuitous amounts of the given category. For more in-depth information see http://www.rsac.org/

evaluWEB

evaluWEB rating uses a system similar to the British Film classification system:

0 = U (Universal, ie. suitable for even the youngest viewer)

1 = PG (Parental Guidance recommended)

2 = 18 (Only suitable for viewers aged 18 and over)

SafeSurf

Similar to RSAC, but containing a larger range of categories with the range from 0 = full filtering to 9 = wanton and gratuitous. For more in-depth information, see http://www.safesurf.com

Weburbia

See evaluWEB. For more in-depth information, see http://www.weburbia.com/safe/index.shtml

Vancouver Webpages

This is yet another ratings scheme. See http://vancouver-webpages.com/VWP1.0/ for more information.