Diary and notebook of whatever tech problems are irritating me at the moment.

20101124

Two more utilites for DansGuardian Users

I reduced the DansGuardian user account and blacklist maintenance hassles with my previous two utilities but while working on whitelisting I found the need for a few more.

In DansGuardian (DG) terms a blacklist bans something, a greylist allows something (overrides blacklisting) but still filters it, and an exceptionlist allows something without filtering (overriding greylists and blacklists). The "something" can be URLs, IP addresses, server names, etc., depending upon the specific list type. Blacklisting a site is easy but blacklisting a specific type of content is very difficult and error-prone. It works the same way as anti-malware utility definitions - if the undesirable items are on the list, and they match a particular requested target, then it's blocked. If not, then it gets through. It's a big Internet out there and trying to block all the bad is rather difficult. "Bad" is also relative and what is bad for one person/religious group/company/government may not be bad for another. Whitelisting has the opposite problems in that you gain strict control over what is available but trying to predict where the user wants to go, determining if that is a safe destination, and maintaining the lists is also difficult.

I found that I needed to use both blacklisting and whitelisting. I use whitelisting with younger children and blacklisting for older. Older children won't put up with strict constraints and will either figure out how to bypass them or simply go somewhere else to browse the Internet. Younger children are easier to keep happy but you still have to spend time figuring out all the web sites they will want access to, preferably with the initial configuration so they're not whining every five minutes about another toy/game/whatever site they can't access.

With DG a "whitelist" configuration is basically a blacklisting of all sites with a "**" in the bannedsitelist file with entries in greylist and exceptionlist files to bypass it. The exceptionlist file entries will enable site access but this is not what you want for allowing a user to browse a particular site because it disables all filtering. Use greylist files instead. This way if there is an offensive part of a site that you didn't know about (or it gets defaced by black hats) then you still have the filters to rely on. The exception lists are useful for sites that are not normally browsed but may trigger the filters inadvertently such as Linux distro repositories using http.

One of the problems with whitelisting is that the user won't necessarily know where they can go on the Internet. To solve this problem you need an index page of some sort. This is the problem I encountered when creating my greylists and I came up with a solution.

I didn't want to maintain an index separately from the greylists so I figured out a way to embed the data in the lists. DG recognizes a # in the list as a comment. I added a comment at the end of each list entry with a Wiki-style link after it. This isn't all that unusual as Debian/Ubuntu did something similar with the menu.lst file in Grub. The comment hides data that isn't relevant to DG but the defined format allows extraction of the data to create an index. Soon after I started adding the links to the list entries I figured out two things - it was a lot of typing and was going to be a very big index. To organize the index better I added a category tag on the end which could be used in the index. The final format is:

<exception link> #[<URL><space><label>][Category:<category text>]

The brackets are required characters. The parsing is somewhat whitespace tolerant but in the Category tag don't leave any spaces between the colon and the category text (sed and regex expressions can be tedious). Example:

gutenberg.org # [http://www.gutenberg.org Project Gutenberg][Category:Books]

To save some typing I wrote add-exceptionlist-url-comments which creates a default URL comment. First it pads the end with tabs (up to 5) to keep it pretty. The default link is made by slapping an http protocol prefix on the exception entry. It then uses wget to try to fetch the default web page and scrape the page title to use as a default link label. This works for most pages and redirects but not those that are using a meta refresh. It finally adds an undefined category tag at the end. Anything in the list that starts with a # is ignored. Note that not every entry will need a link. Some sites you don't want may serve data to a site you want. A lot of USA government sites that are kid-specific will link to media on the main government sites which aren't of interest to kids and just clutter the index. Some web stores also use third-party search services which will need exceptions but not links. In many cases you'll want a link that points to a specific part of the site, not just the server root, so you'll have to edit the defaults.

To create the index page I wrote exceptions-index-page-generator. It looks for the bracket-formatted URLs in the input files. It also builds a list of category tags, assigning a default tag (defined in the script) to any that are missing. It then creates a basic html file with entries separated by category. If a category has more than a certain number of entries (default 5 as defined in the script) it makes two columns to reduce the page length. It doesn't try to normalize category names so they must match in the entries exactly in order to be combined. It ain't pretty but it works. These are both command-line utilities but are rather easy to use.

UPDATE: I updated exceptions-index-page-generator. Version 1.1 adds a category table of contents to the top of the page. It will also make two columns of these if the number of categories exceeds the column threshold.

You can use my greylists to test with and as a base for your own lists for younger children. I haven't performed in-depth checking of these but they look relatively safe. Some of the entries may seem odd but they're intended to aid holiday gift buying. You will also notice that I used html entity codes in the labels for some punctuation as they didn't display correctly in Firefox.

No comments:

About Me

Omnifarious Implementer = I do just about everything. With my usual occupations this means anything an electrical engineer does not feel like doing including PCB design, electronic troubleshooting and repair, part sourcing, inventory control, enclosure machining, label design, PC support, network administration, plant maintenance, janitorial, etc. Non-occupational includes residential plumbing, heating, electrical, farming, automotive and small engine repair. There is plenty more but you get the idea.