Stubborn Tech Problem Solving: 2011

20111215

Full-featured Ubuntu online installation using kickstart

This is an elaborate fault-tolerant Kickstart script for an on-line Ubuntu installation, optimized for home users, with extensive remote administration support and documentation. Not recommended for beginners.

This isn't just another trivial automated installation script although it started out that way. Basic installation presets led to integrated bug workarounds, setting defaults for many applications and servers, more features, etc. While you may disagree with some of my package choices, they were selected for my clients - not you. Change it if you have different needs. First, a little background on my deployments.

All of my clients have cheap desktop systems or laptops, usually outdated. Almost any CPU, chipset, GPU, and drive configuration. They're either stand-alone or connected together on small Ethernet networks. Some have broadband, some only dial-up (POTS). Ages vary from toddlers to senior citizens. A few are Windows gamers. This mix results in a wide variety of system hardware, peripherals, application requirements, and configurations. I've had to deal with most every type of kernel, application, and hardware bug. Every deployment unearths a new bug to fight. Some of these are Ubuntu's fault but many are upstream.

Inevitably I spend many hours doing full OS conversions to Ubuntu or dual-boot configurations. I've found that using a Live CD to install Ubuntu is about 4x faster than installing Windows when drivers, updates, and application installs are figured in. While I could set up slipstream builds of Windows I don't install it enough to bother with and the variety of versions (Home, Pro, upgrade, OEM,...) and licenses makes it impractical. Relatively speaking, I spend about 3x as long transferring documents, settings, and game/application files (scattered all over C:) to Ubuntu than I do installing either it or Windows. But I'll take any time savings I can get.

A while back, when Ubuntu 10.04 (Lucid Lynx) was released, I decided to streamline my installations. This wasn't just to save time. I also needed to make my installations more uniform as I couldn't remember all the various tweaks and bug fixes that I performed from one installation to the next.
I had several goals for this project, not necessarily all at the beginning as some were the result of test installs, client feedback, and feature creep.

Fix all the bugs that my clients encountered on their existing installs plus all the other Ubuntu annoyances I've been manually correcting.
Do everything the "correct way" instead of blindly following HOW-TOs from amateurs that involved script and text file hacking that would be lost on the next update. I had to learn proper use of Gconf, PolicyKit, Upstart, init scripts, and dpkg.
Configure all of the network features that my clients had asked for, usually file or peripheral sharing. Internet content filtering for kids was a requirement.
Secure remote access and administration. It's bad enough when a client has a software problem. Having to waste time with an on-site visit is idiotic when it's not an Internet access problem and a broadband connection is available. The same kickstart configuration can be used for both an "administration" system as well as clients. Having them nearly identical makes both remote and verbal support easier.
Make it easier to obtain diagnostic and status information, for me and the client.
Research applications that meet customer needs and are stable. Configure them so the customer doesn't need to.
Document everything, especially anything I spent significant time researching.

On all of these I mostly succeeded. There are still a few gaps, but they're minor (for my deployments at least), and after working on this for 18 months I needed to get on with my life. I figure that after a few million deployments I should break even. I'm now busy updating the dozen or so I currently have.

So what's in it? The base is just a plain 10.04 (i386 or amd64) installation. Two reasons for that - it's the LTS release and I didn't have time to upgrade to newer releases or work around their new bugs. It's supported for another year or so. I will probably update it for 12.04 after it is released (and clean up my code). Highlights:

Apache. Used for sharing the public directory (see below) and accessing the various web-based tools. The home page is generated from PHP and attempts to correct for port-forwarding (SSH tunnel) if it detects you are not using port 80.

Webmin. It's the standard for web-based administration. I added a module for ddclient (Dynamic DNS). The module is primitive but usable and I fixed the developer's Engrish.

DansGuardian. Probably three months work on just this. For content filtering there isn't really anything else. Unfortunately it has almost no support tools so I had to write them. Most of these have been announced in previous blog postings although they've been updated since then. The most complicated is "dg-squid-control" which enables/disables Squid, DansGuardian, and various iptables rules. Another loads Shalla's blacklist. It doesn't have system group integration so I wrote "dg-filter-group-updater" to semi-integrate it. There are four filter groups - no access, restricted (whitelist with an index page), filtered, and unrestricted. I added a Webmin module for it I found on Sourceforge. It's not great but makes it easier to modify the grey and exception lists. Included are lists I wrote that allow access to mostly kid sites (a couple of hundred entries). The entries have wiki-style links in comments that are extracted by "dg-filter-group-2-index-gen" to create the restricted index page. There's a How-To page for proxy configuration that users are directed to when they try to bypass it.

The only limitation is that browser configurations are set to use the proxy by default but dg-squid-control doesn't have the ability to reset them if the proxy is disabled. I spent two weeks working on INI file parsing functions (many applications still use this bad Windows standard for configuration files). While they seem to work I need to significantly restructure the tool to make use of them.

DansGuardian had no development for a few years but recently a new maintainer is in charge and patches are being accepted. Hopefully full system account integration will be added.

UFW. The Uncomplicated Firewall is a front-end to iptables and there is a GUI for it. One feature it has is application profiles, which make it easy to create ready-to-use filter rules. I created about 300 of them for almost every Linux service, application, or game (and most Windows games on Wine).

File sharing. The /home/local directory is for local (non-network) file sharing between users on the same system. There is also a /home/public directory that is shared over Samba, HTTP, FTP, and NFS. WebDAV didn't make the cut this time around.

Recovery Mode. I added many scripts to the menu for status information from just about everything. Several of my tools are accessible from it.

SSH server. You make a key with ssh-keygen, client_administrator_id_dsa (should be encrypted), and include the public (*.pub) part in the kickstart_files/authentication sub-directory. It is added to the ssh configuration directory on every system. Using another tool, "remote-admin-key-control", system owners (sysowner group) can enable or disable remote access. This is for several reasons including privacy, liability, and accounting (for corporate clients where the person requesting support may not have purchase authority).

When remote-admin-key-control adds the key to the administrator account's ~/.ssh/authorized_keys, you can connect to the system without a password using the private key (you still need to enter the key passphrase). The radmin-ssh tool takes this one step further and forwards the ports for every major network service that can function over ssh. It also shows example command lines (based on the current connection) for scp, sftp, sshfs, and NFS. You still need the administrator password to get root access.
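To make the port-forwarding idea concrete, here is a minimal sketch that only prints an ssh command line. It is not radmin-ssh itself: the function name, the local port 8080, and the choice of forwarding only Apache (80) and Webmin (its default port 10000) are assumptions made for illustration.

```shell
# Hypothetical helper, not the radmin-ssh tool: print an ssh command that
# logs in as "administrator" with the given private key and forwards
#   local 8080  -> Apache on the client (port 80)
#   local 10000 -> Webmin on the client (default port 10000)
# Nothing is executed; the command is only printed for review.
radmin_ssh_command() {
    key=$1      # path to the private key, e.g. client_administrator_id_dsa
    target=$2   # host name or address of the client system
    printf 'ssh -i %s -L 8080:localhost:80 -L 10000:localhost:10000 administrator@%s\n' \
        "$key" "$target"
}
```

Calling radmin_ssh_command with a key path and a host name prints one line that can be pasted into a terminal. Because Apache is then reached on a port other than 80, this is the situation the PHP home page mentioned earlier tries to correct for.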

X2Go. Remote desktop access that's faster than VNC. Uses SSH (and the same key).

OpenVPN. A partially configured Remote Technical Support VPN connection is installed and available through Network Manager. If the client system is behind a firewall that you can't SSH through, the client can activate this VPN to connect to your administration system so that you can SSH back through it. Rules for iptables can be enabled that prevent the client accessing anything on the administration system. It connects using 443/udp so should work through most firewalls.

Books and guides. Located in the desktop help menu (System > Help) is a menu entry that opens a directory for books. My deployments have subdirectories with Getting Started with Ubuntu 10.04 - Second Edition from the Ubuntu Manual Project and OpenOffice.org user guides. You can easily add more as the kickstart script grabs everything in its local-books subdirectory. For the end-user I wrote networks-and-file-sharing-help.html (in the same help menu).

For the installer the main source of documentation is the kickstart script itself. I got a little carried away with comments. The next major document is TODO.html which is added to the administrator's home directory. It was intended to list post-install tasks that needed to be completed since there are many things the installer can't do (like compile kernel modules). After adding background information on the various tasks, troubleshooting help, and example commands, it's basically another book. You should read it before using the kickstart script.

Scanner Server. Allows remote access to a scanner through a web interface. Simpler than using saned (but that is also available if you enable it). It had several bugs so I fixed it and added a few features (with help from an Ubuntu Forum member, pqwoerituytrueiwoq). Eventually we hit the limit of what it could do so pqwoerituytrueiwoq started writing PHP Server Scanner as a replacement. For a 12.04 release I will probably use that instead. I wrote "scanner-access-enabler" to work around udev permission problems with some scanners (especially SCSI models).

Notifications. Pop-up notices will be shown from smartd, mdadm, sshd, and OpenVPN when something significant happens. Without the first two the user doesn't know about pending drive problems until the system fails to boot. I've also had them turn the system off when I was in the process of updating it and the SSH notification helps prevent that. The OpenVPN notification is mostly for the administration system and includes the tunnel IP address of the client. OpenSSH has horrible support for this kind of scripting. OpenVPN's scripting support is absolutely beautiful.

Webcam Server. A command-line utility that I wrote a GUI for. It has a Java applet that can only be accessed locally but a static image is available from the internal web server to anywhere.

BackupPC. It uses its default directory for backups so don't enable it unless you mount something else there. A cron job will shut the system down after a backup if there are no users logged in. It has been somewhat hardened against abuse with wrapper scripts for tar and rsync.

There are many bugs, both big and small, that are either fixed or worked around. The script lists the numbers where applicable. The TODO document lists a bunch also. Some packages were added but later removed (Oracle/Sun Java due to a licensing problem, Moonlight since it didn't work with any Silverlight site I tested).

There are some limitations to Ubuntu's kickstart support. I'm not sure why I used kickstart in the first place. Perhaps the name reminded me of KiXtart, a tool I used when I was a Windows sysadmin. Kickstart scripts are the standard for automating Red Hat installations (preseeding is the Debian standard) but Ubuntu's version is a crippled clone of it. In part it acts like a preseed file (even has a "preseed" command) but also has sections for scripts that are exported and executed at different points during the installation. About 90% of the installation occurs during the "post-install" script. The worst problem with Ubuntu's kickstart support is that the scripts are exported twice and backslashes are expanded both times. This means that every backslash has to be quadrupled. This gets real ugly with sed and regular expressions. Because of this you'll see "original" and "extra slashy" versions of many command lines. I wrote quad-backslash-check to find mistakes.
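As a rough illustration of the quadrupling rule, the sketch below converts an "original" command line to its "extra slashy" form and flags lines that look under-escaped. It is not the author's quad-backslash-check; the function names are hypothetical, and the check simply assumes that every run of backslashes in the kickstart file should have a length that is a multiple of four.

```shell
# Hypothetical helpers, not the quad-backslash-check tool.

# Replace every single backslash on stdin with four backslashes.
quadruple_backslashes() {
    sed 's/\\/\\\\\\\\/g'
}

# Print "line-number:text" for each input line that contains a run of
# backslashes whose length is not a multiple of four.
# Reads the named files, or stdin when no file is given.
check_quad_backslashes() {
    awk '{
        run = 0; bad = 0
        n = length($0)
        # Walk one position past the end so a trailing run is also closed.
        for (i = 1; i <= n + 1; i++) {
            if (substr($0, i, 1) == "\\") {
                run++
            } else {
                if (run % 4 != 0) bad = 1
                run = 0
            }
        }
        if (bad) printf "%d:%s\n", NR, $0
    }' "$@"
}
```

A heuristic like this cannot tell whether a run of four backslashes was meant to be one literal backslash or two escaped ones, so it only catches the obvious misses.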

The other problem is that the way the script is executed by the installer hides line numbers when syntax errors occur, making debugging difficult. I wrote quote-count and quote-count-query to find unmatched quotes (and trailing escaped whitespace that was supposed to be newlines) which were the most common cause of failure.
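The same kind of check can be sketched for quotes. This is not the author's quote-count or quote-count-query, just a per-line heuristic under a hypothetical name: it reports lines with an odd number of double quotes and lines where a backslash is followed by trailing whitespace instead of the newline it was meant to escape.

```shell
# Hypothetical helper, not the quote-count tool.
# Reads the named files, or stdin when no file is given.
report_suspect_quotes() {
    awk '{
        line = $0
        dq = gsub(/"/, "", line)   # gsub returns how many quotes it removed
        if (dq % 2 == 1)
            printf "%d: odd number of double quotes\n", NR
        if ($0 ~ /\\[ \t]+$/)
            printf "%d: backslash followed by trailing whitespace\n", NR
    }' "$@"
}
```

Strings that legitimately span several lines will be reported too, so the output is a list of places to look at rather than a list of errors.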

I've made an archive of my kickstart file, its support files, and configuration files for various services on my server for you to download (12.5MB, MD5: b5e79e6e287da38da75ea40d0d18f07f ). The script, error checking and ISO management tools, and server configuration files are in the "kickstart" sub-directory. A few packages are included because they are hard to find but others are excluded because of size. Where a package is missing there is a "file_listing.txt" file showing the name of the package I'm using. My installation includes the following which you should download and add back in:

Amazon MP3 Downloader (./Amazon/amazonmp3.deb)
DansGuardian Webmin Module (./DansGuardian Webmin Module/dgwebmin-0.7.1.wbm)
Desura client (./Desura/desura-i686.tar.gz)
G'MIC (./GMIC/gmic_1.5.0.7_*.deb)
Gourmet (./Gourmet/gourmet_0.15.7-1_all.deb)
VMware Player (./VMware/VMware-Player-*.bundle)

VMware Player is optional. It has kernel modules so the kickstart script only retrieves the first install file returned from the web server whose name matches the architecture. It puts it in /root for later installation.

The target systems need network-bootable Ethernet devices, either with integrated PXE clients or a bootable CD from ROM-o-matic.

You need a DHCP server that can send out:

filename "pxelinux.0"
next-server

The tftp server needs to serve the pxelinux.0 bootstrap, vesamenu.c32, and the menu files. These are available from the Ubuntu netboot images. The bootstrap and vesamenu.c32 are identical between the i386 and amd64 versions, only the kernel, initrd, and menus are different. You can use my menu files instead of the standard set in the netboot archive. The most important is the "ubuntu.cfg" file. You'll notice that my menu files list many distros and versions. Only the utility, Knoppix, and Ubuntu menus function fully. The rest are unfinished (and probably obsolete) experiments. FreeDOS is for BIOS updates.

My tftp server is atftpd which works well except it has a 30MB or so limit on tftp transfers. This only affects the tftp version of Parted Magic (they have a script to split it up into 30MB parts). It is started by inetd on demand.

I use loopback-mounted ISOs for the kickstart installs and all LiveCD netboots. Because I have so many, I exceeded the default maximum number of loopback nodes available. I set max_loop=128 in my server's kernel command line to allow for many more.

The Ubuntu Minimal CD ISOs are the source for the kernel and initrd for the kickstart install. The architecture (and release) of the kernel on these ISOs must match the architecture of Ubuntu you want to install on the target system. You'll probably want both the i386 and amd64 versions.

PXE Linux doesn't support symlinks so my ISOs are mounted in the tftp directory under ./isomnt. Symlinks to the ISOs are in ./isolnk and are the source of the mounts. I set it up this way originally because the ISOs were in /srv/linux in various subdirectories so having the links in one place made it easier to manage. But my ISO collection grew too big to manage manually so I wrote "tftp-iso-mount" that creates the mountings for me. It searches through my /srv/linux directory for ISO files and creates isomnt_fstab.txt that can be appended to fstab. It also deletes and recreates the isomnt and isolnk directories and creates the "isomnt-all" script to mount them.
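The fstab-generation step can be sketched roughly as follows. This is not the tftp-iso-mount script: the function name, the mount options, and the naming scheme (isolnk/NAME.iso mounted on isomnt/NAME) are assumptions, and the sketch only prints lines without creating links or directories.

```shell
# Hypothetical sketch, not the tftp-iso-mount tool.
# Print one fstab line per ISO found under the ISO root, mounting the
# symlink in <tftp root>/isolnk on a directory in <tftp root>/isomnt.
# File names containing whitespace are not handled (fstab would need
# them escaped as \040).
gen_isomnt_fstab() {
    iso_root=$1     # e.g. /srv/linux
    tftp_root=$2    # e.g. /srv/tftp
    find "$iso_root" -type f -name '*.iso' | sort | while IFS= read -r iso; do
        name=$(basename "$iso" .iso)
        printf '%s %s iso9660 loop,ro 0 0\n' \
            "$tftp_root/isolnk/$name.iso" "$tftp_root/isomnt/$name"
    done
}
```

Redirecting the output to a file gives something comparable to the isomnt_fstab.txt described above, which can be reviewed before it is appended to fstab.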

The ISOs are accessed through both NFS and Apache. I originally intended to use NFS for everything but I found that debian-installer, which performs the installation and executes the kickstart script (also on the "alternate" ISOs), doesn't support NFS. So I had to set up Apache to serve them. The Apache configuration is rather simple. There are a few symlinks in /var/www that link to various directories elsewhere. One named "ubuntu" links to /srv/linux/Ubuntu. The kickstart support files are placed in /srv/linux/Ubuntu/kickstart_files and are accessed via the link. NFS is still used for booting LiveCDs (for bug testing and demos). There is also a "tftp" symlink to /srv/tftp used for local deb loading (see below).

The kickstart script itself, Ubuntu-10.04-alternate-desktop.cfg, is saved to /srv/tftp/kickstart/ubuntu/10.04/alternate-desktop.cfg after being backslash and quote checked.

Several preseed values are set with the "preseed" command at the beginning of the script. You'll probably want to change the time zone there. License agreements are pre-agreed to as they will halt the installation if they prompt for input.

Like I mentioned earlier, the vast majority of work happens in the post-install script. This executes after the base Ubuntu packages are installed. The most important variable to set is $add_files_root which must point to the URL and directory of your web server where the rest of the kickstart support files are located (no trailing slash). The script adapts for 32-bit and 64-bit packages as needed based on the architecture of the netboot installer. There is also a "late_command" script that executes near the end of the installation, after debian-installer creates the administrator account (which happens after the post-install script finishes).

The debug variables are important for the initial tests. The $package_debug variable has the most impact as it will change package installations from large blocks installed in one pass (not "true") to each package individually ("true"). When true, it slows down installation significantly but you can find individual package failures in the kickseed-post-script.log and installer syslog (located in /var/log/installer after installation). Setting $wget_quiet to null will guarantee a huge log file. The $script_debug variable controls status messages from the package install and mirror selection functions.
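The effect of $package_debug can be pictured with a small sketch. The function below is hypothetical and much simpler than the script's own install functions; the "installer" override variable exists only so the logic can be tried without touching the system.

```shell
# Hypothetical sketch of the two install modes selected by $package_debug.
# $installer may be set to another command for dry runs; by default it is
# "apt-get -y install".
install_packages() {
    cmd=${installer:-apt-get -y install}
    if [ "$package_debug" = "true" ]; then
        # Debug mode: one package per call, so a failure names the package.
        for pkg in "$@"; do
            $cmd "$pkg" || echo "FAILED: $pkg"
        done
    else
        # Normal mode: the whole block in one pass (faster, but one bad
        # package fails the entire call).
        $cmd "$@"
    fi
}
```

In debug mode each failure is reported on its own line, which is the kind of per-package detail you would otherwise have to dig out of the installer logs.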

The $mirror_list variable contains a list of Ubuntu mirrors (not Medibuntu or PPAs) that should have relatively similar update intervals. This is used by the fault-tolerant mirror selection function, f_mirror_chk, that will cycle through these and check for availability and stability (i.e., not in the middle of sync). The mirrors included in the list are good for the USA. These are exported to the apt directory so that the apt-mirror-rotate command can use them to change mirrors quickly from the command line or through the recovery mode menu. When a package fails to be installed via the f_ftdpkg and f_ftapt functions, another mirror will be tried to attempt to work around damaged packages or missing dependencies.
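The mirror fallback can be sketched as a simple loop. This is not f_mirror_chk: the function names are hypothetical, the wget probe is just one possible availability test, and nothing here detects a mirror that is in the middle of a sync.

```shell
# Hypothetical sketch, not the f_mirror_chk function.

# Print the first mirror for which the given check command succeeds.
# Usage: first_working_mirror CHECK_COMMAND MIRROR_URL...
first_working_mirror() {
    check_cmd=$1
    shift
    for mirror in "$@"; do
        if "$check_cmd" "$mirror"; then
            printf '%s\n' "$mirror"
            return 0
        fi
    done
    return 1    # no mirror passed the check
}

# One possible check: ask wget whether the lucid Release file exists,
# without downloading it. Expects a mirror URL ending in /ubuntu.
mirror_is_reachable() {
    wget -q --spider --timeout=10 --tries=1 "$1/dists/lucid/Release"
}
```

With a space-separated $mirror_list, a call such as first_working_mirror mirror_is_reachable $mirror_list (left unquoted so the list splits into separate URLs) returns the first mirror that answers.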

To save bandwidth the post-install script looks for loopback mounted ISOs of the Ubuntu 10.04 live CD and Ubuntu Studio (both i386 and amd64 versions) in the isomnt sub-directory via the tftp link in the Apache default site. It copies all debs it finds directly into the apt cache. It also copies the contents of several kickstart support sub-directories (game-debs* and local-debs*). This is a primitive way to serve the bulk of the packages locally while retrieving everything else from the mirrors. You need to change the URLs in the pre-load debs section to the "pool" sub-directories of the mounted ISOs in "./tftp/isomnt/".

Because loading this many debs can run a root volume out of space, the $game_debs variable can be used to prevent game packages from being retrieved. Normally you should have at least a 20GB root (/) volume although it could be made smaller with some experimentation. An alternative to this method would be a full deb-mirror or a large caching proxy.

Set the OpenVPN variable $openvpnurl to the Internet URL of your administration system or the firewall it's behind. Set $openvpnserver to the hostname of your administration system (which can have the same values as it won't be connecting to itself).

Basic usage starts with netbooting the client system. Some have to be set to netboot in the BIOS and some have a hotkey you can press at POST to access a boot selection menu. The system then obtains an address and BOOTP information from the DHCP server. It then loads pxelinux.0 from the TFTP server which will in turn load vesamenu.c32 which displays the "Netboot main menu". Select Ubuntu from the list and look for the Ubuntu 10.04 Minimal CD KS entries. Select the one for your architecture and press the Tab key to edit the kernel boot line. Set any kernel parameters you want to be added to the default Grub2 configuration after the double dash (--), like "nomodeset". Set the hostname and domain values for the target as these are used in several places for bug workarounds and configurations. Then press Enter. The installer should boot. If nothing happens when you press Enter and you are returned to the Ubuntu boot listing menu, verify the ISOs are mounted on the server then try again (you will need to edit the entry again).

If there are no problems then you will be asked only two questions. The first is drive partitioning. This can be automated but my client systems are too different to do so. The next question will be the administrator password. After that it will execute the post-install script and late-command scripts then prompt you to reboot. Just hit the enter key when it does as Ctrl-Alt-Delete will prevent the installer from properly finishing the installation (it's not quite done when it says it is). Full installation will take 2-3 hours depending on debug settings, availability of local debs, and Internet speeds.

In case of problems see the TODO document which has a troubleshooting section. The only problems I've had installing were missing drivers or bugs in the installer (especially with encrypted drives - see the TODO). My Dell Inspiron 11z, which has an Atheros AR8132/L1c Ethernet device, wasn't supported by the kernel the minimal CD was using. To work around it I made a second network connection with an old Linksys USB100TX. The Atheros did the netboot (the Linksys does not have the capability) but the installer only saw the Linksys afterwards and had no problems using it (other than it being slow).

I welcome comments and suggestions (other than my package choices and blog color scheme :D).

20111125

Haphazard proxy support in Linux programs

Some of my clients require Internet content filtering on computers their kids are using. The solution to that is DansGuardian. While it has many problems there really isn't a better F/OSS alternative. Its development has been stagnant for years but recently a new maintainer joined the project so submitted patches are being applied to fix bugs and add features (like system group integration).

DansGuardian requires a proxy. The common options are TinyProxy and Squid. TinyProxy has a few annoying bugs so I use Squid with my clients. One challenge with content filtering is preventing the proxy from being bypassed. The two solutions are transparent interception or an explicit-proxy with dropping of connections that aren't destined for the proxy ports.

With a transparent proxy all outgoing connections are routed via iptables rules to DansGuardian regardless of the client settings. While this simplifies deployment by eliminating client configuration it also prevents using different content filtering levels on a per-user basis as it masks the source port of the connection. Without the source port the associated user can't be identified. Since the systems I maintain have a variety of users within the same household and thus different filtering requirements, this doesn't meet their needs.

The alternative method is to use iptables rules that drop connections that aren't destined for DansGuardian. Here are the nat rules that I use:

*nat
:PREROUTING ACCEPT
:POSTROUTING ACCEPT
:OUTPUT ACCEPT
-A OUTPUT ! -o lo -p tcp -m owner ! --uid-owner proxy -m owner ! --uid-owner root -m owner ! --uid-owner clamav -m owner ! --uid-owner administrator -m tcp --dport 80 -j REDIRECT --to-ports 8090
-A OUTPUT ! -o lo -p tcp -m owner ! --uid-owner proxy -m owner ! --uid-owner root -m owner ! --uid-owner clamav -m owner ! --uid-owner administrator -m tcp --dport 443 -j REDIRECT --to-ports 8090
-A OUTPUT ! -o lo -p tcp -m owner ! --uid-owner proxy -m owner ! --uid-owner root -m owner ! --uid-owner clamav -m owner ! --uid-owner administrator -m tcp --dport 21 -j REDIRECT --to-ports 8090
-A OUTPUT -p tcp -m tcp --dport 3128 -m owner ! --uid-owner dansguardian -m owner ! --uid-owner root -m owner ! --uid-owner clamav -m owner ! --uid-owner administrator -j REDIRECT --to-ports 8080
COMMIT

Fairly simple but note that I'm not dropping the packets. Any TCP connection that is destined for ports 80 (HTTP), 443 (HTTPS), or 21 (FTP) is rerouted to port 8090. Some accounts are excluded to prevent false-positive blocking by DansGuardian.

DansGuardian is using port 8080 (and connects to Squid on 3128). So what is 8090? It's an Apache server. One of the problems with programs that aren't configured to use the proxy is that the users won't know why their connections are failing. The web site, known as a network billboard, displays a page that informs them that their programs need to be configured to use the proxy and how to do it. This is much friendlier than just dropping the packets. DansGuardian uses ident2 to identify the user that is the source of the connection and applies the filtering rules specific to the filter group they are assigned to.

This configuration works very well with web browsers. Most use the system proxy settings through gconf on Gnome. Some need manual configuration so I created default configuration files and put them in /etc/skel so that new user accounts have them at creation. Unfortunately, many other programs rely on environment variables to determine the proxy address and Ubuntu's proxy configuration tool (gnome-network-properties) has a really stupid bug and they aren't set correctly. Some are set in bash in terminal windows but not in the session so any graphical program that doesn't use gconf fails to access the proxy correctly. It's easy to demonstrate. Open a terminal window and enter:

tail -f ~/.xsession-errors

Then create a custom application launcher in the panel and enter "printenv" for the command. Then just click it and check the output from tail. On my system, variables for "HTTP_PROXY" and the like aren't present. I created a fix for this. Just extract the file and add it to the end of ~/.profile and relogin. Run the tail/printenv commands again with a proxy set in System>Preferences>Network Proxy. Add this fix to /etc/skel/.profile to use it as the default for new user accounts.
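For readers who want to see the general shape of such a fix, here is a minimal sketch. It is not the downloadable fix mentioned above: the function names are hypothetical, it assumes the GNOME 2 gconf keys /system/proxy/mode, /system/http_proxy/host, and /system/http_proxy/port, and it handles only a manually configured HTTP proxy without authentication.

```shell
# Hypothetical sketch for ~/.profile, not the fix referenced in the post.

# Read one gconf key; kept as a separate function so it is easy to replace.
gconf_get() {
    gconftool-2 --get "$1" 2>/dev/null
}

# Export http_proxy/HTTP_PROXY for the whole session when GNOME is set to
# use a manual proxy. Does nothing in any other case.
export_proxy_from_gconf() {
    mode=$(gconf_get /system/proxy/mode) || return 0
    [ "$mode" = "manual" ] || return 0
    host=$(gconf_get /system/http_proxy/host) || return 0
    port=$(gconf_get /system/http_proxy/port) || return 0
    [ -n "$host" ] && [ -n "$port" ] || return 0
    http_proxy="http://$host:$port/"
    HTTP_PROXY=$http_proxy
    export http_proxy HTTP_PROXY
}
```

In ~/.profile the definitions would be followed by a single call to export_proxy_from_gconf. HTTPS and FTP would need the same treatment with their own keys and variables.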

Even with this fix it is surprising how many Internet-using programs don't support proxies correctly. I tested every streaming media player I could find and a few other programs and here are the results with my systems (Ubuntu 10.04 Lucid Lynx i386 and amd64):

Clementine (0.7.1): Neither Last.fm nor SomaFM works. Jamendo lists songs but doesn't play them, but this is due to Ogg problems at Jamendo. Unlike other players, Clementine's plug-in for Jamendo is not configurable for MP3 so I couldn't work around it. Magnatune and Icecast work.

Rhythmbox (0.13.1): Jamendo failed to work. Magnatune was really slow to load.

Miro (4.0.3-82931155): Could find video podcasts but not download them (except VODO which uses BitTorrent). Its integrated web browser would always show the network bulletin for any other link in the side panel.

Banshee (2.0.1): Internet Archive links work. Live365.com and xiph.org show results but nothing plays (I can copy the xiph links to VLC and they play). Miro Guide works (unlike Miro) but likes to freeze. Amazon MP3 Store, Jamendo, Magnatune (both extensions), RealRadios.com, and SHOUTcast.com extensions fail to load. Last.fm would log in but not much else. I noticed that according to ~/.xsession-errors Banshee is an exceptional media player.

Gnome MPlayer (0.9.9.2): Nothing fancy but it functioned with the streams I tried.

VLC (1.0.6): About the same as Gnome MPlayer. A lot of complaints about some playlists like radio.wazee when it encounters unavailable entries. Needs a less ugly way to handle error messages with playlists of Internet streams since they are usually just alternate servers.

Google Earth (6.1): It would connect to the DB and you could navigate the worlds but none of the Panoramio pictures would show. Wikipedia entries wouldn't show after being enabled until the app was restarted. Even then, clicking on "Full Article" resulted in the network bulletin page being shown (webkit?). Changing the preferences to use an external browser is an adequate workaround.

Totem (2.30.2): Functioned but was picky about some streams (radio.wazee).

gPodder (2.2): Useless.

Hulu beta functions but is mostly relying on Flash.

Skype beta (2.2.0.35): Connected to their network without problems and I successfully called their sound testing service.

Sun Java Plug-in (1.6.0_26 in Firefox 3.6.24): Useless with a proxy. Even without a proxy you have to work around IPv6 bugs (Debian bug #618725). With that working the online test usually fails and I've found that Pogo.com Boggle Bash is a better test. Manually setting the proxy with jcontrol doesn't have any effect. Debian is dropping the plug-in so it may not matter.

FrostWire (5.1.5): Useless with a proxy. It uses Java so not surprising. It has its own proxy settings but it couldn't connect to anything even with manual settings.

Update - Added a few more tests:

Desura (110.22): Could login and see items I had ordered (free demos) but could not download them for installation or show any web pages. Some of the links on the menu bar opened in Firefox but showed the network bulletin. Apparently it was resolving the links (maybe querying their servers) to localhost:8090 and then sending that to the default browser even though Firefox could access the Internet through the proxy without problems.

Konqueror (4.4.5): No problems (KHTML).

Epiphany (2.30.2): No problems (webkit).

X-Moto (0.5.9): No problems. Can use environment variables, manually-specified proxy, or SOCKS proxy.

DraftSight (Beta V1R1.3): Couldn't connect to the registration server initially. The browser in the Home panel showed the network bulletin. Setting the proxy manually in "Tools>Options>System Options>General>Proxy server settings" and restarting allowed the registration to function but not the Home panel browser. I found that reapplying the proxy settings (without changing anything) then right-clicking the Home panel and reloading it fixed the problem for that session but it would reoccur if DraftSight was restarted.

Clarification: My proxy configuration doesn't use authentication or SOCKS. My bug work-around script supports the environment variables for authentication but I didn't test it.

Update 20111202: I removed Sun Java because of the security problems and switched to OpenJDK/IcedTea6 (1.9.10) but it didn't do any better. I did try FrostWire again with a manually specified proxy but it had no effect. I did come across an interesting Java library for proxy detection named proxy-vole but it won't solve my immediate problem.

Update 20111204: Corrected the DansGuardian/Squid port usages mentioned in the article and added a forgotten DansGuardian anti-bypass iptables rule. They now match my test environment.

I think part of the problem is that developers test against a proxy and, if the program works, it's assumed to be proxy-compatible. That can be misleading, especially when multiple components are involved, as some may use the proxy while others access the network directly (Miro being a prime example). Adding some iptables rules to drop anything bypassing the proxy would close that testing hole.
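Here is a minimal sketch of what such test rules could look like. It assumes the proxy runs on the test machine itself under a dedicated account; the "proxy" user name and the port list are assumptions for illustration, not my actual configuration. By default it only prints the iptables commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: let only the proxy account make direct web connections.
# Assumptions: the proxy runs locally as user "proxy" and only
# ports 80 and 443 matter for the programs being tested.
PROXY_USER="${PROXY_USER:-proxy}"
WEB_PORTS="${WEB_PORTS:-80 443}"

proxy_only_rules() {
    for port in $WEB_PORTS; do
        # Allow the proxy's own outbound requests.
        echo "iptables -A OUTPUT -p tcp --dport $port -m owner --uid-owner $PROXY_USER -j ACCEPT"
        # Reject everything else so a bypassing program fails at once.
        echo "iptables -A OUTPUT -p tcp --dport $port -j REJECT --reject-with tcp-reset"
    done
}

# Print the rules for review; nothing is applied here.
proxy_only_rules
```

To apply the rules, pipe the output to a root shell (for example "| sudo sh"). I used REJECT rather than DROP in the sketch so a program that bypasses the proxy fails immediately instead of timing out; swap in DROP if you prefer.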

20111123

Documentation standards for commands

Here are some references for shell script developers, man page creators, README writers, etc. While documentation styles are a bit haphazard and vary with OS and programming language, there are some standards.

For man pages see man-pages(7). What does that mean? You open a terminal window then type:

man 7 man-pages

The GNU project has some guidelines on writing software manuals. They recommend using Texinfo to create them.

The Debian Policy Manual says where the different documentation files should be located but not what they should look like.

The most detailed standard I've found is the Open Group Base Specifications utility conventions and typographical conventions.

I'm not going to admit to following these but please post any other IT technical writing style guides you know of. :D

20110921

Extracting EML files

EML files are a problem for some of my users on Ubuntu. They receive these as email attachments but can only view them as text (usually in gedit) even if they contain pictures. The senders are probably using Outlook Express or a related mail application to attach them. While some non-Microsoft mail clients can open them properly, this is a hassle for my users as they all use web mail. There is a command-line tool, munpack (part of the mpack package in Ubuntu/Debian), that will extract non-text objects automatically. To make it easier for them I wrote a little script that integrates munpack with their file manager via a MIME type association. To use it, download munpack_eml and extract the files. Put munpack_eml in /usr/local/bin with root ownership and u=rwx,go=rx (0755) permissions. Put munpack_eml.desktop in /usr/local/share/applications with root ownership and u=rw,go=r (0644) permissions. Then right-click on any *.eml file in your file manager and you should see an option to extract the contents with munpack.
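If you would rather roll your own, a wrapper along these lines shows the general idea. This is a simplified sketch, not the munpack_eml script itself; the function names and the "_extracted" directory suffix are made up for the example.

```shell
#!/bin/sh
# Sketch of a file-manager helper around munpack (mpack package).
# Not the actual munpack_eml script; names here are hypothetical.

# Derive the output directory: /a/b/mail.eml -> /a/b/mail_extracted
eml_target_dir() {
    printf '%s_extracted\n' "${1%.eml}"
}

extract_eml() {
    eml="$1"
    # munpack -C changes directory before reading, so use an absolute path.
    case "$eml" in
        /*) ;;
        *) eml="$PWD/$eml" ;;
    esac
    dir="$(eml_target_dir "$eml")"
    mkdir -p "$dir" || return 1
    # -C: write the extracted parts into $dir; -t: also unpack text parts.
    munpack -t -C "$dir" "$eml"
}

# Process every file handed over by the file manager.
for f in "$@"; do
    extract_eml "$f"
done
```

Each message gets its own directory next to the original file, which keeps attachments from different messages from overwriting each other.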

20110829

Simple off-site backup of a MD RAID 1 system

Standard backup tools like BackupPC are great for backing-up moderate amounts of user data but they can be impractical with huge data stores such as multi-terabyte RAID arrays as they need a backup store that is larger than the source data. My simple solution is to clone the array with another drive and store it off-site.

For this to work I had to categorize the data between smaller dynamic files (like documents) and larger static files (videos). The smaller files are backed up daily with BackupPC. The larger files are not backed up. Both are stored on a RAID 1 (mirror) array for redundancy in case of drive failure. On my server BackupPC uses a different, smaller RAID 1 array for a backup store. Since it is only backing up part of the data it doesn't have to be the same size as the main array. For backing up the larger/static files (and everything else) I simply add another drive to the main array, let it sync, then remove it and store off-site.

Ideally this system would use hot-swap but I don't have removable bays so I have to power-off the server each time. The rest of the procedure is relatively easy. With a RAID 1 array I have two drives (sda, sdb) and the added drive may show up as sdc. I say "may" because Ubuntu uses UUIDs for drive mappings and the actual device assignments may change. I always check with:

cat /proc/mdstat

to verify what devices are being used. I also check the partition sizes of all drives using "fdisk -l" and make sure the new drive has the same size partitions as the original RAID members. The partitions need to be of type fd "Linux raid autodetect" but no formatting with mkfs is necessary. Next I grow each RAID 1 MD device from 2 to 3 devices. For example:

mdadm -G -n 3 /dev/md0

This just tells the kernel that the array will now have three devices but does not assign another device to it. To allocate the device:

mdadm -a /dev/md0 /dev/sdc1

Resync should begin immediately. To monitor, I just use "cat /proc/mdstat" but the kernel will also send status messages to the console. After resynching, I disable the backup device by failing it:

mdadm -f /dev/md0 /dev/sdc1

This causes RAID degradation warnings to be emailed to root. Next I remove it:

mdadm -r /dev/md0 /dev/sdc1

Finally, I shrink the array back to two devices:

mdadm -G -n 2 /dev/md0

This works well for my simple server setup, and some scripting could obviously automate it. However, it doesn't scale well to arrays with more drives or to other RAID types.
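As a starting point for that scripting, here is a sketch of the whole cycle. It is untested against a real array, the function names are made up, and it defaults to a dry run that only prints the mdadm commands. It uses mdadm's long options, which are equivalent to the short ones above, plus --wait to block until the resync finishes.

```shell
#!/bin/sh
# Sketch of the clone cycle described above. DRY_RUN=1 (the default)
# only prints the commands; set DRY_RUN=0 and run as root to execute.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# clone_cycle MD_DEVICE BACKUP_PARTITION
# e.g. clone_cycle /dev/md0 /dev/sdc1
clone_cycle() {
    md="$1"
    part="$2"
    run mdadm --grow --raid-devices=3 "$md" || return 1
    run mdadm --add "$md" "$part" || return 1
    # --wait blocks until the resync is done. It reports failure when
    # there was nothing to wait for, so its status is ignored here; a
    # real script should confirm the state in /proc/mdstat before going on.
    run mdadm --wait "$md" || true
    run mdadm --fail "$md" "$part" || return 1
    run mdadm --remove "$md" "$part" || return 1
    run mdadm --grow --raid-devices=2 "$md"
}

if [ $# -eq 2 ]; then
    clone_cycle "$1" "$2"
fi
```

Run it once per MD device if the drives carry more than one array, and check the dry-run output against /proc/mdstat before letting it loose with DRY_RUN=0.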

20110117

Expanding Ubuntu Recovery Mode

Recovery Mode is a text-based interface to a few quick repair tools that is installed by default with most Ubuntu releases and derivatives. I wrote a few add-ons for it that increase its usefulness in remote repair and diagnostics situations. These were developed and tested on Ubuntu 10.04 (Lucid Lynx).

Starting Ubuntu in Recovery Mode (a.k.a. Friendly Recovery) is relatively easy. Hold down the Shift key after the BIOS POST to get Grub2 to show its menu, then select the kernel entry with the "recovery" option. Also note the memtest86+ option, which is useful for identifying bad RAM.

Adding on to Recovery Mode is relatively simple. At its heart is a shell script, "/usr/share/recovery-mode/recovery-menu", that is started at the end of the single mode (runlevel S) boot. It looks through the options subdirectory and starts every script it finds, passing it a parameter of "test". It looks for a return status of 0 and the description of the script on stdout. Scripts with valid responses are collected and shown in a menu built with the whiptail dialog utility. The user selects one from the menu to execute it.
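A bare-bones option script following that protocol might look like the sketch below. The "diskfree" idea and the function name are made up for illustration; it is not one of my add-ons.

```shell
#!/bin/sh
# Hypothetical Recovery Mode option script ("diskfree" is a made-up name).
# It would go in the options subdirectory with execute permission.

option_main() {
    if [ "$1" = "test" ]; then
        # Stay out of the menu if the required tool is missing.
        command -v df >/dev/null 2>&1 || return 1
        # The menu description goes to stdout, with a status of 0.
        echo "Show free disk space"
        return 0
    fi
    # Normal invocation: the user picked this entry from the menu.
    df -h
}

option_main "$@"
```

Called with "test" it only prints its description and returns; called any other way it does the actual work.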

My additions are more informative than corrective. The intention is to help with diagnostics when dealing with a remote non-technical client. They are also useful for beginners who lack command-line experience and simply don't know where to look for system status information.

Many of my scripts check their respective system configuration and return a non-zero status if required executables are not installed or configured. This keeps the menu from getting cluttered. For example, the sensors script checks for output from the sensors command. A lack of output indicates that the hardware sensors haven't been configured with sensors-detect or the required modules haven't been added to /etc/modules. When this happens it does an exit 1 when started with the test parameter. The ddclient script looks for run_daemon="true" in /etc/default/ddclient and the presence of the ddclient executable. The ssh script looks for the sshd process and its description changes depending on whether it is found. If you write your own, the only limitation to keep in mind is that the description returned should be 45 characters or fewer as longer ones will corrupt the whiptail display.
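When writing your own scripts it helps to check them before rebooting into Recovery Mode. The following sketch calls every script in a directory with "test" and reports which ones would appear in the menu and whether any description is too long. It is not the real recovery-menu code, and the function name and report wording are made up.

```shell
#!/bin/sh
# Sketch: run each script in a directory with "test" and report what a
# menu built from them would contain. Not the real recovery-menu code.
MAX_DESC=45

check_options() {
    dir="$1"
    for script in "$dir"/*; do
        [ -f "$script" ] || continue
        name="$(basename "$script")"
        if [ ! -x "$script" ]; then
            echo "$name: SKIPPED (not executable)"
            continue
        fi
        # A zero status plus a description on stdout means "show me".
        if desc="$("$script" test 2>/dev/null)"; then
            if [ "${#desc}" -gt "$MAX_DESC" ]; then
                echo "$name: TOO LONG (${#desc} chars)"
            else
                echo "$name: OK ($desc)"
            fi
        else
            echo "$name: HIDDEN (non-zero status)"
        fi
    done
}

if [ $# -eq 1 ]; then
    check_options "$1"
fi
```

Point it at a scratch directory holding your scripts, or at the options subdirectory itself, since the scripts are only called with the "test" parameter.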

Some of the scripts deserve special attention:

shallablud: works with my shall-bl-update v1.3 or later. It forces an update to the Shalla blacklists for DansGuardian.

lynx: requires the Lynx text browser. It does a su to the default admin member (the first one listed in the admin group) before starting. It defaults to the DynDNS.com check IP page. I used Lynx because it has options for lockdown (prevent shell escapes, etc.) that the others don't offer.

wicd: requires wicd-curses. While the netroot script already provides network activation before switching to a shell, it just starts dhclient to get an IP address and nothing else. This was something I requested back in Hardy. It's better than nothing but is rather useless if you only have a wireless connection. Wicd solves the problem but creates another - it conflicts with Network Manager. Luckily the packages themselves don't conflict on Lucid but the daemons do. The script will stop Network Manager before starting wicd-curses (which starts the wicd daemon). To keep this from happening when starting wicd from a root shell you need to either stop Network Manager first or modify the Upstart job configuration to keep it from starting in recovery mode (runlevel S). The conf file also needs to be diverted by dpkg to keep it from being overwritten on updates (and reverting the changes). The commands to do this are:

dpkg-divert --rename --divert /etc/init/network-manager.conf.original /etc/init/network-manager.conf
cp /etc/init/network-manager.conf.original /etc/init/network-manager.conf
sed -i 's/\(.*and started dbus\)\().*\)/\1\n\t and runlevel [!S]\2/' /etc/init/network-manager.conf

You need to either add a sudo in front of these or open a root terminal with "sudo su". The divert tells dpkg to rename the file and always redirect new installations to "network-manager.conf.original". The file is then copied back to use as a template. The sed expression then adds a condition to not start in runlevel S.
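To see what the sed step does without touching the live file, it can be run against a scratch copy. The stanza below is an assumed excerpt of the Lucid job file, used only for illustration, and the sed expression is the one above with its two capture groups written out. It needs GNU sed for the \n and \t in the replacement.

```shell
#!/bin/sh
# Preview the Upstart edit on a scratch copy instead of the live file.
# The "start on" stanza is an assumed excerpt of network-manager.conf.
scratch="$(mktemp)"
printf 'start on (local-filesystems\n\t  and started dbus)\nstop on stopping dbus\n' > "$scratch"

# Group 1 keeps everything up to "and started dbus"; group 2 keeps the
# closing parenthesis, which ends up after the new runlevel condition.
sed -i 's/\(.*and started dbus\)\().*\)/\1\n\t and runlevel [!S]\2/' "$scratch"

cat "$scratch"
```

The closing parenthesis of the "start on" condition should move to a new line that reads "and runlevel [!S])", leaving the rest of the file alone.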

This only solves half of the problem. The Wicd daemon still needs to be prevented from starting during regular operation (runlevel 2) unless you plan to use it instead of Network Manager. Wicd's configuration hasn't been changed to Upstart yet so it's still using init scripts. To disable it do:

mv /etc/rc2.d/S20wicd /etc/rc2.d/K80wicd

This by itself is not enough. If wicd-gtk is installed, it will start when the desktop loads and start the daemon if it is not active. You need to purge it with aptitude or apt-get. In addition, another function somewhere will also start the wicd daemon. The only option I've found is to change the wicd executable, which is just a script that starts the daemon with Python, to not function unless the runlevel is single mode. These commands will make the change:

dpkg-divert --rename --divert /usr/sbin/wicd.original /usr/sbin/wicd
cp /usr/sbin/wicd.original /usr/sbin/wicd
sed -i 's/\([[:space:]]*exec[[:space:]]\+.*\)/[ \"$RUNLEVEL\" = \"S\" ] \&\& \1/' /usr/sbin/wicd

If you make this change you won't have to disable the init script. You will also have to fix the AppArmor profile for dhclient so that wicd can use it (bug #588635). Just add the text in the report before the entry for Network Manager.
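The effect of the sed edit on the wicd launcher can be previewed the same way on a scratch file. The launcher contents below are an assumption for illustration only (check your own /usr/sbin/wicd), and GNU sed is assumed.

```shell
#!/bin/sh
# Preview the wicd launcher edit on a scratch copy.
# The launcher contents are an assumption, not copied from the package.
scratch="$(mktemp)"
printf '#!/bin/sh\nexec python -O /usr/share/wicd/daemon/wicd-daemon.py $@\n' > "$scratch"

# Prefix the exec line with a runlevel test so the daemon only starts
# when RUNLEVEL is S (single mode).
sed -i 's/\([[:space:]]*exec[[:space:]]\+.*\)/[ \"$RUNLEVEL\" = \"S\" ] \&\& \1/' "$scratch"

cat "$scratch"
```

The exec line should come out prefixed with a test of $RUNLEVEL, so outside of runlevel S the script simply falls through without starting the daemon.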

One option that isn't listed in the menu is "fsck". This is easy to fix as the script just needs execute permission (bug #566200).

Currently the "resume" option doesn't function (bug #651782).

If you want to prevent the "root" and "netroot" options from providing an uncontested root prompt try my rootlock.

Consider a theoretical example of how this all works with a remote user. They have a problem with X not starting and contact you. They are a considerable distance away and don't have time to ship their PC to you for repair. The system is bootable and they have high-speed Internet so remote access is possible. You tell them how to enter Recovery Mode and how to start wicd. It automatically gets an IP from a wired connection but if they are using wireless they have to select an AP from whatever wicd finds. If they are using Network Manager and their normal wireless connection is encrypted, you will have to set it up beforehand with wicd as SSIDs and keys aren't shared with Network Manager (or the root account which is the one being used here). If they have a dynamic WAN IP address then you have them start ddclient (which also needs to have been configured) or start Lynx and read to you the WAN IP from DynDNS.com. Then they can start sshd. At this point you should be able to access it remotely over SSH assuming that any intervening firewall/NAT routers are forwarding the correct ports. Obviously you should be using key-based authentication with SSH, not passwords. If you can't access it remotely you can still have them perform updates with the dpkg option (also an upgrade), fix the X configuration with failsafeX, or read you the root mail, SMART drive status, and sensor readings (if configured).
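Before relying on this, it is worth confirming ahead of time that the client's sshd really refuses password logins. A rough check is sketched below. It only looks at the first uncommented, whitespace-separated PasswordAuthentication line in the file and ignores Match blocks and built-in defaults; the function name is made up.

```shell
#!/bin/sh
# Sketch: check an sshd_config-style file for disabled password logins.
# Only the first uncommented PasswordAuthentication line is considered;
# Match blocks and compiled-in defaults are ignored.
password_auth_disabled() {
    conf="${1:-/etc/ssh/sshd_config}"
    value="$(awk 'tolower($1) == "passwordauthentication" { print tolower($2); exit }' "$conf" 2>/dev/null)"
    [ "$value" = "no" ]
}

if password_auth_disabled "$@"; then
    echo "Password logins are disabled"
else
    echo "Password logins may still be enabled"
fi
```

Run it with no arguments to check /etc/ssh/sshd_config, or pass another file to check a copy.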

Obviously many problems can't be fixed this way but if it saves you a road trip or two it's worth it.

Update: I filed bug #706145 to get these into Ubuntu. Following the normal submit/reject/resubmit/ignore cycle it should be in the repositories within a few years.
