Webmastering

Websites have been a hobby of mine since I was about 15, and over time I've started to do some fairly advanced things to take care of simple, usually annoying, problems I encounter. This section is intended to serve as a reminder to myself of those tricks, and may possibly be of use to others as well.

Monitoring logs during development

It's really useful to be able to watch the webserver's error log while you are developing a site; it's even more useful to be alerted whenever a new error is logged.

tail -f error.log | grep -v --line-buffered favicon.ico | sed 's/$/\x07/'

First, open the log and follow appends to it, then filter this through grep to remove any lines with “favicon.ico” in them, then append a bell character to the end of each line to make the console beep/alert us on each new line.

:!: NB: It is important to use the “--line-buffered” switch with grep, or the next command in the pipe will receive no data (or at least receive it in uselessly large chunks, by which time it is stale).
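The same filter-and-beep idea can be sketched in Python, which is handy if your sed doesn't understand “\x07”. This is only an illustration of the pipeline above: the names alert_lines and main, and the file name logbell.py, are made up for the example.

```python
import sys

BELL = "\x07"  # ASCII BEL, the same character the sed expression appends


def alert_lines(lines, ignore="favicon.ico"):
    """Yield every line that does not contain `ignore`, with a bell appended."""
    for line in lines:
        if ignore in line:
            continue  # same job as: grep -v favicon.ico
        yield line.rstrip("\n") + BELL + "\n"


def main(stream_in=sys.stdin, stream_out=sys.stdout):
    """Filter stream_in to stream_out, flushing after every line."""
    # readline() hands over each line as soon as it arrives
    for out in alert_lines(iter(stream_in.readline, "")):
        stream_out.write(out)
        stream_out.flush()  # plays the role of grep's --line-buffered
```

To use it as a filter, add a call to main() at the bottom of the file and run something like: tail -f error.log | python3 logbell.py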

"Content-Location:" being a nuisance

This one is a weird problem I came across when trying to add some fancy URL rewriting to a site. The plan was to have flat links such as “/properties/Tala Hill” remap to either “/properties/Tala Hill.php” if it exists, or to “/search.php?cat=properties&q=Tala Hill”. However, I couldn't get to step one: for some reason “/properties/blahblah” was always being mapped to “/properties.php”, even when URL rewriting was off!

The culprit? mod_negotiation. I noticed that the HTTP headers from the server would have the following unusual entries when I requested “/properties/Tala Hill”:

Content-Location: properties.php
Vary: negotiate
TCN: choice

and this would only happen if adding “.php” to part of the requested path (here “/properties”) pointed to an actual file.

Commenting out mod_negotiation and its settings (LanguagePriority and ForceLanguagePriority) from my Apache configuration fixed this, and the request now results in 404 Not Found as it should. So, back to playing with the Rewrite Engine.
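If you want to check for this symptom from a script rather than by eye, something along these lines would do. It is a rough sketch: negotiation_signs is a name invented here, and it simply looks for the three headers listed above in whatever header mapping you pass it.

```python
def negotiation_signs(headers):
    """Return the negotiation-related headers found in a response.

    `headers` is any mapping of header name to value. An empty result
    means none of the tell-tale headers were present.
    """
    # Header names are case-insensitive, so compare in lower case
    lowered = {name.lower(): value for name, value in headers.items()}
    found = {}
    for name in ("content-location", "tcn", "vary"):
        value = lowered.get(name)
        if value is None:
            continue
        # "Vary" is common for other reasons; only "negotiate" is suspicious
        if name == "vary" and "negotiate" not in value.lower():
            continue
        found[name] = value
    return found
```

Feeding it the headers from the example above returns all three entries; an ordinary response returns an empty dictionary.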

Content-Disposition

The upshot of the problems I encountered in the previous section is that I discovered the Content-Disposition header, which can be used to turn a normal page into a file download, i.e. you can serve an HTML / text page (which would normally be displayed by the browser) and have the user prompted to “Save as…” instead.

See http://www.faqs.org/rfcs/rfc2183.html for details.
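As a rough illustration, here is a tiny Python server that sends an HTML page with an “attachment” disposition (the disposition type defined in the RFC above). The names download_headers, DownloadHandler and serve, the port, and the file name page.html are all placeholders for this sketch, and it assumes a simple ASCII file name with no quotes in it.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def download_headers(filename, content_type="text/html"):
    """Headers that ask the browser to save the response as `filename`."""
    return [
        ("Content-Type", content_type),
        ("Content-Disposition", 'attachment; filename="%s"' % filename),
    ]


class DownloadHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<html><body>Saved, not displayed.</body></html>"
        self.send_response(200)
        for name, value in download_headers("page.html"):
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve on localhost until interrupted (port is an arbitrary choice)."""
    HTTPServer(("127.0.0.1", port), DownloadHandler).serve_forever()
```

Calling serve() and visiting http://127.0.0.1:8000/ should make the browser offer to save page.html rather than render it.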

Upgrading & Testing a live server

When developing a website, it is common to have two separate servers: development and live. However, problems arise when you are upgrading an existing site to a new, incompatible one. I use the trick below to overcome this.

The plan

We're going to change our browser to report a different name (technically “User Agent”) to any website it connects to, and make sure the server displays a “Down for Maintenance” banner for any request which does not report to be our special browser.

For the following example I have a plain HTML file called DownTime.html sitting in the site's root directory, and will be setting my browser to report that it is “RobM” instead of its usual one 1).

Configuring the server

First we'll set the server to redirect everyone to a notice page about the maintenance. We'll take care of letting ourselves in once we know the public are seeing this notice.

:!: Note: You need to have mod_rewrite and some level of “AllowOverride” enabled in your Apache configuration. If in doubt, try (on your development server) setting “AllowOverride All” for your site to allow .htaccess files to set all possible directives.

Add the following lines near the top of your site's /.htaccess on the live server:

## Enable the URL rewriting engine
RewriteEngine on
Options +FollowSymlinks
 
## Send non-developers to a "site down for maintenance" page
RewriteCond %{HTTP_USER_AGENT} !^RobM.*$
RewriteRule .* DownTime.html   [L]
 
## Normal .htaccess content below this line
...

Notes

  • You need +FollowSymlinks in order for mod_rewrite to work in a per-directory (.htaccess) context.
  • %{} refers to a server variable. This is akin to $_SERVER in PHP.
  • HTTP_USER_AGENT will contain whatever was sent in the User-Agent: XXXX header of the request, if your browser sent one.
  • !^$ is a negated regular expression.
    • ! → “If not matched”
    • ^ → Very beginning of variable (without it, a regular expression may match anywhere in the variable)
    • $ → Very end of variable
  • So !^RobM.*$ will be satisfied for any user-agent which does not begin with “RobM” (case-sensitive). The trailing .* allows anything to follow “RobM”; use !^RobM$ if you want to require exactly “RobM”.
  • RewriteRule will replace the requested path with another one whenever its pattern matches
    • .* → Match any request at all (it's a regular expression which matches everything)
    • DownTime.html → means the request now becomes /DownTime.html, even if the original request was for /subdir/subdir2/page.ext?param=value&param2=value
    • [L] → “Last” rule to consider. Normally mod_rewrite would continue checking for applicable rules right up to the end of the file, which'd mean if you had, say, a rule to convert “*.html → *.htm” that DownTime.html would become DownTime.htm and ultimately break.
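To make the logic of the condition and rule concrete, here is the same decision written as a small Python function. It is only a model of the two directives above, not of mod_rewrite in general, and the function name rewrite is made up for the example.

```python
import re

# Same pattern as the RewriteCond; the leading "!" is handled by "not" below
DEVELOPER_UA = re.compile(r"^RobM.*$")


def rewrite(path, user_agent):
    """Return the path that would be served for this request."""
    # RewriteCond %{HTTP_USER_AGENT} !^RobM.*$  -> true when there is NO match
    if not DEVELOPER_UA.search(user_agent or ""):
        # RewriteRule .* DownTime.html [L] -> the whole path is replaced
        return "DownTime.html"
    # Condition failed, so the rule is skipped and the request is untouched
    return path
```

So rewrite("subdir/page.ext", "Mozilla/5.0") gives “DownTime.html”, while the same call with a user-agent of “RobM” gives back “subdir/page.ext”.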

Getting your browser ready

I hear the Opera browser has built-in support for changing the User Agent, so you can just use that. In Firefox you'll want the User Agent Switcher extension. I don't care about Internet Explorer, so if you use that you're on your own 2).

So I've now changed my User Agent to “RobM” and set it as current.

==

So with some luck, the general public will be seeing DownTime.html for any request they make, and you'll be seeing the site as “normal”. To allow the general public back in, simply comment out the two rules (and possibly the lines that enable mod_rewrite if you don't need them). You'll probably want to revert your User-Agent too, or some sites will give you very basic pages 3).
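If you'd rather check both views from a script than keep switching your browser's User Agent, a sketch like the following should work. The function names and the example.com URL are placeholders; substitute your live site.

```python
import urllib.request


def build_request(url, user_agent=None):
    """Build a GET request, optionally claiming to be `user_agent`."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def fetch_start(url, user_agent=None, size=200):
    """Return the first `size` bytes of the page as seen by this user-agent."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=10) as resp:
        return resp.read(size)


# Example (placeholder URL):
#   fetch_start("http://example.com/")          -> should be DownTime.html
#   fetch_start("http://example.com/", "RobM")  -> should be the real site
# Without an explicit user-agent urllib sends its own "Python-urllib/x.y",
# which does not start with "RobM" and so counts as the general public.
```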

Migrating your site to a wiki

Just a quick aside related to the above: when you move your content into a wiki, everyone's old bookmarks will break, search engine indexes will become out of date, and possibly other nasty things will happen too. I found that using Apache's RewriteEngine was a very neat way to overcome this without having lots of the old file-system structure lying around with minimal redirect pages in it:

RewriteRule ^Events\.html$                                  menu/eventsi
RewriteRule ^Events_Schedule\.php$                          events/schedule
RewriteRule ^Events_Schedule_Past\.php$                     events/schedule_past
RewriteRule ^Events_Schedule_Static\.php$                   events/schedule_static
RewriteRule ^Events_SocRunnings\.php$                       events/general

RewriteRule ^ArtMedia\.html$                                menu/art_media
RewriteRule ^ImproManga\.php$                               art_media/impromanga
...
 
## DokuWiki use_rewrite handler
RewriteCond %{REQUEST_FILENAME}                             !-f
RewriteCond %{REQUEST_FILENAME}                             !-d
RewriteRule (.*)                                            wiki/$1  [QSA,L]

What this does is compare the left-hand side regular expression against the requested URL path and, whenever it matches, replace the path with the right-hand side; so by the time processing gets to the bottom of the file, the DokuWiki rewrite directives receive a request for a (virtual) wiki page.
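The way a request falls through the list can be modelled in a few lines of Python. This is a simplified sketch of the behaviour described above, not of mod_rewrite itself: it uses two of the rules from the listing (with the dots escaped), and the file_exists callback stands in for the !-f / !-d checks.

```python
import re

# (pattern, replacement) pairs taken from the listing above
RULES = [
    (r"^Events_Schedule\.php$", "events/schedule"),
    (r"^ArtMedia\.html$", "menu/art_media"),
]


def rewrite(path, file_exists=lambda p: False):
    """Return the path the request ends up at after all rules have run."""
    for pattern, target in RULES:
        if re.search(pattern, path):
            # No [L] flag on these rules, so processing carries on
            path = target
    # DokuWiki handler: only applies when no real file or directory matches
    if not file_exists(path):
        path = "wiki/" + path
    return path
```

So an old bookmark for “Events_Schedule.php” ends up at “wiki/events/schedule”, a plain wiki page name such as “start” becomes “wiki/start”, and a file which really exists on disk is left alone.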

Per-Host Configurations for lighttpd

While using lighttpd's simple virtual hosting I found myself wanting a simple way to define per-host configurations, but there apparently wasn't one available. So I wrote a script in Python which /etc/lighttpd/lighttpd.conf runs to acquire the vhost-config entries.

Add the following to /etc/lighttpd/lighttpd.conf:

## Load per-vhost configurations
include_shell "/var/www/lighttpd-vhost-confs.py"

And create a new file at /var/www/lighttpd-vhost-confs.py:

#!/usr/bin/env python3

"""
Generates per-vhost configuration directives from conf files found in the root
of each vhost

Add the following line to /etc/lighttpd/lighttpd.conf to make use of this script:

	include_shell "/var/www/lighttpd-vhost-confs.py"
"""

import os, sys

# Directory containing this script; each sub-directory is named after a vhost
basedir = os.path.dirname(os.path.abspath(sys.argv[0]))
# The first entry from os.walk() describes basedir; [1] is its sub-directories
dirlist = next(os.walk(basedir))[1]

for host in sorted(dirlist):
	conf = os.path.join(basedir, host, "lighttpd.conf")
	if os.path.exists(conf):
		with open(conf) as f:
			conf_data = f.read()
		# Wrap the vhost's directives in a conditional matching its hostname
		print("""$HTTP["host"] == "%s" {\n%s\n}""" % (host, conf_data))

The directory layout this was used in was as follows:

/var/www
/var/www/disk-browser
/var/www/disk-browser/html
/var/www/lighttpd-vhost-confs.py
/var/www/tilltroll.robmeerman.co.uk
/var/www/tilltroll.robmeerman.co.uk/lighttpd.conf
/var/www/tilltroll.robmeerman.co.uk/html

where /var/www/tilltroll.robmeerman.co.uk/lighttpd.conf contains the following:

	# deny access completly to these
	$HTTP["url"] =~ "/\.ht" { url.access-deny = ( "" ) }
	$HTTP["url"] =~ "/_ht" { url.access-deny = ( "" ) }
	$HTTP["url"] =~ "^/(bin|data|inc|conf)/"  { url.access-deny = ( "" ) }

and the script produces the following output when run:

$HTTP["host"] == "tilltroll.robmeerman.co.uk" {
	# deny access completly to these
	$HTTP["url"] =~ "/\.ht" { url.access-deny = ( "" ) }
	$HTTP["url"] =~ "/_ht" { url.access-deny = ( "" ) }
	$HTTP["url"] =~ "^/(bin|data|inc|conf)/"  { url.access-deny = ( "" ) }
 
}

Changing the default umask for Lighttpd + PHP

It doesn't seem to be possible to do this neatly from a vhost-specific configuration file. Popular work-arounds seem to be:

  • Use PHP's “php_value auto_prepend_file” configuration setting (via php.ini or via the webserver's config/env) to run “umask(0002);” at the beginning of each PHP invocation. I don't like this because it only affects PHP; what if you run bespoke CGI scripts?
  • Change the umask of the process which launches your webserver. E.g. edit /etc/init.d/lighttpd so that it has “umask 0002” somewhere near the top. This is what I did.
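For reference, this is what a umask of 0002 actually changes, shown with a small Python experiment rather than with lighttpd itself. The function name mode_created_under is made up, and the sketch assumes a POSIX system.

```python
import os
import stat
import tempfile


def mode_created_under(umask, requested=0o666):
    """Create a scratch file under `umask` and return its permission bits."""
    old = os.umask(umask)  # os.umask() returns the previous mask
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "probe")
            # Ask for rw-rw-rw-; the kernel clears whatever bits the mask sets
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, requested)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(old)  # always restore the caller's mask
```

With the common default of 0022 a new file comes out as 0644 (not group-writable), whereas with 0002 it comes out as 0664, which is why processes started with that mask create group-writable files.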

1) For example, mine currently is: “Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1”
2) Please let me know if you find a nice way of doing it though.
3) Such as Wikipedia or GMail
unix/webmastering.txt · Last modified: 2008/09/04 11:38 (external edit)