unix:webmastering [2008/09/04 12:38] (current)
 +====== Webmastering ======
 +Websites have been a hobby of mine since I was about 15, and over time I've started to do some fairly advanced things to take care of simple, usually annoying, problems I encounter. This section is intended to serve as a reminder to myself of those tricks, and may be of use to others as well.
 +
 +===== Monitoring logs during development =====
 +It's really useful to be able to watch the webserver's error log while you are developing a site; it's even more useful to be alerted whenever a new error is logged.
 +
 +<code bash>
 +tail -f error.log | grep -v --line-buffered favicon.ico | sed 's/$/\x07/'
 +</code>
 +
 +First, open the log and follow appends to it; then filter this through ''grep'' to drop any lines with "favicon.ico" in them; finally append a bell character to the end of each line so the console beeps/alerts us on each new line.
 +
 +:!: **NB:** It is important to use the "--line-buffered" switch with ''grep'', or the next pipe will receive no data (or at least receive it in uselessly large chunks, by which time it is stale).
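For machines without GNU ''grep''/''sed'', the same filter-and-beep step can be sketched in Python. This is only an illustration of the idea, not what I actually run: the names ''alert_lines'' and ''follow_stdin'' and the ''ignore'' parameter are made up for this example.

```python
import sys

BELL = "\x07"  # ASCII BEL; most terminals beep or flash when it is printed


def alert_lines(lines, ignore="favicon.ico"):
    """Yield every line that does not contain `ignore`, with a bell appended."""
    for line in lines:
        if ignore in line:
            continue  # same effect as `grep -v favicon.ico`
        yield line.rstrip("\n") + BELL


def follow_stdin():
    """Read stdin one line at a time and print alerts without buffering."""
    # iter(readline, "") hands over each line as soon as it arrives
    for out in alert_lines(iter(sys.stdin.readline, "")):
        print(out, flush=True)  # flush plays the role of --line-buffered
```

Assuming the block is saved as a script that ends by calling ''follow_stdin()'', it would be used as ''tail -f error.log | python alert.py'' (the file name is hypothetical).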
 +
 +===== "Content-Location:" being a nuisance =====
 +This one is a weird problem I came across when trying to add some fancy URL rewriting to a site. The plan was to have flat links such as "/properties/Tala Hill" remap to either "/properties/Tala Hill.php" if it exists, or to "/search.php?cat=properties&q=Tala Hill". However, I couldn't get past step one: for some reason "/properties/blahblah" was always being mapped to "/properties.php" **even when URL rewriting was off**!
 +
 +The culprit? **''mod_negotiation''**. I noticed that the HTTP headers from the server would have the following unusual entries when I requested "/properties/Tala Hill":
 +<code>
 +Content-Location: properties.php
 +Vary: negotiate
 +TCN: choice
 +</code>
 +and this would only happen if adding ".php" to the end pointed to an actual file.
 +
 +Commenting out mod_negotiation and its settings (LanguagePriority and ForceLanguagePriority) from my Apache configuration fixed this, and the request now results in 404 Not Found as it should. So, back to playing with the Rewrite Engine.
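If you want to check a saved response for the same symptom, a small helper can pick the tell-tale headers out of a raw header block. This is a rough sketch: the function name ''negotiation_headers'' is invented here, and treating ''Content-Location'' and ''TCN'' as the markers is simply my reading of the headers shown above.

```python
def negotiation_headers(raw_headers):
    """Return the content-negotiation headers found in a raw header block."""
    suspects = ("content-location", "tcn")
    found = {}
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        # Skip lines without a colon (e.g. the status line has none before the name)
        if sep and name.strip().lower() in suspects:
            found[name.strip()] = value.strip()
    return found


sample = """Content-Location: properties.php
Vary: negotiate
TCN: choice"""
print(negotiation_headers(sample))
```

An empty result means none of the marker headers were present in the response.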
 +
 +
 +===== Content-Disposition =====
 +The upshot of the problems I encountered in the previous section is that I discovered the ''Content-Disposition'' header, which can be used to turn a normal page into a file download - i.e. you can serve an HTML / text page (which would normally be displayed by the browser) and have the user prompted to "Save as..." instead.
 +
 +See [[http://www.faqs.org/rfcs/rfc2183.html]] for details.
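As a concrete illustration, here is a minimal WSGI application that sends an HTML page as a download. It is a sketch under assumptions of my own: the name ''download_app'', the page body and the suggested file name ''page.html'' are all made up for the example.

```python
def download_app(environ, start_response):
    """Serve a small HTML page, but ask the browser to save it instead."""
    body = b"<html><body>Hello</body></html>"
    headers = [
        ("Content-Type", "text/html"),
        # "attachment" requests a Save-as prompt; filename is only a suggestion
        ("Content-Disposition", 'attachment; filename="page.html"'),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, something like ''wsgiref.simple_server.make_server("", 8000, download_app).serve_forever()'' from the standard library should do (port 8000 is arbitrary).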
 +
 +===== Upgrading & Testing a live server =====
 +When developing a website, it is common to have two separate servers: development and live. However, problems arise when you are upgrading an existing site to a new, incompatible one. I use the trick below to overcome this.
 +
 +==== The plan ====
 +
 +We're going to change our browser to report a different name (technically "User Agent") to any website it connects to, and make sure the server displays a "Down for Maintenance" banner for any request which //does not// claim to come from our special browser.
 +
 +For the following example I have a plain HTML file called ''DownTime.html'' sitting in the site's root directory, and will be setting my browser to report that it is "RobM", instead of its usual one((For example, mine currently is: "''Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1''")).
 +
 +=== Configuring the server ===
 +First we'll set the server to redirect everyone to a notice page about the maintenance; we'll take care of letting ourselves in once we know the public are seeing this notice.
 +
 +:!: **Note:** //You need to have ''mod_rewrite'' and some level of "AllowOverride" enabled in your Apache configuration. If in doubt, try (on your development server) setting "AllowOverride All" for your site to allow ''.htaccess'' files to set all possible directives.//
 +
 +Add the following lines near the top of your site's ''/.htaccess'' on the live server:
 +<code conf>
 +## Enable the URL rewriting engine
 +RewriteEngine on
 +Options +FollowSymlinks
 +
 +## Send non-developers to a "site down for maintenance" page
 +RewriteCond %{HTTP_USER_AGENT} !^RobM.*$
 +RewriteRule .* DownTime.html [L]
 +
 +## Normal .htaccess content below this line
 +...
 +</code>
 +
 +Notes
 +  * You need ''**+FollowSymlinks**'' in order for mod_rewrite to work from a ''.htaccess'' file.
 +  * ''%{}'' refers to a server variable. This is akin to ''$_SERVER'' in PHP.
 +  * ''HTTP_USER_AGENT'' will contain whatever was sent in the ''User-Agent: XXXX'' header your browser sent, if it sent one.
 +  * ''!^$'' is a negated regular expression.
 +    * ''!'' -> "If not matched"
 +    * ''^'' -> Very beginning of the variable (without the anchors, a regular expression may match anywhere in the string)
 +    * ''$'' -> Very end of the variable
 +  * So ''!^RobM.*$'' will be satisfied for any user-agent which does not //begin with// "RobM" (case-sensitive); use ''!^RobM$'' instead if you only want to let in a user-agent that is //exactly// "RobM".
 +  * ''RewriteRule'' rewrites the requested URI when its pattern matches
 +    * ''.*'' -> Match the whole thing (it's a regular expression which matches everything)
 +    * ''DownTime.html'' -> means the request now becomes ''/DownTime.html'', even if the original request was for ''/subdir/subdir2/page.ext?param=value&param2=value''
 +    * ''[L]'' -> "Last" rule to consider. Normally ''mod_rewrite'' would continue checking for applicable rules right up to the end of the file, which would mean that if you had, say, a rule to convert "*.html -> *.htm", DownTime.html would become DownTime.htm and ultimately break.
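The decision the two directives make can be written out in a few lines of Python, which I find handy for checking a pattern before touching the live server. This is only a model of the logic, not of ''mod_rewrite'' itself; ''target_for'' and ''DEVELOPER_AGENT'' are names invented for the sketch.

```python
import re

# Same pattern as the RewriteCond above, without the leading "!" negation
DEVELOPER_AGENT = re.compile(r"^RobM.*$")


def target_for(user_agent, requested_path):
    """Return the path that would be served for this user-agent and request."""
    # A missing User-Agent header is treated like an empty string
    if not DEVELOPER_AGENT.search(user_agent or ""):
        return "DownTime.html"  # condition satisfied: rewrite everything
    return requested_path  # developer browser: request passes through


print(target_for("Mozilla/5.0 (Windows; U) Firefox/1.5.0.1", "index.php"))
print(target_for("RobM", "index.php"))
```

Note that "robm" (lower case) and "xRobM" both end up on the maintenance page, since the match is case-sensitive and anchored at the start.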
 +
 +=== Getting your browser ready ===
 +I hear the Opera browser has built-in support for changing the User Agent, so you can just use that. In Firefox you'll want the [[http://chrispederick.com/work/useragentswitcher/|User Agent Switcher]] extension. I don't care about Internet Explorer, so if you use that you're on your own((Please [[robert.meerman@gmail.com|let me know]] if you find a nice way of doing it though.)).
 +
 +So I've now changed my User Agent to "RobM" and set it as current.
 +
 +==
 +
 +So with some luck, the general public will be seeing DownTime.html for any request they make, and you'll be seeing the site as "normal". To allow the general public back in, simply comment out the two rules (and possibly the lines that enable ''mod_rewrite'' if you don't need them). //You'll probably want to revert your User-Agent too//, or some sites will give you very basic pages((Such as Wikipedia or Gmail)).
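Both views can also be checked from a script rather than by switching browser settings back and forth. The sketch below uses Python's standard ''urllib''; the helper names ''build_request'' and ''fetch_body'' are made up here, and the URL in the usage note is a placeholder for your own site.

```python
import urllib.request


def build_request(url, user_agent=None):
    """Build a request, optionally overriding the User-Agent header."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def fetch_body(url, user_agent=None):
    """Fetch a page and return its body as text (performs a network request)."""
    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, ''fetch_body("http://example.com/", "RobM")'' should return the normal page, while the same call without the second argument should return the contents of DownTime.html.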
 +
 +===== Migrating your site to a wiki =====
 +Just a quick aside related to the above: when you move your content into a wiki, everyone's old bookmarks will break, search engine indexes will go out of date, and possibly other nasty things will happen too. I found that Apache's RewriteEngine was a very neat way to overcome this without leaving lots of the old file-system structure lying around with minimal redirect pages in it:
 +
 +<code bash>
 +RewriteRule ^Events.html$                                   menu/eventsi
 +RewriteRule ^Events_Schedule.php$                           events/schedule
 +RewriteRule ^Events_Schedule_Past.php$                      events/schedule_past
 +RewriteRule ^Events_Schedule_Static.php$                    events/schedule_static
 +RewriteRule ^Events_SocRunnings.php$                        events/general
 +
 +RewriteRule ^ArtMedia.html$                                 menu/art_media
 +RewriteRule ^ImproManga.php$                                art_media/impromanga
 +...
 +
 +## DokuWiki use_rewrite handler
 +RewriteCond %{REQUEST_FILENAME}                             !-f
 +RewriteCond %{REQUEST_FILENAME}                             !-d
 +RewriteRule (.*)                                            wiki/$1  [QSA,L]
 +</code>
 +
 +What this does is compare the left-hand side regular expression against the requested URL and, when it matches, rewrite the request to the right-hand side; so by the time it gets to the bottom of the file the DokuWiki rewrite directives receive a request for a (virtual) wiki page.
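The flow through those rules can be modelled in Python to sanity-check a mapping table before deploying it. This is a simplified sketch, not a reimplementation of ''mod_rewrite'': it uses only three of the rules above, ignores the query string (''QSA''), and the ''exists'' callback is an invented stand-in for the ''!-f'' / ''!-d'' checks.

```python
import re

# A few of the legacy mappings from the rule list above
LEGACY_RULES = [
    (r"^Events_Schedule.php$", "events/schedule"),
    (r"^ArtMedia.html$", "menu/art_media"),
    (r"^ImproManga.php$", "art_media/impromanga"),
]


def rewrite(path, exists=lambda p: False):
    """Apply the legacy rules in order, then hand the result to the wiki."""
    for pattern, substitution in LEGACY_RULES:
        if re.search(pattern, path):
            path = substitution  # no [L] flag, so later rules still get a look
    if not exists(path):  # neither a real file nor a real directory
        path = "wiki/" + path
    return path


print(rewrite("Events_Schedule.php"))
```

Passing an ''exists'' function that returns ''True'' models a request for a real file (a stylesheet, say), which is left alone.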
 +
 +===== Per-Host Configurations for lighttpd =====
 +While using ''lighttpd'''s simple virtual hosting I found myself wanting a simple way to define per-host configurations, but there apparently wasn't one available. So I wrote a script in Python which ''/etc/lighttpd/lighttpd.conf'' runs to acquire the vhost-config entries.
 +
 +Add the following to ''/etc/lighttpd/lighttpd.conf'':
 +<code bash>
 +## Load per-vhost configurations
 +include_shell "/var/www/lighttpd-vhost-confs.py"
 +</code>
 +
 +And create a new file at ''/var/www/lighttpd-vhost-confs.py'':
 +<code python>
 +#!/usr/bin/python
 +
 +"""
 +Generates per-vhost configuration directives from conf files found in the root
 +of each vhost
 +
 +Add the following line to /etc/lighttpd/lighttpd.conf to make use of this script:
 +
 +    include_shell "/var/www/lighttpd-vhost-confs.py"
 +"""
 +
 +import os, sys
 +
 +# Directory containing this script; each sub-directory is treated as a vhost
 +basedir = os.path.dirname(os.path.abspath(sys.argv[0]))
 +dirlist = next(os.walk(basedir))[1]
 +
 +for vhost in dirlist:
 +    conf = os.path.join(basedir, vhost, "lighttpd.conf")
 +    if os.path.exists(conf):
 +        with open(conf) as f:
 +            conf_data = f.read()
 +        # Wrap the vhost's own directives in a host-match conditional
 +        print('$HTTP["host"] == "%s" {\n%s\n}' % (vhost, conf_data))
 +</code>
 +
 +The directory layout this was used in was as follows:
 +
 +<code>
 +/var/www
 +/var/www/disk-browser
 +/var/www/disk-browser/html
 +/var/www/lighttpd-vhost-confs.py
 +/var/www/tilltroll.robmeerman.co.uk
 +/var/www/tilltroll.robmeerman.co.uk/lighttpd.conf
 +/var/www/tilltroll.robmeerman.co.uk/html
 +</code>
 +
 +where ''/var/www/tilltroll.robmeerman.co.uk/lighttpd.conf'' contains the following:
 +
 +<code python>
 + # deny access completely to these
 + $HTTP["url"] =~ "/\.ht" { url.access-deny = ( "" ) }
 + $HTTP["url"] =~ "/_ht" { url.access-deny = ( "" ) }
 + $HTTP["url"] =~ "^/(bin|data|inc|conf)/"  { url.access-deny = ( "" ) }
 +</code>
 +
 +and the script produces the following output when run:
 +
 +<code python>
 +$HTTP["host"] == "tilltroll.robmeerman.co.uk" {
 + # deny access completely to these
 + $HTTP["url"] =~ "/\.ht" { url.access-deny = ( "" ) }
 + $HTTP["url"] =~ "/_ht" { url.access-deny = ( "" ) }
 + $HTTP["url"] =~ "^/(bin|data|inc|conf)/"  { url.access-deny = ( "" ) }
 +
 +}
 +</code>
 +
 +===== Changing the default umask for Lighttpd + PHP =====
 +It doesn't seem to be possible to do this neatly from a vhost-specific configuration file. Popular work-arounds seem to be:
 +
 +  * Use PHP's "php_value auto_prepend_file" configuration setting (via ''php.ini'' or via the webserver's config/env) to run "umask(0002);" at the beginning of each PHP invocation. I don't like this because it only affects PHP; what if you run bespoke CGI scripts?
 +  * Change the umask of the process which launches your webserver, e.g. edit ''/etc/init.d/lighttpd'' so that it has "umask 0002" somewhere near the top. This is what I did.
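To see what a umask of 0002 actually changes, the following Python sketch creates a file under a given umask and reports the permissions it ends up with. It assumes a POSIX system; ''create_with_umask'' and the file name ''example.txt'' are invented for the example.

```python
import os
import stat
import tempfile


def create_with_umask(mask, directory):
    """Create a file in `directory` under `mask`; return its permission bits."""
    previous = os.umask(mask)  # os.umask returns the old mask
    try:
        path = os.path.join(directory, "example.txt")
        # Ask for rw-rw-rw-; the kernel clears whichever bits the umask names
        fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_EXCL, 0o666)
        os.close(fd)
        return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(previous)  # always restore the caller's umask


with tempfile.TemporaryDirectory() as tmp:
    print(oct(create_with_umask(0o002, tmp)))
```

With 0002 the group keeps its write bit (rw-rw-r--), whereas the common default of 0022 would strip it (rw-r--r--). The umask is inherited by child processes, which is why setting it in the init script reaches PHP and CGI scripts alike.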
  
unix/webmastering.txt · Last modified: 2008/09/04 12:38 (external edit)