Stability Issues

My Linux box locks up from time to time. The lockups are severe: all network activity ceases (dropping SSH connections), the on-screen GUI cursor stops moving (as I understand it, this is a hardware cursor, so it ought to keep moving even if processes are deadlocked), and the SysRq key doesn't work, although I suspect that might be because it's a USB keyboard…

To my mind, this evidence points to one of:

  1. Kernel problems (perhaps an “Oops!” I cannot see?)
  2. Hardware lockup

If it's the kernel, I might be able to set up my laptop to monitor kernel messages over a serial port. I'll have to get myself a null-modem cable and give it a try.
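
For the laptop side, a minimal sketch of a logger might look like the following. It assumes the desktop's kernel has been told to send its messages to the serial port (for example by booting with console=ttyS0,115200n8 on the kernel command line) and that the laptop's port has already been set to the matching speed, for instance with stty. The device path, the log file name and the function names are placeholders of mine, not anything this setup dictates; a laptop with a USB serial adapter would more likely use something like /dev/ttyUSB0.

```python
from datetime import datetime


def log_console(src, dst, now=datetime.now):
    """Copy lines from src to dst, prefixing each with a timestamp.

    Every line is flushed immediately, so the last kernel message sent
    before a lockup is already in the log when the other machine dies.
    Returns the number of lines written.
    """
    count = 0
    for raw in src:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines left over from CR/LF line endings
        dst.write(f"{now().isoformat(timespec='seconds')} {line}\n")
        dst.flush()
        count += 1
    return count


def main(device="/dev/ttyS0", logfile="kernel-console.log"):
    # Hypothetical defaults: adjust the device and log file to suit.
    # errors="replace" stops line noise on the cable from raising
    # a decoding error and killing the logger.
    with open(device, "r", errors="replace") as src, open(logfile, "a") as dst:
        return log_console(src, dst)
```

Calling main() on the laptop then blocks and appends to the log until the port is closed or the script is interrupted. The timestamps come from the laptop's clock, which keeps running after the desktop has frozen, so the time of the final message gives a rough idea of when the lockup happened.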

Hardware Description

For anyone who isn't me, here is the relevant output from the lshw command:

    description: Desktop Computer
    product: OEM
    vendor: OEM
    version: OEM
    serial: OEM
    width: 32 bits
    capabilities: smbios-2.2 dmi-2.2 smp-1.4 smp
    configuration: boot=normal chassis=desktop cpus=2 uuid=00000000-0000-0000-0000-00508D9D7584
       description: Motherboard
       product: AB9/AB9RPO(Intel965+ICH8)
       physical id: 0
       version: 1.x (BIOS:15)
          description: BIOS
          vendor: Phoenix Technologies, LTD
          physical id: 0
          version: 6.00 PG (04/02/2007)
          size: 128KiB
          capacity: 448KiB
          capabilities: isa pci pnp apm upgrade shadowing cdboot bootselect socketedrom edd int13floppy360 int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer int10video acpi usb ls120boot zipboot
          description: CPU
          product: Intel(R) Core(TM)2 CPU          6320  @ 1.86GHz
          vendor: Intel Corp.


| Date | Change | Outcome |
|------|--------|---------|
| 2007-09-23 | Changed BIOS PCIe compliance mode from 1.0a to 1.0 | No change; still locks up |
| 2007-09-23 | Changed BIOS to keep CPU fan on max | No change; still locks up |
| 2007-09-23 | Avoided using Compiz, in case it was exercising the nvidia driver in a bad way | Still locked up, but the GUI cursor was still movable, albeit very jerkily. An attempt to connect via SSH timed out, and the system request keys failed |
| 2007-09-23 | Ran memtest for ~8 hours | No problems found |
| 2007-09-23 | Updated BIOS to v21 (2007/08/16) | Can't boot from the hard drives anymore (they're not listed on the boot-order page of the BIOS!). Cleared CMOS by shorting the jumper as per the motherboard manual; the system now reaches the Grub stage, but Ubuntu doesn't appear to boot. Changed BIOS to handle the USB keyboard via BIOS instead of OS so I can interact with Grub. Booting in recovery mode (so the boot log is printed to the screen) shows that it gets stuck repeatedly retrying to connect to one of my hard drives. Switched IDE controller mode from IDE to AHCI: cannot find boot device. Switched back: boot gets stuck bringing up SATA. Downgraded kernel to 2.6.20-15 (2.6.20-16 also exhibits this problem). Found that when in IDE mode, there are options to control how the drive is presented in the Standard BIOS Settings page. After much experimenting, settled on BIOS update v17, as all the others seem to cause boot problems. |
