So I upgraded Ubuntu 9.10 to Ubuntu 11.10. When the system boots it says it cannot mount /store, my 960GB RAID+LVM file-system. That's the one that holds over 10 years of personal photographs and such.
There are many layers of indirection between the file-system and the physical storage when using LVM or RAID. When using both, the number of layers can seem excessive. Here's a diagram of the layers involved in my (lost) setup:
Note that the RAID block device, md0, is not partitioned. I believe that was a mistake on my part, and a likely reason why Ubuntu 11.10 cannot auto-detect it.
Since upgrading, system boot is interrupted by an error screen to the effect of “Cannot mount /store” and a prompt to enter a root shell or skip mounting. From what I can see, the RAID array is detected without problems, and is functioning correctly. So the system looks like this:
The RAID (multi-disk) status looks fine to me:
root@ikari:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : active raid5 sdb[1] sde[3] sdc[0] sdd[2] sdf[4](S)
      937713408 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>
However, the resulting 960.2 GB block device carries a partition table whose single partition is of type “Linux RAID autodetect”, which would suggest that it is a *member* of some other multi-disk setup. This, I believe, is human error on my part when I created the thing…
root@ikari:~# fdisk -l /dev/md127

Disk /dev/md127: 960.2 GB, 960218529792 bytes
255 heads, 63 sectors/track, 116739 cylinders, total 1875426816 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 196608 bytes
Disk identifier: 0xd71c877b

      Device Boot      Start         End      Blocks   Id  System
/dev/md127p1              63   625137344   312568641   fd  Linux RAID autodetect
Partition 1 does not start on physical sector boundary.
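A quick bit of arithmetic on those numbers shows how odd the table is. This is only a sketch using the sector counts from the fdisk output above; the helper name sectors_to_gib is made up for illustration.

```shell
# Sector counts copied from the fdisk output above (512-byte sectors).
sector_size=512
disk_sectors=1875426816      # total sectors reported for /dev/md127
part_start=63                # start sector of md127p1
part_end=625137344           # end sector of md127p1

# Convert a sector count to whole GiB (integer division, rounds down).
sectors_to_gib() {
    echo $(( $1 * sector_size / 1024 / 1024 / 1024 ))
}

part_sectors=$(( part_end - part_start + 1 ))
echo "md127:   $(sectors_to_gib "$disk_sectors") GiB"
echo "md127p1: $(sectors_to_gib "$part_sectors") GiB"
```

The one partition covers roughly 298 GiB of an 894 GiB device, so whatever wrote that table was probably not describing the array as a whole.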
The first step was to take a full copy of md127, partition table and all (this required buying an external 2TB USB drive).
So I created a single “Linux LVM” partition on the 2TB disk, created a single 1.8TB physical volume and a single 1.8TB volume group containing it. On this I created a 1TB logical volume called lv_scratch and copied the contents of md127 to it (e.g. dd if=/dev/md127 of=/dev/mapper/vg_scratch-lv_scratch). Once the copy was made, I created a snapshot of lv_scratch which I imaginatively called snap.
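The steps above might look roughly like the following sketch. The commands are a reconstruction, not a transcript: the sizes (1T for the volume, 500G for the snapshot's COW space) and the /dev/sdj1 device name are taken from the listings further down the page, bs=1M is an arbitrary choice, and the run and make_scratch_copy helpers are hypothetical. By default it only prints the commands; set DRY_RUN=0 to run them for real.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Build the scratch volume group, image the source device, then snapshot it.
make_scratch_copy() {
    src="$1"      # device to image, e.g. /dev/md127
    part="$2"     # "Linux LVM" partition on the external disk
    run pvcreate "$part"
    run vgcreate vg_scratch "$part"
    run lvcreate --size 1T --name lv_scratch vg_scratch
    run dd if="$src" of=/dev/vg_scratch/lv_scratch bs=1M
    run lvcreate --snapshot --size 500G --name snap /dev/vg_scratch/lv_scratch
}

make_scratch_copy /dev/md127 /dev/sdj1
```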
LVM snapshots are interesting creatures. As the name suggests, the snapshot (named snap) holds the state of lv_scratch as it was when I created it. I can still read and write to lv_scratch, but the contents of snap will not change. This is ideal for making consistent backups. The snapshot works by copy-on-write (COW): before a block of lv_scratch is overwritten for the first time, its original contents are copied into the snapshot's COW volume, and only then does the write go through to lv_scratch. Reads of snap consult the COW volume first: when there is a hit the preserved block is returned, otherwise the (still unchanged) block is read from lv_scratch. When the snapshot is deleted, the preserved blocks are simply discarded and lv_scratch keeps every change made to it in the meantime. Makes sense if you are used to copy-on-write behaviour.
Now here is where things get interesting. The snapshot, snap, does not have to be read-only: you can create it read-write. Doing so gives you a very cheap copy of lv_scratch, and any changes you make to the snapshot are stored in its COW table. You can discard the changes by deleting the snapshot. Ideal for my situation: I want to experiment with the partition table and various file-system recovery tools. I let these manipulate the snapshot, and if things go bad I delete and recreate the snapshot and try again.
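The delete-and-recreate cycle can be wrapped up in a few lines. A minimal sketch, assuming the volume names used on this page and a 500G COW size; the run and reset_snapshot helpers are hypothetical, and by default the commands are only printed (set DRY_RUN=0 to run them).

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Throw away every change made through the snapshot and start afresh.
reset_snapshot() {
    origin="$1"     # e.g. /dev/vg_scratch/lv_scratch
    name="$2"       # e.g. snap
    cow_size="$3"   # e.g. 500G
    vg_dir=$(dirname "$origin")
    run lvremove --force "$vg_dir/$name"
    run lvcreate --snapshot --size "$cow_size" --name "$name" "$origin"
}

reset_snapshot /dev/vg_scratch/lv_scratch snap 500G
```

Anything stacked on top of the snapshot (loop devices, nested volume groups) would presumably have to be detached or deactivated first, since LVM refuses to remove a volume that is still open.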
root@ikari:~# fdisk -l /dev/sdj

Disk /dev/sdj: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000f0222

   Device Boot      Start         End      Blocks   Id  System
/dev/sdj1            2048  3907028991  1953513472   8e  Linux LVM
root@ikari:~# pvdisplay
  /dev/dm-0: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-1: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-2: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  --- Physical volume ---
  PV Name               /dev/sdj1
  VG Name               vg_scratch
  PV Size               1.82 TiB / not usable 4.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              476931
  Free PE               86787
  Allocated PE          390144
  PV UUID               nrf9cQ-Asfz-Y2x2-SDoT-3ppu-mpEC-Fnuf8Z

root@ikari:~# vgdisplay
  /dev/dm-0: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-1: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-2: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  --- Volume group ---
  VG Name               vg_scratch
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  13
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               1.82 TiB
  PE Size               4.00 MiB
  Total PE              476931
  Alloc PE / Size       390144 / 1.49 TiB
  Free PE / Size        86787 / 339.01 GiB
  VG UUID               Lk7UZP-48xF-vBPi-6g8F-sXlF-qyzy-pQNKgq

root@ikari:~# lvdisplay
  /dev/dm-0: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-1: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-2: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  --- Logical volume ---
  LV Name                /dev/vg_scratch/lv_scratch
  VG Name                vg_scratch
  LV UUID                aFBpgv-gqcd-jjLU-c7xO-Jyeb-2R0t-HpEF84
  LV Write Access        read/write
  LV snapshot status     source of /dev/vg_scratch/snap [active]
  LV Status              available
  # open                 0
  LV Size                1.00 TiB
  Current LE             262144
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1

  --- Logical volume ---
  LV Name                /dev/vg_scratch/snap
  VG Name                vg_scratch
  LV UUID                OvOsQ7-uACi-xJVZ-vseu-fKEc-F73h-CmSalH
  LV Write Access        read/write
  LV snapshot status     active destination for /dev/vg_scratch/lv_scratch
  LV Status              available
  # open                 0
  LV Size                1.00 TiB
  Current LE             262144
  COW-table size         500.00 GiB
  COW-table LE           128000
  Allocated to snapshot  0.00%
  Snapshot chunk size    4.00 KiB
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3
So here's the goal I'm aiming for on my external storage:
Since it's been a few days and reboots since I last worked on this, I'll start by plugging the USB drive in.
root@ikari:~# dmesg
[ 479.180019] usb 2-5: new high speed USB device number 7 using ehci_hcd
[ 479.313228] scsi13 : usb-storage 2-5:1.0
[ 480.312605] scsi 13:0:0:0: Direct-Access Seagate Desktop 0130 PQ: 0 ANSI: 4
[ 480.336633] sd 13:0:0:0: Attached scsi generic sg10 type 0
[ 480.337029] sd 13:0:0:0: [sdi] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[ 480.337671] sd 13:0:0:0: [sdi] Write Protect is off
[ 480.337671] sd 13:0:0:0: [sdi] Mode Sense: 2f 08 00 00
[ 480.340027] sd 13:0:0:0: [sdi] No Caching mode page present
[ 480.340027] sd 13:0:0:0: [sdi] Assuming drive cache: write through
[ 480.341806] sd 13:0:0:0: [sdi] No Caching mode page present
[ 480.341811] sd 13:0:0:0: [sdi] Assuming drive cache: write through
[ 480.357290] sdi: sdi1
[ 480.359346] sd 13:0:0:0: [sdi] No Caching mode page present
[ 480.359350] sd 13:0:0:0: [sdi] Assuming drive cache: write through
[ 480.359354] sd 13:0:0:0: [sdi] Attached SCSI disk
The (outer) LVM PVs are automatically detected, and their VGs and LVs are subsequently found:
root@ikari:~# pvs
  PV         VG         Fmt  Attr PSize PFree
  /dev/sdi1  vg_scratch lvm2 a-   1.82t 339.01g
root@ikari:~# vgs
  VG         #PV #LV #SN Attr   VSize VFree
  vg_scratch   1   2   1 wz--n- 1.82t 339.01g
root@ikari:~# lvs
  LV         VG         Attr   LSize   Origin     Snap%  Move Log Copy%  Convert
  lv_scratch vg_scratch owi-a-   1.00t
  snap       vg_scratch swi-a- 500.00g lv_scratch   0.00
Somewhere on the snap logical volume is my nested LVM. I used xxd /dev/vg_scratch/snap | less and searched for LVM2. The first hit was a false positive (it appeared to have stripes of NULs written across it), but the second hit looked plausible:
8018600: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018610: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018620: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018630: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018640: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018650: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018660: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018670: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018680: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018690: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80186a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80186b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80186c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80186d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80186e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80186f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018700: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018710: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018720: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018730: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018740: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018750: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018760: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018770: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018780: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018790: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80187a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80187b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80187c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80187d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80187e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80187f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018800: 4c41 4245 4c4f 4e45 0100 0000 0000 0000  LABELONE........
8018810: 9148 4053 2000 0000 4c56 4d32 2030 3031  .H@S ...LVM2 001
8018820: 5341 7536 6e32 7578 474c 5148 6743 5351  SAu6n2uxGLQHgCSQ
8018830: 6b56 6b5a 655a 4c78 7874 314b 7652 6a31  kVkZeZLxxt1KvRj1
8018840: 00f8 0391 df00 0000 0000 0300 0000 0000  ................
8018850: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018860: 0000 0000 0000 0000 0010 0000 0000 0000  ................
8018870: 00f0 0200 0000 0000 0000 0000 0000 0000  ................
8018880: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018890: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80188a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
I know from using xxd to examine a correct and functioning LVM2 partition (the PV behind vg_scratch as it happens) that the “LVM2” text should appear in the row at 0x210, one row after the LABELONE signature at 0x200. So I'll create a loopback device with an appropriate offset to make that happen:
root@ikari:~# losetup /dev/loop0 /dev/vg_scratch/snap --offset $((0x8018600))
root@ikari:~# lvmdiskscan
  /dev/ram0  [      64.00 MiB]
  /dev/loop0 [     894.15 GiB] LVM physical volume
  /dev/dm-0  [     186.27 GiB]
  /dev/ram1  [      64.00 MiB]
  /dev/sda1  [     294.09 GiB]
  /dev/dm-1  [     894.27 GiB]
  /dev/ram2  [      64.00 MiB]
  /dev/dm-2  [     894.27 GiB]
  /dev/ram3  [      64.00 MiB]
  /dev/dm-3  [     894.27 GiB]
  /dev/ram4  [      64.00 MiB]
  /dev/dm-4  [     782.47 GiB]
  /dev/ram5  [      64.00 MiB]
  /dev/sda5  [       4.00 GiB]
  /dev/dm-5  [     715.38 GiB]
  /dev/ram6  [      64.00 MiB]
  /dev/ram7  [      64.00 MiB]
  /dev/ram8  [      64.00 MiB]
  /dev/ram9  [      64.00 MiB]
  /dev/ram10 [      64.00 MiB]
  /dev/ram11 [      64.00 MiB]
  /dev/ram12 [      64.00 MiB]
  /dev/ram13 [      64.00 MiB]
  /dev/ram14 [      64.00 MiB]
  /dev/ram15 [      64.00 MiB]
  /dev/sdb1  [       1.82 TiB] LVM physical volume
  0 disks
  24 partitions
  0 LVM physical volume whole disks
  2 LVM physical volumes
root@ikari:~# pvs
  PV         VG         Fmt  Attr PSize   PFree
  /dev/loop0 store_vg   lvm2 a-   894.25g 178.88g
  /dev/sdb1  vg_scratch lvm2 a-     1.82t      0
If lvmdiskscan doesn't work you could try using partprobe to ask the kernel to re-read the partition tables.
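The hunt for the label and the offset arithmetic can also be scripted. A sketch under a few assumptions: GNU grep (for -a, -b and -o), a PV whose LABELONE signature belongs at byte 0x200, and made-up helper names. On a device this size grep may be slow or memory-hungry, in which case paging through xxd as above works just as well.

```shell
# Byte offset of the "LABELONE" signature inside a healthy PV (sector 1).
label_offset_in_pv=$(( 0x200 ))

# Print the byte offset of every "LABELONE" hit in a device or image file.
find_lvm_labels() {
    LC_ALL=C grep -a -b -o -F LABELONE "$1" | cut -d: -f1
}

# Turn the offset of a hit into the value to pass to losetup --offset.
loop_offset_for_label() {
    echo $(( $1 - label_offset_in_pv ))
}

# The second hit in the dump above was at 0x8018800.
loop_offset_for_label $(( 0x8018800 ))
```

The last line prints 134317568, which is 0x8018600 in decimal. Each hit from find_lvm_labels still needs a look by eye, since the first one here turned out to be a false positive.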
root@ikari:~# lvs
  LV         VG         Attr   LSize   Origin     Snap%  Move Log Copy%  Convert
  store_lv   store_vg   -wi-a- 715.38g
  home_zfs   vg_scratch -wi-a- 186.27g
  lv_scratch vg_scratch owi-a- 894.27g
  snap       vg_scratch swi-ao 782.47g lv_scratch   0.00
OK, so now I can read/write my store LVM that housed my ext4 file-system. But the file-system isn't where it's supposed to be, apparently:
root@ikari:~# file -Ls /dev/store_vg/store_lv
/dev/store_vg/store_lv: data
Time to play with a hex editor again, to look at a valid ext4 file-system header (from my current Ubuntu installation, /dev/sda1).
After a bit of searching I find the characteristic 53ef marker in the expected column of the dump, followed a few rows later by the name of my file-system: store. Looks good.
# xxd /dev/store_vg/store_lv | less
05c9db0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
05c9dc0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
05c9dd0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
05c9de0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
05c9df0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
05c9e00: 00c0 fc06 c017 f90d c9da b200 0c2a d00a  .............*..
05c9e10: c604 f906 0000 0000 0200 0000 0200 0000  ................
05c9e20: 0080 0000 0080 0000 0040 0000 9d17 aa4d  .........@.....M
05c9e30: 9d17 aa4d 0200 1b00 53ef 0100 0100 0000  ...M....S.......
05c9e40: 4c2c a24d 004e ed00 0000 0000 0100 0000  L,.M.N..........
05c9e50: 0000 0000 0b00 0000 8000 0000 3400 0000  ............4...
05c9e60: 0600 0000 0300 0000 9495 5e9b 7d7e 41da  ..........^.}~A.
05c9e70: a595 6af4 4848 b5c6 7374 6f72 6500 0000  ..j.HH..store...
05c9e80: 0000 0000 0000 0000 0000 0000 0000 0000  ................
05c9e90: 0000 0000 0000 0000 0000 0000 0000 0000  ................
05c9ea0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
05c9eb0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
05c9ec0: 0000 0000 0000 0000 0000 0000 0000 c803  ................
05c9ed0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
05c9ee0: 0800 0000 0000 0000 0000 0000 9a4a cd41  .............J.A
05c9ef0: 88ff 4759 ac42 d083 8b3f 3b0d 0201 0000  ..GY.B...?;.....
05c9f00: 0000 0000 0000 0000 cf4a 9f46 0906 0000  .........J.F....
05c9f10: 0a06 0000 0b06 0000 0c06 0000 0d06 0000  ................
By comparing this to my valid file-system, I can see that the 53ef line should be at offset 0x430:
# xxd /dev/sda1 | less
0000380: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000390: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00003a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00003b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00003c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00003d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00003e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00003f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000400: 0020 2601 005d 9804 73d1 3a00 8366 bb02  . &..]..s.:..f..
0000410: d8da 2001 0000 0000 0200 0000 0200 0000  .. .............
0000420: 0080 0000 0080 0000 0020 0000 b153 264f  ......... ...S&O
0000430: d9bf f44e 0d00 2200 53ef 0100 0100 0000  ...N..".S.......
0000440: 98b4 f44e 004e ed00 0000 0000 0100 0000  ...N.N..........
0000450: 0000 0000 0b00 0000 0001 0000 3c00 0000  ............<...
0000460: 4602 0000 7b00 0000 89bb eae1 b864 492d  F...{........dI-
0000470: a3d6 2bc9 5336 151e 0000 0000 0000 0000  ..+.S6..........
0000480: 0000 0000 0000 0000 2f00 1eae 7c13 0000  ......../...|...
0000490: 0000 c099 98a1 0188 ffff 985d 698f 0188  ...........]i...
00004a0: ffff 307a 87a2 0188 ffff 307a 87a2 0188  ..0z......0z....
00004b0: ffff 585c 698f 0188 ffff 645d 1681 ffff  ..X\i.....d]....
00004c0: ffff c000 588f 0188 0000 0000 0000 ed03  ....X...........
00004d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00004e0: 0800 0000 0000 0000 b60a 4000 5547 eae9  ..........@.UG..
00004f0: 2f3c 45ee 9913 70fb 20b7 7395 0101 0000  /<E...p. .s.....
0000500: 0000 0000 0000 0000 98b4 f44e 0af3 0200  ...........N....
0000510: 0400 0000 0000 0000 0000 0000 ff7f 0000  ................
0000520: 0080 4802 ff7f 0000 0100 0000 ffff 4802  ..H...........H.
0000530: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000540: 0000 0000 0000 0000 0000 0000 0000 0008  ................
0000550: 0000 0000 0000 0000 0000 0000 1c00 1c00  ................
0000560: 0100 0000 0000 0000 0000 0000 0000 0000  ................
0000570: 0000 0000 0400 0000 b6b0 300b 0000 0000  ..........0.....
0000580: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000590: 0000 0000 0000 0000 0000 0000 0000 0000  ................
So let's create another loopback device with an offset:
# losetup /dev/loop1 /dev/store_vg/store_lv --offset $((0x5C9A00))
For the record, that makes the current overall loopback settings:
# losetup -a
/dev/loop0: [0005]:12783 (/dev/mapper/vg_scratch-snap), offset 134317568
/dev/loop1: [0005]:40851 (/dev/mapper/store_vg-store_lv), offset 6068736
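The second offset can be double-checked without a hex editor. A sketch, assuming dd and od are available; ext_magic_at is a hypothetical helper. It relies on the 53ef magic sitting 0x438 bytes into a healthy file-system, which is where the /dev/sda1 dump above shows it (row 0x430, byte 8).

```shell
# Row addresses from the two xxd dumps: where 53ef was found on store_lv,
# and where it sits on the healthy /dev/sda1.
found_row=$(( 0x05c9e30 ))
expected_row=$(( 0x430 ))
fs_offset=$(( found_row - expected_row ))
printf 'loop1 offset: %d (0x%X)\n' "$fs_offset" "$fs_offset"

# Print the two bytes where the ext magic should be if a file-system
# starts at byte offset $2 of device or image $1 ("53ef" means plausible).
ext_magic_at() {
    dd if="$1" bs=1 skip=$(( $2 + 0x438 )) count=2 2>/dev/null |
        od -An -tx1 | tr -d ' \n'
}
```

The printf line gives 6068736 (0x5C9A00), matching the loop1 entry in the listing. Something like ext_magic_at /dev/store_vg/store_lv "$fs_offset" could then confirm the candidate before a loop device is set up.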
So is the ext4 file-system visible?
# file -Ls /dev/loop1
/dev/loop1: Linux rev 1.0 ext3 filesystem data, UUID=94955e9b-7d7e-41da-a595-6af44848b5c6, volume name "store" (needs journal recovery) (large files)
Success!
Obviously I tried mounting it first, to no avail:
root@ikari:/tmp# mount /dev/loop1 /tmp/store
mount: wrong fs type, bad option, bad superblock on /dev/loop1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
Since this is all sitting on top of the LVM snapshot I made earlier (/dev/vg_scratch/snap) I am happy to try various tools that modify the drive contents, such as fsck.
First attempt, no joy:
# fsck.ext4 -y /dev/loop1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Group descriptors look bad... trying backup blocks...
fsck.ext4: Bad magic number in super-block when using the backup blocks
fsck.ext4: going back to original superblock
Error reading block 3226742528 (Invalid argument). Ignore error? yes
Force rewrite? yes
Superblock has an invalid journal (inode 8).
Clear? yes
*** ext3 journal has been deleted - filesystem is now ext2 only ***
The filesystem size (according to the superblock) is 234428352 blocks
The physical size of the device is 187529782 blocks
Either the superblock or the partition table is likely to be corrupt!
Abort? yes
Error writing block 3226742528 (Invalid argument). Ignore error? yes
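The two sizes fsck complains about are worth multiplying out. A sketch, assuming 4 KiB blocks (the superblock dump above has 0200 0000 in the log-block-size field at 05c9e18, which would mean 1024 << 2) and using only numbers already shown on this page.

```shell
block_size=4096
fs_blocks=234428352          # size according to the superblock
dev_blocks=187529782         # physical size of /dev/loop1
md127_bytes=960218529792     # size of /dev/md127 reported by fdisk

fs_bytes=$(( fs_blocks * block_size ))
dev_bytes=$(( dev_blocks * block_size ))

echo "file-system: $fs_bytes bytes"
echo "device:      $dev_bytes bytes"
echo "shortfall:   $(( fs_bytes - dev_bytes )) bytes"

if [ "$fs_bytes" -eq "$md127_bytes" ]; then
    echo "file-system size equals the size of md127"
fi
```

By this arithmetic the superblock describes a file-system of 960218529792 bytes, the same figure fdisk reports for /dev/md127, while the device behind loop1 is roughly 192 GB smaller. One possible reading is that the file-system once spanned the whole array rather than this logical volume, but that is a guess.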