
Recovering my lost data: LVM and RAID

So I upgraded Ubuntu 9.10 to Ubuntu 11.10. When the system boots it says it cannot mount /store, my 960GB RAID+LVM file-system. That's the one that holds over 10 years of personal photographs and such. :-(

About the file-system: ''store''

There are many layers of indirection between the file-system and the physical storage when using LVM or RAID. When using both, the number of layers can seem excessive. Here's a diagram of the layers involved in my (lost) setup:
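In text form, the stack looks roughly like this (a reconstruction - the names are taken from the tool output further down this page):

  /store file-system
    -> logical volume: store_lv
      -> volume group: store_vg
        -> physical volume (LVM label) on the RAID block device: md0
          -> RAID5 members: sdb sdc sdd sde (sdf as hot spare)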

Note that the RAID block device, md0, is not partitioned. I believe that was a mistake on my part, and a likely reason why Ubuntu 11.10 cannot auto-detect it.

Problem statement

Since upgrading, system boot is interrupted with an error screen to the effect of “Cannot mount /store” and a prompt to either enter a root shell or skip mounting. From what I can see, the RAID array is detected without problems and is functioning correctly. So the system looks like this:

The RAID (multi-disk) status looks fine to me:

root@ikari:~# cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md127 : active raid5 sdb[1] sde[3] sdc[0] sdd[2] sdf[4](S)
      937713408 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      
unused devices: <none>
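For a second opinion on the array's health, mdadm can print per-member detail - only a sketch, but the flag is standard:

  mdadm --detail /dev/md127    # state, RAID level, and the role of each member disk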

but the resulting 960.2 GB block device contains a partition of type “Linux RAID autodetect” - which would suggest that it is itself a *member* of some other multi-disk setup. This, I believe, is human error on my part when I created the thing…

root@ikari:~# fdisk -l /dev/md127

Disk /dev/md127: 960.2 GB, 960218529792 bytes
255 heads, 63 sectors/track, 116739 cylinders, total 1875426816 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 196608 bytes
Disk identifier: 0xd71c877b

      Device Boot      Start         End      Blocks   Id  System
/dev/md127p1              63   625137344   312568641   fd  Linux RAID autodetect
Partition 1 does not start on physical sector boundary.

Recovery strategy

  1. Create a disk-image of md127, partition table and all (this required buying an external 2TB USB drive)
  2. Use LVM snapshots on this disk-image to make (and quickly roll back) experimental changes

Formatting the external USB drive

So I created a single “Linux LVM” partition on the 2TB disk, a single 1.8TB physical volume on it, and a single 1.8TB volume group containing that. On this I created a 1TB logical volume called lv_scratch and copied the contents of md127 to it (e.g. dd if=/dev/md127 of=/dev/vg_scratch/lv_scratch). Once the copy was made, I created a snapshot of lv_scratch, which I imaginatively called snap.
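In command form, that was roughly the following - a sketch reconstructed from memory and from the lvdisplay output below, not a transcript (the partition itself was made in fdisk with type 8e):

  pvcreate /dev/sdj1
  vgcreate vg_scratch /dev/sdj1
  lvcreate -L 1T -n lv_scratch vg_scratch
  dd if=/dev/md127 of=/dev/vg_scratch/lv_scratch bs=1M
  lvcreate -s -L 500G -n snap /dev/vg_scratch/lv_scratch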

LVM snapshots are interesting creatures. As the name suggests, the snapshot (named snap) holds the state of lv_scratch as it was when I created it. I can still read and write to lv_scratch, but the contents of snap will not change. This is ideal for making consistent backups. The snapshot works copy-on-write: before a block of lv_scratch is overwritten, the original block is first copied into the snapshot's copy-on-write (COW) volume. Reads of snap consult the COW volume - on a hit the preserved block is returned, otherwise the original (unchanged) block is read straight from lv_scratch. Deleting the snapshot simply discards the COW volume, leaving lv_scratch with whatever changes have accumulated on it in the meantime. Makes sense if you are used to copy-on-write behaviour.

Now here is where things get interesting. The snapshot, snap, does not have to be read-only: you can create it read-write. Doing so gives you a very cheap copy of lv_scratch, and any changes you make to the snapshot are stored in its COW table. You can discard the changes by deleting the snapshot. This is ideal for my situation: I want to experiment with the partition table, various file-system recovery tools and so on. I let these manipulate the snapshot, and if things go bad I delete and recreate the snapshot and try again.
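The throw-away-and-retry step is a two-liner (same names and sizes as above, so treat it as a sketch):

  lvremove /dev/vg_scratch/snap                            # discard the failed experiment
  lvcreate -s -L 500G -n snap /dev/vg_scratch/lv_scratch   # fresh snapshot, back to pristine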

root@ikari:~# fdisk -l /dev/sdj

Disk /dev/sdj: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000f0222

   Device Boot      Start         End      Blocks   Id  System
/dev/sdj1            2048  3907028991  1953513472   8e  Linux LVM
root@ikari:~# pvdisplay 
  /dev/dm-0: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-1: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-2: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  --- Physical volume ---
  PV Name               /dev/sdj1
  VG Name               vg_scratch
  PV Size               1.82 TiB / not usable 4.00 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              476931
  Free PE               86787
  Allocated PE          390144
  PV UUID               nrf9cQ-Asfz-Y2x2-SDoT-3ppu-mpEC-Fnuf8Z
   
root@ikari:~# vgdisplay
  /dev/dm-0: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-1: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-2: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  --- Volume group ---
  VG Name               vg_scratch
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  13
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               1.82 TiB
  PE Size               4.00 MiB
  Total PE              476931
  Alloc PE / Size       390144 / 1.49 TiB
  Free  PE / Size       86787 / 339.01 GiB
  VG UUID               Lk7UZP-48xF-vBPi-6g8F-sXlF-qyzy-pQNKgq
   
root@ikari:~# lvdisplay
  /dev/dm-0: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-1: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-2: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  --- Logical volume ---
  LV Name                /dev/vg_scratch/lv_scratch
  VG Name                vg_scratch
  LV UUID                aFBpgv-gqcd-jjLU-c7xO-Jyeb-2R0t-HpEF84
  LV Write Access        read/write
  LV snapshot status     source of
                         /dev/vg_scratch/snap [active]
  LV Status              available
  # open                 0
  LV Size                1.00 TiB
  Current LE             262144
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1
   
  --- Logical volume ---
  LV Name                /dev/vg_scratch/snap
  VG Name                vg_scratch
  LV UUID                OvOsQ7-uACi-xJVZ-vseu-fKEc-F73h-CmSalH
  LV Write Access        read/write
  LV snapshot status     active destination for /dev/vg_scratch/lv_scratch
  LV Status              available
  # open                 0
  LV Size                1.00 TiB
  Current LE             262144
  COW-table size         500.00 GiB
  COW-table LE           128000
  Allocated to snapshot  0.00% 
  Snapshot chunk size    4.00 KiB
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3

So here's the goal I'm aiming for on my external storage:
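In text form, roughly (a reconstruction; the names match the output below):

  sdj1 (PV)
    -> volume group: vg_scratch
         -> lv_scratch : byte-for-byte copy of md127
         -> snap       : writable snapshot of lv_scratch
              -> loop device at the right offset, exposing the nested PV
                   -> volume group: store_vg
                        -> store_lv, holding the /store file-system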

Recognising the nested LVM volumes

Since it's been a few days (and a few reboots) since I last worked on this, I'll start by plugging the USB drive in.

root@ikari:~# dmesg
[  479.180019] usb 2-5: new high speed USB device number 7 using ehci_hcd
[  479.313228] scsi13 : usb-storage 2-5:1.0
[  480.312605] scsi 13:0:0:0: Direct-Access     Seagate  Desktop          0130 PQ: 0 ANSI: 4
[  480.336633] sd 13:0:0:0: Attached scsi generic sg10 type 0
[  480.337029] sd 13:0:0:0: [sdi] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[  480.337671] sd 13:0:0:0: [sdi] Write Protect is off
[  480.337671] sd 13:0:0:0: [sdi] Mode Sense: 2f 08 00 00
[  480.340027] sd 13:0:0:0: [sdi] No Caching mode page present
[  480.340027] sd 13:0:0:0: [sdi] Assuming drive cache: write through
[  480.341806] sd 13:0:0:0: [sdi] No Caching mode page present
[  480.341811] sd 13:0:0:0: [sdi] Assuming drive cache: write through
[  480.357290]  sdi: sdi1
[  480.359346] sd 13:0:0:0: [sdi] No Caching mode page present
[  480.359350] sd 13:0:0:0: [sdi] Assuming drive cache: write through
[  480.359354] sd 13:0:0:0: [sdi] Attached SCSI disk

The (outer) LVM PVs are automatically detected, and their VGs and LVs are subsequently detected:

root@ikari:~# pvs
  PV         VG         Fmt  Attr PSize PFree  
  /dev/sdi1  vg_scratch lvm2 a-   1.82t 339.01g

root@ikari:~# vgs
  VG         #PV #LV #SN Attr   VSize VFree  
  vg_scratch   1   2   1 wz--n- 1.82t 339.01g

root@ikari:~# lvs
  LV         VG         Attr   LSize   Origin     Snap%  Move Log Copy%  Convert
  lv_scratch vg_scratch owi-a-   1.00t                                          
  snap       vg_scratch swi-a- 500.00g lv_scratch   0.00   

Somewhere on the snap logical volume is my nested LVM. I used xxd /dev/vg_scratch/snap | less and searched for LVM2. The first hit was a false positive (it appeared to have stripes of NULLs written across it), but the second hit looked plausible:

8018600: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018610: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018620: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018630: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018640: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018650: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018660: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018670: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018680: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018690: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80186a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80186b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80186c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80186d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80186e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80186f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018700: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018710: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018720: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018730: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018740: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018750: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018760: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018770: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018780: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018790: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80187a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80187b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80187c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80187d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80187e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80187f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018800: 4c41 4245 4c4f 4e45 0100 0000 0000 0000  LABELONE........
8018810: 9148 4053 2000 0000 4c56 4d32 2030 3031  .H@S ...LVM2 001
8018820: 5341 7536 6e32 7578 474c 5148 6743 5351  SAu6n2uxGLQHgCSQ
8018830: 6b56 6b5a 655a 4c78 7874 314b 7652 6a31  kVkZeZLxxt1KvRj1
8018840: 00f8 0391 df00 0000 0000 0300 0000 0000  ................
8018850: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018860: 0000 0000 0000 0000 0010 0000 0000 0000  ................
8018870: 00f0 0200 0000 0000 0000 0000 0000 0000  ................
8018880: 0000 0000 0000 0000 0000 0000 0000 0000  ................
8018890: 0000 0000 0000 0000 0000 0000 0000 0000  ................
80188a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................

I know from using xxd to examine a correct and functioning LVM2 partition (the PV behind vg_scratch as it happens) that the “LABELONE” label should sit at 0x200 - the start of the second 512-byte sector - which puts the “LVM2 001” text at 0x218. Here it was found at 0x8018800, so I'll create a loopback device with an offset chosen to make that happen:
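The offset is simply where the label was found minus where it belongs:

  printf '0x%x\n' $((0x8018800 - 0x200))    # prints 0x8018600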

root@ikari:~# losetup /dev/loop0 /dev/vg_scratch/snap --offset $((0x8018600))
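Before going further it's worth checking the alignment by reading the label back through the loop device (a sketch; -s seeks, -l limits the dump):

  xxd -s 0x200 -l 32 /dev/loop0    # should start with LABELONE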

root@ikari:~# lvmdiskscan 
  /dev/ram0  [      64.00 MiB] 
  /dev/loop0 [     894.15 GiB] LVM physical volume
  /dev/dm-0  [     186.27 GiB] 
  /dev/ram1  [      64.00 MiB] 
  /dev/sda1  [     294.09 GiB] 
  /dev/dm-1  [     894.27 GiB] 
  /dev/ram2  [      64.00 MiB] 
  /dev/dm-2  [     894.27 GiB] 
  /dev/ram3  [      64.00 MiB] 
  /dev/dm-3  [     894.27 GiB] 
  /dev/ram4  [      64.00 MiB] 
  /dev/dm-4  [     782.47 GiB] 
  /dev/ram5  [      64.00 MiB] 
  /dev/sda5  [       4.00 GiB] 
  /dev/dm-5  [     715.38 GiB] 
  /dev/ram6  [      64.00 MiB] 
  /dev/ram7  [      64.00 MiB] 
  /dev/ram8  [      64.00 MiB] 
  /dev/ram9  [      64.00 MiB] 
  /dev/ram10 [      64.00 MiB] 
  /dev/ram11 [      64.00 MiB] 
  /dev/ram12 [      64.00 MiB] 
  /dev/ram13 [      64.00 MiB] 
  /dev/ram14 [      64.00 MiB] 
  /dev/ram15 [      64.00 MiB] 
  /dev/sdb1  [       1.82 TiB] LVM physical volume
  0 disks
  24 partitions
  0 LVM physical volume whole disks
  2 LVM physical volumes

root@ikari:~# pvs
  PV         VG         Fmt  Attr PSize   PFree  
  /dev/loop0 store_vg   lvm2 a-   894.25g 178.88g
  /dev/sdb1  vg_scratch lvm2 a-     1.82t      0 

If lvmdiskscan doesn't work, you could try partprobe to tell the kernel to rescan its partition tables.
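And if the nested VG shows up but its logical volumes stay inactive, something along these lines should wake them (a sketch):

  vgscan                   # rescan all devices for volume groups
  vgchange -ay store_vg    # activate every LV in store_vg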

root@ikari:~# lvs
  LV         VG         Attr   LSize   Origin     Snap%  Move Log Copy%  Convert
  store_lv   store_vg   -wi-a- 715.38g                                          
  home_zfs   vg_scratch -wi-a- 186.27g                                          
  lv_scratch vg_scratch owi-a- 894.27g                                          
  snap       vg_scratch swi-ao 782.47g lv_scratch   0.00    