====== Recovering my lost data: LVM and RAID ======

So I upgraded Ubuntu 9.10 to Ubuntu 11.10. When the system boots it says it cannot mount ''/store'', my 960GB RAID+LVM file-system. That's the one that holds over 10 years of personal photographs and such. :-(

===== About the file-system: ''store'' =====

There are many layers of indirection between the file-system and the physical storage when using LVM or RAID. When using both, the number of layers can seem excessive. Here's a diagram of the layers involved in my (lost) setup:

<graphviz>
digraph G {
node [shape=box]

md0 [label="RAID5 block device: md0"]
sdb1 [label="AutoRAID partition: sdb1"]
sdc1 [label="AutoRAID partition: sdc1"]
sdd1 [label="AutoRAID partition: sdd1"]
sde1 [label="AutoRAID partition: sde1"]
sdf1 [label="AutoRAID partition: sdf1"]
sdb [label="Disk: 320GB SATA: sdb"]
sdc [label="Disk: 320GB SATA: sdc"]
sdd [label="Disk: 320GB SATA: sdd"]
sde [label="Disk: 320GB SATA: sde"]
sdf [label="Disk: 320GB SATA: sdf"]

fs_store [label="EXT4 file-system: store"]
lv_store [label="LVM logical volume: lv_store"]
vg_store [label="LVM volume group: vg_store"]
pv_store [label="LVM physical volume: pv_store"]

sdb1 -> sdb
sdc1 -> sdc
sdd1 -> sdd
sde1 -> sde
sdf1 -> sdf
md0 -> {sdb1 sdc1 sdd1 sde1 sdf1}
pv_store -> md0
vg_store -> pv_store
lv_store -> vg_store
fs_store -> lv_store
}
</graphviz>

Note that the RAID block device, ''md0'', is not partitioned. I believe that was a mistake on my part, and a likely reason why Ubuntu 11.10 cannot auto-detect it.

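For reference, a stack like this is built up roughly as follows. This is only a sketch, not the exact commands I ran back in the day, and the device names and sizes are illustrative:

<code>
# RAID5 across four partitions, with one hot spare
mdadm --create /dev/md0 --level=5 --raid-devices=4 --spare-devices=1 \
    /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1

# LVM directly on the RAID device (no partition table), then ext4 on top
pvcreate /dev/md0
vgcreate vg_store /dev/md0
lvcreate -n lv_store -l 100%FREE vg_store
mkfs.ext4 -L store /dev/vg_store/lv_store
</code>
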
===== Problem statement =====

Since the upgrade, system boot is interrupted by an error screen to the effect of "Cannot mount /store" and a prompt to either enter a root shell or skip mounting. From what I can see, the RAID array is detected without problems and is functioning correctly. So the system looks like this:

<graphviz>
digraph G {
node [shape=box]

md0 [label="RAID5 block device: md0 (appears to be unformatted)"]
sdb1 [label="AutoRAID partition: sdb1"]
sdc1 [label="AutoRAID partition: sdc1"]
sdd1 [label="AutoRAID partition: sdd1"]
sde1 [label="AutoRAID partition: sde1"]
sdf1 [label="AutoRAID partition: sdf1"]
sdb [label="Disk: 320GB SATA: sdb"]
sdc [label="Disk: 320GB SATA: sdc"]
sdd [label="Disk: 320GB SATA: sdd"]
sde [label="Disk: 320GB SATA: sde"]
sdf [label="Disk: 320GB SATA: sdf"]

sdb1 -> sdb
sdc1 -> sdc
sdd1 -> sdd
sde1 -> sde
sdf1 -> sdf
md0 -> {sdb1 sdc1 sdd1 sde1 sdf1}
}
</graphviz>

The RAID (multi-disk) status looks fine to me:

<code>
root@ikari:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : active raid5 sdb[1] sde[3] sdc[0] sdd[2] sdf[4](S)
      937713408 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>
</code>

but the resulting 960.2 GB block device contains a single partition of type "Linux RAID autodetect" - which would suggest that it is a //member// of some other multi-disk setup. This, I believe, is human error on my part when I created the thing...

<code>
root@ikari:~# fdisk -l /dev/md127

Disk /dev/md127: 960.2 GB, 960218529792 bytes
255 heads, 63 sectors/track, 116739 cylinders, total 1875426816 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 196608 bytes
Disk identifier: 0xd71c877b

      Device Boot      Start         End      Blocks   Id  System
/dev/md127p1              63   625137344   312568641   fd  Linux RAID autodetect
Partition 1 does not start on physical sector boundary.
</code>

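One way to sanity-check that theory would be to ask ''mdadm'' whether the device (or the partition inside it) actually carries a RAID superblock of its own. I would expect it not to; a sketch of the check:

<code>
# Does md127, or the partition it contains, look like a member of another array?
mdadm --examine /dev/md127
mdadm --examine /dev/md127p1
</code>
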
====== Recovery strategy ======

  - Create a disk-image of ''md127'', partition table and all (this required buying an external 2TB USB drive)
  - Use LVM snapshots on top of this disk-image to make (and quickly roll back) experimental changes

===== Formatting the external USB drive =====

So I created a single "Linux LVM" partition on the 2TB disk, a single 1.8TB physical volume on it, and a single 1.8TB volume group containing that. On this I created a 1TB logical volume called ''lv_scratch'' and copied the contents of ''md127'' to it (e.g. ''dd if=/dev/md127 of=/dev/vg_scratch/lv_scratch''). Once the copy was made, I created a snapshot of ''lv_scratch'', which I imaginatively called ''snap''.

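In terms of commands, that preparation looked roughly like this (a sketch rather than a saved transcript; the partition bounds and sizes are approximate):

<code>
# One big "Linux LVM" partition on the 2TB USB disk
parted /dev/sdj mklabel msdos
parted /dev/sdj mkpart primary 2048s 100%
parted /dev/sdj set 1 lvm on

# PV, VG and a 1TB scratch LV on top of it
pvcreate /dev/sdj1
vgcreate vg_scratch /dev/sdj1
lvcreate -n lv_scratch -L 1T vg_scratch

# Bit-for-bit copy of the broken RAID device into the scratch LV
dd if=/dev/md127 of=/dev/vg_scratch/lv_scratch bs=1M
</code>
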
LVM snapshots are interesting creatures. As the name suggests, the snapshot (named ''snap'') holds the state of ''lv_scratch'' as it was at the moment I created it. I can still read and write ''lv_scratch'', but the contents of ''snap'' will not change. This is ideal for making consistent backups. The snapshot works by copy-on-write (COW): before a block of ''lv_scratch'' is overwritten, the original block is first copied into the snapshot's COW area. Reads from ''snap'' consult the COW area - if the block has been saved there it is returned, otherwise the (unchanged) block is read straight from ''lv_scratch''. Deleting the snapshot simply discards the COW area; ''lv_scratch'' itself already holds whatever changes were made to it. Makes sense if you are used to copy-on-write behaviour.

Now here is where things get interesting. The snapshot, ''snap'', does not have to be read-only: you can create it read-write. Doing so gives you a very cheap copy of ''lv_scratch'', and any changes you make to the snapshot are themselves stored in its COW area. You can discard those changes at any time by deleting the snapshot. Ideal for my situation: I want to experiment with the partition table, various file-system recovery tools and so on. I let these manipulate the snapshot, and if things go bad I delete and recreate the snapshot and try again.

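So the experiment cycle looks something like this (a sketch; the snapshot only needs to be large enough to absorb whatever the recovery tools write):

<code>
# Take a writable snapshot of the pristine copy
lvcreate --snapshot --size 500G --name snap vg_scratch/lv_scratch

# ... poke at /dev/vg_scratch/snap with fdisk, fsck and friends ...

# Things went bad? Throw the changes away and start over
lvremove vg_scratch/snap
lvcreate --snapshot --size 500G --name snap vg_scratch/lv_scratch
</code>
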
<graphviz>
digraph G {
sdj [label="2TB USB disk: sdj"]
sdj1 [label="Linux LVM partition: sdj1"]
pv_scratch [label="LVM physical volume"]
vg_scratch [label="LVM volume group: vg_scratch"]
lv_scratch [label="LVM logical volume: lv_scratch"]
snap [label="LVM logical volume: snap"]

lv_scratch -> vg_scratch -> pv_scratch -> sdj1 -> sdj
snap -> vg_scratch

snap -> lv_scratch [style="dashed",arrowhead="none"]
}
</graphviz>

<code>
root@ikari:~# fdisk -l /dev/sdj

Disk /dev/sdj: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000f0222

   Device Boot      Start         End      Blocks   Id  System
/dev/sdj1            2048  3907028991  1953513472   8e  Linux LVM
</code>

<code>
root@ikari:~# pvdisplay
  /dev/dm-0: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-1: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-2: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  --- Physical volume ---
  PV Name               /dev/sdj1
  VG Name               vg_scratch
  PV Size               1.82 TiB / not usable 4.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              476931
  Free PE               86787
  Allocated PE          390144
  PV UUID               nrf9cQ-Asfz-Y2x2-SDoT-3ppu-mpEC-Fnuf8Z

root@ikari:~# vgdisplay
  /dev/dm-0: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-1: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-2: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  --- Volume group ---
  VG Name               vg_scratch
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  13
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               1.82 TiB
  PE Size               4.00 MiB
  Total PE              476931
  Alloc PE / Size       390144 / 1.49 TiB
  Free  PE / Size       86787 / 339.01 GiB
  VG UUID               Lk7UZP-48xF-vBPi-6g8F-sXlF-qyzy-pQNKgq

root@ikari:~# lvdisplay
  /dev/dm-0: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-1: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-2: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  --- Logical volume ---
  LV Name                /dev/vg_scratch/lv_scratch
  VG Name                vg_scratch
  LV UUID                aFBpgv-gqcd-jjLU-c7xO-Jyeb-2R0t-HpEF84
  LV Write Access        read/write
  LV snapshot status     source of
                         /dev/vg_scratch/snap [active]
  LV Status              available
  # open                 0
  LV Size                1.00 TiB
  Current LE             262144
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1

  --- Logical volume ---
  LV Name                /dev/vg_scratch/snap
  VG Name                vg_scratch
  LV UUID                OvOsQ7-uACi-xJVZ-vseu-fKEc-F73h-CmSalH
  LV Write Access        read/write
  LV snapshot status     active destination for /dev/vg_scratch/lv_scratch
  LV Status              available
  # open                 0
  LV Size                1.00 TiB
  Current LE             262144
  COW-table size         500.00 GiB
  COW-table LE           128000
  Allocated to snapshot  0.00%
  Snapshot chunk size    4.00 KiB
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3
</code>

So here's the goal I'm aiming for on my external storage:

<graphviz>
digraph G {
sdj [label="2TB USB disk: sdj"]
sdj1 [label="Linux LVM partition: sdj1"]
pv_scratch [label="LVM physical volume"]
vg_scratch [label="LVM volume group: vg_scratch"]
lv_scratch [label="LVM logical volume: lv_scratch"]
snap [label="LVM logical volume: snap"]

lv_scratch -> vg_scratch -> pv_scratch -> sdj1 -> sdj
snap -> vg_scratch
snap -> lv_scratch [style="dashed",arrowhead="none"]

node [shape=box]
fs_store [label="EXT4 file-system: store"]
lv_store [label="LVM logical volume: lv_store"]
vg_store [label="LVM volume group: vg_store"]
pv_store [label="LVM physical volume: pv_store"]

pv_store -> snap
vg_store -> pv_store
lv_store -> vg_store
fs_store -> lv_store
}
</graphviz>

====== Recognising the nested LVM volumes ======

Since it's been a few days and several reboots since I last worked on this, I'll start by plugging the USB drive in.

<code>
root@ikari:~# dmesg
[  479.180019] usb 2-5: new high speed USB device number 7 using ehci_hcd
[  479.313228] scsi13 : usb-storage 2-5:1.0
[  480.312605] scsi 13:0:0:0: Direct-Access     Seagate  Desktop          0130 PQ: 0 ANSI: 4
[  480.336633] sd 13:0:0:0: Attached scsi generic sg10 type 0
[  480.337029] sd 13:0:0:0: [sdi] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[  480.337671] sd 13:0:0:0: [sdi] Write Protect is off
[  480.337671] sd 13:0:0:0: [sdi] Mode Sense: 2f 08 00 00
[  480.340027] sd 13:0:0:0: [sdi] No Caching mode page present
[  480.340027] sd 13:0:0:0: [sdi] Assuming drive cache: write through
[  480.341806] sd 13:0:0:0: [sdi] No Caching mode page present
[  480.341811] sd 13:0:0:0: [sdi] Assuming drive cache: write through
[  480.357290]  sdi: sdi1
[  480.359346] sd 13:0:0:0: [sdi] No Caching mode page present
[  480.359350] sd 13:0:0:0: [sdi] Assuming drive cache: write through
[  480.359354] sd 13:0:0:0: [sdi] Attached SCSI disk
</code>

The (outer) LVM PVs are automatically detected, and their VGs and LVs are subsequently detected too:

<code>
root@ikari:~# pvs
  PV         VG         Fmt  Attr PSize PFree
  /dev/sdi1  vg_scratch lvm2 a-   1.82t 339.01g
root@ikari:~# vgs
  VG         #PV #LV #SN Attr   VSize VFree
  vg_scratch       1 wz--n- 1.82t 339.01g
root@ikari:~# lvs
  LV         VG         Attr   LSize   Origin     Snap%  Move Log Copy%  Convert
  lv_scratch vg_scratch owi-a-   1.00t
  snap       vg_scratch swi-a- 500.00g lv_scratch   0.00
</code>

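Had they not shown up on their own, a rescan and activation can be requested by hand. The usual incantation, shown here only as a sketch since I didn't need it:

<code>
pvscan
vgscan
vgchange -ay vg_scratch
</code>
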
Now I know that the ''snap'' logical volume contains the PV I copied from my defunct RAID array, and ''fdisk'' seems to confirm this:

<code>
root@ikari:~# fdisk -l /dev/vg_scratch/snap

Disk /dev/vg_scratch/snap: 1099.5 GB, 1099511627776 bytes
1 heads, 1 sectors/track, -2147483648 cylinders, total 2147483648 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xd71c877b

               Device Boot      Start         End      Blocks   Id  System
/dev/vg_scratch/snap1             195  1875411646   937705726   8e  Linux LVM
</code>

So, to make the Linux kernel notice this, it must re-read the partition table. There's a tool for that:

<code>
root@ikari:~# whatis partprobe
partprobe (8)        - inform the OS of partition table changes
root@ikari:~# partprobe --help
Usage: partprobe [OPTION] [DEVICE]...
Inform the operating system about partition table changes.

  -d, --dry-run    do not actually inform the operating system
  -s, --summary    print a summary of contents
  -h, --help       display this help and exit
  -v, --version    output version information and exit

When no DEVICE is given, probe all partitions.

Report bugs to <bug-parted@gnu.org>.
root@ikari:~# partprobe /dev/vg_scratch/snap
device-mapper: deps ioctl failed: No such device or address

root@ikari:~# pvs
  PV         VG         Fmt  Attr PSize   PFree
  /dev/dm-4  store_vg   lvm2 a-   894.25g  78.88g
  /dev/sdi1  vg_scratch lvm2 a-     1.82t 339.01g
</code>

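An alternative worth knowing about is ''kpartx'' (shipped in the ''kpartx'' package on Ubuntu), which creates device-mapper entries for the partitions it finds on a device. For a partition table living inside an LVM volume it is arguably the more natural tool; a sketch:

<code>
kpartx -av /dev/vg_scratch/snap
</code>
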
So, despite the error about some ioctl failing, my (inner) LVM copy has become visible!

<code>
root@ikari:~# vgs
  VG         #PV #LV #SN Attr   VSize   VFree
  store_vg         1 wz--n- 894.25g  78.88g
  vg_scratch       1 wz--n-   1.82t 339.01g
root@ikari:~# lvs
  LV         VG         Attr   LSize   Origin     Snap%  Move Log Copy%  Convert
  store_lv   store_vg   owi-a- 715.38g
  store_snap store_vg   swi-a- 100.00g store_lv     0.00
  lv_scratch vg_scratch owi-a-   1.00t
  snap       vg_scratch swi-ao 500.00g lv_scratch   0.00
root@ikari:~#
</code>
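
From here the obvious next step is to activate the inner volume group and mount my copy of the file-system read-only to see what survived. Something along these lines (a sketch, not a verified transcript):

<code>
vgchange -ay store_vg
mkdir -p /mnt/recovered
mount -o ro /dev/store_vg/store_lv /mnt/recovered
</code>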
  