So whatcha gonna do when you have a Lustre file system, with an ext4 backing store with a journal on an external RAID1 SSD pair, when that external RAID1 SSD pair goes away (in a non-recoverable manner), and the file system has the needs_recovery flag set?
You see, the ‘-f’ option to e2fsck … doesn’t … in the face of a missing external journal with needs_recovery set.
Ok, you can turn off the journal.
tune2fs -O ^has_journal /dev/...
… oh, but that doesn’t work because of the needs_recovery flag.
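If you want to confirm which flags the superblock is carrying before you start cutting, a small check along these lines can help. This is only a sketch: has_feature is a made-up helper name, not something shipped with e2fsprogs, and it just looks for a feature name on the "Filesystem features:" line that dumpe2fs -h and debugfs both print.

```shell
# has_feature is a hypothetical helper (not part of e2fsprogs).
# $1 = feature name, $2 = text containing a "Filesystem features:" line.
# Succeeds only if the feature appears as a whole word on that line.
has_feature() {
    printf '%s\n' "$2" | grep '^Filesystem features:' | grep -qw -- "$1"
}

# With real hardware (device path elided, as in the commands above):
# features=$(dumpe2fs -h /dev/... 2>/dev/null)
# has_feature needs_recovery "$features" && echo "needs_recovery is set"
```

The whole-word match matters, since a plain substring search for a short feature name could hit a longer one by accident.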
Yeah, about this time, you have visions of wiping out the file system. A nice cleansing ‘mkfs…’ would save you large swaths of time. But there is data in them thar sectors.
So you decide to do some auto-neurosurgery on the file system.
Light up debugfs
debugfs -b4096 -w /dev/...
when it finally gives you a prompt back, do this
debugfs: feature -has_journal
Filesystem features: ext_attr resize_inode dir_index filetype needs_recovery extent sparse_super large_file uninit_bg
debugfs: feature -needs_recovery
Filesystem features: ext_attr resize_inode dir_index filetype extent sparse_super large_file uninit_bg
Yeah, we just told it we don’t need recovery, and it has no journal. Not very good, but it’s better than wiping for the moment. Oh, and the journal uuid is now likely quite different, so it’s a good idea to fix that as well.
debugfs: ssv journal_uuid UUID
where UUID comes from the output of the blkid program … point it to your new RAID1.
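Since that UUID gets written straight into a superblock, it may be worth sanity-checking the string before handing it to debugfs. A minimal sketch, assuming the usual 8-4-4-4-12 hex layout that blkid reports; is_uuid is a hypothetical helper name, and the commented lines use blkid's -s and -o value options to print only the UUID field.

```shell
# is_uuid is a hypothetical helper: succeeds if $1 looks like a canonical
# 8-4-4-4-12 hexadecimal UUID, fails on anything else (including empty).
is_uuid() {
    printf '%s\n' "$1" |
        grep -Eqx '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}'
}

# With real hardware (placeholder device name from the text):
# uuid=$(blkid -s UUID -o value /dev/NEW_RAID1)
# is_uuid "$uuid" && echo "ssv journal_uuid $uuid"
```

The last line only prints the debugfs request, so you can eyeball it before typing it at the debugfs prompt.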
Oh … but we aren’t done. Now we have to do some rebuilding of a blank external journal, and then run a forced fsck.
mke2fs -b 4096 -O journal_dev /dev/NEW_RAID1
tune2fs -f -O has_journal /dev/OST
(Technically, this keeps the journal internal to the RAID rather than external. There are a number of reasons why we are doing this here for this fix, which is temporary. If we really wanted the external journal, we would have added -J device=/dev/NEW_RAID1 to the tune2fs above.)
and then the fsck
e2fsck -fv /dev/OST
Now that wasn’t too painful … was it?
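For the notebook, here is the whole sequence as a dry run. It is a sketch under a few assumptions: print_plan is a hypothetical helper that only prints commands and runs nothing; the two feature changes are folded into a single non-interactive debugfs -R request instead of being typed at the prompt; and the ssv journal_uuid step is left out, since it needs a UUID you have to look up yourself.

```shell
# print_plan is a hypothetical helper: it prints the recovery commands in
# order and executes none of them.
# $1 = device holding the ext4 OST, $2 = replacement journal device.
print_plan() {
    ost=$1
    newjournal=$2
    cat <<EOF
debugfs -b4096 -w -R "feature -has_journal -needs_recovery" $ost
mke2fs -b 4096 -O journal_dev $newjournal
tune2fs -f -O has_journal $ost
e2fsck -fv $ost
EOF
}

# Placeholder device names, matching the ones used above.
print_plan /dev/OST /dev/NEW_RAID1
```

Read the printed lines against your own device names before running any of them by hand; every one of these writes to disk.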