A 'cool' xfs bug
By joe
No, really, bugs can be cool … A customer has a user with a proclivity for writing large files. Sparse large files. Say, 7 Petabytes or so. In a single file. I kid you not. (Filenames and paths have been changed.)
[root@jr4-2 ~]# ls -alF /data/brick-sdd2/dht/scratch/xyzpdq
total 4652823496
d--------- 2 1232 1000 86 Jun 27 20:31 ./
drwx------ 104 1232 1000 65536 Aug 17 23:53 ../
-rw------- 1 1232 1000 21 Jun 27 09:57 Default.Route
-rw------- 1 1232 1000 250 Jun 27 09:57 Gau-00000.inp
-rw------- 1 1232 1000 0 Jun 27 09:57 Gau-00000.d2e
-rw------- 1 1232 1000 7800416534233088 Jun 27 20:18 Gau-00000.rwf
[root@jr4-2 ~]# ls -ahlF /data/brick-sdd2/dht/scratch/xyzpdq
total 4.4T
d--------- 2 1232 1000 86 Jun 27 20:31 ./
drwx------ 104 1232 1000 64K Aug 17 23:53 ../
-rw------- 1 1232 1000 21 Jun 27 09:57 Default.Route
-rw------- 1 1232 1000 250 Jun 27 09:57 Gau-00000.inp
-rw------- 1 1232 1000 0 Jun 27 09:57 Gau-00000.d2e
-rw------- 1 1232 1000 7.0P Jun 27 20:18 Gau-00000.rwf
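Why do the two listings disagree? The per-file size column is the apparent size (st_size). The total line counts blocks that are actually allocated, in 1 KiB units by default in GNU ls, so the 4652823496 in the first listing and the 4.4T in the -h listing are the same figure. Below is a minimal Python sketch that reproduces the effect at small scale. It assumes Linux, where st_blocks is reported in 512-byte units, and a temp directory on a file system that supports sparse files. make_sparse_file and apparent_and_allocated are hypothetical helper names.

```python
import os
import tempfile


def make_sparse_file(path: str, size: int, data_offset: int) -> None:
    """Create a `size`-byte file whose only real data is one 4 KiB block."""
    with open(path, "wb") as f:
        f.seek(data_offset)      # skip ahead; the skipped range becomes a hole
        f.write(b"x" * 4096)     # one block of actual data
        f.truncate(size)         # extend the apparent size without allocating


def apparent_and_allocated(path: str) -> tuple[int, int]:
    """Return (apparent size, allocated bytes) for `path`.

    st_size is what `ls -l` shows per file; st_blocks (assumed 512-byte
    units, as on Linux) is what the `total` line and `ls -s` are built from.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "sparse.bin")
        make_sparse_file(p, 1 << 30, 1 << 20)   # 1 GiB apparent, 4 KiB written
        size, alloc = apparent_and_allocated(p)
        print(f"apparent: {size} bytes, allocated: {alloc} bytes")
```

The apparent size should come out at exactly 1 GiB, and the allocated figure should be far smaller. The exact allocated number depends on the file system.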
Note the gap: Gau-00000.rwf claims 7.0P, but the whole directory has only about 4.4T of blocks actually allocated. The rest is holes.

Now suppose the unit crashes for some reason, and you have to run xfs_repair. Among other things, xfs_repair walks every file to make sure its extents line up correctly. So … when it hits a large sparse file like this, what do you think happens?

Can you say … "denial of service" … as a side effect? No, xfs_repair does the right (literal) thing: it attempts to find all the extents of a 7 PB file. You might guess that this takes a while. You would be right.

As soon as my subscription to the xfs mailing list is active, we'll send them a note about it. And I'll be filing a bug or two over this …

How can you tell you are in this position? Easy. Run xfs_repair. If it is taking an inordinate amount of time, say more than a few hours, strace the process. If you see lots of pread(…)=4096 happening about once per second, then yes, that's xfs_repair doing one 4k read per second, trying to find your extents. Whoops.

Kinda neat, but I suspect some folks would prefer that sparse file handling in xfs not include this. FWIW, the fix is to delete that file. Once that's done, xfs_repair happily completes pretty quickly.
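Since deleting the file is the fix, it can help to know where such files are before a repair is needed. Here is a hedged Python sketch for spotting candidates. It assumes the file system can still be mounted and that st_blocks is in 512-byte units, as on Linux. The function name find_sparse_giants and the thresholds MIN_APPARENT and MAX_ALLOC_RATIO are hypothetical choices, not anything from xfsprogs. The script only reads metadata via lstat and never opens file contents.

```python
import os
import stat
import sys

# Hypothetical thresholds: flag regular files at least MIN_APPARENT bytes
# long that have less than MAX_ALLOC_RATIO of their apparent size allocated.
MIN_APPARENT = 1 << 40   # 1 TiB
MAX_ALLOC_RATIO = 0.5


def find_sparse_giants(root, min_apparent=MIN_APPARENT, max_ratio=MAX_ALLOC_RATIO):
    """Yield (path, apparent_bytes, allocated_bytes) for large, mostly-hole files."""
    for dirpath, _dirnames, filenames in os.walk(root):   # does not follow dir symlinks
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)   # lstat: don't follow symlinks to other files
            except OSError:
                continue              # vanished or unreadable; skip it
            if not stat.S_ISREG(st.st_mode) or st.st_size < min_apparent:
                continue
            allocated = st.st_blocks * 512   # assumes Linux 512-byte st_blocks units
            if allocated < st.st_size * max_ratio:
                yield path, st.st_size, allocated


if __name__ == "__main__" and len(sys.argv) > 1:
    for path, apparent, allocated in find_sparse_giants(sys.argv[1]):
        print(f"{path}: apparent {apparent} bytes, allocated {allocated} bytes")
```

Consider the file in the listing above: it has an apparent size of 7800416534233088 bytes and at most about 4.4T allocated. That puts it well under 0.1% allocated, so it would be flagged by any reasonable ratio.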