A ‘cool’ xfs bug

No, really, bugs can be cool …

Customer has a user with a proclivity towards writing large files. Sparse large files. Say a couple Petabytes or so. Single file. I kid you not.

(filenames and paths changed)

[root@jr4-2 ~]# ls -alF /data/brick-sdd2/dht/scratch/xyzpdq
total 4652823496
d---------   2 1232 1000               86 Jun 27 20:31 ./
drwx------ 104 1232 1000            65536 Aug 17 23:53 ../
-rw-------   1 1232 1000               21 Jun 27 09:57 Default.Route
-rw-------   1 1232 1000              250 Jun 27 09:57 Gau-00000.inp
-rw-------   1 1232 1000                0 Jun 27 09:57 Gau-00000.d2e
-rw-------   1 1232 1000 7800416534233088 Jun 27 20:18 Gau-00000.rwf

[root@jr4-2 ~]# ls -ahlF /data/brick-sdd2/dht/scratch/xyzpdq
total 4.4T
d---------   2 1232 1000   86 Jun 27 20:31 ./
drwx------ 104 1232 1000  64K Aug 17 23:53 ../
-rw-------   1 1232 1000   21 Jun 27 09:57 Default.Route
-rw-------   1 1232 1000  250 Jun 27 09:57 Gau-00000.inp
-rw-------   1 1232 1000    0 Jun 27 09:57 Gau-00000.d2e
-rw-------   1 1232 1000 7.0P Jun 27 20:18 Gau-00000.rwf

[OMG]

Now suppose the unit crashes for some reason, and you have to run xfs_repair. Among other things, xfs_repair walks the files to make sure their extents line up correctly.

So … when it hits a large sparse file like this, whatcha think happens?

Can you say … “Denial of service” … as a side effect? To be fair, xfs_repair does the right (literal) thing and attempts to find all the extents of a 7 PB file. You might guess that this takes a while. You would be right.

As soon as my subscription to the xfs mailing list goes through, we’ll send them a note on it. And I’ll be filing a bug or two over this …

How can you tell you are in this position? Easy. Run xfs_repair. If it is taking an inordinate amount of time, say more than a few hours, strace the process. If you see lots of pread(…)=4096 calls arriving at about one per second, then yes, that's the repair doing a 4k read, once per second, to try to find your extents. Whoops.
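To put that rate in perspective, here is a back-of-the-envelope sketch in Python. It is not a model of what xfs_repair actually does. It assumes, hypothetically, that one 4k read per second is needed for every 4k of data, and brackets the answer using the two sizes from the ls output above: the apparent file size, and the "total" line (taken to be 1K blocks, the GNU ls default) as a rough stand-in for what is really allocated. The function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_at_read_rate(total_bytes, read_size=4096, reads_per_second=1.0):
    """Return (number of reads, years needed) to cover total_bytes.

    Assumption: one read of read_size bytes per read_size bytes of data,
    at a steady reads_per_second. This is an illustration only, not a
    description of xfs_repair internals.
    """
    reads = total_bytes // read_size
    years = reads / reads_per_second / SECONDS_PER_YEAR
    return reads, years


# Apparent size of Gau-00000.rwf, from `ls -alF`.
apparent_reads, apparent_years = years_at_read_rate(7800416534233088)

# Directory "total" from `ls -alF`, assumed to be in 1 KiB blocks.
allocated_reads, allocated_years = years_at_read_rate(4652823496 * 1024)

print(f"apparent:  {apparent_reads} reads, about {apparent_years:.0f} years")
print(f"allocated: {allocated_reads} reads, about {allocated_years:.0f} years")
```

Under those assumptions, even the smaller figure works out to decades, which is why "more than a few hours" is already a good hint to reach for strace.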

Kinda neat, but I am sure that some folks would probably like sparse file behavior in xfs not to include this. FWIW, the fix is to delete that file. Once that's done, xfs_repair happily completes pretty quickly.
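If you would rather find such files before a repair does, a minimal sketch along these lines may help. It walks a mounted tree and flags regular files whose apparent size dwarfs what is actually allocated on disk. The function name and both thresholds are my own choices for illustration, and it relies on st_blocks being counted in 512-byte units, which holds on Linux.

```python
import os
import stat


def find_sparse_files(root, min_apparent_bytes=1 << 40, min_ratio=10.0):
    """Yield (path, apparent_bytes, allocated_bytes) for suspicious sparse files.

    A file is reported when its apparent size is at least min_apparent_bytes
    and more than min_ratio times its allocated size. Both thresholds are
    arbitrary defaults, not anything xfs prescribes.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue
            apparent = st.st_size
            allocated = st.st_blocks * 512  # st_blocks is in 512-byte units
            # max(allocated, 1) keeps fully unallocated files from being skipped
            if apparent >= min_apparent_bytes and apparent > min_ratio * max(allocated, 1):
                yield path, apparent, allocated
```

Calling `find_sparse_files("/some/mount/point")` (a placeholder path) and printing each tuple gives a short list of candidates to look at. Whether any of them should be deleted is a conversation to have with the file's owner.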
