Tuesday, March 3, 2015

Btrfs filesystem df

Do you know how hard it is to make backups of multiple terabytes? How tall a stack of floppies that would be? Yes, I have backups of the important stuff I can't easily re-get, spread across several different computers.

With that out of the way: I was getting uncomfortably close to running out of space on my main system. It only had a couple of hundred gigabytes left. (I remember when a 1.2GB system was so unimaginably vast that I split it up into 4 pieces.) However, the filesystem in question is btrfs, running as raid1 on 3x 2TB drives. I had always intended for that system to be raid5, which is why I got 3 drives for it. Since raid5 on btrfs wasn't ready for prime time, I set it up as raid1 and waited for the day when raid5 became usable.

With kernel 3.19, that day finally came. A quick (not really, it took many hours) rebalance later, and raid1 became raid5 without even taking the filesystem offline. Now I go from 3TB of total usable space (6TB of device space divided by 2, the raid1 duplication factor) to 4TB (effectively two drives of data, one of parity).
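
For the record, the conversion itself is just a balance with convert filters. The command was along these lines (quoting the shape from memory rather than my shell history, so check the btrfs-balance man page before copying it):

btrfs balance start -dconvert=raid5 -mconvert=raid5 /mnt/big

A balance rewrites every existing chunk in the new profile, which is why it takes hours with a couple of terabytes already on the disks.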

But do I actually get that 4TB? I know that the normal df command is basically useless for reporting free space on btrfs, so I do the following:

root@omoikane ~
# btrfs fi df /mnt/big
Data, RAID5: total=2.48TiB, used=2.46TiB
System, RAID5: total=64.00MiB, used=348.00KiB
Metadata, RAID5: total=6.94GiB, used=3.80GiB
GlobalReserve, single: total=512.00MiB, used=3.54MiB


What gives? I expected something like 4TB of space, and no amount of TiB-to-TB conversion accounts for a gap that big (4TB works out to about 3.64TiB, not 2.48TiB). In fact, this says I have less space than I did before converting to raid5. What did I do wrong? As it turns out, nothing. First, look at this:

root@omoikane ~
$ btrfs fi show /mnt/big
Label: none  uuid: 6bf52a05-b0cc-4b21-96db-32cc9a1bed7d
        Total devices 3 FS bytes used 2.47TiB
        devid    1 size 1.82TiB used 1.24TiB path /dev/sdb
        devid    2 size 1.82TiB used 1.24TiB path /dev/sdc
        devid    3 size 1.82TiB used 1.24TiB path /dev/sdd

btrfs-progs v3.19-rc2


Not all of the disks are even fully allocated. If you add up the allocated space on two of them, you get roughly the 2.48TiB reported above; you only count two of the three because the third drive's share of each raid5 stripe is parity. So why aren't my disks fully allocated? Is it like mdadm, where I have to do something manually to claim that space? Again, that turns out not to be the case.
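
Spelled out with the numbers from fi show (raid5 across n equal drives gives n-1 drives' worth of usable space):

1.24 TiB + 1.24 TiB = 2.48 TiB of raid5 capacity allocated so far
(3 - 1) x 1.82 TiB  = 3.64 TiB usable once the drives are fully allocated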

To test this, I opened up two terminals. In one, I made a large file by dd'ing /dev/urandom into it. I used urandom instead of /dev/zero so I wouldn't have to wonder whether the zeros were being stored as file holes.
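
The writer side was nothing fancy; roughly this, with the path and the (roughly 100GiB) size being arbitrary placeholders rather than what I actually typed:

dd if=/dev/urandom of=/mnt/big/bigfile bs=1M count=102400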

In the other terminal, I watched the output of btrfs fi df, in a slightly more fine-grained form (the -k switch reports in KiB instead of the rounded human-readable units):

me@omoikane ~
$ btrfs fi df -k /mnt/big
Data, RAID5: total=2658172928.00KiB, used=2643827200.00KiB
System, RAID5: total=65536.00KiB, used=348.00KiB
Metadata, RAID5: total=7274496.00KiB, used=3990868.00KiB
GlobalReserve, single: total=524288.00KiB, used=0.00KiB
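
Re-running that by hand gets old quickly; wrapping it in watch does the refreshing for you (the one-second interval here is arbitrary, and this is just a sketch of the idea rather than my exact session):

watch -n 1 btrfs fi df -k /mnt/big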

As the dd ran, I watched this report and saw the total amount of raid5 data space grow (the Data total figure above). At that point I knew what was going on: btrfs was automatically growing the data allocation as needed. The unallocated space reported by fi show simply gets carved into new chunks on demand, with no rebalance or manual step required; it isn't reserved for snapshots (a feature I don't use) or anything else in particular. In any case, the space really is there: the fully allocatable space is 3.64TiB, of which 2.47TiB is used, meaning I now have over a full TiB available. The raid5 conversion worked.

Interestingly, when I deleted the big file, the amount of space allocated to raid5 data decreased.
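
One more note: recent btrfs-progs also grew a btrfs fi usage subcommand that reports unallocated space and an estimated free figure directly, so you don't have to do this arithmetic by hand (though, as I recall, its raid5/6 accounting was still flagged as approximate at this point):

btrfs fi usage /mnt/big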