2015-05-09 discovering bitrot with MD5

Unless you use a checksumming filesystem such as ZFS or btrfs, your files aren't protected against so-called "bitrot": silent corruption of stored data. If a file gets corrupted, you won't notice, and by the time you do, your normal backups might no longer contain a good copy. The corrupted file has been backed up in the meantime, and your archives don't go back far enough.

There are a number of ways to protect against this:

I went with the second option. md5deep is easily installable via Homebrew:

 $ brew install md5deep

To generate a file with hashes:

 $ md5deep -r -l testdirectory > testdirectory.md5deep

Explanation: -r means recursive, -l means relative paths. This creates a file called "testdirectory.md5deep", which lists every file under testdirectory with its hash and its path.

To print all changed (damaged) files:

 $ md5deep -r -l -x testdirectory.md5deep testdirectory

Run this check regularly against a directory that should never change, for example your archive of last year's family pictures. If one of the pictures has been corrupted, you know you should restore it from backup.
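The periodic check can be wrapped in a small function, so that a scheduled job can act on the result. This is only a sketch: check_bitrot is a made-up name, and it assumes that md5deep -x prints nothing when every file still matches the hash list.

```shell
# Bash sketch; check_bitrot is a hypothetical helper, not part of md5deep.
# Usage: check_bitrot testdirectory.md5deep testdirectory
check_bitrot() {
    local hashfile="$1" dir="$2" changed
    # Same flags as above: recursive, relative paths, negative matching.
    changed=$(md5deep -r -l -x "$hashfile" "$dir")
    if [ -n "$changed" ]; then
        # Report the mismatches on stderr and signal failure to the caller.
        printf 'Files that no longer match %s:\n%s\n' "$hashfile" "$changed" >&2
        return 1
    fi
    return 0
}
```

For example: check_bitrot testdirectory.md5deep testdirectory || echo "restore from backup". Keep in mind that files added after the hash list was generated are likely to be reported too, since their hashes are not in the list.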

If you don't mind using a bit of extra space (and a bit of additional CPU time), you can use par2, which stores recovery data next to your files so that damage can be detected and also repaired. It installs nicely via Homebrew as well:

 $ brew install par2

Example command inside the directory with the files of your choice:

 $ cd testdirectory
 $ par2create par2file *

To verify:

 $ cd testdirectory
 $ par2verify par2file.par2
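Verification pairs naturally with repair, since par2 can rebuild damaged files from its recovery data. A minimal sketch, assuming par2verify exits with a non-zero status when something is wrong (verify_or_repair is a made-up name):

```shell
# Bash sketch; verify_or_repair is a hypothetical helper.
# Usage: cd testdirectory && verify_or_repair par2file.par2
verify_or_repair() {
    local par2="$1"
    # Nothing to do when verification succeeds.
    if par2verify "$par2"; then
        return 0
    fi
    # Otherwise try to rebuild the damaged files from the recovery data.
    par2repair "$par2"
}
```

If the damage exceeds the amount of recovery data, the repair will fail and a restore from backup is still needed.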

As an indication, a directory containing 1836 megabytes of photos and videos resulted in a set of par2 files totalling 93 megabytes, so about 5% of extra storage is necessary.
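The percentage is simply the size of the par2 files divided by the size of the data. A small helper to redo the calculation for your own directories (overhead_percent is a made-up name, and both sizes are assumed to be in the same unit):

```shell
# Bash sketch; overhead_percent is a hypothetical helper.
# Usage: overhead_percent <data size> <par2 size>
overhead_percent() {
    awk -v data="$1" -v parity="$2" \
        'BEGIN { printf "%.1f\n", parity / data * 100 }'
}

overhead_percent 1836 93   # prints 5.1
```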

To go through all subdirectories and create a set of par2 files in each of them, I use the following one-liner on macOS:

 $ START=$(pwd); IFS=$'\n'; for i in $(find . -type d); do if [ "$i" = "." ]; then continue; fi; cd "${i##./}" || continue; pwd; par2create par2file *.*; cd "$START"; done

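The *.* after par2create is there to keep subfolders out of the arguments to par2create: it only matches names that contain a dot. Note that this also skips files without an extension, and that a subfolder with a dot in its name would still be matched.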
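Since *.* selects entries by name rather than by type, a variant that tests each entry explicitly may be more robust. The following is a sketch rather than the command I actually use: par2_each_dir is a made-up name, and skipping existing *.par2 files is my own assumption about what you want on a second run.

```shell
# Bash sketch: one par2 set per subdirectory, regular files only.
# par2_each_dir is a hypothetical helper; "par2file" is the set name used above.
par2_each_dir() {
    local root="${1:-.}" dir file
    local -a files
    # -mindepth 1 skips the starting directory, like the one-liner does.
    find "$root" -mindepth 1 -type d | while IFS= read -r dir; do
        files=()
        for file in "$dir"/*; do
            # Skip par2 files left over from an earlier run (assumption).
            case "$file" in *.par2) continue ;; esac
            # Keep regular files only, whether or not the name has a dot.
            if [ -f "$file" ]; then
                files+=("${file##*/}")
            fi
        done
        if [ "${#files[@]}" -eq 0 ]; then
            continue
        fi
        # Subshell, so the cd does not leak into the next iteration.
        ( cd "$dir" && par2create par2file "${files[@]}" ) </dev/null
    done
}
```

Call it as par2_each_dir . from the top of the tree. Like the * glob, it leaves hidden files out.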