[Noisebridge-discuss] Data integrity, rdiff-backup, Reed-Solomon codes

John Magolske listmail at b79.net
Mon Dec 14 07:39:56 UTC 2009


* Jason Dusek <jason.dusek at gmail.com> [091213 18:40]:
> 2009/12/12 John Magolske <listmail at b79.net>:
> > In general, what would be some recommended tools & strategies
> > to ensure ongoing data integrity?
> 
>   It is very important to have a few backups -- snapshots from
>   times past. A corrupted cell is not the only way to get an
>   rsync backup that is broken, after all. If you only have one
>   and you don't check its integrity, you can easily find
>   yourself in a situation where you back up, trash something
>   important and then lose your drive -- leaving you with no good
>   copies.
> 
>   Maybe `rdiff-backup` is just the thing?
> 
>     http://rdiff-backup.nongnu.org/

Oh yes, incremental backups make sense for lots of reasons...thanks
for the reminder, must bump this up on my todo list. I remember
trying to decide between dirvish and rdiff-backup a while back, but I
think rdiff-backup looks like the way to go. Though I've heard good
things about dirvish, I've read that its reliance on hard link trees
means "...Apart from not being a 1:1 backup (you lose hard links!),
the filesystem metadata storage explodes for any reasonable sized
filesystem and any reasonable frequency of backup."
http://lists.debian.org/debian-user/2009/07/msg02022.html
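
A rough sketch of what a hard-link snapshot tree does, to make the
metadata point concrete. This is not dirvish's actual implementation;
the function names, the flat-directory restriction and the byte-for-byte
comparison are simplifying assumptions for illustration only. Unchanged
files are hard-linked to the previous snapshot (no new data blocks), yet
every snapshot still gets its own directory entry per file:

```python
import os
import shutil
import tempfile


def _same_bytes(path_a, path_b):
    """Return True if both files have identical contents."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        return a.read() == b.read()


def snapshot(source_dir, previous_snap, new_snap):
    """Snapshot a flat directory (hypothetical helper).

    Files unchanged since previous_snap are hard-linked to it;
    new or changed files are copied.
    """
    os.makedirs(new_snap)
    for name in sorted(os.listdir(source_dir)):
        src = os.path.join(source_dir, name)
        dst = os.path.join(new_snap, name)
        old = os.path.join(previous_snap, name) if previous_snap else None
        if old and os.path.isfile(old) and _same_bytes(src, old):
            os.link(old, dst)            # same inode, new directory entry
        else:
            shutil.copyfile(src, dst)    # new inode, new data


def demo():
    """Take two snapshots of three files, editing one file in between.

    Returns (directory entries, distinct inodes) across both snapshots.
    """
    with tempfile.TemporaryDirectory() as root:
        source = os.path.join(root, "source")
        os.makedirs(source)
        for name in ("a.txt", "b.txt", "c.txt"):
            with open(os.path.join(source, name), "w") as f:
                f.write("original " + name)
        snap1 = os.path.join(root, "snap1")
        snap2 = os.path.join(root, "snap2")
        snapshot(source, None, snap1)
        with open(os.path.join(source, "b.txt"), "w") as f:
            f.write("edited")
        snapshot(source, snap1, snap2)
        paths = [os.path.join(s, n)
                 for s in (snap1, snap2) for n in os.listdir(s)]
        inodes = {os.stat(p).st_ino for p in paths}
        return len(paths), len(inodes)
```

The data is stored once per distinct version, but the number of
directory entries grows with (files x snapshots), which is the growth
the quoted complaint is about.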

Also, archfs sounds pretty cool -- a FUSE virtual filesystem that
allows you to mount a backup created by rdiff-backup and browse each
increment as though it were a regular directory structure.
http://code.google.com/p/archfs/
http://packages.debian.org/sid/archfs

*

I found the following interesting...maybe worth incorporating into a
backup routine?

Shielding your files with Reed-Solomon codes
http://ttsiodras.googlepages.com/rsbep.html

Some commentary about the above on Slashdot:
http://hardware.slashdot.org/article.pl?sid=08/08/03/197254

From which I gleaned...

* Current hard drives employ some form of error correction themselves
(but how much? Are some drives better than others in this regard?):
http://hardware.slashdot.org/comments.pl?sid=634559&no_d2=1&cid=24459631

* PAR will protect against an occasional bit error, but the
above-mentioned R-S scheme will protect against bad sectors:
http://hardware.slashdot.org/comments.pl?sid=634559&no_d2=1&cid=24462959

* CD-ROMs employ some error correction by design, and evidently
dvdisaster can add further levels of error correction:
http://hardware.slashdot.org/comments.pl?sid=634559&no_d2=1&cid=24462527
http://hardware.slashdot.org/comments.pl?sid=634559&no_d2=1&cid=24462733
http://dvdisaster.net/en/index.html
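
To see why interleaving matters for the bad-sector case, here is a toy
count of how a 512-byte burst of damage lands on codewords. It assumes
RS(255,223)-style codewords (255 bytes each, up to 16 correctable byte
errors) and a hypothetical interleave depth of 255; these parameters and
the function name are illustrative assumptions, not taken from the
rsbep source, and no actual Reed-Solomon decoding is done:

```python
CODEWORD_LEN = 255    # bytes per codeword (assumed RS(255,223)-style)
MAX_FIXABLE = 16      # correctable byte errors per codeword: (255 - 223) // 2
NUM_CODEWORDS = 255   # hypothetical interleave depth


def damaged_per_codeword(burst_start, burst_len, interleaved):
    """Count how many damaged bytes fall into each codeword.

    The stream holds NUM_CODEWORDS codewords. Without interleaving they
    are stored one after another; with interleaving, consecutive bytes
    on disk belong to consecutive codewords (round-robin).
    """
    counts = [0] * NUM_CODEWORDS
    for pos in range(burst_start, burst_start + burst_len):
        if interleaved:
            codeword = pos % NUM_CODEWORDS
        else:
            codeword = pos // CODEWORD_LEN
        counts[codeword] += 1
    return counts


sequential = damaged_per_codeword(1000, 512, interleaved=False)
spread = damaged_per_codeword(1000, 512, interleaved=True)
print(max(sequential) <= MAX_FIXABLE)  # False: one codeword is wiped out
print(max(spread) <= MAX_FIXABLE)      # True: at most 3 bad bytes each
```

Stored sequentially, the burst destroys a whole codeword, far beyond
what the code can repair; interleaved, the same burst costs each
codeword only two or three bytes.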

This brings to mind the idea of re-assembling a collection of files
from, say, a series of backups on a bunch of aging CD-ROMs, each with
different errors, using BitTorrent to stitch the good pieces back
together. Not sure if this was ever implemented or just imagined...
can't find a link ATM.
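
A minimal sketch of that stitching idea, assuming a list of per-piece
hashes was recorded while the data was still known to be good (much as
a torrent file carries piece hashes). The names, the tiny piece size
and the use of SHA-256 are my own illustrative choices, not any
existing tool's format:

```python
import hashlib

PIECE_SIZE = 4  # bytes; unrealistically small, for illustration only


def piece_hashes(data, piece_size=PIECE_SIZE):
    """Hash each fixed-size piece of known-good data."""
    return [hashlib.sha256(data[i:i + piece_size]).hexdigest()
            for i in range(0, len(data), piece_size)]


def reassemble(copies, hashes, piece_size=PIECE_SIZE):
    """Rebuild the data by taking each piece from any copy that verifies.

    Returns the rebuilt bytes, or None if some piece is damaged in
    every copy.
    """
    pieces = []
    for index, expected in enumerate(hashes):
        start = index * piece_size
        for copy in copies:
            piece = copy[start:start + piece_size]
            if hashlib.sha256(piece).hexdigest() == expected:
                pieces.append(piece)
                break
        else:
            return None  # no copy has an intact version of this piece
    return b"".join(pieces)


original = b"abcdefghijklmnop"
hashes = piece_hashes(original)
disc_a = b"XXcdefghijklmnop"   # damaged in the first piece
disc_b = b"abcdefghiYYlmnop"   # damaged in the third piece
print(reassemble([disc_a, disc_b], hashes) == original)
```

This only works where the damage on different discs does not overlap,
and it needs the hash list to be stored somewhere safe.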

John


-- 
John Magolske
http://B79.net/contact


