zfs-rsync command? #114

Open
psy0rz opened this issue Jan 28, 2022 · 10 comments
Labels
RFC Request for comment

Comments

@psy0rz
Owner

psy0rz commented Jan 28, 2022

There are some requests for making zfs-autobackup behave more like rsync, and i agree. However, i think it will be another command, which of course uses the same codebase but has some different parameters and features.

Please help in discussing what such a tool should and shouldn't do.

To compare, let's first look at the features of zfs-autobackup with its default settings (typical usage is sketched after the list):

  • designed as purely a backup tool
  • requires a "name" to determine the select-property, holds and snapshot names
  • target location should exist
  • allows for easy selection/deselection by external tools via a zfs property. (you don't need access to the backup server to select additional datasets.)
  • preserves as much as possible: the whole parent path and all its properties
  • creates and operates on its own snapshots
  • will have a tool to verify backups (via zfs-autoverify, which i'm working on now)
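
For reference, typical zfs-autobackup usage with those defaults looks roughly like this (dataset, host and backup names are just examples):

# on the source machine: select datasets via the zfs property; "offsite1" is the "name"
zfs set autobackup:offsite1=true rpool/data

# on the backup server: pull snapshots of all selected datasets into backuppool/server1
zfs-autobackup -v --ssh-source root@server1 offsite1 backuppool/server1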

A tool like zfs-rsync should perhaps have these features with its default settings (a rough usage sketch follows the list):

  • uses the same codebase, has the same look and feel as zfs-autobackup, and the command line options are the same where it makes sense.
  • a tool to sync datasets/snapshots to another location
  • no "name" required
  • the syntax for selecting sources and the target is like rsync's: zfs-rsync SRC [SRC]... DEST
  • will create the target if it doesn't exist (like rsync)
  • has --destroy-... options that delete datasets/snapshots that are missing on the source (like rsync)
  • will not preserve the parent path, just like rsync does.
  • non-recursive, unless specified with -r
  • will just transfer whatever the latest snapshot is.
  • can be run again to send increments when there are newer snapshots.
  • trailing slash of the selected source has the same effect as rsync, e.g:
  • zfs-rsync pool/source1/ targetpool/target, results in pool/source1 -> targetpool/target
  • zfs-rsync pool/source1 targetpool/target, results in pool/source1 -> targetpool/target/source1
  • will not create/destroy snapshots.
  • can operate on "pool level" as well:
    • zfs-rsync --ssh-target server2 pool1 pool2 / -> sends over datasets under pool1 and pool2 to server2.
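
To make the proposal concrete, invocations might look something like this (purely hypothetical, since the tool doesn't exist yet; flags follow the list above):

# no trailing slash: latest snapshot of pool/source1 -> targetpool/target/source1
zfs-rsync -r pool/source1 targetpool/target

# trailing slash: pool/source1 itself becomes targetpool/target
zfs-rsync -r pool/source1/ targetpool/target

# run again later to send increments if newer snapshots exist;
# the proposed --destroy-... options would prune stuff that disappeared on the source
zfs-rsync -r pool/source1 targetpool/target

# rsync-like remote syntax (discussed below), as an alternative to --ssh-target
zfs-rsync -r pool/source1 root@server2:targetpool/target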

More details and optional stuff for zfs-rsync:

  • should we send over all the properties (and have the same --filter and --clear options), or should we not send over any properties at all by default?
  • should we get rid of --ssh-source/--ssh-target and use the user@host:/path syntax?
    • I like the rsync syntax, but maybe we should support --ssh-source and --ssh-target as hidden options for people who are used to zfs-autobackup?
    • (also, for zfs-autobackup this syntax makes less sense, since the "backupname" is not a path.)
  • might have things like --include and --exclude filtering.
  • might have options to filter snapshots with a certain format. (like zfs-autobackup)
  • might have options to select more than just the latest snapshot to send.
  • might have options so it will create/prune snapshots just like zfs-autobackup.
  • by specifying the right amount of options, zfs-rsync and zfs-autobackup might be interchangeable in some cases.

(regarding #41 and #113)

@Scrin

Scrin commented Jan 28, 2022

I would love to have a "more like rsync" tool for migration purposes in addition to the zfs_autobackup for backups. Currently I have my own script for such cases, but a "properly maintained" solution would definitely be better. To me the above looks good, with a few comments/opinions/ideas:

  • There should be an option to create a snapshot and send that, instead of the latest existing snapshot. This is really useful for migration situations where you want to make incremental transfers while a system is running before the final migration transfer with services stopped, or just for using it like rsync to get the current data over.
  • There should be an option to sync all snapshots and not just the latest one (to make it more like zfs_autobackup), and to "truly sync everything". This is useful for situations where one is migrating data from one system to another and wants to keep previous snapshots for quicker rollbacks after the migration (i.e. you can just roll back to the local snapshots without needing to restore from the actual backups).

@psy0rz
Owner Author

psy0rz commented Jan 28, 2022

thanks for the reply! i agree with your points, i think they already fall somewhat under the above-mentioned extra options:

  • might have options to filter snapshots with a certain format. (like zfs-autobackup)
  • might have options to select more than just the latest snapshot to send.
  • might have options so it will create/prune snapshots just like zfs-autobackup.

but indeed, sending the current data over should be easy and not require too many options or thinking

@digitalsignalperson
Contributor

digitalsignalperson commented Jan 28, 2022

I'd be interested in making it as close to mirroring everything as possible:

  • transfer all the snapshots
  • option to delete snapshots and datasets in the DEST that do not exist in the SOURCE
  • send all the properties unless specifying filter options etc.

One hack I've been considering for achieving this, to mirror a pool to rotating offsite drives (rough commands sketched after the list):

  • add the drive to the pool as another mirror
  • let it resilver/sync
  • offline the drive and rotate it out
  • next time you rotate it in, when you online the drive it syncs back up (https://docs.oracle.com/cd/E19253-01/819-5461/gazgk/index.html: "When a device is brought online, any data that has been written to the pool is resynchronized with the newly available device.")
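
Roughly, the commands involved would be something like this (pool and device names are placeholders, not a recommendation):

# attach the offsite drive as an extra mirror of an existing pool member
zpool attach tank existing-disk /dev/sdX

# watch resilver progress until it completes
zpool status tank

# take the drive offline and rotate it out to the offsite location
zpool offline tank /dev/sdX

# next rotation: bring it back online and ZFS resynchronizes only the changes
zpool online tank /dev/sdX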

This gives you a robust (at least in the sense that there are no external tools to maintain) and exact mirror of the pool, but the con is a permanent warning about the pool being in a degraded state. Reference: https://serverfault.com/a/641217

additional aside... almost wonder if you could do the offline/online sync to remote servers as well, using a network block device or similar: https://unix.stackexchange.com/questions/119364/how-can-i-mount-a-block-device-from-one-computer-to-another-via-the-network-as-a

@psy0rz
Owner Author

psy0rz commented Jan 28, 2022

ha! did i read that idea of swapping disks on reddit r/zfs? i wanted to mention zfs-autobackup there, but i don't want to be too spammy with it :)

@psy0rz
Owner Author

psy0rz commented Jan 28, 2022

indeed, i forgot to mention the --delete or --destroy option that deletes missing stuff. again, just like rsync, which is one of my all-time favorite cli tools :)

@psy0rz
Owner Author

psy0rz commented Jan 28, 2022

i also agree it should be fairly easy to use it to make a close-as-possible mirror.

you could go one step (option) further and have it sync over changed properties as well. (not just the first time, like zfs-autobackup does)

@digitalsignalperson
Contributor

I don't think I saw the reddit thread, have a link? So I can learn all the grave warnings about why not to do it 😅
The person on Server Fault said:

Just a quick update: over the past year this approach has worked well enough. Monthly restore tests of the offsite backup have been successful and consistent. Rotating an array (rather than a single disk) would be better to provide a level of redundancy in the offsite copy, and I would recommend doing that if possible. Overall this is still a hackish approach and does introduce some risk, but has provided a reasonably safe and inexpensive offsite backup of our data.

@psy0rz
Owner Author

psy0rz commented Jan 28, 2022

sorry, can't find it anymore :( but if you google you can find more people who tried to do this. i think it's fairly ok, but a zfs-rsync would indeed allow you to do it offsite.

@psy0rz psy0rz added this to the 3.3 milestone Jan 28, 2022
@digitalsignalperson
Contributor

any thoughts on bi-directional-ish zfs-rsync? ...zfs-syncthing?

Tangentially on topic... I'm currently testing on 3 VMs simulating 1 server, 2 workstations (root@golden-image1, root@golden-image2)

Loop this on the server:

# Push 1
zfs-autobackup -v \
    --ssh-target root@golden-image1 \
    --keep-source 5,1min5min \
    --keep-target 5,1min5min \
    --no-holds \
    --min-change 1 \
    --strip-path=1 \
    data rpoolx

# Push 2
zfs-autobackup -v \
    --ssh-target root@golden-image2 \
    --keep-source 5,1min5min \
    --keep-target 5,1min5min \
    --no-holds \
    --min-change 1 \
    --strip-path=1 \
    data rpoolx

# Pull 1
zfs-autobackup -v \
    --ssh-source root@golden-image1 \
    --keep-source 5,1min5min \
    --keep-target 5,1min5min \
    --no-holds \
    --min-change 1 \
    --strip-path=1 \
    data rpoolx

# Pull 2
zfs-autobackup -v \
    --ssh-source root@golden-image2 \
    --keep-source 5,1min5min \
    --keep-target 5,1min5min \
    --no-holds \
    --min-change 1 \
    --strip-path=1 \
    data rpoolx

Thinning schedule just for testing purposes.

Dumbly pushing and pulling data from the workstations. Scenario of me as a single user having multiple PCs and laptops, keeping some datasets in sync.

e.g.

rpoolx/DATA/media
rpoolx/DATA/email
rpoolx/DATA/workdata1
rpoolx/DATA/workdata2

Timing it: when there are no changes it still takes some time to analyze and exit. (aside: curious what makes the analysis "slow")

    Push 1  7.8s
    Push 2  7.9s
    Pull 1  7.0s
    Pull 2  7.1s
    
    Total 29.8s

I can make a change to any dataset on any workstation and it takes a minimum of ~30 sec to make it around everywhere (or sometimes it has to loop twice to deal with thinning). The --min-change 1 is key to this working. Could probably run the (push, pull) per workstation in parallel to keep it tight and not grow with increasing nodes.
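
Roughly what that parallel variant could look like (untested sketch; same flags and hostnames as the loop above):

# run each workstation's push+pull sequence as its own background job,
# so the total time doesn't grow with the number of nodes
for host in root@golden-image1 root@golden-image2; do
    (
        zfs-autobackup -v --ssh-target "$host" --keep-source 5,1min5min \
            --keep-target 5,1min5min --no-holds --min-change 1 \
            --strip-path=1 data rpoolx
        zfs-autobackup -v --ssh-source "$host" --keep-source 5,1min5min \
            --keep-target 5,1min5min --no-holds --min-change 1 \
            --strip-path=1 data rpoolx
    ) &
done
wait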

Not seeing any conflicts because I'm only editing files on one workstation at a time, even with the datasets mounted in all places (atime=off in case concurrent browsing causes a change). I can take a laptop offline/offsite and automatically push back changes later. If I intentionally create a conflict by writing on two workstations within 1 minute, nothing explodes; I see the error and just roll back and/or copy the change over manually. In practice I might assume responsibility for only one workstation "checking out" a dataset at a time with a read/write mount.

There could be some potential to automatically detect conflicts, roll back/resolve, and rsync the conflicting changes to the server. Or, for something more robust and seamless, a FUSE driver and locking mechanism to ensure consistent replication always... z(c)luster??

@digitalsignalperson
Contributor

digitalsignalperson commented Nov 6, 2023

From my comment here #113 (comment) something looking closer to rsync for zfs?
