[SOLVED] Setting up an rsync server: advice needed

hazel · 12-23-2023, 04:16 AM

Quote:

Originally Posted by IsaacKuo

Yes, there are parts that a normal user will not have read access to. What's more important, though, is that a normal user will not be able to replicate the user:group ownership nor replicate the proper permissions on the copied files. Even if the user could read all of the files, the user couldn't create a usable backup because the backup won't have the proper ownership and permissions.

Right! So the way to get stuck in is to get the data backup set up first and run it for a couple of weeks. Then, when I feel more confident, find a way to add in backups of the two system partitions later.

hazel · 12-23-2023, 07:52 AM

Well, it looks as if my initial rsync command for dumping data should be something like:

Code:

rsync -axPvn /home/data littleboy:/home/hazel/dumps

for the first time (testing) and then

Code:

rsync -axP /home/data littleboy:/home/hazel/dumps

to do it in anger. That should create a tree under /home/hazel/dumps/data. Except that from what I've read, the -x option will not copy the contents of data because it's actually a mount point! Maybe not use -x for the data dump; I'll certainly need it for system dumps because I'll want to start at / but exclude the data partition and the dynamic filesystems. And I don't know if I need -H. What's a sparse file anyway?

Once I have got it to work, I'd like to progress to turbocapitalist's 7-day system with hard links, which looks really cool and ensures that I will have father and grandfather copies if needed.

For the two system partitions, dumping after the first boot following an update should be good enough, but I'll need to find out how to do it as root.

PS: Just booted littleboy with ethernet connection. Both eth0 and wlan0 came up with different local ip addresses. So I assume I can choose my connection by setting the appropriate address for littleboy in Slackware's /etc/hosts file.

PPS: Bah! I didn't actually have rsync on my very stripped-down Slackware. Just installed it, now I have to chase up its dependencies. Well, I'm not doing that by day; too expensive! I'll do it tomorrow morning in the early hours when I get cheap rates, then try again and report back.

Petri Kaukasoina · 12-23-2023, 08:48 AM

Quote:

Originally Posted by hazel

What's a sparse file anyway?

For example, /var/log/lastlog is a sparse file. It contains mostly null bytes and the file system does not need to store the nulls. rsync without -S makes a copy containing all those nulls verbatim with 1596 blocks of 1k. But rsync with -S needs only 8 blocks as the original.

Code:

# rsync -aS /var/log/lastlog /tmp 
# ls -ls /var/log/lastlog /tmp/lastlog 
8 -rw-r--r-- 1 root root 1632864 2023-11-29 17:47 /tmp/lastlog
8 -rw-r--r-- 1 root root 1632864 2023-11-29 17:47 /var/log/lastlog
# rm /tmp/lastlog
# rsync -a /var/log/lastlog /tmp 
# ls -ls /var/log/lastlog /tmp/lastlog 
1596 -rw-r--r-- 1 root root 1632864 2023-11-29 17:47 /tmp/lastlog
   8 -rw-r--r-- 1 root root 1632864 2023-11-29 17:47 /var/log/lastlog

Petri Kaukasoina · 12-23-2023, 08:56 AM

Quote:

Originally Posted by hazel

And I don't know if I need -H.

For example, in /usr/lib64/dri there are many hard linked files. Without -H, rsync creates separate files, and with -H it preserves the hard links, using the disk space only once.

Code:

# rsync -a /usr/lib64/dri /tmp
# du -sh /usr/lib64/dri /tmp/dri
61M     /usr/lib64/dri
411M    /tmp/dri
# rm -rf /tmp/dri 
# rsync -aH /usr/lib64/dri /tmp
# du -sh /usr/lib64/dri /tmp/dri
61M     /usr/lib64/dri
61M     /tmp/dri

hazel · 12-23-2023, 09:03 AM

So I will need both -H and -S for my system dumps. Thanks a lot; that's good to know. But I doubt if I need either for the data dump. All the files on that partition were created by me and they are all standard data formats. Certainly no hard links, although there might be one or two soft ones.

I feel I have an itinerary now, thanks to all you lovely people. Last night when I started this thread, it just felt like ERR!?

Hey! When I've finished this project, I'll write a blog on it.

jailbait · 12-23-2023, 09:24 AM

Quote:

Originally Posted by hazel

Doesn't rsync keep successive copies automatically unless you explicitly delete them?

When a file changes then rsync overwrites the backup copy with the latest copy.

If the live file is deleted then what rsync does depends on whether or not you have specified --delete in the rsync command. If you haven't specified --delete then rsync does nothing to the backup file and over time the backup partition will fill up with garbage. If you specify --delete then rsync will delete any backup files that do not have existing live files. That is why you need multiple generations of backup, to give you a grace period before a file is deleted forever or overwritten before you discover that you need to fall back to a previous version of the file.

rclark · 12-23-2023, 10:10 AM

As root user, I backup multiple directories in a bash script. I find the -av --delete is all I need to keep it 'simple'.
FYI the man page show -a is same as -rlptgoD . I don't backup the OS just the /home and any data directories out side of /home that I might have created.

Sample:

...
rsync -av --delete /homedata/virtualBoxVMs /mnt/usbdrive/
rsync -av --delete /homedata2/development /mnt/usbdrive/
...

hazel · 12-25-2023, 09:39 AM

Yippee! I just rsynced the data directory on bigboy over to littleboy. It took 49 minutes by wireless. I was trying to get it to use the ethernet connection, which would have been faster, but it didn't do that for some reason (advice needed in due course). I am planning to do differential dumps weekly rather than daily as there isn't a big data churn on my machine compared to Turbocapitalist's.

Now if I do that for a month, using the --link-dest option to hard link to files that haven't changed, and from then on delete the oldest one every week, will that work? I gather from what I know about hard links that deleting the directories that contain them doesn't delete the corresponding files as long as there is at least one other hard link to each one. So deleting today's dump directory (data-25-12-2023) shouldn't affect the files I have just copied, once there are later dump directories with hard links to them. But files that have been deleted on bigboy will eventually disappear on littleboy too when the last directory dump to contain a link to them is erased a month later. Is that correct? If so, there is no need to explicitly delete old files.

I have found that I need to specify hazel@littleboy in my target for a data dump. If I just put littleboy, it asks for a root password but doesn't say which machine! I assumed it meant littleboy's root but when I used that password, it was rejected. When I get to dumping system partitions, I will need to be root, maybe on both machines (?)

Turbocapitalist · 12-25-2023, 10:14 AM

Yes, the --link-dest links unchanged files in the two directories to avoid making a whole new copy, so deleting he oldest directory will leave the other directories unaffected. Files stick around until the last reference is deleted:

Code:

cd /tmp/
touch x
ln x y
stat -c '%i\n' x y
rm x
stat -c '%i\n' x y

If you have the date from GNU coreutils, then you can do relative dates easily:

Code:

#!/bin/sh

d=$(date +'%F')

thisweek=$(date -d $d +'%V')
linkdest=$(date -d "$d - 1 week" +'%V')
deleteweek=$(date -d "$d - 5 weeks" +'%V')

echo $thisweek
echo $linkdest
echo $deleteweek

exit 0

hazel · 12-25-2023, 10:39 AM

Good! I want eventually to have a script that does this and run it as a cron job (probably via anacron as I keep irregular hours). Using your functions will allow me to automatically name the destination directories. Of course I will then have to put a slash after "data" to ensure that only the contents get transferred. But for a month or so, I want to do it by hand just to check that everything goes smoothly.

Now why did rsync use wifi rather than ethernet? I put lines into /etc/hosts for littleboy with both ip addresses and commented out the one corresponding to wlan0, so why did the router still use that one?

Turbocapitalist · 12-25-2023, 10:50 AM

Quote:

Originally Posted by hazel

Now why did rsync use wifi rather than ethernet? I put lines into /etc/hosts for littleboy with both ip addresses and commented out the one corresponding to wlan0, so why did the router still use that one?

It could be that your main machine is using the router's DNS service/proxy as its first choice that would pick up the WLAN address for littleboy through that. Many wi-fi routers are set up so that your self-reported host names are in the router's DNS service/proxy and so other machines on the WLAN can look them up using that self-reported name.

However, someone with networking knowledge would have to say whether there is another reason and maybe the following work-around would be unnecessary then:

One possible work-around would be to use the -B option with the SSH client call using Rsync's -e option.

Code:

rsync -e 'ssh -B eth0 -l hazel' ...

That can be put into the SSH client's configuration file using the BindInterface option. That file is good for making a lot of short cuts with multiple options for specific connections. The configuration file for the SSH client is one of the more seriously underappreciated capabilities among the common, every day tools.

hazel · 12-25-2023, 11:04 AM

Actually it's only a problem for the very first dump of a partition. All the later ones will be much smaller. So probably not worth chasing up.

I'm appending a first draft of a script that incorporates your very useful dating suggestions.

Oops! The line for deleting the oldest dump needs rm -fR, not just rm.

jailbait · 12-25-2023, 04:56 PM

Quote:

Originally Posted by hazel

I'm appending a first draft of a script that incorporates your very useful dating suggestions.

Your logic is complicated enough that I would have to set up a test and run it to make sure exactly what the script does over a five week interval. Scanning the overall logic I think that you may often have the situation where the only backup copy of a file is in the oldest backup and all the newer backups have a hard link to that file. When you delete the oldest backup what will the hard links point to?

Turbocapitalist · 12-25-2023, 09:19 PM

Quote:

Originally Posted by jailbait

When you delete the oldest backup what will the hard links point to?

The hard links would still point to the same inode as before. See the exercise in #24 above. Now symbolic links would be another matter, but Rsync produces hard links and not symbolic links so that potential concern would be moot.

hazel · 12-25-2023, 11:36 PM

Quote:

Originally Posted by jailbait

Scanning the overall logic I think that you may often have the situation where the only backup copy of a file is in the oldest backup and all the newer backups have a hard link to that file. When you delete the oldest backup what will the hard links point to?

That's precisely what I initially asked myself. It's because we are encouraged to think of files as being in directories. The visual metaphor of the folder encourages this. And then the first dump becomes mystically special because it has the actual files in it while the later ones only have links. But it just ain't so. The files are just somewhere on the partition, we don't need to know where. Only their locations are stored inside the parent directory as hard links.

Normally when a directory is deleted, the contents are deleted too because the only hard links to them are stored in the directory file. Every file has a link count field and when the value of that drops to zero, the filesystem driver knows it can recycle those blocks. But if you have another set of hard links to the same files in another directory, they won't be deleted.