This does increase the amount of CPU and I/O that both the sending and receiving sides use, but I've been able to run ~25 parallel instances without remotely degrading the rest of the system or slowing down the other rsync instances.
The key is to use the --include and --exclude command-line switches to create selection criteria.
Example
drwxr-xr-x 2 root root 179 Jul 19 16:22 directory_a
drwxr-xr-x 2 root root 179 Aug 12 00:08 directory_b
If directory_a has 2,000,000 files underneath it and directory_b also has 2,000,000 files, use the following idea to split them up. rsync acts on the first rule that matches, so putting --include ahead of --exclude="/*" says in essence to “exclude everything at the top level that is not explicitly included”.
#!/bin/bash
# One background rsync per top-level directory; each instance writes its own log.
rsync -av --include="/directory_a*" --exclude="/*" --progress remote::/ /localdir/ > /tmp/myoutputa.log &
rsync -av --include="/directory_b*" --exclude="/*" --progress remote::/ /localdir/ > /tmp/myoutputb.log &
By comparison, a single rsync over the whole tree, as in the following script, takes about twice as long to gather the file list as the two parallel instances:
#!/bin/bash
rsync -av --progress remote::/ /localdir/ > /tmp/myoutput.log &
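The two-directory script generalizes to any number of top-level directories. What follows is only a minimal sketch of that idea, not a tested production script: the function names sync_dir and sync_all, the RSYNC override, and the rsync_<name>.log naming are all hypothetical, and SRC and DEST simply reuse the placeholder paths from the example.

```shell
#!/bin/bash
# Hypothetical settings; the defaults mirror the placeholders used in this post.
SRC="${SRC:-remote::/}"
DEST="${DEST:-/localdir/}"
LOG_DIR="${LOG_DIR:-/tmp}"
RSYNC="${RSYNC:-rsync}"   # assumption: override with a stub to preview the commands

# Transfer one top-level directory, using the same include/exclude pair as above.
sync_dir() {
    local dir="$1"
    "$RSYNC" -av --include="/${dir}*" --exclude="/*" --progress "$SRC" "$DEST"
}

# Start one background rsync per directory name, then wait for all of them.
sync_all() {
    local dir
    for dir in "$@"; do
        sync_dir "$dir" > "${LOG_DIR}/rsync_${dir}.log" 2>&1 &
    done
    wait   # returns once every background instance has exited
}
```

Calling sync_all directory_a directory_b starts the same two transfers as the parallel script, with one log file per directory. Every additional name adds another rsync instance, and with it more CPU and I/O load on both ends.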