Using rsync with RegEx

I am using rsync to sync folders and their content between a Linux server and a network storage device to back up files. For this, I am using this command:
rsync -rltPuz -k --chmod=ugo+rwx --prune-empty-dirs --exclude=*backup* --exclude=*.zip --exclude=*.zip.bak --password-file=/rsync_pw.txt /source/ user@storage::Kunden/Jobs
This command runs on the source server via crontab. Everything works fine.
But now I have a little problem. My directories are structured like this:
Jobs
    Job1
        new
            all new files
        ready
            all ready files
    Job2
        new
            all new files
        ready
            all ready files
I only need to sync the ready folders and their content. I have experimented with --include and --exclude, but I did not get the result I needed. Is there a way to tell rsync what I want?
Thanks for your time!

You can use find /path/to/Jobs -name ready and feed its output to rsync, or use find's -exec option and place your rsync call there.
In your example the final command will look like:
find Jobs/ -type d -name 'ready' -exec rsync -rltPuz -k --chmod=ugo+rwx --prune-empty-dirs --exclude='*backup*' --exclude='*.zip' --exclude='*.zip.bak' {}/ dest \;
On my Ubuntu machine it works:
kammala@devuntu:~$ ls -R dest/
dest/:
kammala@devuntu:~$ ls -R Jobs/
Jobs/:
Job1 Job2
Jobs/Job1:
new ready
Jobs/Job1/new:
new1.txt new2.txt some_new_backup.txt
Jobs/Job1/ready:
r1.txt r2.txt some_backup_file.txt
Jobs/Job2:
new ready
Jobs/Job2/new:
new3.txt new4.txt zipped_bckp.zip.bak
Jobs/Job2/ready:
r4.txt r5.txt r6.txt some_zipped_file.zip.bak
kammala@devuntu:~$ find Jobs/ -name 'ready' -exec rsync -rltPuz -k --chmod=ugo+rwx --prune-empty-dirs --exclude=*backup* --exclude=*.zip --exclude=*.zip.bak {}/ dest \;
building file list ...
3 files to consider
./
r1.txt
0 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=1/3)
r2.txt
0 100% 0.00kB/s 0:00:00 (xfr#2, to-chk=0/3)
building file list ...
4 files to consider
./
r4.txt
0 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=2/4)
r5.txt
0 100% 0.00kB/s 0:00:00 (xfr#2, to-chk=1/4)
r6.txt
0 100% 0.00kB/s 0:00:00 (xfr#3, to-chk=0/4)
kammala@devuntu:~$ ls -R dest
dest:
r1.txt r2.txt r4.txt r5.txt r6.txt
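Note that in the transcript above every file lands directly in dest, so two ready folders containing a file with the same name would overwrite each other. As a minimal sketch, assuming you would rather keep the Job1/ready, Job2/ready layout on the destination, the same idea can be driven from Python. The function name build_ready_commands, the shortened option list, and the paths are hypothetical. The sketch relies on rsync's -R (--relative) option, where the /./ marker in the source path tells rsync which part of the path to recreate under the destination.

```python
import os


def build_ready_commands(jobs_root, dest, options=("-rlt", "-R")):
    """Return one rsync argument list per directory named 'ready' under jobs_root."""
    commands = []
    for dirpath, dirnames, _filenames in os.walk(jobs_root):
        dirnames.sort()  # make the traversal order deterministic
        if os.path.basename(dirpath) == "ready":
            rel = os.path.relpath(dirpath, jobs_root)
            # With -R, rsync recreates everything after "/./" below dest.
            source = os.path.join(jobs_root, ".", rel) + "/"
            commands.append([
                "rsync", *options,
                # Passed as list items, so no shell ever expands these patterns.
                "--exclude=*backup*", "--exclude=*.zip", "--exclude=*.zip.bak",
                source, dest,
            ])
            dirnames[:] = []  # do not descend into a ready folder
    return commands


if __name__ == "__main__":
    # Hypothetical paths; this only prints the commands instead of running them.
    for command in build_ready_commands("Jobs", "dest"):
        print(command)
```

Each returned list can be handed to subprocess.run(command, check=True) once the printed commands look right.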

Eight years later I find this post after days of pounding on globbing and escaping issues for command option parameters. This was doubly important as my IDE was applying "exclude" options for rsync without quotes or escaping.
CompSci 101:
Glob characters ? * [ ] are expanded by the shell before the command is executed, and they are expanded relative to the current working directory. (Yeah, I forget all the places that this applies, too.) If nothing in that directory matches, bash by default passes the pattern through unchanged, which is why an unquoted pattern can seem to work in some situations.
This includes your option to rsync, --exclude=*.zip. Those parameters need to be either escaped or quoted. So, omitting other options for brevity:
rsync -av --exclude='*backup*' --exclude='*.zip' --exclude='*.zip.bak' /source/ user@storage::Kunden/Jobs
or
rsync -av --exclude=\*backup\* --exclude=\*.zip --exclude=\*.zip.bak /source/ user@storage::Kunden/Jobs
If you are unsure what the result of an include, exclude, or filter combination will be, and what would be sent to, say, a production server, you can test your command with the options --dry-run (or -n) and --debug=filter. You'll get a list of the files that are shown to or hidden from the planned transfer.
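To check that the quoted and the backslash-escaped forms really hand rsync the same arguments, here is a small Python sketch. It uses shlex.split, which applies POSIX quoting rules but performs no glob expansion, so it only illustrates what the program receives once the patterns are protected. The destination dest/ is a placeholder, not the path from the question.

```python
import shlex

# The two protected spellings of the same command (dest/ is a placeholder).
quoted = "rsync -av --exclude='*backup*' --exclude='*.zip' /source/ dest/"
escaped = r"rsync -av --exclude=\*backup\* --exclude=\*.zip /source/ dest/"

# shlex.split removes the quotes and backslashes the way a POSIX shell would,
# leaving the literal patterns for rsync to interpret itself.
argv_quoted = shlex.split(quoted)
argv_escaped = shlex.split(escaped)

print(argv_quoted)
print(argv_quoted == argv_escaped)  # True
```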

Related

Youtube-DL - Batchfile

I have a list of YouTube URLs.
The list is stored in batch-file.txt.
I would like to download each URL and rename the result to a given name.m4a.
batch-file.txt
youtube-dl -f 'bestaudio[ext=m4a]' 'https://www.youtube.com/watch?v= ...' --output '...m4a'
youtube-dl -f 'bestaudio[ext=m4a]' 'https://www.youtube.com/watch?v= ...' --output '...m4a'
youtube-dl -f 'bestaudio[ext=m4a]' 'https://www.youtube.com/watch?v= ...' --output '...m4a'
If I run the commands individually, it works.
If I run the batch file via
youtube-dl --batch-file='batch-file.txt'
it does not work.
What do I need to write in the batch-txt file?
How do I call the batch file to download the m4a files simultaneously (if possible)?
Many Thanks,
BM
The batch file contains only the URLs, no other parameters.
batch-file.txt
https://www.youtube.com/watch?v=...
https://www.youtube.com/watch?v=...
.
Here is the command to run youtube-dl with the numbering starting at 1:
youtube-dl -ciw -f 'bestaudio[ext=m4a]' --batch-file='batch-file.txt' -o '%(autonumber)02d. %(title)s.%(ext)s'
Here is the command to run youtube-dl with the numbering starting at 35 (in case you want to continue at another time):
youtube-dl -ciw -f 'bestaudio[ext=m4a]' --batch-file='batch-file.txt' -o '%(autonumber)02d. %(title)s.%(ext)s' --autonumber-start 35
Missing part:
Parallel / Simultaneous Download. But I can live with the approach above.
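For the missing parallel part, one possible approach is to start several youtube-dl processes from a small Python script. This is only a sketch under assumptions: the helper names read_urls, build_command and download_all are made up for illustration, the worker count of 4 is arbitrary, and the %(autonumber) prefix is left out because, as far as I can tell, each separate youtube-dl process would keep its own counter.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def read_urls(path):
    """Return the URLs from a batch file, skipping blank lines and '#' comments."""
    with open(path, encoding="utf-8") as handle:
        lines = (line.strip() for line in handle)
        return [line for line in lines if line and not line.startswith("#")]


def build_command(url):
    """Build the youtube-dl argument list for one URL (no shell, so no quoting needed)."""
    return ["youtube-dl", "-ciw", "-f", "bestaudio[ext=m4a]",
            "-o", "%(title)s.%(ext)s", url]


def download_all(urls, workers=4):
    """Run up to `workers` youtube-dl processes at once and return their exit codes."""
    def run_one(url):
        return subprocess.run(build_command(url)).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, urls))
```

Calling download_all(read_urls("batch-file.txt")) would then start the downloads; a non-zero entry in the returned list marks a URL that failed.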

How come file is not excluded with gsutil rsync -x by the Google Cloud Builder?

I am currently running the gsutil rsync cloud build command:
gcr.io/cloud-builders/gsutil
-m rsync -r -c -d -x "\.gitignore" . gs://mybucket/
I am using the -x "\.gitignore" argument here to try and not copy over the .gitignore file, as mentioned here:
https://cloud.google.com/storage/docs/gsutil/commands/rsync
However, when looking in the bucket and the logs, it still says:
2021-04-23T13:29:37.870382893Z Step #1: Copying file://./.gitignore [Content-Type=application/octet-stream]...
So rsync is still copying over the file despite the -x "\.gitignore" argument.
According to the docs -x is a Python regexp, so //./.gitignore should be captured by \.gitignore
Does anyone know why this isn't working and why the file is still being copied?
See the rsync.py source code:
if cls.exclude_pattern.match(str_to_check):
In Python, re.match only returns a match if it occurs at the start of the string.
So, to find a match anywhere in the path with the -x parameter, you need to prefix the pattern with .* or with (?s).*:
gcr.io/cloud-builders/gsutil
-m rsync -r -c -d -x ".*\.gitignore" . gs://mybucket/
Note that to make sure .gitignore appears at the end of the string, you need to append $: -x ".*\.gitignore$".
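The difference is easy to reproduce in plain Python. The sketch below mimics the quoted check with a hypothetical helper is_excluded; the exact string gsutil passes to the pattern is an assumption here (./.gitignore is used for illustration).

```python
import re


def is_excluded(pattern, path):
    """Mimic the check quoted above: re.match anchors at the start of path."""
    return re.compile(pattern).match(path) is not None


path = "./.gitignore"  # assumed form of the string being checked

# The leading "." of the path matches \. but "/" then fails against "g".
print(is_excluded(r"\.gitignore", path))                   # False
# Prefixing .* lets the match start anywhere in the path.
print(is_excluded(r".*\.gitignore", path))                 # True
# The trailing $ rejects names that merely contain .gitignore.
print(is_excluded(r".*\.gitignore$", "./.gitignore.bak"))  # False
# re.search, by contrast, scans the whole string without any prefix.
print(re.search(r"\.gitignore", path) is not None)         # True
```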

Using 'gsutil mv' without any exceptions shown

I am using gsutil to move files, but this generates an exception if none of the files are moved.
This is the command I run:
gsutil -m mv gs://{url}/20200116* gs://{destination url}/data/rtbiq_data/
The exception I see:
CommandException: No URLs matched: gs://{url}/20200116*
CommandException: 1 file/object could not be transferred.
I want it to just go through without any exception being thrown even if none of the files are moved. How can I do that?
There isn't an official gsutil mv option to do that, but one approach is to redirect the command's error output to /dev/null, which suppresses everything written to stderr. Note that this only hides the message; the command still exits with a non-zero status when nothing matched.
gsutil -m mv gs://{url}/20200116* gs://{destination url}/data/rtbiq_data/ 2> /dev/null
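Discarding stderr also hides real failures. If the command is launched from a script, a slightly more selective option is to ignore only the "No URLs matched" case. The sketch below assumes that gsutil writes this message to stderr with exactly the wording quoted in the question; run_tolerant is a hypothetical helper, not part of gsutil.

```python
import subprocess

NO_MATCH = "No URLs matched"  # wording taken from the error quoted in the question


def run_tolerant(argv):
    """Run argv. Return True on success, False if nothing matched; raise otherwise."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode == 0:
        return True
    if NO_MATCH in result.stderr:
        return False  # nothing to move is not treated as an error
    raise RuntimeError(
        "command failed with exit code %d: %s" % (result.returncode, result.stderr.strip())
    )
```

It would be called with the command as a list, for example run_tolerant(["gsutil", "-m", "mv", source_pattern, destination]), where source_pattern and destination stand for the two gs:// URLs. Because no shell is involved, the * in the source pattern reaches gsutil unchanged.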

.sh file works in terminal but not in python script (rclone w/ Raspberry Pi)

I'm having trouble running a .sh file in python. When I type in the location of the .sh file (/home/pi/file/script.sh) the script runs perfectly.
I'm trying to run this script in my python2 script and I've done the following methods:
subprocess.Popen(['bash', 'location of .sh'])
subprocess.call(['location of .sh'])
os.popen(['location of .sh'])
When I run the python script, I get a prompt from rclone saying "Command sync needs 2 arguments maximum"
My .sh file just includes:
#!/bin/sh
sudo /usr/local/bin/rclone -v sync /home/pi/some_project_data remote:rclone --delete-before --include *.csv --include *.py
I'm not sure how running the .sh file on terminal works fine, but this error pops up when I'm trying to run the .sh file using Python.
Your script fails whenever you run it in a directory containing 2 or more .csv or .py files. This is true for terminals as well as via Python.
To avoid that, quote your patterns so the shell doesn't expand them:
#!/bin/sh
sudo /usr/local/bin/rclone -v sync /home/pi/some_project_data remote:rclone \
--delete-before --include "*.csv" --include "*.py"
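To see why the unquoted version breaks, here is a rough Python simulation of what the shell does to the unquoted words. It is only an approximation of real shell globbing, the helper expand_like_shell is hypothetical, and the file names are invented for the demonstration.

```python
import glob
import os
import tempfile


def expand_like_shell(words, cwd):
    """Roughly mimic unquoted glob expansion: a word containing * ? or [ is
    replaced by the sorted matching names in cwd, or kept if nothing matches."""
    expanded = []
    for word in words:
        matches = []
        if any(ch in word for ch in "*?["):
            pattern = os.path.join(glob.escape(cwd), word)
            matches = sorted(os.path.basename(m) for m in glob.glob(pattern))
        if matches:
            expanded.extend(matches)
        else:
            expanded.append(word)
    return expanded


with tempfile.TemporaryDirectory() as cwd:
    for name in ("a.csv", "b.csv", "main.py"):
        open(os.path.join(cwd, name), "w").close()
    args = expand_like_shell(["--include", "*.csv", "--include", "*.py"], cwd)

print(args)  # ['--include', 'a.csv', 'b.csv', '--include', 'main.py']
```

Here b.csv is no longer the value of an --include option; it becomes an extra positional argument, which is consistent with rclone complaining about too many arguments to sync.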
Please try:
os.popen('bash locationof.sh')
For example:
os.popen('bash /home/script.sh')
That worked on my machine. If you place square brackets around the string, Python treats it as a list, and os.popen doesn't accept a list; it accepts a single command string.
If the script itself is broken, this won't fix it, but it will at least run it. If it still doesn't work, try temporarily replacing the script's contents with something like
touch z.txt
and see whether z.txt appears in the file explorer. If it does, Python is launching the script correctly, and the problem is in the commands in your .sh file.
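When debugging this kind of problem it helps to see the script's exit code and error output from Python rather than guessing. The following is a minimal sketch, assuming Python 3 and a POSIX sh on the PATH (the question uses Python 2, where subprocess.run is not available); run_script is a hypothetical helper name.

```python
import subprocess


def run_script(path):
    """Run a shell script with sh and return (exit_code, stdout, stderr)."""
    result = subprocess.run(
        ["sh", path],  # a list of arguments, so no extra shell parses the path
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.returncode, result.stdout, result.stderr
```

Calling run_script("/home/pi/file/script.sh") and printing the third element of the result would show messages such as the rclone error from the question.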

Using wget or curl with a changing name file

First of all, please excuse my bad English; I'll try to be understandable.
I'm using a batch file (Windows, cmd.exe) to retrieve and silently install Adobe Flash on my computer.
The batch works well, but I have a problem when there is a major version change on Adobe servers.
Here is the command line batch:
@echo off
setlocal enableextensions
md c:\temp\flash
pushd c:\temp\flash
wget -nH --cut-dirs=5 -r --timestamping http://download.macromedia.com/get/flashplayer/current/licensing/win/install_flash_player_15_plugin.exe
wget -nH --cut-dirs=5 -r --timestamping http://download.macromedia.com/get/flashplayer/current/licensing/win/install_flash_player_15_active_x.exe
echo Closing browsers
pause
taskkill /f -im firefox.exe -im iexplore.exe
install_flash_player_15_plugin.exe -install -au 2
install_flash_player_15_active_x.exe -install -au 2
popd
setlocal disableextensions
pause
When Flash is upgraded to the next version, the filename changes from install_flash_player_15_active_x.exe
to
install_flash_player_16_active_x.exe
and the batch must be manually corrected or else it is stuck with an old version.
Is there any way to replace the version number with a wildcard or a regular expression so that wget retrieves the latest file when its name changes?
Or at least, is there any Windows-compatible command-line tool that parses the file names on a server, finds the latest, and passes it as a variable to wget (or cURL)?
Thank you
You don't need Regular Expressions to get the current version of flash for IE and Firefox. Just change the URLs to
For Firefox: http://download.macromedia.com/pub/flashplayer/current/support/install_flash_player.exe
For IE: http://download.macromedia.com/pub/flashplayer/current/support/install_flash_player_ax.exe
wget -nH --cut-dirs=5 -r --timestamping http://download.macromedia.com/pub/flashplayer/current/support/install_flash_player.exe
wget -nH --cut-dirs=5 -r --timestamping http://download.macromedia.com/pub/flashplayer/current/support/install_flash_player_ax.exe