Mirror only files having specific string in file path - regex

I'm trying to mirror only those branches of a directory tree that contain a specific directory name somewhere within the branch. I've spent several hours trying different things to no avail.
A remote FTP site has a directory structure like this:
image_db
    movies
        v2
            20131225
                xyz
                    xyz.jpg
            20131231
                abc
                    abc.jpg
            AllPhotos   <-- this is what I want to mirror
                xyz
                    xyz.jpg
                abc
                    abc.jpg
        v4
            (similar structure to 'v2' above, contains 'AllPhotos')
        ...
    tv_shows
        (similar structure to 'movies', contains 'AllPhotos')
    other
        (different paths, some of which contain 'AllPhotos')
    ...
I am trying to create a local mirror of only the 'AllPhotos' directories, with their parent paths intact.
I've tried variations of this:
lftp -e 'mirror --only-newer --use-pget-n=4 --verbose -X /* -I AllPhotos/ /image_db/ /var/www/html/mir_images' -u username,password ftp.example.com
...where the "-X /*" excludes all directories and "-I AllPhotos/" includes only AllPhotos. This doesn't work; lftp just copies everything.
I also tried variations of this:
lftp -e 'glob -d -- mirror --only-newer --use-pget-n=4 --verbose /image_db/*/*/AllPhotos/ /var/www/html/mir_images' -u username,password ftp.example.com
...and lftp crunches away at the remote directory structure without actually creating anything on my side.
Basically, I want to mirror only those files that have the string 'AllPhotos' somewhere in the full directory path.
Update 1:
If I can do this with wget, rsync, ftpcopy or some other utility besides lftp, I welcome suggestions for alternatives.
Trying wget didn't work for me either:
wget -m -q -I /image_db/*/*/AllPhotos ftp://username:password@ftp.example.com/image_db
...it just gets the whole directory structure, even though the wget documentation says that wildcards are permitted in -I paths.
Update 2:
After further investigation, I am coming to the conclusion that I should probably write my own mirroring utility, although I still suspect I am approaching lftp the wrong way, and that there's a way to make it mirror only files that have a specific string in the absolute path.

One solution:
curl -s 'ftp://domain.tld/path' |
awk '/^d.*regex/{print $NF}' |
xargs -I% wget -m "ftp://domain.tld/path/%"
Or using lftp:
lftp -e 'ls; quit' 'ftp://domain.tld/path' |
awk '/^d.*regex/{print $NF}' |
xargs -I% lftp -e "mirror -e %; quit" ftp://domain.tld/path/
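
The pipelines above only look one level deep, while the AllPhotos directories in the question sit several levels down. A sketch of a deeper variant, assuming lftp's find command lists the remote tree recursively and prints directories with a trailing slash (host, credentials and paths are the question's placeholders):

#!/bin/bash
# List the remote tree, keep only AllPhotos directories, then mirror each
# one into the same relative location locally so parent paths stay intact.
SITE='ftp://ftp.example.com'
lftp -u username,password -e 'find /image_db; quit' "$SITE" |
grep '/AllPhotos/$' |
while IFS= read -r dir; do
    dest="/var/www/html/mir_images${dir#/image_db}"
    mkdir -p "$dest"
    lftp -u username,password -e "mirror --only-newer --verbose $dir $dest; quit" "$SITE"
done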

Related

Using xargs, eval, and mv ensemble

I've been using the command line more frequently lately to increase my proficiency. I've created a .txt file containing URLs for libraries that I'd like to download. I batch-downloaded these files using
$ cat downloads.txt | xargs wget
When using the wget command I didn't specify a destination directory. I'd like to move each of the files that I've just downloaded into a directory called "vendor".
For the record, it has occurred to me that if I ran...
$ open .
...I could drag-and-drop these files into the desired directory. But in my opinion that would defeat the purpose of this exercise.
Now that I have the files in my cwd, I'd like to be able to target them and move them into the "vendor" directory.
As a side-question: Is there a useful way to print the most recently created files to STDOUT? Currently, I can grab the filenames from the URLs within downloads.txt pretty simply using the following pipeline and Perl script...
$ cat downloads.txt | perl -n -e 'if (/(?<=\/)([-.a-z]+)$/) { print $1 . "\n" }'
This will produce...
react.js
redux.js
react-dom.js
expect.js
...which is great, as these are the files I intended to target. I'd like to transform each of these lines into a command within a pipeline that resembles this...
$ mv {./,./vendor/}<filename>
... where <filename> is "react.js" then "redux.js", and so forth.
I figure that I may be able to accomplish this using some combination of xargs, eval, and mv. This is where my bash skills drop off.
Just to reiterate, I'm aware that the method in which I am approaching this problem is neither simple nor ideal. This is intentionally a convoluted exercise intended to stretch my bash knowledge.
Is there anyone who knows how I can use xargs, eval, and mv to accomplish this goal?
Thank you!
xargs -l -a downloads.txt basename | xargs -I{} mv {} ./vendor
How this works: The first instance of xargs reads the file names from downloads.txt and calls basename for each of these file names individually (alternatively, you could use basename -a). These basenames are then piped to another instance of xargs, which uses the arguments to call mv, replacing the string {} with the current argument.
mv $(basename -a $(<downloads.txt)) ./vendor
How this works: Since you want to move all the files into the same directory, a single call to mv suffices. The command substitution, $(...), inserts the output of basename -a, which in turn gets its arguments from the inner substitution reading the file.
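
For completeness, since the exercise explicitly asks for eval: one way to produce the mv {./,./vendor/}<filename> form from the question is to build each command as a string and let eval expand it. A sketch, reusing the Perl extraction above:

cat downloads.txt |
perl -n -e 'if (/(?<=\/)([-.a-z]+)$/) { print $1 . "\n" }' |
while read -r f; do
    # eval re-parses the string, so the brace pattern expands to:
    # mv ./<filename> ./vendor/<filename>
    eval "mv {./,./vendor/}$f"
done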

How do I make Wget name files as part of URL?

Short story:
I want Wget to name downloaded files after the text matched by the regex group ([^/]*):
wget -r --accept-regex="^.*/([^/]*)/$" $MYURL
Full story:
I use GNU Wget to recursively download one specific folder under particular WordPress website. I use regex to accept only posts and nothing else. Here is how I use it:
wget -r --accept-regex="^.*/([^/]*)/$" $MYURL
It works, and Wget follows all the desired URLs. However, it saves each file as .../last_directory/index.html, whereas I want it saved as last_directory.html (the .html part is optional).
Is there a way to do that with Wget alone? Or would you suggest how to do the same thing with sed or similar tools?
You could use sed. Wget itself won't rename the files, but this sed expression maps each saved path to the name you want:
sed 's~\(.*\)/[^.]*~\1~'
Example:
$ echo '/foo/last_directory/index.html' | sed 's~\(.*\)/[^.]*~\1~'
/foo/last_directory.html
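
Since sed only rewrites the string, the files themselves still need to be moved. A sketch of a post-download pass, assuming each post was saved as last_directory/index.html:

# Rename every saved index.html after its parent directory, then remove
# the directory (rmdir only succeeds once the directory is empty).
find . -name index.html | while IFS= read -r f; do
    d=$(dirname "$f")
    mv "$f" "$d.html" && rmdir "$d"
done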

Bash script to change file extension using regex

I have a lot of files I've copied over from my iPhone file system. They started out as MP3 files, but an app on the iPhone changed their names to some random stuff, which looks like:
1c03e04cc1bbfcb0c1237f57f1d0ae2e.mp3?extra=f7NhT68pNkmEbGA_I1WbVShXQ2E2gJAGBKSEyh3hf0hsbLB1cqnXDuepYA5ubcFm_B3KSsrXDuKVtWVAUh_MAPeFiEHXVdg
I only need to remove the part of each file name after .mp3. Please give me a script; there are more than 600 files, and doing this manually is impossible.
You can use the rename command:
rename "s/mp3\?.*/mp3/" *.mp3*
#!/bin/bash
shopt -s nullglob   # expand to nothing (instead of the literal glob) when no files match
for F in *.mp3\?*; do
    # strip everything from the first ".mp3?" onward, then re-append ".mp3"
    echo mv -v -- "$F" "${F%%.mp3\?*}.mp3"
done
Save it to a script like script.sh, then run it as bash /path/to/script.sh in the directory where the files exist.
Remove the echo once the printed commands look correct.

Move all images in folder to subfolder, and update all references in text files to those images to their new location?

I have a folder which contains ~50 text files (PHP) and hundreds of images. I would like to move all the images to a subfolder, and update the PHP files so any reference to those images points to the new subfolder.
I know I can move all the images quite easily (mv *.jpg /image, mv *.gif /image, etc.), but I don't know how to go about updating all the text files. I assume a regex has to be created to match all the images in a file, and then somehow the new directory has to be prepended to the image file name? Is this best done with a shell script? Any help is appreciated (the server is Linux/CentOS 5).
Thanks!
sed with the -i switch is probably what you're looking for. -i tells sed to edit the file in-place.
Something like this should work:
find /my/php/location -name '*.php' | xargs sed -i -e 's,/old/location/,/new/location/,g'
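If any of the PHP filenames contain spaces, a null-delimited variant of the same pipeline is safer (a sketch, assuming GNU find and xargs):

find /my/php/location -name '*.php' -print0 |
xargs -0 sed -i -e 's,/old/location/,/new/location/,g'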
You could do it like this:
#!/bin/sh
for f in *.jpg *.png *.gif; do
    mv -- "$f" gfx/
    for p in *.txt; do
        # rewrite every reference to the moved image in-place
        sed -i.bak "s,$f,gfx/$f,g" "$p"
    done
done
It finds all jpg/png/gif files and moves them to the "gfx" subfolder, then for each txt file (or whatever kind of file you want edited) it uses sed in-place to alter the path.
Btw. it will create backup copies of the edited files with the extra extension ".bak". This can be avoided by omitting the ".bak" suffix in the script (GNU sed accepts a plain -i).
This will move all images to a subdir called 'images' and then change only links to image files by adding 'images/' just before the basename.
mkdir images
mv -f *.{jpg,gif,png,jpeg} images/
sed -i 's%[^/"'\'']\+\.\(gif\|jpg\|jpeg\|png\)%images/\0%g' *.php
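To see what that substitution does, here is a quick check on a hypothetical sample line:

$ echo '<img src="logo.png">' | sed 's%[^/"'\'']\+\.\(gif\|jpg\|jpeg\|png\)%images/\0%g'
<img src="images/logo.png">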
If you have thousands of files, you may need to use find and xargs. So, a bit slower:
find ./ -regex '.*\(gif\|jpg\|png\|jpeg\)' -not -path './images/*' -exec mv {} images/ \;
find ./ -name '*.php' -print0 | \
xargs -0 sed -i 's%[^/"'\'']\+\.\(gif\|jpg\|jpeg\|png\)%images/\0%g'
Caution: it will also change paths to images with remote URLs. Also, make sure you have a full backup of your directory; PHP syntax and variable names might cause problems.

script to add files to SVN with filters

My bash scripting is weak. I want to create a script that filters and adds files to SVN.
So far I have this:
ls | egrep -v "(\.tab\.|\.yy\.|\.o$|\.exe$|~$)"
I tried to execute its output but couldn't figure out how. Before that I checked whether svn add accepts a regex. I'm not sure it does, and I couldn't figure out how to invert the above without the -v (I tried "[^((\.tab\.|\.yy\.|\.o$|\.exe$|~$))]", but that didn't work as expected; it seems to only ignore .tab. files).
How do I create a script to add files to SVN after applying a filter? Would the simplest way be to use ls and grep, put the results into a bash array, then loop over it with svn add $element?
NOTE: This is on Linux; I don't think I'll have this running on Windows (I couldn't set up bison there), so as long as it works on most Linux distros I'm happy. Ignore the fact that the above uses .exe.
A number of ways:
Use backticks: svn add `ls | egrep stuff`
Use xargs: ls | egrep stuff | xargs svn add
Use find and xargs: find . -type f -name '*.c' -print | grep -v '\.svn' | xargs svn add
Obviously, change "stuff" and the -name '*.c' to suit your requirements...
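
Putting the question's own filter into the xargs form gives, as a sketch (GNU xargs assumed, with -d '\n' so filenames containing spaces survive):

ls | egrep -v "(\.tab\.|\.yy\.|\.o$|\.exe$|~$)" | xargs -d '\n' svn add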
Try using find.
find . -name .svn -prune -o -type f -exec svn add {} \;
The command following exec will be executed for each file and {} will be replaced with the filename at each iteration.
I'm not in front of my Linux system, so I can't work out the exact pattern you need right now, but if you read the man page you might get there.
Another solution is to add those file extensions and the .svn folder to your SVN ignore pattern.
Armed with a client configured as such, you could then do svn add * and get only what you want into SVN.
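
Setting the ignore property might look like this (a sketch; svn:ignore takes one pattern per line and applies to the given directory):

svn propset svn:ignore '*.tab.*
*.yy.*
*.o
*.exe
*~' .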