Bash copy all directory with content that matches a pattern - regex

Is there a way to copy directories, including their contents, using a bash script? For example,
suppose there are many directories inside /media/test:
/media/test/
-- en_US
-- file1
-- file 2
-- de_DE
-- file 1
-- SUB-dir1
-- sub file 1
-- file 2
.....
.....
-- Test 1
-- testfile1
-- folder
--- more 1
............
Now I want to copy all the directories (including subdirectories and files)
whose names match a pattern to another location.
--> For example, in the above case I want the directories en_US and de_DE to be copied to another
location, including subdirectories and files.
What I have found out so far:
1) The needed pattern is \b\w{2}_\w{2}\b
2) I can list all the directories with:
MYDIR="/media/test/"
DIRS=`ls -l $MYDIR | egrep '^d' | awk '{print $10}'`
for DIR in $DIRS
do
echo ${DIR}
done
Now I need help combining these so that the script copies every directory (including its contents) that matches the pattern to another location.
Thanks in advance.

To selectively copy an entire directory structure to a similar directory structure, while filtering the contents, in a general way your best bet is to archive the original directory and unarchive. For instance, using GNU Tar:
$ mkdir destdir
$ tar -c /media/test/{en_US,de_DE} | tar -C destdir -x --strip-components=1
In this example, the /media/test directory structure is partially recreated under destdir, excluding the /media prefix (thanks to --strip-components=1).
The left side tar archives just the directories/paths which match the pattern that we specified. The archive is produced on that command's standard output, which is piped to the decoding tar on the right hand side. The -C tells it to change to the destination directory. It extracts the files there, removing a leading path component.
$ ls destdir
test
$ ls destdir/test
de_DE en_US
Of course, your specific example test case is quite easily handled with cp -a:
$ mkdir destdir
$ cp -a /media/test/{en_US,de_DE} destdir
If the pattern is complicated, involving multiple selections of subtree material at deeper and/or different levels of the source directory hierarchy, then you need the more general approach, if you wish to do the copy in a single batch command which just specifies source patterns.
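For the regex-driven selection the question asks about, a minimal sketch could look like the following. It is not taken from any of the answers: copy_matching_dirs is a hypothetical helper name, the regex is passed to grep -E as an extended regular expression, and only the immediate subdirectories of the source are tested against it.

```shell
#!/bin/sh
# copy_matching_dirs SRC DEST REGEX   (hypothetical helper, for illustration)
# Copies every immediate subdirectory of SRC whose name matches the
# extended regular expression REGEX into DEST, including all contents.
copy_matching_dirs() (
    src=$1 dest=$2 regex=$3
    mkdir -p "$dest" || exit 1
    for dir in "$src"/*/; do
        dir=${dir%/}                    # drop the trailing slash from the glob
        [ -d "$dir" ] || continue       # the glob matched nothing
        name=${dir##*/}                 # last path component
        if printf '%s\n' "$name" | grep -Eq -- "$regex"; then
            cp -a "$dir" "$dest/"       # -a keeps subdirectories and attributes
        fi
    done
)
```

With the layout from the question, a call such as copy_matching_dirs /media/test /media/test_copy '^[A-Za-z]{2}_[A-Za-z]{2}$' should pick up en_US and de_DE and skip "Test 1" (the destination path here is an assumption).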

I'm not sure about your environment, but I guess you are trying to do this:
cp -r src_dir/??_?? dest_dir

Here is your starter for 10:
You will have to add the extra checks and balances that you require but it should give you a flying start.
#!/bin/bash
# assumes $1 is the source to search and $2 is the destination to copy to
# the pattern is quoted so that find, not the shell, expands it
subdirs=`find "$1" -type d -name '??_??' -print`
echo $subdirs
for x in $subdirs
do
echo $x
cp -a "$x" "$2"
done

Please check whether this is what you wanted. It searches for directories whose names have the form xx_yy, ab_cd or &&_$$ (two characters, an underscore, two characters) and copies their contents to a new directory.
usage : ./script.sh
cat script.sh
#!/bin/bash
MYDIR="/media/test/"
NEWDIRPATH="/media/test_new"
DIRS=`ls -l $MYDIR | grep "^d" | awk '{print $9}'`
for DIR in $DIRS
do
total_characters=`echo $DIR | wc -m`
if [ $total_characters -eq 6 ]; then
has_underscore=`echo "$DIR" | grep "_"`
if [ "$has_underscore" != "" ]; then
echo "${DIR}"
start_string_count=`echo $DIR | awk -F '_' '{print $1}' | wc -m`
end_string_count=`echo $DIR | awk -F '_' '{print $2}' | wc -m`
echo "start_string_count => $start_string_count ; end_string_count => $end_string_count"
if [ $start_string_count -eq 3 ] && [ $end_string_count -eq 3 ]; then
mkdir -p "$NEWDIRPATH/${DIR}_new"
# DIR is only a name relative to MYDIR, so give cp the full source path
cp -r "$MYDIR/$DIR" "$NEWDIRPATH/${DIR}_new"
fi
fi
fi
done

Related

How to find specific files in folders and do an operation in case they exist

I am having some difficulty doing a basic script in sh shell.
What I want to do is simple though:
I want to do a sh script (can also be csh) that looks through a number of folders and for each folder that contains the files I am interested in, it should do a specific operation of pasting the corresponding filename into a sh script with rdseed commands.
The script I wrote in sh shell and doesn't work is:
for dir in EV*
do
echo $dir
cd $dir
if [ -f GEFLE* = true ];
then
set dataless = gur_ini_dataless.seed
for file GEFLE*
do
echo "rdseed -d -o 2 -f "$file " -g " $dataless >> runmseed2ahGEFLE.sh
done
else
echo "File does not exists"
fi
sleep 0.5
cd ..
done
Does anyone know a solution?
Please try this... I'm adding some comments to the lines...
#!/bin/sh
for dir in EV*
do
echo $dir
cd "$dir" || continue
if ls GEFLE* >/dev/null 2>&1 # true if at least one entry matching "GEFLE*" exists ([ -f GEFLE* ] fails when several match)
then
dataless=gur_ini_dataless.seed # no `set`, no spaces
for file in GEFLE* # will match all FILES/DIRS/... that start with "GEFLE"
do
echo "rdseed -d -o 2 -f $file -g $dataless" >> runmseed2ahGEFLE.sh # vars are substituted in double quoted strings
done
else
echo "File does not exist"
fi
cd ..
done
Please note this will only look one level deep into the directories. If you need recursion, you are better off using something like
for dir in `find . -type d -name 'EV*'`; do
# ...
done
The way I would put this is:
for f in `find EV* -name 'GEFLE*' -type f`; do
echo "rdseed -d -o 2 -f ./$f -g gur_ini_dataless.seed" >> "./`dirname $f`/runmseed2ahGEFLE.sh"
done
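A variant that avoids both the cd and the parsing of find output is sketched below. It is only an illustration under assumptions: write_rdseed_scripts is a hypothetical helper name, it is run from the directory that contains the EV* folders, and file names are assumed to contain no spaces because the generated command lines are not quoted.

```shell
#!/bin/sh
# write_rdseed_scripts   (hypothetical helper, for illustration)
# For each EV* directory below the current directory, append one rdseed
# command per GEFLE* file to runmseed2ahGEFLE.sh inside that directory.
write_rdseed_scripts() {
    dataless=gur_ini_dataless.seed
    for dir in EV*/; do
        dir=${dir%/}
        [ -d "$dir" ] || continue
        for file in "$dir"/GEFLE*; do
            [ -f "$file" ] || continue      # also skips the unmatched-glob case
            printf 'rdseed -d -o 2 -f %s -g %s\n' "${file##*/}" "$dataless" \
                >> "$dir/runmseed2ahGEFLE.sh"
        done
    done
}
```

Directories without any GEFLE* file are skipped silently, so no empty script is created for them.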

Bash script to Rename multiple files in subfolder to their folder name

I have the following file structure:
Applications/Snowflake/applications/Salford_100/wrongname_120.nui; wrongname_200_d.nui
Applications/Snowflake/applications/Salford_900/wrongname_120.nui; wrongname_200_d.nui
Applications/Snowflake/applications/Salford_122/wrongname_120.nui; wrongname_200_d.nui
And I want to rename the files to the same name as the directories they're in, but the files with "_d" at the end should retain their last 2 characters. The file pattern would always be "salford_xxx" where xxx is always 3 digits. So the resulting files would be:
Applications/Snowflake/applications/Salford_100/Salford_100.nui; Salford_100_d.nui
Applications/Snowflake/applications/Salford_900/Salford_900.nui; Salford_900_d.nui
Applications/Snowflake/applications/Salford_122/Salford_122.nui; Salford_122_d.nui
The script would run from a different location in
Applications/Snowflake/Table-updater
I imagine this would require a for loop and a sed regex, but I'm open to any suggestions.
(Thanks @ghoti for your advice)
I've tried this, which does not yet account for files with "_d", and I just get one file renamed correctly. Some help would be appreciated.
cd /Applications/snowflake/table-updater/Testing/applications/salford_*
dcomp="$(basename "$(pwd)")"
for file in *; do
ext="${file##*.}"
mv -v "$file" "$dcomp.$ext"
done
I've now updated the script following @varun's advice (thank you) and it now also searches through all files in the parent dir that contain salford in the name, missing out the parent name. Please see below:
#!/bin/sh
#
# RenameToDirName2.sh
#
set -e
cd /Applications/snowflake/table-updater/Testing/Applications/
find salford* -maxdepth 1 -type d \( ! -name . \) -exec sh -c '(cd {} &&
(
dcomp="$(basename "$(pwd)")"
for file in *;
do ext="${file#*.}"
zz=$(echo $file|grep _d)
if [ -z $zz ]
then
mv -v "$file" "$dcomp.$ext"
else
mv -v "$file" "${dcomp}_d.$ext"
fi
done
)
)' ';'
The thing is, I've just realised that in these salford subdirectories there are other files with different extensions that I don't want renamed. I've tried putting in an else-if statement to restrict it to *.nui files only, using my $dcomp variable, like this:
else
if file in $dcomp/*.nui
then
#continue...
But I get errors. Where should this go in my script and also do I have the correct syntax for this loop? Can you help?
You can write:
(
cd ../applications/ && \
for name in Salford_[0-9][0-9][0-9] ; do
mv "$name"/*_[0-9][0-9][0-9].nui "$name/$name.nui"
mv "$name"/*_[0-9][0-9][0-9]_d.nui "$name/${name}_d.nui"
done
)
(Note: the (...) is a subshell, to restrict the scope of the directory-change and of the name variable.)
@eggfoot, I have modified my script, which will look into all the directories in the applications folder and look for folders which have Salford in their name.
So you can call my script like this:
./rename.sh /home/username/Applications/Snowflake/applications
#!/bin/bash
# set -x
path=$1
dir_list=$(find $path/ -type d)
for index_dir in $dir_list
do
aa=$(echo $index_dir|grep Salford)
if [ ! -z $aa ]
then
files_list=$(find $index_dir/ -type f)
for index in $files_list
do
xx=$(basename $index)
z=$(echo $xx|grep '_d')
if [ -z $z ]
then
result=$(echo $index | sed 's/\/\(.*\)\/\(.*\)\/\(.*\)\(\..*$\)/\/\1\/\2\/\2\4/')
mv "$index" "$result"
else
result=$(echo $index | sed 's/\/\(.*\)\/\(.*\)\/\(.*\)_d\(\..*$\)/\/\1\/\2\/\2_d\4/')
mv "$index" "$result"
fi
done
fi
done
Regarding sed: it uses the s command to substitute the file name with the directory name, keeping the extension as it is.
Regarding your script: you need to use grep to find the files which have _d, and then you can use parameter substitution to build a different mv target for files with _d and files without _d.
dcomp="$(basename "$(pwd)")"
for file in *; do
ext="${file##*.}"
zz=$(echo $file|grep _d)
if [ -z $zz ]
then
mv -v "$file" "$dcomp.$ext"
else
mv -v "$file" "${dcomp}_d.$ext"
fi
done
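To leave files with other extensions alone, the loop can be restricted to *.nui with a glob and the "_d" case decided with a case statement instead of grep. The following is a rough sketch, not the code from either answer: rename_nui_files is a hypothetical helper name, and it assumes each Salford_xxx directory holds at most one plain .nui file and at most one _d.nui file (a second one would overwrite the first).

```shell
#!/bin/sh
# rename_nui_files APPS_DIR   (hypothetical helper, for illustration)
# Renames the .nui files inside each Salford_NNN directory under APPS_DIR
# to <dirname>.nui or <dirname>_d.nui; other extensions are not touched.
rename_nui_files() {
    apps=$1
    for dir in "$apps"/[Ss]alford_[0-9][0-9][0-9]; do
        [ -d "$dir" ] || continue
        name=${dir##*/}                         # directory name, e.g. Salford_100
        for file in "$dir"/*.nui; do
            [ -f "$file" ] || continue
            case ${file##*/} in
                "$name.nui"|"${name}_d.nui") ;; # already has the right name
                *_d.nui) mv "$file" "$dir/${name}_d.nui" ;;
                *)       mv "$file" "$dir/$name.nui" ;;
            esac
        done
    done
}
```

Because the directory is passed as an argument, the script can live in Table-updater and be called with the applications path, for example rename_nui_files ../applications.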

Use datestring in a filename to create folder directory and move files

The script I'm trying to pull off should move files to a destination folder and place them in "year/month/" folders according to each file's name, which starts with YYYY-MM-DD.
Example:
2013-08-03-image_name.png -> ~/B/uploads/2013/08/2013-08-03-image_name.png
2012-01-01-image_name.png -> ~/B/uploads/2012/01/2012-01-01-image_name.png
Plan of action
(1) Set path variables
source=~/Desktop/A/
targetPath=~/Desktop/B/uploads/
(2) Perform these actions on each file in $source
cd "$source";
for i in *.png
do
# STEP 3
# STEP 4
done
(3) Step 3: Image Optimization √
(4) Step 4: File away files to the directory that matches the date in the name
(4a) Search for the date string in the filename via ^(\d{4})-(\d{2}) and create $datePath, e.g. datePath=2013/08/. I imagine something like this…
awk -F … somehow put the regex here with a search and replace "-" into "/"
and save it as a variable.
(4b) Create new target directory if it doesn't exist and move files there.
targetDir=$targetPath$datePath
mkdir -p $targetDir
mv -v "$i" "$targetDir"
PS: Bash would be nice.
Here is a solution for finding the target path for your files in pure Bash:
f='2013-08-03-image_name.png'
targetPath=~/Desktop/B/uploads/
[[ "$f" =~ ^([0-9]{4})-([0-9]{2}) ]] && \
echo "$targetPath${BASH_REMATCH[1]}/${BASH_REMATCH[2]}/$f"
OUTPUT:
~/Desktop/B/uploads/2013/08/2013-08-03-image_name.png
I'd use find + egrep to filter, then sed to build the name of the destination directory.
cd /src
IMAGES=`find . -type f -name '*.png' -print | egrep '^\./[0-9]{4}-[0-9]{2}-[0-9]{2}-.+\.png$'`
for IMG in $IMAGES; do
# optimize here
DIR=`echo $IMG | sed -E 's/^\.\/([0-9]{4})-([0-9]{2})-[0-9]{2}-.+\.png$/\1\/\2/'`
mkdir -p /dest/$DIR
mv /src/$IMG /dest/$DIR/
done
I think you will find glob useful and might find some inspiration in this question
Here's another bash solution, without using a regex/match:
srcdir=<whatever>
destdir=<whatever>
cd "${srcdir}"
for f in *-*-*-*.png
do
{ IFS=- read y m rest
[[ -d "${destdir}/${y}/${m}" ]] || mkdir -p "${destdir}/${y}/${m}"
echo mv "${f}" "${destdir}/${y}/${m}/${f}"
} <<< "${f}"
done
The for f in ... pattern may need some adjusting, depending on what other stuff you have in your source directory...
Remove the echo from in front of mv if you're satisfied with the proposed set of commands the above produces (or just pipe the whole thing into a subshell .... | bash).
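Putting the pieces together, the whole move step can also be sketched with nothing but a glob and parameter expansion, which is a different technique from the regex and read approaches above. file_by_date is a hypothetical helper name, and the source and destination directories are passed in as arguments rather than taken from the question's variables.

```shell
#!/bin/sh
# file_by_date SRC DEST   (hypothetical helper, for illustration)
# Moves SRC/YYYY-MM-DD-*.png into DEST/YYYY/MM/, creating directories as needed.
file_by_date() {
    src=$1 dest=$2
    for path in "$src"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-*.png; do
        [ -f "$path" ] || continue      # the glob matched nothing
        file=${path##*/}                # strip the directory part
        year=${file%%-*}                # text before the first "-"
        rest=${file#*-}                 # everything after the first "-"
        month=${rest%%-*}               # text before the next "-"
        mkdir -p "$dest/$year/$month" && mv "$path" "$dest/$year/$month/"
    done
}
```

Files that do not start with a date are never matched by the glob, so they stay where they are.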

Bash - CD into Untared directory with variable URL

This is the situation. I have a list of URLs that I need to extract and set up. It's all variable driven, but after I extract, I don't know what my folder will be called. I can't cd into it if I don't know what it's called.
DL_DIR=/opt/
URL=http://nginx.org/download/nginx-1.3.3.tar.gz
FILE=${URL##*/}
CONFIG="-- core"
cd "$DL_DIR"
wget $URL
tar xzf $FILE
cd <HOW DO I GO INTO IT?>
./configure "$CONFIG"
make
make install
rm $FILE
If this doesn't explain it, please say. I really want to get past this problem but I'm having a hard time explaining it.
Since I want this to work for any set of URLs, which may have a two-part extension like ".tar.gz" or a one-part extension like ".zip", and may have dots in the filename like "Python2.3.4" or may not, like "Nginx", it makes it a bit tricky.
#! /bin/bash
#
# Problem:
# find the path of the "root" folder in an archive
#
# Strategy:
# list all folders in the archive.
# sort the list to make sure the shortest path is at the top.
# print the first line
#
# Weak point:
# assumes that tar tf and unzip -l will list files in a certain way
# that is: paths ending with / and that the file-list of unzip -l
# is in the fourth column.
#
LIST_FILES=
FILE=$1
case ${FILE##*.} in
gz)
LIST_FILES="tar tf $FILE"
;;
tgz)
LIST_FILES="tar tf $FILE"
;;
zip)
LIST_FILES='unzip -l '$FILE' | awk "{print \$4}"'
;;
esac
ARCHIVE_ROOT=$(
echo $LIST_FILES | sh |\
grep '/$'|\
sort |\
head -n1
)
# we should have what we need by now, go ahead and extract the files.
if [ -d "$ARCHIVE_ROOT" ]; then
cd "$ARCHIVE_ROOT"
else
# there is no path (whoever made the archive is a jerk)
# ...or the script failed (see weak points)
exit 1
fi
If you know that there is going to be exactly one directory in $DL_DIR, then you can use:
cd `ls -m1`
Another approach would be to loop through the files of the directory:
for filename in "$DL_DIR"/*
do
echo $filename
done;
You could perform file tests and other checks as necessary.
extract_dir=$(tar -tf $FILE | cut -d/ -f1 | uniq)
cd $extract_dir
or
extract_dir=$(tar -tf $FILE | head -1 | cut -d/ -f1)
cd $extract_dir
or
ls > .dir_list_1 # save current directory listing as hidden file
tar xzf $FILE # extract the $FILE
ls > .dir_list_2 # save the directory listing after extraction...
# ...as another hidden file
# diff two lists saved in hidden files, this will help you get the created dir
# grep '>' symbol, to get the inserted line
# use head to get the dir in case there are multiple lines (not necessary)
# use cut to remove the '>' and get the actual dir name, store in extract_dir
extract_dir=$(diff .dir_list_1 .dir_list_2 | grep '>' | head -1 | cut -d' ' -f2)
# remove temporary files
rm .dir_list_*
cd $extract_dir
I'd say, get the extension of the file with ${FILE##*.} and then go the other way, stripping it with ${FILE%.ext*}, to get the directory name:
case ${FILE##*.} in
gz)
tar xf $FILE
cd ${FILE%.tar.gz*}
;;
tgz)
tar xf $FILE
cd ${FILE%.tgz*}
;;
zip)
unzip $FILE
cd ${FILE%.zip*}
;;
esac
Just one small problem: how do you know whether the directory in the archive has the same name as the archive itself?
How about this:
rm -rf tmpdir
mkdir tmpdir && cd tmpdir || exit
wget "$URL" || exit 1
case "$(ls)" in
*.tar.gz|*.tgz)
tar xzf $(ls)
;;
*.zip)
unzip $(ls)
;;
esac
for d in */
do
( cd "$d" 2>/dev/null && ./configure && make && make install; )
done
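The listing-based approaches can be tightened so that the script refuses to cd when the archive does not unpack into exactly one top-level directory. A minimal sketch for tar archives, assuming a tar that detects compression by itself when listing (GNU tar and bsdtar both do); archive_root is a hypothetical helper name:

```shell
#!/bin/sh
# archive_root FILE   (hypothetical helper, for illustration)
# Prints the single top-level entry of a tar archive. Prints nothing and
# returns non-zero when the members do not share one top-level name.
archive_root() {
    roots=$(tar -tf "$1" | sed -e 's|^\./||' -e 's|/.*||' | grep . | sort -u)
    [ -n "$roots" ] || return 1
    [ $(printf '%s\n' "$roots" | wc -l) -eq 1 ] || return 1
    printf '%s\n' "$roots"
}
```

In the question's script this could be used as dir=$(archive_root "$FILE") && tar xzf "$FILE" && cd "$dir" before running ./configure. Zip files would need a similar listing step based on unzip, which is not covered here.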

How can I exclude directories matching certain patterns from the output of the Linux 'find' command?

I want to use regex's with Linux's find command to dive recursively into a gargantuan directory tree, showing me all of the .c, .cpp, and .h files, but omitting matches containing certain substrings. Ultimately I want to send the output to an xargs command to do certain processing on all of the matching files. I can pipe the find output through grep to remove matches containing those substrings, but that solution doesn't work so well with filenames that contain spaces. So I tried using find's -print0 option, which terminates each filename with a nul char instead of a newline (whitespace), and using xargs -0 to expect nul-delimited input instead of space-delimited input, but I couldn't figure out how to pass the nul-delimited find through the piped grep filters successfully; grep -Z didn't seem to help in that respect.
So I figured I'd just write a better regex for find and do away with the intermediary grep filters... perhaps sed would be an alternative?
In any case, for the following small sampling of directories...
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
...I want the output to include all of the .h, .c, and .cpp files but NOT those ones that appear in the 'generated' and 'deploy' directories.
BTW, you can create an entire test directory (named fredbarney) for testing solutions to this question by cutting & pasting this whole line into your bash shell:
mkdir fredbarney; cd fredbarney; mkdir fred; cd fred; mkdir inc; mkdir docs; mkdir generated; mkdir deploy; mkdir src; echo x > inc/dino.h; echo x > docs/info.docx; echo x > generated/dino.h; echo x > deploy/dino.h; echo x > src/dino.cpp; cd ..; mkdir barney; cd barney; mkdir inc; mkdir docs; mkdir generated; mkdir deploy; mkdir src; echo x > 'inc/bam bam.h'; echo x > 'docs/info info.docx'; echo x > 'generated/bam bam.h'; echo x > 'deploy/bam bam.h'; echo x > 'src/bam bam.cpp'; cd ..;
This command finds all of the .h, .c, and .cpp files...
find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$"
...but if I pipe its output through xargs, the 'bam bam' files each get treated as two separate (nonexistent) filenames (note that here I'm simply using ls as a stand-in for what I actually want to do with the output):
$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" | xargs -n 1 ls
ls: ./barney/generated/bam: No such file or directory
ls: bam.h: No such file or directory
ls: ./barney/src/bam: No such file or directory
ls: bam.cpp: No such file or directory
ls: ./barney/deploy/bam: No such file or directory
ls: bam.h: No such file or directory
ls: ./barney/inc/bam: No such file or directory
ls: bam.h: No such file or directory
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
So I can enhance that with the -print0 and -0 args to find and xargs:
$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" -print0 | xargs -0 -n 1 ls
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
...which is great, except that I don't want the 'generated' and 'deploy' directories in the output. So I try this:
$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" -print0 | grep -v generated | grep -v deploy | xargs -0 -n 1 ls
barney fred
...which clearly does not work. So I tried using the -Z option with grep (not knowing exactly what the -Z option really does) and that didn't work either. So I figured I'd write a better regex for find and this is the best I could come up with:
find . -regextype posix-egrep -regex "(?!.*(generated|deploy).*$)(.+\.(c|cpp|h)$)" -print0 | xargs -0 -n 1 ls
...but bash didn't like that (!.*: event not found, whatever that means), and even if that weren't an issue, my regex doesn't seem to work on the regex tester web page I normally use.
Any ideas how I can make this work? This is the output I want:
$ find . [----options here----] | [----maybe grep or sed----] | xargs -0 -n 1 ls
./barney/src/bam bam.cpp
./barney/inc/bam bam.h
./fred/src/dino.cpp
./fred/inc/dino.h
...and I'd like to avoid scripts & temporary files, which I suppose might be my only option.
Thanks in advance!
-Mark
This works for me:
find . -regextype posix-egrep -regex '.+\.(c|cpp|h)$' -not -path '*/generated/*' \
-not -path '*/deploy/*' -print0 | xargs -0 ls -L1d
Changes from your version are minimal: I added exclusions of certain path patterns separately, because that's easier, and I single-quote things to hide them from shell interpolation.
The event not found is because ! is being interpreted as a request for history expansion by bash. The fix is to use single quotes instead of double quotes.
Pop quiz: What characters are special inside of a single-quoted string in sh?
Answer: Only ' is special (it ends the string). That's the ultimate safety.
grep with -Z (also known as --null) only makes grep output a null character instead of the character that normally follows a file name, for example with -l. What you wanted was -z (also known as --null-data), which causes GNU grep to treat a null character as the line terminator, for both its input and its output, instead of a newline. This makes it work as expected with the output of find ... -print0, which adds a null character after each file name instead of a newline.
If you had done it this way:
find . -regextype posix-egrep -regex '.+\.(c|cpp|h)$' -print0 | \
grep -vzZ generated | grep -vzZ deploy | xargs -0 ls -1Ld
Then the input and output of grep would have been null-delimited and it would have worked correctly... until one of your source files began being named deployment.cpp and started getting "mysteriously" excluded by your script.
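One way to keep the grep filter while avoiding the deployment.cpp trap is to match whole path components, slashes included. A small sketch, assuming GNU grep for the -z option; list_sources_filtered is a hypothetical helper name and it uses -name tests instead of -regex for brevity:

```shell
#!/bin/sh
# list_sources_filtered ROOT   (hypothetical helper, for illustration)
# Emits a null-delimited list of .c, .cpp and .h files under ROOT, dropping
# any path that contains a directory component named generated or deploy.
list_sources_filtered() {
    find "$1" -type f \( -name '*.c' -o -name '*.cpp' -o -name '*.h' \) -print0 |
        grep -zv -e '/generated/' -e '/deploy/'
}
```

The output can be piped straight into xargs, for example list_sources_filtered . | xargs -0 -n 1 ls -d.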
Incidentally, here's a nicer way to generate your testcase file set.
while read -r file ; do
mkdir -p "${file%/*}"
touch "$file"
done <<'DATA'
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
DATA
Since I did this anyway to verify I figured I'd share it and save you from repetition. Don't do anything twice! That's what computers are for.
Your command:
find . -regextype posix-egrep -regex "(?!.*(generated|deploy).*$)(.+\.(c|cpp|h)$)" -print0 | xargs -0 -n 1 ls
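fails because you are trying to use POSIX extended regular expressions, which don't support lookahead or lookbehind assertions. https://superuser.com/a/596499/658319
GNU find has no PCRE regextype, so the regex cannot simply be converted; the exclusion has to be expressed another way, for example with separate -not -path tests or by filtering the null-delimited output with grep -z.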
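Since the tree is described as gargantuan, it may also be worth not descending into the unwanted directories at all, which -prune does. This is a sketch rather than a tested drop-in for the original command; list_sources_pruned is a hypothetical helper name and the directory names are the ones from the question:

```shell
#!/bin/sh
# list_sources_pruned ROOT   (hypothetical helper, for illustration)
# Emits a null-delimited list of .c, .cpp and .h files under ROOT without
# ever entering directories named generated or deploy.
list_sources_pruned() {
    find "$1" \( -type d \( -name generated -o -name deploy \) -prune \) -o \
        -type f \( -name '*.c' -o -name '*.cpp' -o -name '*.h' \) -print0
}
```

Usage would be along the lines of list_sources_pruned . | xargs -0 -n 1 ls -d. Unlike the -not -path tests, the pruned subtrees are skipped rather than walked and then filtered.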