How to use '-prune' option of 'find' in sh? - regex

I don't quite understand the example given in man find. Can anyone give me some examples and explanations? Can I combine regular expressions with it?
The more detailed question is like this:
Write a shell script, changeall, which has an interface like changeall [-r|-R] "string1" "string2". It will find all files with a suffix of .h, .C, .cc, or .cpp and change all occurrences of string1 to string2. The -r option selects between staying in the current directory only and including subdirectories.
NOTE:
For non-recursive case, ls is NOT allowed, we could only use find and sed.
I tried find -depth but it was NOT supported. That's why I was wondering if -prune could help, but didn't understand the example from man find.
EDIT2: I was doing an assignment, and I didn't ask the question in great detail because I wanted to finish it myself. Since I have already done it and handed it in, I can now state the whole question. I managed to finish the assignment without using -prune, but would like to learn it anyway.
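For the non-recursive case, one common idiom is to prune everything except the starting directory itself. The following is only a sketch on a made-up temporary tree (the file names are hypothetical, and it lists files rather than editing them), assuming find is started on "." so that ! -name . singles out the starting point:

```shell
# Hypothetical demo tree; every file name here is made up for illustration.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/sub"
touch "$tmp/a.h" "$tmp/b.cpp" "$tmp/notes.txt" "$tmp/sub/c.cc"

# Recursive case: match the four suffixes anywhere below the start directory.
recursive=$(cd "$tmp" && find . -type f \
    \( -name '*.h' -o -name '*.C' -o -name '*.cc' -o -name '*.cpp' \) -print | LC_ALL=C sort)

# Non-recursive case: '! -name . -prune' prunes every entry except the
# starting directory, so find never descends below the top level.
flat=$(cd "$tmp" && find . ! -name . -prune -type f \
    \( -name '*.h' -o -name '*.C' -o -name '*.cc' -o -name '*.cpp' \) -print | LC_ALL=C sort)

printf 'recursive:\n%s\nnon-recursive:\n%s\n' "$recursive" "$flat"
rm -rf "$tmp"
```

Replacing -print with an -exec that runs sed would turn the listing into the actual replacement step.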

The thing I'd found confusing about -prune is that it's an action (like -print), not a test (like -name). It alters the "to-do" list, but always returns true.
The general pattern for using -prune is this:
find [path] [conditions to prune] -prune -o \
[your usual conditions] [actions to perform]
You pretty much always want the -o (logical OR) immediately after -prune, because that first part of the test (up to and including -prune) will return false for the stuff you actually want (ie: the stuff you don't want to prune out).
Here's an example:
find . -name .snapshot -prune -o -name '*.foo' -print
This will find the "*.foo" files that aren't under ".snapshot" directories. In this example, -name .snapshot makes up the [conditions to prune], and -name '*.foo' -print is [your usual conditions] and [actions to perform].
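To try the pattern above without touching real data, here is a small sketch that builds a throwaway tree (the directory layout and file names are made up) and runs the same command on it:

```shell
# Hypothetical demo tree with .snapshot directories at two levels.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/.snapshot" "$tmp/src/.snapshot"
touch "$tmp/a.foo" "$tmp/.snapshot/old.foo" \
      "$tmp/src/b.foo" "$tmp/src/.snapshot/older.foo"

# Left of -o: what to prune. Right of -o: what to match and what to do with it.
found=$(cd "$tmp" && find . -name .snapshot -prune -o -name '*.foo' -print | LC_ALL=C sort)

printf '%s\n' "$found"
rm -rf "$tmp"
```

Only the two .foo files outside the .snapshot directories should be listed.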
Important notes:
If all you want to do is print the results you might be used to leaving out the -print action. You generally don't want to do that when using -prune.
The default behavior of find is to "and" the entire expression with the -print action if there are no actions other than -prune (ironically) at the end. That means that writing this:
find . -name .snapshot -prune -o -name '*.foo' # DON'T DO THIS
is equivalent to writing this:
find . \( -name .snapshot -prune -o -name '*.foo' \) -print # DON'T DO THIS
which means that it'll also print out the name of the directory you're pruning, which usually isn't what you want. Instead it's better to explicitly specify the -print action if that's what you want:
find . -name .snapshot -prune -o -name '*.foo' -print # DO THIS
If your "usual condition" happens to match files that also match your prune condition, those files will not be included in the output. The way to fix this is to add a -type d predicate to your prune condition.
For example, suppose we wanted to prune out any directory that started with .git (this is admittedly somewhat contrived -- normally you only need to remove the thing named exactly .git), but other than that wanted to see all files, including files like .gitignore. You might try this:
find . -name '.git*' -prune -o -type f -print # DON'T DO THIS
This would not include .gitignore in the output. Here's the fixed version:
find . -name '.git*' -type d -prune -o -type f -print # DO THIS
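Both notes can be checked side by side on a scratch directory. This is a sketch only; the tree (a .git directory, a .gitignore file, and main.c) is made up for the demonstration:

```shell
# Hypothetical demo tree.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/.git"
touch "$tmp/.git/config" "$tmp/.gitignore" "$tmp/main.c"

# No action given: find wraps the whole expression in an implicit -print,
# so the pruned directory itself is listed too.
implicit=$(cd "$tmp" && find . -name .git -prune -o -name '*.c' | LC_ALL=C sort)
# Explicit -print on the right-hand side only: the pruned directory is not listed.
explicit=$(cd "$tmp" && find . -name .git -prune -o -name '*.c' -print | LC_ALL=C sort)

# Without -type d the prune condition also swallows the regular file .gitignore.
loose=$(cd "$tmp" && find . -name '.git*' -prune -o -type f -print | LC_ALL=C sort)
# With -type d only directories are pruned, so .gitignore is printed.
strict=$(cd "$tmp" && find . -name '.git*' -type d -prune -o -type f -print | LC_ALL=C sort)

printf 'implicit:\n%s\nexplicit:\n%s\nloose:\n%s\nstrict:\n%s\n' \
    "$implicit" "$explicit" "$loose" "$strict"
rm -rf "$tmp"
```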
Extra tip: if you're using the GNU version of find, the texinfo page for find has a more detailed explanation than its manpage (as is true for most GNU utilities).

Normally, the native way we do things in Linux, and the way we think, is from left to right.
You would go and write what you are looking for first:
find / -name "*.php"
Then, you hit ENTER and realize you are getting too many files from directories you wish not to.
So, you think "let's exclude /media to avoid searching mounted drives."
You should now just append the following to the previous command:
-print -o -path '/media' -prune
and the final command is:
find / -name "*.php" -print -o -path '/media' -prune
|<-- Include -->|<-- Exclude -->|
I think this structure is much easier and correlates to the right approach.
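As a quick check that the include-first ordering gives the same result as the prune-first ordering, here is a sketch on a made-up tree (it uses a relative ./media rather than /media so it can run anywhere):

```shell
# Hypothetical demo tree: one .php file under media, one elsewhere.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/media/usb" "$tmp/www"
touch "$tmp/media/usb/x.php" "$tmp/www/index.php" "$tmp/www/readme.txt"

# Include first, then exclude.
include_first=$(cd "$tmp" && find . -name '*.php' -print -o -path ./media -prune | LC_ALL=C sort)
# Exclude first, then include.
prune_first=$(cd "$tmp" && find . -path ./media -prune -o -name '*.php' -print | LC_ALL=C sort)

printf 'include first: %s\nprune first:   %s\n' "$include_first" "$prune_first"
rm -rf "$tmp"
```

One caveat of the include-first form: a directory whose own name matches *.php would be printed and, because the -o branch is then skipped, not pruned.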

Beware that -prune does not prevent descending into any directory as some have said. It prevents descending into directories that match the test it's applied to. Perhaps some examples will help (see the bottom for a regex example). Sorry for this being so lengthy.
$ find . -printf "%y %p\n" # print the file type the first time FYI
d .
f ./test
d ./dir1
d ./dir1/test
f ./dir1/test/file
f ./dir1/test/test
d ./dir1/scripts
f ./dir1/scripts/myscript.pl
f ./dir1/scripts/myscript.sh
f ./dir1/scripts/myscript.py
d ./dir2
d ./dir2/test
f ./dir2/test/file
f ./dir2/test/myscript.pl
f ./dir2/test/myscript.sh
$ find . -name test
./test
./dir1/test
./dir1/test/test
./dir2/test
$ find . -prune
.
$ find . -name test -prune
./test
./dir1/test
./dir2/test
$ find . -name test -prune -o -print
.
./dir1
./dir1/scripts
./dir1/scripts/myscript.pl
./dir1/scripts/myscript.sh
./dir1/scripts/myscript.py
./dir2
$ find . -regex ".*/my.*p.$"
./dir1/scripts/myscript.pl
./dir1/scripts/myscript.py
./dir2/test/myscript.pl
$ find . -name test -prune -regex ".*/my.*p.$"
(no results)
$ find . -name test -prune -o -regex ".*/my.*p.$"
./test
./dir1/test
./dir1/scripts/myscript.pl
./dir1/scripts/myscript.py
./dir2/test
$ find . -regex ".*/my.*p.$" -a -not -regex ".*test.*"
./dir1/scripts/myscript.pl
./dir1/scripts/myscript.py
$ find . -not -regex ".*test.*"
.
./dir1
./dir1/scripts
./dir1/scripts/myscript.pl
./dir1/scripts/myscript.sh
./dir1/scripts/myscript.py
./dir2
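For anyone who wants to replay these examples, the following sketch recreates the same tree in a temporary directory and combines -prune with -regex using an explicit -print, so the pruned names are not listed. It assumes a find that supports -regex (GNU and BSD find do; it is not in POSIX):

```shell
tmp=$(mktemp -d) || exit 1
matches=$(
    cd "$tmp" || exit 1
    # Same layout as the listing above.
    mkdir -p dir1/test dir1/scripts dir2/test
    touch test dir1/test/file dir1/test/test \
          dir1/scripts/myscript.pl dir1/scripts/myscript.sh dir1/scripts/myscript.py \
          dir2/test/file dir2/test/myscript.pl dir2/test/myscript.sh
    # Prune anything named test, then print regex matches from what is left.
    find . -name test -prune -o -regex '.*/my.*p.$' -print | LC_ALL=C sort
)

printf '%s\n' "$matches"
rm -rf "$tmp"
```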

Adding to the advice given in other answers (I have no rep to create replies)...
When combining -prune with other expressions, there is a subtle difference in behavior depending on which other expressions are used.
Laurence Gonsalves' example will find the "*.foo" files that aren't under ".snapshot" directories:-
find . -name .snapshot -prune -o -name '*.foo' -print
However, this slightly different short-hand will, perhaps inadvertently, also list the .snapshot directory (and any nested .snapshot directories):-
find . -name .snapshot -prune -o -name '*.foo'
According to the POSIX manpage, the reason is:
If the given expression does not contain any of the primaries -exec,
-ls, -ok, or -print, the given expression is effectively replaced by:
( given_expression ) -print
That is, the second example is the equivalent of entering the following, thereby modifying the grouping of terms:-
find . \( -name .snapshot -prune -o -name '*.foo' \) -print
This has at least been seen on Solaris 5.10. Having used various flavors of *nix for approx 10 years, I've only recently searched for a reason why this occurs.

I am no expert at this (and this page was very helpful along with http://mywiki.wooledge.org/UsingFind)
I just noticed that -path matches against the whole path as find reports it, which begins with the starting point given right after find (. in these examples), whereas -name matches only the basename.
find . -path ./.git -prune -o -name file -print
blocks only the .git directory in your current directory (as you are finding in .)
find . -name .git -prune -o -name file -print
blocks all .git subdirectories recursively.
Note that the ./ is extremely important! The pattern given to -path must match a path anchored at . (or whatever starting point comes just after find). If you still get matches from inside the directory (via the other side of the -o), it is probably not being pruned.
I was naively unaware of this, and it put me off using -path, when it is actually great if you don't want to prune every subdirectory with the same basename :D
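The difference shows up as soon as there is a nested .git directory. A sketch on a made-up tree (the names file and sub are hypothetical):

```shell
# Hypothetical demo tree: a top-level .git and a nested one.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/.git" "$tmp/sub/.git"
touch "$tmp/.git/file" "$tmp/sub/.git/file" "$tmp/sub/file"

# -path ./.git matches only the top-level directory, so sub/.git is still searched.
by_path=$(cd "$tmp" && find . -path ./.git -prune -o -name file -print | LC_ALL=C sort)
# -name .git matches the basename at any depth, so both .git directories are pruned.
by_name=$(cd "$tmp" && find . -name .git -prune -o -name file -print | LC_ALL=C sort)

printf 'by -path:\n%s\nby -name:\n%s\n' "$by_path" "$by_name"
rm -rf "$tmp"
```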

find builds a list of files. It applies the predicate you supplied to each one and returns those that pass.
This idea that -prune means exclude from results was really confusing for me. You can exclude a file without prune:
find -name 'bad_guy' -o -name 'good_guy' -print   # prints only good_guy
All -prune does is alter the behavior of the search. If the current match is a directory, it says "hey find, that file you just matched, don't descend into it". It just removes that tree (but not the file itself) from the list of files to search.
It should be named -dont-descend.

Show everything including dir itself but not its long boring contents:
find . -print -name dir -prune
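A sketch of that one-liner on a made-up tree, to show that the directory is printed but its contents are not:

```shell
# Hypothetical demo tree: 'dir' has contents we do not want listed.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/dir"
touch "$tmp/dir/a" "$tmp/dir/b" "$tmp/other"

# -print runs first for every entry; -prune then stops descent into 'dir'.
listing=$(cd "$tmp" && find . -print -name dir -prune | LC_ALL=C sort)

printf '%s\n' "$listing"
rm -rf "$tmp"
```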

Prune is a "do not recurse at this file" switch (action).
From the man page
If -depth is not given, true;
if the file is a directory, do not descend into it.
If -depth is given, false; no effect.
Basically, it will not descend into a directory that matches the preceding tests.
Take this example:
You have the following directories:
% find home
home
home/test1
home/test1/test1
home/test2
home/test2/test2
find home -name test2 will print both the parent and the child directories named test2:
% find home -name test2
home/test2
home/test2/test2
Now, with -prune...
find home -name test2 -prune will print only home/test2; it will not descend into home/test2 to find home/test2/test2:
% find home -name test2 -prune
home/test2

After reading all the good answers here, my understanding is that the following all return the same results:
find . -path ./dir1\* -prune -o -print
find . -path ./dir1 -prune -o -print
find . -path ./dir1\* -o -print
# look, no prune at all!
But the last one will take a lot longer, as it still searches everything in dir1. I guess the real question is how to -or out unwanted results without actually searching them.
So I guess -prune means: don't descend past matches, but mark them as done...
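A sketch that checks the "same results" claim on a small made-up tree (dir1 and dir2 are hypothetical names); it only compares output, not how much of the tree each command walks:

```shell
# Hypothetical demo tree.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/dir1/sub" "$tmp/dir2"
touch "$tmp/dir1/a" "$tmp/dir1/sub/b" "$tmp/dir2/c"

r1=$(cd "$tmp" && find . -path './dir1*' -prune -o -print | LC_ALL=C sort)
r2=$(cd "$tmp" && find . -path ./dir1 -prune -o -print | LC_ALL=C sort)
# No -prune: dir1 is still walked, but every path under it matches the
# pattern, so the -o short-circuits and nothing below dir1 is printed.
r3=$(cd "$tmp" && find . -path './dir1*' -o -print | LC_ALL=C sort)

printf '%s\n' "$r1"
rm -rf "$tmp"
```

Note that the wildcard forms would also hide a sibling such as ./dir10, which the exact -path ./dir1 form would not.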
http://www.gnu.org/software/findutils/manual/html_mono/find.html
"This however is not due to the effect of the ‘-prune’ action (which only prevents further descent, it doesn't make sure we ignore that item). Instead, this effect is due to the use of ‘-o’. Since the left hand side of the “or” condition has succeeded for ./src/emacs, it is not necessary to evaluate the right-hand-side (‘-print’) at all for this particular file."

There are quite a few answers; some of them are a bit too theory-heavy. I'll describe why I once needed -prune, so maybe a need-first, example-based explanation is useful to someone :)
Problem
I had a folder with about 20 node directories, each having its node_modules directory as expected.
Once you get into any project, you see each ../node_modules/module. But you know how it is. Almost every module has dependencies, so what you are looking at is more like projectN/node_modules/moduleX/node_modules/moduleZ...
I didn't want to drown with a list with the dependency of the dependency of...
Knowing about -d n / -depth n wouldn't have helped me, as the main/first node_modules directory I wanted in each project was at a different depth, like this:
Projects/MysuperProjectName/project/node_modules/...
Projects/Whatshisname/version3/project/node_modules/...
Projects/project/node_modules/...
Projects/MysuperProjectName/testProject/november2015Copy/project/node_modules/...
[...]
How can I get a list of paths ending at the first node_modules of each project, then move on to the next project and get the same?
Enter -prune
When you add -prune, you still have a standard recursive search. Each path is analyzed, every match gets spit out, and find keeps digging down like a good chap. But digging down for more node_modules is exactly what I didn't want.
So, the difference is that on any of those different paths, -prune tells find to stop digging further down that particular avenue once it has found your item. In my case, the node_modules folder.
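The answer does not show the command itself, so the following is only a guess at what it might have looked like, run on a made-up Projects tree (the project and module names are hypothetical):

```shell
# Hypothetical demo tree: node_modules nested inside node_modules, at different depths.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/Projects/p1/node_modules/modA/node_modules/modB" \
         "$tmp/Projects/deep/v3/p2/node_modules/modC/node_modules"

# Print each node_modules directory, then prune it so nested ones are never reached.
tops=$(cd "$tmp" && find Projects -type d -name node_modules -prune -print | LC_ALL=C sort)

printf '%s\n' "$tops"
rm -rf "$tmp"
```

Only the outermost node_modules of each project should be listed, regardless of how deep the project sits.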

Related

Find command in shell with regular expression to find files with two extensions

I am trying to list generated log and zip files from my application server.
Files which are .log or .zip
These files include digits in their name. i.e. Files with any number of digits in their name
Files should be older than +5 days.
I used the expression below, but something looks wrong. Could you please help with the regular expression?
ROOT_DIR=applications/jboss-as/servers/
find $ROOT_DIR -name '*[0-9]*[zip|log]' -mtime +5
Finally I wish to delete these files using command
find $ROOT_DIR -name '*[0-9]*[zip|log]' -mtime +5 -exec rm {} \;
The first command will find them and display them:
find "$ROOT_DIR" ! -readable -prune -o -type f -mtime +5 -print | egrep -e "^.*\.(log|zip)$"
The second one will remove them all:
find "$ROOT_DIR" ! -readable -prune -o -type f -mtime +5 -print | egrep -e "^.*\.(log|zip)$" | xargs -L 1 rm
You could do it this way (with most versions of find):
find "$ROOT_DIR" '(' -name '*[0-9]*.log' -o -name '*[0-9]*.zip' ')' -mtime +5 -exec rm {} +
The + is from POSIX 2008 and means "run the exec'd command with as many file names as convenient" whereas the older alternative ';' (or \;) means "run the exec'd command once per file name".
If you have GNU find, you can use various dialects of regular expression:
find "$ROOT_DIR" -regex '.*\.\(zip\|bz2\)' -mtime +5 -delete
This uses the default regex mode; you can use some alternatives to avoid using so many backslashes. The -delete option uses the unlink() system call rather than invoking an external command; it is more efficient, therefore.
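Before deleting anything, it may be worth a dry run with -print. The sketch below uses made-up file names and backdates some of them with touch -t so that -mtime +5 has something to match:

```shell
# Hypothetical demo files; names and dates are made up.
tmp=$(mktemp -d) || exit 1
touch "$tmp/server1.log" "$tmp/dump2.zip" "$tmp/server3.log" \
      "$tmp/server.log" "$tmp/notes4.txt"
# Backdate everything except server3.log to 1 January 2000.
touch -t 200001010000 "$tmp/server1.log" "$tmp/dump2.zip" \
      "$tmp/server.log" "$tmp/notes4.txt"

# Digit in the name, .log or .zip suffix, and older than 5 days.
old=$(cd "$tmp" && find . -type f \
    \( -name '*[0-9]*.log' -o -name '*[0-9]*.zip' \) -mtime +5 -print | LC_ALL=C sort)

printf '%s\n' "$old"
rm -rf "$tmp"
```

Once the listing looks right, -print can be swapped for -exec rm {} + (or -delete with GNU find).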

How to use find and the prune option with an while loop

I've got a question about find, -prune and -print combined with a while loop. I want to find every file with trace in its name, but not the files ending in mailed. I also want to exclude the files in the lost+found directory. My idea was to use the following command:
find /opt/myTESTdir/ -iwholename '*lost+found' -prune -o -ctime +4 -type f -iname "*trace*" -not -iname "*.mailed*" -print0 | while read file ; do newfile=${file%.txt}".mailed" ; mv -v $file $newfile ; done
My question now is: should this work, or is there a syntax error? I've tried the find command without everything after the pipe and it seems to work correctly, but I'm not sure about the combination. I hope you can answer me :)
(Sorry for my bad english)
In the while loop, it seems you are trying to rename files with the extension .txt to .mailed. You can achieve the same using the -exec option.
Try adding the following portion to the end of your find command and remove the pipe to the while loop. The variables are quoted so that file names containing spaces survive.
-exec sh -c 'mv -f "$0" "${0%.txt}.mailed"' {} \;
The complete command would look like
find /opt/myTESTdir/ -iwholename '*lost+found' -prune -o -ctime +4 -type f -iname '*trace*' ! -iname '*.mailed*' -exec sh -c 'mv -f "$0" "${0%.txt}.mailed"' {} \;
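A variant of the same idea, sketched on a made-up tree: it batches file names with {} + and loops inside sh -c. The -ctime test is left out because change times cannot be faked in a demo, and plain -name is used instead of -iname/-iwholename to stay within POSIX find:

```shell
# Hypothetical demo tree, including file names with spaces.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/lost+found" "$tmp/logs"
touch "$tmp/lost+found/x trace.txt" "$tmp/logs/my trace.txt" "$tmp/logs/old trace.mailed"

after=$(
    cd "$tmp" || exit 1
    # Prune lost+found; rename the remaining trace files that are not yet .mailed.
    find . -name 'lost+found' -prune -o \
        -type f -name '*trace*' ! -name '*.mailed' \
        -exec sh -c 'for f do mv -- "$f" "${f%.txt}.mailed"; done' sh {} +
    find . -type f | LC_ALL=C sort
)

printf '%s\n' "$after"
rm -rf "$tmp"
```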

Find & replace recursively except for certain files

With regards to this post, how would I exclude one or more files from applying the string replacement? By using the aforementioned post as an example, I would like to be able to replace "apples" with "oranges" in all descendant files of a given directory except, say, ./fpd/font/symbol.php.
My idea was using the -regex switch in the find command but unfortunately it does not have a -v option like the grep command hence I can't negate the regex to not match the files where the replacement must occur.
I use this in my Git repository:
grep -ilr orange . | grep -v ".git" | grep -e "\\.php$" | xargs sed -i s/orange/apple/g
It will:
Run find and replace only in files that actually have the word to be replaced;
Not process the .git folder;
Process only .php files.
Needless to say you can include as many grep layers you want to filter the list that is being passed to xargs.
Known issues:
At least in my Windows environment it fails to open files that have spaces in the path or name. Never figured that one out. If anyone has an idea of how to fix this I would like to know.
Haven't tested this but it should work:
find . -path ./fpd/font/symbol.php -prune -o -type f -exec sed -i 's/apple/orange/g' {} \;
You can negate with ! (or -not) combined with -name:
$ find .
.
./a
./a/b.txt
./b
./b/a.txt
$ find . -name \*a\* -print
./a
./b/a.txt
$ find . ! -name \*a\* -print
.
./a/b.txt
./b
$ find . -not -name \*a\* -print
.
./a/b.txt
./b
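Putting the pieces together, here is a sketch that skips .git, excludes one file with ! -path, and copes with spaces in names by letting find pass the file names directly. The tree is made up, and the edit goes through a temporary file instead of sed -i (whose syntax differs between sed implementations):

```shell
# Hypothetical demo tree.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/fpd/font" "$tmp/.git"
echo apples > "$tmp/fpd/font/symbol.php"
echo apples > "$tmp/my page.php"
echo apples > "$tmp/.git/x.php"
echo pears  > "$tmp/plain.php"

# Prune .git, skip the excluded file, and rewrite only files that contain the word.
find "$tmp" -name .git -prune -o \
    -type f -name '*.php' ! -path "$tmp/fpd/font/symbol.php" \
    -exec sh -c 'for f do
        if grep -q apples "$f"; then
            sed "s/apples/oranges/g" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
        fi
    done' sh {} +

page=$(cat "$tmp/my page.php")
symbol=$(cat "$tmp/fpd/font/symbol.php")
gitfile=$(cat "$tmp/.git/x.php")
printf 'my page.php: %s\nsymbol.php: %s\n.git/x.php: %s\n' "$page" "$symbol" "$gitfile"
rm -rf "$tmp"
```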

Unix find with wildcard directory structure

I am trying to do a find where I can specify wildcards in the directory structure then do a grep for www.domain.com in all the files within the data directory.
ie
find /a/b/c/*/WA/*/temp/*/*/data -type f -exec grep -l "www.domain.com" {} /dev/null \;
This works fine where there is only one possible level between c/*/WA.
How would I go about doing the same thing where there could be multiple levels between c/*/WA?
So it could be at
/a/b/c/*/*/WA/*/temp/*/*/data
or
/a/b/c/*/*/*/WA/*/temp/*/*/data
There is no defined number of directories between /c/ and /WA/; there could be multiple levels and at each level there could be the /WA/*/temp/*/*/data.
Any ideas on how to do a find such as that?
How about using a for loop to find the WA directories, then go from there:
for DIR in $(find /a/b/c -type d -name WA -print); do
find $DIR/*/temp/*/*/data -type f \
-exec grep -l "www.domain.com" {} /dev/null \;
done
You may be able to get all that in a single command, but I think clarity is more important in the long run.
Assuming no spaces in the paths, then I'd think in terms of:
find /a/b/c -type f |
grep -E '/WA/[^/]+/temp/[^/]+/[^/]+/data/' |
xargs grep -l "www.domain.com" /dev/null
This uses find to find the files (rather than making the shell do most of the work), then uses the grep -E (equivalent to egrep) to select the names with the correct pattern in the path, and then uses xargs and grep (again) to find the target pattern.
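A sketch of the find | grep -E | xargs approach on a made-up tree, with WA at two different depths and one data directory at the wrong depth. It assumes, as above, that no path contains spaces or newlines, and that the files of interest live inside the data directories:

```shell
# Hypothetical demo tree; directory and file names are made up.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/c/x/WA/p/temp/q/r/data" \
         "$tmp/c/x/y/WA/p/temp/q/r/data" \
         "$tmp/c/x/WA/p/temp/q/data"
echo 'see www.domain.com' > "$tmp/c/x/WA/p/temp/q/r/data/f1"
echo 'see www.domain.com' > "$tmp/c/x/y/WA/p/temp/q/r/data/f2"
echo 'see www.domain.com' > "$tmp/c/x/WA/p/temp/q/data/f3"     # wrong depth below temp
echo 'nothing here'       > "$tmp/c/x/WA/p/temp/q/r/data/f4"   # no match in content

# [^/]+ pins each wildcard to exactly one directory level.
hits=$(cd "$tmp" && find c -type f |
    grep -E '/WA/[^/]+/temp/[^/]+/[^/]+/data/' |
    xargs grep -lF 'www.domain.com' /dev/null | LC_ALL=C sort)

printf '%s\n' "$hits"
rm -rf "$tmp"
```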

remove files when name does NOT contain some words

I am using Linux and intend to remove some files using shell.
I have some files in my folder, some filenames contain the word "good", others don't.
For example:
ssgood.wmv
ssbad.wmv
goodboy.wmv
cuteboy.wmv
I want to remove the files that do NOT contain "good" in the name, so the remaining files are:
ssgood.wmv
goodboy.wmv
How can I do that using rm in the shell? I tried to use
rm -f *[!good].*
but it doesn't work.
Thanks a lot!
This command should do what you need:
ls -1 | grep -v 'good' | xargs rm -f
It will probably run faster than other commands, since it does not involve the use of a regex (which is slow, and unnecessary for such a simple operation).
With bash, you can get "negative" matching via the extglob shell option:
shopt -s extglob
rm !(*good*)
You can use find with the -not operator:
find . -not -iname "*good*" -a -not -name "." -exec rm {} \;
I've used -exec to call rm there, but I wondered whether find has a built-in delete action (it does, see below).
But be very careful with that. Note that in the above I've had to put an -a -not -name "." clause in, because otherwise it matched ., the current directory. So I'd test thoroughly with -print before putting in the -exec rm {} \; bit!
Update: Yup, I've never used it, but there is indeed a -delete action. So:
find . -not -iname "*good*" -a -not -name "." -delete
Again, be careful and double-check you're not matching more than you want to match first.
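Since the commands above also descend into subdirectories, here is a sketch that keeps the deletion to the top level by pruning everything except the starting directory. It runs on a scratch copy of the example files; the keep directory is made up to show that nothing below the top level is touched:

```shell
# Hypothetical scratch copy of the example files, plus one subdirectory.
tmp=$(mktemp -d) || exit 1
mkdir -p "$tmp/keep"
touch "$tmp/ssgood.wmv" "$tmp/ssbad.wmv" "$tmp/goodboy.wmv" "$tmp/cuteboy.wmv" \
      "$tmp/keep/bad.wmv"

remaining=$(
    cd "$tmp" || exit 1
    # '! -name . -prune' stops find from descending; only top-level regular
    # files whose names lack "good" are removed.
    find . ! -name . -prune -type f ! -name '*good*' -exec rm -f {} +
    find . -type f | LC_ALL=C sort
)

printf '%s\n' "$remaining"
rm -rf "$tmp"
```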