Unix find with wildcard directory structure - regex

I am trying to do a find where I can specify wildcards in the directory structure then do a grep for www.domain.com in all the files within the data directory.
For example:
find /a/b/c/*/WA/*/temp/*/*/data -type f -exec grep -l "www.domain.com" {} /dev/null \;
This works fine where there is only one level between c and WA.
How would I do the same thing where there could be multiple levels between c and WA?
So it could be at
/a/b/c/*/*/WA/*/temp/*/*/data
or
/a/b/c/*/*/*/WA/*/temp/*/*/data
There is no defined number of directories between /c/ and /WA/; there could be multiple levels and at each level there could be the /WA/*/temp/*/*/data.
Any ideas on how to do a find such as that?

How about using a for loop to find the WA directories, then go from there:
for DIR in $(find /a/b/c -type d -name WA -print); do
find $DIR/*/temp/*/*/data -type f \
-exec grep -l "www.domain.com" {} /dev/null \;
done
You may be able to get all that in a single command, but I think clarity is more important in the long run.

Assuming no spaces in the paths, then I'd think in terms of:
find /a/b/c -type f |
grep -E '/WA/[^/]+/temp/[^/]+/[^/]+/data/' |
xargs grep -l "www.domain.com" /dev/null
This uses find to list the files (rather than making the shell do most of the work), then uses grep -E (equivalent to egrep) to select the names with the correct pattern in the path, and then uses xargs and grep (again) to find the target pattern. Note that data is a directory here, so the pattern ends in /data/ and matches the files below it.
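If spaces in paths are a concern, one alternative is to let find do the path filtering itself with -path. The sketch below builds a throwaway tree (the directory and file names are made up for illustration) with WA at two different depths. Unlike the [^/]+ regex, the * in a -path pattern also matches "/", so this is a looser match than the grep -E version.

```shell
#!/bin/sh
# Hypothetical sandbox: WA sits one level and three levels below c.
root=$(mktemp -d)
mkdir -p "$root/c/x/WA/1/temp/p/q/data" \
         "$root/c/x/y/z/WA/2/temp/p/q/data" \
         "$root/c/x/other/data"
echo "see www.domain.com" > "$root/c/x/WA/1/temp/p/q/data/hit one.txt"
echo "see www.domain.com" > "$root/c/x/y/z/WA/2/temp/p/q/data/hit2.txt"
echo "nothing here"       > "$root/c/x/y/z/WA/2/temp/p/q/data/miss.txt"
echo "see www.domain.com" > "$root/c/x/other/data/outside.txt"

# -path tests the whole path name, and its * can span several directories,
# so one pattern covers any number of levels between c and WA.
# grep -F treats the dots in the search string literally.
matches=$(find "$root/c" -path '*/WA/*/temp/*/*/data/*' -type f \
    -exec grep -lF "www.domain.com" {} + | sort)
printf '%s\n' "$matches"

rm -rf "$root"
```

Only the two "hit" files should be listed; outside.txt contains the string but is not under a WA/*/temp/*/*/data directory.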

Related

linux recursive copy specified files doesn't work

I want to recursively copy all the files in the directory data whose names start with a letter to the directory test. So I wrote this:
find data -type f -exec grep '^[a-z]' {} \; -exec cp -f {} ./test \;
However, it also matched other files.
What's wrong with the code?
Your command isn't executing grep on filenames, but rather on the contents of those files.
You say:
copy all the files which start with letters in directory
That calls for a find command that matches on filenames, which requires the -name option. For example,
find data -type f -name '[a-z]*'
Because your command has no filename-matching clause (-name), the -exec option runs the provided command (grep '^[a-z]' {}) on every file that find finds in the data directory, and the copy then happens whenever a file's contents have a line starting with a letter.
The command you likely want is:
find data -type f -name '[a-z]*' -exec cp -f {} ./test \;
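To see the difference between matching names and matching contents, here is a small sandbox sketch. The file names are invented for the demonstration; the find command is the one above with the paths pointed at a temporary directory.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/data/sub" "$work/test"
echo "123" > "$work/data/alpha.txt"      # name starts with a letter, contents do not
echo "abc" > "$work/data/1beta.txt"      # contents start with a letter, name does not
echo "xyz" > "$work/data/sub/gamma.txt"  # found recursively

# -name tests only the last path component, never the file contents.
find "$work/data" -type f -name '[a-z]*' -exec cp -f {} "$work/test" \;

copied=$(ls "$work/test")
printf '%s\n' "$copied"
rm -rf "$work"
```

Only alpha.txt and gamma.txt should end up in test. Note that cp flattens the tree here, so two files with the same name in different subdirectories would overwrite each other.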

Find command in shell with regular expression to find files with two extensions

I am trying to list generated log and zip files from my application server.
Files which are .log or .zip
These files include digits in their name. i.e. Files with any number of digits in their name
Files should be older than +5 days.
I used the expression below, but something looks wrong. Could you please help with the regular expression?
ROOT_DIR=applications/jboss-as/servers/
find $ROOT_DIR -name '*[0-9]*[zip|log]' -mtime +5
Finally I wish to delete these files using command
find $ROOT_DIR -name '*[0-9]*[zip|log]' -mtime +5 -exec rm {} \;
The first command will find and display them (the -o after -prune is needed; without it, only unreadable files would be printed):
find "$ROOT_DIR" ! -readable -prune -o -type f -mtime +5 -print | grep -E '[0-9][^/]*\.(log|zip)$'
The second one will remove them all:
find "$ROOT_DIR" ! -readable -prune -o -type f -mtime +5 -print | grep -E '[0-9][^/]*\.(log|zip)$' | xargs -L 1 rm
You could do it this way (with most versions of find):
find "$ROOT_DIR" '(' -name '*[0-9]*.log' -o -name '*[0-9]*.zip' ')' -mtime +5 -exec rm {} +
The + is from POSIX 2008 and means "run the exec'd command with as many file names as convenient" whereas the older alternative ';' (or \;) means "run the exec'd command once per file name".
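A quick way to see the difference between the two terminators is to count how often the command is started. This is only a sketch on a throwaway directory with three made-up file names; on a very large tree, + may still split the list into several runs.

```shell
#!/bin/sh
root=$(mktemp -d)
touch "$root/a.log" "$root/b.log" "$root/c.log"

# \; starts the command once per file: one output line per file.
per_file=$(find "$root" -type f \
    -exec sh -c 'echo "$# file(s) in this run"' sh {} \; | wc -l)

# + appends as many file names as fit to a single command line.
batched=$(find "$root" -type f \
    -exec sh -c 'echo "$# file(s) in this run"' sh {} + | wc -l)

printf 'runs with semicolon: %s, runs with plus: %s\n' "$per_file" "$batched"
rm -rf "$root"
```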
If you have GNU find, you can use various dialects of regular expression:
find "$ROOT_DIR" -regex '.*[0-9][^/]*\.\(log\|zip\)' -mtime +5 -delete
This uses the default regex mode; you can use some alternatives to avoid using so many backslashes. The -delete option uses the unlink() system call rather than invoking an external command, so it is more efficient.
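The reason the pattern in the question misbehaves is that -name takes a shell glob, where [zip|log] is a bracket expression matching one single character out of z, i, p, |, l, o, g. The sketch below compares it with the two-pattern -o form on some invented file names, printing instead of deleting so the result can be checked first.

```shell
#!/bin/sh
root=$(mktemp -d)
mkdir -p "$root/node1/log"
for name in server.2013.log dump42.zip server.log image1.png; do
    touch -t 200001010000 "$root/node1/log/$name"   # old enough for -mtime +5
done
touch "$root/node1/log/fresh1.log"                  # modified now, so it is skipped

# Pattern from the question: any name with a digit that ends in z, i, p, |, l, o or g.
loose=$(find "$root" -type f -name '*[0-9]*[zip|log]' -mtime +5 -print |
    sed 's|.*/||' | sort)

# Two explicit patterns joined with -o; the parentheses make -mtime apply to both.
strict=$(find "$root" -type f '(' -name '*[0-9]*.log' -o -name '*[0-9]*.zip' ')' \
    -mtime +5 -print | sed 's|.*/||' | sort)

printf 'question pattern:\n%s\n\ngrouped -o pattern:\n%s\n' "$loose" "$strict"
rm -rf "$root"
```

The question's pattern also picks up image1.png, because its name contains a digit and ends in g; the grouped form lists only dump42.zip and server.2013.log.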

Find and delete all core files in a directory

Core files are generated when a program terminates abnormally. A core file contains the working memory of the process at the moment it terminated. You can use a debugger with the generated core file to debug the program. The challenge is:
Delete all core files from a directory (recursive search). Core files can be very large, and you may want to delete them to save disk space.
Make sure you don't delete any directory named core, or any other file named core that is not actually a memory dump.
After some searching on the internet, I found a nice piece of code to do this. The drawback is that it asks you to confirm each core file, to make sure it's not some other file named core. Source: http://csnbbs.com/
Code:
find . -name core\* -user $USER -type f -size +1000000c -exec file {} \; -exec ls -l {} \; -exec printf "\n\ny to remove this core file\n" \; -exec /bin/rm -i {} \;
Please post if you have better solutions.
To delete all files matching the glob pattern "*.core" you can use:
find . -name "*.core" -type f -delete
find supports many filters like:
-size +1000000c # size > 1,000,000 bytes (about 1 MB)
-user $USER # specific user
-mtime +3 # older than 3 days
If you are worried about files ending in "core" that are not core files, you can filter on the output of the file command piped through some other Linux commands. For example:
find . -name "*.core" -type f -exec file {} \; | grep 'core file' | awk -F":" '{print $1}' | xargs -n1 -P4 rm -rf
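As a rough illustration of the -size and -type filters, the sketch below builds a sandbox with a directory named core, a small file named core, and a large file named core (all made up; the large file is just zeros, not a real dump). It only prints what would be selected; it does not run the file command, so it cannot tell a real dump from any other large file.

```shell
#!/bin/sh
root=$(mktemp -d)
mkdir -p "$root/proj/core"                 # a directory named core: must not match
dd if=/dev/zero of="$root/proj/core/core" bs=1024 count=2000 2>/dev/null  # 2,048,000 bytes
echo "not a dump" > "$root/proj/core.txt"  # different name
echo "small"      > "$root/core"           # right name, but far below the size limit

# The c suffix means bytes: +1000000c selects files larger than 1,000,000 bytes.
# -type f keeps the directory named core out of the result.
big=$(find "$root" -type f -name core -size +1000000c -print)
rel=${big#"$root/"}
printf '%s\n' "$rel"
rm -rf "$root"
```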

Find & replace recursively except for certain files

With regards to this post, how would I exclude one or more files from applying the string replacement? By using the aforementioned post as an example, I would like to be able to replace "apples" with "oranges" in all descendant files of a given directory except, say, ./fpd/font/symbol.php.
My idea was using the -regex switch in the find command but unfortunately it does not have a -v option like the grep command hence I can't negate the regex to not match the files where the replacement must occur.
I use this in my Git repository:
grep -ilr orange . | grep -v ".git" | grep -e "\\.php$" | xargs sed -i s/orange/apple/g
It will:
Run find and replace only in files that actually have the word to be replaced;
Not process the .git folder;
Process only .php files.
Needless to say, you can include as many grep layers as you want to filter the list that is being passed to xargs.
Known issues:
At least in my Windows environment it fails to open files that have spaces in the path or name. Never figured that one out. If anyone has an idea of how to fix this I would like to know.
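The failure with spaces is probably down to xargs, which by default splits its input on blanks and newlines, so "my dir/a file.php" arrives as several separate arguments. A possible workaround, assuming GNU grep, GNU sed and an xargs that supports -0 and -r, is to pass NUL-terminated names and let grep do the filtering itself. The sandbox names below are invented.

```shell
#!/bin/sh
root=$(mktemp -d)
mkdir -p "$root/my dir" "$root/.git"
echo "orange" > "$root/my dir/a file.php"   # space in directory and file name
echo "orange" > "$root/.git/config.php"     # must be left alone
echo "orange" > "$root/notes.txt"           # not a .php file

# -Z ends each file name with a NUL byte; xargs -0 splits on NUL only.
# --include and --exclude-dir replace the two extra grep filters.
grep -rlZ --include='*.php' --exclude-dir=.git orange "$root" |
    xargs -0 -r sed -i 's/orange/apple/g'

spaced=$(cat "$root/my dir/a file.php")
in_git=$(cat "$root/.git/config.php")
other=$(cat "$root/notes.txt")
printf 'spaced: %s, .git: %s, txt: %s\n' "$spaced" "$in_git" "$other"
rm -rf "$root"
```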
Haven't tested this but it should work:
find . -path ./fpd/font/symbol.php -prune -o -type f -exec sed -i 's/apple/orange/g' {} \;
You can negate with ! (or -not) combined with -name:
$ find .
.
./a
./a/b.txt
./b
./b/a.txt
$ find . -name \*a\* -print
./a
./b/a.txt
$ find . ! -name \*a\* -print
.
./a/b.txt
./b
$ find . -not -name \*a\* -print
.
./a/b.txt
./b
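The same negation works with -path, which is one way to skip a single file during a replacement. The sketch below assumes GNU sed (for -i without a suffix argument) and uses a temporary copy of the layout from the question; other.php and src/main.php are made-up neighbours.

```shell
#!/bin/sh
root=$(mktemp -d)
mkdir -p "$root/fpd/font" "$root/src"
for f in fpd/font/symbol.php fpd/font/other.php src/main.php; do
    echo "apples" > "$root/$f"
done

# ! -path excludes exactly one path; -type f keeps sed away from directories.
find "$root" -type f ! -path "$root/fpd/font/symbol.php" \
    -exec sed -i 's/apples/oranges/g' {} +

kept=$(cat "$root/fpd/font/symbol.php")
sibling=$(cat "$root/fpd/font/other.php")
changed=$(cat "$root/src/main.php")
printf 'symbol.php: %s, other.php: %s, main.php: %s\n' "$kept" "$sibling" "$changed"
rm -rf "$root"
```

To exclude several files, more ! -path tests can be chained one after another.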

Why is this pattern search hanging?

I am running Linux CentOS and I am trying to find some malicious code in my WordPress installation with this command:
grep -r 'php \$[a-zA-Z]*=.as.;' * |awk -F : '{print $1}'
When I hit enter, the process just hangs. I want to double-check that I have the syntax right and that all I have to do is wait.
How can I get some sort of feedback that something is happening while it's searching?
Thanks
Instead of using grep -r to recursively grep, one option is to use find to get the list of filenames, and feed them to grep one at a time. That lets you add other commands alongside the grep, such as echos. For example, you could create a script called is-it-malware.sh that contains this:
#!/bin/bash
# Report whether the file given as $1 contains the suspicious pattern.
if grep 'php \$[a-zA-Z]*=.as.;' "$1" >/dev/null
then
echo "!!! $1 is malware!!!"
else
echo " $1 is fine."
fi
and run this command:
find -type f -exec ./is-it-malware.sh '{}' ';'
to run your script over every file in the current directory and all of its subdirectories (recursively).
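The same per-file feedback can be had without a separate script file by handing find an inline sh -c loop. This is only a sketch: the sandbox files are invented, progress goes to stderr so that stdout still carries just the matching names, and restricting the search to *.php is an assumption about where the code would hide.

```shell
#!/bin/sh
root=$(mktemp -d)
mkdir -p "$root/wp/inc"
printf '%s\n' "<?php \$abc='as'; ?>" > "$root/wp/inc/bad.php"   # matches the pattern
printf '%s\n' '<?php echo 1; ?>'     > "$root/wp/clean.php"

pattern='php \$[a-zA-Z]*=.as.;'

# The pattern is passed as the first argument; find appends the file names after it.
hits=$(find "$root" -type f -name '*.php' -exec sh -c '
    pat=$1; shift
    for f do
        printf "scanning %s\n" "$f" >&2   # progress on stderr
        grep -l -- "$pat" "$f"            # matching file names on stdout
    done' sh "$pattern" {} +)

rel=${hits#"$root/"}
printf 'suspicious: %s\n' "$rel"
rm -rf "$root"
```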
It's probably taking its time because of the -r * (recursively, all files/dirs).
Consider
find -type f -print0 | xargs -0trn10 grep -l 'php \$[a-zA-Z]*=.as.;'
which will process the files in batches of (at most) 10, printing each command as it runs.
Of course, you can probably optimize the heck out of that with a simple measure like
find -type f -iname '*.php' -print0 | xargs -0trn10 grep -l 'php \$[a-zA-Z]*=.as.;'
Kind of related:
You can do similar things without find for smaller trees, with recent bash:
shopt -s globstar
grep -l 'pattern' **/*.php