Find a string in multiple files using grep - regex

I have a folder with sub-folders, all containing many types of files. I want to search for a word inside the .css files. I am using Windows 7 and I have grep.
How can I use grep to:
Find a pattern and print it
Give the file name (and path) if the pattern is found

Actually you don't need find. Just use:
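grep -R --include='*.css' -H pattern .
This recurses into subdirectories and searches only files matching *.css, while -H prints the filename with each match. Quoting the glob stops the shell from expanding it before grep sees it.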
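To see the recursive --include search in action, here is a minimal sketch on a throwaway directory. The file names (main.css, dark.css, readme.txt) and the pattern "color" are made up for illustration, and GNU grep is assumed.

```shell
# Build a small throwaway tree (hypothetical file names) to try the command on.
demo=$(mktemp -d)
mkdir -p "$demo/site/theme"
printf 'body { color: red; }\n' > "$demo/site/main.css"
printf 'a { color: blue; }\n'   > "$demo/site/theme/dark.css"
printf 'color notes\n'          > "$demo/site/readme.txt"

# -R recurses, --include limits the search to *.css, -H prints the file name.
matches=$(cd "$demo" && grep -R --include='*.css' -H 'color' . | sort)
printf '%s\n' "$matches"

rm -rf "$demo"
```

Only the two .css files should show up, each line prefixed with its relative path; readme.txt is skipped even though it contains the word.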

find folder/ -name "*.css" |xargs grep "your-pattern"
You will need to install cygwin to do this.

If the files you need to search follow a naming pattern, you can use this.
Say I'm looking for the pattern "cardlayout" in files named chap1.lst, chap2.lst and so on.
Then the command is
grep -e 'cardlayout' $(find . -name "chap*.lst")
Hope this helps.
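A variant of the same idea, sketched here with made-up sample files, lets find call grep itself through -exec ... {} +. That avoids the word splitting that command substitution applies to file names containing spaces. The chap*.lst names and the temporary directory are assumptions for the demo.

```shell
# Hypothetical sample files; only the chap*.lst ones should be searched.
demo=$(mktemp -d)
printf 'uses cardlayout here\n' > "$demo/chap1.lst"
printf 'nothing relevant\n'     > "$demo/chap2.lst"
printf 'cardlayout again\n'     > "$demo/chap10.lst"
printf 'cardlayout in notes\n'  > "$demo/notes.txt"

# find hands the matching files straight to grep as separate arguments.
hits=$(cd "$demo" && find . -name 'chap*.lst' -exec grep -H -e 'cardlayout' {} +)
printf '%s\n' "$hits"

rm -rf "$demo"
```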

How to use grep to find in a directory by a regex?

I tried
grep -R '.*invalidTemplateName.*' -regex './online_admin/.*/UTF-8/.*'
to find all occurrences of possible matches of the '.*invalidTemplateName.*' regex within a directory regex pattern './online_admin/.*/UTF-8/.*', but it doesn't work. I got the message:
grep: ./online_admin/.*/UTF-8/.*: No such file or directory
If I use
grep -R '.*invalidTemplateName.*' .
it searches every subdirectory of the current directory, which is overwhelming. How can I specify a directory pattern in grep? Is it possible?
Find might be a better choice here:
find ./online_admin/*/UTF-8/* -type f -exec grep -H "invalidTemplateName" {} \;
find will locate all files in the locations you want, including subdirectories of UTF-8, and then execute grep on each file. The -H argument ensures the filename is printed along with the match. If you want only the filename, use the -l switch instead (-L does the opposite and lists files without a match).
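Here is a small sketch of the find + grep combination on a made-up tree, comparing -H (name plus matching line) with -l (name only). The en/de/latin1 directory names and file contents are assumptions; the shell glob online_admin/*/UTF-8 plays the role of the directory pattern.

```shell
# Hypothetical layout: two UTF-8 directories and one directory that should be ignored.
demo=$(mktemp -d)
mkdir -p "$demo/online_admin/en/UTF-8/sub" "$demo/online_admin/de/UTF-8" "$demo/online_admin/en/latin1"
printf 'error: invalidTemplateName\n' > "$demo/online_admin/en/UTF-8/sub/a.txt"
printf 'all good\n'                   > "$demo/online_admin/de/UTF-8/b.txt"
printf 'invalidTemplateName\n'        > "$demo/online_admin/en/latin1/c.txt"

# -H: file name plus matching line, for files below any */UTF-8 directory.
with_lines=$(cd "$demo" && find ./online_admin/*/UTF-8 -type f -exec grep -H 'invalidTemplateName' {} +)
# -l: only the names of files that contain a match.
names_only=$(cd "$demo" && find ./online_admin/*/UTF-8 -type f -exec grep -l 'invalidTemplateName' {} +)
printf '%s\n' "$with_lines" "$names_only"

rm -rf "$demo"
```

The file under latin1 contains the pattern too, but it is never visited because the glob only expands to the UTF-8 directories.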
With find you could do something like this:
find /abs/path/to/directory -maxdepth 1 -name '*invalidTemplateName*'
The -name argument filters directly by file name (not by file contents). It takes shell wildcards rather than a regex.
The -maxdepth argument limits the level of recursion. 1 means look only in /abs/path/to/directory; 2 means look in /abs/path/to/directory and in its first level of subdirectories as well.
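The effect of -maxdepth is easy to check on a scratch directory. This is only a sketch; the level1/level2 directories and the file names are invented for the demo.

```shell
# Hypothetical tree with one matching file name at each depth.
demo=$(mktemp -d)
mkdir -p "$demo/level1/level2"
touch "$demo/invalidTemplateName.txt" \
      "$demo/level1/old-invalidTemplateName.log" \
      "$demo/level1/level2/invalidTemplateName.bak"

# -name is a glob on the file name; -maxdepth caps how deep find descends.
depth1=$(cd "$demo" && find . -maxdepth 1 -name '*invalidTemplateName*' | sort)
depth2=$(cd "$demo" && find . -maxdepth 2 -name '*invalidTemplateName*' | sort)
printf 'depth 1:\n%s\ndepth 2:\n%s\n' "$depth1" "$depth2"

rm -rf "$demo"
```

With -maxdepth 1 only the top-level file is reported; -maxdepth 2 adds the one inside level1, and the file in level2 stays out of reach in both cases.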

Remove duplicate filename extensions

I have thousands of files named something like filename.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz
I am using the find command like this find . -name "*.gz*" to locate these files and either use -exec or pipe to xargs and have some magic command to clean this mess, so that I end up with filename.gz
Someone please help me come up with this magic command that would remove the unneeded instances of .gz. I had tried experimenting with sed 's/\.gz//' and sed 's/(\.gz)//' but they do not seem to work (or to be more honest, I am not very familiar with sed). I do not have to use sed by the way, any solution that would help solve this problem would be welcome :-)
One way with find and awk:
find "$(pwd)" -name '*.gz' | awk '{n=$0;sub(/(\.gz)+$/,".gz",n);print "mv",$0,n}' | sh
Note:
I assume there are no special characters (such as spaces) in your filenames. If there are, you need to quote the filenames in the mv command.
I added $(pwd) to get the absolute path of each found name.
You can remove the trailing | sh to check the generated mv commands first.
If everything looks good, add the | sh back to execute them.
You may use
ls a.gz.gz.gz |sed -r 's/(\.gz)+/.gz/'
or without the regex flag
ls a.gz.gz.gz |sed 's/\(\.gz\)\+/.gz/'
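A quick way to check such a substitution without touching any files is to feed it sample names. The sketch below uses a hypothetical helper, collapse_gz, and adds a $ anchor (not in the commands above) so that only the run of .gz at the end of the name is collapsed. -E is the extended-regex flag accepted by both GNU and BSD sed.

```shell
# Hypothetical helper: collapse a trailing run of .gz extensions into one.
collapse_gz() {
    printf '%s\n' "$1" | sed -E 's/(\.gz)+$/.gz/'
}

collapse_gz 'a.gz.gz.gz'          # trailing run collapsed
collapse_gz 'my.gz.notes.gz.gz'   # the .gz in the middle is left alone
collapse_gz 'plain.txt'           # no .gz at the end, unchanged
```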
ls *.gz | perl -ne '/((.*?\.gz).*)/; print "mv $1 $2\n"'
This prints shell commands to rename your files; it doesn't execute them, so it is safe. To execute them, save the output to a file and run it, or simply pipe it to the shell:
ls *.gz | ... | sh
sed is great for replacing text inside files.
You can do that with bash string substitution:
for file in *.gz.gz; do
mv "${file}" "${file%%.*}.gz"
done
This might work for you (GNU sed):
printf '%s\n' *.gz | sed -r 's/^([^.]*)(\.gz){2,}$/mv -v & \1\2/e'
find . -name "*.gz.gz" |
while IFS= read -r f; do echo mv "$f" "$(sed -r 's/(\.gz)+$/.gz/' <<<"$f")"; done
This only previews the renaming (mv) commands; remove the echo to perform the actual renaming.
It processes matching files in the whole current directory tree, as in the OP (and not just files located directly in the current directory).
It limits matching to files that end in at least two .gz extensions (so as not to needlessly process files that end in just one).
When determining the new name with sed, it makes sure that .gz is matched only as part of a contiguous sequence of .gz extensions at the end of the filename, not anywhere in the name.
It handles filenames with special characters such as embedded spaces correctly (with the exception of filenames with embedded newlines).
Using bash string substitution:
for f in *.gz.gz; do
mv "$f" "${f%%.gz.gz*}.gz"
done
This is a slight modification of jaypal's nice answer (which would fail if any of your files had a period as part of its name, such as foo.c.gz.gz). Mine is not perfect either. Note the use of double quotes, which protect against filenames with "bad" characters, such as spaces or stars.
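The difference between the two parameter expansions can be seen without renaming anything. A minimal sketch, using the foo.c.gz.gz name from the text (the variable names are only for illustration):

```shell
f='foo.c.gz.gz'

# %%.* removes the longest suffix starting at the FIRST dot, so ".c" is lost.
short="${f%%.*}.gz"
# %%.gz.gz* removes the longest suffix starting at the first ".gz.gz" only.
kept="${f%%.gz.gz*}.gz"

printf '%s\n%s\n' "$short" "$kept"
```

The first line printed is foo.gz, the second is foo.c.gz, which is why the second form is the safer one for names containing other periods.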
If you wish to use find to process an entire directory tree, the variant is:
find . -name \*.gz.gz | \
while read f; do
mv "$f" "${f%%.gz.gz*}.gz"
done
And if you are fussy and need to handle filenames with embedded newlines, change the while read to while IFS= read -r -d $'\0', and add a -print0 to find; see How do I use a for-each loop to iterate over file paths output by the find utility in the shell / Bash?.
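If bash's read -d is not available, another option (a swapped-in technique, not the loop described above) is to let find pass the paths as arguments to a small inline sh script, so find's output is never parsed at all. The sketch below runs on a scratch directory with invented file names. It assumes no directory in the path itself contains ".gz.gz", because the expansion is applied to the whole path.

```shell
# Hypothetical files, including one with a space in its name.
demo=$(mktemp -d)
touch "$demo/plain.gz" "$demo/with space.gz.gz.gz" "$demo/foo.c.gz.gz"

# Each path arrives as its own argument, so spaces or newlines are harmless.
find "$demo" -type f -name '*.gz.gz' -exec sh -c '
    for f in "$@"; do
        mv -- "$f" "${f%%.gz.gz*}.gz"
    done
' sh {} +

result=$(ls "$demo")
printf '%s\n' "$result"

rm -rf "$demo"
```

To preview instead of renaming, replace mv with echo mv inside the inline script.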
But is this renaming a good idea? How was your filename.gz.gz created? gzip has guards against accidentally doing so. If you circumvent these via something like gzip -c $1 > $1.gz, buried in some script, then renaming these files will give you grief.
Another way with rename:
find . -iname '*.gz.gz' -exec rename -n 's/(\.\w+)\1+$/$1/' {} +
When happy with the results remove -n (dry-run) option.

List all files not starting with a number

I want to examine all the key files present in my /proc. But /proc has innumerable directories corresponding to the running processes, and I don't want these directories to be listed. All these directories' names contain only digits. As I am poor at regular expressions, can anyone tell me what regex I need to pass to ls so that it does NOT list files/directories which have numbers in their name?
UPDATE: Thanks for all the replies! But I would love to have an ls-only solution instead of an ls+grep solution. The ls-only solutions offered so far don't seem to be working!
You don't need grep, just ls:
ls -ad /proc/[^0-9]*
if you want to search the whole subdirectory structure use find:
find /proc/ -type f -regex "[^0-9]*" -print
All files and directories in /proc whose names do not start with a digit (in other words, excluding process directories):
ls -d /proc/[^0-9]*
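Since /proc differs from machine to machine, here is a sketch of the same glob on a scratch directory that imitates it. The entry names are invented. [!0-9] is the POSIX spelling of [^0-9], which most shells also accept.

```shell
# Hypothetical stand-in for /proc: numeric "process" dirs plus ordinary entries.
demo=$(mktemp -d)
mkdir "$demo/1" "$demo/4242" "$demo/sys"
touch "$demo/cpuinfo" "$demo/meminfo" "$demo/9lives" "$demo/tty0"

# [!0-9]* matches names whose FIRST character is not a digit.
listed=$(cd "$demo" && ls -d [!0-9]*)
printf '%s\n' "$listed"

rm -rf "$demo"
```

Note that tty0 is listed (the digit is not at the start) while 9lives is not, so the glob tests only the first character.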
All files recursively under /proc which do not start with a number:
find /proc -regex '.*/[0-9].*' -prune -o -print
But this will also exclude numeric files in subdirectories (for example /proc/foo/bar/123). If you want to exclude only the top-level files with a number:
find /proc -regex '/proc/[0-9].*' -prune -o -print
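The two -prune variants can be compared on a small made-up tree. This sketch assumes GNU find, whose -regex matches against the whole path, and runs from inside the scratch directory, so the top-level pattern is written as '\./[0-9].*' instead of '/proc/[0-9].*'.

```shell
# Hypothetical tree: a numeric top-level dir and a numeric dir one level down.
demo=$(mktemp -d)
mkdir -p "$demo/123" "$demo/net/7"
touch "$demo/123/status" "$demo/net/dev" "$demo/net/7/stat" "$demo/uptime"

# Prune every entry whose own name starts with a digit, at any depth.
any_depth=$(cd "$demo" && find . -regex '.*/[0-9].*' -prune -o -print | sort)
# Prune only top-level entries that start with a digit.
top_only=$(cd "$demo" && find . -regex '\./[0-9].*' -prune -o -print | sort)

printf 'any depth:\n%s\ntop only:\n%s\n' "$any_depth" "$top_only"

rm -rf "$demo"
```

The first listing drops both 123 and net/7; the second drops only 123 and still descends into net/7.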
Hold on again! Doesn't this mean that any regular files created by touch /proc/123 or the like will be excluded? Theoretically yes, but I don't think you can do that. Try creating a file for a PID which does not exist:
$ sudo touch /proc/123
touch: cannot touch `/proc/123': No such file or directory
Use grep with -v, which tells it to print all lines not matching the pattern.
ls /proc | grep -v '[0-9]'
or
ls /proc | grep -v -E '[0-9]+'
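Both commands drop every name that contains a digit anywhere. If only the purely numeric names should go, anchoring the pattern is one option. A small sketch on a made-up list of names standing in for the ls /proc output:

```shell
# Hypothetical sample of directory entries.
names=$(printf '%s\n' 1 4242 cpuinfo tty0 meminfo)

# Unanchored: removes any name containing a digit (tty0 goes too).
no_digits=$(printf '%s\n' "$names" | grep -v '[0-9]')
# Anchored: removes only names made up entirely of digits.
not_numeric=$(printf '%s\n' "$names" | grep -v -E '^[0-9]+$')

printf 'no digits:\n%s\nnot numeric:\n%s\n' "$no_digits" "$not_numeric"
```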
The following regex matches names made up entirely of non-digit characters (\D is Perl-style syntax, so it needs something like grep -P):
^[\D]+?$
Hope it helps!
For the sake of completeness, you may apply Mithandir's answer with find.
find . -name "[^0-9]*" -type f

Using non-consuming matches in Linux find regex

Here's my problem in a simplified scenario.
Create some test files:
touch /tmp/test.xml
touch /tmp/excludeme.xml
touch /tmp/test.ini
touch /tmp/test.log
I have a find expression that returns me all the XML and INI files:
[root@myserver] ~> find /tmp -name -prune -o -regex '.*\.\(xml\|ini\)'
/tmp/test.ini
/tmp/test.xml
/tmp/excludeme.xml
I now want a way of modifying this -regex to exclude the excludeme.xml file from being included in the results.
I thought this should be possible by using/combining a non-consuming regex (?=expr) with a negated match (?!expr). Unfortunately I can't quite get the format of the command right, so my attempts result in no matches being returned. Here was one of my attempts (I've tried many different forms of this with different escaping!):
find /tmp -name -prune -o -regex '\(?=.*excludeme\.xml\).*\.\(xml\|ini\)'
I can't break down the command into multiple steps (e.g. piping through grep -v) as the find command is assumed as input into other parts of our tool.
This does what you want on linux:
find /tmp -name -prune -o -regex '.*\.\(xml\|ini\)' \! -regex '.*excludeme\.xml'
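The core of this approach is the pair of -regex tests joined by a negation. A minimal sketch on a scratch directory, using the test file names from the question and leaving out the -name -prune part. It assumes GNU find with its default Emacs-style regex syntax, and adds -type f (not in the original command) so that only files are reported.

```shell
# Recreate the test files from the question in a temporary directory.
demo=$(mktemp -d)
touch "$demo/test.xml" "$demo/excludeme.xml" "$demo/test.ini" "$demo/test.log"

# First -regex selects .xml/.ini paths; the negated one filters out excludeme.xml.
found=$(cd "$demo" && find . -type f -regex '.*\.\(xml\|ini\)' ! -regex '.*/excludeme\.xml' | sort)
printf '%s\n' "$found"

rm -rf "$demo"
```

Because the exclusion is a second test and not a lookahead, it stays inside a single find command, which is the constraint stated in the question.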
I'm not sure if the "!" operator is unique to gnu find.
Not sure about what escapes you need or if lookarounds work, but these work for Perl:
/^(?!.*\/excludeme\.).*\.(xml|ini)$/
/(?<!\/excludeme)\.(xml|ini)$/
Edit - Just checked the find command; the best you can do with find is to change the regex type with -regextype posix-extended, but that doesn't do things like look-arounds. The only way around this looks to be using some GNU stuff, either as @unholygeek suggests with find, or piping find into GNU grep with the -P Perl option. You can use the above regexes verbatim if you go with GNU grep. Something like find .... -print | grep -P ...
Sorry, that's the best I can do.

How to use UNIX find to find (file1 OR file2)?

In the bash command line, I want to find all files that are named foo or bar. I tried this:
find . -name "foo\|bar"
but that doesn't work. What's the right syntax?
You want:
find . \( -name "foo" -o -name "bar" \)
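A quick check of the -o grouping on a scratch directory (the a/ and b/ directories and the foobar and baz files are invented for the demo):

```shell
# Hypothetical tree with exact and near-miss names.
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b"
touch "$demo/a/foo" "$demo/b/bar" "$demo/b/foobar" "$demo/baz"

# -name compares the whole file name, so foobar and baz are not matched.
exact=$(cd "$demo" && find . \( -name 'foo' -o -name 'bar' \) | sort)
printf '%s\n' "$exact"

rm -rf "$demo"
```

The escaped parentheses matter once further tests or actions follow, because the implicit "and" in find binds tighter than -o.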
See the wikipedia page (of all places)
I am cheap with find, I would use this:
find ./ | grep -E 'foo|bar'
That's just my personal preference: I like grep more than find because the syntax is easier to 'get', and once you master it there are more uses than just walking a file tree.
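One thing to watch with the grep approach is that the pattern is matched anywhere in the path, not just against the file name. The sketch below uses invented names and compares the plain pattern with a version anchored to the last path component; the anchoring is an addition for illustration, not part of the command above.

```shell
# Hypothetical tree with exact and near-miss names.
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b"
touch "$demo/a/foo" "$demo/b/bar" "$demo/b/foobar" "$demo/baz"

# Unanchored: any path containing foo or bar, so foobar is included.
loose=$(cd "$demo" && find . | grep -E 'foo|bar' | sort)
# Anchored: the last path component must be exactly foo or bar.
anchored=$(cd "$demo" && find . | grep -E '/(foo|bar)$' | sort)

printf 'loose:\n%s\nanchored:\n%s\n' "$loose" "$anchored"

rm -rf "$demo"
```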