Regexp for matching filenames - regex

I have these files:
first.error.log
second1.log
second2.log
FFFpc.log
TR.den.log
bla.error.log
and I would like a pattern that matches all files with error in their names, plus a few additional ones, but nothing more.
For error alone it would be:
$FILE_PATTERN="*.error*"
But what if I want to match not only the error files but also all the second and FFFpc files, etc.?
This does not work:
$FILE_PATTERN="*.error*|^second.*\log$|.*FFPC\.log$"
Thanks in advance for your help
EDIT:
$FILE_PATTERN is later used by:
find /somefolder -type f -name $FILE_PATTERN
EDIT: This FILE_PATTERN is in a property file that is later used by a bash script.

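You need to use find with the -regex option. The regex is matched against the whole path, so it has to allow for the leading directories. (-E enables extended regexes on BSD/macOS find; GNU find uses -regextype posix-extended instead.)
find -E /somefolder -type f -regex '.*/(.*\.error.*|second.*log|.*FFPC\.log)$'
PS: Use -iregex for case-insensitive matching:
find -E /somefolder -type f -iregex '.*/(.*\.error.*|second.*log|.*FFPC\.log)$'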
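For GNU find, the same idea can be sketched as follows. This is only an illustration under a few assumptions: it runs against a scratch directory created with mktemp (not /somefolder), it uses -regextype posix-extended in place of -E, and it tightens the alternatives with [^/]* so that only the file name, not a parent directory, can satisfy them.

```shell
# Scratch directory holding the file names from the question
dir=$(mktemp -d)
touch "$dir/first.error.log" "$dir/second1.log" "$dir/second2.log" \
      "$dir/FFFpc.log" "$dir/TR.den.log" "$dir/bla.error.log"

# -regex/-iregex match the whole path, so every alternative starts after the last '/'.
# -regextype posix-extended is GNU find syntax; BSD/macOS find uses -E instead.
matches=$(find "$dir" -regextype posix-extended -type f \
  -iregex '.*/([^/]*\.error[^/]*|second[^/]*\.log|[^/]*FFPC\.log)' | sort)

printf '%s\n' "$matches"
rm -rf "$dir"
```

Because -iregex ignores case, FFFpc.log is matched by the FFPC alternative, while TR.den.log is left out.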

$ ls | grep -i '\(.*error.*\)\|\(^second.*\.log$\)\|\(.*FFPC\.log$\)'
bla.error.log
FFFpc.log
first.error.log
second1.log
second2.log
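If you want to use it with find, keep in mind that find prints full paths, so second has to be anchored on a / rather than on ^:
find /somefolder -type f | grep -i '\(.*error.*\)\|\(/second[^/]*\.log$\)\|\(.*FFPC\.log$\)'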

If you're in bash, I'm assuming you have grep. Using grep -E or egrep allows alternation (ORing your search terms):
$ stat * | egrep "(error|second)"
File: `first.error.log'
File: `second1.log'
File: `second2.log'
You could use ls instead of stat, but sometimes ls will not give you what you expect. Considering you're only searching for filenames, though, ls should suffice.
$ ls | egrep "(error|second)"
first.error.log
second1.log
second2.log
You can use command substitution to store the output into a bash variable:
FILE_PATTERN=$(ls | egrep "(error|second)")

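# -name takes shell globs, not regexes
FILE_PATTERN=("*.error*" "second*.log" "*FFPC.log")
ARGS=(-name "${FILE_PATTERN[0]}")
# Append the remaining patterns, each joined with -o (OR)
for F in "${FILE_PATTERN[@]:1}"; do
  ARGS+=(-o -name "$F")
done
find /somefolder -type f '(' "${ARGS[@]}" ')'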
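Since the pattern lives in a property file, one option is to store the globs as a single |-separated string and split it in the bash script. The sketch below assumes that format (it is a hypothetical convention, not something the question specifies) and uses a scratch directory instead of /somefolder; the variable names patterns and name_args are illustrative only.

```shell
# Hypothetical property value: glob patterns separated by '|'
FILE_PATTERN='*.error*|second*.log|*FF[Pp][Cc].log'

# Split on '|' into a bash array; read does not glob-expand the pieces
IFS='|' read -r -a patterns <<< "$FILE_PATTERN"

# Build: -name p1 -o -name p2 -o -name p3
name_args=()
for pattern in "${patterns[@]}"; do
  if (( ${#name_args[@]} > 0 )); then name_args+=(-o); fi
  name_args+=(-name "$pattern")
done

# Try it on the file names from the question
dir=$(mktemp -d)
touch "$dir/first.error.log" "$dir/second1.log" "$dir/second2.log" \
      "$dir/FFFpc.log" "$dir/TR.den.log" "$dir/bla.error.log"
matches=$(find "$dir" -type f '(' "${name_args[@]}" ')' | sort)
printf '%s\n' "$matches"
rm -rf "$dir"
```

Quoting "${name_args[@]}" keeps each glob as one argument, so the shell never expands the patterns before find sees them.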

You were close; there are just a few misplaced symbols.
Here's what I came up with:
.*\.error\..*|^second.*\.log$|.*FF[Pp][Cc]\.log$
Here's a demo of a working modification of your regex:
http://regex101.com/r/rL3rM1/1

Related

Regex: Find files not ending with numeral suffix

I need a command that returns all files without a numeric suffix (*.0, *.123, ...).
For example, given these three files:
gg.p qqq.449 rtr55
I want to find only these:
./rtr55
./gg.p
I tried to find them using grep, but the filter had no effect:
find -type f | grep -v '\.[0-9]+$'
(This command returned:)
./qqq.449
./rtr55
./gg.p
So there is probably an error in the regex. Do you know how to fix it?
The + operator belongs to extended regular expressions; in grep's default basic syntax it is a literal character. There are several workarounds:
find -type f | grep -v '\.[0-9]\+$'
find -type f | egrep -v '\.[0-9]+$'
find -type f | grep -E -v '\.[0-9]+$'
find -type f | grep -v '\.[0-9][0-9]*$'
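The difference between the syntaxes can be checked without touching the file system. The sketch below feeds the three paths from the question to grep through printf; the variable names are illustrative only.

```shell
paths='./gg.p
./qqq.449
./rtr55'

# Basic syntax (grep's default): spell out "one or more digits" as [0-9][0-9]*
bre=$(printf '%s\n' "$paths" | grep -v '\.[0-9][0-9]*$')

# Extended syntax: + is a quantifier
ere=$(printf '%s\n' "$paths" | grep -E -v '\.[0-9]+$')

# The original attempt: in basic syntax + is literal, so no line is filtered out
literal_plus=$(printf '%s\n' "$paths" | grep -v '\.[0-9]+$')

printf 'basic:\n%s\nextended:\n%s\noriginal attempt:\n%s\n' "$bre" "$ere" "$literal_plus"
```

The [0-9][0-9]* form is the most portable of the four, since \+ in basic syntax is a GNU extension.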
Why would you use grep at all?
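find -regex '.*\.[0-9][0-9]*' -prune -o -type f -print
The explicit -print matters: without it, find applies its default -print to the whole expression and lists the pruned names as well.
If your expressions are simple enough (or your find doesn't support -regex), you could use -name instead of -regex, but a glob wildcard can't match an arbitrary number of digits after the dot. Here's a version that handles one or two digits:
find -name '*.[0-9]' -prune -o -name '*.[0-9][0-9]' -prune -o -type f -print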
Notice that this isn't purely an efficiency question; grep would simply not do the right thing if you ever come across file names with newlines in them.
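The newline problem can be reproduced in a scratch directory. The sketch below is an illustration under stated assumptions: the odd file name is made up for the demonstration, the -name version only covers suffixes of up to three digits, and -print0 is a GNU/BSD extension used here just to count results safely.

```shell
dir=$(mktemp -d)
touch "$dir/gg.p" "$dir/qqq.449" "$dir/rtr55"
# One file whose name contains a newline: "odd<newline>name.7" (numeric suffix)
touch "$dir/$(printf 'odd\nname.7')"

# Line-based filtering sees that file as two lines and keeps the first half
via_grep=$(find "$dir" -type f | grep -v '\.[0-9][0-9]*$' | wc -l)

# find tests each name as a whole, so the odd file is excluded as one unit;
# counting NUL terminators gives the number of files reported
via_find=$(find "$dir" -type f ! -name '*.[0-9]' ! -name '*.[0-9][0-9]' \
  ! -name '*.[0-9][0-9][0-9]' -print0 | tr -cd '\000' | wc -c)

echo "grep pipeline reports $via_grep entries, find alone reports $via_find"
rm -rf "$dir"
```

The grep pipeline reports one entry too many, because the fragment before the newline looks like a separate file without a numeric suffix.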

Finding file names without a specified character

Is there a good regex to find all of the files that do not contain a certain character? I know there are lots to find lines containing matches, but I want something that will find all files that do not contain my match.
Using ls and sed to replace all filenames with no extension (i.e. not containing a .) with NoExtension:
ls | sed -e 's/^[^.]*$/NoExtension/g'
replacing filenames that have an extension with their extension:
ls | sed -e 's/^[^.]*$/NoExtension/g' -e 's/.*\.\(.*\)/\1/'
For bash, to list all files in a directory that have no extension:
shopt -s extglob
ls !(*.*)
The extglob setting is required to enable the !(...) syntax, which negates the *.* pattern passed to ls.
You should discard all the answers that parse the output of ls; read here for why. The find tool is perfect for this.
# Show files in cwd
$ ls
file file.txt
# Find the files with an extension
$ find -type f -regex '.*/.*\..*$'
./file.txt
# Invert the match using the -not option
$ find -type f -not -regex '.*/.*\..*$'
./file
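One caveat with a whole-path regex is that a dot in a directory name also counts as a match. A possible refinement, sketched below in a scratch directory with made-up names, is to test only the base name with -name, or to keep the regex but forbid / around the dot.

```shell
dir=$(mktemp -d)
mkdir "$dir/v1.2"
touch "$dir/file" "$dir/file.txt" "$dir/v1.2/README"

# -name looks only at the base name, so the dot in "v1.2" is irrelevant
by_name=$(cd "$dir" && find . -type f ! -name '*.*' | sort)

# Same idea as a regex: the dot must sit in the last path component
by_regex=$(cd "$dir" && find . -type f ! -regex '.*/[^/]*\.[^/]*' | sort)

printf '%s\n' "$by_name"
rm -rf "$dir"
```

Both commands list ./file and ./v1.2/README; the -name form also works with find implementations that lack -regex.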
And an awk solution, for good measure.
ls | awk '$0 !~ /\..+$/{a++}END{print a}'
This might work for you (find, GNU sed & wc):
find . -type f | sed -rn '\|.*/\.?[^.]+$|w NoExtensions' && wc -l NoExtensions
This gives you a count and a list.
N.B. dot files without extensions are included.

regextype with find command

I am trying to use the find command with -regextype, but I could not get it to work properly.
I am trying to find all .c and .h files, pipe them on, and grep for the name func_foo inside those files. What am I missing?
$ find ./ -regextype sed -regex ".*\[c|h]" | xargs grep -n --color func_foo
In a similar vein, I tried the following command, but it gives me an error like paths must precede expression:
$ find ./ -type f "*.c" | xargs grep -n --color func_foo
The accepted answer contains some inaccuracies.
On my system, GNU find's manpage says to run find -regextype help to see the list of supported regex types.
# find -regextype help
find: Unknown regular expression type 'help'; valid types are 'findutils-default', 'awk', 'egrep', 'ed', 'emacs', 'gnu-awk', 'grep', 'posix-awk', 'posix-basic', 'posix-egrep', 'posix-extended', 'posix-minimal-basic', 'sed'.
E.g. find . -regextype egrep -regex '.*\.(c|h)' finds .c and .h files.
Your regexp syntax was wrong, you had square brackets instead of parentheses. With square brackets, it would be [ch].
You can just use the default regexp type as well: find . -regex '.*\.\(c\|h\)$' also works. Notice that you have to escape (, |, ) characters in this case (with sed regextype as well). You don't have to escape them when using egrep, posix-egrep, posix-extended.
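The two spellings can be compared side by side. The sketch below assumes GNU find (for -regextype) and uses a scratch directory with made-up file contents; it also shows the search for func_foo done through -exec instead of a pipe.

```shell
dir=$(mktemp -d)
printf 'int func_foo(void);\n' > "$dir/a.h"
printf 'int func_foo(void) { return 0; }\n' > "$dir/a.c"
printf 'func_foo is mentioned here too\n' > "$dir/notes.txt"

# Default syntax: grouping and alternation need backslashes
default_syntax=$(find "$dir" -type f -regex '.*\.\(c\|h\)' | sort)

# posix-extended syntax: plain ( | )
extended_syntax=$(find "$dir" -regextype posix-extended -type f -regex '.*\.(c|h)' | sort)

# Let find hand the files to grep; -l prints only the names of matching files
hits=$(find "$dir" -type f -name '*.[ch]' -exec grep -l 'func_foo' {} + | sort)

printf '%s\n' "$hits"
rm -rf "$dir"
```

All three commands select a.c and a.h and skip notes.txt, even though that file mentions func_foo too.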
Why not just do:
find ./ -name "*.[ch]" | xargs grep -n --color func_foo
and
find ./ -type f -name "*.c" | xargs grep -n --color func_foo
Regarding the valid parameters to find's option -regextype, this comes verbatim from man find:
-regextype type
Changes the regular expression syntax understood by -regex and -iregex tests which occur later on
the command line. Currently-implemented types are emacs (this is the default),
posix-awk, posix-basic, posix-egrep and posix-extended
This version of the man page does not list a type sed.

find all files except e.g. *.xml files in shell

Using bash, how to find files in a directory structure except for *.xml files?
I'm just trying to use
find . -regex ....
with the regex:
'.*^((?!xml).)*$'
but without expected results...
or is there another way to achieve this, i.e. without a regexp matching?
find . ! -name "*.xml" -type f
find . -not -name '*.xml'
Should do the trick.
Sloppier than the find solutions above, and it does more work than it needs to, but you could do
find . | grep -v '\.xml$'
Also, is this a tree of source code? Maybe you have all your source code and some XML in a tree, but you want to only get the source code? If you were using ack, you could do:
ack -f --noxml
with bash:
shopt -s extglob globstar nullglob
for f in **/!(*.xml); do
    [[ -d $f ]] && continue
    # do stuff with $f
done
You can also do it with or-ring as follows:
find . -type f -name "*.xml" -o -type f -print
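Why the or-ing works is easier to see on a small example. The sketch below uses a scratch directory with made-up names and compares the result with the negated -name test.

```shell
dir=$(mktemp -d)
touch "$dir/a.xml" "$dir/b.txt" "$dir/c"

# Reads as: ( -type f -name '*.xml' ) -o ( -type f -print )
# XML files satisfy the left branch, which has no action, so nothing is printed for them;
# every other regular file falls through to the right branch and is printed.
or_form=$(cd "$dir" && find . -type f -name '*.xml' -o -type f -print | sort)

# The negated test produces the same list
negated=$(cd "$dir" && find . -type f ! -name '*.xml' | sort)

printf '%s\n' "$or_form"
rm -rf "$dir"
```

The explicit -print is what makes this work: because an action is present, find no longer adds its default -print to the whole expression.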
Try something like this for a regex solution:
find . -regextype posix-extended -not -regex '^.*\.xml$'

Using grep to search files provided by find: what is wrong with find . | xargs grep '...'?

When I use the command:
find . | xargs grep '...'
I get the wrong matches. I'm trying to search for the string ... in all files in the current folder.
As Andy White said, you have to use fgrep to match a plain ., or escape the dots.
So you have to write the following (-type f restricts the results to files; you obviously don't want the directories):
find . -type f | xargs fgrep '...'
or if you still want to use grep :
find . -type f | xargs grep '\.\.\.'
And if you only want the current directory and not its subdirs :
find . -maxdepth 1 -type f | xargs fgrep '...'
'.' matches any character, so you'll be finding all lines that contain 3 or more characters.
You can either escape the dots, like this:
find . | xargs grep '\.\.\.'
Or you can use fgrep, which does a literal match instead of a regex match:
find . | xargs fgrep '...'
(Some versions of grep also accept a -F flag which makes them behave like fgrep.)
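The effect is easy to see on a few sample lines. The sketch below pipes made-up text into grep and counts matching lines for the regex, fixed-string, and escaped forms.

```shell
sample='abc
..
a...b
wait... what'

# Regex: three of any character, so every line with at least three characters matches
regex_hits=$(printf '%s\n' "$sample" | grep -c '...')

# Fixed string: only a literal "..." matches
fixed_hits=$(printf '%s\n' "$sample" | grep -c -F '...')

# Escaped dots behave like -F for this pattern
escaped_hits=$(printf '%s\n' "$sample" | grep -c '\.\.\.')

echo "regex: $regex_hits, fixed: $fixed_hits, escaped: $escaped_hits"
```

Only the two-character line escapes the unescaped pattern, which is why the original command seemed to match everything.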
@OP, if you are looking for files that contain ...,
grep -R "\.\.\." *
If you're looking for a filename that matches, try:
find . -name "filename pattern"
or
find . | grep "filename pattern"
If you're looking for files whose contents match (i.e. they contain the grep string),
find . | xargs grep "string pattern"
works fine. Or simply:
grep "string pattern" -R *
If you are literally typing grep '...' you'll match just about any string. I doubt you're actually typing '...' for your grep command, but if you are, the ... will match any three characters.
Please post more info on what you're searching for, and maybe someone can help you out more.
To complete Jeremy's answer, you may also want to try
find . -type f | xargs grep 'your_pattern'
or
find . -type f -exec grep 'your_pattern' {} +
which is similar to using xargs.
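One practical difference shows up with file names that contain spaces. The sketch below uses a scratch directory with made-up files; -print0 and xargs -0 are GNU/BSD extensions rather than POSIX.

```shell
dir=$(mktemp -d)
printf 'to be continued...\n' > "$dir/my notes.txt"
printf 'nothing here\n' > "$dir/plain.txt"

# Plain xargs splits "my notes.txt" at the space, so grep is handed two bogus names
plain=$(find "$dir" -type f | xargs grep -lF '...' 2>/dev/null || true)

# NUL-delimited names survive intact
nul_safe=$(find "$dir" -type f -print0 | xargs -0 grep -lF '...')

# -exec ... {} + batches names like xargs and never re-parses them
exec_plus=$(find "$dir" -type f -exec grep -lF '...' {} +)

printf 'xargs: [%s]\nxargs -0: [%s]\n-exec +: [%s]\n' "$plain" "$nul_safe" "$exec_plus"
rm -rf "$dir"
```

The last two forms both report my notes.txt; the plain pipeline misses it.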
I might add: RTFM! Or, in a more polite way: use and abuse
man command