regextype with find command - regex

I am trying to use the find command with -regextype, but I cannot get it to work properly.
I am trying to find all .c and .h files, pipe them to grep, and search for the name func_foo inside those files. What am I missing?
$ find ./ -regextype sed -regex ".*\[c|h]" | xargs grep -n --color func_foo
Along the same lines, I tried the following command, but it fails with the error "paths must precede expression":
$ find ./ -type f "*.c" | xargs grep -n --color func_foo

The accepted answer contains some inaccuracies.
On my system, GNU find's manpage says to run find -regextype help to see the list of supported regex types.
# find -regextype help
find: Unknown regular expression type 'help'; valid types are 'findutils-default', 'awk', 'egrep', 'ed', 'emacs', 'gnu-awk', 'grep', 'posix-awk', 'posix-basic', 'posix-egrep', 'posix-extended', 'posix-minimal-basic', 'sed'.
E.g. find . -regextype egrep -regex '.*\.(c|h)' finds .c and .h files.
Your regexp syntax was wrong, you had square brackets instead of parentheses. With square brackets, it would be [ch].
You can just use the default regexp type as well: find . -regex '.*\.\(c\|h\)$' also works. Notice that you have to escape (, |, ) characters in this case (with sed regextype as well). You don't have to escape them when using egrep, posix-egrep, posix-extended.
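The escaping difference described above can be checked with a small sketch. It assumes GNU find (for -regextype) and a throwaway directory; the file names are made up for the demo.

```shell
#!/usr/bin/env bash
# Sketch assuming GNU find; the file names are hypothetical.
tmp=$(mktemp -d)
touch "$tmp/main.c" "$tmp/util.h" "$tmp/notes.txt" "$tmp/arch"

# posix-extended: grouping and alternation are written unescaped.
ere=$(cd "$tmp" && find . -type f -regextype posix-extended -regex '.*\.(c|h)' | LC_ALL=C sort)

# Default syntax: the same operators need backslashes.
emacs_style=$(cd "$tmp" && find . -type f -regex '.*\.\(c\|h\)' | LC_ALL=C sort)

# Bracket expression: enough when the alternatives are single characters.
bracket=$(cd "$tmp" && find . -type f -regex '.*\.[ch]' | LC_ALL=C sort)

printf '%s\n' "$ere"
rm -rf "$tmp"
```

All three variables should end up holding the same two paths, since -regex always has to match the whole path and not just the file name.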

Why not just do:
find ./ -name "*.[ch]" | xargs grep -n --color func_foo
and
find ./ -type f -name "*.c" | xargs grep -n --color func_foo
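A variation on the same idea, as a sketch only: letting find run grep itself through -exec avoids the pipe to xargs, which splits file names on whitespace by default. The directory, file names and contents below are invented for the demo.

```shell
#!/usr/bin/env bash
# Sketch: search .c and .h files for func_foo without xargs.
# The files created here are hypothetical test data.
tmp=$(mktemp -d)
printf 'int func_foo(void);\n' > "$tmp/api.h"
printf 'int func_foo(void) { return 0; }\n' > "$tmp/my impl.c"
printf 'func_foo is mentioned here too\n' > "$tmp/readme.txt"

# -exec ... {} + passes many file names to one grep call,
# and names containing spaces are handed over intact.
hits=$(cd "$tmp" && find . -type f -name '*.[ch]' -exec grep -l func_foo {} + | LC_ALL=C sort)

printf '%s\n' "$hits"
rm -rf "$tmp"
```

Here grep -l only lists the matching files; swapping it for grep -n --color gives the line-by-line output from the question.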
Regarding the valid parameters to find's option -regextype, this comes verbatim from man find:
-regextype type
Changes the regular expression syntax understood by -regex and -iregex tests which occur later on
the command line. Currently-implemented types are emacs (this is the default),
posix-awk, posix-basic, posix-egrep and posix-extended
This list does not mention a type sed, although some versions of GNU find do accept it.

Related

Recursively find filenames of exactly 8 hex characters, but not all 0-9, no lookahead (Mac terminal, bash)

I'm trying to write a regex to find files recursively with Mac Terminal (bash, not zsh even though Catalina wants me to switch over for whatever reason) using the find command. I'm looking for files that are:
Exactly 8 hexadecimal digits (0-9 and A-F)
But NOT only decimal digits (0-9)
In other words, it would match A1234567, ABC12DEF, 12345ABC, and ABCDABCD, but not 12345678 or 09876543.
To find files that are exactly 8 hex digits, I've used this:
find -E . -type f -regex '.*/[A-F0-9]{8}'
The .*/ is necessary to allow the full path name to precede the filename. This is eventually going to get fed to rm, so I have to keep the path.
It SEEMS like this should work to fulfill both of my requirements:
find -E . -type f -regex '.*/(?![0-9]{8})[A-F0-9]{8}'
But that returns an error:
find: -regex: .*/(?![0-9]{8})[A-F0-9]{8}: repetition-operator operand invalid
It seems like the find command doesn't support lookaheads. How can I do this without one?
With any POSIX-compliant find
find . -type f \
-name '????????' \
! -name '*[![:xdigit:]]*' \
-name '*[![:digit:]]*'
And if you insist on using regexps for this, here you go
find -E . -type f \
-regex '.*/[[:xdigit:]]{8}' \
! -regex '.*/[[:digit:]]*'
Those who use GNU find should drop -E and insert -regextype posix-extended after paths to make this work.
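To see how the three -name tests combine, here is a small sketch that uses only POSIX find features. The sample file names are invented: two should match, one is all digits, one contains a non-hex character, and one is too short.

```shell
#!/usr/bin/env bash
# Sketch with hypothetical file names.
tmp=$(mktemp -d)
touch "$tmp/A1234567" "$tmp/ABCDABCD" "$tmp/12345678" "$tmp/1234567G" "$tmp/A123456"

# Test 1: the name is exactly eight characters long.
# Test 2: the name contains no character that is not a hex digit.
# Test 3: the name contains at least one character that is not a decimal digit.
matches=$(cd "$tmp" && find . -type f \
    -name '????????' \
    ! -name '*[![:xdigit:]]*' \
    -name '*[![:digit:]]*' | LC_ALL=C sort)

printf '%s\n' "$matches"
rm -rf "$tmp"
```

Note that [[:xdigit:]] also accepts lowercase a-f, which is slightly broader than the [A-F0-9] class in the question.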
It's probably easiest to just filter out the results you don't like:
find -E . -type f -regex '.*/[A-F0-9]{8}' -print | egrep -v '.*/[0-9]{8}$'
$ find -E . -type f -regex '.*/[A-F0-9]{8}' -print
./01234567
./ABCDEFAF
./ABCDEF01
./ABCDEF2A
./ABCDEFA2
./x/01234567
./x/ABCDEFAF
./x/ABCDEF01
./x/ABCDEF2A
./x/ABCDEFA2
$ find -E . -type f -regex '.*/[A-F0-9]{8}' -print | egrep -v '.*/[0-9]{8}$'
./ABCDEFAF
./ABCDEF01
./ABCDEF2A
./ABCDEFA2
./x/ABCDEFAF
./x/ABCDEF01
./x/ABCDEF2A
./x/ABCDEFA2
My find didn't understand -E and was inexplicably grumpy about -regex in general, but this still worked:
find . -type f -name '[A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9][A-F0-9]' -a -name '*[A-F]*'
Not as elegant as oguz ismail's, but easier to read for my clogged brain, lol

pattern to match multiple filenames with find utility

How to find multiple filenames with the bash find command?
$ find /path/* -type f -name pattern
The pattern should match a list of file names:
fname1.jpg
fname2.png
myfile.css
example.gif
I tried the approach from
https://alvinalexander.com/linux-unix/linux-find-multiple-filenames-patterns-command-example
find multiple filenames command: finding three filename extensions
find . -type f \( -name "*cache" -o -name "*xml" -o -name "*html" \)
and it works.
Anyway I think it would be cleaner with a -name pattern, rather than with a list of -names.
from
$ man find
-name pattern
I'm looking for something like: -name '[fname2.png|myfile.css|example.gif ]'
-regex alternative would look as follows:
find . -type f -regextype posix-egrep -regex ".+\.(jpg|png|css)$"
As for -name option:
-name pattern - Base of file name (the path with the leading
directories removed) matches shell pattern.
Shell pattern is not a full-fledged regex pattern.
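For the exact list of names in the question, a sketch of both routes might look like this. It assumes GNU find for the -regextype variant, and the extra files (other.jpg, myfile.css.bak) are invented to show what gets excluded.

```shell
#!/usr/bin/env bash
# Sketch assuming GNU find; the directory layout is hypothetical.
tmp=$(mktemp -d)
mkdir "$tmp/sub"
touch "$tmp/fname1.jpg" "$tmp/fname2.png" "$tmp/sub/myfile.css" \
      "$tmp/sub/example.gif" "$tmp/other.jpg" "$tmp/myfile.css.bak"

# One regex: the alternation lists the base names, '.*/' absorbs the directories.
by_regex=$(cd "$tmp" && find . -type f -regextype posix-extended \
    -regex '.*/(fname1\.jpg|fname2\.png|myfile\.css|example\.gif)' | LC_ALL=C sort)

# Equivalent list of -name tests joined with -o.
by_name=$(cd "$tmp" && find . -type f \
    \( -name fname1.jpg -o -name fname2.png -o -name myfile.css -o -name example.gif \) | LC_ALL=C sort)

printf '%s\n' "$by_regex"
rm -rf "$tmp"
```

A single -name glob cannot express "one of these four names", which is why the choice is between a regex and a chain of -name tests.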
Just mix them:
find -name "aoc*" -regextype awk -regex ".*[0-9]\.(class|scala)"
This searches for files that match the shell pattern aoc* and whose names end in a digit followed by .class or .scala.
For your example:
find -name "fname*" -regextype awk -regex ".*[0-9]\.(png|jpg|css)"
Available types are listed with:
find -regextype -help
I first tried "-regextype sed", which is available, but sed itself has options that change its regex style, and the patterns I normally use with sed didn't work. Since the pattern works with the awk type, that is sufficient for me.

Regex: Find files not ending with numeral suffix

I need to make a command which returns all files without numeral suffix (*.0, *.123, ...)
Have for example three files:
gg.p qqq.449 rtr55
I want to find only these:
./rtr55
./gg.p
I tried to filter them out using grep, but the filter had no effect.
find -type f | grep -v '\.[0-9]+$'
(This command returned:)
./qqq.449
./rtr55
./gg.p
So there is probably some error in the regex format. How can I fix it?
The + operator belongs to the extended regular expressions. There are many workarounds:
find -type f | grep -v '\.[0-9]\+$'
find -type f | egrep -v '\.[0-9]+$'
find -type f | grep -E -v '\.[0-9]+$'
find -type f | grep -v '\.[0-9][0-9]*$'
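A quick sketch of the difference, feeding the three names from the question to grep directly instead of running find (so no files are needed):

```shell
#!/usr/bin/env bash
# Sketch: the same pattern under basic and extended regex rules.
names=$(printf '%s\n' ./gg.p ./qqq.449 ./rtr55)

# Basic syntax: '+' is an ordinary character, so nothing is filtered out.
bre=$(printf '%s\n' "$names" | grep -v '\.[0-9]+$')

# Extended syntax: '+' means "one or more".
ere=$(printf '%s\n' "$names" | grep -E -v '\.[0-9]+$')

# Portable basic-syntax spelling of "one or more digits".
portable=$(printf '%s\n' "$names" | grep -v '\.[0-9][0-9]*$')

printf 'basic:\n%s\nextended:\n%s\n' "$bre" "$ere"
```

The last form works in both syntaxes, which makes it the safest choice when it is unclear which grep is installed.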
Why would you use grep at all?
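find -regex '.*\.[0-9][0-9]*' -prune -o -type f -print
The explicit -print matters: without it, find prints every path for which the whole expression is true, including the pruned ones. If your expressions are simple enough (or your find doesn't support -regex), you could use -name instead of -regex, but a glob wildcard can't capture an arbitrary number of digits after the dot. Here is a version that handles one or two digits:
find -name '*.[0-9]' -prune -o -name '*.[0-9][0-9]' -prune -o -type f -print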
Notice that this isn't purely an efficiency question; grep would simply not do the right thing if you ever come across file names with newlines in them.

Regular expression in bash not working

Is there any way in bash to match a pattern like this?
[0-9]{8}.*.jpg
I have written the above for the following pattern match
"First 8 character should be digit and rest of them would be anything and end with .jpg"
but the above is not working. If I write it in the manner below, it works:
[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].*.jpg
Now suppose I want the first 20 characters to be digits: should I repeat [0-9] 20 times? I think there is a better solution that I don't know about.
If anyone knows, please help.
You can use the regex in find:
find test -regextype posix-extended -regex ".*/[0-9]{8}.*\.jpg"
Test
$ touch test/12345678aaa.jpg
$ touch test/1234567aaa.jpg
$ find test -regextype posix-extended -regex ".*/[0-9]{8}.*"
test/12345678aaa.jpg
And if it is related to the previous question, you can use:
for file in $(find test -regextype posix-extended -regex ".*/[0-9]{8}.*")
do
echo "my file is $file"
done
If you create directories and files in them, more matches can appear:
$ mkdir test/123456789.dir
$ touch test/123456789.dir/1234567890.jpg
You can filter by -type f, so that you just get files:
$ find test -type f -regextype posix-extended -regex ".*/[0-9]{8}.*"
test/12345678aaa.jpg
test/123456789.dir/1234567890.jpg
And/or specify the depth of the find, so that it does not contain subdirectories:
$ find test -maxdepth 1 -type f -regextype posix-extended -regex ".*/[0-9]{8}.*"
test/12345678aaa.jpg
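The 20-digit case from the question only needs a different repeat count. A minimal sketch, assuming GNU find and two invented file names (one with 20 leading digits, one with 19):

```shell
#!/usr/bin/env bash
# Sketch assuming GNU find; the file names are hypothetical.
tmp=$(mktemp -d)
touch "$tmp/12345678901234567890x.jpg"   # 20 leading digits
touch "$tmp/1234567890123456789x.jpg"    # only 19 leading digits

# {20} replaces twenty copies of [0-9]; [^/]* keeps the rest of the
# match inside the file name instead of crossing a directory boundary.
matches=$(cd "$tmp" && find . -type f -regextype posix-extended \
    -regex '.*/[0-9]{20}[^/]*\.jpg')

printf '%s\n' "$matches"
rm -rf "$tmp"
```

Only the first file should be listed, because the digits must start right after the last slash.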
It looks like you're trying to generate a list of filenames from a regular expression. You can do that, but not directly from Bash as far as I know. Instead, use find:
find -E . -regex '.*/[0-9]{8}.*\.jpg' -depth 1
Something like that works on my Mac OS X system; on Linux the . for current directory is optional, or you can specify a different directory to search in. I added -depth 1 to avoid descending into subdirectories.
A somewhat late answer.
Bash's filename expansion patterns (called globbing) have their own rules. They exist in two forms:
simple globbing
extended globbing (if you have enabled it with shopt -s extglob)
You can read about both sets of rules, for example, here. (3.5.8.1 Pattern Matching)
Remember that globbing rules are not traditional regular expressions (as you probably know them from grep or sed), and they are certainly not Perl's (extended) regular expressions.
So, if you want to use filename expansion (aka globbing), you're stuck with the above two (simple/extended) pattern rules. Bash does know regular expressions, but not for filename expansion (globbing).
So you can, for example, do the following:
shopt -s globstar # enable the ** expansion if it isn't already on
regex="[0-9]{8}.*\.jpg"
for file in ./**/*.jpg # matches all *.jpg files recursively (globstar)
do
# try the regex match; skip the file if it fails
[[ $file =~ $regex ]] || continue
# matched: do something with the file
echo "the $file has at least 8 digits"
done
Or you can use the find command with its built-in regex matching (see the other answers), or grep with Perl-like regexes, such as:
find somewhere -maxdepth 1 -type f -name \*.jpg -print0 | grep -zP '/\d{8}.*\.jpg'
Speed: for large trees, find is faster. At least on my notebook, where:
while IFS= read -d $'\0' -r file
do
echo "$file"
done < <(find ~/Pictures -name \*.JPG -print0 | grep -zP 'P\d{4}.*\.JPG')
runs in real 0m1.593s, and
regex="P[0-9]{4}.*\.JPG"
for file in ~/Pictures/**/*.JPG
do
[[ $file =~ $regex ]] || continue #didn't match
echo "$file"
done
runs in real 0m3.628s.
On small trees, IMHO it is better to use the built-in bash regexes. (Maybe I prefer it because I like the ./**/*.ext expansion, which gets all filenames into the variable correctly, regardless of spaces and the like, without having to care about -print0, read -d $'\0' and such.)

Regexp for matching filenames

I have these files:
first.error.log
second1.log
second2.log
FFFpc.log
TR.den.log
bla.error.log
and I would like to make a pattern that matches all files with error in their names, plus a few additional ones, but no more.
For error alone it would be
$FILE_PATTERN="*.error*"
But what if I want to match not only those error files but also all second files, FFpc, etc.?
This does not work:
$FILE_PATTERN="*.error*|^second.*\log$|.*FFPC\.log$"
Thanks in advance for your help
EDIT:
$FILE_PATTERN is later used by:
find /somefolder -type f -name $FILE_PATTERN
EDIT: THIS FILE_PATTERN is in property file that is later used by bash script.
You need to use find with -regex option:
find -E /somefolder -type f -regex '.*/(.*\.error.*|second.*log|.*FFPC\.log)$'
PS: Use -iregex for case-insensitive matching:
find -E /somefolder -type f -iregex '.*/(.*\.error.*|second.*log|.*FFPC\.log)$'
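For GNU find, which has no -E, a rough equivalent can be sketched with -regextype posix-extended. The files are the ones from the question, created in a throwaway directory. The [^/]* parts are my own tightening, so that each alternative stays within the file name.

```shell
#!/usr/bin/env bash
# Sketch assuming GNU find; uses the file names from the question.
tmp=$(mktemp -d)
touch "$tmp/first.error.log" "$tmp/second1.log" "$tmp/second2.log" \
      "$tmp/FFFpc.log" "$tmp/TR.den.log" "$tmp/bla.error.log"

# -iregex ignores case, so FFPC also matches FFFpc.log.
# '.*/' absorbs the leading directories, because the regex
# has to match the whole path.
matches=$(cd "$tmp" && find . -type f -regextype posix-extended \
    -iregex '.*/([^/]*\.error[^/]*|second[^/]*\.log|[^/]*FFPC\.log)' | LC_ALL=C sort)

printf '%s\n' "$matches"
rm -rf "$tmp"
```

Everything except TR.den.log should be listed.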
$ ls | grep -i '\(.*error.*\)\|\(^second.*\.log$\)\|\(.*FFPC\.log$\)'
bla.error.log
FFFpc.log
first.error.log
second1.log
second2.log
If you wanted to use with find
find /somefolder -type f | grep -i '\(.*error.*\)\|\(/second.*\.log$\)\|\(.*FFPC\.log$\)'
If you're in bash, I'm assuming you have grep. Using grep -E or egrep will allow you to use alternation (ORing your searches).
$ stat * | egrep "(error|second)"
File: `first.error.log'
File: `second1.log'
File: `second2.log'
You could use ls instead of stat, but sometimes ls will not give you what you expect. Considering you're only searching for filenames, though, ls should suffice.
$ ls | egrep "(error|second)"
first.error.log
second1.log
second2.log
You can use command substitution to store the output into a bash variable:
FILE_PATTERN=$(ls | egrep "(error|second)")
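# -name takes shell globs, not regexes
FILE_PATTERN=("*.error*" "second*log" "*FFPC.log")
# start with the first pattern, then add the rest joined by -o
ARGS=(-name "${FILE_PATTERN[0]}")
for F in "${FILE_PATTERN[@]:1}"; do
ARGS+=(-o -name "$F")
done
find /somefolder -type f '(' "${ARGS[@]}" ')'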
You were close; there are just a few misplaced symbols.
Here's what I came up with:
.*\.error\..*|^second.*\.log$|.*FF[Pp][Cc]\.log$
here's a demo of a working modification of your regex:
http://regex101.com/r/rL3rM1/1