Character Classes for find command using Emacs regex style

For example, I want to find files whose names consist of digits followed by .bed. This works:
find . -regex ".*/[0-9]+\.bed"
As far as I know, find uses Emacs regex syntax by default, so I looked at this page: http://www.gnu.org/software/emacs/manual/html_node/elisp/Char-Classes.html#Char-Classes and tried replacing [0-9] with [[:digit:]], but neither of these commands works for me:
find . -regex ".*/[:digit:]\.bed"
Or
find . -regex ".*/[[:digit:]]\.bed"
Does anybody know what I got wrong in these commands?
Thanks!

You need to use a different regex type. For example, try:
find . -regextype posix-extended -regex ".*/[[:digit:]]+\.bed"
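A minimal sketch to check this, assuming GNU findutils and a throwaway directory of hypothetical file names (123.bed, 7.bed, x12.bed, notes.txt, sub/45.bed). It runs the POSIX-extended command above next to the question's working [0-9]+ form under the default syntax; both should list only the names made entirely of digits.

```shell
#!/usr/bin/env bash
# Sketch assuming GNU findutils; the file names are hypothetical samples.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

mkdir sub
touch 123.bed 7.bed x12.bed notes.txt sub/45.bed

# POSIX extended syntax understands the [[:digit:]] character class.
ere=$(find . -regextype posix-extended -regex '.*/[[:digit:]]+\.bed' | LC_ALL=C sort)

# The question's working form: default syntax with a [0-9] range.
default=$(find . -regex '.*/[0-9]+\.bed' | LC_ALL=C sort)

printf 'posix-extended:\n%s\ndefault:\n%s\n' "$ere" "$default"
```

x12.bed is skipped in both cases because .*/ forces the digits to start right after a slash.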

I suspect the find documentation is out of date: the regexp syntax accepted by Emacs itself has changed over the years, but find's Emacs syntax apparently hasn't changed accordingly.

Related

Posix Extended Regex match with find Bash Linux [duplicate]

I am trying to do a simple file-name match with the regex below, which I tested on this page against a sample file name of the form ABC_YYYYMMDDHHMMSS.sha1:
ABC_20[0-9]{2}(0[1-9]|1[0-2])([0-2][0-9]|3[0-1])([0-2][0-3])([0-5][0-9])([0-5][0-9])\.sha1
When I plug it into find's -regex test like this:
find . -type f -regex "ABC_20[0-9]{2}(0[1-9]|1[0-2])([0-2][0-9]|3[0-1])([0-2][0-3])([0-5][0-9])([0-5][0-9])\.sha1"
the command does not find the file present in the path (e.g. ABC_20161231225950.sha1). I am aware of the various regex types listed on this page, and since my regex is POSIX extended, I tried the following:
find . -type f -regextype posix-extended -regex 'ABC_20[0-9]{2}(0[1-9]|1[0-2])([0-2][0-9]|3[0-1])([0-2][0-3])([0-5][0-9])([0-5][0-9])\.sha1'
Still no results. I searched for similar questions, but they all involved an incorrect regex causing files not to be found. In my case, though, the regex appears to be correct. What am I missing here? Also, how can I debug non-matching patterns when using -regex with find?
Note: I could simplify the capturing groups in the regex, but that is outside the scope of this question.
Add .* at the start of your regex: -regex matches against the whole path, which always starts with something like ./
find . -type f -regextype posix-extended -regex '.*ABC_20[0-9]{2}(0[1-9]|1[0-2])([0-2][0-9]|3[0-1])([0-2][0-3])([0-5][0-9])([0-5][0-9])\.sha1'
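A sketch of both the fix and a simple debugging step, assuming GNU findutils and hypothetical sample files. Printing the paths with a plain find . -type f shows the exact strings -regex is tested against, which makes the missing ./ prefix obvious.

```shell
#!/usr/bin/env bash
# Sketch assuming GNU findutils; the .sha1 names are hypothetical samples.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

mkdir sub
touch ABC_20161231225950.sha1 sub/ABC_20170101120000.sha1 ABC_notes.sha1

re='ABC_20[0-9]{2}(0[1-9]|1[0-2])([0-2][0-9]|3[0-1])([0-2][0-3])([0-5][0-9])([0-5][0-9])\.sha1'

# Debugging aid: these are the full strings that -regex must match.
find . -type f

# Without a prefix the regex has to match "./ABC_..." in full, so it fails.
bare=$(find . -type f -regextype posix-extended -regex "$re")

# With ".*" in front, the leading "./" and any directories are absorbed.
prefixed=$(find . -type f -regextype posix-extended -regex ".*$re" | LC_ALL=C sort)

printf 'bare: [%s]\nprefixed:\n%s\n' "$bare" "$prefixed"
```

If you want the match to start exactly at the file name, '.*/ABC_...' is a slightly stricter prefix than '.*ABC_...'.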

Find using regex with alternatives

I am trying to use find to match several alternative file patterns, distinguished by certain numbers in the middle, but it returns an empty list. My actual pattern has a fixed beginning and variable numbers in the middle.
Reproducible example: create a few files:
touch a10a a24b b12c a45d
I tried to select a10a and a45d from these files with the following regex, but the output is empty:
find . -regex '.*/a(10|45).*'
I expect the issue is easy to solve, but I could not find a solution or figure it out myself. What did I miss?
System: Ubuntu 16.04
The idea is right, but you need to tell find which type of regex to use. Since you are using the alternation operator | here, you need to enable ERE (Extended Regular Expressions) support, which you can do as below. The -regextype option lets you specify the regex flavor you need. Also, the / is not strictly required, because the leading greedy .* can absorb the ./ prefix; keeping it, however, ensures that the a is at the start of the file name.
find . -type f -regextype posix-extended -regex '.*/a(10|45).*'
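A sketch reproducing the question's files, assuming GNU findutils. It compares the posix-extended command with two default-syntax variants: in find's default Emacs syntax, bare ( and | are literal characters, so the question's regex matches nothing, while the backslashed form \(10\|45\) is how that syntax spells grouping and alternation.

```shell
#!/usr/bin/env bash
# Sketch assuming GNU findutils; same four files as in the question.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

touch a10a a24b b12c a45d

# ERE: unescaped ( ) groups and | separates the alternatives.
ere=$(find . -type f -regextype posix-extended -regex '.*/a(10|45).*' | LC_ALL=C sort)

# Default (Emacs) syntax: ( and | are literal, so nothing matches.
literal=$(find . -type f -regex '.*/a(10|45).*')

# Default (Emacs) syntax: grouping and alternation need backslashes.
emacs=$(find . -type f -regex '.*/a\(10\|45\).*' | LC_ALL=C sort)

printf 'ere:\n%s\nliteral: [%s]\nemacs:\n%s\n' "$ere" "$literal" "$emacs"
```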
From the man page of my version of GNU findutils:
-regextype type
Changes the regular expression syntax understood by -regex and -iregex tests which occur later on the command line. Currently-implemented types are emacs (this is the default), posix-awk, posix-basic, posix-egrep and posix-extended.
Try specifying -regextype awk instead:
find . -regextype awk -regex '.*/a(10|45).*'
It seems you are using the wrong type of bracket. They should be square, not round.
The correct command should be:
find . -regex '.*/a[10|45].*'
Hope this helps!
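Note that [10|45] is a bracket expression: it matches a single character, one of 1, 0, |, 4 or 5, not the alternatives 10 or 45. It happens to select the right files from the question's four names; a sketch with two extra hypothetical names (a1x and a5z), assuming GNU findutils, shows where it diverges from real alternation.

```shell
#!/usr/bin/env bash
# Sketch assuming GNU findutils; a1x and a5z are hypothetical extra files.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

touch a10a a24b b12c a45d a1x a5z

# Bracket expression: "a" followed by ONE of the characters 1 0 | 4 5.
bracket=$(find . -type f -regex '.*/a[10|45].*' | LC_ALL=C sort)

# Real alternation: "a" followed by the string 10 or the string 45.
alternation=$(find . -type f -regextype posix-extended -regex '.*/a(10|45).*' | LC_ALL=C sort)

printf 'bracket:\n%s\nalternation:\n%s\n' "$bracket" "$alternation"
```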

What's the best way to use regex in searching for files named via some pattern in linux?

I have files that are named like this:
[phone_number]_[email]_[milliseconds].mp4
For example:
2125551212_foo#blah.com_1378447385902.mp4
find (supposedly) takes a regex pattern, so I tried to look for files that start with 10 digits and end with mp4 like this:
find ../media_pool -regex '^\d{10}.*mp4$'
However that returns nothing at all.
When I try it thusly:
find ../media_pool -regex 'mp4$'
it returns all the files that end with that extension... so it looks like it works with some subset of regex, but not all.
Can someone point me to what the right way to get what I need would be? I'm happy not to use find if something else does the job better.
I am not an expert in Linux utilities, but it seems you can specify the type of regex used to match the pattern. In any case, \d does not seem to be supported; try the following:
find ../media_pool -regextype posix-extended -regex '^[0-9]{10}.*mp4$'
I don't know whether you need to quote posix-extended; that's for you to figure out.
Edit: Sorry about that, there was another problem. You don't need to change the engine type: by default find uses the Emacs engine, and I was able to look up the syntax it supports:
find ../media_pool -regex '.*/[0-9]\{10\}.*mp4$'
The key is escaping the braces, e.g. \{10\}, and adding .*/ at the start so that the regex matches the full path that find tests.
It took me a while to figure out that find matches the entire path, so you need the ".*/" at the beginning. The following is tested and works:
find . -regextype posix-extended -regex '.*/[0-9]{10}.*mp4$'
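A sketch contrasting the two forms, assuming GNU findutils and a hypothetical ./media_pool directory (the question uses ../media_pool). The ^-anchored version fails because the string being matched starts with ./media_pool/, not with the digits.

```shell
#!/usr/bin/env bash
# Sketch assuming GNU findutils; ./media_pool and its files are hypothetical.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

mkdir media_pool
touch 'media_pool/2125551212_foo#blah.com_1378447385902.mp4' \
      'media_pool/12345_short.mp4' \
      'media_pool/2125551212_notes.txt'

# Fails: the matched string starts with "./media_pool/", not with digits.
anchored=$(find ./media_pool -regextype posix-extended -regex '^[0-9]{10}.*mp4$')

# Works: ".*/" consumes the directories before the file name.
fullpath=$(find ./media_pool -regextype posix-extended -regex '.*/[0-9]{10}.*mp4$')

printf 'anchored: [%s]\nfullpath: %s\n' "$anchored" "$fullpath"
```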
The default regular expression engine for find is Emacs; you can change it by using the -regextype option.
Here is an example using sed:
find . -regextype sed -regex ".*/[0-9]\{10\}.*mp4$"
There are most likely other solutions, since several engines are supported. Another important thing to note is the .*/ at the beginning of the regular expression: find matches against the entire path of a file, and this prefix accounts for the leading directories.

repetition in GNU find regexp

I am trying to find all the files whose name contains exactly 14 digits (I'm trying to match a timestamp in the filename). I'm not sure how to get the GNU find regexp syntax for repetitions right.
I've tried find -regex ".*[0-9]{14}" and find -regex ".*[0-9]\{14\}", but neither turns up any results. Can you help me with the syntax?
Remember, GNU find's -regex matches the whole path. Alternatively, you can combine find and grep to do the task, e.g. to find names containing a run of exactly 14 digits delimited by word boundaries (a non-word character or the start or end of the name on each side):
find . -type f -printf "%f\n" | grep -E "\b[0-9]{14}\b"
Modify it to suit your needs.
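A sketch of this find and grep pipeline, assuming GNU find (for -printf) and GNU grep (for \b), with hypothetical file names. One caveat worth checking: \b treats _ as a word character, so a name like log_20161231225950.txt is not matched.

```shell
#!/usr/bin/env bash
# Sketch assuming GNU find (-printf) and GNU grep (\b); names are hypothetical.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

touch ts-20161231225950.txt log_20161231225950.txt 20161231225950 x201612312259501

# %f prints only the file name, so grep never sees the directory part.
# \b needs a non-word character (or the start/end) on each side of the run;
# "_" counts as a word character, so log_... is not matched.
names=$(find . -type f -printf '%f\n' | grep -E '\b[0-9]{14}\b' | LC_ALL=C sort)

printf '%s\n' "$names"
```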
Try changing the -regextype parameter to find.
Changes the regular expression syntax understood by -regex and -iregex
tests which occur later on the command line. Currently-implemented
types are emacs (this is the default), posix-awk, posix-basic,
posix-egrep and posix-extended.
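For instance, a sketch with -regextype posix-extended, assuming GNU findutils; the regex and file names are illustrative assumptions, not taken from the question. It looks for a run of exactly 14 digits in the file name, keeping in mind that -regex is matched against the whole path.

```shell
#!/usr/bin/env bash
# Sketch assuming GNU findutils; all names are hypothetical samples.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

mkdir snap_20170101000000
touch log_20161231225950.txt 20161231225950 log_201612312259501.txt \
      short_123.txt snap_20170101000000/inner.txt

# A run of exactly 14 digits in the last path component:
#   .*[^0-9]         anything, then a non-digit right before the run
#   [0-9]{14}        the 14 digits (ERE interval, no backslashes)
#   ([^0-9/][^/]*)?  optionally a non-digit and the rest of the name,
#                    with no further "/" so the run is in the file name
stamps=$(find . -type f -regextype posix-extended \
           -regex '.*[^0-9][0-9]{14}([^0-9/][^/]*)?' | LC_ALL=C sort)

printf '%s\n' "$stamps"
```

The 15-digit name and the file inside the timestamped directory are both rejected, which is the point of the surrounding non-digit and no-slash conditions.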
Strange, I just gave it a try and could not get this to work. Here's a workaround anyway (matching exactly 2 consecutive digits):
$ ls
a123.txt a1b2c3.txt a45.txt b123.txt
$ find -regex '.*[^0-9][0-9][0-9][^0-9].*'
./a45.txt

regular expression to exclude filetypes from find

When using the find command on Linux, you can add a -regex test that matches using Emacs regular expressions.
I want find to look for all files except .jar and .ear files. What would the regular expression be in this case?
Thanks
You don't need a regex here. You can use find with the -name and -not options:
find . -not -name "*.jar" -not -name "*.ear"
A more concise (but less readable) version of the above is:
find . ! \( -name "*.jar" -o -name "*.ear" \)
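A quick sketch of both spellings, assuming GNU findutils and hypothetical file names. -type f is added here only so that directories such as . and ./lib are left out of the listing.

```shell
#!/usr/bin/env bash
# Sketch assuming GNU findutils; the file names are hypothetical.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

mkdir lib
touch app.jar bundle.ear readme.txt build.xml lib/dep.jar lib/util.class

# Two -not tests, implicitly joined with -and.
separate=$(find . -type f -not -name "*.jar" -not -name "*.ear" | LC_ALL=C sort)

# One negated group; the parentheses are escaped so the shell passes them on.
grouped=$(find . -type f ! \( -name "*.jar" -o -name "*.ear" \) | LC_ALL=C sort)

printf '%s\n' "$separate"
```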
EDIT: New approach:
Since POSIX regexes don't support lookaround, you need to negate the match result:
find . -not -regex ".*\.[je]ar"
The previously posted answer uses lookbehind and thus won't work here, but here it is for completeness' sake:
.*(?<!\.[je]ar)$
find . -regextype posix-extended -not -regex ".*\\.(jar|ear)"
This will do the job, and I personally find it a bit clearer than some of the other solutions. Unfortunately, -regextype is required (cluttering up an otherwise simple command) to make the unescaped group and the | alternation work.
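A sketch checking this answer alongside the default-syntax negation from the earlier answer, assuming GNU findutils and hypothetical file names. Single quotes are used so the regex reaches find unchanged; in the answer's double quotes, the shell also turns \\. into \. before find sees it, so the two spellings are equivalent.

```shell
#!/usr/bin/env bash
# Sketch assuming GNU findutils; the file names are hypothetical.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

touch app.jar bundle.ear readme.txt jarfile.txt

# ERE: unescaped group and alternation need -regextype posix-extended.
ere=$(find . -type f -regextype posix-extended -not -regex '.*\.(jar|ear)' | LC_ALL=C sort)

# Default syntax: a bracket expression covers both extensions.
default=$(find . -type f -not -regex '.*\.[je]ar' | LC_ALL=C sort)

printf '%s\n' "$ere"
```

jarfile.txt is kept because the regex has to match the whole path, so "jar" only counts at the very end after a dot.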
Using a regular expression in this case sounds like overkill (you could just check whether the name ends with something). I'm not sure about Emacs syntax, but something like this should be generic enough to work:
\.(?!((jar$)|(ear$)))
i.e. find a dot (.) not followed by "jar" or (|) "ear" at the end ($).