Using non-consuming matches in Linux find regex

Here's my problem in a simplified scenario.
Create some test files:
touch /tmp/test.xml
touch /tmp/excludeme.xml
touch /tmp/test.ini
touch /tmp/test.log
I have a find expression that returns all the XML and INI files:
[root@myserver] ~> find /tmp -name -prune -o -regex '.*\.\(xml\|ini\)'
/tmp/test.ini
/tmp/test.xml
/tmp/excludeme.xml
I now want to modify this -regex so that excludeme.xml is left out of the results.
I thought this should be possible by combining a non-consuming regex (?=expr) with a negated match (?!expr). Unfortunately I can't get the command's syntax right, and my attempts return no matches. Here is one of them (I've tried many different forms with different escaping!):
find /tmp -name -prune -o -regex '\(?=.*excludeme\.xml\).*\.\(xml\|ini\)'
I can't break the command into multiple steps (e.g. piping through grep -v), because the find command is used as input to other parts of our tool.

This does what you want on linux:
find /tmp -name -prune -o -regex '.*\.\(xml\|ini\)' \! -regex '.*excludeme\.xml'
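The "!" operator is not unique to GNU find: it is part of the POSIX find specification. -regex itself, however, is an extension (found in GNU and BSD find), so the command as a whole is not portable to every find.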
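To try the ! approach without touching /tmp itself, here is a minimal sketch that recreates the asker's four files in a scratch directory from mktemp -d (an assumption, as is the added -type f). It assumes GNU find, whose default regex syntax is where the \( \| \) notation comes from.

```shell
#!/bin/sh
# Recreate the test files in a throwaway directory (assumption: mktemp -d is available)
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/test.xml" "$dir/excludeme.xml" "$dir/test.ini" "$dir/test.log"

# Keep .xml/.ini files, then negate a second -regex to drop excludeme.xml.
# Adjacent tests are implicitly joined with -a, so both must be true.
result=$(find "$dir" -type f -regex '.*\.\(xml\|ini\)' \! -regex '.*/excludeme\.xml' | sort)
printf '%s\n' "$result"
```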

Not sure about what escapes you need or if lookarounds work, but these work for Perl:
/^(?!.*\/excludeme\.).*\.(xml|ini)$/
/(?<!\/excludeme)\.(xml|ini)$/
Edit: I just checked the find command. The best you can do with find is to change the regex type with -regextype posix-extended, but that doesn't support lookarounds. The only way around this seems to be using GNU tools: either find, as @unholygeek suggests, or piping find into GNU grep with the -P (Perl) option. With GNU grep you can use the above regexes verbatim, minus the surrounding slashes. Something like find .... -print | grep -P ... (pipe the names directly into grep; xargs grep would search the files' contents instead of their names).
Sorry, that's the best I can do.
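A minimal sketch of the grep -P route, assuming GNU grep built with PCRE support and the same kind of scratch directory (the mktemp -d setup is an assumption). The file names go straight from find into grep, so the lookahead is applied to paths rather than to file contents.

```shell
#!/bin/sh
# Scratch copy of the asker's files (assumption: GNU grep with -P support)
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/test.xml" "$dir/excludeme.xml" "$dir/test.ini" "$dir/test.log"

# The negative lookahead rejects any path containing "/excludeme." before
# the .xml/.ini suffix is checked at the end of the line.
result=$(find "$dir" -type f -print | grep -P '^(?!.*/excludeme\.).*\.(xml|ini)$' | sort)
printf '%s\n' "$result"
```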

Related

How to delete files based on the extension in MacOS terminal using regex?

I need to delete a huge number of .zip and .apk files from my project's root folder, and I'd like to do it from the bash terminal (Mac OS X).
So far I've successfully made it with two commands:
$ find . -name \*.zip -delete
$ find . -name \*.apk -delete
But I want to do it in one using regex:
$ find . -regex '\w*.(apk|zip)' -delete
But this regular expression doesn't seem to work: it isn't deleting anything. What am I doing wrong?
MORE INFO:
An example of what I want to delete is android~1~1~sampleproject.zip.
$ find -E . -regex './[~a-zA-Z0-9]+\.(apk|zip)' -delete
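find matches the regex against the whole path, not just the file name, so the regex has to start with ./ (the starting point given to find).
I believe BSD find doesn't support \w, \d, etc., so replace them with bracket expressions such as [a-zA-Z0-9]. The + quantifier and the (apk|zip) alternation aren't available in basic regular expressions either, so you need to add -E to enable extended regular expressions.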
-E Interpret regular expressions followed by -regex and -iregex primaries as extended (modern) regular expressions rather than basic regular expressions (BRE's). The re_format(7) manual page fully describes both formats.
Example
For example, consider the following commands:
$ ls *.json
bower.json composer.json package.json
$ find -E . -regex "\./[a-zA-Z0-9]+\.(json)"
./bower.json
./composer.json
./package.json
Note: the above answer is specific to BSD find. GNU find doesn't support the -E option; instead it supports -regextype posix-extended. The above example can be rewritten as:
$ find . -regextype posix-extended -regex "\./\w+\.(json)"
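The BSD -E delete command above translates to GNU find roughly as follows. This is a sketch assuming GNU find on Linux, run in a throwaway directory with hypothetical sample files; it does a -print dry run before the -delete, which is a cheap way to check a regex before letting it remove anything.

```shell
#!/bin/sh
# Assumption: GNU find. On macOS/BSD, use find -E . -regex ... instead of -regextype.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
touch 'android~1~1~sampleproject.zip' app.apk notes.txt   # hypothetical sample files

# -regex matches the whole path, so the pattern starts with \./
pattern='\./[~a-zA-Z0-9]+\.(apk|zip)'

# Dry run: list what would be deleted
find . -regextype posix-extended -regex "$pattern" -print

# Delete for real, then see what is left
find . -regextype posix-extended -regex "$pattern" -delete
remaining=$(ls)
printf '%s\n' "$remaining"
```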
I would use:
find . -type f \( -name "*.zip" -o -name "*.apk" \) -delete

Recursive find and replace based on regex

I have changed up my director structure and I want to do the following:
Do a recursive grep to find all instances of a match
Change to the updated location string
One example (out of hundreds) would be:
from common.utils import debug --> from etc.common.utils import debug
To get all the instances of what I'm looking for I'm doing:
$ grep -r 'common.' ./
However, I also need to make sure common is preceded by a space. How would I do this find and replace?
It's hard to tell exactly what you want because your refactoring example changes the import as well as the package, but the following will change common. -> etc.common. for all files in a directory:
sed -i 's/\bcommon\./etc.&/' $(egrep -lr '\bcommon\.' .)
This assumes you have GNU sed available, which most Linux systems do. Also, be aware that this will fail if there are too many files to pass to sed on a single command line (the system's argument-length limit). In that case, you can do this:
egrep -lr '\bcommon\.' . | xargs sed -i 's/\bcommon\./etc.&/'
Note that it might be a good idea to run the sed command as sed -i'.OLD' 's/\bcommon\./etc.&/' so that you get a backup of the original file.
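A small end-to-end check of the grep | xargs sed pipeline, assuming GNU grep and GNU sed; the directory, file name and contents are hypothetical. The uncommon.value line shows that \b keeps the pattern from matching inside a longer word.

```shell
#!/bin/sh
# Assumption: GNU grep and GNU sed (\b and in-place -i are GNU features).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir "$dir/pkg"
printf 'from common.utils import debug\nx = uncommon.value\n' > "$dir/pkg/mod.py"  # hypothetical file

# List only files that contain a match, then rewrite them in place,
# keeping a .OLD backup of each file sed processes.
grep -Elr '\bcommon\.' "$dir" | xargs sed -i'.OLD' 's/\bcommon\./etc.&/'

new=$(cat "$dir/pkg/mod.py")
printf '%s\n' "$new"
```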
If your grep implementation supports Perl syntax (the -P flag, usually available on Linux), you can benefit from additional features like word boundaries (GNU grep also accepts \b without -P):
$ grep -Pr '\bcommon\.'
By the way:
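grep -r can be much slower than feeding grep a file list from find, as in Rob's example, because find lets you restrict the search to the relevant files (e.g. *.java). Furthermore, when you're sure that the file names found do not contain any whitespace, xargs is much faster than -exec ... \;, because it runs one grep for many files instead of one per file: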
$ find . -type f -name '*.java' | xargs grep -P '\bcommon\.'
Or, applied to Tim's example:
$ find . -type f -name '*.java' | xargs sed -i.bak 's/\<common\./etc.common./'
Note that, in the latter example, sed -i.bak saves a *.bak backup of each file it processes before making the replacement. This way you can review the command's results and then delete the backups:
$ find . -type f -name '*.bak' | xargs rm
If you've made an oopsie, the following command will restore the previous versions:
$ find . -type f -name '*.bak' | while read -r LINE; do mv -f "$LINE" "${LINE%.bak}"; done
Of course, if you aren't sure that there's no whitespace in the file names and paths, you should apply the commands via find's -exec parameter.
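For file names that may contain whitespace, here is a sketch of the -exec form, assuming GNU sed and a hypothetical file named My File.java. Using {} + instead of {} \; lets find hand many names to one sed process, and the restore step strips the .bak suffix rather than calling basename, so each backup goes back to its own directory.

```shell
#!/bin/sh
# Assumption: GNU sed (-i.bak and \< are GNU features).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'import common.Foo;\n' > "$dir/My File.java"   # hypothetical file with a space

# -exec ... {} + passes the names to sed directly, so spaces are not split
find "$dir" -type f -name '*.java' -exec sed -i.bak 's/\<common\./etc.common./' {} +
after=$(cat "$dir/My File.java")

# Undo: move each backup over the file it came from (strip the .bak suffix)
find "$dir" -type f -name '*.bak' -exec sh -c 'for f; do mv -f "$f" "${f%.bak}"; done' sh {} +
restored=$(cat "$dir/My File.java")
printf '%s\n%s\n' "$after" "$restored"
```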
Cheers!
This is roughly how you would do it using find (untested, so try it on a copy first):
find . -name \*.java -exec sed -i "s/FIND_STR/REPLACE_STR/g" {} \;
This translates as: "Starting from the current directory, find all files whose names end in .java and run sed on each one", where {} is a placeholder for the current file and \; ends the -exec command. The sed expression "s/FIND_STR/REPLACE_STR/g" replaces every FIND_STR with REPLACE_STR on each line, and -i (GNU sed) writes the result back to the file instead of printing it.

Why does find -regex not accept my regex?

I want to select some files that are matching a regular expression.
For example, the files are:
4510-88aid-50048-INA.txt
4510-88nid-50048-INA.txt
xxxx-05xxx-xxxxx-INA.txt
I want all files that match this regex:
.*[\w]{4}-05(?!aid)[\w]{3}-[\w]{5}-INA\.txt
As I understand it, only xxxx-05xxx-xxxxx-INA.txt should match in the case above.
In a tool like RegexTester, everything works perfectly.
With find -regex in bash, however, it doesn't seem to work.
My question is, why?
I can't figure it out. I am using:
find /some/path -regex ".*[\w]{4}-05(?!aid)[\w]{3}-[\w]{5}-INA\.txt" -exec echo {} \;
But nothing is printed... Any ideas?
$ uname -a
Linux debmu838 2.6.5-7.321-smp #1 SMP Mon Nov 9 14:29:56 UTC 2009 x86_64 x86_64 x86_64 GNU/Linux
bash4+ and perl
ls /some/path/**/*.txt | perl -nle 'print if m{(?:^|/)\w{4}-05(?!aid)\w{3}-\w{5}-INA\.txt$}'
For ** to recurse you need globstar enabled: run shopt -s globstar (or put it in your .profile).
According to the find man page, -regex uses Emacs regular expressions by default. And according to http://www.regular-expressions.info/refflavors.html, Emacs regex is GNU ERE, which does not support lookarounds.
You can try a different -regextype, as @l0b0 suggested, but the POSIX flavors don't seem to support this feature either.
I pretty much agree with the other answers: find's -regex switch can't emulate everything in Perl's regex. However, here's something you can try.
Take a look at the find2perl command. That program takes a typical find command and gives you an equivalent Perl program. I don't believe find2perl recognizes -regex (it's not in POSIX find, only in extensions such as GNU and BSD find), but you can simply use -name and then look at the program it generates. From there, you can modify the program to use the Perl regex you want. In the end, you'll get a small Perl script that does the file search you want. (find2perl was dropped from the core Perl distribution in 5.22; it is now available from CPAN as App::find2perl.)
Otherwise, try using -regextype posix-extended, which supports most of the everyday Perl syntax (grouping, alternation, +, ?, {n}). You can't use lookarounds, but you can probably find something that does work.
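One way to get the effect of the (?!aid) lookahead without lookarounds, assuming GNU find: match the general shape with -regextype posix-extended and drop the aid variant with a negated -name, the same ! trick used for excludeme.xml in the first question. This only works because aid is a fixed string at a fixed position. The file 1234-05aid-12345-INA.txt is hypothetical, added so there is something to exclude, and [[:alnum:]_] stands in for \w.

```shell
#!/bin/sh
# Assumption: GNU find. Shape check via posix-extended, "aid" exclusion via ! -name.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/4510-88aid-50048-INA.txt" "$dir/4510-88nid-50048-INA.txt" \
      "$dir/xxxx-05xxx-xxxxx-INA.txt" "$dir/1234-05aid-12345-INA.txt"  # last one is hypothetical

w='[[:alnum:]_]'   # bracket-expression stand-in for \w
result=$(find "$dir" -regextype posix-extended -type f \
    -regex ".*/$w{4}-05$w{3}-$w{5}-INA\.txt" \! -name '*-05aid-*')
printf '%s\n' "$result"
```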
What you've got looks like a Perl regex. Try with a different -regextype, and tweak the regex accordingly:
Changes the regular expression syntax understood by -regex and -iregex tests which occur later on the command line. Currently-implemented types are emacs (this is the default), posix-awk, posix-basic, posix-egrep and posix-extended.
Try this (list the 05 files, then drop the aid ones):
ls ????-05???-?????-INA.txt | grep -v -- '-05aid-'
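Try a simple script like this:
#!/bin/bash
# Print *INA.txt files shaped like xxxx-05xxx-xxxxx-INA.txt, skipping the 05aid ones
for file in *INA.txt
do
    match=$(echo "${file%INA.txt}" | sed -r 's/^\w{4}-05\w{3}-\w{5}-$/found/')
    [ "$match" = "found" ] && [[ $file != *-05aid-* ]] && echo "$file"
done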

Why isn't this regex working: find ./ -regex '.*\(m\|h\)$'

Why isn't this regex working?
find ./ -regex '.*\(m\|h\)$'
I noticed that the following works fine:
find ./ -regex '.*\(m\)$'
But when I add "or an h at the end of the filename" by adding \|h, it doesn't work. That is, it should pick up all my *.m and *.h files, but I am getting nothing back.
I am on Mac OS X.
On Mac OS X, you can't use \| in a basic regular expression, which is what find uses by default.
re_format man page
[basic] regular expressions differ in several respects. | is an ordinary character and there is no equivalent for its functionality.
The easiest fix in this case is to change \(m\|h\) to [mh], e.g.
find ./ -regex '.*[mh]$'
Or you could add the -E option to tell find to use extended regular expressions instead.
find -E ./ -regex '.*(m|h)$'
Unfortunately -E isn't portable.
Also note that if you only want to list files ending in .m or .h, you have to escape the dot, e.g.
find ./ -regex '.*\.[mh]$'
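A quick check of the bracket-expression fix in a scratch directory with hypothetical file names. Because it avoids \|, the same regex should behave the same under BSD find's basic syntax and GNU find's default Emacs syntax; this sketch was written against GNU find.

```shell
#!/bin/sh
# No \| needed: [mh] works in both BSD basic and GNU Emacs-style regexes.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
touch main.m util.h readme.md cmath   # hypothetical sample files

result=$(find . -type f -regex '.*\.[mh]$' | sort)
printf '%s\n' "$result"
```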
If you find this confusing (me too), there's a great reference table that shows which features are supported on which systems.
Regex Syntax Summary
A simpler (and usually faster) solution is to use -name with the -o operator:
find . -type f \( -name "*.m" -o -name "*.h" \)
but if you want the regex use:
find . -type f -regex ".*\.[mh]$"
Okay this is a little hacky but if you don't want to wrangle the regex limitations of find on OSX, you can just pipe find's output to grep:
find . | grep -E '\.(h|m)$'
What’s wrong with
find . -name '*.[mh]' -type f
If you want fancy patterns, then use find2perl and hack the pattern.

How to use UNIX find to find (file1 OR file2)?

In the bash command line, I want to find all files that are named foo or bar. I tried this:
find . -name "foo\|bar"
but that doesn't work. What's the right syntax?
You want:
find . \( -name "foo" -o -name "bar" \)
See the Wikipedia page (of all places).
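A minimal sketch of the grouped -name test, using hypothetical files in a scratch directory. -name takes a shell glob rather than a regex, which is why "foo\|bar" only matches a file literally named foo|bar; the escaped parentheses group the two tests and -o joins them.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir "$dir/sub"
touch "$dir/foo" "$dir/sub/bar" "$dir/foobar" "$dir/baz"   # hypothetical names

# Exact name matches only: foobar and baz are not listed
result=$(find "$dir" \( -name foo -o -name bar \) | sort)
printf '%s\n' "$result"
```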
I am cheap with find, I would use this:
find ./ | grep -E 'foo|bar'
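That's just my personal preference. I like grep more than find because the syntax is easier to 'get', and once you master it there are more uses for it than just walking the file tree. Note that this matches foo or bar anywhere in the path (e.g. ./foobar/baz); use grep -E '/(foo|bar)$' to match only those exact file names.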