Find Regextype non recursive - regex

I'm trying to isolate some infected PHP files whose names consist of 8 alphanumeric characters, searching recursively from the /home directory.
I can locate them when I'm already in the directory that contains them, with the command:
find ./ -regextype posix-egrep -regex ^./[a-zA-Z0-9]{8}\.php$
Or
find ./ -regextype posix-egrep -regex '^./[a-zA-Z0-9]{8}\.php$'
But as soon as I try from another directory:
find /home -regextype posix-egrep -regex '^./[a-zA-Z0-9]{8}\.php$'
It returns no results.
I tried adding the -L flag (follow symbolic links), but it still returns no results, and there are many file system loop errors.
I have read many answers online, which seem to relate to how globbing and find work.
I tried different solutions such as :
find . -type f -print | egrep '^./[a-zA-Z0-9]{8}\.php$'
Ideally the output should be the full path regardless of depth so I may quickly delete them all.

The main point is that the find command's regex needs to match the entire path, including the file name. So, if there are other directory names before the file name, you need to consume them, too.
Besides, [a-zA-Z0-9] is better replaced with [[:alnum:]]:
find /home -regextype posix-egrep -regex '^.*/[[:alnum:]]{8}\.php$'
Actually, ^ is redundant here:
find /home -regextype posix-egrep -regex '.*/[[:alnum:]]{8}\.php$'
will work, too.
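To check the whole-path behaviour without touching /home, here is a minimal sketch that assumes GNU find. The temporary tree and every file name in it are made up for the demo.

```shell
# Throwaway tree; every file name here is invented for the demo.
tmp=$(mktemp -d)
mkdir -p "$tmp/site/cache"
touch "$tmp/Ab3dE6g8.php" "$tmp/site/cache/x1y2z3w4.php" "$tmp/site/index.php"

# Anchored to "./": matches nothing, because each path starts with $tmp.
anchored=$(find "$tmp" -regextype posix-egrep -regex '^./[[:alnum:]]{8}\.php$')

# A leading ".*/" consumes whatever directories precede the file name.
matches=$(find "$tmp" -type f -regextype posix-egrep -regex '.*/[[:alnum:]]{8}\.php' | sort)

printf 'anchored: [%s]\n' "$anchored"
printf '%s\n' "$matches"
rm -rf "$tmp"
```

The second command lists both 8-character files with their full paths, at any depth. Once the list looks right on the real tree, GNU find's -delete action can be appended, but review the output first.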

Related

Differentiate between .h and .sh with find and regex

I am trying to remove files with certain extensions from a directory. The command I am using is not able to differentiate between .h and .sh. Where can I improve my regex?
This is my current command:
find directory/ -type f -regextype posix-extended -regex '.*.(java|[hc]|cpp|hpp|cc|hh)'
Currently this returns .csh and .sh files. I do not want that to happen. When I remove "[hc]" this fixes the problem, but then I cannot find any .c or .h files. I have also tried
find directory/ -type f -regextype posix-extended -regex '.*.(java|h|c|cpp|hpp|cc|hh)'
but this returns .csh and .sh files as well.
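Escape the dot, and optionally add an end-of-input anchor:
find ... -regex '.*\.(java|h|c|cpp|hpp|cc|hh)$'
An unescaped . matches any character, so in foo.sh it matched the s, and the alternation then matched the h. Writing \. requires a literal dot directly before the extension. Because -regex always matches against the whole path, the trailing $ is harmless but not required.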
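To see which part of the pattern actually keeps .sh files out, here is a small sketch assuming GNU find. The file names are invented, and both patterns are deliberately left without a trailing $ so that the only difference is the escaped dot.

```shell
# Throwaway directory with made-up file names.
tmp=$(mktemp -d)
touch "$tmp/a.h" "$tmp/b.c" "$tmp/c.sh" "$tmp/d.csh" "$tmp/e.cpp"

# Unescaped dot: "." matches the "s" in ".sh", so shell scripts slip through.
loose=$(find "$tmp" -type f -regextype posix-extended \
    -regex '.*.(java|h|c|cpp|hpp|cc|hh)' | sort)

# Escaped dot: only a literal "." may sit directly before the extension.
strict=$(find "$tmp" -type f -regextype posix-extended \
    -regex '.*\.(java|h|c|cpp|hpp|cc|hh)' | sort)

echo "unescaped dot:"; printf '%s\n' "$loose"
echo "escaped dot:";   printf '%s\n' "$strict"
rm -rf "$tmp"
```

The first list contains all five files, the second only a.h, b.c and e.cpp.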

Unix find not respecting regex

I'm trying to do a simple find in my /var/log directory to find all syslog files that are not zipped. What I have so far is the regex:
syslog(\.[0-9]*)?$
So this would find syslog, syslog.1, syslog.999, etc and skip over the gzipped logs like syslog.1.gz or anything else not matching the pattern of the aforementioned syslogs. I'm doing a pretty basic find command, too:
find /var/log -regextype posix-extended -regex "syslog(\\.[0-9]*)?$"
However, I always get an empty result! Now, I thought the regex I wrote was POSIX-extended compatible, but it doesn't seem to be so. Here are variations of the command I ran, to no avail:
find /var/log -regextype posix-extended -regex "syslog(\\.[0-9]*)?$"
sudo find /var/log -regextype posix-extended -regex "syslog(\\.[0-9]*)?$"
find /var/log -regextype posix-extended -regex "syslog"
find /var/log -regextype posix-extended -regex "(syslog)"
The following works as expected, listing all files in the directory, so I know my command format is correct.
find /var/log -regextype posix-extended -regex ".*"
What am I doing wrong?
The regex pattern you provide needs to match the whole path. That means that you don't need to anchor it at the beginning and end with ^ and $, it's already implicitly anchored at both ends. But you do need to provide a leading .* or something similar if the rest of your pattern should match somewhere other than the beginning (and remember, find paths always include a directory, even if it's .).
find . -regextype posix-extended -regex '.*syslog(\.[0-9]*)?'
works for me.
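Here is a quick way to try this out without reading the real /var/log. It is only a sketch: it assumes GNU find, and the log file names are made up.

```shell
# Stand-in for /var/log with made-up file names.
tmp=$(mktemp -d)
touch "$tmp/syslog" "$tmp/syslog.1" "$tmp/syslog.1.gz" "$tmp/auth.log"

# Without a leading ".*" the pattern would have to match the path from its
# first character, so nothing is found.
bare=$(find "$tmp" -regextype posix-extended -regex 'syslog(\.[0-9]*)?')

# ".*/" consumes the directory part; the implicit end anchor rejects ".gz".
full=$(find "$tmp" -regextype posix-extended -regex '.*/syslog(\.[0-9]*)?' | sort)

printf 'bare: [%s]\n' "$bare"
printf '%s\n' "$full"
rm -rf "$tmp"
```

Using .*/ rather than plain .* is a small tightening on my part: it stops names such as mysyslog from matching as well.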

Find file with at least one number in /usr/include

Using the find command, I want to see the files in /usr/include whose name contains at least one number.
I tried this command :
find /usr/include -type f -regex '.*[0-9].*$'
But this also matches when the number is in the path rather than in the file name. For example /usr/include/linux/netfilter_ipv4/ipt_ah.h is found.
After that, I tried this command :
find /usr/include -type f -regex '/[^\/]*[0-9][^\/]*$'
But it returns nothing.
How can I resolve this problem?
If you use the -name test instead of the -regex test, it will match only the filename, ignoring the preceding directories (see the man page). Note that -name uses a shell pattern rather than a regex pattern, so the syntax is slightly different. You can use this command to find files which have numbers in the filename:
find /usr/include -type f -name '*[0-9]*'
With regex itself:
find /usr/include/ -type f -regex ".*/[^/]*[0-9][^/]*"
Here, we look for at least one digit after the last / in the path, that is, in the file name itself.
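The two approaches can be compared side by side. This is a sketch that assumes GNU find for the -regex variant. The directory layout only imitates /usr/include, and ip6_x.h is an invented file name.

```shell
# Small imitation of the layout; ip6_x.h is an invented name.
tmp=$(mktemp -d)
mkdir -p "$tmp/netfilter_ipv4"
touch "$tmp/netfilter_ipv4/ipt_ah.h" "$tmp/netfilter_ipv4/ip6_x.h" "$tmp/stdio.h"

# -name tests only the last path component, using a shell pattern.
by_name=$(find "$tmp" -type f -name '*[0-9]*' | sort)

# -regex tests the whole path, so the digit is pinned after the last "/".
by_regex=$(find "$tmp" -type f -regex '.*/[^/]*[0-9][^/]*' | sort)

printf '%s\n' "$by_name"
printf '%s\n' "$by_regex"
rm -rf "$tmp"
```

Both commands report only ip6_x.h. The digit in the netfilter_ipv4 directory name no longer causes ipt_ah.h to match.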

Why isn't this regex working: find ./ -regex '.*\(m\|h\)$'

Why isn't this regex working?
find ./ -regex '.*\(m\|h\)$'
I noticed that the following works fine:
find ./ -regex '.*\(m\)$'
But when I add the "or a h at the end of the filename" by adding \|h it doesn't work. That is, it should pick up all my *.m and *.h files, but I am getting nothing back.
I am on Mac OS X.
On Mac OS X, you can't use \| in a basic regular expression, which is what find uses by default.
re_format man page
[basic] regular expressions differ in several respects. | is an ordinary character and there is no equivalent for its functionality.
The easiest fix in this case is to change \(m\|h\) to [mh], e.g.
find ./ -regex '.*[mh]$'
Or you could add the -E option to tell find to use extended regular expressions instead.
find -E ./ -regex '.*(m|h)$'
Unfortunately -E isn't portable.
Also note that if you only want to list files ending in .m or .h, you have to escape the dot, e.g.
find ./ -regex '.*\.[mh]$'
If you find this confusing (me too), there's a great reference table that shows which features are supported on which systems.
Regex Syntax Summary [Google Cache]
A more efficient solution is to combine two -name tests with the -o (OR) operator:
find . -type f \( -name "*.m" -o -name "*.h" \)
but if you want the regex use:
find . -type f -regex ".*\.[mh]$"
Okay, this is a little hacky, but if you don't want to wrangle the regex limitations of find on OS X, you can just pipe find's output to grep:
find . | grep -E '\.(h|m)$'
What's wrong with this?
find . -name '*.[mh]' -type f
If you want fancy patterns, then use find2perl and hack the pattern.
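For what it's worth, the -name variants above avoid regex dialects entirely, so they should behave the same on OS X and Linux. A minimal sketch with made-up file names:

```shell
# Throwaway directory; "them" ends in "m" but has no ".m" extension.
tmp=$(mktemp -d)
touch "$tmp/main.m" "$tmp/util.h" "$tmp/notes.txt" "$tmp/them"

# Two -name tests joined with -o; the parentheses are escaped for the shell.
by_or=$(find "$tmp" -type f \( -name '*.m' -o -name '*.h' \) | sort)

# The same selection with a single bracket expression in the shell pattern.
by_glob=$(find "$tmp" -type f -name '*.[mh]' | sort)

printf '%s\n' "$by_or"
printf '%s\n' "$by_glob"
rm -rf "$tmp"
```

Both commands list main.m and util.h only.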

Using non-consuming matches in Linux find regex

Here's my problem in a simplified scenario.
Create some test files:
touch /tmp/test.xml
touch /tmp/excludeme.xml
touch /tmp/test.ini
touch /tmp/test.log
I have a find expression that returns me all the XML and INI files:
[root@myserver] ~> find /tmp -name -prune -o -regex '.*\.\(xml\|ini\)'
/tmp/test.ini
/tmp/test.xml
/tmp/excludeme.xml
I now want a way of modifying this -regex to exclude the excludeme.xml file from being included in the results.
I thought this should be possible by using/combining a non-consuming regex (?=expr) with a negated match (?!expr). Unfortunately I can't quite get the format of the command right, so my attempts result in no matches being returned. Here was one of my attempts (I've tried many different forms of this with different escaping!):
find /tmp -name -prune -o -regex '\(?=.*excludeme\.xml\).*\.\(xml\|ini\)'
I can't break down the command into multiple steps (e.g. piping through grep -v) as the find command is assumed as input into other parts of our tool.
This does what you want on Linux:
find /tmp -name -prune -o -regex '.*\.\(xml\|ini\)' \! -regex '.*excludeme\.xml'
I'm not sure if the "!" operator is unique to gnu find.
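Here is a self-contained way to try the negation. It is a sketch that assumes GNU find, since the \| alternation relies on its default regex syntax. It uses a scratch directory instead of /tmp itself and leaves out the -name -prune part of the original command.

```shell
# Recreate the test files from the question inside a scratch directory.
tmp=$(mktemp -d)
touch "$tmp/test.xml" "$tmp/excludeme.xml" "$tmp/test.ini" "$tmp/test.log"

# First -regex selects by extension; the negated -regex drops one file name.
kept=$(find "$tmp" -regex '.*\.\(xml\|ini\)' ! -regex '.*/excludeme\.xml' | sort)

printf '%s\n' "$kept"
rm -rf "$tmp"
```

Only test.ini and test.xml remain. The two tests are joined by find's implicit AND, so no look-around is needed.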
Not sure about what escapes you need or if lookarounds work, but these work for Perl:
/^(?!.*\/excludeme\.).*\.(xml|ini)$/
/(?<!\/excludeme)\.(xml|ini)$/
Edit: I just checked the find command. The best you can do with find is to switch to -regextype posix-extended, but that doesn't support look-arounds. The only way around this looks to be using GNU tools, either as @unholygeek suggests with find, or by piping find into GNU grep with the -P (Perl) option. You can use the above regex verbatim (without the surrounding slashes) if you go with GNU grep. Something like find .... -print | grep -P ...
Sorry, that's the best I can do.