Differentiate between .h and .sh with find and regex

I am trying to remove files with certain extensions from a directory. The command I am using is not able to differentiate between .h and .sh. Where can I improve my regex?
This is my current command:
find directory/ -type f -regextype posix-extended -regex '.*.(java|[hc]|cpp|hpp|cc|hh)'
Currently this returns .csh and .sh files. I do not want that to happen. When I remove "[hc]" this fixes the problem, but then I cannot find any .c or .h files. I have also tried
find directory/ -type f -regextype posix-extended -regex '.*.(java|h|c|cpp|hpp|cc|hh)'
but this returns .csh and .sh files as well.

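Escape the dot that precedes the extension list:
find ... -regex '.*\.(java|h|c|cpp|hpp|cc|hh)$'
An unescaped . matches any character, so in your pattern it can match the s in .sh or .csh while h matches the final letter. With \. only a literal dot may precede the extension, which makes the list an exact list of extensions rather than just the tail end of one. The end-of-input anchor $ is harmless but not required, because find's -regex has to match the whole path anyway.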
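To see the difference the escaped dot makes, here is a minimal sketch. It assumes GNU find (for -regextype and -printf) and works in a scratch directory created with mktemp; the file names are made up for illustration.

```shell
# Build a throwaway tree so nothing real is touched (hypothetical file names).
demo=$(mktemp -d)
touch "$demo/a.c" "$demo/a.h" "$demo/a.cpp" "$demo/a.sh" "$demo/a.csh"

# Unescaped dot: "." matches any character, so a.sh and a.csh slip through.
loose=$(find "$demo" -type f -regextype posix-extended \
    -regex '.*.(java|h|c|cpp|hpp|cc|hh)' -printf '%f\n' |
    LC_ALL=C sort | paste -sd' ' -)

# Escaped dot: only a literal "." may precede the extension.
strict=$(find "$demo" -type f -regextype posix-extended \
    -regex '.*\.(java|h|c|cpp|hpp|cc|hh)' -printf '%f\n' |
    LC_ALL=C sort | paste -sd' ' -)

printf 'loose:  %s\n' "$loose"    # loose:  a.c a.cpp a.csh a.h a.sh
printf 'strict: %s\n' "$strict"   # strict: a.c a.cpp a.h

rm -rf "$demo"
```

Note that the strict pattern needs no trailing $ here, since the regex is matched against the complete path.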

Related

Find Regextype non recursive

I'm trying to isolate some infected PHP files whose names consist of 8 alphanumeric characters, searching recursively from the /home directory.
I can locate them when I am inside the directory itself with the command:
find ./ -regextype posix-egrep -regex ^./[a-zA-Z0-9]{8}\.php$
Or
find ./ -regextype posix-egrep -regex '^./[a-zA-Z0-9]{8}\.php$'
But as soon as I try from another directory:
find /home -regextype posix-egrep -regex '^./[a-zA-Z0-9]{8}\.php$'
It returns no results.
I have tried adding the -L (--follow) flag, but it still returns nothing and produces many file system loop errors.
I have read many answers online that seem to relate to how globbing and find work.
I tried different solutions, such as:
find . -type f -print | egrep '^./[a-zA-Z0-9]{8}\.php$'
Ideally the output should be the full path regardless of depth so I may quickly delete them all.
The main point is that the find command's regex needs to match the entire path, including the file name. So, if there are other directory names before the file name, you need to consume them, too.
Besides, [a-zA-Z0-9] is better replaced with [[:alnum:]]:
find /home -regextype posix-egrep -regex '^.*/[[:alnum:]]{8}\.php$'
Actually, ^ is redundant here:
find /home -regextype posix-egrep -regex '.*/[[:alnum:]]{8}\.php$'
will work, too.
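The whole-path rule can be checked with a small sketch like the following. It assumes GNU find and a scratch directory from mktemp; the directory layout and file names are invented for the demonstration.

```shell
# Scratch tree with one 8-character PHP file nested two levels down.
demo=$(mktemp -d)
mkdir -p "$demo/site/inc"
touch "$demo/site/inc/a1B2c3D4.php" "$demo/site/index.php" \
      "$demo/site/inc/toolongname1.php"

# Pattern tied to "./": the path would have to be one character, a slash,
# eight alphanumerics and ".php", so nothing under $demo matches.
rel=$(find "$demo" -type f -regextype posix-egrep \
    -regex '^./[a-zA-Z0-9]{8}\.php$' | wc -l)

# Leading ".*/" consumes whatever directories precede the file name.
hits=$(find "$demo" -type f -regextype posix-egrep \
    -regex '.*/[[:alnum:]]{8}\.php$' -printf '%P\n')

echo "anchored to ./ : $rel match(es)"
echo "with .*/       : $hits"

rm -rf "$demo"
```

Once the listing looks right, the same expression can be given an action instead of being printed, but it is worth reviewing the output first.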

How to ignore files with .<numeric>.ext in git?

I have a list of files in my project, for example:
1. src/index.1.js
2. src/screens/index.1.js
3. src/screens/index.2.js
I want to ignore all files that have a number in this position.
I have tried using **/*.1.* and **/*.2.*. Is there a way to ignore all files with a numeric value?
You can use a character range. For your example:
**/*.[0-9].js
matches a .js file in any directory whose name ends with .(single digit).js.
Git uses glob patterns to match ignored files. To cover multi-digit numbers as well, use:
**/*.[0-9]*.js
Note that * in a glob means "any characters", not "repeat the previous item", so this pattern also ignores names such as index.1abc.js.
Alternatively, you can list the files with the following find command (adapt the \.js part if you also want to match other extensions):
find . -type f -regextype sed -regex '.*\/.*\.[0-9]\+\.js'
./src/screens/index.2.js
./src/screens/index.123.js
./src/index.1.js
When the output lists all the files you are interested in, change your find command into:
find . -type f -regextype sed -regex '.*\/.*\.[0-9]\+\.js' -exec git checkout {} \;
to check out those files.
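The glob and the regex above do not select quite the same names. The sketch below compares them using find's -name test as a stand-in for glob matching (an assumption: gitignore patterns are globs of the same family, but this does not run git itself). It assumes GNU find, and the file names are hypothetical.

```shell
# Scratch tree with plain, single-digit, multi-digit and mixed suffixes.
demo=$(mktemp -d)
mkdir -p "$demo/src/screens"
touch "$demo/src/index.js" "$demo/src/index.1.js" \
      "$demo/src/screens/index.123.js" "$demo/src/screens/index.1abc.js"

# Glob: "[0-9]*" is one digit followed by anything, so index.1abc.js matches.
glob=$(find "$demo" -type f -name '*.[0-9]*.js' -printf '%f\n' |
    LC_ALL=C sort | paste -sd' ' -)

# Regex: "[0-9]\+" is one or more digits and nothing else.
regex=$(find "$demo" -type f -regextype sed \
    -regex '.*\/.*\.[0-9]\+\.js' -printf '%f\n' |
    LC_ALL=C sort | paste -sd' ' -)

printf 'glob:  %s\n' "$glob"    # glob:  index.1.js index.123.js index.1abc.js
printf 'regex: %s\n' "$regex"   # regex: index.1.js index.123.js

rm -rf "$demo"
```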

Unix find not respecting regex

I'm trying to do a simple find in my /var/log directory to find all syslog files that are not zipped. What I have so far is the regex:
syslog(\.[0-9]*)?$
So this would find syslog, syslog.1, syslog.999, etc and skip over the gzipped logs like syslog.1.gz or anything else not matching the pattern of the aforementioned syslogs. I'm doing a pretty basic find command, too:
find /var/log -regextype posix-extended -regex "syslog(\\.[0-9]*)?$"
However, I always get an empty result! Now, I thought the regex I wrote was POSIX-extended compatible, but it doesn't seem to be so. Here are variations of the command I ran, to no avail:
find /var/log -regextype posix-extended -regex "syslog(\\.[0-9]*)?$"
sudo find /var/log -regextype posix-extended -regex "syslog(\\.[0-9]*)?$"
find /var/log -regextype posix-extended -regex "syslog"
find /var/log -regextype posix-extended -regex "(syslog)"
The following works as expected, listing all files in the directory, so I know my command format is correct.
find /var/log -regextype posix-extended -regex ".*"
What am I doing wrong?
The regex pattern you provide needs to match the whole path. That means that you don't need to anchor it at the beginning and end with ^ and $, it's already implicitly anchored at both ends. But you do need to provide a leading .* or something similar if the rest of your pattern should match somewhere other than the beginning (and remember, find paths always include a directory, even if it's .).
find . -regextype posix-extended -regex '.*syslog(\.[0-9]*)?'
works for me.
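A quick way to convince yourself is a sketch along these lines. It assumes GNU find and uses a scratch directory from mktemp in place of /var/log; the log file names mirror the ones in the question.

```shell
# Stand-in for /var/log with rotated and gzipped syslog files.
demo=$(mktemp -d)
touch "$demo/syslog" "$demo/syslog.1" "$demo/syslog.999" "$demo/syslog.1.gz"

# No leading ".*": the pattern would have to match the path from its first
# character, and every path here starts with the directory name.
bare=$(find "$demo" -type f -regextype posix-extended \
    -regex 'syslog(\.[0-9]*)?$' | wc -l)

# With ".*" in front, the directory part is consumed and the files match.
full=$(find "$demo" -type f -regextype posix-extended \
    -regex '.*syslog(\.[0-9]*)?' -printf '%f\n' |
    LC_ALL=C sort | paste -sd' ' -)

echo "without .* : $bare match(es)"
echo "with .*    : $full"

rm -rf "$demo"
```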

How to make GNU find use the posix-extended regex type by default

When I use find with a regex to find .c, .cpp and .h files, I have to type
find . -regex ".*\.\(c\|cpp\|h\)"
or use the posix-extended regex type:
find . -regextype posix-extended -regex ".*\.(c|cpp)"
The first form has so many backslashes that it is hard to read.
The second form, whose syntax I am more familiar with, requires typing many more characters.
Is there any way to make find use posix-extended regex by default?
I tried to set an alias
alias find='find -regextype posix-extended'
in my .zshrc file, but it doesn't work because find needs the path to come before the expression options.
Thanks for any suggestions.
With zsh you have a few options. You can define a global alias:
alias -g reg="-regextype posix-extended"
This will allow you to type find ./ reg -regex ".*\.(c|cpp)" and zsh will do the replacement for you.
The other option is to create a function. Something like:
findr() {
    local dir=$1   # starting directory
    shift
    # Pass the remaining arguments through unchanged, preserving their quoting.
    find "$dir" -regextype posix-extended "$@"
}
You can call it as follows:
findr ./ -regex ".*\.(c|cpp)"

Using regex in find command for multiple file types [duplicate]

I am currently using
find . -name '*.[cCHh][cC]' -exec grep -nHr "$1" {} \;
find . -name '*.[cCHh]' -exec grep -nHr "$1" {} \;
to search for a string in all files ending with .c, .C, .h, .H, .cc and .CC listed in all subdirectories. But since this includes two commands this feels inefficient.
How can I use a single regex pattern to find .c,.C,.h,.H,.cc and .CC files?
I am using bash on a Linux machine.
You can combine several tests with the boolean OR operator, -o:
find . -name '*.[ch]' -o -name '*.[CH]' -o -name '*.cc' -o -name '*.CC'
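The above searches the current directory and all subdirectories for files that end in:
.c, .h OR
.C, .H OR
.cc OR
.CC.
If you append an action such as -exec, group the tests with \( ... \) so that the action applies to all of them rather than only to the last -name.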
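When the OR form is combined with an action, grouping matters, because -o binds less tightly than the implicit AND between a test and the action that follows it. Here is a minimal sketch of that effect, assuming GNU find and made-up file names in a scratch directory; basename stands in for the grep call from the question.

```shell
# Scratch directory with one file per wanted extension plus one decoy.
demo=$(mktemp -d)
touch "$demo/a.c" "$demo/b.H" "$demo/e.cc" "$demo/f.CC" "$demo/g.txt"

# Ungrouped: -exec is attached to the last -name only.
ungrouped=$(find "$demo" -name '*.[ch]' -o -name '*.[CH]' -o -name '*.cc' \
    -o -name '*.CC' -exec basename {} \; | LC_ALL=C sort | paste -sd' ' -)

# Grouped: the action runs for every file matched by any of the tests.
grouped=$(find "$demo" \( -name '*.[ch]' -o -name '*.[CH]' -o -name '*.cc' \
    -o -name '*.CC' \) -exec basename {} \; | LC_ALL=C sort | paste -sd' ' -)

printf 'ungrouped: %s\n' "$ungrouped"   # ungrouped: f.CC
printf 'grouped:   %s\n' "$grouped"     # grouped:   a.c b.H e.cc f.CC

rm -rf "$demo"
```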
This should work.
Messy:
find . -iregex '.*\.\(c\|cc\|h\)' -exec grep -nHr "$1" {} +
-iregex makes the regex pattern case-insensitive.
\(c\|cc\|h\) matches the c, cc, or h extension; the backslashes are required by find's default regex syntax.
Clean:
find -regextype "posix-extended" -iregex '.*\.(c|cc|h)' -exec grep -nHr "$1" {} +
Be warned: this will also find the mixed-case extensions .Cc and .cC.
This command works:
find -regextype posix-extended -regex '.+\.(h|H|c{1,2}|C{1,2})$'
I would have preferred -iregex, but it also finds .Cc and .cC. If that were acceptable, the command would be a bit shorter:
find -regextype posix-extended -iregex '.+\.(h|H|c{1,2})$'
find . -regex '.*\.\([chCH]\|cc\|CC\)'
will find all files with names ending in .c, .C, .h, .H, .cc and .CC, and does not find any that end in .hc, .cC, or .Cc. In the regex, .*\. matches everything up to and including the last period in the name, and the parenthesized alternatives match any one of the single characters c, h, C, or H, or either of cc or CC.
Note that find's -regex and -iregex switches are analogous to -name and -iname, except that they take regular expressions (which allow \| for alternatives) and match against the whole path rather than just the file name. Like -iname, -iregex is case-insensitive.
The (non-functional) form
find . -name '*.[cCHh][cC]?$'
given in a previous answer doesn't list any names on my Linux system with GNU find 4.4.2, because -name takes a shell glob, not a regex: ? matches exactly one character and $ is a literal dollar sign.
Even read as a regex, '*.[cCHh][cC]?$' would match names like abc.Cc and xyz.hc, which are not in the set of .c, .C, .h, .H, .cc and .CC files that you want.
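The claims in this thread can be compared side by side with a sketch like the one below. It assumes GNU find; the file names are hypothetical and use distinct base names so that the example also behaves on case-insensitive file systems.

```shell
# One file per extension of interest, plus the unwanted .Cc and .hc.
demo=$(mktemp -d)
touch "$demo/a.c" "$demo/b.C" "$demo/c.h" "$demo/d.H" \
      "$demo/e.cc" "$demo/f.CC" "$demo/g.Cc" "$demo/h.hc"

# Exact alternatives in find's default regex syntax.
exact=$(find "$demo" -type f -regex '.*\.\([chCH]\|cc\|CC\)' -printf '%f\n' |
    LC_ALL=C sort | paste -sd' ' -)

# Case-insensitive variant: shorter, but also picks up .Cc.
insensitive=$(find "$demo" -type f -regextype posix-extended \
    -iregex '.*\.(c|cc|h)' -printf '%f\n' | LC_ALL=C sort | paste -sd' ' -)

# Regex syntax passed to -name: treated as a glob, so nothing matches.
globbed=$(find "$demo" -type f -name '*.[cCHh][cC]?$' | wc -l)

printf 'exact:       %s\n' "$exact"        # exact:       a.c b.C c.h d.H e.cc f.CC
printf 'insensitive: %s\n' "$insensitive"  # insensitive: a.c b.C c.h d.H e.cc f.CC g.Cc
echo "glob with ?\$: $globbed match(es)"

rm -rf "$demo"
```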