How to find multiple files with different endings in Linux using regex? - regex

Let's say that I have multiple files such as:
root.file991
root.file81
root.file77
root.file989
If I want to delete all of them, I would need to use a regex first, so I have tried:
find ./ -regex '\.\/root'
...which would find everything in root file, but how do I filter all these specific files?

You can use
find ./ -regextype posix-extended -regex '\./root\.file[0-9]+'
The regex will match paths like
\. - a dot
/root\.file - a /root.file text
[0-9]+ - ending with one or more digits.
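The key point in the breakdown above is that -regex is tested against the whole path find prints, not just the file name. A minimal sketch of that behaviour, assuming GNU find (for -regextype) and a throwaway directory from mktemp; the extra files root.fileX and notes.txt are hypothetical and only there to show what is not matched:

```shell
# Scratch directory so nothing real is touched (file names are illustrative).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch root.file991 root.file81 root.file77 root.file989 root.fileX notes.txt

# -regex must match the entire path as find prints it, e.g. ./root.file81.
# -regextype posix-extended is a GNU find option.
matched=$(find ./ -regextype posix-extended -regex '\./root\.file[0-9]+' | LC_ALL=C sort)
printf '%s\n' "$matched"

# Without the leading ./ the pattern can never cover the whole path,
# so nothing is reported.
unanchored=$(find ./ -regextype posix-extended -regex 'root\.file[0-9]+')
printf 'without ./ prefix: [%s]\n' "$unanchored"
```

Only the four files that end in digits should be listed; root.fileX and notes.txt are skipped.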

I'm not quite sure what you mean by "files in root file", but if I understand correctly, regular POSIX glob(7) pattern matching should be sufficient:
rm root.file[0-9]*

Depending on how complex the other files are, you may have to build up the regex more. $ man find has useful help as well. Try the following:
$ find ./ -regex '\.\/root\.file[0-9].*'
# if that finds what you are looking for, add -delete
$ find ./ -regex '\.\/root\.file[0-9].*' -delete
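The preview-then-delete workflow can be rehearsed safely in a scratch directory first. This is a sketch under assumptions: GNU find (default Emacs-style regex, where an unescaped + means "one or more", and the -delete action), and hypothetical file names root.filelist and keep.txt that should survive:

```shell
# Work in a throwaway directory; the file names are illustrative.
work=$(mktemp -d)
touch "$work/root.file991" "$work/root.file81" "$work/root.filelist" "$work/keep.txt"

# Step 1: preview exactly what the expression selects.
find "$work" -type f -regex '.*/root\.file[0-9]+'

# Step 2: rerun the same expression with -delete appended.
find "$work" -type f -regex '.*/root\.file[0-9]+' -delete

# What is left afterwards.
remaining=$(ls "$work" | LC_ALL=C sort | tr '\n' ' ')
printf 'left: %s\n' "$remaining"
```

Requiring digits up to the end of the path ([0-9]+ rather than [0-9].*) keeps a name such as root.filelist out of the deletion.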

Related

How to use mksquashfs -regex?

I want to mksquashfs a chroot, and include the /cdrom dir, but exclude everything inside it. I already know how to do this with -wildcards, but I want to see if -regex has a bug. Test case:
cd $(mktemp -d)
mkdir -p cdrom cdrom2/why
mksquashfs . /tmp/chroot.squashfs -info -noappend -regex -e '^cdrom/.*$'
The problem is that cdrom2/why was omitted! It seems to me like "/" is actually ignored there. Is this a mksquashfs bug?
This is because you don't fully understand how regexes work in Mksquashfs exclude files.
When wildcards are used, an exclude file entry is basically treated as a series of wildcard patterns separated by slashes (/); i.e., wildcard1/wildcard2/wildcard3 will match wildcard1 against the top-level directory, wildcard2 against the subdirectory, and so on.
Specifying -regex simply replaces wildcard matching with regex matching. It is still evaluated as regexes separated by slashes (/), i.e. regex1/regex2/regex3.
In your example the regex "^cdrom" is evaluated against the files in the top level directory, and matches both "cdrom" and "cdrom2".
If you wanted the regex to only match "cdrom" you should use
mksquashfs . /tmp/chroot.squashfs -info -noappend -regex -e '^cdrom$/.*'
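The effect of the missing end anchor can be illustrated without mksquashfs at all. The sketch below uses grep -E as a stand-in for the per-component matching described above; that substitution is an assumption for illustration, not a reproduction of how mksquashfs evaluates its exclude list:

```shell
# Top-level directory names from the question's test case.
names='cdrom
cdrom2'

# '^cdrom' is anchored only at the start, so both names match.
loose=$(printf '%s\n' "$names" | grep -E '^cdrom' | tr '\n' ' ')

# '^cdrom$' is anchored at both ends, so only "cdrom" matches.
strict=$(printf '%s\n' "$names" | grep -E '^cdrom$' | tr '\n' ' ')

printf 'loose:  %s\nstrict: %s\n' "$loose" "$strict"
```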

Unix - Using find to List all .html files. (Do not use shell wildcards or the ls command)

I've tried 'find -name .html$', 'find -name .html\>'.
None worked.
I'd like to know why these two are wrong and what's the right one to use with no wildcards?
What you needed was
find -name '*.html'
Or for regex:
find -regex '.*/.*\.html'
To ignore case, use -iname or -iregex:
find -iname '*.html'
find -iregex '.*/.*\.html'
Manual for -name:
-name pattern
Base of file name (the path with the leading directories
removed) matches shell pattern pattern. The metacharacters
(`*', `?', and `[]') match a `.' at the start of the base name
(this is a change in findutils-4.2.2; see section STANDARDS CON‐
FORMANCE below). To ignore a directory and the files under it,
use -prune; see an example in the description of -path. Braces
are not recognised as being special, despite the fact that some
shells including Bash imbue braces with a special meaning in
shell patterns. The filename matching is performed with the use
of the fnmatch(3) library function. Don't forget to enclose
the pattern in quotes in order to protect it from expansion by
the shell.
find . -name '*.html'
You have to single quote the wildcard to keep the shell from globbing it when passing it to find.
You want
find . -name "*.html"
Find uses emacs regex by default, not the posix you are probably used to.
You are missing a couple of things here. First of all, the path: if you are searching in the local path, use . (for example, find . will list every file and directory recursively in the current directory). Second, * is a wildcard, and it has to be quoted so that the shell passes it to find unexpanded. So to find all the .html files under the current directory, try
find . -name '*.html'
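To see why the attempts from the question return nothing, it may help to compare them side by side in a scratch tree. This is a small sketch; the directory layout and file names are hypothetical:

```shell
# Throwaway tree with two .html files at different depths.
site=$(mktemp -d)
mkdir "$site/sub"
touch "$site/index.html" "$site/sub/about.html" "$site/readme.txt"

# -name takes a shell-style pattern matched against the base name only,
# so "$" is a literal character here, not an end-of-string anchor.
literal=$(find "$site" -name '.html$')
printf 'literal pattern: [%s]\n' "$literal"

# A quoted glob reaches find unexpanded and matches at every depth.
quoted=$(find "$site" -name '*.html' | wc -l | tr -d ' ')
printf 'quoted glob matched %s files\n' "$quoted"
```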

Why isn't this regex working: find ./ -regex '.*\(m\|h\)$'

Why isn't this regex working?
find ./ -regex '.*\(m\|h\)$'
I noticed that the following works fine:
find ./ -regex '.*\(m\)$'
But when I add the "or a h at the end of the filename" by adding \|h it doesn't work. That is, it should pick up all my *.m and *.h files, but I am getting nothing back.
I am on Mac OS X.
On Mac OS X, you can't use \| in a basic regular expression, which is what find uses by default.
re_format man page
[basic] regular expressions differ in several respects. | is an ordinary character and there is no equivalent for its functionality.
The easiest fix in this case is to change \(m\|h\) to [mh], e.g.
find ./ -regex '.*[mh]$'
Or you could add the -E option to tell find to use extended regular expressions instead.
find -E ./ -regex '.*(m|h)$'
Unfortunately -E isn't portable.
Also note that if you only want to list files ending in .m or .h, you have to escape the dot, e.g.
find ./ -regex '.*\.[mh]$'
If you find this confusing (me too), there's a great reference table that shows which features are supported on which systems.
Regex Syntax Summary [Google Cache]
A more efficient solution is to use the -o flag:
find . -type f \( -name "*.m" -o -name "*.h" \)
but if you want the regex use:
find . -type f -regex ".*\.[mh]$"
Okay this is a little hacky but if you don't want to wrangle the regex limitations of find on OSX, you can just pipe find's output to grep:
find . | grep '\.[mh]$'
What’s wrong with
find . -name '*.[mh]' -type f
If you want fancy patterns, then use find2perl and hack the pattern.
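The two alternation-free forms suggested above can be checked against each other in a scratch directory. This is a sketch with hypothetical file names; the file called them is included to show why escaping the dot matters:

```shell
# Throwaway directory; "them" ends in m but has no ".m" extension.
src=$(mktemp -d)
touch "$src/main.m" "$src/main.h" "$src/main.c" "$src/them"

# A bracket expression avoids \| entirely, so it does not depend on
# which regex flavour find uses by default.
by_regex=$(find "$src" -type f -regex '.*\.[mh]$' | LC_ALL=C sort)

# The same selection with a plain glob on the base name.
by_name=$(find "$src" -type f -name '*.[mh]' | LC_ALL=C sort)

printf '%s\n' "$by_regex"
```

Both commands should list main.h and main.m only.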

Regular Expression Differences Between ls and find to search for 1 string or another string

I'm having a minor brain-fart that I'm sure someone can answer quickly. I'm using cygwin to get a bash shell in windows (in case that has any idiosyncrasies) and am having trouble shifting a regular expression between ls and find.
I have a bunch of files that I need to access, some which start EA_ and some which start FS_ so I can list them with ls like this
ls -l {EA,FS}_*
and this also works fine with wc but when I try to use this in a find, the regex doesn't seem to be right:-
find . -iname "{EA,FS}_*"
I've tried escaping the { and } but that doesn't seem to work either - what am I doing wrong?
Cheers
MH
Looks like you need a regular expression instead of the usual name glob:
find . -iregex './\(EA\|FS\)_.*'
Remember with this syntax that you have to match the directory too. From your commands it looks like you're doing it all in one directory (no depth) so what I've provided will work. For more recursive searches you'd need a different regex.
Test run on Cygwin, Windows 7:
$ find . -iregex './\(RT\|ED\).*' | head
./ED-AT-CK01-A01.xml
./ED-AT-CK02-A01.xml
./ED-AT-CL01-A01.xml
./ED-AT-CL02-A01.xml
./ED-AT-CL03-A01.xml
./ED-AT-CL04-A01.xml
./ED-AT-IL001-A01.xml
./ED-AT-IL01-A01.xml
./ED-AT-IL02-A01.xml
./ED-AT-TB02-A01.xml
You can also do this:
find . -type f \( -iname "EA_*" -o -iname "FS_*" \)
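Brace expansion such as {EA,FS}_* is performed by the shell before ls ever runs, whereas find's -iname only understands glob patterns, which is why the pattern has to be split into alternatives. A sketch comparing the two find forms, assuming GNU find (for \| in the default regex flavour) and hypothetical file names:

```shell
# Throwaway directory; ES_three.xml must not be selected.
data=$(mktemp -d)
touch "$data/EA_one.xml" "$data/fs_two.xml" "$data/ES_three.xml"

# Two case-insensitive globs joined with -o.
by_or=$(find "$data" -type f \( -iname 'EA_*' -o -iname 'FS_*' \) | LC_ALL=C sort)

# The same selection as a single case-insensitive regex. The leading .*/
# is needed because the regex has to match the whole path.
by_iregex=$(find "$data" -type f -iregex '.*/\(EA\|FS\)_[^/]*' | LC_ALL=C sort)

printf '%s\n' "$by_or"
```

Starting the regex with .*/ instead of ./ also lets it work at any directory depth.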

Using non-consuming matches in Linux find regex

Here's my problem in a simplified scenario.
Create some test files:
touch /tmp/test.xml
touch /tmp/excludeme.xml
touch /tmp/test.ini
touch /tmp/test.log
I have a find expression that returns me all the XML and INI files:
[root@myserver] ~> find /tmp -name -prune -o -regex '.*\.\(xml\|ini\)'
/tmp/test.ini
/tmp/test.xml
/tmp/excludeme.xml
I now want a way of modifying this -regex to exclude the excludeme.xml file from being included in the results.
I thought this should be possible by using/combining a non-consuming regex (?=expr) with a negated match (?!expr). Unfortunately I can't quite get the format of the command right, so my attempts result in no matches being returned. Here was one of my attempts (I've tried many different forms of this with different escaping!):
find /tmp -name -prune -o -regex '\(?=.*excludeme\.xml\).*\.\(xml\|ini\)'
I can't break down the command into multiple steps (e.g. piping through grep -v) as the find command is assumed as input into other parts of our tool.
This does what you want on linux:
find /tmp -name -prune -o -regex '.*\.\(xml\|ini\)' \! -regex '.*excludeme\.xml'
I'm not sure if the "!" operator is unique to gnu find.
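For what it is worth, ! is one of the operators in the POSIX specification of find, so the negation itself is not GNU-specific; the \| alternation inside -regex is the part that depends on GNU find. A minimal sketch of the include-then-exclude approach, using a scratch directory instead of /tmp so that unrelated files do not interfere:

```shell
# Recreate the question's test files in a throwaway directory.
logs=$(mktemp -d)
touch "$logs/test.xml" "$logs/excludeme.xml" "$logs/test.ini" "$logs/test.log"

# First test selects .xml and .ini; the negated second test drops one file.
kept=$(find "$logs" -type f -regex '.*\.\(xml\|ini\)' ! -regex '.*/excludeme\.xml' | LC_ALL=C sort)

# Strip the directory part for display.
names=$(printf '%s\n' "$kept" | sed 's|.*/||' | tr '\n' ' ')
printf 'kept: %s\n' "$names"
```

Because everything stays inside a single find invocation, this fits the constraint that the command cannot be split into a pipeline.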
Not sure about what escapes you need or if lookarounds work, but these work for Perl:
/^(?!.*\/excludeme\.).*\.(xml|ini)$/
/(?<!\/excludeme)\.(xml|ini)$/
Edit: I just checked the find command. The best you can do with find is to change the regex type with -regextype posix-extended, but that doesn't support look-arounds. The only way around this looks to be using some GNU tools, either as @unholygeek suggests with find, or by piping find into GNU grep with the -P (Perl) option. You can use the above regexes verbatim (without the surrounding slashes) if you go with GNU grep. Something like find .... -print | grep -P ...
Sorry, that's the best I can do.