OSX find to match whole filenames - regex

I'm trying to find filenames that consist of a whole word ending with .properties.
So far this works:
find -E . -iregex '.*[:alnum:]+\.properties'
But I want to find only paths like
/some/path/messages.properties
and not
/some/path/messages_en.properties
The previous regex matches "en.properties", so I'm trying to say something like:
.*\/[:alnum:]+\.properties
That is, anything followed by a slash, then a word, and then .properties, but the slash part does not seem to work.

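You can use this regex, anchored with $ on the right and / on the left, to ensure the whole filename is matched. Note that a POSIX character class only works inside a bracket expression, so it must be written [[:alnum:]] rather than [:alnum:]: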
find -E . -iregex '.*/[[:alnum:]]+\.properties$'
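As a quick check, here is a minimal sketch that runs the corrected pattern against a throwaway tree. The file names mirror the question; the temporary directory, the variable names and the flavor detection are assumptions made for illustration. BSD/macOS find takes -E, while GNU find is assumed to need -regextype posix-extended for the same dialect.

```shell
#!/bin/sh
# Hypothetical scratch tree with one wanted and one unwanted file.
start_dir=$PWD
tmp=$(mktemp -d)
mkdir -p "$tmp/some/path"
touch "$tmp/some/path/messages.properties" \
      "$tmp/some/path/messages_en.properties"
cd "$tmp" || exit 1

pattern='.*/[[:alnum:]]+\.properties$'
if find -E . -maxdepth 0 >/dev/null 2>&1; then
    # BSD/macOS find: -E enables extended regular expressions.
    result=$(find -E . -iregex "$pattern")
else
    # Assumed GNU find: select the extended dialect explicitly.
    result=$(find . -regextype posix-extended -iregex "$pattern")
fi

# The underscore is not in [[:alnum:]], so messages_en.properties is skipped.
printf '%s\n' "$result"

cd "$start_dir" && rm -rf "$tmp"
```

Only ./some/path/messages.properties should be printed, because nothing between the last slash and .properties may be a non-alphanumeric character.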

Related

Trying to use GNU find to search recursively for filenames only (not directories) containing a string in any portion of the file name

Trying to find a command that is flexible enough to allow for some variations of the string, but not other variations of it.
For instance, I am looking for audio files that have some variation of "rain" in the filename only (rains, raining, rained, rainbow, rainfall, a dark rain cloud, etc), whether at the beginning, end or middle of the filename.
However, this also includes words like "brain", "train", "grain", "drain", "Lorraine", et al, which are not wanted (basically any word that has nothing to do with the concept of rain).
Something like this fails:
find . -name '*rain*' ! -name '*brain*'| more
And I'm having no luck with even getting started on building a successful regex variant because I cannot wrap my mind around regex ... for instance, this doesn't do anything:
# this is incomplete, just a stub of where I was going
# -type f also still includes a directory name
find . -regextype findutils-default -iregex '\(*rain*\)' -type f
Any help would be greatly appreciated. If I could see a regex command that does everything I want it to do, with an explanation of each character in the command, it would help me learn more about regex with the find command in general.
edit 1:
Taking cues from all the feedback so far from jhnc and Seth Falco, I have tried this:
find . -type f | grep -Pi '(?<![a-zA-Z])rain'
I think this pretty much works (I don't think it is missing anything), my only issue with it is that it also matches on occurrences of "rain" further up the path, not only in the file name. So I get example output like this:
./Linux/path/to/radiohead - 2007 - in rainbows/09 Jigsaw Falling Into Place.mp3
Since "rain" is not in the filename itself, this is a result I'd rather not see. So I tried this:
find . -type f -printf '%f\n' | grep -Pi '(?<![a-zA-Z])rain'
That does ensure that only filenames are matched, but it also does not output the paths to the filenames, which I would still like to see, so I know where the file is.
So I guess what I really need is a PCRE (PCRE2 ?) which can take the seemingly successful look-behind method, but only apply it after the last path delimiter (/ since I am on Linux), and I am still stumped.
specification:
1. match "rain"
2. in filename
3. only at start of a word
4. case-insensitive
assumptions:
5. define "word" to be a sequence of letters (no punctuation, digits, etc)
6. paths have form prefix/name where prefix can have one or more levels delimited by / and name does not contain /
constraints:
7. find -iregex matches against entire path (-name only matches filename)
8. find -iregex must match entirety of path (eg. "c" is only a partial match and does not match path "a/b/c")
method:
find can return matches against non-files (eg. directories). Given definition 6, we would be unable to tell if name is a directory or an ordinary file. To satisfy 2, we can exclude non-files using find's -type f predicate.
We can compare paths found by find against our specification by using find's case-insensitive regex matching predicate (-iregex). The "grep" flavour (-regextype grep) is sufficiently expressive.
Just using 1, a suitable regex is: rain
2+6+7 says we must forbid / after "rain": rain[^/]*$
[/] matches character in set (ie. /)
[^/]: ^ inverts match: ie. character that is not /
* matches preceding match zero or more times
$ constrains preceding match to occur at end of input
3+5 says there must be no immediately preceding word characters: [^a-z]rain[^/]*$
a-z is a shortcut for the range a to z
8 requires matching the prefix explicitly: ^.*[^a-z]rain[^/]*$
^ outside of [...] constrains subsequent match to occur at beginning of input
. matches anything
[^a-z] matches a non-alphabetic
Final command-line:
find . -type f -regextype grep -iregex '^.*[^a-z]rain[^/]*$'
Note: The leading ^ and trailing $ are not actually required, given 8, and could be elided.
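To see the final command behave as described, the following sketch runs it against a small scratch tree. The directory and file names are made up for illustration, and GNU find is assumed (for -regextype). The output is sorted with LC_ALL=C only to make the order predictable.

```shell
#!/bin/sh
# Hypothetical sample tree: "rain" appears in a directory name, at the start
# of a word, and inside other words.
start_dir=$PWD
tmp=$(mktemp -d)
mkdir -p "$tmp/in rainbows" "$tmp/audio"
touch "$tmp/in rainbows/09 Jigsaw.mp3" \
      "$tmp/audio/heavy rain.wav" \
      "$tmp/audio/Rainfall.ogg" \
      "$tmp/audio/brain.wav" \
      "$tmp/audio/train_rain.flac"
cd "$tmp" || exit 1

# Same command as in the answer above (GNU find assumed).
matches=$(find . -type f -regextype grep -iregex '^.*[^a-z]rain[^/]*$' | LC_ALL=C sort)
printf '%s\n' "$matches"

cd "$start_dir" && rm -rf "$tmp"
```

The file under "in rainbows" is skipped because a / follows that occurrence of "rain", and brain.wav is skipped because a letter precedes it. train_rain.flac is kept because its second "rain" follows an underscore.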
exercise for the reader:
extend "word" to non-ASCII characters (eg. UTF-8)
You probably want to use either a character class, word boundary, or just have a negative look behind for alpha characters.
Look Behind
^.+(?<![a-zA-Z])rain[^\/]*$
Matches any instance of rain, but only if it's not following [a-zA-Z], and doesn't have any slashes afterwards. Unfortunately, find doesn't support look ahead or look behind… so we'll use a character class instead.
Character Class
^.+(?:^|[^a-zA-Z])rain[^\/]*$
Matches the start of the line, or a character that isn't [a-zA-Z], then proceeds to match by the characters for rain if it comes immediately after, so long as there are no slashes afterwards.
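find's regex dialects have no (?:...) non-capturing groups, and the ^ alternative is never needed there, because the filename is always preceded by a /, which is itself a non-letter. So in find you can use it like this:
find ./ -iregex '.+[^a-zA-Z]rain[^/]*'
The ^ at the start and $ at the end of the pattern are implied when using find with -iregex, so you can omit them.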

Linux find command: searching for a filename containing parentheses

I need to find files with filenames like this:
<some regex-matched text> (1).<some regex-matched text>
i.e. I want to search for filenames containing
text ending in a space
then an opening bracket (parenthesis)
followed by the numeral 1
followed by a closing bracket
possibly followed by a dot followed by some more text...
I first tried find . -regex '.* \(1\)\..*', but the brackets are effectively ignored: files matching .* 1\..* are returned.
In the course of my attempt to find an answer I found this page covering Linux find. Here I find this phrase:
"Grouping is performed with backslashes followed by parentheses ‘\(’,
‘\)’."
[NB to show you the reader a single backslash, as shown in that page, I have doubled the backslashes to write the single backslashes above!]
I wasn't sure what to make of that, i.e. how to escape ordinary brackets in that case. I thought maybe doubling up the backslashes in the find expression might work, but it didn't.
Even if I try to do it without using a regex, find seems to have some problems with brackets and/or a dot in this place:
mike#M17A .../accounts $ find . -name *(1).pdf
[... finds stuff OK]
mike#M17A .../accounts $ find . -name *(1).*
find: paths must precede expression: ..
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec|time] [path...] [expression]
mike#M17A .../accounts $ find . -name *(1)\.*
find: paths must precede expression: ..
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec|time] [path...] [expression]
NB putting a space after in the initial * in these attempts also fails...
That is because you don't need to escape these parentheses. This should work:
find . -regex '.* (1)\(\..*\)?'
A group, written with escaped parentheses as \(\..*\), is used here only so that ? can make the trailing part (a dot followed by some more text) optional.
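The sketch below tries that regex on a few made-up file names, assuming GNU find with its default regex dialect. It also includes a quoted -name variant: the unquoted attempts in the question were most likely expanded by the shell before find ever saw them, which would explain the "paths must precede expression" error, so quoting the pattern is the safer habit.

```shell
#!/bin/sh
# Hypothetical file names; only two of them contain a literal "(1)".
start_dir=$PWD
tmp=$(mktemp -d)
touch "$tmp/report (1).pdf" "$tmp/report 1.pdf" "$tmp/notes (1)"
cd "$tmp" || exit 1

# Unescaped ( ) are literal; \( \) group, and ? makes the group optional.
by_regex=$(find . -regex '.* (1)\(\..*\)?' | LC_ALL=C sort)

# Quoted glob: the shell passes it to find untouched. It requires the dot.
by_name=$(find . -name '*(1).*')

printf 'regex:\n%s\n' "$by_regex"
printf 'name:\n%s\n' "$by_name"

cd "$start_dir" && rm -rf "$tmp"
```

The regex form finds both "notes (1)" and "report (1).pdf", while the glob form finds only the latter because it insists on a dot after the closing bracket.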

Extracting string before pattern with sed (bash)

I need some help with sed to remove everything after a matching pattern, and to remove the last "." if it exists.
Take this string as an example:
The.100.S02E05.720p.HDTV.x264-KILLERS.mkv
I want everything before the pattern "S[0-9][0-9]E[0-9[0-9]" except the last "."
What I want:
"The.100"
Does anyone have a great oneliner for this one?
It sounds like you can pretty much use exactly what you had in your question:
sed 's/\.*S[0-9][0-9]E[0-9][0-9].*//'
This matches zero or more . characters followed by the pattern you suggested (and anything after it), replacing with nothing. You were missing a ] in the question, which I have added.
Testing it out:
$ sed 's/\.*S[0-9][0-9]E[0-9][0-9].*//' <<<'The.100.S02E05.720p.HDTV.x264-KILLERS.mkv'
The.100
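A slightly broader check, as a sketch: the same substitution wrapped in a helper (strip_episode is a hypothetical name) and fed a few made-up names, including one without a dot before the episode tag and one with no episode tag at all.

```shell
#!/bin/sh
# Hypothetical helper around the sed command from the answer above.
strip_episode() {
    sed 's/\.*S[0-9][0-9]E[0-9][0-9].*//'
}

cleaned=$(printf '%s\n' \
    'The.100.S02E05.720p.HDTV.x264-KILLERS.mkv' \
    'ShowS01E02.avi' \
    'NoEpisode.mkv' | strip_episode)

printf '%s\n' "$cleaned"
```

Names that do not contain the SxxExx pattern pass through unchanged, since the substitution simply does not apply to them.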

Basic find -regex is not working under CentOS

My understanding is that find traverses the entire file path to locate a string. As a result I cannot understand why the below regex is not working.
find / -regex '^sysconfig$'
Should return /etc/sysconfig.
Even simple regex such as find / -regex 'bin' returns nothing.
Am I missing something very simple?
Just change your regex to,
find / -regex '.*sysconfig$'
OR
find / -regex '.*/sysconfig$'
This is because the -regex expression must match the whole name, including the path leading up to it, so the .* at the start matches the preceding characters.
In no regex implementation I'm aware of will the regex ^sysconfig$ match something like:
blah blah sysconfig
The ^ and $ are start-string and end-string anchors, meaning that your test string will only match if it is exactly sysconfig, with no other text on either side. And, in fact, they're not even needed since -regex matches the entire string rather than substrings.
If you want all files ending with sysconfig, just use:
find / -regex '.*sysconfig'
Or use '.*/sysconfig' if you want all files called sysconfig in any directory.
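The difference between the three patterns can be seen on a small scratch tree. This is only a sketch: the directory names are invented, and it searches from . rather than / so that it stays inside the temporary directory.

```shell
#!/bin/sh
# Hypothetical tree with two directories whose names end in "sysconfig".
start_dir=$PWD
tmp=$(mktemp -d)
mkdir -p "$tmp/etc/sysconfig" "$tmp/etc/oldsysconfig"
cd "$tmp" || exit 1

# No leading .* : the pattern can never cover a whole path such as ./etc/sysconfig.
bare=$(find . -regex 'sysconfig')

# Anything ending in "sysconfig".
suffix=$(find . -regex '.*sysconfig' | LC_ALL=C sort)

# Only entries whose last path component is exactly "sysconfig".
exact=$(find . -regex '.*/sysconfig')

printf 'bare:   [%s]\n' "$bare"
printf 'suffix: [%s]\n' "$suffix"
printf 'exact:  [%s]\n' "$exact"

cd "$start_dir" && rm -rf "$tmp"
```

The bare pattern returns nothing, the suffix pattern returns both directories, and the version with the slash returns only ./etc/sysconfig.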

how to use regex under find command

I need to list all filenames of the form alien.digits,
where digits is one or more digits.
It should not match names mixed with anything else, such as alien.htm, alien.1php, alien.1234.pj.123, alien.123.12, alien.12.12p.234htm
I wrote:
find home/jassi/ -name "alien.[0-9]*"
But it is not working: it matches everything.
Any solution for that?
I think what you want is
find home/jassi/ -regex ".*/alien\.[0-9]+"
With -name option you don't specify a regular expression but a glob pattern.
Be aware that find expects that the whole path is matched by the regular expression.
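A small sketch of the difference, using the file names from the question in a scratch directory (the directory itself and the variable names are assumptions for illustration, and GNU find's default regex dialect is assumed, where + means "one or more"):

```shell
#!/bin/sh
# File names taken from the question, created in a hypothetical scratch dir.
start_dir=$PWD
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch alien.1 alien.1234 alien.htm alien.1php alien.123.12 alien.1234.pj.123

# Glob: "alien.", one digit, then anything at all.
by_glob=$(find . -name 'alien.[0-9]*' | LC_ALL=C sort)

# Regex over the whole path: "alien.", then digits only, up to the end.
by_regex=$(find . -type f -regex '.*/alien\.[0-9]+' | LC_ALL=C sort)

printf 'glob:\n%s\n' "$by_glob"
printf 'regex:\n%s\n' "$by_regex"

cd "$start_dir" && rm -rf "$tmp"
```

The glob accepts every name that has a digit right after the dot, whereas the regex keeps only alien.1 and alien.1234.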
Try this: find home/jassi/ -regex ".*/alien\.[0-9]+$"
It will match all files that have alien. and end with at least one digit but nothing else than digits. The $ character means end of string.
The * modifier means 0 or more of the previous match, and . means any character, which means it's matching alien.
Try this instead:
alien\.[0-9]+$
The + modifier means 1 or more of the previous match, and the . has been escaped to a literal character, and the $ on the end means "end of string".
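You can also add a ^ to the start of the regex if you want to make sure that only names that exactly match your regex are accepted. The ^ character means "start of string", so ^alien\.[0-9]+$ will match alien.1234, but it won't match not_an_alien.1234. (With find -regex, which already matches against the whole path, the pattern needs a leading .*/ instead.)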
It worked for me:
find home/jassi/ -type f -regex ".*/alien\.[0-9]+"
I had to provide -type f to check that it's a file, else it would also show a directory of the same name.
Thanks bmk. I just figured out and at the same time you responded exactly the same thing. Great!