Regex: Match all the words that contain some word

I want to match all the words that contain "oana". In the sample below, I put "OANA" in uppercase letters inside some words: at the beginning, in the middle, and at the end.
blah OANAmama blah aOANAtata aOANAt msmsmsOANAasfasfa mOANAmsmf OANAtata OANA3 oanTy
I wrote a regex, but it isn't good enough, because it doesn't select all the words that contain "oana":
\b\w+(oana)\w+\b
Can anyone give me another solution?

You need to use a case insensitive flag and replace + with *:
/\b\w*oana\w*\b/i
A global modifier may or may not be needed, depending on the regex engine. In some engines, the case-insensitive modifier can be passed as an inline option instead: (?i)\b\w*oana\w*\b.
Here,
\b - a word boundary
\w* - 0+ word chars
oana - the required char string inside a word
\w* - 0+ word chars
\b - a word boundary
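Running the suggested pattern over the sample line from the question shows which words it picks up. This is a minimal JavaScript sketch; the g flag is added only so that match returns every hit, and findOanaWords is a hypothetical helper name.

```javascript
// Sample line from the question
const sample =
  "blah OANAmama blah aOANAtata aOANAt msmsmsOANAasfasfa mOANAmsmf OANAtata OANA3 oanTy";

// \w* (zero or more) on both sides lets "oana" start or end the word;
// i makes it case-insensitive, g makes match() return all hits
const oanaWord = /\b\w*oana\w*\b/gi;

// Hypothetical helper: every word containing "oana", or [] if there is none
function findOanaWords(text) {
  return text.match(oanaWord) || [];
}

// The original attempt needs at least one word char before AND after "oana"
// and has no i flag, so on this uppercase sample it finds nothing
const originalAttempt = /\b\w+(oana)\w+\b/g;

console.log(findOanaWords(sample).join(", "));
// OANAmama, aOANAtata, aOANAt, msmsmsOANAasfasfa, mOANAmsmf, OANAtata, OANA3
console.log(sample.match(originalAttempt)); // null
```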

Related

Word boundaries in Atlas Search regex operator

I'd like to make a regex query in Elasticsearch with word boundaries, but it looks like the Lucene regex engine doesn't support \b. What workarounds can I use?
In the Elasticsearch regex flavor, there is no direct equivalent of a word boundary. A leading \b is roughly (^|[^A-Za-z0-9_]) if the word starts with a word char, and a trailing \b is roughly ($|[^A-Za-z0-9_]) if the word ends with a word char.
So we need to make sure there is a non-word char, or the start/end of the string, before and after word. Since Lucene regexes are anchored by default (the pattern must match the whole string), making [^A-Za-z0-9_] optional at the start/end of the string only requires adding .* next to it and wrapping both in an optional group:
(.*[^A-Za-z0-9_])?word([^A-Za-z0-9_].*)?
Details
(.*[^A-Za-z0-9_])? - either the start of the string, or any 0+ chars (except line break chars; otherwise use (.|\n)*) followed by a non-word char. In other words, it is the start of the string followed by 0 or 1 occurrences of the pattern inside the group.
word - a word
([^A-Za-z0-9_].*)? - an optional sequence of a non-word char followed by any 0+ chars, followed by the end-of-string position (implicit in Lucene regex).
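Because Lucene regexes are implicitly anchored, one way to sanity-check the pattern outside Elasticsearch is to wrap it in ^(?:...)$ in an engine that is not anchored by default. The JavaScript sketch below does that. It only approximates Lucene semantics (for example, JavaScript's . does not match line breaks, so multi-line inputs may behave differently), and luceneStyleMatch is a hypothetical helper, not an Elasticsearch API.

```javascript
// Pattern from the answer, with "word" as the term to look for
const lucenePattern = "(.*[^A-Za-z0-9_])?word([^A-Za-z0-9_].*)?";

// Hypothetical helper: emulate Lucene's implicit anchoring by requiring
// the pattern to match the entire input string
function luceneStyleMatch(pattern, input) {
  return new RegExp("^(?:" + pattern + ")$").test(input);
}

for (const s of ["word", "a word here", "word-count", "password", "words"]) {
  console.log(JSON.stringify(s), luceneStyleMatch(lucenePattern, s));
}
// One line per input: "word" true, "a word here" true, "word-count" true,
// "password" false, "words" false
```

"password" and "words" fail because the char right before or right after word is a word char, which is exactly what \b would reject.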

Negating duplicate words pattern

I am new to regex and have the following pattern that detects duplicate words separated by dashes:
\b(\w+)-+\1\b
// matches: hey-hey
// not matches: hey-hei
What I really need is a negated version of this pattern.
I've tried a negative lookahead, but it didn't work:
(?!\b(\w+)-+\1\b)
You can use
\b(\w+)-+(?!\1\b)\w+
Details:
\b - a word boundary
(\w+) - Group 1: one or more word chars
-+ - one or more hyphens
(?!\1\b)\w+ - one or more word chars, provided the text at this position is not the Group 1 value followed by a word boundary (so the second word must differ from the first).
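A quick JavaScript check of the negated pattern against the inputs from the question, plus two edge cases. The variable name notDuplicate is only illustrative.

```javascript
// Two hyphen-separated words, matched only when the second word is not
// an exact repeat of the first (Group 1)
const notDuplicate = /\b(\w+)-+(?!\1\b)\w+/;

console.log(notDuplicate.test("hey-hey"));   // false
console.log(notDuplicate.test("hey-hei"));   // true
console.log(notDuplicate.test("hey--hey"));  // false: backtracking -+ cannot sneak past the check
console.log(notDuplicate.test("hey-heyo"));  // true: \b keeps "hey" from counting as a prefix of "heyo"
```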

Match everything until upcase word

I want to capture the word placed before another word that is fully capitalized:
Mister Foo BAR is here # => "Foo"
Miss Bar-Barz FOO loves cats # => "Bar-Barz"
I've been trying the following regex: (Mister|Miss)\s([[:alpha:]\s\-]+)(?=\s[A-Z]+), but sometimes it includes the rest of the sentence. For example, it returns Bar-Barz FOO loves cats instead of Bar-Barz.
How can I say, using a RegExp, "match every word until the uppercase word"?
To clarify the usage of the lookahead: can we say it "captures until the specified sub-pattern matches, but does not include it in the match data"?
As a non-native English speaker, I apologize if my question isn't perfectly formulated. Thanks in advance.
Match 1+ word chars, optionally followed by repeated groups of a - and 1+ word chars, so that you don't match only hyphens or a trailing hyphen.
Then assert, on the right, a whitespace char followed by 1+ uppercase chars and a word boundary.
\w+(?:-\w+)*(?=\s[A-Z]+\b)
Explanation
\w+ Match 1+ word chars
(?:-\w+)* Optionally repeat matching - and 1+ word chars
(?=\s[A-Z]+\b) Positive lookahead: assert that what is directly to the right is a whitespace char, then 1+ uppercase chars A-Z followed by a word boundary
If there cannot be any newlines between the words, you can use [^\S\r\n] (whitespace other than line breaks) instead of \s:
\w+(?:-\w+)*(?=[^\S\r\n]+[A-Z]+\b)
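The two patterns above can be compared in JavaScript on the question's examples plus a line-break case. This is a minimal sketch; firstMatch is a hypothetical helper that returns the first full match or null.

```javascript
// Word chars, optionally hyphen-joined, followed by whitespace and an all-caps word
const beforeCaps = /\w+(?:-\w+)*(?=\s[A-Z]+\b)/;
// Same idea, but the whitespace may not include \r or \n
const beforeCapsSameLine = /\w+(?:-\w+)*(?=[^\S\r\n]+[A-Z]+\b)/;

// Hypothetical helper: first full match, or null when there is none
function firstMatch(re, text) {
  const m = text.match(re);
  return m ? m[0] : null;
}

console.log(firstMatch(beforeCaps, "Mister Foo BAR is here"));          // Foo
console.log(firstMatch(beforeCaps, "Miss Bar-Barz FOO loves cats"));    // Bar-Barz
console.log(firstMatch(beforeCaps, "Mister Foo\nBAR is here"));         // Foo
console.log(firstMatch(beforeCapsSameLine, "Mister Foo\nBAR is here")); // null
```

"Mister" is never returned because the lookahead sees " Foo", and the \b after [A-Z]+ rejects a word that merely starts with a capital.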
You may use this regex with a lookahead:
\b\S+(?=[ \t]+[A-Z]+\b)
RegEx Description:
\b: Word boundary
\S+: Match 1+ non-whitespace characters
(?=[ \t]+[A-Z]+\b): Positive lookahead that asserts there are 1+ spaces or tabs and then a word containing only capital letters
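Here is that lookahead pattern run in JavaScript on the question's two examples, plus a third, hypothetical input with a comma to show what \S+ does with punctuation.

```javascript
// \b, then a run of non-whitespace chars (hyphens and punctuation included),
// followed by spaces/tabs and a word made only of capital letters
const beforeCapsNonSpace = /\b\S+(?=[ \t]+[A-Z]+\b)/;

for (const s of ["Mister Foo BAR is here", "Miss Bar-Barz FOO loves cats", "Mister Foo, BAR is here"]) {
  const m = s.match(beforeCapsNonSpace);
  console.log(m ? m[0] : null);
}
// Foo
// Bar-Barz
// Foo,
```

Because \S+ accepts any non-whitespace char, the comma in the third input ends up in the match; with \w+(?:-\w+)* the same input would not match, since the comma sits between the word and the whitespace.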
You don't say what language you're working in, but the following works for me. The idea is to stop when the parser hits a sequence of uppercase letters/hyphens.
JS example:
// \b stops the lookahead from accepting a word that merely starts with a capital (e.g. "Bar")
const ptn = /(Mister|Miss)\s[\w\-]+(?=\s[A-Z\-]+\b)/;
"Mister Foo BAR is here".match(ptn); //["Mister Foo", "Mister"]
"Miss Bar-Barz FOO loves cats".match(ptn); //["Miss Bar-Barz", "Miss"]

Find a word whose characters are separated by spaces

I need a regex to select a word in which each character is separated by whitespace. Look at the following string:
Mengkapan,Sungai Apit,S I A K,Riau,
I want to select S I A K. I'm stuck; I was trying to use the following regex:
\s+\w{1}\s+
but it's not working.
I suggest the pattern
\b[A-Za-z](?:\s+[A-Za-z])+\b
where
\b - word boundary
[A-Za-z] - letter (exactly one)
(?: - one or more groups of
\s+ - white space (at least one)
[A-Za-z] - letter (exactly one)
)+
\b - word boundary
For the input you've given, you could use
(?:[A-Za-z] ){2,}[A-Za-z]
You could match a word boundary \b and a word character \w, then repeat a space and a word character at least 2 times, followed by a word boundary:
\b\w(?: \w){2,}\b
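The three suggested patterns can be compared side by side in JavaScript. The object keys below are illustrative labels, not names from the answers, and firstHits is a hypothetical helper. The second input, "ab c d", is made up to show where the patterns differ.

```javascript
// Patterns from the three answers (no g flag: only the first hit is needed)
const patterns = {
  wordBoundaryLetters: /\b[A-Za-z](?:\s+[A-Za-z])+\b/,
  letterSpaceRepeat: /(?:[A-Za-z] ){2,}[A-Za-z]/,
  wordCharSpaceRepeat: /\b\w(?: \w){2,}\b/,
};

// Hypothetical helper: first match of each pattern (null when none)
function firstHits(text) {
  const out = {};
  for (const [name, re] of Object.entries(patterns)) {
    const m = text.match(re);
    out[name] = m ? m[0] : null;
  }
  return out;
}

console.log(JSON.stringify(firstHits("Mengkapan,Sungai Apit,S I A K,Riau,")));
// all three patterns return "S I A K"
console.log(JSON.stringify(firstHits("ab c d")));
// wordBoundaryLetters: "c d", letterSpaceRepeat: "b c d", wordCharSpaceRepeat: null
```

Without \b, the second pattern can start in the middle of a word ("b" from "ab"). The {2,} in the third pattern requires at least three spaced characters, so a two-letter run such as "c d" is ignored.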