Regex - Match any word but ignore specific word [duplicate] - regex

I want to match any word that starts with, ends with, or does not contain "end", but not the word "end" itself. For example:
hello - would match
boyfriend - would match
endless - would match
endend - would match
but
end - would NOT match
I'm using ^(?!end).*$ but it's not what I want.
Sorry for my english

Try this:
^(?!(end)$).+$
This will match everything except end.
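As a quick check, here is a minimal Python sketch that applies this pattern to the example words from the question. Using Python's re module is an assumption; the question does not name a regex flavor.

```python
import re

# Pattern from the answer above: reject a string that is exactly "end".
pattern = re.compile(r"^(?!(end)$).+$")

# Example words taken from the question.
words = ["hello", "boyfriend", "endless", "endend", "end"]

# Keep only the words the pattern accepts.
matched = [w for w in words if pattern.search(w)]
print(matched)  # ['hello', 'boyfriend', 'endless', 'endend']
```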

You can use this: \b(?!(?:end\b))[\w]+
Components:
\b -> Word boundary at the start of each word.
(?! -> Opens a negative lookahead that rejects the word end.
(?:end\b) -> Non-capturing group containing the word end followed by a word boundary.
) -> Closes the negative lookahead.
[\w]+ -> Character class that matches the characters of the word.
Explanation: The regex only attempts matches at word boundaries and rejects any position where end is the entire word, i.e. [WORD BOUNDARY]end[WORD BOUNDARY]. [\w]+ then matches the rest of the word. You can extend this character class if you also want to match special characters such as $.
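A minimal Python sketch of this pattern, assuming Python's re module and a made-up sample sentence, shows it extracting every word except a standalone end:

```python
import re

# Pattern from the answer above: any word that is not exactly "end".
pattern = re.compile(r"\b(?!(?:end\b))[\w]+")

# Sample text is an assumption, built from the words in the question.
text = "hello boyfriend endless endend end"

# The pattern has no capturing groups, so findall returns whole matches.
found = pattern.findall(text)
print(found)  # ['hello', 'boyfriend', 'endless', 'endend']
```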

So you want to match any word, but not "end" ?
Unless I'm misunderstanding, a conditional statement is everything that is needed... In pseudocode:
if (word != "end") {
// Match
}
If you want to match all the words in a text that are not "end", you could just remove all the non-alpha characters, replace the pattern (^end | end | end$) with an empty string, and then do a string split.
The other answers with a single regex might be better, though, because for simple patterns like these a regex match takes time roughly linear in the length of the input.
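The conditional approach could look roughly like this in Python. The sample text and the [A-Za-z]+ tokenizer are assumptions for illustration, not part of the answer:

```python
import re

# Hypothetical sample text.
text = "the end of endless things, end"

# Split the text into alphabetic words, then drop the exact word "end"
# with a plain comparison instead of a lookahead.
words = [w for w in re.findall(r"[A-Za-z]+", text) if w != "end"]
print(words)  # ['the', 'of', 'endless', 'things']
```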

Related

regex grab a word

I'm trying to grab a value from some source text with a regex, but only the name from entries of this type.
"name":"HELP-PERP","posOnly":false,"price":40.3,"priceIncrement":0.01,"quote":null,"quoteV":73851918.483,"restricted":false,"sizeIncrement":0.01,"type":"future",
So far I have \b(\w*-PERP\w*)\b
This grabs the word HELP-PERP but duplicates it, so I'm trying to grab only the word whose entry has type = future.
That is, grab HELP-PERP when it is on the same line as type":"future".
Total noob at this, I've tried several things on regex101 and can't come up with anything :(
Thank you
You can use
/\w*-PERP\w*\b(?=.*type":"future")/g
See the regex demo.
Details
\w*-PERP\w* - zero or more word chars, -PERP, and again zero or more word chars
\b - a word boundary
(?=.*type":"future") - a positive lookahead that matches a location in the string that is immediately followed by zero or more chars other than line break chars, as many as possible (.*), and then the string type":"future".
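The same pattern can be tried in Python, as a sketch. The answer's /.../g syntax is JavaScript-style, so the global flag becomes findall here. The first line is the sample from the question; the second line, with name DEMO-PERP and type spot, is a hypothetical counterexample:

```python
import re

# Pattern from the answer, without the JavaScript delimiters and g flag.
pattern = re.compile(r'\w*-PERP\w*\b(?=.*type":"future")')

# Sample line from the question.
future_line = ('"name":"HELP-PERP","posOnly":false,"price":40.3,'
               '"priceIncrement":0.01,"quote":null,"quoteV":73851918.483,'
               '"restricted":false,"sizeIncrement":0.01,"type":"future",')

# Hypothetical line of a different type, which should be skipped.
spot_line = '"name":"DEMO-PERP","type":"spot",'

future_names = pattern.findall(future_line)
spot_names = pattern.findall(spot_line)
print(future_names)  # ['HELP-PERP']
print(spot_names)    # []
```

Because . does not cross line breaks by default, the lookahead only sees type":"future" when it is on the same line as the name.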

How to fix regex to match the whole word, and not a substring? [duplicate]

I haven't found any success in fixing this regular expression:
B..y
I am currently searching a text file, and its output is the following:
Baby
Babylon
Babyland
eBaby
What should I change in the expression to only output 'Baby' and exclude the other three?
EDIT: What if I have another entry - 'Blay'? I need to get 'Baby' and 'Blay'.
The regex:
\bBaby\b
Test here.
To find both 'Baby' and 'Blay', you need to update the regex to:
\b(Baby|Blay)\b
Test here.
Explanations:
From here about \b:
The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a “word boundary”. This match is zero-length.
There are three different positions that qualify as word boundaries:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
Simply put: \b allows you to perform a “whole words only” search using a regular expression in the form of \bword\b. A “word character” is a character that can be used to form words. All characters that are not “word characters” are “non-word characters”.
From here about (Baby|Blay) :
If you want to search for the literal text cat or dog, separate both options with a vertical bar or pipe symbol: cat|dog. If you want more options, simply expand the list: cat|dog|mouse|fish.
The alternation operator has the lowest precedence of all regex operators. That is, it tells the regex engine to match either everything to the left of the vertical bar, or everything to the right of the vertical bar. If you want to limit the reach of the alternation, you need to use parentheses for grouping. If we want to improve the first example to match whole words only, we would need to use \b(cat|dog)\b. This tells the regex engine to find a word boundary, then either cat or dog, and then another word boundary. If we had omitted the parentheses then the regex engine would have searched for a word boundary followed by cat, or, dog followed by a word boundary.
In addition to virolino's answer:
The regex metacharacter \b matches word boundaries, i.e. between two characters where one is a word character and the other is not, plus the start and the end of the string if the first (or, respectively, last) character is a word character.
A word character is anything matched by the \w character class. There seems to be no real consensus about what a word character actually is, but [A-Za-z0-9_] seems to be the minimum, hence your example should work with virolino's pattern (\bBaby\b) in any case.
Furthermore, the pattern also matches within the following strings:
Baby-Boomer
Baby.Feed();
See my fork of virolino's regex test.
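A short Python sketch of the whole-word patterns discussed above. The word list comes from the question. Wrapping the asker's original B..y in \b anchors is an extra variant that the answers do not spell out:

```python
import re

# Entries from the question, plus 'Blay' from the edit.
words = ["Baby", "Babylon", "Babyland", "eBaby", "Blay"]

# Alternation limited by parentheses, with a word boundary on each side.
alternation = re.compile(r"\b(Baby|Blay)\b")

# Variant (assumption): keep the original wildcards, add the boundaries.
wildcard = re.compile(r"\bB..y\b")

by_alternation = [w for w in words if alternation.search(w)]
by_wildcard = [w for w in words if wildcard.search(w)]
print(by_alternation)  # ['Baby', 'Blay']
print(by_wildcard)     # ['Baby', 'Blay']

# A hyphen is not a word character, so there is a boundary after 'Baby'.
hyphenated = alternation.search("Baby-Boomer") is not None
print(hyphenated)  # True
```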

RegEx for combining "match everything" and "negative lookahead" [duplicate]

I'm trying to match the string "this" followed by anything (any number of characters) except "notthis".
Regex: ^this.*(?!notthis)$
Matches: thisnotthis
Why?
Even its explanation in a regex calculator seems to say it should work. The explanation section says
Negative Lookahead (?!notthis)
Assert that the Regex below does not match
notthis matches the characters notthis literally (case sensitive)
The negative lookahead has no effect in ^this.*(?!notthis)$ because .* first matches all the way to the end of the string, and at that position nothing follows, so notthis cannot be there.
I think you meant ^this(?!notthis).*$, where you match this from the start of the string and then assert that what is directly to the right is not notthis.
If that assertion succeeds, the pattern then matches any character except a newline until the end of the string.
^this(?!notthis).*$
Details of the pattern
^ Assert start of the string
this Match this literally
(?!notthis) Assert what is directly on the right is not notthis
.* Match 0+ times any char except a newline
$ Assert end of the string
Regex demo
If notthis must not be present anywhere in the string, rather than only directly after this, you could add .* to the negative lookahead:
^this(?!.*notthis).*$
        ^^
Regex demo
See it in a regulex visual
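The three patterns in this answer can be compared side by side in Python. This is a sketch only; the test strings other than thisnotthis are made up for illustration:

```python
import re

# Original attempt: the lookahead is only checked at the end of the string.
original = re.compile(r"^this.*(?!notthis)$")
# Lookahead moved directly after "this".
directly_after = re.compile(r"^this(?!notthis).*$")
# Lookahead extended with .* so it rejects "notthis" anywhere later on.
anywhere = re.compile(r"^this(?!.*notthis).*$")

results = {
    "original/thisnotthis": bool(original.search("thisnotthis")),
    "directly_after/thisnotthis": bool(directly_after.search("thisnotthis")),
    "directly_after/this and notthis": bool(directly_after.search("this and notthis")),
    "anywhere/this and notthis": bool(anywhere.search("this and notthis")),
    "anywhere/this is fine": bool(anywhere.search("this is fine")),
}
for label, matched in results.items():
    print(label, matched)
# original/thisnotthis True
# directly_after/thisnotthis False
# directly_after/this and notthis True
# anywhere/this and notthis False
# anywhere/this is fine True
```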
Because of the order of your rules. By the time your expression gets to the negative lookahead, the prior rules have already been fulfilled and there is nothing left to match.
If you wish to match everything after this, except for notthis, this RegEx might also help you to do so:
^this([\s\S]*?)(notthis|())$
which creates an empty group () for nothing, with an OR to ignore notthis:
^this([\s\S]*?)(notthis|())$
You might remove (), ^ and $, and it may still work:
this([\s\S]*?)(notthis|)

what is wrong with my word boundary regex? [duplicate]

This question already has answers here:
Regex using word boundary but word ends with a . (period)
(4 answers)
Closed 2 years ago.
I have the following little Python script:
import re
def main():
    thename = "DAVID M. D.D.S."
    theregex = re.compile(r"\bD\.D\.S\.\b")
    if re.search(theregex, thename):
        print("you did it")
main()
It's not matching. But if I adjust the regex just slightly and remove the last . it does work, like this:
\bD\.D\.S\b
I feel I'm pretty good at understanding regexes, but this has me baffled. My understanding of \b (word boundary) is that it should be a zero-width match of non-alphanumeric (and underscore) characters. So I would expect
"\bD\.D\.S\.\b"
to match:
D.D.S.
What am I missing?
This doesn't do what you might think it does.
r"\bD\.D\.S\.\b"
Here is an explanation of that regex, with the same examples that are listed below:
D.D.S. # no match, as there is no word boundary after the final dot
D.D.S.S # matches since there is a word boundary between `.` and `S` at the end
Word boundaries are zero-width matchers between word characters (\w, which is [0-9A-Za-z_] plus other "letters" as defined by your locale) and non-word characters (\W, which is the inversion of the previous class). Dot (.) is not a word character, so D.D.S. (note trailing whitespace) has word boundaries (only!) in the following places: \bD\b.\bD\b.\bS\b. (I didn't escape the dots because I'm illustrating the word boundaries, not making a regular expression).
I assume you are trying to match an end of line or whitespace. There are two ways to do that:
r"\bD\.D\.S\.(?!\S)" # by negation: do not match a non-whitespace
r"\bD\.D\.S\.(?:\s|$)" # match either a whitespace character or end of line
I've refined the above regex explanation link to explain the negation example above (note the first ends in …/1 while the second ends in …/2; feel free to further experiment there, it is nice and interactive).
\.\b matches the dot in .bla - it checks for a word character after the dot
\.\B does the opposite: it matches the dot in bla. but not in bla.bla - it checks for a non-word character (or the end of the string) after the dot
\bD\.D\.S\.\B
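The variants from these answers can be checked against the name from the question with a small Python sketch (same re module as the original script; the dictionary keys are just labels):

```python
import re

thename = "DAVID M. D.D.S."

patterns = {
    "trailing \\b": r"\bD\.D\.S\.\b",            # original attempt
    "negative lookahead": r"\bD\.D\.S\.(?!\S)",  # no non-whitespace after the dot
    "whitespace or end": r"\bD\.D\.S\.(?:\s|$)", # whitespace or end of line
    "trailing \\B": r"\bD\.D\.S\.\B",            # non-boundary after the dot
}

# True where the pattern finds a match in the name.
outcome = {label: re.search(p, thename) is not None
           for label, p in patterns.items()}
for label, ok in outcome.items():
    print(label, ok)
# trailing \b False
# negative lookahead True
# whitespace or end True
# trailing \B True
```

The original pattern fails because the final dot is followed by the end of the string, and neither side of that position is a word character.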

Regex match all strings except

I have to create a regex that finds the following strings:
stackoverflow //not found
stackexchange //found 'stackexchange'
stacksomething //found 'stacksomething'
stacksomething another words //found 'stacksomething'
Explanation: find a string that:
starts with 'stack'
ends with anything except 'overflow'
and return only this word.
I created regex which corresponds to the first point, but can't implement the second and the third. I tried solutions like ^((?!overflow).)*$ and ^(?!.*\boverflow\b) but they don't work. That's what I have:
stack.*
You can use this negative lookahead regex:
\bstack(?!\w*overflow\b)\w*
RegEx Demo
Breakup:
\b # word boundary
stack # match literal text stack
(?!\w*overflow\b) # negative lookahead to fail the match if word ends with overflow
\w* # match 0 or more word characters to get full word match
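Run against the four inputs from the question, the pattern behaves as follows. This is a Python sketch; the question does not say which regex flavor is in use, so re is an assumption:

```python
import re

# Pattern from the answer above.
pattern = re.compile(r"\bstack(?!\w*overflow\b)\w*")

# Inputs from the question.
inputs = [
    "stackoverflow",
    "stackexchange",
    "stacksomething",
    "stacksomething another words",
]

# Collect the matched word for each input, or None when nothing matches.
found = []
for text in inputs:
    m = pattern.search(text)
    found.append(m.group(0) if m else None)

print(found)  # [None, 'stackexchange', 'stacksomething', 'stacksomething']
```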