Regex to match a pattern but not include another pattern - regex

I'd like a regex to match thank-you but exclude it when the string contains the word removals.
So contact-thank-you should return a positive, but removals/contact-thank-you should return a negative.
I don't know much about regex and found a couple of posts referring to negative lookaheads. The best I could come up with was
(?!(?:removals)).*thank-you
which is clearly rubbish. Could anyone help?
Thanks

What you ideally want in this case is a negative lookbehind, since you're looking "behind" (to the left) of the word you're matching to make sure something's not there.
A complication here is that many regex engines don't permit variable-width negative lookbehinds.
But if you can somehow anchor to the start of the string you want to match, you can use a lookahead from that anchor instead.
(?:\s|^)((?!removals)\S)+thank-you(?:\s|$)
bananas/fred-thank-you - MATCH.
bananas/fred-no-thank-you - MATCH.
bananas/thank-you-with-words-after - no match.
removals/fred-thank-you - no match.
non-removals/fred-thank-you - no match.
bananas/removals-thank-you - no match.
bananas/thank-you-supremovalsale - no match.
bananas/fred-sorry - no match.
I am presuming that the characters permitted in the string are "anything but whitespace".
So it starts by looking for either the beginning of the string or a whitespace character. Then it matches one or more non-whitespace (\S) characters, none of which is the start of the string "removals". Then it matches the string "thank-you", which must be followed by whitespace or the end of the string.
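A quick way to check the first pattern against the sample inputs is a few lines of Python. This is a sketch that assumes Python's re module, which treats this lookahead the same way. The last two inputs are made-up examples where the URL sits inside a longer string.

```python
import re

# First pattern from the answer: a whitespace-delimited token that ends in
# "thank-you" and in which no character starts the text "removals".
PATTERN = re.compile(r"(?:\s|^)((?!removals)\S)+thank-you(?:\s|$)")


def has_thank_you(text: str) -> bool:
    """Return True if some token in text matches PATTERN."""
    return PATTERN.search(text) is not None


samples = [
    "bananas/fred-thank-you",
    "removals/fred-thank-you",
    "non-removals/fred-thank-you",
    "bananas/thank-you-with-words-after",
    # Hypothetical inputs: the URL appears inside a longer string.
    "see contact-thank-you now",
    "see removals/contact-thank-you now",
]
for s in samples:
    print(s, "- MATCH." if has_thank_you(s) else "- no match.")
```

Because every \S character is checked by (?!removals), "removals" cannot appear anywhere in the matched token, not just at its start.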
But I suspect what you're actually looking for is something a little different, maybe something like:
^(?!removals\/)[-\w]+\/[-\w]*thank-you$
bananas/fred-thank-you - MATCH.
bananas/fred-no-thank-you - MATCH.
bananas/thank-you-with-words-after - no match.
removals/fred-thank-you - no match.
non-removals/fred-thank-you - MATCH.
bananas/removals-thank-you - MATCH.
bananas/thank-you-supremovalsale - no match.
bananas/fred-sorry - no match.
This assumes that the structure is very fixed: to include anything that ends "/blah-blah-thank-you", unless the first word is exactly "removals/". Without knowing the exact specification, though, the first seems the most likely to be helpful.
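Here is a sketch of the fixed-structure check in Python. It assumes the first path segment may contain hyphens, so it uses [-\w]+ for that segment. With plain \w+, a segment such as "non-removals" could never match, because \w does not include "-".

```python
import re

# Assumption: the first path segment may contain "-", hence [-\w]+.
# The lookahead rejects paths whose first segment is exactly "removals".
PATTERN = re.compile(r"^(?!removals/)[-\w]+/[-\w]*thank-you$")


def is_thank_you_path(path: str) -> bool:
    """True if path looks like 'first/...thank-you' and does not start with 'removals/'."""
    return PATTERN.search(path) is not None


for p in [
    "bananas/fred-thank-you",
    "non-removals/fred-thank-you",
    "bananas/removals-thank-you",
    "removals/fred-thank-you",
    "bananas/thank-you-with-words-after",
]:
    print(p, "- MATCH." if is_thank_you_path(p) else "- no match.")
```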
If you're not trying to extract this string from many others, but are just checking a URL to see if it matches this pattern, then you can simplify it a lot:
^(?!.*removals).*thank-you
bananas/fred-thank-you - MATCH.
bananas/fred-no-thank-you - MATCH.
bananas/thank-you-with-words-after - MATCH.
removals/fred-thank-you - no match.
non-removals/fred-thank-you - no match.
bananas/removals-thank-you - no match.
bananas/thank-you-supremovalsale - no match.
bananas/fred-sorry - no match.
This just matches any string that contains "thank-you" and does not contain "removals".
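For a single-line URL, this third pattern is equivalent to two plain substring tests. The sketch below shows both side by side in Python. The helper names are illustrative only.

```python
import re

# Third pattern: the lookahead scans the whole string for "removals" first,
# then .*thank-you requires "thank-you" somewhere in it.
PATTERN = re.compile(r"^(?!.*removals).*thank-you")


def is_thank_you_url(url: str) -> bool:
    return PATTERN.search(url) is not None


def plain_check(url: str) -> bool:
    # Same test without a regex (assumes the URL contains no newlines,
    # since "." in the pattern does not match a newline).
    return "thank-you" in url and "removals" not in url


for u in [
    "bananas/thank-you-with-words-after",
    "bananas/thank-you-supremovalsale",
    "non-removals/fred-thank-you",
]:
    print(u, is_thank_you_url(u), plain_check(u))
```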

Related

Regex to ignore strings ending with a specific pattern

I am trying to write a regex that ignores any string ending with an underscore followed by digits (like _1234), as in these examples:
abc_def_1234 - should not match
abc_fgh - match
abc_ghj - match
abc_ijk_2345 - should not match
I am trying to use a lookahead regex like the one below, but it matches everything. Can someone please help me achieve this?
\w+(?!_\d+)
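Your attempt matches everything because \w also matches _ and digits. So \w+ consumes the whole string, and the lookahead, checked at the end of the string, always succeeds. Instead, match words separated by underscores, but use a negative lookahead to exclude input that has the unwanted tail: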
^(?!.*_\d+$)\w+(?<!_)$
The final lookbehind (which you can remove) requires that the last character is not an underscore, i.e. that the input is, as far as I can tell, well formed.
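The difference between the two patterns is easy to see by running both over the sample inputs. This is a sketch using Python's re module, which accepts the fixed-width lookbehind (?<!_).

```python
import re

# The asker's attempt: \w+ swallows the whole string, including "_1234",
# and at the end of the string (?!_\d+) has nothing to reject.
ATTEMPT = re.compile(r"\w+(?!_\d+)")

# The answer's pattern: reject input ending in "_<digits>" (lookahead) and
# input ending in "_" (lookbehind just before the final $).
FIXED = re.compile(r"^(?!.*_\d+$)\w+(?<!_)$")

for s in ["abc_def_1234", "abc_fgh", "abc_ghj", "abc_ijk_2345"]:
    print(s, "attempt:", bool(ATTEMPT.search(s)), "fixed:", bool(FIXED.search(s)))
```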

How to find a particular string

I'm using Visual Studio 2017, and in a very long text file I'm searching for a particular function call but can't find it.
Here's the regex I'm using:
c\.CreateMap\<(\w)+\,\s+Address\>
and I want it to match these:
c.CreateMap<ClientAddress, Address>()
c.CreateMap<Responses.SiteAddress, Data.Address>()
and so on.
As soon as I add "Address" to the regex, it stops matching anything.
What am I doing wrong?
You can try this:
c\.CreateMap\<\w+\.?\w+?\,\s*\w*?\.?Address\>
Explanation
c\.CreateMap\< - Matches c.CreateMap<.
\w+ - Matches one or more word characters.
\.? - Matches an optional '.'.
\w+? - Matches one or more word characters, as few as possible.
\, - Matches ','.
\s* - Matches zero or more whitespace characters.
\w*? - Matches zero or more word characters, as few as possible.
\.? - Matches an optional '.'.
Address\> - Matches Address>.
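To try the pattern outside the editor, here is a Python sketch. The source text is made up and stands in for the asker's file. Visual Studio uses the .NET regex engine, but these particular constructs should behave the same in Python's re.

```python
import re

# In Python's re, "\<", "\," and "\>" simply match the literal characters
# "<", "," and ">", so the pattern can be used unchanged.
PATTERN = re.compile(r"c\.CreateMap\<\w+\.?\w+?\,\s*\w*?\.?Address\>")

# Hypothetical file contents.
source = """\
c.CreateMap<ClientAddress, Address>()
c.CreateMap<Responses.SiteAddress, Data.Address>()
c.CreateMap<ClientPhone, Phone>()
"""

matches = [m.group(0) for m in PATTERN.finditer(source)]
print(matches)
```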
P.S. In case you also want to match something like this:
c.CreateMap<Responses.SiteAddress.abc, Data.Address.xyz>()
you can use this:
c\.CreateMap\<(\w+\.?\w+?)*\,\s*(?:\w*?\.?)*Address(\.\w*)?\>
Here is a more general regex I can suggest:
c\.CreateMap\<[\w.]+,\s+(?:[\w.]+\.)?Address\>\s*\(\s*\)
This matches any run of word characters and dots in the first position inside the angle brackets. In the second position, it matches Address, optionally preceded by one or more parent names and a dot separator (as in Data.Address).
Note that I also include the empty function-call parentheses in the regex, and I allow flexible whitespace after the closing angle bracket and between the parentheses.
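Below is a small Python check of the general pattern. The inputs beyond the asker's two lines are made up to show the optional whitespace and the required dot separator.

```python
import re

# General pattern: a dotted name as the first type argument, then Address
# (optionally qualified, e.g. Data.Address), then "()" with optional spaces.
PATTERN = re.compile(r"c\.CreateMap\<[\w.]+,\s+(?:[\w.]+\.)?Address\>\s*\(\s*\)")


def is_address_map(line: str) -> bool:
    return PATTERN.search(line) is not None


for line in [
    "c.CreateMap<ClientAddress, Address>()",
    "c.CreateMap<Responses.SiteAddress, Data.Address>()",
    "c.CreateMap<ClientAddress, Address> ( )",    # hypothetical spacing
    "c.CreateMap<ClientAddress, DataAddress>()",  # no dot before Address
]:
    print(line, is_address_map(line))
```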
In your second example there is an extra dot, which your regex does not handle, so it needs only a small modification. Also, you don't need to escape <, >, or ,. Use this:
c\.CreateMap<([\w.])+,\s+[\w.]*Address>
To match any of the functions in your question, you can use:
c\.CreateMap[^)]+\)
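Here c\.CreateMap matches the literal text c.CreateMap, [^)]+ matches one or more characters that are not a closing parenthesis, and \) matches the first ) that follows.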
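This pattern does not look at the type arguments at all. It simply runs from c.CreateMap to the first closing parenthesis. A Python sketch with made-up input shows the result.

```python
import re

# "c.CreateMap", then everything that is not ")", then the first ")".
PATTERN = re.compile(r"c\.CreateMap[^)]+\)")

# Hypothetical input lines.
source = (
    "c.CreateMap<Responses.SiteAddress.abc, Data.Address.xyz>()\n"
    "c.CreateMap<ClientPhone, Phone>()\n"
)
print(PATTERN.findall(source))
```

Note that [^)] also matches newlines. So if a call is missing its closing parenthesis, the match continues onto the following lines. The pattern also matches mappings that have nothing to do with Address.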

Regex to ignore Cobol comment line

I'd like to use a regex to scan a few COBOL files for a specific word while skipping comment lines. COBOL comments have an asterisk in column 7. The regex I've come up with so far, using a negative lookbehind, looks like this:
^(?<!.{6}\*).+?COPY
It matches both lines:
* COPY
COPY
I would assume that .+? overrides the negative lookbehind somehow, but I'm stuck on how to correct this. What would I need to fix to get a regex that only matches the second line?
You may use a lookahead instead of a lookbehind:
^(?!.{6}\*).+?COPY
The lookbehind required the pattern to be absent before the start of the string, where there is never any text, so it always succeeded and was redundant. A lookahead checks for a pattern to the right of the current position.
So,
^ - matches the start of the string
(?!.{6}\*) - fails the match if the first 6 characters of the string are followed by * (replace . with a space if you need to match just spaces)
.+? - matches any 1+ chars, as few as possible, up to the first
COPY - COPY substring.
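The lookbehind/lookahead difference can be reproduced in Python, which accepts the fixed-width lookbehind (?<!.{6}\*). The two sample lines are assumptions: they use a blank sequence area (columns 1-6) with the indicator in column 7.

```python
import re

# Hypothetical sample lines: six blank columns, then column 7.
comment = " " * 6 + "* COPY"   # "*" in column 7
code = " " * 7 + "COPY"        # blank column 7

# Lookbehind at ^ inspects text *before* position 0, where there is none,
# so the negative lookbehind always succeeds.
LOOKBEHIND = re.compile(r"^(?<!.{6}\*).+?COPY")

# Lookahead at ^ inspects the first seven characters of the line itself.
LOOKAHEAD = re.compile(r"^(?!.{6}\*).+?COPY")

for name, line in [("comment", comment), ("code", code)]:
    print(name, bool(LOOKBEHIND.search(line)), bool(LOOKAHEAD.search(line)))
```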
If you want to filter out EVERY comment you could use:
^ {6}(?!\*)
That matches only lines that start with six spaces and do NOT have an '*' in the 7th position.
COBOL can use positions 1-6 for line sequence numbers, so it may be safer to just use:
^.{6}(?!\*).*$
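As a sketch, this pattern can be used to drop comment lines before searching for COPY. The listing below is hypothetical and has sequence numbers in columns 1-6.

```python
import re

# Any six characters (e.g. a sequence number), then column 7 must not be "*".
NOT_COMMENT = re.compile(r"^.{6}(?!\*).*$")

# Hypothetical fixed-format source lines.
source = [
    "000100 IDENTIFICATION DIVISION.",
    "000200* COPY OLDBOOK.",
    "000300     COPY MYBOOK.",
]

kept = [line for line in source if NOT_COMMENT.match(line)]
copy_lines = [line for line in kept if "COPY" in line]
print(copy_lines)
```

Lines shorter than six characters are also dropped, because .{6} cannot match them.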

RegEx match sequence of three strings along with text in between

I'm trying to get a regular expression to match something in between two strings that includes a third. I'm having trouble getting the lazy quantifier to cooperate: there are multiple instances of these strings in the input, and the RegEx matches something that is not useful, i.e.:
Start...End...Start...End...Start...Middle...End
What I'm actually looking for (only one instance of Start and End in each match):
Start...Middle...End or Start...Center...End
I'm pretty sure I need to use lookahead/lookbehind, but while I do conceptually understand them, putting them into practice is really difficult. Here's where I'm at:
/<Start[\s\S]*?(Middle|Center)[\s\S]*?End>/gm
Make use of the tempered greedy token:
Start(?:(?!Start|End)[\s\S])*?(Middle|Center)[\s\S]*?End
Details
Start - a literal string
(?:(?!Start|End)[\s\S])*? - any char, 0 or more times, as few as possible, that is not the beginning of a Start or End sequence
(Middle|Center) - Group 1: Middle or Center
[\s\S]*? - any 0+ chars, as few as possible
End - a literal string
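Here is a Python sketch comparing the lazy attempt with the tempered greedy token on the example input. The asker's pattern is used without its /.../gm delimiters and without the < and > characters, which do not appear in this sample text.

```python
import re

text = "Start...End...Start...End...Start...Middle...End"

# Lazy attempt: the match begins at the first "Start", and [\s\S]*? keeps
# extending past End and Start until it finally reaches Middle.
LAZY = re.compile(r"Start[\s\S]*?(Middle|Center)[\s\S]*?End")

# Tempered greedy token: no character before Middle/Center may begin
# "Start" or "End", so that part of the match cannot cross either one.
TEMPERED = re.compile(r"Start(?:(?!Start|End)[\s\S])*?(Middle|Center)[\s\S]*?End")

print(LAZY.search(text).group(0))
print(TEMPERED.search(text).group(0))
```

Only the part before Middle/Center is tempered. Input such as "Start...Middle...Start...End" would still match across the second Start.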

What pattern to get a substring using a regexp

I have the following two strings. How can I get the numbers in them, i.e. 233100 and 233800?
QA-Ki-233100
QA-Ki-233800-win-vc8-x86-release
This is the pattern I have, but it doesn't work:
oRegexp.Pattern = "QA-Ki-\--[\Z]"
Thanks for your help.
This should work:
(?<=-)\d+(?=-|$)
or simply (in this case),
\b\d+\b
In (?<=-)\d+(?=-|$), a positive lookbehind and a positive lookahead ensure that the digits \d+ are immediately preceded by - and immediately followed by either - or the end of the line ($).
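In \b\d+\b, the transitions between - and a digit, and between a digit and the end of the string, are all word boundaries (\b), so the regex becomes shorter. Note that the VBScript RegExp object (if that is what oRegexp is) does not support lookbehind, so there only the \b\d+\b version will work.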
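Both patterns can be compared in Python, which supports the lookbehind. This is a sketch only. Python's re is not the engine behind oRegexp, so this checks the patterns' logic rather than the asker's environment.

```python
import re

strings = ["QA-Ki-233100", "QA-Ki-233800-win-vc8-x86-release"]

# Digits preceded by "-" and followed by "-" or the end of the string.
LOOKAROUND = re.compile(r"(?<=-)\d+(?=-|$)")

# Digits with a word boundary on both sides. The "8" in "vc8" and the
# "86" in "x86" are skipped because a letter directly precedes them.
WORD_BOUNDARY = re.compile(r"\b\d+\b")

for s in strings:
    print(s, LOOKAROUND.findall(s), WORD_BOUNDARY.findall(s))
```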
Check: https://regex101.com/r/nL9nR1/1