Regex: negative match on group of characters?

I want to create a regular expression that will match all strings starting with 0205052I0 and then where the next two characters are not BB.
So I want to match:
0205052I0AAAAAA
0205052I0ACAAAA
0205052I0BCABAA
But not match:
0205052I0BBAA
How can I do this with PCRE regular expressions?
I've been trying $0205052I0^(BB) on https://regex101.com/ but it doesn't work.

You can use a negative lookahead:
"0205052I0(?!BB).*"
See demo https://regex101.com/r/mO6uV4/1
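If you want to check this outside regex101, here is a minimal sketch in Python, whose re module accepts the same lookahead syntax as PCRE. The list and variable names are only illustrative, not part of the question.

```python
import re

# The prefix must not be immediately followed by the two characters "BB".
pattern = re.compile(r"0205052I0(?!BB).*")

samples = [
    "0205052I0AAAAAA",
    "0205052I0ACAAAA",
    "0205052I0BCABAA",
    "0205052I0BBAA",
]

# re.match only tries the pattern at the start of each string.
matched = [s for s in samples if pattern.match(s)]
print(matched)
```

Only the last sample is dropped, because BB directly follows the prefix there.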
Also note that you have put the anchors in the wrong positions: ^ belongs at the start of the pattern and $ at the end. If you want to use anchors, you can use the following regex:
"^0205052I0(?!BB).*$"

Just in case: ^ means NOT only inside a character class, e.g. [^B]. In your case, you would need something like
0205052I0(B[^B]|[^B]B|[^B][^B])
for the described effect.
See it in action: RegEx 101
Which is rather cumbersome, though. The negative lookahead as suggested by @Kasra is by far the better option.
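One difference between the two forms is worth knowing: the character-class alternation needs two characters after the prefix, while the lookahead only requires that BB does not follow. A small Python sketch to compare them (the two short test strings are my own additions, not from the question):

```python
import re

alternation = re.compile(r"0205052I0(B[^B]|[^B]B|[^B][^B])")
lookahead = re.compile(r"0205052I0(?!BB)")

samples = ["0205052I0BCABAA", "0205052I0BBAA", "0205052I0B", "0205052I0"]

# Map each sample to (alternation matched?, lookahead matched?).
results = {
    s: (bool(alternation.search(s)), bool(lookahead.search(s)))
    for s in samples
}
for s, outcome in results.items():
    print(s, outcome)
```

The two agree on the strings from the question, but only the lookahead accepts the prefix when fewer than two characters follow it.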
Still, if you actually wanted to capture the matched expression, you would need to add parentheses:
(0205052I0(?:B[^B]|[^B]B|[^B][^B]).*)
or, again, better (in the sense of readability, extensibility and maintainability):
(0205052I0(?!BB).*)
RegEx 101
But if you want to keep only the strings that do not contain the BB, you might be better off matching the ones that do contain it and replacing them with nothing: (0205052I0(?=BB).*)
RegEx 101
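As a rough sketch of that replace-with-nothing idea in Python: the trailing \n? is my own addition so that the line break of a removed line goes away too, and the input text is made up from the question's samples.

```python
import re

text = "0205052I0AAAAAA\n0205052I0BBAA\n0205052I0BCABAA\n"

# Match every entry where BB follows the prefix, plus its line break,
# and replace it with the empty string.
cleaned = re.sub(r"(0205052I0(?=BB).*)\n?", "", text)
print(cleaned)
```

What remains are the entries where BB does not follow the prefix.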
Since your sample strings have leading blanks, I left anchors out of the picture.
However, speaking of anchors: $ matches the end of a line, not a line break, as your attempt might be read.
Please comment if this requires adjustment or further detail.

Related

Regex for selecting words ending in 'ing' unless

I want to select words ending in ing with a regular expression, but I want to exclude words that end in thing. For example:
everything
running
catching
nothing
Of these words, running and catching should be selected, everything and nothing should be excluded.
I've tried the following:
.+ing$
But that selects everything. I'm thinking look aheads/look arounds could be the solution, but I haven't been able to get one that works.
Solutions that work in Python or R would be helpful.
In Python you can use a negative lookbehind assertion like this:
^.*(?<!th)ing$
RegEx Demo
(?<!th) is a negative lookbehind that fails the match if th comes immediately before ing at the end of the string.
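A minimal Python sketch of the anchored version, run against the four words from the question (the variable names are just for illustration):

```python
import re

words = ["everything", "running", "catching", "nothing"]

# Whole string must end in "ing", but "th" must not sit right before it.
pattern = re.compile(r"^.*(?<!th)ing$")

selected = [w for w in words if pattern.match(w)]
print(selected)  # ['running', 'catching']
```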
Note that if you are matching words that are not on separate lines, use word boundaries instead of anchors:
\w+(?<!th)ing\b
Something like \b\w+(?<!th)ing\b maybe.
You might also use a negative lookahead (?!...) to assert that what is on the right is not zero or more word characters followed by thing and a word boundary:
\b(?!\w*thing\b)\w*ing\b
Regex demo | Python demo
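To compare the word-boundary variants on running text, here is a small Python sketch. The sentence is made up for the example. As far as I can tell the two patterns differ only on the bare word ing, which the lookbehind form skips because \w+ needs at least one character before ing.

```python
import re

text = "everything was running while nothing was catching on"

lookbehind = re.findall(r"\b\w+(?<!th)ing\b", text)
lookahead = re.findall(r"\b(?!\w*thing\b)\w*ing\b", text)
print(lookbehind, lookahead)

# Edge case: the three-letter word "ing" on its own.
bare_lookbehind = re.findall(r"\b\w+(?<!th)ing\b", "ing")
bare_lookahead = re.findall(r"\b(?!\w*thing\b)\w*ing\b", "ing")
print(bare_lookbehind, bare_lookahead)
```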

Regular Expression Should not start with a character and contain a sequence

For example, should not start with h and should contain ap.
Should match apology, rap god, trap but not match happy.
I tried
^[^h](ap)*
but it doesn't match sequences which start with ap like apology.
You may use
^(?!h).*ap
See the following demo. To match the whole string to the end, append .* at the end:
^(?!h).*ap.*
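A quick Python check of the whole-string pattern against the examples from the question (a sketch only, names are illustrative):

```python
import re

# Not starting with "h", and containing "ap" somewhere.
pattern = re.compile(r"^(?!h).*ap.*")

samples = ["apology", "rap god", "trap", "happy"]
matches = [s for s in samples if pattern.match(s)]
print(matches)  # ['apology', 'rap god', 'trap']
```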
If you plan to only match words following the rules you outlined, you may use
\b(?!h)\w*ap\w*
Or, without a lookahead:
\b([^\Wh]\w*)?ap\w*
See this regex demo and the demo without a lookahead.
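The word-level patterns can be tried the same way. In this Python sketch I use finditer and take group(0), because the second pattern contains a capturing group and findall would return only that group. The input sentence is just the question's examples joined together.

```python
import re

text = "apology rap god trap happy"

with_lookahead = [
    m.group(0) for m in re.finditer(r"\b(?!h)\w*ap\w*", text)
]
without_lookahead = [
    m.group(0) for m in re.finditer(r"\b([^\Wh]\w*)?ap\w*", text)
]
print(with_lookahead)
print(without_lookahead)
```

Note that at word level "rap god" is treated as two words, so only rap is returned from it.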
@WiktorStribiżew's comment with negative lookahead is correct (you might want to add .* to it if you want to match the whole string).
For completeness, you can also use alternation:
^(?:[^h].*ap|ap).*
Demo: https://regex101.com/r/ecVTGm/1
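For what it's worth, a short Python sketch suggests the alternation and the lookahead agree on the sample strings. This only checks these four inputs, it is not a proof of equivalence.

```python
import re

alternation = re.compile(r"^(?:[^h].*ap|ap).*")
lookahead = re.compile(r"^(?!h).*ap.*")

samples = ["apology", "rap god", "trap", "happy"]

by_alternation = [s for s in samples if alternation.match(s)]
by_lookahead = [s for s in samples if lookahead.match(s)]
print(by_alternation == by_lookahead)
```

The second branch (ap) is what lets strings that begin with ap, like apology, through.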

Regex: ignore characters that follow

I'd like to know how I can ignore characters that follow a particular pattern in a regex.
I tried positive lookaheads, but they do not work because they preserve those characters for other matches, while I want them to be just... discarded.
For example, a part of my regex is: (?<DoubleQ>\"\".*?\"\")|(?<SingleQ>\".*?\")
in order to match some "key-parts" of this string:
This is a ""sample text"" just for "testing purposes": not to be used anywhere else.
I want to capture the entire ""sample text"", but then I want to "extract" only sample text and the same with testing purposes. That is, I want the group to match to be ""sample text"", but then I want the full match to be sample text. I partially achieved that with the use of the \K option:
(?<DoubleQ>\"\"\K.*?\"\")|(?<SingleQ>\"\K.*?\")
Which ignores the first "" (or ") from the full match but takes it into account when matching the group. How can I ignore the following "" (")?
Note: positive lookahead does not work: it does not ignore characters from the following matches, it just does not include them in the current match.
Thanks a lot.
I hope I got your question right. So you want to match the whole string including the quotes, but replace/extract only the expression without the quotes, right?
You typically can use the regex replace functionality to extract just a part of the match.
This is the regex expression:
""?(.*?)""?
And this is the replace expression:
$1
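Here is a sketch of the same capture-group idea in Python. Two things differ from the PCRE setup in the question: Python's built-in re module writes the backreference as \1 rather than $1, and it does not support \K, so the capture group does all the work.

```python
import re

text = 'This is a ""sample text"" just for "testing purposes": not to be used anywhere else.'

# One or two opening quotes, a lazy capture, one or two closing quotes.
pattern = re.compile(r'""?(.*?)""?')

# Extract only the captured part of every match.
inner = pattern.findall(text)
print(inner)  # ['sample text', 'testing purposes']

# Or replace each quoted match by its captured content.
stripped = pattern.sub(r"\1", text)
print(stripped)
```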

Regex to match certain word but not a particular combination

I have 15 titles as follows:
fruits-and-flowers-themeA
fruits-and-flowers-themeB
fruits-and-flowers-just-test-themeA
themeAfruitsandflowers
nice-fruits-and-flowers-themeA
botanical-names-themeA
I want a regex to help me get only those titles with "themeA" in them, but it should not include "nice" and not include "just-test" or "just-tests".
I tried
^(?!.*just-test|*just-tests|nice).*?(?:themeA).*,
but I still get fruits-and-flowers-just-test-themeA in the output.
How to fix this?
Thanks
You can use this regex with negative lookahead:
^(?!.*?(?:just-tests?|nice)).*?themeA.*$
Working Demo
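A small Python sketch running that regex over the six titles listed in the question (the list name is just for illustration):

```python
import re

titles = [
    "fruits-and-flowers-themeA",
    "fruits-and-flowers-themeB",
    "fruits-and-flowers-just-test-themeA",
    "themeAfruitsandflowers",
    "nice-fruits-and-flowers-themeA",
    "botanical-names-themeA",
]

# Reject anything containing just-test(s) or nice, then require themeA.
pattern = re.compile(r"^(?!.*?(?:just-tests?|nice)).*?themeA.*$")

kept = [t for t in titles if pattern.match(t)]
print(kept)
```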
Option 1
You can use a single regex with lookaheads (see online demo):
^(?!.*nice)(?!.*just-tests?).*themeA.*
The ^ asserts that the match starts at the beginning of the string (so we don't match a subset of the string).
The (?!.*nice) is a negative lookahead that asserts that at this position in the string, we cannot find any characters followed by nice. Note that the e must not be made optional (nice?), or titles that merely contain nic, such as botanical-names-themeA, would be rejected too.
The (?!.*just-tests?) is a negative lookahead that asserts that at this position in the string, we cannot find any characters followed by just-test and an optional s.
As a further tweak, you can compress the lookaheads into one using an | alternation as in anubhava's answer.
Option 2 without lookaheads (Perl, PHP/PCRE)
^(?:.*(?:nice|just-tests?).*)(*SKIP)(?!)|.*themeA.*
This one doesn't use lookaheads but just skips the unwanted titles. See demo.
Use two different regular expressions for clarity and simplicity.
Match your string against one regex that matches themeA:
/themeA/
and then check that the string does NOT match the one you don't want:
/nice|just-tests?/
Doing it in two different regexes makes it far easier to understand and maintain.
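In Python the two-regex approach might look like the sketch below. The titles are the ones from the question, and the names wanted and unwanted are my own.

```python
import re

titles = [
    "fruits-and-flowers-themeA",
    "fruits-and-flowers-themeB",
    "fruits-and-flowers-just-test-themeA",
    "themeAfruitsandflowers",
    "nice-fruits-and-flowers-themeA",
    "botanical-names-themeA",
]

wanted = re.compile(r"themeA")
unwanted = re.compile(r"nice|just-tests?")

# Keep a title only if it matches the first regex and not the second.
kept = [t for t in titles if wanted.search(t) and not unwanted.search(t)]
print(kept)
```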

Regex: optimal syntax for optional combined expression?

I want to match a combination of expressions that is optional. In this specific example, I want to match the word through. Also, if the word jump or swim precedes through (separated by whitespace), then the whole phrase should match. So the combination of expressions preceding through must be optional.
I want all the following lines to be positive matches:
swim through <-- match entire phrase
jump through <-- match entire phrase
hike through <-- match only the word "through"
To do this, I can use the following expression:
(jump\W|swim\W)?through
However, is it possible to accomplish the same thing without having to add \W after jump and swim? I was trying something like this:
(jump|swim)?\W?through
But that wasn't working properly because it would include the space that precedes through on the 3rd example. I only want the word through, not the whitespace around it.
What about this one: (?:(jump|swim)\W)?through
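A quick Python sketch of that suggestion against the three example lines. The outer group is non-capturing and optional as a whole, so the \W is only consumed when jump or swim is actually present.

```python
import re

pattern = re.compile(r"(?:(jump|swim)\W)?through")

lines = ["swim through", "jump through", "hike through"]

# group(0) is the full match for each line.
found = [pattern.search(line).group(0) for line in lines]
print(found)  # ['swim through', 'jump through', 'through']
```

For the third line the match is just through, without the preceding space.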