Regular Expression Should not start with a character and contain a sequence

For example, should not start with h and should contain ap.
Should match apology, rap god, trap but not match happy.
I tried
^[^h](ap)*
but it doesn't match sequences which start with ap like apology.

You may use
^(?!h).*ap
See the following demo. To match the whole string to the end, append .* at the end:
^(?!h).*ap.*
If you plan to only match words following the rules you outlined, you may use
\b(?!h)\w*ap\w*
Or, without a lookahead:
\b([^\Wh]\w*)?ap\w*
See this regex demo and the demo without a lookahead.
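As a rough check of the patterns above, here is a minimal Python sketch using the standard `re` module. The sample sentence and the variable names are illustrative assumptions, not part of the original answer.

```python
import re

samples = ["apology", "rap god", "trap", "happy"]

# Whole-string check: must not start with "h" and must contain "ap".
whole = re.compile(r"^(?!h).*ap.*")
matched = [s for s in samples if whole.match(s)]
print(matched)

# Word-level check: words that contain "ap" but do not start with "h".
text = "an apology from a happy trap"  # made-up sentence for illustration
word_hits = re.findall(r"\b(?!h)\w*ap\w*", text)
print(word_hits)

# Same idea without a lookahead; finditer is used because the pattern
# contains a capturing group and findall would return only that group.
no_lookahead_hits = [m.group(0) for m in re.finditer(r"\b([^\Wh]\w*)?ap\w*", text)]
print(no_lookahead_hits)
```

Both word-level patterns skip "happy" because the only word boundary inside it sits right before the h.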

@WiktorStribiżew's comment with negative lookahead is correct (you might want to add .* to it if you want to match the whole string).
For completeness, you can also use alternation:
^(?:[^h].*ap|ap).*
Demo: https://regex101.com/r/ecVTGm/1
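A quick way to convince yourself that the alternation behaves like the lookahead version is to run both over the same inputs. This is only a sketch in Python; the extra samples ("ap", "hap", "map") are assumptions added to cover edge cases.

```python
import re

lookahead = re.compile(r"^(?!h).*ap.*")
alternation = re.compile(r"^(?:[^h].*ap|ap).*")

samples = ["apology", "rap god", "trap", "happy", "ap", "hap", "map"]

for s in samples:
    print(s, bool(lookahead.match(s)), bool(alternation.match(s)))

# True if both patterns accept and reject exactly the same samples.
agree = all(bool(lookahead.match(s)) == bool(alternation.match(s)) for s in samples)
print(agree)
```

The second branch of the alternation (plain ap) is what rescues strings such as apology, where the first character is itself the start of ap. The two patterns agree on single-line input like the samples here; they can differ on strings that begin with a newline, because [^h] matches a newline while . does not.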

Related

Regex for selecting words ending in 'ing' unless

I want to select words ending in ing with a regular expression, but I want to exclude words that end in thing. For example:
everything
running
catching
nothing
Of these words, running and catching should be selected, everything and nothing should be excluded.
I've tried the following:
.+ing$
But that selects everything. I'm thinking look aheads/look arounds could be the solution, but I haven't been able to get one that works.
Solutions that work in Python or R would be helpful.

In Python you can use a negative lookbehind assertion like this:
^.*(?<!th)ing$
RegEx Demo
(?<!th) is a negative lookbehind that fails the match if th comes immediately before ing at the end of the string.
Note that if you are matching words that are not on separate lines, use word boundaries instead of anchors:
\w+(?<!th)ing\b
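A minimal Python sketch of both lookbehind variants, assuming the four words from the question plus a made-up sentence for the word-boundary form:

```python
import re

words = ["everything", "running", "catching", "nothing"]

# One word per string: anchors are enough.
line_pattern = re.compile(r"^.*(?<!th)ing$")
selected = [w for w in words if line_pattern.match(w)]
print(selected)

# Words embedded in running text: use a word boundary instead of anchors.
sentence = "everything is running, nothing is catching"  # illustrative input
in_text = re.findall(r"\w+(?<!th)ing\b", sentence)
print(in_text)
```

The lookbehind is evaluated right before ing, so it only ever inspects the two characters preceding the suffix.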

Something like \b\w+(?<!th)ing\b maybe.

You might also use a negative lookahead (?!...) to assert that what is on the right is not zero or more word characters followed by thing and a word boundary:
\b(?!\w*thing\b)\w*ing\b
Regex demo | Python demo
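The lookahead variant can be tried the same way. This is a sketch only; the input string is simply the question's four words joined by spaces.

```python
import re

pattern = re.compile(r"\b(?!\w*thing\b)\w*ing\b")

text = "everything running catching nothing"
hits = pattern.findall(text)
print(hits)
```

Here the rejection happens at the start of each word: the lookahead scans ahead for thing at the word's end before any character is consumed.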

Regex: negative match on group of characters?

I want to create a regular expression that will match all strings starting with 0205052I0 and then where the next two characters are not BB.
So I want to match:
0205052I0AAAAAA
0205052I0ACAAAA
0205052I0BCABAA
But not match:
0205052I0BBAA
How can I do this with PCRE regular expressions?
I've been trying $0205052I0^(BB) on https://regex101.com/ but it doesn't work.

You can use a negative lookahead:
"0205052I0(?!BB).*"
See demo https://regex101.com/r/mO6uV4/1
Also note that you have put the anchors in the wrong position. If you want to use anchors, you can use the following regex:
"^0205052I0(?!BB).*$"

Just in case: ^ means negation only inside character classes, e.g. [^B]. In your case, you would need something like
0205052I0(B[^B]|[^B]B|[^B][^B])
for the described effect.
See it in action: RegEx 101
That is rather cumbersome, though. The negative lookahead suggested by @Kasra is by far the better option.
Still, if you actually want to capture the matched expression, you need to add parentheses:
(0205052I0(?:B[^B]|[^B]B|[^B][^B]).*)
or, again, better (in the sense of readability/extensibility/maintainability)
(0205052I0(?!BB).*)
RegEx 101
But if you want to keep the strings that do not contain BB, you might be better off matching the ones that do and replacing them with nothing: (0205052I0(?=BB).*)
RegEx 101
Since your sample strings have leading blanks, I left anchors out of the picture.
Speaking of anchors: $ matches at the end of a line; it does not stand for a line break, as your attempt might suggest.
Please comment if this needs adjustment or further detail.
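To compare the negative lookahead with the explicit character-class alternation, here is a small Python sketch (the question asks about PCRE, but both patterns use syntax that Python's `re` also accepts). The extra sample "0205052I0B" is an assumption added to show where the two differ.

```python
import re

lookahead = re.compile(r"^0205052I0(?!BB).*$")
explicit = re.compile(r"^0205052I0(?:B[^B]|[^B]B|[^B][^B]).*$")

codes = [
    "0205052I0AAAAAA",
    "0205052I0ACAAAA",
    "0205052I0BCABAA",
    "0205052I0BBAA",
    "0205052I0B",  # hypothetical: only one character after the prefix
]

kept_lookahead = [c for c in codes if lookahead.match(c)]
kept_explicit = [c for c in codes if explicit.match(c)]
print(kept_lookahead)
print(kept_explicit)
```

The alternation insists on two characters after the prefix, whereas the lookahead only forbids BB, so a shorter tail still passes the lookahead version.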

Regex to match certain word but not a particular combination

I have 15 titles as follows:
fruits-and-flowers-themeA
fruits-and-flowers-themeB
fruits-and-flowers-just-test-themeA
themeAfruitsandflowers
nice-fruits-and-flowers-themeA
botanical-names-themeA
I want a regex to help me get only those titles with "themeA" in them, but it should not include "nice" and not include "just-test" or "just-tests".
I tried
^(?!.*just-test|*just-tests|nice).*?(?:themeA).*,
but I still get fruits-and-flowers-just-test-themeA in the output.
How to fix this?
Thanks

You can use this regex with negative lookahead:
^(?!.*?(?:just-tests?|nice)).*?themeA.*$
Working Demo
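Here is a short Python sketch of that pattern applied to the six titles listed in the question; the variable names are illustrative.

```python
import re

titles = [
    "fruits-and-flowers-themeA",
    "fruits-and-flowers-themeB",
    "fruits-and-flowers-just-test-themeA",
    "themeAfruitsandflowers",
    "nice-fruits-and-flowers-themeA",
    "botanical-names-themeA",
]

pattern = re.compile(r"^(?!.*?(?:just-tests?|nice)).*?themeA.*$")

wanted = [t for t in titles if pattern.match(t)]
print(wanted)
```

The lookahead runs once at the start of the string, so a forbidden word anywhere in the title rejects it before themeA is even searched for.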

Option 1
You can use a single regex with lookaheads (see online demo):
^(?!.*nice)(?!.*just-tests?).*themeA.*
The ^ asserts that the match starts at the beginning of the string (so we don't match a subset of the string).
The (?!.*nice) is a negative lookahead that asserts that, from this position in the string, we cannot find any characters followed by nice.
The (?!.*just-tests?) is a negative lookahead that asserts that, from this position in the string, we cannot find any characters followed by just-test and an optional s.
As a further tweak, you can compress the lookaheads into one using an | alternation as in anubhava's answer.
Option 2 without lookaheads (Perl, PHP/PCRE)
^(?:.*(?:nice|just-tests?).*)(*SKIP)(?!)|.*themeA.*
This one doesn't use lookaheads but just skips the unwanted titles. See demo.

Use two different regular expressions for clarity and simplicity.
Match your string against one regex that matches themeA:
/themeA/
and then check that the string does NOT match the one you don't want:
/nice|just-tests?/
Doing it in two different regexes makes it far easier to understand and maintain.
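In Python, the two-regex approach might look like the sketch below. The helper name `keep` and the constants are hypothetical; only the two patterns come from the answer.

```python
import re

WANTED = re.compile(r"themeA")
UNWANTED = re.compile(r"nice|just-tests?")

def keep(title: str) -> bool:
    """Return True if the title mentions themeA and none of the excluded words."""
    return bool(WANTED.search(title)) and not UNWANTED.search(title)

titles = [
    "fruits-and-flowers-themeA",
    "fruits-and-flowers-themeB",
    "fruits-and-flowers-just-test-themeA",
    "themeAfruitsandflowers",
    "nice-fruits-and-flowers-themeA",
    "botanical-names-themeA",
]

filtered = [t for t in titles if keep(t)]
print(filtered)
```

Each pattern stays trivial to read, and adding another excluded word only touches the second regex.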

regex optional lookahead

I want a regular expression to match all of these:
startabcend
startdef
blahstartghiend
blahstartjklendsomething
and to return abc, def, ghi and jkl respectively.
I have the following, which works for cases 1 and 3, but I am having trouble making the lookahead optional.
(?<=start).*(?=end.*)
Edit:
Hmm. Bad example. In reality, the bit in the middle is not numeric, but is preceded by a certain set of characters and optionally succeeded by it. I have updated the inputs and outputs as requested and added a 4th example in response to someone's question.

If you're able to use lookahead,
(?<=start).*?(?=(?:end|$))
as suggested by stema below is probably the simplest way to get the entire pattern to match what you want.
Alternatively, if you're able to use capturing groups, you should just do that instead:
start(.*?)(?:end|$)
and then just get the value from the first capture group.
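A minimal Python sketch of both approaches on the four inputs from the question. For the capturing-group form it assumes the lazy group is closed by (?:end|$), so that the capture stops at the first end even when more text follows.

```python
import re

inputs = [
    "startabcend",
    "startdef",
    "blahstartghiend",
    "blahstartjklendsomething",
]

# Lookaround version: the whole match is the wanted text.
lookaround = re.compile(r"(?<=start).*?(?=(?:end|$))")
via_lookaround = [lookaround.search(s).group(0) for s in inputs]
print(via_lookaround)

# Capturing-group version: the wanted text is group 1.
capturing = re.compile(r"start(.*?)(?:end|$)")
via_group = [capturing.search(s).group(1) for s in inputs]
print(via_group)
```

The lazy quantifier matters in both: a greedy .* would run to the last end (or to the end of the string) instead of stopping at the first one.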

Maybe like this:
(?<=start).*?(?=(?:end|$))
This matches everything between "start" and "end", or up to the end of the line if there is no "end". The quantifier has to be non-greedy (.*?).
See it here on Regexr
Extended the example on Regexr to not only work with digits.

An optional lookahead doesn't make sense:
If it's optional then it's ok if it matches, but it's also ok if it doesn't match. And since a lookahead does not extend the match it has absolutely no effect.
So the syntax for an optional lookahead is the empty string.

Lookahead alone won't do the job. Try this:
(?<=start)(?:(?!end).)*
The lookbehind positions you after the word "start", then the rest of it consumes everything until (but not including) the next occurrence of "end".
Here's a demo on Ideone.com
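The same pattern can be tried in Python; this sketch just runs it over the question's four inputs.

```python
import re

# After "start", consume one character at a time as long as
# "end" does not begin at the current position.
pattern = re.compile(r"(?<=start)(?:(?!end).)*")

inputs = [
    "startabcend",
    "startdef",
    "blahstartghiend",
    "blahstartjklendsomething",
]

extracted = [pattern.search(s).group(0) for s in inputs]
print(extracted)
```

Because the check for end happens before every single character, the match stops at the first end without needing a lazy quantifier or a trailing lookahead.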

If "end" is always going to be present, then use (?<=start)(.*?)(?=end), as you put in the OP. Since you say "make the lookahead optional", just run up until there is an "end" or a newline: (?<=start)(.*?)(?=end|\n). If you don't care about capturing the "end" group, you can skip the lookahead and do (?:start)?(.*?)(?:end)?, which will start after "start" if it's there and stop before "end" if it's there. You can also use more of those piped "or" patterns: (?:start|^) and (?:end|\n).

Why do you need lookahead?
start(\d+)\w*
See it on rubular

How to negate the whole regex?

I have a regex, for example (ma|(t){1}). It matches ma and t and doesn't match bla.
I want to negate the regex, thus it must match bla and not ma and t, by adding something to this regex. I know I can write bla, the actual regex is however more complex.

Use negative lookaround: (?!pattern)
Positive lookarounds can be used to assert that a pattern matches. Negative lookarounds are the opposite: they assert that a pattern DOES NOT match. Some flavors support these assertions; some put limitations on lookbehind, etc.
Links to regular-expressions.info
Lookahead and Lookbehind Zero-Width Assertions
Flavor comparison
See also
How do I convert CamelCase into human-readable names in Java?
Regex for all strings not containing a string?
A regex to match a substring that isn’t followed by a certain other substring.
More examples
These are attempts to come up with regex solutions to toy problems as exercises; they should be educational if you're trying to learn the various ways you can use lookarounds (nesting them, using them to capture, etc):
codingBat plusOut using regex
codingBat repeatEnd using regex
codingbat wordEnds using regex

Assuming you only want to disallow strings that match the regex completely (i.e., mmbla is okay, but mm isn't), this is what you want:
^(?!(?:m{2}|t)$).*$
(?!(?:m{2}|t)$) is a negative lookahead; it says "starting from the current position, the next few characters are not mm or t, followed by the end of the string." The start anchor (^) at the beginning ensures that the lookahead is applied at the beginning of the string. If that succeeds, the .* goes ahead and consumes the string.
FYI, if you're using Java's matches() method, you don't really need the ^ and the final $, but they don't do any harm. The $ inside the lookahead is required, though.
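The same wrapping can be applied mechanically to any pattern. Below is a Python sketch using the regex from the question; the helper `negate` is a hypothetical name introduced for illustration, and the samples are made up.

```python
import re

def negate(pattern: str) -> re.Pattern:
    """Build a regex that matches any single-line string the given pattern does NOT fully match."""
    # The non-capturing group keeps alternations inside the pattern from
    # leaking past the $ that closes the lookahead.
    return re.compile(rf"^(?!(?:{pattern})$).*$")

original = r"ma|(t){1}"
negated = negate(original)

samples = ["bla", "ma", "t", "mat", "tma", ""]
accepted = [s for s in samples if negated.match(s)]
print(accepted)

# For these samples the wrapped regex agrees with "fullmatch found nothing".
consistent = all(
    bool(negated.match(s)) == (re.fullmatch(original, s) is None) for s in samples
)
print(consistent)
```

When the check lives in code rather than inside a larger pattern, testing that `re.fullmatch` returns None is usually the simpler route.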

\b(?=\w)(?!(ma|(t){1}))\b(\w*)
This is for the given regex.
The \b finds a word boundary.
The positive lookahead (?=\w) is there to avoid spaces.
The negative lookahead over the original regex prevents matches of it.
Finally, the (\w*) catches all the words that are left.
The group that will hold the words is group 3.
The simple (?!pattern) will not work, as any substring will match.
The simple ^(?!(?:m{2}|t)$).*$ will not work, as its granularity is full lines.

This regexp matches your condition:
^.*(?<!ma|t)$
Look at how it works:
https://regex101.com/r/Ryg2FX/1

Apply this if you use Laravel.
Laravel has a not_regex rule: the field under validation must not match the given regular expression. It uses the PHP preg_match function internally.
'email' => 'not_regex:/^.+$/i'
