I have regex something like below:
f04((?!z).)*
Requirements:
1.) f04 matches the characters f04 literally (case sensitive)
2.) (?!z) is a negative lookahead: it asserts that z (the character z, literally, case sensitive) does not match at the current position
3.) . matches any character
What can be the other possible way to write this particular regexp with the same requirements as above?
Here is one alternative:
f04(.*?)(?=z|$)
This pattern will match f04 followed by anything until either hitting the first letter z, or until hitting the end of the entire string, should a z never occur.
Your current approach uses a tempered dot, but both patterns should behave similarly.
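To check the claim that both patterns behave similarly, here is a minimal sketch using Python's re module. The sample strings are made up for illustration, and other regex engines may differ in details such as how . and $ treat newlines.

```python
import re

# The asker's tempered dot and the suggested lazy alternative.
tempered = re.compile(r"f04((?!z).)*")
lazy = re.compile(r"f04(.*?)(?=z|$)")

# Hypothetical sample inputs, not taken from the question.
samples = ["f04abczdef", "f04abc", "xxf04z", "no match here"]


def matched_text(pattern, text):
    """Return the text of the first match, or None when nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


for s in samples:
    print(repr(s), matched_text(tempered, s), matched_text(lazy, s))

# The overall match is the same, but group 1 differs: the repeated group
# in the tempered dot keeps only its last iteration.
print(tempered.search("f04abczdef").group(1))  # last character before z
print(lazy.search("f04abczdef").group(1))      # everything between f04 and z
```

So if you rely on the capture group rather than the whole match, the two patterns are not interchangeable.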
Right now I am trying to create API documentation in Symfony 3 with the NelmioApiDocBundle. So far everything works as described in the Symfony documentation.
Now I'd like to remove the _error and _profiler routes from the Swagger docs. The documentation says you can just use path_patterns, so I would need to list every route I want documented there. But I have quite a few different paths.
It would be cool to be able to write negative path patterns like
...
path_patterns:
- !^/_error
- !^/fubar
Is something like that possible?
Those are regex patterns so, yes you should be able to match any kind of pattern regex allows.
Check out "lookaround" zero-length assertions, specifically a Negative lookahead, and try something like below:
path_patterns:
- ^\/((?!_error)(?!fubar).)*$
Regex101 is an excellent tool for testing and understanding your regex. It will explain the impact of every part of the regex like so:
^ asserts position at start of a line
\/ matches the character / literally (case sensitive)
1st Capturing Group ((?!_error)(?!fubar).)*
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data
Negative Lookahead (?!_error)
Assert that the Regex below does not match
_error matches the characters _error literally (case sensitive)
Negative Lookahead (?!fubar)
Assert that the Regex below does not match
fubar matches the characters fubar literally (case sensitive)
. matches any character (except for line terminators)
$ asserts position at the end of a line
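As a quick sanity check of that pattern, here is a sketch in Python. The bundle itself runs on PHP's PCRE, so Python's re is used here only to illustrate the regex; the route list and the start_only variant are assumptions made up for this example.

```python
import re

# The pattern from the answer: reject paths containing _error or fubar
# anywhere after the leading slash.
path_pattern = re.compile(r"^\/((?!_error)(?!fubar).)*$")

# Hypothetical variant that only rejects paths *starting* with those names.
start_only = re.compile(r"^/(?!_error|fubar).*$")

# Hypothetical routes, for illustration only.
routes = [
    "/api/users",
    "/_error/404",
    "/fubar",
    "/_profiler/abc",
    "/api/fubar/list",
]

documented = [r for r in routes if path_pattern.search(r)]
documented_start_only = [r for r in routes if start_only.search(r)]

print(documented)
print(documented_start_only)
```

Two things to note: /_profiler/abc still passes because the pattern only excludes _error and fubar, so you would add (?!_profiler) as well; and the tempered form also rejects /api/fubar/list, because the lookaheads are tested at every position, not just at the start.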
Given the following string:
one.two.three.four
How do I match/capture which results in the following in one go:
one
one.two
one.two.three
(if it's possible at all)
You can use this:
(?=(^|(?<=[.]))([\w.]+))
This performs a zero-width lookahead: the regex engine walks the string one character at a time and tries the pattern at each position without consuming anything. Inside the lookahead it says:
Using the start anchor or a zero-width look-behind:
is there the beginning of the string?
do i have a . behind the cursor?
Using a capture group, it will get the rest of the string that was not consumed yet.
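To see what this pattern actually captures, here is a small sketch in Python (whose re module supports this fixed-width lookbehind). Note that, as far as I can tell, it yields the trailing parts of the string at each segment start, not the leading prefixes the question asked for.

```python
import re

# Lookahead pattern from the answer; group 2 holds the unconsumed rest.
pattern = re.compile(r"(?=(^|(?<=[.]))([\w.]+))")

text = "one.two.three.four"

# Every match is empty (the lookahead consumes nothing), so read group 2.
captures = [m.group(2) for m in pattern.finditer(text)]
print(captures)
```

Each capture starts at the beginning of the string or right after a dot and runs to the end of the string.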
(\w+)\.?
(\w+) matches any word character (equal to [a-zA-Z0-9_])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed
\.? matches the character . literally, zero or one time (the ? quantifier makes it optional)
If your strings contain only lowercase letters, try this instead: ([a-z]+)\.?
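This pattern returns the individual segments rather than the cumulative prefixes, so a little post-processing is still needed. A minimal sketch in Python, assuming it is acceptable to build the prefixes in code after matching (the use of itertools.accumulate is my choice, not part of the answer):

```python
import re
from itertools import accumulate

text = "one.two.three.four"

# (\w+)\.? captures each dot-separated segment in turn.
segments = re.findall(r"(\w+)\.?", text)

# Join the segments cumulatively, then drop the last entry (the full string).
prefixes = list(accumulate(segments, lambda left, right: left + "." + right))[:-1]

print(segments)
print(prefixes)
```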
Not quite sure how to go about this, but basically what I want to do is match a character, say a for example. In this case all of the following would not contain matches (i.e. I don't want to match them):
aa
aaa
fooaaxyz
Whereas the following would:
a (obviously)
fooaxyz (this would only match the letter a part)
My knowledge of RegEx is not great, so I am not even sure if this is possible. Basically what I want to do is match any single a that has any other non a character around it (except for the start and end of the string).
Basically what I want to do is match any single a that has any other non a character around it (except for the start and end of the string).
^[^\sa]*\Ka(?=[^\sa]*$)
\K discards the previously matched characters, and the lookahead asserts whether a match is possible. So the above matches only the letter a that satisfies the conditions.
OR
a{2,}(*SKIP)(*F)|a
You may use a combination of a lookbehind and a lookahead:
(?<!a)a(?!a)
Details
(?<!a) - a negative lookbehind that fails the match if, immediately to the left of the current location, there is an a char
a - an a char
(?!a) - a negative lookahead that fails the match if, immediately to the right of the current location, there is an a char.
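The lookaround version is easy to try against the examples from the question. A short sketch in Python, which supports both assertions (the \K and (*SKIP)(*F) variants above need a PCRE-style engine and are not available in Python's built-in re):

```python
import re

# Match an a that has no a directly before or after it.
single_a = re.compile(r"(?<!a)a(?!a)")

# Examples from the question.
samples = ["aa", "aaa", "fooaaxyz", "a", "fooaxyz"]

# Map each sample to the (start, end) spans of its matches.
spans = {s: [m.span() for m in single_a.finditer(s)] for s in samples}

for s, found in spans.items():
    print(repr(s), found)
```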
You need two things:
a negated character class: [^a] (all except "a")
anchors (^ and $) to ensure that the limits of the string are reached (in other words, that the pattern matches the whole string and not only a substring):
Result:
^[^a]*a[^a]*$
Once you know there is only one "a", you can extract, replace, or remove it in whatever way suits the language you use.
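Note that this anchored pattern answers a slightly different question than a lookaround does: it accepts the string only if it contains exactly one a overall. A sketch in Python; the input "abca" is an extra example I added to show the difference.

```python
import re

# Whole string must contain exactly one a.
exactly_one_a = re.compile(r"^[^a]*a[^a]*$")
# For comparison: any a that is not adjacent to another a.
isolated_a = re.compile(r"(?<!a)a(?!a)")

samples = ["a", "fooaxyz", "aa", "fooaaxyz", "abca"]

whole_string_ok = {s: bool(exactly_one_a.search(s)) for s in samples}
isolated_count = {s: len(isolated_a.findall(s)) for s in samples}

for s in samples:
    print(repr(s), whole_string_ok[s], isolated_count[s])
```

For "abca" the anchored pattern rejects the string, while the lookaround pattern finds two isolated a characters.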
I'm really struggling to put a label on this, which is probably why I was unable to find what I need through a search.
I'm looking to match the following:
Auto Reply
Automatic Reply
AutomaticReply
The platform that I'm using doesn't allow for the specification of case-insensitive searches. I tried the following regular expression:
.*[aA]uto(?:matic)[ ]*[rR]eply.*
Thinking that (?:matic) would cause my expression to match Auto or Automatic. However, it is only matching Automatic.
What am I doing wrong?
What is the proper terminology here?
This is using Perl for the regular expression engine (I think that's PCRE but I'm not sure).
(?:...) is to regex patterns as (...) is to arithmetic: It simply overrides precedence.
ab|cd # Matches ab or cd
a(?:b|c)d # Matches abd or acd
A ? quantifier is what makes matching optional.
a? # Matches a or an empty string
abc?d # Matches abcd or abd
a(?:bc)?d # Matches abcd or ad
You want
(?:matic)?
Without the needless leading and trailing .*, we get the following:
/[aA]uto(?:matic)?[ ]*[rR]eply/
As #adamdc78 points out, that also matches AutoReply. This can be avoided by using the following:
/[aA]uto(?:matic[ ]*|[ ]+)[rR]eply/
or
/[aA]uto(?:matic|[ ])[ ]*[rR]eply/
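The difference between these variants shows up quickly when they are run side by side. A sketch in Python (the question uses Perl, but these constructs behave the same in Python's re as far as I know); the dictionary keys are just labels I made up.

```python
import re

patterns = {
    "original": r"[aA]uto(?:matic)[ ]*[rR]eply",          # group without ?
    "optional": r"[aA]uto(?:matic)?[ ]*[rR]eply",         # group made optional
    "strict_a": r"[aA]uto(?:matic[ ]*|[ ]+)[rR]eply",     # rejects AutoReply
    "strict_b": r"[aA]uto(?:matic|[ ])[ ]*[rR]eply",      # rejects AutoReply
}

inputs = ["Auto Reply", "Automatic Reply", "AutomaticReply", "AutoReply"]

# For each pattern, list which inputs it matches.
results = {
    name: [text for text in inputs if re.search(rx, text)]
    for name, rx in patterns.items()
}

for name, matched in results.items():
    print(name, matched)
```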
This should work:
/.*[aA]uto(?:matic)? *[rR]eply/
you were simply missing the ? after (?:matic)
[Aa]uto(?:matic ?| )[Rr]eply
This assumes that you do not want AutoReply to be a valid hit.
You're just missing the optional ("?") in the regex. If you're looking to match the entire line after the reply, then including the .* at the end is fine, but your question didn't specify what you were looking for.
You can use this regex with line start/end anchors:
^[aA]uto(?:matic)? *[rR]eply$
Explanation:
^ assert position at start of the string
[aA] match a single character present in the list below
aA a single character in the list aA literally (case sensitive)
uto matches the characters uto literally (case sensitive)
(?:matic)? Non-capturing group
Quantifier: Between zero and one time, as many times as possible, giving back as needed [greedy]
matic matches the characters matic literally (case sensitive)
" *" (a space followed by *) matches the space character literally
Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
[rR] match a single character present in the list below
rR a single character in the list rR literally (case sensitive)
eply matches the characters eply literally (case sensitive)
$ assert position at end of the string
Slightly different. Same result.
m/([aA]uto(matic)? ?[rR]eply)/
Tested Against:
Some other stuff....
Auto Reply
Automatic Reply
AutomaticReply
Now some similar stuff that shouldn't match (auto).
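The same test can be reproduced outside Perl. Here is a rough Python equivalent of the m/.../ match above, run line by line over the test text; the variable names are mine.

```python
import re

# Python equivalent of m/([aA]uto(matic)? ?[rR]eply)/
pattern = re.compile(r"([aA]uto(matic)? ?[rR]eply)")

lines = [
    "Some other stuff....",
    "Auto Reply",
    "Automatic Reply",
    "AutomaticReply",
    "Now some similar stuff that shouldn't match (auto).",
]

hits = [line for line in lines if pattern.search(line)]
print(hits)

# Group 2 tells you whether the optional "matic" part was present.
print(pattern.search("Auto Reply").group(2))       # None: group did not take part
print(pattern.search("Automatic Reply").group(2))  # matic
```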
I have this regex:
(?:\S)\++(?:\S)
Which is supposed to catch all the pluses in a query string like this:
?busca=tenis+nike+categoria:"Tenis+e+Squash"&pagina=4&operador=or
It should have been 4 matches, but there are only 3:
s+n
e+c
s+e
It is missing the last one:
e+S
And it seems to happen because the "e" character has participated in a previous match (s+e), because the "e" character is right in the middle of two pluses (Teni s+e+S quash).
If you test the regex with the following input, it matches the last "+":
?busca=tenis+nike+categoria:"Tenis_e+Squash"&pagina=4&operador=or
(changed "s+e" for "s_e" in order not to cause the "e" character to participate in the match).
Would someone please shed a light on that?
Thanks in advance!
In a consecutive match, the search for the next match starts at the position where the previous match ended. And since the non-whitespace character after the + is matched too, the search for the next match starts after that non-whitespace character. So in a sequence like s+e+S you will only find one match:
s+e+S
\_/
You can fix that by using a look-around assertion, which checks the following character without making it part of the match, like:
\S\++(?=\S)
This will match any non-whitespace character followed by one or more + only if it is followed by another non-whitespace character.
But since whitespace is not allowed in a URI query, you don't need the surrounding \S at all, as every character is non-whitespace. So the following will already match every sequence of one or more + characters:
\++
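The effect of consuming versus merely asserting the trailing character can be reproduced with the query string from the question. A sketch in Python; the variable names are mine.

```python
import re

query = '?busca=tenis+nike+categoria:"Tenis+e+Squash"&pagina=4&operador=or'

# Original pattern: consumes the character after the plus,
# so the e in s+e+S cannot start the next match.
consuming = re.findall(r"(?:\S)\++(?:\S)", query)

# Lookahead version: the following character is only checked, not consumed.
lookahead = re.findall(r"\S\++(?=\S)", query)

# Simplest version: just the runs of plus signs.
plus_only = re.findall(r"\++", query)

print(consuming)
print(lookahead)
print(plus_only)
```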
You are correct: The fourth match doesn't happen because the surrounding character has already participated in the previous match. The solution is to use lookaround (if your regex implementation supports it; JavaScript engines before ES2018 don't support lookbehind, for example).
Try
(?<!\s)\++(?!\s)
This matches one or more + unless they are directly preceded or followed by whitespace. This also works if the plus is at the start or the end of the string.
Explanation:
(?<!\s) # assert that there is no space before the current position
# (but don't make that character a part of the match itself)
\++ # match one or more pluses
(?!\s) # assert that there is no space after the current position
If your regex implementation doesn't support lookbehind, you could also use
\S\++(?!\s)
That way, your match would contain the character before the plus, but not after it, and therefore there will be no overlapping matches (Thanks Gumbo!). This will fail to match a plus at the start of the string, though (because the \S does need to match a character). But this is probably not a problem.
You can use the regex:
(?<=\S)\++(?=\S)
To match only the +'s that are surrounded by non-whitespace.
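The lookaround variants in this thread differ at the edges of the string. A small sketch in Python to make that visible; the test string is made up so that it has a plus at the start, one between two letters, and one right after a space.

```python
import re

# Hypothetical input: plus at index 0, between b and c, and after a space.
text = "+a b+c d +e"

# Negative lookarounds: succeed at the string boundaries.
negative = [m.start() for m in re.finditer(r"(?<!\s)\++(?!\s)", text)]

# Positive lookarounds: require an actual non-whitespace neighbour on each side.
positive = [m.start() for m in re.finditer(r"(?<=\S)\++(?=\S)", text)]

# No lookbehind: the preceding character becomes part of the match.
no_lookbehind = re.findall(r"\S\++(?!\s)", text)

print(negative)
print(positive)
print(no_lookbehind)
```

The negative form accepts the plus at the very start of the string, while the positive form and the \S form do not, because both need a real character in front of the plus.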