I'm currently trying to create API documentation in Symfony 3 with the NelmioApiDocBundle. So far everything works as described in the Symfony documentation.
Now I'd like to remove the _error and _profiler routes from the Swagger docs. The documentation says you can use path_patterns for this, but that means listing every route I want in the documentation, and I have quite a lot of different paths.
It would be cool to have the opportunity to create negative path patterns like
...
path_patterns:
- !^/_error
- !^/fubar
Is something like that possible?
Those are regex patterns, so yes, you should be able to match any kind of pattern that regex allows.
Check out "lookaround" zero-length assertions, specifically a Negative lookahead, and try something like below:
path_patterns:
- ^\/((?!_error)(?!fubar).)*$
Regex101 is an excellent tool for testing and understanding your regex. It will explain the impact of every part of the regex like so:
^ asserts position at start of a line
\/ matches the character / literally (case sensitive)
1st Capturing Group ((?!_error)(?!fubar).)*
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data
Negative Lookahead (?!_error)
Assert that the Regex below does not match
_error matches the characters _error literally (case sensitive)
Negative Lookahead (?!fubar)
Assert that the Regex below does not match
fubar matches the characters fubar literally (case sensitive)
. matches any character (except for line terminators)
$ asserts position at the end of a line
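To sanity-check the pattern outside of the bundle, here is a minimal Python sketch. The route list and the helper name documented_paths are made up for illustration; NelmioApiDocBundle applies the pattern in PHP, so treat this only as a quick check of how the regex behaves.

```python
import re

# Same pattern as above; "/" needs no escaping in Python.
PATH_PATTERN = re.compile(r"^/((?!_error)(?!fubar).)*$")

def documented_paths(paths):
    """Return the paths the pattern accepts (hypothetical helper)."""
    return [p for p in paths if PATH_PATTERN.search(p)]

# Hypothetical routes, not taken from a real application.
routes = ["/api/users", "/_error/404", "/fubar/list", "/api/doc"]
print(documented_paths(routes))
```

Note that the lookaheads are checked before every character, so a path such as /api/fubar is rejected too, not only paths that start with /fubar.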
I am developing an app with markdown capabilities, so I am building a lexer to handle this. I am fairly new to Flutter and have little experience with Regex in general.
Essentially there is a difference between *text*, **text**, and ***text***.
My expressions right now are:
r"\B\*[A-Za-z0-9 ]+\*\B"
r"\B\*{2}[A-Za-z0-9 ]+\*{2}\B"
r"\B\*{3}[A-Za-z0-9 ]+\*{3}\B"
The issue is that the first expression also matches text meant for the other two. For example, **text*** is also matched by the second expression. Does anyone know how to solve this?
It looks like you could use:
(?<!\S)(\*{1,3})[A-Za-z0-9 ]+\1(?!\S)
See an online demo
(?<!\S) - Assert position is not preceded by anything that is not a whitespace char;
(\*{1,3}) - Match 1-3 asterisk characters;
[A-Za-z0-9 ]+ - Match 1+ characters from given character class;
\1 - Backreference what is matched in 1st group;
(?!\S) - Assert position is not followed by anything other than whitespace char.
Note that if you remove the final negative lookahead, you could also match **text** in **text***, if that is what you were after. Or even remove the leading negative lookbehind to match **text** in ****text** test
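As a rough check of the backreference approach, here is a small Python sketch. The question is about Dart, so this only demonstrates the regex itself; the helper name find_emphasis and the sample string are assumptions for illustration.

```python
import re

# Same pattern as above: the backreference \1 forces the closing run of
# asterisks to have the same length as the opening run.
EMPHASIS = re.compile(r"(?<!\S)(\*{1,3})[A-Za-z0-9 ]+\1(?!\S)")

def find_emphasis(text):
    """Return (asterisk_count, matched_text) pairs (hypothetical helper)."""
    return [(len(m.group(1)), m.group(0)) for m in EMPHASIS.finditer(text)]

sample = "*a* **b** ***c*** **d***"
print(find_emphasis(sample))
```

The unbalanced token **d*** is skipped entirely, because no asterisk count between one and three closes it with whitespace or end of string right after.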
I have a conditional lookahead regex that tests to see if there is a number substring at the end of a string, and if so match for the numbers, and if not, match for another substring
The string in question: "H2K 101"
If just the lookahead is used, i.e. (?=\d{1,8}$)(\d{1,8}$), the lookahead succeeds, and "101" is found in capture group 1
When the lookahead is placed into a conditional, i.e. (?(?=\d{1,8}\z)(\d{1,8}\z)|([a-zA-Z]+[\d_-]{1,8}[a-zA-Z]+)), the lookahead now fails, and the second pattern is used, matching "H2K", and a "2" is found in capture group 2.
If the test string has the "2" swapped for a letter, i.e. "HKK 101"
then the lookahead conditional works as expected, and the number "101" is once again found in capture group 1.
I've tested this in Regex101 and other PCRE engines, and all work the same, so clearly I'm missing something obvious about conditionals or the condition regex I'm using. Any insight greatly appreciated.
Thanks.
The look ahead starts at the current position, so initially it fails, and the alternative is used -- where it finds a match at the current position.
If you want the look ahead to succeed when still at the initial position, you need to allow for the intermediate characters to occur. Also, when the alternative kicks in, realise that there can follow a second match that still uses the look ahead, but now at a position where the look ahead is successful.
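The position dependence can be seen directly with a short Python sketch. Python's re has no lookahead conditionals and spells the end-of-string anchor \Z rather than \z, so this only probes the lookahead on its own; the helper name is hypothetical.

```python
import re

# The condition from the question, with \Z as Python's end-of-string anchor.
LOOKAHEAD = re.compile(r"(?=\d{1,8}\Z)")

def lookahead_succeeds_at(text, pos):
    """True if the lookahead succeeds when tested at index pos."""
    return LOOKAHEAD.match(text, pos) is not None

text = "H2K 101"
for pos in (0, 1, 4):
    print(pos, lookahead_succeeds_at(text, pos))
```

At index 0 the next character is H, so the lookahead fails and an engine with conditionals takes the else branch there, which matches H2K before the digits at the end are ever reached.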
From what I understand, you are interested in one match only, not two consecutive matches (or more). So that means you should attempt to match the whole string, and capture the part of interest in a capture group. Also, the look ahead should be made to succeed when still at the initial position. This all means you need to inject several .*. There is no need for a conditional.
(?=.*\d{1,8}\z).*?(\d{1,8}\z)|([a-zA-Z]+[\d_-]{1,8}[a-zA-Z]+).*
Note also that (?=.*\d{1,8}\z) succeeds if and only if (?=.*\d\z) succeeds, so you can simplify that:
(?=.*\d\z).*?(\d{1,8}\z)|([a-zA-Z]+[\d_-]{1,8}[a-zA-Z]+).*
There are two capture groups. If there is a match, exactly one of the capture groups will have non-empty content, which is the content you need.
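A minimal Python sketch of the simplified pattern, assuming \z is replaced by Python's \Z; the helper name extract is made up for illustration.

```python
import re

# Simplified pattern from above, with \Z instead of \z for Python's re.
PATTERN = re.compile(
    r"(?=.*\d\Z).*?(\d{1,8}\Z)|([a-zA-Z]+[\d_-]{1,8}[a-zA-Z]+).*"
)

def extract(text):
    """Return whichever capture group took part in the match, or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return m.group(1) if m.group(1) is not None else m.group(2)

for s in ("H2K 101", "HKK 101", "H2K abc"):
    print(s, "->", extract(s))
```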
You want to match a number of specific length at the end of the string, and if there is none, match something else.
There is no need for a conditional here. Conditional patterns are necessary to examine what to match next at the given position inside the string based either on a specific group match or a lookaround test. They are not useful when you want to give priority to a specific pattern.
Here, you can use a PCRE pattern based on the \K operator like
.*?\K\d{1,8}\z|[a-zA-Z]+[\d_-]{1,8}[a-zA-Z]+
Or, using capturing groups
(?|.*?(\d{1,8})\z|([a-zA-Z]+[\d_-]{1,8}[a-zA-Z]+))
See the regex demo #1 and regex demo #2.
Details:
.*?\K\d{1,8}\z - any zero or more chars other than line break chars, as few as possible, then the match reset operator that discards the text matched so far, then one to eight digits at the end of string
| - or
[a-zA-Z]+[\d_-]{1,8}[a-zA-Z]+ - one or more letters, 1-8 digits, underscores or hyphens, and then one or more letters.
And
(?| - start of the branch reset group:
.*? - any zero or more chars other than line break chars, as few as possible
(\d{1,8}) - Group 1: one to eight digits
\z - end of string
| - or
( - Group 1 start:
[a-zA-Z]+ - one or more ASCII letters
[\d_-]{1,8} - one to eight digits, underscores, hyphens
[a-zA-Z]+ - one or more ASCII letters
) - Group 1 end
) - end of the group.
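Both \K and the branch reset group are PCRE features. In an engine without them, such as Python's re, the same priority can be sketched with two ordered searches. This swaps in a different technique from the patterns above, assumes single-line input, and uses a hypothetical helper name pick.

```python
import re

# Preferred pattern: one to eight digits at the very end of the string.
TRAILING_DIGITS = re.compile(r"\d{1,8}\Z")
# Fallback pattern, taken from the second alternative above.
FALLBACK = re.compile(r"[a-zA-Z]+[\d_-]{1,8}[a-zA-Z]+")

def pick(text):
    """Try the preferred pattern first, then the fallback."""
    m = TRAILING_DIGITS.search(text) or FALLBACK.search(text)
    return m.group(0) if m else None

for s in ("H2K 101", "H2K abc", "???"):
    print(s, "->", pick(s))
```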
I have the following language or locale codes in a URL, and I am trying to identify them with a regex. I was partially successful, but it fails in some scenarios.
Languages that I am testing with
en-us -- Passes
us -- Fails
Here is the regex that I have
([a-zA-Z]{2}|[a-zA-Z]{2}-[a-zA-Z]{2}\/)c\/(deals-and-tips\/)?
For instance:
https://forum.leasehackr.com/en-us/c/deals-and-tips (passes)
https://forum.leasehackr.com/us/c/deals-and-tips (fails)
What am I missing in the above REGEX?
The regex you wanted is:
([a-zA-Z]{2}|[a-zA-Z]{2}-[a-zA-Z]{2})\/c\/(deals-and-tips\/)?
The difference from your regex is that I moved the first \/ from inside the parenthesis to outside (to sit with c\/).
Test here.
The last / fails the match in any case, since your URLs don't have it. Either way, I would rewrite your regex like this: ([a-zA-Z]{2})(-[a-zA-Z]{2})?\/c\/(deals-and-tips)?.
This way it always looks for the first part (en) and treats the second (-us) as optional.
Alternatively use (\w{2})(-\w{2})?\/c\/(deals-and-tips)?, if you don't mind the risk of also matching underscores and digits.
The reason your pattern does not match us is because the alternation ([a-zA-Z]{2}|[a-zA-Z]{2}-[a-zA-Z]{2}\/) only matches the \/ in the second part of the alternation.
Also it does not match the last group with deals-and-tips because there is no trailing \/ in the example data.
Your updated pattern might look like
([a-zA-Z]{2}|[a-zA-Z]{2}-[a-zA-Z]{2})\/c\/(deals-and-tips)?
Regex demo
You could shorten the pattern a bit by using an optional non capturing group (?:-[a-zA-Z]{2})? inside the first capturing group to optionally match the part starting with a hyphen.
As in the example data you could match the leading \/ in front of the capturing group to get a more efficient match.
\/([a-zA-Z]{2}(?:-[a-zA-Z]{2})?)\/c\/(deals-and-tips)?
In parts
\/ To be a bit more precise, match the leading /
( Capture group 1
[a-zA-Z]{2} Match 2 chars a-z
(?:-[a-zA-Z]{2})? Optionally match - and 2 chars a-z
) Close group
\/c\/ Match /c/
(deals-and-tips)? Optional capture group 2 match deals-and-tips
Regex demo
Note that if you use another delimiter than / you don't have to escape the forward slash.
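Here is a minimal Python sketch of the last pattern, run against the two URLs from the question. Python has no regex delimiters, so the forward slashes are left unescaped; the helper name locale_of is made up for illustration.

```python
import re

# Pattern from above, written without escaped slashes.
LOCALE = re.compile(r"/([a-zA-Z]{2}(?:-[a-zA-Z]{2})?)/c/(deals-and-tips)?")

def locale_of(url):
    """Return the language or locale code, or None if there is none."""
    m = LOCALE.search(url)
    return m.group(1) if m else None

urls = [
    "https://forum.leasehackr.com/en-us/c/deals-and-tips",
    "https://forum.leasehackr.com/us/c/deals-and-tips",
]
for url in urls:
    print(url, "->", locale_of(url))
```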
I want to use a regex to filter a group of links down to those that do not contain the word products. The links are delimited by |.
The following regex matches the links that do contain the word products.
(https:\/\/(?:(?!\|).)*(products)(?:(?!\|).)*.(?=\||$))
When I tried to get the list of links that do not contain the word products, it did not return any results.
(https:\/\/(?:(?!\|).)*(^products)(?:(?!\|).)*.(?=\||$))
Links are given below.
https://cdn.shopify.com/test/|https://cdn.shopify.com/s/products/Profile.jpg|https://cdn.shopify.com/p/products/1Profile.jpg?v=359|https://cdn.shopify.com/s/4/files/products/19front.jpg?v=453|https://cdn.shopify.com/g/p/Chart.jpg?v=1549402459|https://cdn.shopify.com/s/4/products/19back.jpg?v=453
Please let me know what I am missing. I have tried ?! and ^ with the same condition.
https://regex101.com/r/Ynj8ni/1
Why the pattern does not work
The pattern that you tried does not match because the part after https://, (?:(?!\|).)*, matches any char as long as what is directly to the right is not |.
That matches up to right before the first pipe, and then the pattern tries to match ^products, which means products at the start of the string. That cannot match, because the string starts with https://
A possible solution
If you want to match a URL without products after the first forward slash, you could use a negated character class that matches anything but |, match a p only if what follows is not roducts, and assert either the end of the string or the next pipe.
https?://[^/\r\n]+/[^p|]*(?:p(?!roducts\b)|[^p|\r\n])+(?=\||$)
Explanation
https?:// Match http with an optional s, then ://
[^/\r\n]+/ Match 1+ times any char except / or a newline, then match /
[^p|]* Match 0+ times any char except p or |
(?: Non capturing group
p(?!roducts\b) Match p, assert what is directly to the right is not roducts
| or
[^p|\r\n] Match any char except p or | or a newline
)+ Close non capturing group and repeat 1+ times (Or use ++ if possessive quantifiers are supported)
(?=\||$) Assert what is directly to the right is | or the end of the string
Regex demo
If products can also not be in the url from the start, so not only after the first forward slash, the pattern can be shortened to:
https?://[^p|\r\n]*(?:p(?!roducts)|[^p|\r\n])+(?=\||$)
Regex demo
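A quick Python sketch of the shortened pattern, run against the links from the question. The variable and function names are made up for illustration, and the plain split-and-filter at the end is only a cross-check, not part of the answer above.

```python
import re

# Shortened pattern from above: a "p" is allowed only when it is not the
# start of "products".
NO_PRODUCTS = re.compile(r"https?://[^p|\r\n]*(?:p(?!roducts)|[^p|\r\n])+(?=\||$)")

links = (
    "https://cdn.shopify.com/test/|"
    "https://cdn.shopify.com/s/products/Profile.jpg|"
    "https://cdn.shopify.com/p/products/1Profile.jpg?v=359|"
    "https://cdn.shopify.com/s/4/files/products/19front.jpg?v=453|"
    "https://cdn.shopify.com/g/p/Chart.jpg?v=1549402459|"
    "https://cdn.shopify.com/s/4/products/19back.jpg?v=453"
)

def links_without_products(text):
    """Return every link in text that does not contain 'products'."""
    return NO_PRODUCTS.findall(text)

# Cross-check without a regex: split on the delimiter and filter.
expected = [u for u in links.split("|") if "products" not in u]
print(links_without_products(links))
```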
Assuming your original pattern is in fact correct, one simple way to handle the negative case would be to use a negative lookahead:
(https:\/\/(?:(?!\|).)*(?!products)\w*(?:(?!\|).)*.(?=\||$))
This is what I believe you were intending to do. The lookahead (?!products) asserts that what follows that exact spot is not products. Then, the \w* matches any valid word which actually does follow.
I'm really struggling to put a label on this, which is probably why I was unable to find what I need through a search.
I'm looking to match the following:
Auto Reply
Automatic Reply
AutomaticReply
The platform that I'm using doesn't allow for the specification of case-insensitive searches. I tried the following regular expression:
.*[aA]uto(?:matic)[ ]*[rR]eply.*
Thinking that (?:matic) would cause my expression to match Auto or Automatic. However, it is only matching Automatic.
What am I doing wrong?
What is the proper terminology here?
This is using Perl for the regular expression engine (I think that's PCRE but I'm not sure).
(?:...) is to regex patterns as (...) is to arithmetic: It simply overrides precedence.
ab|cd # Matches ab or cd
a(?:b|c)d # Matches abd or acd
A ? quantifier is what makes matching optional.
a? # Matches a or an empty string
abc?d # Matches abcd or abd
a(?:bc)?d # Matches abcd or ad
You want
(?:matic)?
Without the needless leading and trailing .*, we get the following:
/[aA]uto(?:matic)?[ ]*[rR]eply/
As @adamdc78 points out, that matches AutoReply. This can be avoided by using the following:
/[aA]uto(?:matic[ ]*|[ ]+)[rR]eply/
or
/[aA]uto(?:matic|[ ])[ ]*[rR]eply/
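The difference between the permissive and the stricter pattern can be checked with a short Python sketch. The question uses Perl, so this only exercises the regexes; the names LOOSE, STRICT and matching are assumptions for illustration.

```python
import re

# Optional "matic", optional spaces: also accepts "AutoReply".
LOOSE = re.compile(r"[aA]uto(?:matic)?[ ]*[rR]eply")
# Requires either "matic" or at least one space: rejects "AutoReply".
STRICT = re.compile(r"[aA]uto(?:matic[ ]*|[ ]+)[rR]eply")

candidates = ["Auto Reply", "Automatic Reply", "AutomaticReply", "AutoReply"]

def matching(pattern, items):
    """Return the items that the pattern matches in full."""
    return [s for s in items if pattern.fullmatch(s)]

print(matching(LOOSE, candidates))
print(matching(STRICT, candidates))
```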
This should work:
/.*[aA]uto(?:matic)? *[rR]eply/
you were simply missing the ? after (?:matic)
[Aa]uto(?:matic ?| )[Rr]eply
This assumes that you do not want AutoReply to be a valid hit.
You're just missing the optional ("?") in the regex. If you're looking to match the entire line after the reply, then including the .* at the end is fine, but your question didn't specify what you were looking for.
You can use this regex with line start/end anchors:
^[aA]uto(?:matic)? *[rR]eply$
Explanation:
^ assert position at start of the string
[aA] match a single character present in the list below
aA a single character in the list aA literally (case sensitive)
uto matches the characters uto literally (case sensitive)
(?:matic)? Non-capturing group
Quantifier: ? Between zero and one time, as many times as possible, giving back as needed [greedy]
matic matches the characters matic literally (case sensitive)
' ' (a space) matches the space character literally
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
[rR] match a single character present in the list below
rR a single character in the list rR literally (case sensitive)
eply matches the characters eply literally (case sensitive)
$ assert position at end of the string
Slightly different. Same result.
m/([aA]uto(matic)? ?[rR]eply)/
Tested Against:
Some other stuff....
Auto Reply
Automatic Reply
AutomaticReply
Now some similar stuff that shouldn't match (auto).