Suppose I have the following regex that matches a string with a semicolon at the end:
\".+\";
It will match any string except an empty one, like the one below:
"";
I tried using this:
\".+?\";
But that didn't work.
My question is, how can I make the .+ part of the, optional, so the user doesn't have to put any characters in the string?
To make the .+ optional, you could do:
\"(?:.+)?\";
(?:..) is called a non-capturing group. It only does the matching operation and it won't capture anything. Adding ? after the non-capturing group makes the whole non-capturing group optional.
Alternatively, you could do:
\".*?\";
.* would match any character zero or more times greedily. Adding ? after the * forces the regex engine to do a shortest possible match.
As an alternative:
\".*\";
Try it here: https://regex101.com/r/hbA01X/1
Related
Suppose I have the following regex that matches a string with a semicolon at the end:
\".+\";
It will match any string except an empty one, like the one below:
"";
I tried using this:
\".+?\";
But that didn't work.
My question is, how can I make the .+ part of the, optional, so the user doesn't have to put any characters in the string?
To make the .+ optional, you could do:
\"(?:.+)?\";
(?:..) is called a non-capturing group. It only does the matching operation and it won't capture anything. Adding ? after the non-capturing group makes the whole non-capturing group optional.
Alternatively, you could do:
\".*?\";
.* would match any character zero or more times greedily. Adding ? after the * forces the regex engine to do a shortest possible match.
As an alternative:
\".*\";
Try it here: https://regex101.com/r/hbA01X/1
Suppose I have the following regex that matches a string with a semicolon at the end:
\".+\";
It will match any string except an empty one, like the one below:
"";
I tried using this:
\".+?\";
But that didn't work.
My question is, how can I make the .+ part of the, optional, so the user doesn't have to put any characters in the string?
To make the .+ optional, you could do:
\"(?:.+)?\";
(?:..) is called a non-capturing group. It only does the matching operation and it won't capture anything. Adding ? after the non-capturing group makes the whole non-capturing group optional.
Alternatively, you could do:
\".*?\";
.* would match any character zero or more times greedily. Adding ? after the * forces the regex engine to do a shortest possible match.
As an alternative:
\".*\";
Try it here: https://regex101.com/r/hbA01X/1
(1[0-9]{2})\s+(\w+(?:-\w+)+)\s+(\w+)\s+(\w+(?:-\w+)+)\s+(\w+)
used to match string
123 FEX-1-80 Online N2K-C2248TP-1GE SSDFDFWFw23r23
How come this works in regexr.com but Python 3.5.1 can't find a match
r'(1[0-9]{2})\s+(\w+(?:-\w+)+)\s+(\w+)\s+(\w+(?:-\w+))'
can match up to
123 FEX-1-80 Online N2K-C2248TP
but the second hyphen - in group(4) is not matched
From what I understand, non-capture group character can appear more than once in the group, what went wrong here?
Just a comment, not really an answer but for the sake of clarity I have put it as an answer.
Being relatively new to regular expressions, one should use the verbose mode. With this, your expression becomes much much more readable:
(1[0-9]{2})\s+ # three digits, the first one needs to be 1
(\w+(?:-\w+)+)\s+ # a word character (wc), followed by - and wcs
(\w+)\s+ # another word
(\w+(?:-\w+)+)\s+ # same expression as above
(\w+) # another word
Also, check if your (second and fourth) expression could be rewritten as [\w-]+ - it is not the same as yours and will match other substrings but try to avoid nested parenthesis in general.
Concerning your question, the second string cannot be matched as you made all of your expressions mandatory (and group 5 is missing in the second example, so it will fail).
See a demo on regex101.com.
This regular expression matches the full input string:
(1[0-9]{2})\s+(\w+(?:-\w+)+)\s+(\w+)\s+(\w+(?:-\w+)+)\s+(\w+)
This one doesn't:
(1[0-9]{2})\s+(\w+(?:-\w+)+)\s+(\w+)\s+(\w+(?:-\w+))
The latter is missing a + after the last non-capturing group, and it's missing the \s+(\w+) at the end that matches the SSDFDFWFw23r23 at the end of the input string.
From what I understand, non-capture group character can appear more than once in the group, what went wrong here?
I'm not sure I follow. A non-capturing group is really just there to group a part of a regular expression.
(?:-\w+) or just -\w+ will both match a hyphen (-) followed by one or more "word" characters (\w+). It doesn't matter whether that regular expression is in a non-capturing group or not. If you want to match repetitions of that pattern, you can use the + modifier after the non-capturing group, e.g. (?:-\w+)+. That pattern will match a string like -foo-bar-baz.
So the reason your second regular expression doesn't match the repeated pattern is because it's lacking the + modifier.
This text
"dhdhd89(dd)"
Matched against this regex
.+?(?:\()
..returns "dhdhd89(".
Why is the start parenthesis included in the capture?
Two different tools, as well as the .NET Regex class, returns the same result. So I gather there is something I don't understand about this.
The way I read my regex is.
Match any character, at least one occurrence. As few as possible.
The matched string should be followed by a start parenthesis, but not to be included in the capture.
I can find workaround, but I still want to know what is going on.
Just turn the non-capturing group to positive lookahead assertion.
.+?(?=\()
.+? non-greedy match of one or more characters followed by an opening parenthesis. Assertions won't match any characters but asserts whether a match is possible or not. But the non-capturing group will do the matching operation.
DEMO
You can just use this negation based regex to capture only text before a literal (:
^([^(]+)
When you use:
.+?(?:\()
Regex engine does match ( after initial text but it just doesn't return that in a captured group to you.
You havn't defined capture groups then I guess you display the whole match (group 0), you can do:
(.+?)(?:\()
and the string you want is in group 1
or use lookahead as #AvinashRaj said.
Suppose I have the following regex that matches a string with a semicolon at the end:
\".+\";
It will match any string except an empty one, like the one below:
"";
I tried using this:
\".+?\";
But that didn't work.
My question is, how can I make the .+ part of the, optional, so the user doesn't have to put any characters in the string?
To make the .+ optional, you could do:
\"(?:.+)?\";
(?:..) is called a non-capturing group. It only does the matching operation and it won't capture anything. Adding ? after the non-capturing group makes the whole non-capturing group optional.
Alternatively, you could do:
\".*?\";
.* would match any character zero or more times greedily. Adding ? after the * forces the regex engine to do a shortest possible match.
As an alternative:
\".*\";
Try it here: https://regex101.com/r/hbA01X/1