Suppose I have the following regex that matches a string with a semicolon at the end:
\".+\";
It will match any string except an empty one, like the one below:
"";
I tried using this:
\".+?\";
But that didn't work.
My question is, how can I make the .+ part of the regex optional, so the user doesn't have to put any characters in the string?
To make the .+ optional, you could do:
\"(?:.+)?\";
(?:...) is called a non-capturing group. It groups part of the pattern for matching but does not capture what it matches. Adding ? after the non-capturing group makes the whole group optional.
Alternatively, you could do:
\".*?\";
.* matches any character (by default, anything except a newline) zero or more times, greedily. Adding ? after the * makes the quantifier lazy, so the regex engine tries the shortest possible match first.
As an alternative:
\".*\";
Try it here: https://regex101.com/r/hbA01X/1
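To see the difference between these variants side by side, here is a minimal sketch in Python. Python's re module is only an assumption for illustration (the question does not name an engine), and the sample inputs are made up.

```python
import re

# Patterns taken from the question and the answers above.
patterns = {
    "plus": r'".+";',                 # original: needs at least one character
    "lazy_plus": r'".+?";',           # lazy, but still needs at least one character
    "optional_group": r'"(?:.+)?";',  # the whole .+ group is optional
    "lazy_star": r'".*?";',           # zero or more, shortest match first
    "star": r'".*";',                 # zero or more, greedy
}

# Hypothetical sample inputs: an empty string literal and a non-empty one.
samples = ['"";', '"hello";']

# For each pattern, record whether it matches each sample in full.
results = {
    name: [re.fullmatch(pattern, s) is not None for s in samples]
    for name, pattern in patterns.items()
}

for name, matched in results.items():
    print(name, matched)
```

The lazy quantifier in .+? only changes how much is consumed, not the minimum of one character, which is why it still rejects the empty string.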
My problem is simple, but I've been pulling my hair out trying to solve it. I have two types of strings: one has a semicolon and the other doesn't. Both have colons.
Reason: A chosen reason
Delete: Other: testing
Reason for action: Other; testing
Blah: Other; testing;testing
If the string has a semicolon, I want to match anything after the first one. If it has no semicolon, I want to match everything after the first colon. For lines above I should get:
A chosen reason
Other: testing
testing
testing;testing
I can get the semicolon to match by using ;(.*) and I can get the colon to match by using :(.*).
I tried using an alternative like this: ;(.*)|:(.*) thinking that maybe if I have the right order I can get it to match the semicolon first, and then the colon if there is no semicolon, but it always just matched the colon.
What am I doing wrong?
Edit
I added another test case above to match the requirements I had stated. For strings with no semicolon, it should match the first colon.
Also, "Reason" could be anything, so I am clarifying that as well in the test cases.
Second Edit
To clarify, I'm using POSIX regular expressions (in PostgreSQL).
My guess is that you might want to design an expression, maybe similar to:
:\s*(?:[^;\r\n]*;)?\s*(.*)$
Demo
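As a rough check, the sketch below runs both the alternation from the question and the expression above against the four sample lines. It uses Python's re module as a stand-in, which is an assumption: PostgreSQL's regex engine may differ in details. It also shows why ;(.*)|:(.*) always picked the colon: the engine returns the leftmost match in the string, and the colon comes before the semicolon.

```python
import re

lines = [
    "Reason: A chosen reason",
    "Delete: Other: testing",
    "Reason for action: Other; testing",
    "Blah: Other; testing;testing",
]

# The alternation from the question: the leftmost match wins, and the
# colon appears earlier in the line than the semicolon.
naive = re.compile(r";(.*)|:(.*)")
naive_match = naive.search("Reason for action: Other; testing")
print(repr(naive_match.group(0)))

# The expression suggested above: match the first colon, optionally skip
# ahead past the first semicolon, then capture the rest of the line.
suggested = re.compile(r":\s*(?:[^;\r\n]*;)?\s*(.*)$")
extracted = [suggested.search(line).group(1) for line in lines]

for line, value in zip(lines, extracted):
    print(f"{line!r} -> {value!r}")
```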
Here you have a fast regex (233 steps) with no look aheads.
.*?:\s*(?:([^\n;]+)|.*?;\s*(.*))$
Check out the regex https://regex101.com/r/9gbpjW/3
UPDATED: to match any placeholder instead of just Reason.
One option is to use an alternation that first checks whether the string contains no ; at all. If there is none, match up to the first : and capture the rest in group 1.
If there is a ; then match up to the first semicolon and capture the rest in group 1.
For the logic stated in the question:
If the string has a semicolon, I want to match anything after the first one.
If it has no semicolon, I want to match everything after the first colon
You could use:
^(?:(?!.*;)[^\r\n:]*:|[^;\r\n]*;)[ \t]*(.*)$
Explanation
^ Start of string
(?: Non capturing group
(?!.*;) Negative lookahead (supported by PostgreSQL), assert that the string does not contain ;
[^\r\n:]*: If that is the case, match 0+ times not : or a newline, then match :
| Or
[^;\r\n]*; Match 0+ times not ; or newline, then match ;
) Close non capturing group
[ \t]* Match 0+ spaces or tabs
(.*) Capturing group 1, match any char 0+ times
$ End of string
Regex demo | Postgresql demo
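The pattern above can be tried against the lines from the question with a short sketch like the following. Python's re module is used here only for illustration and is an assumption; the question targets PostgreSQL, whose engine may behave differently in edge cases.

```python
import re

# The lookahead-based pattern from the answer above.
pattern = re.compile(r"^(?:(?!.*;)[^\r\n:]*:|[^;\r\n]*;)[ \t]*(.*)$")

lines = [
    "Reason: A chosen reason",
    "Delete: Other: testing",
    "Reason for action: Other; testing",
    "Blah: Other; testing;testing",
]

# Group 1 holds the text after the first semicolon if there is one,
# otherwise the text after the first colon.
captured = [pattern.match(line).group(1) for line in lines]

for line, value in zip(lines, captured):
    print(f"{line!r} -> {value!r}")
```

Because both branches of the alternation feed the same capturing group, the caller does not need to check which branch matched.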
regex = .*?:(?(?!.*;)(.*)|.*?;(.*))
demo
This text
"dhdhd89(dd)"
Matched against this regex
.+?(?:\()
..returns "dhdhd89(".
Why is the start parenthesis included in the capture?
Two different tools, as well as the .NET Regex class, return the same result, so I gather there is something I don't understand about this.
The way I read my regex is:
Match any character, at least one occurrence, as few as possible.
The matched string should be followed by a start parenthesis, which should not be included in the capture.
I can find a workaround, but I still want to know what is going on.
Just turn the non-capturing group into a positive lookahead assertion.
.+?(?=\()
.+? is a non-greedy match of one or more characters, and the lookahead requires that an opening parenthesis follows. Assertions don't consume any characters; they only assert whether a match is possible at that position. A non-capturing group, on the other hand, does perform the match, so the parenthesis becomes part of the overall result.
DEMO
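A small sketch of the difference, using Python's re module as an assumed stand-in for the .NET Regex class mentioned in the question (both support non-capturing groups and lookaheads):

```python
import re

text = "dhdhd89(dd)"

# A non-capturing group still consumes the parenthesis, so it is part
# of the overall match (group 0).
with_group = re.search(r".+?(?:\()", text).group(0)

# A lookahead only checks that the parenthesis follows; it is not consumed.
with_lookahead = re.search(r".+?(?=\()", text).group(0)

print(repr(with_group))
print(repr(with_lookahead))
```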
You can just use this negation based regex to capture only text before a literal (:
^([^(]+)
When you use:
.+?(?:\()
The regex engine does match the ( after the initial text; it just doesn't return it to you in a captured group.
You haven't defined any capture groups, so I guess you are displaying the whole match (group 0). You can do:
(.+?)(?:\()
and the string you want is in group 1,
or use a lookahead as @AvinashRaj said.
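For comparison, here is a minimal sketch of the two capture-group approaches from these answers, again assuming Python's re module purely for illustration:

```python
import re

text = "dhdhd89(dd)"

# Explicit capturing group: group 0 is the whole match, group 1 is only
# the part before the parenthesis.
m = re.search(r"(.+?)(?:\()", text)
whole_match = m.group(0)
before_paren = m.group(1)

# Negated character class: capture everything up to, but not including,
# the first literal "(".
negated = re.search(r"^([^(]+)", text).group(1)

print(repr(whole_match), repr(before_paren), repr(negated))
```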