What's the difference between regex [-+]? and (-|+)? [duplicate] - regex

This question already has answers here:
Using alternation or character class for single character matching?
(3 answers)
Closed 4 years ago.
What's the difference between regex
[-+]?
and
(-|+)?
Don't they mean the same?

Both match the same characters, but the second form produces a capturing group. You can use a backreference to access the group (\1 or $1, depending on your regular expression engine).
UPDATE
The second form is invalid in many regular expression engines (it is valid only in some older engines that treat that + as a literal character).
This is because + has a special meaning: one or more repetitions of the preceding pattern, and here there is nothing to repeat.
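A minimal sketch of the points above, using Python's re module as one example engine (other engines may behave differently; the variable names are only for illustration). The character class creates no group, the unescaped alternation is rejected, and the escaped alternation captures the sign:

```python
import re

# Character class: matches an optional sign and creates no capturing group.
char_class = re.compile(r"[-+]?")
print(char_class.groups)

# Unescaped + inside the group: Python's re refuses to compile it,
# because the + has nothing to repeat.
try:
    re.compile(r"(-|+)?")
    unescaped_ok = True
except re.error as exc:
    unescaped_ok = False
    print("invalid pattern:", exc)

# Escaped form: valid, and the matched sign is available as group 1.
alternation = re.compile(r"(-|\+)?")
m = alternation.match("+42")
print(m.group(0), m.group(1))
```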

They are the same, but I would prefer the character class (1st form), since the 2nd form captures - or +, which you may not need.
Even this would be equivalent, without capturing the text in a group (note that the + has to be escaped in most engines):
(?:-|\+)?
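To see the capturing difference, here is a small Python sketch. The trailing \d+ and the sample inputs are my own additions so there is something to match; they are not part of the original patterns:

```python
import re

# Three ways to write an optional sign in front of some digits.
capturing = re.compile(r"(-|\+)?\d+")
non_capturing = re.compile(r"(?:-|\+)?\d+")
char_class = re.compile(r"[-+]?\d+")

for text in ["-7", "+7", "7"]:
    # groups() lists what each capturing group matched (None if unused).
    print(
        text,
        capturing.fullmatch(text).groups(),
        non_capturing.fullmatch(text).groups(),
        char_class.fullmatch(text).groups(),
    )
```

All three accept the same inputs; only the first one reports a captured sign.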

Most regexes can be put in the form of alternation groups and the star operator. For example, [ab]+ can be written as (a|b)(a|b)*, but this is much more verbose, so the other operators exist. You included the question mark operator in your regex, but really [-+]? is equivalent to (\+|-|)
So there really is no difference (except for capturing as others have mentioned), but that doesn't mean the other operators aren't useful in making a regex compact and intuitive to understand.
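The equivalences can be spot-checked by brute force. This is only a bounded check over short strings, not a proof, and the helper same_language is a hypothetical name made up for this sketch:

```python
import re
from itertools import product


def same_language(p1, p2, alphabet, max_len):
    """Return True if both patterns accept exactly the same strings
    built from `alphabet`, for every length up to `max_len`."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(re.fullmatch(p1, s)) != bool(re.fullmatch(p2, s)):
                return False
    return True


# [ab]+ versus its alternation-and-star spelling.
plus_equiv = same_language(r"[ab]+", r"(a|b)(a|b)*", "abc", 4)

# [-+]? versus an alternation with an empty branch (+ escaped).
opt_equiv = same_language(r"[-+]?", r"(\+|-|)", "+-1", 3)

print(plus_equiv, opt_equiv)
```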

Related

Regular Expression for anything in Between ${something} [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I am a newbie in regular expressions. I have written a regular expression for ${serviceName}; basically I want to take the word in between ${ }. The regular expression I already wrote for this works perfectly fine:
"\\$\\{(\\w+)\\}"
But I want to take any value between the braces, not only word characters, as in ${serviceName.1.Type}. Can you help me with a regular expression for ${serviceName.1.Type}?
I hope my question is clear.
Thanks In Advance.
A good place to test regular expressions is https://regex101.com/
\w+ matches any word character (equal to [a-zA-Z0-9_])
If you want to match anything, you can replace it with: .*
.* matches any character (except for line terminators), zero or more times
You might want to add a "?" after it so the match stops at the first "}"
*? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed
Also, in many engines you don't need to escape the { } in this case (some, such as Java, do require the escape)
So what you want is:
"\\${(.*?)}"
\$\{([\w.\s]+)\}
This expression captures, as a group, everything that appears between the { }.
You can then refer to the group as $1.
You can try it in an online tester, and if other inputs contain additional characters you can add them to the character class. As written, it allows dots (.), whitespace (\s), and letters, digits and underscores (\w, which already includes \d).
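The two suggestions above (a lazy .*? and an explicit character class) can be compared with a short Python sketch. The sample text is made up for illustration, and the braces are escaped here, which Python accepts either way:

```python
import re

# Hypothetical input with two placeholders.
text = "${serviceName.1.Type} and ${other name}"

# Greedy .* runs on to the last closing brace in the text.
greedy = re.findall(r"\$\{(.*)\}", text)

# Lazy .*? stops at the first closing brace after each ${.
lazy = re.findall(r"\$\{(.*?)\}", text)

# Explicit class: only word characters, dots and whitespace are allowed.
explicit = re.findall(r"\$\{([\w.\s]+)\}", text)

print(greedy)
print(lazy)
print(explicit)

# Inside [...] a ? is a literal question mark, not a quantifier.
question_mark_is_literal = re.fullmatch(r"[\w?]", "?") is not None
print(question_mark_is_literal)
```

The lazy and explicit versions give the same result on this input; the explicit class is stricter and will not match a placeholder containing, say, a hyphen.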

Regular Expression lazy modifier matches too much [duplicate]

This question already has answers here:
Regular expressions: Ensuring b doesn't come between a and c
(4 answers)
Closed 4 years ago.
The following regular expression is jumping [url] tags...
Regular Expression (generic regular expression)
(?:\[url.*?\])(.*?youtu.*?)(?:\[\/url\])
String:
[url]blahyoutubeblah[/url] heyya [url]blahblah[/url] [url]www.youtube.com/blah[/url]
Help!!
Your captured group requires youtu inside, so the substring
[url]blahblah[/url] [url]www.youtube.com/blah[/url]
matches, because it starts with [url], includes youtu, and ends with [/url].
Simply using a negated character set, excluding [, probably isn't enough, because that wouldn't allow for nested tags to match, such as an input of
[url]foobar youtube[b]BOLD TEXT[/b][/url]
You might require negative lookahead for [/url] right before each repeated character:
(?:(?!\[\/url\]).)*
Also, make sure that whatever comes after the [url does not contain ]s before coming to the true ], with:
\[url[^]]*\]
In full:
\[url[^]]*\]((?:(?!\[\/url\]).)*youtu(?:(?!\[\/url\]).)*)\[\/url\]
There's no need to make the quantifiers lazy anymore, because of the negative lookahead.
Demo:
https://regex101.com/r/hSAJEp/1
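For anyone who prefers to check this locally, here is a sketch in Python comparing the question's pattern with the lookahead-based one on the question's own string (Python's re accepts both patterns as written, as far as I can tell):

```python
import re

text = ("[url]blahyoutubeblah[/url] heyya [url]blahblah[/url] "
        "[url]www.youtube.com/blah[/url]")

# Pattern from the question: lazy, but free to run across tag boundaries.
original = re.compile(r"(?:\[url.*?\])(.*?youtu.*?)(?:\[\/url\])")

# Each repeated character is guarded by a negative lookahead for [/url],
# so the captured text can never contain a closing tag.
tempered = re.compile(
    r"\[url[^]]*\]((?:(?!\[\/url\]).)*youtu(?:(?!\[\/url\]).)*)\[\/url\]"
)

original_hits = original.findall(text)
tempered_hits = tempered.findall(text)

print(original_hits)
print(tempered_hits)
```

The second result of the original pattern swallows the blahblah tag, while the lookahead version returns only the two youtube entries.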
You are matching .*?, which can match anything, so the match starts at a [url], runs on until it finds youtu (even inside a later tag), and then finds [/url].
A simple workaround is to not allow an opening [ bracket before youtu is found:
(?:\[url.*?\])([^\[]*?youtu.*?)(?:\[\/url\])
The problem is that your regex requires youtu, but the middle tag only has blahblah between [url] and [/url]. To match every tag, make the pattern generic by dropping youtu:
(?:\[url.*?\])(.*?)(?:\[\/url\])
The quantifier is lazy, but the regex will still match if it can: laziness does not move the left border of the match when a match from that position is possible. There are other ways to get the result you want. One of them is to prevent the unwanted match in the regex itself, for example with
(?:\[url[^\]]*?\])([^\[]*?youtu.*?)(?:\[\/url\])
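A quick Python check of the negated-class pattern above against the question's string. One caveat, as far as I can tell: because [^\[] forbids any [ before youtu, a tag nested before youtu inside the [url] prevents a match. The nested sample string is my own, not from the question:

```python
import re

text = ("[url]blahyoutubeblah[/url] heyya [url]blahblah[/url] "
        "[url]www.youtube.com/blah[/url]")

# No [ may appear between the opening tag and youtu, so the match
# cannot start in one [url] and pick up youtu from a later one.
negated = re.compile(r"(?:\[url[^\]]*?\])([^\[]*?youtu.*?)(?:\[\/url\])")

hits = negated.findall(text)
print(hits)

# Hypothetical input with a nested tag before youtu: no match is found.
nested_hits = negated.findall("[url][b]x[/b] youtube[/url]")
print(nested_hits)
```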

Regex substitution: find double quotes not following by specific character [duplicate]

This question already has an answer here:
Regex Match a character which is not followed by another specific character
(1 answer)
Closed 4 years ago.
I have the following situation:
3" a
3":a
3",a
3"a
3"2
3"A
I need to find and replace a double quote with a space every time the double quote is not followed by : or ,.
So, for my case the expected results will be:
3 a
3":a
3",a
3 a
3 2
3 A
Any idea how to write this logic using regex?
Regards,
You can use a negative lookahead A(?!B) for that. It matches an expression A that is not followed by expression B.
The replacement of the matches with spaces will depend on the used language.
"(?![:,])
Applied to your examples: https://regex101.com/r/UiPlaC/2
If you want to handle the case 3" a without having multiple spaces, just include one (or even more?) optional spaces in the match.
"(?![:,])\ ?
See here for more information:
Regex lookahead, lookbehind and atomic groups
https://www.regular-expressions.info/lookaround.html
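Since the replacement step depends on the language, here is one possible version in Python using re.sub, run on the six inputs from the question. It uses the variant with the optional trailing space:

```python
import re

lines = ['3" a', '3":a', '3",a', '3"a', '3"2', '3"A']

# A double quote not followed by : or , plus one optional space,
# so that 3" a does not end up with two spaces.
pattern = re.compile(r'"(?![:,]) ?')

result = [pattern.sub(" ", line) for line in lines]

for before, after in zip(lines, result):
    print(repr(before), "->", repr(after))
```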

Regex negated character disjunction [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
Very quick and simple question.
Consider the vector of character strings ("AvAv", "AvAvAv")
Why does the pattern (Av)\1([^A]|$) match both strings?
The pattern says: have an instance of "Av", have another, then either have a character that is not an "A" or else come to the end. The first string clearly matches; I do not see how the latter does. It has two copies of "Av", but then it fails to end (missing the second disjunct) and fails to be followed by a character other than "A" (missing the first disjunct), so how does the pattern successfully match it?
Thank you so much for your time and assistance. It is greatly appreciated.
Here is an explanation:
AvAv - matches (Av)\1$
In this case, we can match Av, followed by that captured quantity, followed by $ from the alternation. In the case of AvAvAv we also have a match:
AvAvAv - again matches (Av)\1$
  ^^^^ last four letters match
It is the same logic here, except that in order to match, we have to skip the first Av.
If the pattern were ^(Av)\1([^A]|$) then only AvAv would be a match.
A RegEx only needs to match a part of the string to be considered "a match".
In other words, for the second example your RegEx matches the last four characters (AvAv) of AvAvAv.
If you don't want it to match the second one, use a caret ^
^(Av)\1([^A]|$)
In this way the second one won't be matched.
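The same thing can be seen in Python, where re.search looks for a match anywhere in the string. This is only a sketch; the question appears to use a different language, but the matching logic is the same:

```python
import re

strings = ["AvAv", "AvAvAv"]

unanchored = re.compile(r"(Av)\1([^A]|$)")
anchored = re.compile(r"^(Av)\1([^A]|$)")

# span() shows which part of the string the unanchored pattern matched.
spans = {s: unanchored.search(s).span() for s in strings}

# With ^ the match must begin at the start of the string.
anchored_ok = {s: anchored.search(s) is not None for s in strings}

print(spans)
print(anchored_ok)
```

For AvAvAv the unanchored match starts at index 2, which is the skipped first Av described above.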

Regular Expressions - What is the difference between .* and (.*)? [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 8 years ago.
What is the difference between .* and (.*) in regular expressions?
From what I've seen,
AB.*DE
and
AB(.*)DE
appear to match the same things but I want to know if there are any differences so I use the correct one.
I need to be able to match any number of characters between AB and DE and even match if there isn't anything between them (ABDE).
If .* and (.*) mean the same thing, is there a "better" one to use in terms of standards/best practice?
.* matches any character zero or more times.
(.*) - the matched characters are stored in a group for later back-referencing (any characters matched within () are captured).
AB.DE Matches the string ABanycharDE. Dot represent any character except newline character.
AB(.)DE AB and DE are matched and the in-between character is captured.
The parentheses indicate a capture group.
There is no difference in what they match: both match any character zero or more times. The parentheses additionally create a capture group, which lets you group parts of your pattern together. Some find that this makes a regular expression more readable, just as parentheses in a math equation do; if you never use the captured text, plain .* is enough.
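A short Python sketch of the difference: both patterns accept the same strings, but only the parenthesized one lets you read back what was between AB and DE. The sample inputs are made up for illustration:

```python
import re

plain = re.compile(r"AB.*DE")
grouped = re.compile(r"AB(.*)DE")

samples = ["ABxyzDE", "ABDE", "ABxy"]

# Both patterns agree on which strings match.
plain_ok = [plain.search(s) is not None for s in samples]
grouped_ok = [grouped.search(s) is not None for s in samples]

# Only the grouped pattern exposes the text in between as group 1.
between = [m.group(1) for s in samples if (m := grouped.search(s))]

print(plain_ok, grouped_ok)
print(between)
```

Note that ABDE matches with an empty string captured, which covers the "nothing in between" case from the question.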