Regular Expressions - What is the difference between .* and (.*)? [duplicate] - regex

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 8 years ago.
What is the difference between .* and (.*) in regular expressions?
From what I've seen,
AB.*DE
and
AB(.*)DE
appear to match the same things but I want to know if there are any differences so I use the correct one.
I need to be able to match any number of characters between AB and DE and even match if there isn't anything between them (ABDE).
If .* and (.*) mean the same thing, is there a "better" one to use in terms of standards/best practice?

.* matches any character zero or more times.
(.*) matches the same text, but the matched characters are stored in a group for later back-referencing (whatever the pattern inside () matches is captured).
AB.DE matches strings of the form AB<any character>DE. The dot represents any single character except a newline.
AB(.)DE matches the same strings; AB and DE are matched and the in-between character is captured.
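To make the difference concrete, here is a minimal sketch in Python (the language is an assumption; the question does not name one). The sample strings are made up for illustration. Both patterns accept the same inputs, including ABDE, and only the parenthesised one exposes the text between AB and DE.

```python
import re

samples = ["ABDE", "ABxyzDE", "ABxDE"]

plain = re.compile(r"AB.*DE")      # no capture group
grouped = re.compile(r"AB(.*)DE")  # same match, middle part captured

for text in samples:
    m_plain = plain.fullmatch(text)
    m_grouped = grouped.fullmatch(text)
    # Both patterns accept exactly the same strings...
    assert (m_plain is None) == (m_grouped is None)
    # ...but only the grouped one exposes the text between AB and DE.
    print(repr(text), "->", repr(m_grouped.group(1)))

# Text captured between AB and DE for each sample.
captured = [grouped.fullmatch(t).group(1) for t in samples]
# The plain pattern has no groups at all.
plain_groups = plain.fullmatch("ABDE").groups()
```

If you never read the captured text, the plain form does the same job.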

The parentheses indicate a capture group.

There is no difference in what they match. Both will match any character zero or more times. The only difference is that the parentheses create a capture group, which also lets you group parts of your pattern together, much as parentheses group terms in a math equation. Use the group when you need the captured text or the grouping; otherwise the plain form is enough.

Related

Regular Expression for anything in Between ${something} [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I am a newbie with regular expressions. I have written a regular expression for ${serviceName}; basically I want to take the word between ${ and }. The expression I wrote for this works perfectly fine:
"\\$\\{(\\w+)\\}"
But now I want to take any value between the braces, not only word characters, for example ${serviceName.1.Type}. Can you help me with a regular expression for ${serviceName.1.Type}?
I hope my question is clear.
Thanks In Advance.
A good place to test regular expressions is https://regex101.com/
\w+ matches any word character (equal to [a-zA-Z0-9_])
If you want to match anything you can replace it with: .*
.* matches any character (except for line terminators)
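You might want to add a "?" after the "*" so that the match stops at the first "}"
*? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed
Also, in many engines (PCRE and Python, for example) you don't need to escape the { } in this case. Java's java.util.regex is stricter and rejects an unescaped { here with "Illegal repetition", so keep the escapes there.
So what you want is:
"\\${(.*?)}"
or, with the braces still escaped:
"\\$\\{(.*?)\\}"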
\$\{([\w?\.?\d?\s?]+)\}
This expression captures as a group everything that appears between { and }.
You can then refer to the group with $1.
If other expressions contain additional characters, you can add them to the character class. As written it allows dots (\.), whitespace (\s), letters (\w) and digits (\d). Note that inside a character class the ? is a literal question mark rather than a quantifier, and \d is already covered by \w, so the class is equivalent to [\w.\s?].
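The patterns from the question and the answers above can be compared side by side. This is a rough sketch in Python (an assumption; the doubled backslashes in the question suggest a Java-style string literal), and the sample text is invented for illustration. It uses the fully escaped forms, which Python accepts as well.

```python
import re

# Hypothetical input with two placeholders on one line.
text = "call ${serviceName.1.Type} then ${other}"

# Original pattern: \w+ cannot cross the dots, so the first placeholder is skipped.
word_only = re.findall(r"\$\{(\w+)\}", text)

# Greedy .* runs on to the LAST closing brace on the line.
greedy = re.findall(r"\$\{(.*)\}", text)

# Lazy .*? stops at the FIRST closing brace.
lazy = re.findall(r"\$\{(.*?)\}", text)

# Character-class version from the second answer.
char_class = re.findall(r"\$\{([\w?\.?\d?\s?]+)\}", text)

# Inside [...] the ? is a literal, so this placeholder is accepted too.
accepts_question_mark = re.fullmatch(r"\$\{([\w?\.?\d?\s?]+)\}", "${what?}") is not None

print(word_only, greedy, lazy, char_class, accepts_question_mark)
```

The lazy version and the character-class version return the same two placeholders here; they differ only on characters outside the class, such as a hyphen.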

Regular Expression lazy modifier matches too much [duplicate]

This question already has answers here:
Regular expressions: Ensuring b doesn't come between a and c
(4 answers)
Closed 4 years ago.
The following regular expression is jumping [url] tags...
Regular Expression (generic regular expression)
(?:\[url.*?\])(.*?youtu.*?)(?:\[\/url\])
String:
[url]blahyoutubeblah[/url] heyya [url]blahblah[/url] [url]www.youtube.com/blah[/url]
Help!!
Your captured group requires youtu inside, so the substring
[url]blahblah[/url] [url]www.youtube.com/blah[/url]
matches, because it starts with [url], includes youtu, and ends with [/url].
Simply using a negated character set, excluding [, probably isn't enough, because that wouldn't allow for nested tags to match, such as an input of
[url]foobar youtube[b]BOLD TEXT[/b][/url]
You might require negative lookahead for [/url] right before each repeated character:
(?:(?!\[\/url\]).)*
Also, make sure that whatever comes after the [url does not contain ]s before coming to the true ], with:
\[url[^]]*\]
In full:
\[url[^]]*\]((?:(?!\[\/url\]).)*youtu(?:(?!\[\/url\]).)*)\[\/url\]
There's no need to make the quantifiers lazy anymore, because of the negative lookahead.
Demo:
https://regex101.com/r/hSAJEp/1
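For anyone who wants to check this outside regex101, here is a small sketch in Python (an assumption; the question only says "generic regular expression"). It runs the original pattern and the lookahead version against the string from the question, plus the nested-tag input mentioned above.

```python
import re

text = ("[url]blahyoutubeblah[/url] heyya [url]blahblah[/url] "
        "[url]www.youtube.com/blah[/url]")

# Pattern from the question: the lazy .*? may still cross a [/url] to reach youtu.
original = re.compile(r"(?:\[url.*?\])(.*?youtu.*?)(?:\[\/url\])")

# Pattern from this answer: each character is consumed only if [/url] does not start there.
tempered = re.compile(
    r"\[url[^]]*\]"
    r"((?:(?!\[\/url\]).)*youtu(?:(?!\[\/url\]).)*)"
    r"\[\/url\]"
)

original_hits = original.findall(text)
tempered_hits = tempered.findall(text)

# Nested tags inside the [url] block still match with the lookahead version.
nested = tempered.findall("[url]foobar youtube[b]BOLD TEXT[/b][/url]")

print(original_hits)
print(tempered_hits)
print(nested)
```

The second hit of the original pattern spans two [url] blocks, which is the "jumping" described in the question; the lookahead version returns only the two blocks that really contain youtu.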
Your .* can match anything, so the regex matches [url], runs on until it finds youtu, and only then looks for [/url].
A simple workaround is to stop the capture from crossing an opening [ bracket before it finds youtu:
(?:\[url.*?\])([^\[]*?youtu.*?)(?:\[\/url\])
The problem is that your regex requires youtu, but the second [url] block contains only blahblah, so the match runs on into the next block. If you want every [url] block, drop youtu and make the capture generic:
(?:\[url.*?\])(.*?)(?:\[\/url\])
It's lazy, but a lazy quantifier will still match if it can: the engine will not move the left border of the match while a match from the current start is possible. There are several ways to deal with that. One of them is to prevent the unwanted match in the regex itself:
(?:\[url[^\]]*?\])([^\[]*?youtu.*?)(?:\[\/url\])
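A quick check of this negated-character-class pattern, sketched in Python (an assumption, as the question does not name an engine). The two nested inputs are hypothetical and show the limitation: a [ that appears before youtu inside the block prevents the match.

```python
import re

text = ("[url]blahyoutubeblah[/url] heyya [url]blahblah[/url] "
        "[url]www.youtube.com/blah[/url]")

negated = re.compile(r"(?:\[url[^\]]*?\])([^\[]*?youtu.*?)(?:\[\/url\])")

# The middle block has no youtu before its [/url], so it is skipped.
hits = negated.findall(text)

# A nested tag AFTER youtu is fine, because only the part before youtu excludes "[".
nested_after = negated.findall("[url]youtube[b]BOLD[/b][/url]")

# A nested tag BEFORE youtu blocks the match.
nested_before = negated.findall("[url][b]BOLD[/b] youtube[/url]")

print(hits, nested_after, nested_before)
```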

Can someone please explain what exactly is happening in the 3rd line of this program [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 4 years ago.
The code is presented below:
import re
line = "dogs are better than humans"
matchObj = re.match( r'(.*) are (.*?) .*', line)
if matchObj:
    print ("matchObj.group() : ", matchObj.group())
(.*): matches and captures any character (except new lines) any number of times. This may be zero times. . denotes "any character" and * signifies repetition. The parentheses are used to denote capture groups (explained below).
are: literal string " are "
(.*?): same as (.*) except it tries to match as few characters as possible (non-greedy), so it stops as soon as the rest of the pattern can match. Here the next thing in the pattern is a space, so the group stops at the first space. Without the non-greedy symbol (?), the group would run on to the last space in the string instead.
.*: any character, any number of times.
Capture groups, or captures for short, are portions of the entire match. Wrapping part of your regex in parentheses allows you to easily retrieve that portion of your match.
(dogs) are (better) than humans
(.*)   are  (.*?)     .*
In your example, dogs and better would be captured. These are also referred to as "groups". In regular expressions, they are marked by a pair of parentheses.
Play around with the regex here. Hover on the match to see which portions of the expression are captured.
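You can also see the groups directly in Python. The sketch below reuses the line and pattern from the question and adds a greedy variant (not in the original code) purely for comparison.

```python
import re

line = "dogs are better than humans"

# Pattern from the question: second group is non-greedy.
lazy = re.match(r"(.*) are (.*?) .*", line)

# Same pattern with a greedy second group, for comparison only.
greedy = re.match(r"(.*) are (.*) .*", line)

whole = lazy.group()            # the entire match
lazy_groups = lazy.groups()     # contents of the two capture groups
greedy_groups = greedy.groups()

print(whole)
print(lazy_groups)
print(greedy_groups)
```

With the non-greedy group the second capture is "better"; with the greedy one it grows to "better than", because the group gives back only as much as the trailing space and .* require.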

Does not match when the string does not have a dot but it will match multiple dots [duplicate]

This question already has answers here:
Regex to allow alphanumeric and dot
(3 answers)
Closed 4 years ago.
I am trying to match a string that contains zero or more dots. The regex I have matches strings with one or more dots but not strings with no dot.
(\w*)((\w*\.)+\w*)
These are the test string I am using
dial.check.Catch.Url
dial.check.Catch.Url.Dial.check.Catch.Url
32443.324342.23423424.23.423.423.42.34.234.32.4..2..2.342.4
234dfasfd2aa4234234.234aa341.4.123daaadf.df.af....
12fd.dafd
.
abc
The Regex will match these
dial.check.Catch.Url
dial.check.Catch.Url.Dial.check.Catch.Url
32443.324342.23423424.23.423.423.42.34.234.32.4..2..2.342.4
234dfasfd2aa4234234.234aa341.4.123daaadf.df.af....
12fd.dafd
.
But not this one:
abc
https://regexr.com/?38ed7
If you really must use a regex, here is one (but it is inefficient):
/^(?![^.]*\.[^.]*$).*$/
It says:
Match a string so that the beginning of the string is not followed by a whole string with a single dot.
It does some backtracking when parsing the negative lookahead.
As mentioned in the comments to the question, I do think, unless you must have a regex, that a simple function might be better. But if you like the conciseness of a regex and performance is not a huge concern, you can go with the one I gave above. Regexes with "nots" in them are generally a tad messy, but once you understand lookarounds they do become doable. Cheers.
/\..*\.|^[^.]*$/
Or, in plain English:
Match EITHER a dot, then any number of characters, then another dot; OR the beginning of the string, then any number of non-dots, then the end of the string.
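Here is a rough Python sketch (the language is an assumption) that runs both patterns from the answers above over a few of the test strings, plus two extra inputs added for illustration. Note that both patterns treat "zero or multiple dots" strictly, so a string with exactly one dot is rejected.

```python
import re

lookahead = re.compile(r"^(?![^.]*\.[^.]*$).*$")
alternation = re.compile(r"\..*\.|^[^.]*$")

# "a.." and "" are extra inputs, not taken from the question.
samples = ["abc", "12fd.dafd", "dial.check.Catch.Url", ".", "a..", ""]

table = {}
for s in samples:
    by_lookahead = lookahead.search(s) is not None
    by_alternation = alternation.search(s) is not None
    table[s] = (by_lookahead, by_alternation)
    print(repr(s), by_lookahead, by_alternation)

# True when the two patterns give the same verdict on every sample.
patterns_agree = all(a == b for a, b in table.values())
```

If a single dot should also be accepted, as the test list in the question suggests, neither pattern is what you want as written.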

What's the difference between regex [-+]? and (-|+)? [duplicate]

This question already has answers here:
Using alternation or character class for single character matching?
(3 answers)
Closed 4 years ago.
What's the difference between regex
[-+]?
and
(-|+)?
Don't they mean the same?
Both match the same characters, but the second form produces a capturing group. You can use a backreference to access the group (\1 or $1, depending on your regular expression engine).
UPDATE
The second form is invalid in many regular expression engines (it is valid only in some older engines that treat a leading + as a literal).
That is because + has a special meaning (one or more repetitions of the preceding pattern), and here there is nothing to repeat.
They are the same, but I would prefer the character class (1st form), since the 2nd form captures the - or +, which you may not need.
Even this will be equivalent, without capturing the text in a group (with the + escaped so that it is a literal):
(?:-|\+)?
Most regexes can be put in the form of alternation groups and the star operator (for example, [ab]+ can be written as (a|b)(a|b)*), but this is much more verbose, so the other operators exist. You included the question mark operator in your regex, but really [-+]? is equivalent to (\+|-|)
So there really is no difference (except for capturing as others have mentioned), but that doesn't mean the other operators aren't useful in making a regex compact and intuitive to understand.
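A short Python sketch of the points above (Python is an assumption; other engines may behave differently on the unescaped form). The trailing \d+ is added only so that there is something to match after the optional sign.

```python
import re

char_class = re.compile(r"[-+]?\d+")
alternation = re.compile(r"(-|\+)?\d+")  # + escaped so it is a literal

# In Python's re, the unescaped form from the question does not compile.
try:
    re.compile(r"(-|+)?")
    unescaped_compiles = True
except re.error:
    unescaped_compiles = False

samples = ["-5", "+5", "5"]

# Both forms accept the same strings...
same_matches = all(
    (char_class.fullmatch(s) is not None) == (alternation.fullmatch(s) is not None)
    for s in samples
)

# ...but only the alternation form captures the sign (None when it is absent).
signs = [alternation.fullmatch(s).group(1) for s in samples]

print(unescaped_compiles, same_matches, signs)
```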