Regex to get domain from email [duplicate] - regex

This question already has answers here:
Regex get domain name from email
(9 answers)
Closed 4 years ago.
I am using the regex below to get the domain from an email address:
re.findall('@(.+?)',str), which does not give me the intended result.
I have since found a regex that works: re.findall('@(\w.+)',str).
Please explain the difference between the two patterns.

The main difference is the way it's matching the actual domain.
.+?
. Matches any non-newline character.
+ Matches the previous element (.) one or more times.
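? In this case, as it's after a repeater, it makes it "lazy." Without this, it would be "greedy", matching as many times as possible. When a repeater is "lazy" it matches as few times as possible. Since nothing follows .+? in your pattern, the fewest possible is a single character, so only the first character of the domain is captured.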
\w.+
\w Matches any "word character" (generally upper- and lower-case letters, digits, and underscores).
. Matches any non-newline character.
+ Repeats the previous element (.) one or more times. And because there is no ?, it will match as many times as possible.
That should outline the differences between the two. If you add examples that you want to match or not match to the original post, I can explain further why one pattern works and the other doesn't for those cases.
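To make the difference concrete, here is a minimal Python sketch. It assumes the separator is the usual @ of an email address, and the sample address is hypothetical, not taken from the question.

```python
import re

email = "user@example.com"  # hypothetical sample address

# Lazy: .+? is satisfied by a single character, and nothing follows it
# in the pattern, so the match stops right after that character.
lazy = re.findall(r"@(.+?)", email)

# Greedy: \w takes one word character, then .+ runs to the end of the line.
greedy = re.findall(r"@(\w.+)", email)

print(lazy)    # ['e']
print(greedy)  # ['example.com']
```

Note that the greedy pattern captures everything up to the end of the line, so it only returns a clean domain when the address is the last thing on that line.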

Related

Extract String Between Slashes, But Second Slash May Not Exists [duplicate]

This question already has answers here:
In regex, match either the end of the string or a specific character
(2 answers)
Closed 2 years ago.
I'm trying to figure out how to extract usernames from a URL that's captured in a form. I do have the below regex, but the issue is that the second forward slash may not exist. Here are the examples:
Sample URLs
https://test.site.com/u/username
https://test.site.com/u/username/pref/summary
I'm trying to extract the username.
Current Regex
/u/(.*?)/
The current one I have above successfully extracts the username, but only when there is another / after the username. The second / needs to be optional; it may or may not be there, and there may or may not be more after that.
I just couldn't find the correct regex to make the second / optional (using ? at the end didn't help) but not exactly "optional," if that makes sense.
Thanks in advance!
/u/([^/]*) will match as many non-/ characters after /u/ as possible. It will not match pref and summary, because [^/] matches any character other than /, so [^/]* matches a string (as long as possible) of characters other than /.
Consider: if your pattern is B[aeiou]* and your input is Beetles (or Beethoven), it will match only Bee, stopping at (before) the first character that isn’t a vowel. Similarly, [^/]* stops at (before) the first occurrence of /.
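As a quick check, the following Python sketch runs the negated character class against the two sample URLs from the question, and the vowel example against Beetles. The variable names are only illustrative.

```python
import re

urls = [
    "https://test.site.com/u/username",
    "https://test.site.com/u/username/pref/summary",
]

# [^/]* consumes characters until it reaches a / or the end of the string,
# so the trailing slash does not need to exist.
usernames = [re.search(r"/u/([^/]*)", url).group(1) for url in urls]
print(usernames)  # ['username', 'username']

# The same idea with a positive class: stop at the first non-vowel.
vowels = re.search(r"B[aeiou]*", "Beetles").group(0)
print(vowels)  # Bee
```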

Multiple possible matches for regex in Perl [duplicate]

This question already has answers here:
What do 'lazy' and 'greedy' mean in the context of regular expressions?
(13 answers)
Closed 3 years ago.
I'm new to Perl and am working with regular expressions. I can't work out how Perl resolves the ambiguity when a regex could match a given string in more than one way. For example:
('hellohellohello' =~ m/h.*o/)
This could match 'hello', 'hellohello' or 'hellohellohello'. Which one will it choose: the shortest or the longest match? What if we want the opposite behavior (for example, if the default is the shortest match, how do we find the longest)?
If the answer to the first question is the longest, consider
('hello
hellohello' =~ m/h.*o/)
Here, it could match on the first line (before the newline character) or on the second line (after the newline character): first versus longest match. Which one will it use?
What is the complete set of rules that decides which substring of a string will match a given regex (including cases other than the ones in these examples where multiple matches are possible)?
* is greedy, so it tries to match the longest possible string, so long as the rest of the pattern can still be matched. So it will match hellohellohello.
If you use *? instead, that makes it non-greedy, and it will match the shortest possible string, again as long as the rest of the pattern matches. So m/h.*?o/ will match hello.
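The sketch below illustrates greedy and lazy matching in Python rather than Perl. For these particular patterns the two engines behave the same way: the match starts at the leftmost possible position, and . does not match a newline by default. That also covers the two-line example from the question, where the greedy match cannot run past the newline.

```python
import re

text = "hellohellohello"

# Greedy: .* grabs as much as it can while still letting the final o match.
greedy = re.search(r"h.*o", text).group(0)

# Lazy: .*? grabs as little as it can.
lazy = re.search(r"h.*?o", text).group(0)

# The match starts at the leftmost h, and . stops at the newline.
two_lines = re.search(r"h.*o", "hello\nhellohello").group(0)

print(greedy)     # hellohellohello
print(lazy)       # hello
print(two_lines)  # hello
```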

Does not match when the string does not have a dot but it will match multiple dots [duplicate]

This question already has answers here:
Regex to allow alphanumeric and dot
(3 answers)
Closed 4 years ago.
I am trying to match strings that contain zero or more dots. The regex I have matches strings with one or more dots, but not strings with no dot.
(\w*)((\w*\.)+\w*)
These are the test strings I am using
dial.check.Catch.Url
dial.check.Catch.Url.Dial.check.Catch.Url
32443.324342.23423424.23.423.423.42.34.234.32.4..2..2.342.4
234dfasfd2aa4234234.234aa341.4.123daaadf.df.af....
12fd.dafd
.
abc
The Regex will match these
dial.check.Catch.Url
dial.check.Catch.Url.Dial.check.Catch.Url
32443.324342.23423424.23.423.423.42.34.234.32.4..2..2.342.4
234dfasfd2aa4234234.234aa341.4.123daaadf.df.af....
12fd.dafd
.
But not this one:
abc
https://regexr.com/?38ed7
If you really must use a regex, here is one (but it is inefficient):
/^(?![^.]*\.[^.]*$).*$/
It says:
Match a string unless, read from the beginning, the whole string contains exactly one dot. In other words, it accepts strings with no dot or with two or more dots.
It does some backtracking when parsing the negative lookahead.
As mentioned in the comments to the question, I do think, unless you must have a regex, that a simple function might be better. But if you like the conciseness of a regex and performance is not a huge concern, you can go with the one I gave above. Regexes with "nots" in them are generally a tad messy, but once you understand lookarounds they do become doable. Cheers.
/\..*\.|^[^.]*$/
Or, in plain English:
Match EITHER a dot, then any number of characters, then another dot; OR the beginning of the string, then any number of non-dots, then the end of the string.
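A small Python sketch can compare the two patterns given above. Both accept strings with no dot or with at least two dots, and reject strings with exactly one. The sample strings come from the question, and the /.../ delimiters are dropped because Python does not use them.

```python
import re

lookahead = re.compile(r"^(?![^.]*\.[^.]*$).*$")
alternation = re.compile(r"\..*\.|^[^.]*$")

samples = ["abc", "12fd.dafd", "dial.check.Catch.Url", "."]

# True where the pattern finds a match, False otherwise.
lookahead_hits = [bool(lookahead.search(s)) for s in samples]
alternation_hits = [bool(alternation.search(s)) for s in samples]

print(lookahead_hits)    # [True, False, True, False]
print(alternation_hits)  # [True, False, True, False]
```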

Regex negated character disjunction [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
Very quick and simple question.
Consider the vector of character strings ("AvAv", "AvAvAv")
Why does the pattern (Av)\1([^A]|$) match both strings?
The pattern says: have an instance of "Av", have another, then either have a character that is not an "A" or else come to an end. The first string clearly matches; I do not see how the latter does. It has two copies of "Av", but then it fails to end (missing the second disjunct) and fails to be followed by a character other than "A" (missing the first disjunct), so how does the pattern successfully match it?
Thank you so much for your time and assistance. It is greatly appreciated.
Here is an explanation:
AvAv - matches (Av)\1$
In this case, we can match Av, followed by that captured quantity, followed by $ from the alternation. In the case of AvAvAv we also have a match:
AvAvAv - again matches (Av)\1$
  ^^^^ last four letters match
It is the same logic here, except that in order to match, we have to skip the first Av.
If the pattern were ^(Av)\1([^A]|$) then only AvAv would be a match.
A RegEx only needs to match a part of the string to be considered "a match".
In other words, in the second example your RegEx matches only the last four characters of AvAvAv, that is, AvAv.
If you don't want it to match the second one, use a caret ^
^(Av)\1([^A]|$)
In this way the second one won't be matched.
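The behavior is easy to confirm with a short Python sketch. The question appears to use another language, so treat this as an illustration of the same regex semantics rather than the asker's exact code.

```python
import re

pattern = r"(Av)\1([^A]|$)"
anchored = r"^(Av)\1([^A]|$)"

# Unanchored: the engine is free to start the match at index 2.
m = re.search(pattern, "AvAvAv")
unanchored_span = m.span()

# Anchored: the match must start at index 0, so AvAvAv is rejected.
anchored_long = re.search(anchored, "AvAvAv")
anchored_short = re.search(anchored, "AvAv")

print(unanchored_span)             # (2, 6)
print(anchored_long)               # None
print(anchored_short is not None)  # True
```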

JSON schema pattern validation is failing [duplicate]

This question already has answers here:
What special characters must be escaped in regular expressions?
(13 answers)
Closed 5 years ago.
I am using the pattern below in a JSON schema to validate strings.
"pattern": "^(nfs://)(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?):([0-9]{4})"
But currently it does not reject "nfs://172.1.1:2049" as an invalid string.
This doesn't immediately seem like an obvious problem, but the . character needs to be escaped because you're trying to literally match that character.
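This regex, with the . characters and forward slashes escaped, works (when you put it inside a JSON string, each backslash must itself be written as \\):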
^(nfs:\/\/)(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?):([0-9]{4})
The problem was that since each capturing group that matches digits can match as few as one digit or as many as three, the regex engine looked at the first 1 (in 172), found that it was valid, then tried matching . (any character) and found the digit 7, which is not what you want.
In nfs://172.1.1:2049, the second capturing group in your regex matched the first 1 in the IP address, the . matched the 7, the third capturing group matched the 2, and so on.
Try it here: https://regex101.com/r/TNXDiQ/1
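The backtracking described above can be reproduced with Python's re module. This is a sketch that assumes Python's engine treats this pattern the same way as the JSON schema validator's engine does. The OCTET helper and the four-octet address are introduced here only for illustration.

```python
import re

# One octet, exactly as written in the question's pattern.
OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Octets joined by an unescaped . (any character) versus an escaped \.
unescaped = "^(nfs://)" + ".".join([OCTET] * 4) + ":([0-9]{4})"
escaped = "^(nfs://)" + r"\.".join([OCTET] * 4) + ":([0-9]{4})"

bad = "nfs://172.1.1:2049"     # only three octets
good = "nfs://172.1.1.1:2049"  # hypothetical valid address

# The unescaped pattern splits 172 into 1, (7 as "any char"), 2.
loose_groups = re.match(unescaped, bad).groups()
print(loose_groups)  # ('nfs://', '1', '2', '1', '1', '2049')

strict_bad = re.match(escaped, bad)
strict_good = re.match(escaped, good)
print(strict_bad)               # None
print(strict_good is not None)  # True
```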