I have 2 variants of strings:
some_prefix.needed part*some_suffix
some_prefix.needed part
I need only 'needed part' to be matched.
Left boundary is always dot.
Right boundary is asterisk (if exists) or end of line.
Already tried:
/.*[.](.*)[*].*/ - works for the first case
/.*[.](.*)/ - works for the second case
How to do the same with one regex?
You can use
/\.([^*]+)/
See the regex demo.
Details
\. - a dot
([^*]+) - Group 1: any one or more chars other than a *.
You can also make sure you get the rightmost match by using .* before the pattern (as in the original regex):
/.*\.([^*]+)/
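As a rough illustration, both patterns can be tried with Python's re module. Python is an assumption here (the question does not name a language), and the third sample string is a hypothetical input added to show where the two patterns differ.

```python
import re

# The two strings from the question, plus a hypothetical one
# whose prefix contains an extra dot.
samples = [
    "some_prefix.needed part*some_suffix",
    "some_prefix.needed part",
    "v1.2.needed part*some_suffix",
]

first_dot = re.compile(r"\.([^*]+)")    # capture starts after the first dot
last_dot = re.compile(r".*\.([^*]+)")   # greedy .* pushes the capture to the last dot

first_results = [first_dot.search(s).group(1) for s in samples]
last_results = [last_dot.search(s).group(1) for s in samples]

print(first_results)  # ['needed part', 'needed part', '2.needed part']
print(last_results)   # ['needed part', 'needed part', 'needed part']
```

Note that the .* variant only helps when the suffix after the asterisk contains no dot of its own.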
If supported, you might also use a lookbehind to assert a . to the left.
(?<=\.)[^*]+
The pattern matches:
(?<=\.) Positive lookbehind, assert . directly to the left
[^*]+ Match 1+ times any char except * using a negated character class
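A minimal sketch of the lookbehind variant, assuming Python's re module (which accepts fixed-width lookbehinds such as this one). With a lookbehind the needed part is the whole match, so no capture group is required.

```python
import re

pattern = re.compile(r"(?<=\.)[^*]+")

samples = [
    "some_prefix.needed part*some_suffix",
    "some_prefix.needed part",
]

# group(0) is the full match; the dot itself is only asserted, not consumed.
matches = [pattern.search(s).group(0) for s in samples]
print(matches)  # ['needed part', 'needed part']
```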
I am developing an app with markdown capabilities, so I am building a lexer to handle this. I am fairly new to Flutter and have little experience with Regex in general.
Essentially there is a difference between *text*, **text**, and ***text***.
My expressions right now are:
r"\B\*[A-Za-z0-9 ]+\*\B"
r"\B\*{2}[A-Za-z0-9 ]+\*{2}\B"
r"\B\*{3}[A-Za-z0-9 ]+\*{3}\B"
The issue is that the first expression is matching the other two. **text*** will get matched also with the second expression. Does anyone know how to solve this?
It looks like you could use:
(?<!\S)(\*{1,3})[A-Za-z0-9 ]+\1(?!\S)
See an online demo
(?<!\S) - Assert position is not preceded by anything that is not a whitespace char;
(\*{1,3}) - Match 1-3 asterisk characters;
[A-Za-z0-9 ]+ - Match 1+ characters from given character class;
\1 - Backreference what is matched in 1st group;
(?!\S) - Assert position is not followed by anything other than whitespace char.
Note that if you removed the final negative lookahead, you could also match **text** in **text***, if that is what you were after. You could even remove the leading negative lookbehind to match **text** in ****text** test
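The single-pattern approach can be sketched as a tiny tokenizer. This is only an illustration in Python rather than Dart, and the tokenize helper and the KINDS labels are hypothetical names, not part of the question's lexer. The length of group 1 tells the three emphasis levels apart.

```python
import re

EMPHASIS = re.compile(r"(?<!\S)(\*{1,3})[A-Za-z0-9 ]+\1(?!\S)")

# Hypothetical labels for one, two and three asterisks.
KINDS = {1: "italic", 2: "bold", 3: "bold italic"}

def tokenize(text):
    """Return (kind, inner_text) pairs for every well-formed emphasis run."""
    tokens = []
    for m in EMPHASIS.finditer(text):
        width = len(m.group(1))            # number of asterisks on each side
        inner = m.group(0)[width:-width]   # strip the delimiters
        tokens.append((KINDS[width], inner))
    return tokens

tokens = tokenize("*one* **two** ***three*** **bad***")
print(tokens)
# [('italic', 'one'), ('bold', 'two'), ('bold italic', 'three')]
```

The unbalanced **bad*** is skipped because the closing lookahead sees a third asterisk right after the backreference.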
The goal of my regular expression adventure is to create a matcher for a mechanism that could add a trailing slash to URLs, even in the presence of parameters denoted by # or ? at the end of the URL.
For any of the following URLs, I'm looking for a match for segment as follows:
https://example.com/what-not/segment matches segment
https://example.com/what-not/segment?a=b matches segment
https://example.com/what-not/segment#a matches segment
In case there is a match for segment, I'm going to replace it with segment/.
For any of the following URLs, there should be no match:
https://example.com/what-not/segment/ no match
https://example.com/what-not/segment/?a=b no match
https://example.com/what-not/segment/#a no match
because here, there is already a trailing slash.
I've tried:
This primitive regex and their variants: .*\/([^?#\/]+). However, with this approach, I could not make it not match when there is already a trailing slash.
I experimented with negative lookaheads as follows: ([^\/\#\?]+)(?!(.*[\#\?].*))$. In this case, I could not get rid of any ? or # parts properly.
Thank you for your kind help!
Lookahead and lookbehind conditionals are so powerful!
(?<=\/)[\w]+(?(?=[\?\#])|$)
P.S. I used [\w]+, which is equivalent to [a-zA-Z0-9_]+.
Of course URLs can contain many other characters, like - or ~, but for the examples provided it works nicely.
If you want to match urls, you might use
\b(https?://\S+/)[^\s?#/]+(?![^\s?#])
Explanation
\b A word boundary to prevent a partial word match
( Capture group 1
https?://\S+/ Match the protocol, 1+ non whitespace chars and then the last occurrence of /
) Close group 1
[^\s?#/]+ Match 1+ chars other than a whitespace char ? # /
(?![^\s?#]) Negative lookahead, assert that directly to the right is not a non whitespace char other than ? or #
See a regex demo.
In the replacement use group 1 followed by segment/
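A minimal sketch of the replacement step, assuming Python's re module. The add_trailing_slash helper is a hypothetical name. Instead of rebuilding the URL from group 1 plus the segment, this version appends a slash to the whole match (\g<0>), which has the same effect because the match ends exactly at the end of the segment.

```python
import re

URL_SEGMENT = re.compile(r"\b(https?://\S+/)[^\s?#/]+(?![^\s?#])")

def add_trailing_slash(url):
    # \g<0> is the entire match: group 1 plus the final segment.
    return URL_SEGMENT.sub(r"\g<0>/", url)

urls = [
    "https://example.com/what-not/segment",
    "https://example.com/what-not/segment?a=b",
    "https://example.com/what-not/segment#a",
    "https://example.com/what-not/segment/",
    "https://example.com/what-not/segment/?a=b",
    "https://example.com/what-not/segment/#a",
]

fixed = [add_trailing_slash(u) for u in urls]
for before, after in zip(urls, fixed):
    print(before, "->", after)
```

The first three URLs gain a slash after segment; the last three come back unchanged because the pattern finds no match in them.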
For a match only instead of a capture group:
(?<=\bhttps?://\S+/)[^\s?#/]+(?![^\s?#])
See another regex demo.
I have a regex
[a-zA-Z][a-z]
I have to change this regex so that it does not accept strings that start with "de", "DE", "dE" or "De". I cannot use lookbehind or lookahead because my system does not support them.
There's a solution without a lookahead or lookbehind, but you need to be able to use groups.
The idea there is to create a sort of "honeypot" that will match your negative results and keep only the results that do interest you.
In your case, that would write:
[dD][eE].*|(<your-regex>)
If the proposition is de<anything> (case insensitive here), it will match, but group(1) will be null.
On the other hand, matching diZ, for instance, would not match what is before the | and would therefore fall into group(1).
Finally, if the proposition doesn't start with de and doesn't match your regex, well, there will be no groups to get at all.
If you need to be sure that your proposition will match the whole provided string, you can update the regex thus:
^(?:[dD][eE].*|(<your-regex>))$
Note that ?: is not a lookahead of any kind; it marks the group as non-capturing, so that <your-regex> is still captured by group(1). Otherwise it would become group(2), and capturing a group is not always a transparent operation, performance-wise.
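Here is a small sketch of the honeypot idea, assuming Python and substituting the question's [a-zA-Z][a-z] for <your-regex>. The accepted helper is a hypothetical name. A string counts only when the pattern matches and group 1 actually took part in the match.

```python
import re

# Left branch is the "honeypot" for de/DE/dE/De; right branch is the real regex.
HONEYPOT = re.compile(r"^(?:[dD][eE].*|([a-zA-Z][a-z]))$")

def accepted(s):
    m = HONEYPOT.match(s)
    # A match through the honeypot branch leaves group 1 unset (None).
    return m is not None and m.group(1) is not None

candidates = ["de", "DE", "dE", "De", "delta", "di", "Ab", "d1"]
results = {s: accepted(s) for s in candidates}
print(results)
```

Only "di" and "Ab" are accepted here: the first five strings are swallowed by the honeypot branch, and "d1" matches neither branch.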
Simply ignore those characters:
[a-ce-z][a-df-z][a-gi-kwxyzWZXZ]
Make sure the flag is set to case insensitive. Also, [a-gi-kwxyzWZXZ] can then be modified to [a-gi-kwxyz].
EDIT:
As pointed out in this comment, the regex here won't support other words that start with d but are not followed by e. In this case, negative lookahead is a possible solution:
^(?!de)[a-z]+
This matches anything not starting with "DE" (case insensitive, without look arounds, allowing leading whitespace):
^ *+(?:[^Dd].|.[^Ee])<your regex for rest of input>
See live demo.
The possessive quantifier *+ used for whitespace prevents [^Dd] from being allowed to match a space via backtracking, making this regex hardened against leading spaces.
You can use an alternation excluding matching the d and D from the first character, or exclude matching the e as the second character.
Note that the pattern [a-zA-Z][a-z] matches at least 2 characters, so will the following pattern:
^(?:[abce-zABCE-Z][a-z]|[a-zA-Z][a-df-z]).*
^ Start of string
(?: Non capture group
[abce-zABCE-Z][a-z] Match a char a-zA-Z without d and D followed by a lowercase char a-z
| or
[a-zA-Z][a-df-z] Match a char a-zA-Z followed by a lowercase char a-z without e
) Close non capture group
.* Match 0+ times any char except a newline
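The alternation can be checked quickly against a few words. This is a sketch assuming Python's re module, and the word list is made up for illustration.

```python
import re

# No lookarounds: either the first char is not d/D, or the second is not e.
NOT_DE = re.compile(r"^(?:[abce-zABCE-Z][a-z]|[a-zA-Z][a-df-z]).*")

words = ["de", "DE", "dE", "De", "delta", "dog", "ed", "Dx", "Hello"]
kept = [w for w in words if NOT_DE.match(w)]
print(kept)  # ['dog', 'ed', 'Dx', 'Hello']
```

"DE" and "dE" are rejected as a side effect of the second character having to be lowercase, as in the original [a-zA-Z][a-z].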
Another option is to use word boundaries \b instead of an anchor ^
\b(?:[abce-zABCE-Z][a-z]|[a-zA-Z][a-df-z])[a-zA-Z]*\b
I have a problem with a regex that has to capture a substring that has already been captured...
I have this regex:
(?<domain>\w+\.\w+)($|\/|\.)
And I want to capture every subdomain recursively. For example, in this string:
test1.test2.abc.def
This expression captures test1.test2 and abc.def but I need to capture:
test1.test2
test2.abc
abc.def
Do you know if there is any option to do this recursively?
Thanks!
Maybe the following:
(\.|^)(?=(\w+\.\w+))
Go with capturing group 2
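A rough sketch of reading group 2, assuming Python's re module. Each match itself is only a dot or the empty string at the start, so the interesting text lives entirely in the lookahead's capture.

```python
import re

pattern = re.compile(r"(\.|^)(?=(\w+\.\w+))")

# The lookahead does not consume text, so consecutive pairs can overlap.
pairs = [m.group(2) for m in pattern.finditer("test1.test2.abc.def")]
print(pairs)  # ['test1.test2', 'test2.abc', 'abc.def']
```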
You can use a positive look ahead to capture the next group.
/(\w+)\.(?=(\w+))/g
Demonstration.
Edit: JvdV's regex is more correct.
Note that \w+ will fail to match domains like regex-tester.com and will match the invalid regex_tester.com. [a-zA-Z0-9-]+ is closer to correct. See this answer for a complete regex.
It's simpler and more robust to do this by splitting on . and iterating through the pieces in pairs. For example, in Ruby...
"test1.test2.abc.def".split(".").each_cons(2) { |a|
  puts a.join(".")
}
test1.test2
test2.abc
abc.def
You may use a well-known technique to extract overlapping matches, but you can't rely on \b boundaries as they can match between a non-word / word char and word / non-word char. You need unambiguous word boundaries for left and right hand contexts.
Use
(?=(?<!\w)(?<domain>\w+\.\w+)(?!\w))
See the regex demo. Details:
(?= - a positive lookahead that enables testing each location in the string and capturing the part of the string to the right of it
(?<!\w) - a left-hand side word boundary
(?<domain>\w+\.\w+) - Group "domain": 1+ word chars, . and 1+ word chars
(?!\w) - a right-hand side word boundary
) - end of the outer lookahead.
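The overlapping-match technique can be sketched in Python, with one assumption-driven change: Python's re module spells named groups as (?P<domain>...) rather than (?<domain>...). Because the whole pattern is a lookahead, every match is empty and the engine retries at each position.

```python
import re

pattern = re.compile(r"(?=(?<!\w)(?P<domain>\w+\.\w+)(?!\w))")

# Each match has zero width; the text of interest is in the named group.
domains = [m.group("domain") for m in pattern.finditer("test1.test2.abc.def")]
print(domains)  # ['test1.test2', 'test2.abc', 'abc.def']
```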
Another approach is to use dots as word delimiters. Then use
(?=(?<![^.])(?<domain>[^.]+\.[^.]+)(?![^.]))
See this regex demo. Adjust as you see fit.
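A short sketch of the dot-delimited variant, again assuming Python and its (?P<domain>...) spelling for named groups. The input string is a made-up example chosen because it contains a hyphen, which \w+ would not match.

```python
import re

pattern = re.compile(r"(?=(?<![^.])(?P<domain>[^.]+\.[^.]+)(?![^.]))")

# Labels are delimited purely by dots, so hyphens are allowed inside them.
pairs = [m.group("domain") for m in pattern.finditer("regex-tester.co.uk")]
print(pairs)  # ['regex-tester.co', 'co.uk']
```

Note that [^.] also matches spaces and newlines, so this version is best applied to a single, already isolated host name.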
I am trying to write a regex to fetch the value of banana from each line of the text below.
So essentially, for each line, I would like to be able to get whatever comes after banana= and have it stop at | if it exists.
apple=1|banana=2.5|oranges=1
banana=2.5|apple=1|oranges=1
apple=1|oranges=1|banana=2.5
apple=1|oranges=1|banana=-2.5
banana=2.5
I got as far as writing (?i)banana=(.*) but of course it gets everything after the exact match.
Do you guys have any solutions?
Thanks!
I would like to be able to get whatever comes after banana= and have it stop at | if it exists.
You may use a negated character class instead of a greedy dot pattern:
(?i)banana=([^|]*)
See the regex demo
The greedy dot, .*, matches any 0+ chars other than line break chars (in NFA engines) as many as possible (usually, up to the end of the line).
If you use [^|], a negated character class, it will match any char but |.
Pattern details
(?i) - case insensitive modifier
banana= - a literal substring (prepend with \b to match it as a whole word)
([^|]*) - Capturing group 1: any 0+ chars other than | (to avoid empty matches, replace * with + quantifier).
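For illustration, here is the negated character class applied to the sample lines from the question. Python is an assumption; the question does not say which regex engine is in use.

```python
import re

lines = [
    "apple=1|banana=2.5|oranges=1",
    "banana=2.5|apple=1|oranges=1",
    "apple=1|oranges=1|banana=2.5",
    "apple=1|oranges=1|banana=-2.5",
    "banana=2.5",
]

pattern = re.compile(r"(?i)banana=([^|]*)")

# [^|]* stops at the next pipe, or at the end of the line if there is none.
values = [pattern.search(line).group(1) for line in lines]
print(values)  # ['2.5', '2.5', '2.5', '-2.5', '2.5']
```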