I am developing an app with markdown capabilities, so I am building a lexer to handle this. I am fairly new to Flutter and have little experience with Regex in general.
Essentially, I need to tell *text*, **text**, and ***text*** apart.
My expressions right now are:
r"\B\*[A-Za-z0-9 ]+\*\B"
r"\B\*{2}[A-Za-z0-9 ]+\*{2}\B"
r"\B\*{3}[A-Za-z0-9 ]+\*{3}\B"
The issue is that the first expression also matches the other two, and **text*** also gets matched by the second expression. Does anyone know how to solve this?
It looks like you could use:
(?<!\S)(\*{1,3})[A-Za-z0-9 ]+\1(?!\S)
(?<!\S) - Assert the position is not preceded by a non-whitespace char (so it is at the start of the string or after whitespace);
(\*{1,3}) - Match 1-3 asterisk characters into group 1;
[A-Za-z0-9 ]+ - Match 1+ characters from the given character class;
\1 - Backreference: match the same text captured by group 1;
(?!\S) - Assert the position is not followed by a non-whitespace char (so it is at the end of the string or before whitespace).
Note that if you removed the final negative lookahead, you could also match **text** in **text***, if that is what you are after. You could even remove the leading negative lookbehind to match **text** in ****text** test.
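A minimal sketch of the pattern above, written in Python for illustration (the question is about Flutter/Dart; this assumes the target engine supports lookbehind and backreferences). The length of group 1 tells the three emphasis levels apart. The function name `find_emphasis` and the `KINDS` labels are hypothetical.

```python
import re

# The answer's pattern: group 1 captures 1-3 asterisks, \1 requires the same
# run to close the span, and the lookarounds require whitespace or a string edge.
EMPHASIS = re.compile(r"(?<!\S)(\*{1,3})[A-Za-z0-9 ]+\1(?!\S)")

# Hypothetical labels keyed by the number of asterisks captured in group 1.
KINDS = {1: "italic", 2: "bold", 3: "bold italic"}


def find_emphasis(text):
    """Return (kind, matched text) pairs for each emphasis span in text."""
    return [(KINDS[len(m.group(1))], m.group(0)) for m in EMPHASIS.finditer(text)]


print(find_emphasis("*a* **b** ***c***"))
# [('italic', '*a*'), ('bold', '**b**'), ('bold italic', '***c***')]
print(find_emphasis("**text***"))
# [] because the closing run is not the same length as the opening run
```

Because a single pattern handles all three cases, a lexer does not need to worry about which of three separate expressions fires first.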
I have 2 variants of strings:
some_prefix.needed part*some_suffix
some_prefix.needed part
I need only 'needed part' to be matched.
The left boundary is always a dot.
The right boundary is an asterisk (if one exists) or the end of the line.
I have already tried:
/.*[.](.*)[*].*/ - works for the first case
/.*[.](.*)/ - works for the second case
How can I do the same with one regex?
You can use
/\.([^*]+)/
Details
\. - a dot
([^*]+) - Group 1: any one or more chars other than a *.
You can also make sure you get the rightmost match by using .* before the pattern (as in the original regex):
/.*\.([^*]+)/
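A quick check of both patterns, sketched in Python for illustration. The input "a.b.c*d" is a hypothetical string with more than one dot, added only to show where the two patterns differ.

```python
import re

samples = [
    "some_prefix.needed part*some_suffix",
    "some_prefix.needed part",
]

# Group 1: everything after the first dot, up to (not including) a '*'.
FIRST_DOT = re.compile(r"\.([^*]+)")
# The leading greedy .* backtracks to the last dot instead.
LAST_DOT = re.compile(r".*\.([^*]+)")

for s in samples:
    print(FIRST_DOT.search(s).group(1))  # needed part

# The two patterns only differ when the text contains more than one dot.
print(FIRST_DOT.search("a.b.c*d").group(1))  # b.c
print(LAST_DOT.search("a.b.c*d").group(1))   # c
```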
If supported, you might also use a lookbehind to assert a . to the left.
(?<=\.)[^*]+
The pattern matches:
(?<=\.) Positive lookbehind, assert . directly to the left
[^*]+ Match 1+ times any char except * using a negated character class
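The lookbehind variant, sketched in Python for illustration. Python's `re` accepts it because (?<=\.) has a fixed width of one character, and no capturing group is needed since the whole match is the wanted text.

```python
import re

# Fixed-width (one character) lookbehind: assert a '.' directly to the left.
AFTER_DOT = re.compile(r"(?<=\.)[^*]+")

for s in ["some_prefix.needed part*some_suffix", "some_prefix.needed part"]:
    # group(0) is the whole match; the dot itself is not part of it.
    print(AFTER_DOT.search(s).group(0))  # needed part
```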
I need to create a regex to find the last underscore in strings like 012344_2.0224.71_3 or 012354_5.00123.AR_3.335_8.
I wanted to find the last part with the expression [^.]+$ and then find the underscore in that element, but I can't get it to work.
I hope you can help me :)
Just use a negated character class [^_], which matches everything except an underscore (this ensures no other underscores are found afterwards), followed by the end-of-string anchor $.
The pattern would look like this:
(_)[^_]*$
The final underscore is in a capturing group, so you want to return that submatch; to replace the underscore, replace group 1.
See it live on Regex101: the portion highlighted in green is your submatch and is what would be replaced.
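A minimal sketch of "replace group 1" in Python, chosen for illustration. Since the pattern also consumes the characters after the underscore, this splices the string using the span of group 1 so that only the underscore changes. The helper name `replace_last_underscore` is hypothetical.

```python
import re

LAST_UNDERSCORE = re.compile(r"(_)[^_]*$")


def replace_last_underscore(s, replacement):
    """Replace only the underscore captured in group 1 (hypothetical helper)."""
    m = LAST_UNDERSCORE.search(s)
    if m is None:
        return s  # no underscore at all
    # m.start(1) and m.end(1) give the span of the captured underscore.
    return s[:m.start(1)] + replacement + s[m.end(1):]


print(replace_last_underscore("012344_2.0224.71_3", "#"))         # 012344_2.0224.71#3
print(replace_last_underscore("012354_5.00123.AR_3.335_8", "#"))  # 012354_5.00123.AR_3.335#8
```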
The simplest solution I can imagine is .*\K_; however, not all regex flavours support \K.
If not, another idea would be to use _(?=[^_]*$)
Explanation:
.*\K_: .* matches any characters up to an underscore. Since the * quantifier is greedy, it matches up to the last underscore. \K then discards everything matched so far, and the underscore is matched.
_(?=[^_]*$): Matches an underscore that is followed only by non-underscore characters up to the end of the line.
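Python's built-in `re` module is one of the flavours without \K (since Python 3.7 it rejects unknown letter escapes such as \K with `re.error`), so a Python sketch, used here only for illustration, has to fall back on the lookahead option. Because the lookahead consumes nothing, `re.sub` replaces only the underscore.

```python
import re

# \K is not a valid escape in Python's re, so compiling it raises re.error.
try:
    re.compile(r".*\K_")
except re.error:
    print("\\K is not supported by Python's re")

# Lookahead version: the match is just the underscore itself.
LAST_UNDERSCORE = re.compile(r"_(?=[^_]*$)")

print(LAST_UNDERSCORE.sub("#", "012344_2.0224.71_3"))              # 012344_2.0224.71#3
print(LAST_UNDERSCORE.search("012354_5.00123.AR_3.335_8").start())  # 23
```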
If you want nothing but the "net" (i.e., nothing matched except the last underscore), use a positive lookahead to check that no more underscores follow in the string:
/_(?=[^_]*$)/gm
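How the g and m flags translate, sketched in Python for illustration: `re.M` makes $ match at the end of every line, and `re.sub`/`re.findall` already act on all matches, which covers g. The two-line input simply joins the question's example strings.

```python
import re

text = "012344_2.0224.71_3\n012354_5.00123.AR_3.335_8"

# re.M: $ matches before each newline as well as at the end of the text.
LAST_PER_LINE = re.compile(r"_(?=[^_]*$)", re.M)

print(LAST_PER_LINE.sub("#", text))
# 012344_2.0224.71#3
# 012354_5.00123.AR_3.335#8
```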
The pattern [^.]+$ matches any character other than a dot one or more times and then asserts the end of the string. That will give you the matches 71_3 and 335_8.
What you want to match is an underscore when there are no more underscores following.
One way to do that, if it is supported, is a negative lookahead (?!.*_), which asserts that what is to the right does not match any characters followed by an underscore:
_(?!.*_)
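A side-by-side check in Python (used here for illustration) of the asker's [^.]+$ attempt and the negative lookahead, on the two example strings from the question.

```python
import re

for s in ["012344_2.0224.71_3", "012354_5.00123.AR_3.335_8"]:
    # The asker's attempt: the last dot-free chunk, not the underscore itself.
    print(re.search(r"[^.]+$", s).group())    # 71_3, then 335_8
    # An underscore that is not followed by "anything, then another underscore".
    print(re.search(r"_(?!.*_)", s).start())  # 16, then 23
```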
I'd like to use a regex to scan a few COBOL files for a specific word while skipping comment lines. COBOL comments have an asterisk in the 7th column. The regex I've come up with so far, using a negative lookbehind, looks like this:
^(?<!.{6}\*).+?COPY
It matches both lines:
* COPY
COPY
I would assume that .+? overrides the negative lookbehind somehow, but I'm stuck on how to correct this. What would I need to fix to get a regex that matches only the second line?
You may use a lookahead instead of a lookbehind:
^(?!.{6}\*).+?COPY
The lookbehind required a pattern to be absent before the start of the string, so it was redundant: it always returned true. Lookaheads check for a pattern to the right of the current location.
So,
^ - matches the start of the string
(?!.{6}\*) - fails the match if there are any 6 chars followed by * at the start of the string (replace . with a space if you need to match just spaces)
.+? - matches any 1+ chars, as few as possible, up to the first COPY
COPY - matches the COPY substring.
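A sketch in Python (for illustration) comparing the two patterns on two hypothetical fixed-format lines: six blank columns, then either '*' or a space in column 7. `re.M` is assumed so that ^ anchors at each line start.

```python
import re

# Hypothetical lines: columns 1-6 blank, column 7 is '*' (comment) or ' ' (code).
SOURCE = "\n".join([
    "      * COPY",   # comment line
    "       COPY",    # code line
])

lookbehind = re.compile(r"^(?<!.{6}\*).+?COPY", re.M)
lookahead = re.compile(r"^(?!.{6}\*).+?COPY", re.M)

# Text before a line start is empty or ends in '\n', never '*', so this matches both.
print(lookbehind.findall(SOURCE))
# The lookahead inspects columns 1-7 of the line itself, so only the code line matches.
print(lookahead.findall(SOURCE))
```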
If you want to filter out EVERY comment you could use:
^ {6}(?!\*)
That will match only lines that start with six spaces and DO NOT have an '*' in the 7th position.
COBOL can use positions 1-6 for line numbers, so it may be safer to just use:
^.{6}(?!\*).*$
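A minimal line filter using the safer pattern, sketched in Python for illustration. The sample lines, including the sequence numbers in columns 1-6, are hypothetical, and `code_lines` is a made-up helper name. Note that lines shorter than six characters (including blank lines) are dropped as well.

```python
import re

# Columns 1-6 may hold sequence numbers; a '*' in column 7 marks a comment.
NOT_COMMENT = re.compile(r"^.{6}(?!\*).*$")


def code_lines(lines):
    """Keep only lines whose 7th character is not '*' (hypothetical helper)."""
    return [line for line in lines if NOT_COMMENT.match(line)]


sample = [
    "000100 COPY CUSTREC.",
    "000200* COPY OLDREC.",
    "       COPY ACCTREC.",
]
print(code_lines(sample))  # the first and third lines

# The spaces-only variant rejects a numbered line even though it is code.
print(bool(re.match(r"^ {6}(?!\*)", sample[0])))  # False
```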
I'm having trouble figuring out how to write this regex properly. I need to match an asterisk, followed by a space, followed by any number of characters that aren't \n (similar to Reddit list formatting).
Example:
* Test
* Test2
* Test3
The closest I got was this, but it wasn't working.
/^[*][ ](.*?)/s
Can anyone familiar with PCRE help me?
You should not use a lazy dot pattern at the end of the regex, because it will never match any char: the engine first tries to skip it, and since nothing in the pattern follows it, .*? ends up matching the empty string.
Use the greedy dot pattern:
^\* (.*)
Other notes: you may use \h to match any horizontal whitespace instead of the regular space in the pattern. To match the start of each line with ^, use the m modifier. Only use the s modifier if you need . to match any char, including a newline (and a carriage return, depending on which PCRE newline verbs are active).
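The lazy-versus-greedy difference, sketched in Python for illustration. Python's `re` has no \h, so a literal space is used; `re.M` plays the role of the m modifier.

```python
import re

text = "* Test\n* Test2\n* Test3"

# Lazy: nothing follows .*? in the pattern, so it settles for the empty string.
print(re.findall(r"^\* (.*?)", text, re.M))  # ['', '', '']
# Greedy: without re.S, . stops at \n, so each capture is one list item.
print(re.findall(r"^\* (.*)", text, re.M))   # ['Test', 'Test2', 'Test3']
```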
Which regex should I use to extract 'Manchester City' from this string?
The string is:
Aston Villa - Manchester City
I tried -(.*)\w|-(.), but it grabs - .
Note that -(.*)\w|-(.) matches - because both alternatives start by matching a hyphen. You can usually check whether something is present with a lookaround.
However, in this case, I'd suggest
-\s*\K[^-]+$
Since you need to match only the substring after the last -, with the leading spaces trimmed off, you would need something like an infinite-width positive lookbehind, (?<=-\s*). However, PCRE does not support infinite-width lookbehind. Instead, it has the \K operator, which makes the engine omit everything the current pattern has matched so far from the reported match.
Breakdown:
- - a literal hyphen
\s* - zero or more whitespace characters
\K - operator that resets (empties) the currently kept match buffer
[^-]+ - one or more characters other than - up to ...
$ - the end of the string.
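For engines without \K, such as Python's built-in `re` (used here only for illustration), a capturing group gives the same result. Python's `re` also rejects the variable-width lookbehind (?<=-\s*) with `re.error`.

```python
import re

s = "Aston Villa - Manchester City"

# Variable-width lookbehind is not allowed in Python's re.
try:
    re.compile(r"(?<=-\s*)[^-]+$")
except re.error:
    print("variable-width lookbehind is not supported")

# Without \K, capture the wanted part in group 1 instead.
m = re.search(r"-\s*([^-]+)$", s)
print(m.group(1))  # Manchester City
```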
The simplest is .*- (.*) and your data is in $1 or \1 (or something else, depending on your tool). That assumes the data are in the format xxxxx-xxxxxx.
Another simple option is - (.*); see https://regex101.com/r/fY3oE7/1. Use the first capturing group in your language to get the part after the dash.
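The two capture-group options above, compared in Python for illustration. The second input, "A - B - C", is hypothetical and only shows how they differ when the separator appears more than once: the greedy .* prefix selects the last "- ", while - (.*) starts at the first one.

```python
import re

s = "Aston Villa - Manchester City"
print(re.search(r".*- (.*)", s).group(1))  # Manchester City
print(re.search(r"- (.*)", s).group(1))    # Manchester City

# With more than one "- ", the patterns pick different parts.
t = "A - B - C"
print(re.search(r".*- (.*)", t).group(1))  # C
print(re.search(r"- (.*)", t).group(1))    # B - C
```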