Dart RegExp, Why does this throw a FormatException - regex

I'm not clear on why this is throwing a FormatException:
void main() {
  RegExp cssColorMatch = new RegExp(r'^#([0-9a-fA-F]{3}{1,2}$)');
  print(cssColorMatch.hasMatch('#F56'));
}

You are applying two range quantifiers back to back ({3}{1,2}), which is what causes the exception. Close the capturing group after the first quantifier and place the second quantifier outside the group if you want to use it this way.
RegExp re = new RegExp(r"#([0-9a-fA-F]{3}){1,2}");
hasMatch returns true if the pattern matches anywhere in the input string. So if you only need to detect a hex color somewhere in the string, you can drop the start ^ and end $ anchors, and {1,2} becomes unnecessary as well, because any six-digit color already contains a three-digit match.
RegExp re = new RegExp(r"#([0-9a-fA-F]{3})");

You cannot do {3}{1,2}. But you can do:
RegExp cssColorMatch = new RegExp(r'^\#((?:[0-9a-fA-F]{3}){1,2})$');
which still does not match Hex colors correctly.

Because your regex puts {1,2} directly after {3}. If you only need to match three-digit colors such as #F56, there is no need to include this part.
The regex below would be enough:
RegExp cssColorMatch = new RegExp(r'^#([0-9a-fA-F]{3})$');
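As a rough cross-check of the grouping fix discussed in the answers, here is a minimal Python sketch. It uses Python's re module rather than Dart's RegExp, so engine details may differ. The pattern constant names and the helper is_css_hex_color are illustrative assumptions, not part of the original answers.

```python
import re

# Group the three hex digits first, then repeat the group once or twice,
# so the pattern accepts exactly 3 or 6 hex digits after '#'.
CSS_HEX_COLOR = re.compile(r'^#(?:[0-9a-fA-F]{3}){1,2}$')

# Without anchors, search() only checks that a color appears somewhere.
UNANCHORED = re.compile(r'#[0-9a-fA-F]{3}')


def is_css_hex_color(value):
    """Return True if value is '#' followed by exactly 3 or 6 hex digits."""
    return CSS_HEX_COLOR.search(value) is not None


print(is_css_hex_color('#F56'))      # True
print(is_css_hex_color('#FF5566'))   # True
print(is_css_hex_color('#F566'))     # False
print(UNANCHORED.search('#F56ZZZ') is not None)  # True
```

The last line shows why the anchors matter when the whole string must be a color: the unanchored pattern accepts '#F56ZZZ', while the anchored one rejects it.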

Related

Regex match last substring among same substrings in the string

For example we have a string:
asd/asd/asd/asd/1#s_
I need to match this part: /asd/1#s_ or asd/1#s_
How can this be done with plain regex?
I've tried a negative lookahead like this, but it didn't work:
\/(?:.(?!\/))?(asd)(\/(([\W\d\w]){1,})|)$
It matches '/asd/asd/asd/asd/asd/asd/1#s_'
from 'prefix/asd/asd/asd/asd/asd/asd/1#s_',
but I need to match '/asd/1#s_' without all the preceding /asd/ segments.
The match should work with plain regex, without helper functions from any programming language.
I use https://regexr.com/ to check whether a regex matches.
Here are the possible strings:
prefix/asd/asd/asd/1#s
prefix/asd/asd/asd/1s#
prefix/asd/asd/asd/s1#
prefix/asd/asd/asd/s#1
prefix/asd/asd/asd/#1s
prefix/asd/asd/asd/#s1
The asd part could be replaced with any word, for example:
prefix/a1sd/a1sd/a1sd/1#s
prefix/a1sd/a1sd/a1sd/1s#
...
So I need to match the last repeated part together with everything to its right.
Everything to the right can be word characters, non-word characters, or digits, in any order.
A more complicated example string:
prefix/a1sd/a1sd/a1sd/1s#/ds/dsse/a1sd/22$$#!/123/321/asd
This should match the following part:
/a1sd/22$$#!/123/321/asd
Try this one. It works in Python.
import re
reg = re.compile(r"\/[a-z]{1,}\/\d+[#a-z_]{1,}")
s = "asd/asd/asd/asd/1#s_"
print(reg.findall(s))
# ['/asd/1#s_']
Update:
The question is not entirely clear, so this pattern only handles the character order shown in the first example; any other combination will not match.
Edit:
New regex:
reg = r"\/\w+(\/\w*\d+\W*)*(\/\d+\w*\W*)*(\/\d+\W*\w*)*(\/\w*\W*\d+)*(\/\W*\d+\w*)*(\/\W*\w*\d+)*$"
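If the repeated word is known in advance, one plain-regex approach is to let a greedy .* consume as much as possible before the final /word/ segment. The following is a minimal Python sketch under that assumption. The helper name last_segment is hypothetical, and the pattern itself, .*(/word/.*)$, does not rely on anything specific to Python.

```python
import re


def last_segment(text, word):
    """Return the last '/word/...' part of text, or None if absent."""
    # The leading greedy .* gives back only as much as needed, so the
    # captured group starts at the LAST '/word/' in the string.
    pattern = re.compile(r'.*(/' + re.escape(word) + r'/.*)$')
    match = pattern.match(text)
    return match.group(1) if match else None


print(last_segment('prefix/asd/asd/asd/asd/1#s_', 'asd'))
# /asd/1#s_
print(last_segment(
    'prefix/a1sd/a1sd/a1sd/1s#/ds/dsse/a1sd/22$$#!/123/321/asd', 'a1sd'))
# /a1sd/22$$#!/123/321/asd
```

This sketch does not try to discover which word repeats; that part of the question stays open.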

Regex wrapping word

Regex example
How can I exclude the first space in every match?
The same regex: (?:^|\W)#(\w+)(?!\w)
Is this what you're looking for?
http://regexr.com/3ca98
From the information you gave us until now, this regex should also be sufficient: #(\w+)(?!\w).
But maybe there's more to it than we know. What did you want to achieve with the (?:^|\W)?
Edit: Thinking about what you probably want to achieve, it occurred to me that you might want to match your pattern only if it's not in the middle of another word (e.g. test#case). You probably don't want to match this.
To exclude such cases, you have to ensure that the pattern is preceded either by a whitespace character or by nothing at all.
I assume you use JavaScript because regexr.com does, and at the time of writing JavaScript's regex implementation had no lookbehind (it was only added in ES2018). Without lookbehind there is no direct way to assert that only whitespace or nothing precedes your pattern.
One solution would be to work with capture groups. Take this regex:
(?:^|\s+)(#\w+)
It looks for the start of the input or one or more whitespace characters in front of your pattern, but does not capture them. Your pattern follows, and it is the first capture group in the whole expression.
To use this in JavaScript, create a RegExp object with the g flag and call its exec method repeatedly until it returns null, saving the first capture group of each match to a result array.
JS code:
var txt = text.innerHTML;
var re = /(?:^|\s+)(#\w+)/g;
var res = [];
var tmpresult = [];
while ((tmpresult = re.exec(txt)) !== null) {
  res.push(tmpresult[1]); // push first capture group to result stack
}
result.innerHTML = JSON.stringify(res, null, 2);
JSFiddle: https://jsfiddle.net/j41tw4hm/1/
Updated regexr.com: http://regexr.com/3ca9n
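For comparison, the capture-group idea can be sketched in Python, which also supports lookbehind. This is an illustrative assumption rather than part of the original answer, and the sample text is made up.

```python
import re

text = "#one test#case and #two"

# Capture-group approach: the non-capturing prefix consumes the leading
# whitespace (or matches the start of the string); only the hashtag
# itself is captured and returned by findall.
with_group = re.findall(r'(?:^|\s)(#\w+)', text)

# Lookbehind approach: assert that no non-whitespace character
# immediately precedes '#', without consuming anything.
with_lookbehind = re.findall(r'(?<!\S)#\w+', text)

print(with_group)       # ['#one', '#two']
print(with_lookbehind)  # ['#one', '#two']
```

Both variants skip test#case because the # there is preceded by a word character.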

Regular expression in python re.findall()

I tried the following.
I want to extract the parts with re.findall():
str="<abc>somechars<*><def>somechars<*><ghj>somechars<*><ijk>somechars<*>"
print(re.findall('<(abc|ghj)>.*?<*>',str))
The output should be
['<abc>somechars<*>','<ghj>somechars<*>']
In notepad, this expression gives the right result, but here I get:
['abc', 'ghj']
Any idea?
Thanks for the answers.
(<(?:abc|ghj)>.*?<\*>)
Try this. See demo.
http://regex101.com/r/kP8uF5/12
import re
# Python 3: use a plain raw string (the ur'' prefix is Python 2 only)
p = re.compile(r'(<(?:abc|ghj)>.*?<\*>)', re.IGNORECASE | re.MULTILINE)
test_str = "<abc>somechars<*><def>somechars<*><ghj>somechars<*><ijk>somechars<*>"
print(re.findall(p, test_str))
You're capturing (abc|ghj). Use a non-capturing group (?:abc|ghj) instead.
Also, you should escape the second * in your regex since you want a literal asterisk: <\*> rather than <*>.
>>> s = '<abc>somechars<*><def>somechars<*><ghj>somechars<*><ijk>somechars<*>'
>>> re.findall(r'<(?:abc|ghj)>.*?<\*>', s)
['<abc>somechars<*>', '<ghj>somechars<*>']
Finally, avoid shadowing the built-in name str.
Just make the group a non-capturing group (and escape the asterisk so it is matched literally):
s = "<abc>somechars<*><def>somechars<*><ghj>somechars<*><ijk>somechars<*>"
print(re.findall(r'<(?:abc|ghj)>.*?<\*>', s))
findall returns the groups from left to right, and since you specified a group, it returned only that group instead of the entire match.
From the Python documentation
Return all non-overlapping matches of pattern in string, as a list of
strings. The string is scanned left-to-right, and matches are returned
in the order found. If one or more groups are present in the pattern,
return a list of groups; this will be a list of tuples if the pattern
has more than one group. Empty matches are included in the result
unless they touch the beginning of another match.
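The documented behaviour can be seen side by side in a short sketch. The variable names are illustrative; re.findall and re.finditer are standard library functions.

```python
import re

s = '<abc>somechars<*><def>somechars<*><ghj>somechars<*><ijk>somechars<*>'

# One capturing group: findall returns only what the group captured.
captured = re.findall(r'<(abc|ghj)>.*?<\*>', s)

# No capturing group: findall returns each whole match.
whole = re.findall(r'<(?:abc|ghj)>.*?<\*>', s)

# finditer yields match objects, so group(0) is always the whole match,
# even when the pattern contains capturing groups.
via_iter = [m.group(0) for m in re.finditer(r'<(abc|ghj)>.*?<\*>', s)]

print(captured)  # ['abc', 'ghj']
print(whole)     # ['<abc>somechars<*>', '<ghj>somechars<*>']
print(via_iter)  # ['<abc>somechars<*>', '<ghj>somechars<*>']
```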

Regular expression to find specific text within a string enclosed in two strings, but not the entire string

I have this type of text:
string1_dog_bit_johny_bit_string2
string1_cat_bit_johny_bit_string2
string1_crocodile_bit_johny_bit_string2
string3_crocodile_bit_johny_bit_string4
string4_crocodile_bit_johny_bit_string5
I want to find all occurrences of “bit” that occur only between string1 and string2. How do I do this with regex?
I found the question Regex Match all characters between two strings, but the regex there matches the entire string between string1 and string2, whereas I want to match just parts of that string.
I am doing a global replacement in Notepad++. I just need regex, code will not work.
Thank you in advance.
Roman
If I understand correctly, here is some code to do what you want:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var input = new List<string>
{
    "string1_dog_bit_johny_bit_string2",
    "string1_cat_bit_johny_bit_string2",
    "string1_crocodile_bit_johny_bit_string2",
    "string3_crocodile_bit_johny_bit_string4",
    "string4_crocodile_bit_johny_bit_string5"
};
// Verbatim string literal (@"...") with a named group for each "bit"
Regex regex = new Regex(@"(?<bitGroup>bit)");
var allMatches = new List<string>();
foreach (var str in input)
{
    // Only consider lines that start with string1 and end with string2
    if (str.StartsWith("string1") && str.EndsWith("string2"))
    {
        var matchCollection = regex.Matches(str);
        allMatches.AddRange(matchCollection.Cast<Match>().Select(match => match.Groups["bitGroup"].Value));
    }
}
Console.WriteLine("All matches {0}", allMatches.Count);
This regex will do the job:
^string1_(?:.*(bit))+.*_string2$
^ means the start of the text (or line if you use the m option like so: /<regex>/m )
$ means the end of the text
. means any character
* means the previous character/expression is repeated 0 or more times
(?:<stuff>) means a non-capturing group (<stuff> won't be captured as a result of the matching)
You could use ^string1_(.*(bit).*)*_string2$ if you don't care about performance or don't have large/many strings to check. The outer parentheses allow multiple occurrences of "bit".
If you provide us with the language you want to use, we could give more specific solutions.
edit: As you added that you're trying a replacement in Notepad++ I propose the following:
Use (?<=string1_)(.*)bit(.*)(?=_string2) as the regex and $1xyz$2 as the replacement pattern (replace xyz with your string). Then perform a "Replace All" operation repeatedly until Notepad++ doesn't find any more matches. The catch is that this regex matches only one bit per line per pass, so it has to be applied repeatedly.
Btw. even if a regexp matches the whole line, you can still only replace parts of it using capturing groups.
You can use the regex:
(?:string1|\G)(?:(?!string2).)*?\Kbit
regex101 demo. Tried it on notepad++ as well and it's working.
There is a description on the demo site, but if you want more explanation, let me know and I'll elaborate!
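Python's re module does not support \G or \K, so the last pattern cannot be used there unchanged. As a swapped-in technique (not the Notepad++ approach from the answers), a replacement function can restrict the substitution to the text between the two markers. The function name replace_between and the replacement string xyz are hypothetical.

```python
import re


def replace_between(text, start, end, target, replacement):
    """Replace target only where it lies between start and end on one line."""
    # '.' does not match newlines by default, so a span never crosses lines.
    span = re.compile(
        '(' + re.escape(start) + ')(.*?)(' + re.escape(end) + ')'
    )

    def fix(match):
        # Leave the markers untouched; substitute only inside the middle part.
        middle = match.group(2).replace(target, replacement)
        return match.group(1) + middle + match.group(3)

    return span.sub(fix, text)


sample = (
    "string1_dog_bit_johny_bit_string2\n"
    "string3_crocodile_bit_johny_bit_string4"
)
print(replace_between(sample, "string1", "string2", "bit", "xyz"))
# string1_dog_xyz_johny_xyz_string2
# string3_crocodile_bit_johny_bit_string4
```

Unlike the repeated "Replace All" approach, this handles every bit on a line in one pass, at the cost of needing code rather than a single pattern.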

Regular Expression: how can I impose a perfect string matching?

Currently I am using this one (edit: I forgot to explain that I use it to exclude exactly these words):
String REGEXP = "^[^(REG_)?].*";
but it also matches (that is, excludes) ERG, EGR, GRE, etc.
P.S.
I removed SUPER because it is another keyword that I must filter. Picture an array list made up of items that follow these three models:
REG_info1, info2, SUPER_info3, etc...
I need three filters, each matching one model at a time; my question concerns only the second filter, which selects keywords that follow the model "info2".
Just type it literally:
REG
This will only match REG.
So:
String REGEXP = "^(REG_|SUPER_)?.*";
Edit: After you clarified that you want to match every word that does not begin with REG_ or SUPER_, you could try this:
\b(?!REG_|SUPER_)\w+
The \b is a word boundary and the expression (?!expr) is a look-ahead assertion.
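A quick way to check the look-ahead pattern is the following Python sketch. The sample string is taken from the question's model list, and Python's re syntax is assumed to behave like Java's for this particular pattern.

```python
import re

words = "REG_info1, info2, SUPER_info3"

# \b is a word boundary; (?!REG_|SUPER_) rejects any word that begins
# with one of those prefixes; \w+ then consumes the rest of the word.
allowed = re.findall(r'\b(?!REG_|SUPER_)\w+', words)
print(allowed)  # ['info2']
```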
As everyone has already replied, if you want to match a line starting with REG, use the regexp "^REG"; if you want to match any line that starts with REG or SUPER, use "^(REG|SUPER)". Regular expression negation is, in general, a tricky problem.
To match all lines NOT starting with 'REG' you need to match "^([^R]|R[^E]|RE[^G])"; the parentheses are needed so that ^ applies to every alternative. A regular expression to match all lines not starting with REG or SUPER can be constructed in a similar fashion (start by grouping the "not REG" alternatives in parentheses, then construct the "not SUPER" patterns as "[^S]|S[^U]|SU[^P]...", group these too, and combine the two so that both conditions must hold). This quickly becomes unwieldy, which is why a look-ahead is usually preferred where it is available.
How about
\mREG\M
// \mREG\M
//
// Options: ^ and $ match at line breaks
//
// Assert position at the beginning of a word «\m»
// Match the characters “REG” literally «REG»
// Assert position at the end of a word «\M»
The [] indicate character classes. This is not what you want. You can just use "REG" to match REG. (You can use REG|SUPER for REG or SUPER)
REGEXP = "^(REG_|SUPER_)"
would match anything that has REG_ or SUPER_ at the beginning of a string. You don't need anything more after the group "(..|..)".
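To see why the original character class misbehaves, it can be compared with a negative look-ahead in a small Python sketch. The variable names are hypothetical, and Python's re stands in for Java's regex engine here.

```python
import re

# [^(REG_)?] is a negated character class: it rejects any string whose
# FIRST character is one of ( R E G _ ) ? and knows nothing about the
# literal prefix "REG_".
char_class = re.compile(r'^[^(REG_)?].*')

# A negative look-ahead rejects only strings that start with "REG_".
lookahead = re.compile(r'^(?!REG_).*')

results = {}
for word in ['REG_info1', 'ERG', 'GRE', 'info2']:
    results[word] = (bool(char_class.match(word)), bool(lookahead.match(word)))
    print(word, results[word])
```

ERG and GRE are rejected by the character class merely because they start with E or G, while the look-ahead lets them through and rejects only REG_info1.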