LookAround or default regex if symbol is not present - regex

I have this regex:
^\d+(?<=\d)_?(?=\d)\d*
My original goal is to match these patterns:
5
55
5_5
55_5
But ignore
_5
5_
_
As far as I understand, it matches at least 1 digit from the beginning of the line, and an underscore only if it is surrounded by digits. Pretty simple. So:
5_5 is passed,
555_555 is also passed,
_5 is not passed, as expected,
_ is also not passed.
In addition, 55 is also passed, which is fine.
But for some reason 5 is not passed. Why? It is a single digit, and it should pass even though there is no underscore after it. Any ideas why this is happening? Thanks.
Tested on https://regex101.com/

The reason is that the pattern has to match at least 2 digits.
This is due to ^\d+ followed by the assertion (?=\d), which requires another digit to the right.
In your pattern, you can remove the lookaround assertions, as you are also matching the digits that you are asserting, so they are redundant: (?<=\d) always holds right after \d+, and (?=\d)\d* behaves like \d+ here.
Your pattern can therefore be written as ^\d+_?\d+, which makes it clear that it has to match at least 2 digits with an optional underscore.
To get the matches that you want, you might write the pattern as:
^\d+(?:_\d+)?$
Explanation
^ Start of string
\d+ Match 1+ digits
(?:_\d+)? Optionally match _ and 1+ digits (to prevent an underscore at the end)
$ End of the string
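A quick way to see the difference is to run both patterns over the sample inputs from the question. The sketch below uses Java's java.util.regex; the class and method names are hypothetical. find() is used so that the unanchored original pattern behaves as it would in an online regex tester.

```java
import java.util.regex.Pattern;

class UnderscoreDigits {
    // Pattern from the question: (?=\d) forces a second digit, so "5" fails.
    static final Pattern ORIGINAL = Pattern.compile("^\\d+(?<=\\d)_?(?=\\d)\\d*");
    // Suggested pattern: digits, optionally followed by one underscore and more digits.
    static final Pattern FIXED = Pattern.compile("^\\d+(?:_\\d+)?$");

    // True if the pattern finds a match anywhere in the input.
    static boolean accepts(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"5", "55", "5_5", "55_5", "_5", "5_", "_"};
        for (String s : inputs) {
            System.out.println(s + " original=" + accepts(ORIGINAL, s) + " fixed=" + accepts(FIXED, s));
        }
    }
}
```

Of these inputs, only "5" gets a different result from the two patterns: the original rejects it and the fixed pattern accepts it.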

Related

Regex to find a line with two capture groups that match the same regex but are still different

I am trying to analyse my source code (written in C) for mismatched timer variable comparisons/assignments. I have a range of timers with different timebases (2-250 milliseconds). Every timer variable contains its granularity in milliseconds in its name (e.g. timer10ms), as does every timer snapshot and define (e.g. fooTimer10ms, DOO_TIMEOUT_100MS).
Here are some example lines:
fooTimer10ms = timer10ms;
baaTimer20ms = timer10ms;
if (DIFF_100MS(dooTimer10ms) >= DOO_TIMEOUT_100MS)
if (DIFF_100MS(dooTimer10ms) < DOO_TIMEOUT_100MS)
I want to match the lines where the timebases do not correspond (in this case the second, third and fourth lines). So far I have this regex:
(\d{1,3}(?i)ms(?-i)).*[^\d](\d{1,3}(?i)ms(?-i))
which finds every line that contains two of those granularities. So instead of just lines 2, 3 and 4, it matches all of them. The only idea I had to narrow it down is to add a negative lookbehind with a backreference, like so:
(\d{1,3}(?i)ms(?-i)).*[^\d](\d{1,3}(?i)ms(?-i))(?<!\1)
but this is not allowed because a negative lookbehind has to have a fixed length.
I found these two questions (one, two), but the first does not have the restriction that both capture groups are of the same kind, and the second is looking for equal instances of the capture group.
If what I want can be achieved more easily using something other than regex, I would be happy to know. My mind is just stuck on the belief that regex is capable of this and that I am just not creative enough to use it properly.
One option is to match the timer part followed by the digits, and use a negative lookahead with a backreference to assert that it does not occur again on the right-hand side.
For the example data, a more specific pattern using a range of 2-250 might be:
.*?(timer(?:2[0-4]\d|250|1?\d\d|[2-9])ms)\b\S*[^\S\r\n]*[<>]?=[^\S\r\n]*\b(?!\S*\1)\S+
The pattern matches
.*? Match any char except a newline, as few as possible (non-greedy)
( Capture group 1
timer Match literally
(?:2[0-4]\d|250|1?\d\d|[2-9]) Match a number in the range 2-250
ms Match literally
)\b Close group and a word boundary
\S*[^\S\r\n]* Match optional non whitespace chars and optional spaces without newlines
[<>]?= Match an optional < or > and =
[^\S\r\n]*\b Match optional whitespace chars without a newline and a word boundary
(?!\S*\1) Negative lookahead, assert no occurrence of what is captured in group 1 in the value
\S+ Match 1+ non whitespace chars
Or perhaps a broader pattern matching 1-3 digits and optional whitespace chars which might also match a newline:
.*?(timer\d{1,3}ms\b)\S*\s*[<>]?=\s*\b(?!.*\1)\S+
Note that the quantifier has to be written as {1,3} (not {1-3}), and that \d{1,3} could also match numbers outside the 2-250 range, such as 999.
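To check the more specific pattern against the four sample lines, here is a small Java sketch (class and method names are hypothetical). It assumes the case-insensitive flag is enabled: the samples spell Timer with an uppercase T, and the only lowercase "timer" occurrences are on the right-hand side, so the literal timer would otherwise match nothing useful. With that flag, the pattern flags lines 2 and 3 but not line 4, because [<>]?= always requires an = sign. The second pattern is a hypothetical variant that replaces [<>]?= with (?:[<>]=?|=) so that a bare < or > is accepted as well. Both patterns flag a line whenever the captured timer name does not reappear to the right of the operator; they do not compare the numbers themselves.

```java
import java.util.regex.Pattern;

class TimerMismatch {
    // Pattern from the answer; CASE_INSENSITIVE is an assumption (samples use "Timer").
    static final Pattern SPECIFIC = Pattern.compile(
        ".*?(timer(?:2[0-4]\\d|250|1?\\d\\d|[2-9])ms)\\b\\S*[^\\S\\r\\n]*[<>]?="
            + "[^\\S\\r\\n]*\\b(?!\\S*\\1)\\S+",
        Pattern.CASE_INSENSITIVE);

    // Hypothetical variant: (?:[<>]=?|=) also accepts a bare < or >.
    static final Pattern WITH_LT_GT = Pattern.compile(
        ".*?(timer(?:2[0-4]\\d|250|1?\\d\\d|[2-9])ms)\\b\\S*[^\\S\\r\\n]*(?:[<>]=?|=)"
            + "[^\\S\\r\\n]*\\b(?!\\S*\\1)\\S+",
        Pattern.CASE_INSENSITIVE);

    static final String[] LINES = {
        "fooTimer10ms = timer10ms;",
        "baaTimer20ms = timer10ms;",
        "if (DIFF_100MS(dooTimer10ms) >= DOO_TIMEOUT_100MS)",
        "if (DIFF_100MS(dooTimer10ms) < DOO_TIMEOUT_100MS)"
    };

    // True if the pattern reports the line as a possible mismatch.
    static boolean flags(Pattern p, String line) {
        return p.matcher(line).find();
    }

    public static void main(String[] args) {
        for (int i = 0; i < LINES.length; i++) {
            System.out.println((i + 1) + ": specific=" + flags(SPECIFIC, LINES[i])
                + " variant=" + flags(WITH_LT_GT, LINES[i]));
        }
    }
}
```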

Match only two combinations and ignore the rest in REGEX - tableau

I have a dozen input IDs and I need to match only two particular patterns while ignoring the rest. I have a column that flags them as valid/invalid depending on whether the regex matches.
Test string:
1.) B-123456
2.) 985463728
My regex should strictly match the above two patterns and ignore the rest. The first test string has a letter B followed by a hyphen and then a few digits, while the second test string is purely numbers. Below is what I tried:
[Bb\d][-\d][0-9]{1,9}
Please help me out with this, as I have tried weird combinations and I am missing something tiny. My regex also matches other combinations, which should not happen.
You could match either B or b, a - and 6 digits, or match 9 digits, surrounded by word boundaries:
\b(?:[Bb]-[0-9]{6}|[0-9]{9})\b
If the number of digits can vary, you could make the B/b and the hyphen optional, and either match 1+ digits using [0-9]+ or use a quantifier like [0-9]{1,9}:
\b(?:[bB]-)?[0-9]+\b
Or use anchors to assert the start ^ and the end $ of the string
^(?:[bB]-)?[0-9]+$
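The two suggested patterns can be compared on a few IDs with the Java sketch below. Java's java.util.regex is used only as a stand-in (Tableau's own regex engine may differ in edge cases), and the class name, method name and extra sample IDs such as "C-123456" are hypothetical.

```java
import java.util.regex.Pattern;

class IdValidator {
    // Exact form: B/b, a hyphen and 6 digits, or exactly 9 digits, between word boundaries.
    static final Pattern EXACT = Pattern.compile("\\b(?:[Bb]-[0-9]{6}|[0-9]{9})\\b");
    // Looser form: optional B-/b- prefix and 1+ digits, anchored to the whole string.
    static final Pattern LOOSE = Pattern.compile("^(?:[bB]-)?[0-9]+$");

    // True if the pattern finds a match in the ID.
    static boolean isValid(Pattern p, String id) {
        return p.matcher(id).find();
    }

    public static void main(String[] args) {
        String[] ids = {"B-123456", "985463728", "B-1234567", "12345678", "C-123456", "B-12"};
        for (String id : ids) {
            System.out.println(id + " exact=" + isValid(EXACT, id) + " loose=" + isValid(LOOSE, id));
        }
    }
}
```

The exact pattern accepts only the two sample formats, while the looser one also accepts IDs such as B-12 or 12345678 with a different number of digits.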

Regex - Exactly 7 digits no more no less

I am looking for help here. I want to write a regex that finds EXACTLY 7 digits in a string, no more and no less.
For instance in this string:
1234567 RE:TKT-2744870-R6P1G0: Gentle Reminder
It should return only 1234567
In this one:
12345678 RE:TKT-2744870-R6P1G0: Gentle Reminder
It should return none.
Can you help me with this one?
Thanks in advance.
The proper regex should include \d{7} (7 digits) and 2 "border criteria",
one for the start and one for the end of the match, to prevent matching
a fragment of a longer sequence of digits.
My first thought was that neither before nor after the match there can be any digit.
But as I see from your example, these border criteria should be extended.
The set of "forbidden" chars (either before or after the match) should
also include - and letters.
E.g. 2744870 in your example data contains just 7 digits (no more, no less),
but you still don't want it to be matched, apparently because it is surrounded by - chars.
To keep the regex short, I propose:
(?<![\w-])\d{7}(?![\w-])
Details:
(?<![\w-]) - Negative lookbehind for word char or -.
\d{7} - 7 digits.
(?![\w-]) - Negative lookahead for word char or -.
If you decide to extend the set of "forbidden" chars in both border criteria,
just add them to the [...] fragments in the lookbehind / lookahead (but the - char
should remain at the end, otherwise it must be escaped with \).
A regex like (\d{7})[^\d] (from another answer) is wrong,
as it matches the last 7 digits of any longer sequence of digits
(there is no "front border criterion").
It also matches 2744870 (surrounded by - chars) in both strings, which
should not be matched.
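The behaviour described above can be checked with a short Java sketch that runs the lookaround pattern and the (\d{7})[^\d] pattern over the two example strings. The class and method names are hypothetical; the expected results are shown as comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SevenDigits {
    // No word char or '-' directly before or after the 7 digits.
    static final Pattern BORDERED = Pattern.compile("(?<![\\w-])\\d{7}(?![\\w-])");
    // Only requires a non-digit after the 7 digits (no front border criterion).
    static final Pattern TRAILING_ONLY = Pattern.compile("(\\d{7})[^\\d]");

    // Collects group(groupIndex) of every match of p in s.
    static List<String> findAll(Pattern p, String s, int groupIndex) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(s);
        while (m.find()) {
            found.add(m.group(groupIndex));
        }
        return found;
    }

    public static void main(String[] args) {
        String a = "1234567 RE:TKT-2744870-R6P1G0: Gentle Reminder";
        String b = "12345678 RE:TKT-2744870-R6P1G0: Gentle Reminder";
        System.out.println(findAll(BORDERED, a, 0));      // [1234567]
        System.out.println(findAll(BORDERED, b, 0));      // []
        System.out.println(findAll(TRAILING_ONLY, b, 1)); // [2345678, 2744870]
    }
}
```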
This one should do for your examples:
(\d{7})[^\d]
The first capturing group contains the seven digits.
Alternatively, as suggested in the comments, you can use a negative lookahead to match only the seven digits without needing a capturing group:
^\d{7}(?!\d)
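A minimal Java sketch of the lookahead version is below (class and method names are hypothetical). Because of the ^ anchor, it only looks at the start of the string, so a 7-digit number elsewhere, as in the made-up input "Ticket 1234567", is not found.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SevenDigitsAtStart {
    // 7 digits at the start of the string, not followed by another digit.
    static final Pattern AT_START = Pattern.compile("^\\d{7}(?!\\d)");

    // Returns the matched digits, or null when there is no match.
    static String match(String s) {
        Matcher m = AT_START.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(match("1234567 RE:TKT-2744870-R6P1G0: Gentle Reminder"));  // 1234567
        System.out.println(match("12345678 RE:TKT-2744870-R6P1G0: Gentle Reminder")); // null
        System.out.println(match("Ticket 1234567"));                                 // null, ^ pins the match to the start
    }
}
```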

Difference between regex quantifiers plus and star

I am trying to extract the error number from strings like "Wrong parameters - Error 1356":
Pattern p = Pattern.compile("(\\d*)");
Matcher m = p.matcher(myString);
m.find();
System.out.println(m.group(1));
And this does not print anything, which seemed strange to me, as according to Wikipedia * "matches the preceding element zero or more times".
I also tested it on www.regexr.com and regex101.com, and the result was the same: nothing for the expression \d*.
Then I started testing some different things (all tests made on the sites I mentioned):
(\d)* doesn't work
\d{0,} doesn't work
[\d]* doesn't work
[0-9]* doesn't work
\d{4} works
\d+ works
(\d+) works
[0-9]+ works
So I started searching the web for an explanation. The best I could find was here, in the Quantifiers section, which states:
\d? Optional digit (one or none).
\d* Eat as many digits as possible (but none if necessary)
\d+ Eat as many digits as possible, but at least one.
\d*? Eat as few digits as necessary (possibly none) to return a match.
\d+? Eat as few digits as necessary (but at least one) to return a match.
The question
As English is not my first language, I'm having trouble understanding the difference (mainly the "but none if necessary" part). So could you regex experts explain this in simple words, please?
The closest thing to this question that I found here on SO was this one: Regex: possessive quantifier for the star repetition operator, i.e. \d**, but it does not explain the difference.
The * quantifier matches zero or more occurrences.
In practice, this means that
\d*
will succeed at every position of any input, because the empty string counts as a match. So your regex matches at the very start of the input string and returns the empty string.
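The effect can be reproduced in Java with a small sketch (class and method names are hypothetical). The first find() with \d* stops at index 0 with an empty match, whereas \d+ skips ahead to the digits. Looping over find() with \d* reports many empty matches plus "1356"; the sketch filters out the empty ones to show that the digits are still in there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StarVsPlus {
    // Text of the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Every match find() reports, including empty ones.
    static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "Wrong parameters - Error 1356";
        System.out.println("[" + firstMatch("\\d*", s) + "]"); // [] (empty match at index 0)
        System.out.println("[" + firstMatch("\\d+", s) + "]"); // [1356]
        List<String> star = allMatches("\\d*", s);
        star.removeIf(String::isEmpty);
        System.out.println(star); // [1356]
    }
}
```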
"But none if necessary" means that the pattern does not fail if there is no digit to match: \d* matches zero or more occurrences of digits.
For example,
\d*[a-z]*
will match
abcdef
but \d+[a-z]*
will not match
abcdef
because \d+ implies that at least one digit is required.
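The example above can be checked in Java with Pattern.matches, which requires the whole input to match; using a whole-string match is an assumption, since the answer does not say whether partial matches count. The class name is hypothetical, and "42abcdef" is an extra made-up input.

```java
import java.util.regex.Pattern;

class StarPlusFullMatch {
    // True only if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("\\d*[a-z]*", "abcdef"));   // true: zero digits is allowed
        System.out.println(fullMatch("\\d+[a-z]*", "abcdef"));   // false: at least one digit is required
        System.out.println(fullMatch("\\d+[a-z]*", "42abcdef")); // true
    }
}
```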
\d* Eat as many digits as possible (but none if necessary)
\d* means it matches a digit zero or more times. In your input, the first character is a letter, so at that first position it matches zero digits, and find() returns this empty match. So it prints nothing.
\d+
It matches a digit one or more times. So it should find and match a digit or a digit followed by more digits.
With the pattern \d+, at least one digit has to be found, and then the match includes all subsequent digits until a non-digit character is reached.
\d* will match all the empty strings (zero digits) as well as the actual match. The .NET regex engine returns all of these empty-string matches in its set of matches.
Simply:
\d* implies zero or more times
\d+ means one or more times

Match against 1 hyphen per any number of digit groups

I'm trying to come up with a regex that allows only 1 hyphen between any number of digit groups. No letters ([a-z][A-Z]).
123-356-129811231235123-1235612346123451235
/[^\d-]/g
The regex above works for the string above, but it also lets the following through:
1223--1235---123123-------
I was looking at the following post How to match hyphens with Regular Expression? for an answer, but I didn't find anything close.
Konrad Rudolph gave a good example in:
Regular expression to match 7-12 digits; may contain space or hyphen
This tool is useful for me: http://www.gskinner.com/RegExr/
Assuming it can't ever start with a hyphen:
^\d(-\d|\d)*$
broken down:
^ # match beginning of line
\d # match single digit
(-\d|\d)* # match hyphen & digit or just a digit (0 or more times)
$ # match end of line
This requires every hyphen to be immediately followed by a digit. Keep in mind, though, that the following are examples of legal patterns:
213-123-12314-234234
1-2-3-4-5-6-7
12234234234
Alternatively:
^(\d+-)+(\d+)$
So it's one or more groups of digits, each followed by a hyphen, plus a final group of digits.
Nothing very fancy, but in my tests it matched only when there were hyphen(s) with digits on both sides.
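The two answers' patterns can be compared in Java on the strings from the question and the legal examples listed above (class and method names are hypothetical). Both reject the string with repeated hyphens; they differ on a plain run of digits such as 12234234234, which the first pattern accepts and the second rejects because it needs at least one hyphen.

```java
import java.util.regex.Pattern;

class HyphenGroups {
    // First answer: starts with a digit; every hyphen must be followed by a digit.
    static final Pattern DIGIT_THEN_GROUPS = Pattern.compile("^\\d(-\\d|\\d)*$");
    // Second answer: one or more "digits-" groups, then a final digit group.
    static final Pattern HYPHEN_REQUIRED = Pattern.compile("^(\\d+-)+(\\d+)$");

    // True if the whole string satisfies the pattern (it is anchored with ^ and $).
    static boolean ok(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "123-356-129811231235123-1235612346123451235",
            "1223--1235---123123-------",
            "1-2-3-4-5-6-7",
            "12234234234"
        };
        for (String s : samples) {
            System.out.println(s + " first=" + ok(DIGIT_THEN_GROUPS, s)
                + " second=" + ok(HYPHEN_REQUIRED, s));
        }
    }
}
```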