Regular expression non-capturing group character length - regex

I'm trying to write a regular expression for the following pattern:
A number of at most 16 digits that may or may not contain a comma, but never at the beginning and never at the end.
I tried this:
^(?:\d+(?:,{1}\d+){1})$
The problem is that I can't apply a length limit such as {0,16} to what the group matches as a whole.
This is a list of numbers that should fit the expression:
123,34
1,33333333
1222222233
Example numbers that shouldn't fit:
1111,1111,1111
,11111
11111,
11111111111111111111111111111,111111111111111 (more than 16 characters)

You can check the length beforehand, or use ^(?=[\d,]{1,16}$)(?:\d+(?:,\d+)?)$
The lookahead checks that the whole string is 1 to 16 characters long (digits and comma combined) before the real match is attempted.
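A quick way to try the pattern against the examples from the question is a small Python sketch. Python's re module is an assumption here (the question does not name a regex flavour), and the helper name fits is made up for illustration:

```python
import re

# The positive lookahead caps the total length (digits plus comma) at 16.
PATTERN = re.compile(r"^(?=[\d,]{1,16}$)(?:\d+(?:,\d+)?)$")

def fits(s: str) -> bool:
    """Return True if s is accepted by the pattern."""
    return PATTERN.match(s) is not None

should_fit = ["123,34", "1,33333333", "1222222233"]
should_not_fit = [
    "1111,1111,1111",
    ",11111",
    "11111,",
    "11111111111111111111111111111,111111111111111",
]

for s in should_fit + should_not_fit:
    print(f"{s!r}: {fits(s)}")
```

Note that this variant counts the comma towards the 16 characters, so 16 digits plus a comma are rejected.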

If your regex flavour supports lookahead assertions, you can use this:
^(?!(?:\d{17,}$|[,\d]{18,}$))(?:\d+(?:,\d+)?)$
I removed the superfluous {1} and made the group with the fraction optional.
The negative lookahead assertion (?!(?:\d{17,}$|[,\d]{18,}$)) checks your length requirement. It fails if it finds 17 or more digits up to the end of the string, OR 18 or more digits and commas up to the end. Allowing multiple commas in the character class is not a problem here, because the pattern that follows ensures there is at most one comma.
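To see where the two length limits bite, here is a minimal Python sketch (Python's re module and the helper name is_valid are assumptions for illustration, not part of the answer):

```python
import re

# Negative lookahead: reject 17+ digits, or 18+ digits and commas, up to the end.
PATTERN = re.compile(r"^(?!(?:\d{17,}$|[,\d]{18,}$))(?:\d+(?:,\d+)?)$")

def is_valid(s: str) -> bool:
    """Return True if s is accepted by the pattern."""
    return PATTERN.match(s) is not None

print(is_valid("12345678,12345678"))   # 16 digits plus one comma -> True
print(is_valid("1" * 17))              # 17 digits, no comma -> False
print(is_valid("12345678,123456789"))  # 17 digits plus one comma -> False
```

So this pattern limits the number of digits to 16 and does not count the comma.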

Related

Regex how can i get only exact part in a string

I should only catch numbers that fit the rules.
Rules:
it should be 16 digits
the first 11 digits can be any number
the next 3 digits must all be zero
the last two digits can be any number.
I did it this way:
([0-9]{11}[0]{3}[0-9]{2})
Example number:
1234567890100012
Now I want to get the number even if the string has letters at the beginning or end, like " abc1234567890100012abc".
My output should be just the number, like "1234567890100012".
When I add [a-zA-Z]* it returns the whole string.
Another point: if there are extra digits at the beginning or end of the string, like "999912345678901000129999", the program shouldn't accept it; it should return None or nothing. How can I write this with a regex?
You can use lookarounds to exclude the cases where there are more digits before or after:
(?<!\d)\d{11}000\d\d(?!\d)
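A minimal sketch of how this could be used, assuming Python (the question mentions returning None, but does not name the language) and a made-up helper name extract:

```python
import re

# No digit may come directly before or after the 16-digit number.
NUMBER = re.compile(r"(?<!\d)\d{11}000\d\d(?!\d)")

def extract(s: str):
    """Return the embedded number, or None if there is no valid one."""
    m = NUMBER.search(s)
    return m.group(0) if m else None

print(extract("abc1234567890100012abc"))    # 1234567890100012
print(extract("999912345678901000129999"))  # None
```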
You can use a capture group, and match optional chars a-zA-Z before and after the group.
To prevent a partial match, you can use word boundaries \b, or, if the string should match from the start to the end of the line, you can use the anchors ^ and $.
\b[a-zA-Z]*([0-9]{11}000[0-9]{2})[a-zA-Z]*\b
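As a rough illustration of the capture-group variant, assuming Python and a hypothetical helper named extract_group, the number is read from group 1 rather than from the whole match:

```python
import re

# Optional letters around the number; the number itself is capture group 1.
PATTERN = re.compile(r"\b[a-zA-Z]*([0-9]{11}000[0-9]{2})[a-zA-Z]*\b")

def extract_group(s: str):
    """Return the captured number, or None if the string does not qualify."""
    m = PATTERN.search(s)
    return m.group(1) if m else None

print(extract_group("abc1234567890100012abc"))    # 1234567890100012
print(extract_group("999912345678901000129999"))  # None
```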

Using regex to match numbers which have 5 increasing consecutive digits somewhere in them

First off, this has sort of been asked before. However I haven't been able to modify this to fit my requirement.
In short: I want a regex that matches an expression if and only if it only contains digits, and there are 5 (or more) increasing consecutive digits somewhere in the expression.
I understand the logic of
^(?=\d{5}$)1*2*3*4*5*6*7*8*9*0*$
however, this limits the expression to 5 digits. I want to allow digits before and after the increasing run. So 1111345671111 should match, while 11111 shouldn't.
I thought this might work:
^[0-9]*(?=\d{5}0*1*2*3*4*5*6*7*8*9*)[0-9]*$
which I interpret as:
^$: The entire expression must only contain what's between these 2 symbols
[0-9]*: Any digits between 0-9, 0 or more times followed by:
(?=\d{5}0*1*2*3*4*5*6*7*8*9*): A part where at least 5 increasing digits are found followed by:
[0-9]*: Any digits between 0-9, 0 or more times.
However this regex is incorrect, as for example 11111 matches. How can I solve this problem using a regex? So examples of expressions to match:
00001459000
12345
This shouldn't match:
abc12345
9871234444
While this problem can be solved using pure regular expressions (the set of strictly ascending five-digit strings is finite, so you could just enumerate all of them), it's not a good fit for regexes.
That said, here's how I'd do it if I had to:
^\d*(?=\d{5}(\d*)$)0?1?2?3?4?5?6?7?8?9?\1$
Core idea: 0?1?2?3?4?5?6?7?8?9? matches an ascending numeric substring, but it doesn't restrict its length. Every single part is optional, so it can match anything from "" (empty string) to the full "0123456789".
We can force it to match exactly 5 characters by combining a look-ahead of five digits and an arbitrary suffix (which we capture) with a backreference \1 (which must match exactly the suffix matched by the look-ahead, ensuring we've now walked ahead 5 characters in the string).
Live demo: https://regex101.com/r/03rJET/3
(By the way, your explanation of (?=\d{5}0*1*2*3*4*5*6*7*8*9*) is incorrect: It looks ahead to match exactly 5 digits, followed by 0 or more occurrences of 0, followed by 0 or more occurrences of 1, etc.)
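For anyone who wants to try the look-ahead plus backreference pattern outside regex101, here is a small sketch assuming Python's re module (the helper name has_ascending_run is made up for illustration):

```python
import re

# The look-ahead captures everything after the next 5 digits; \1 then forces
# the optional ascending digits 0?1?...9? to consume exactly those 5 digits.
PATTERN = re.compile(r"^\d*(?=\d{5}(\d*)$)0?1?2?3?4?5?6?7?8?9?\1$")

def has_ascending_run(s: str) -> bool:
    """Return True if s is all digits and contains 5 strictly ascending digits in a row."""
    return PATTERN.match(s) is not None

for s in ["00001459000", "12345", "1111345671111",
          "11111", "abc12345", "9871234444"]:
    print(f"{s}: {has_ascending_run(s)}")
```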
Because the starting position of the increasing digits isn't known in advance, and the consecutive increasing digits don't end at the end of the string, the linked answer's concise pattern won't work here. I don't think this is possible without being repetitive: alternate between all possibilities of increasing digits. A 0 must be followed by [1-9], i.e. 0(?=[1-9]). A 1 must be followed by [2-9]. A 2 must be followed by [3-9], and so on. Alternate between these possibilities in a group, repeat that group four times, and then match any digit after that (the lookahead on the fourth repeated digit ensures that this 5th digit is in sequence as well).
First lookahead for digits followed by the end of the string, then match the alternations described above, followed by one or more digits:
^(?=\d+$)\d*?(?:0(?=[1-9])|1(?=[2-9])|2(?=[3-9])|3(?=[4-9])|4(?=[5-9])|5(?=[6-9])|6(?=[7-9])|7(?=[89])|8(?=9)){4}\d+
Separated out for better readability:
^(?=\d+$)\d*?
(?:
0(?=[1-9])|
1(?=[2-9])|
2(?=[3-9])|
3(?=[4-9])|
4(?=[5-9])|
5(?=[6-9])|
6(?=[7-9])|
7(?=[89])|
8(?=9)
){4}
\d+
The lazy quantifier \d*? in the first line isn't necessary, but it makes the pattern a bit more efficient (otherwise \d* initially greedily matches the whole string, requiring lots of failing alternations and backtracking until it is at least 5 characters before the end of the string).
https://regex101.com/r/03rJET/2
It's ugly, but it works.
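The separated-out layout can be used more or less as-is in flavours with a verbose mode. A sketch assuming Python's re.VERBOSE flag (the comments inside the pattern and the helper name matches are additions for illustration):

```python
import re

PATTERN = re.compile(r"""
    ^(?=\d+$)\d*?      # digits only; lazily skip any prefix
    (?:
        0(?=[1-9])|
        1(?=[2-9])|
        2(?=[3-9])|
        3(?=[4-9])|
        4(?=[5-9])|
        5(?=[6-9])|
        6(?=[7-9])|
        7(?=[89])|
        8(?=9)
    ){4}               # four digits, each followed by a larger digit
    \d+                # the fifth digit of the run plus any suffix
""", re.VERBOSE)

def matches(s: str) -> bool:
    """Return True if s is all digits and contains 5 increasing digits in a row."""
    return PATTERN.match(s) is not None

for s in ["00001459000", "12345", "1111345671111",
          "11111", "abc12345", "9871234444"]:
    print(f"{s}: {matches(s)}")
```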

regular expression with or case

I'm trying to compute a regex for this scenario:
the ID must start with the letter 'M' and end with 3 digits, but triple zeros are not allowed.
I've tried
M(00[1-9])
But this only works on blocking triple zeros, how can I cater for the other digits?
The easiest way is probably with a negative lookahead:
M(?!0{3})\d{3}
This matches literal M, checks that the next thing is not triple zero, then matches three digits.
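A minimal check of this pattern, assuming Python and assuming the whole ID is exactly M plus three digits (hence fullmatch; the helper name is_valid_id is hypothetical):

```python
import re

# Literal M, then three digits that are not "000".
ID_PATTERN = re.compile(r"M(?!0{3})\d{3}")

def is_valid_id(s: str) -> bool:
    """Return True if s is M followed by three digits other than 000."""
    return ID_PATTERN.fullmatch(s) is not None

print(is_valid_id("M001"))  # True
print(is_valid_id("M120"))  # True
print(is_valid_id("M000"))  # False
```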
If you want to block a specific set of digits, you can modify your lookahead to check for specific repeated digits (0, 2, 5, 6 here):
M(?!([0256])\1{2})\d{3}
To check for all triple digits, replace [0256] with \d. This regex makes the lookahead check for one digit, then test if it is repeated twice using a backreference.
A less redundant way might be to put the capture group outside the lookahead:
M(\d)(?!\1{2})\d{2}
This version says to capture one digit, make sure it is not repeated two more times, then match two more digits.
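The two backreference variants can be compared side by side. This is only a sketch, assuming Python and full-string matching; the names SPECIFIC_TRIPLES and ANY_TRIPLE are made up:

```python
import re

# Blocks 000, 222, 555 and 666 only.
SPECIFIC_TRIPLES = re.compile(r"M(?!([0256])\1{2})\d{3}")
# Blocks every triple digit (000, 111, ..., 999).
ANY_TRIPLE = re.compile(r"M(\d)(?!\1{2})\d{2}")

for candidate in ["M222", "M111", "M225", "M112", "M000"]:
    specific_ok = SPECIFIC_TRIPLES.fullmatch(candidate) is not None
    any_ok = ANY_TRIPLE.fullmatch(candidate) is not None
    print(f"{candidate}: specific={specific_ok} any={any_ok}")
```

Here M111 passes the first pattern but not the second, because 1 is not in the blocked set [0256].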

Regex to find integers and decimals in string

I have a string like:
$str1 = "12 ounces";
$str2 = "1.5 ounces chopped";
I'd like to get the amount from the string whether it is a decimal or not (12 or 1.5), and then grab the measurement that immediately follows it (ounces).
I was able to use a pretty rudimentary regex to grab the measurement, but getting the decimal/integer has been giving me problems.
Thanks for your help!
If you just want to grab the data, you can just use a loose regex:
([\d.]+)\s+(\S+)
([\d.]+): [\d.]+ will match a sequence of strictly digits and . (it means 4.5.6 or .... will match, but those cases are not common, and this is just for grabbing data), and the parentheses signify that we will capture the matched text. The . here is inside character class [], so no need for escaping.
Followed by arbitrary spaces \s+ and maximum sequence (due to greedy quantifier) of non-space character \S+ (non-space really is non-space: it will match almost everything in Unicode, except for space, tab, new line, carriage return characters).
You can get the number in the first capturing group, and the unit in the 2nd capturing group.
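The question uses PHP-style variables, but the idea is the same in any flavour. A rough sketch in Python (an assumption made for consistency with the other examples; parse_loose is a hypothetical helper name):

```python
import re

# Group 1: run of digits and dots; group 2: the next run of non-space characters.
LOOSE = re.compile(r"([\d.]+)\s+(\S+)")

def parse_loose(text: str):
    """Return (amount, unit) as strings, or None if nothing matches."""
    m = LOOSE.search(text)
    return (m.group(1), m.group(2)) if m else None

print(parse_loose("12 ounces"))           # ('12', 'ounces')
print(parse_loose("1.5 ounces chopped"))  # ('1.5', 'ounces')
print(parse_loose("4.5.6 cups"))          # ('4.5.6', 'cups'), the loose case
```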
You can be a bit stricter on the number:
(\d+(?:\.\d*)?|\.\d+)\s+(\S+)
The only change is (\d+(?:\.\d*)?|\.\d+), so I will only explain this part. This is a bit stricter, but whether stricter is better depends on the input domain and your requirements. It will match an integer such as 34 or a number with a decimal part such as 3.40000, and it allows the .5 and 34. cases to pass. It will reject a number with excess dots, or one that consists only of a dot. The | acts as OR, separating 2 different patterns: \.\d+ and \d+(?:\.\d*)?.
\d+(?:\.\d*)?: This will match and (implicitly) assert at least one digit in integer part, followed by optional . (which needs to be escaped with \ since . means any character) and fractional part (which can be 0 or more digits). The optionality is indicated by ? at the end. () can be used for grouping and capturing - but if capturing is not needed, then (?:) can be used to disable capturing (save memory).
\.\d+: This will match for the case such as .78. It matches . followed by at least one (signified by +) digit.
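A sketch of the stricter pattern in Python (assumed language; parse_strict is a hypothetical helper name):

```python
import re

# Amount: digits with an optional fractional part, or a leading-dot fraction.
STRICT = re.compile(r"(\d+(?:\.\d*)?|\.\d+)\s+(\S+)")

def parse_strict(text: str):
    """Return (amount, unit) as strings, or None if nothing matches."""
    m = STRICT.search(text)
    return (m.group(1), m.group(2)) if m else None

print(parse_strict("1.5 ounces chopped"))  # ('1.5', 'ounces')
print(parse_strict(".5 cups"))             # ('.5', 'cups')
print(parse_strict("34. grams"))           # ('34.', 'grams')
print(parse_strict(". ounces"))            # None
```

One caveat when searching rather than matching the whole string: an input like "4.5.6 oz" is not rejected outright, because the search simply finds "5.6 oz" further along. Anchoring the pattern would be needed to rule that out.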
This is not a good solution if you want to make sure you get something meaningful out of the input string. You need to define all expected units before you can write a regex that only captures valid data.
Use this regular expression: \b\d+([\.,]\d+)?
To get integers and decimals that either use a comma or a dot plus the next word, use the following regex:
/\d+([\.,]\d+)?\s\S+/
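The surrounding slashes are pattern delimiters (as in PHP or JavaScript), so they are dropped in this Python sketch. Python and the helper name amount_and_word are assumptions for illustration:

```python
import re

# A number with an optional comma or dot fraction, one whitespace, then the next word.
PATTERN = re.compile(r"\d+([\.,]\d+)?\s\S+")

def amount_and_word(text: str):
    """Return the matched 'number word' text, or None if nothing matches."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

print(amount_and_word("12 ounces"))           # 12 ounces
print(amount_and_word("1,5 ounces chopped"))  # 1,5 ounces
print(amount_and_word("1.5 ounces chopped"))  # 1.5 ounces
```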

Regex negative match query

I've got a regex issue, I'm trying to ignore just the number '41', I want 4, 1, 14 etc to all match.
I've got this [^\b41\b] which is effectively what I want but this also ignores all single iterations of the values 1 and 4.
As an example, this matches "41", but I want it to NOT match:
\b41\b
Try something like:
\b(?!41\b)(\d+)
The (?!...) construct is a negative lookahead, so this means: find a word boundary that is not followed by "41" and another word boundary, then capture a sequence of digits after it.
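A quick way to see the effect, sketched in Python (an assumption, since the question does not name a language; the sample text is made up):

```python
import re

# Capture runs of digits at a word boundary, except the standalone number 41.
PATTERN = re.compile(r"\b(?!41\b)(\d+)")

sample = "4 1 14 41 410 141"
found = PATTERN.findall(sample)
print(found)  # ['4', '1', '14', '410', '141']
```

Only the standalone 41 is skipped; 410 and 141 still match because the lookahead requires a word boundary right after the 41.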
You could use a negative look-ahead assertion to exclude 41:
/\b(?!41\b)\d+\b/
This regular expression is to be interpreted as: At any word boundary \b, if it is not followed by 41\b ((?!41\b)), match one or more digits that are followed by a word boundary.
Or the same with a negative look-behind assertion:
/\b\d+\b(?<!\b41)/
This regular expression is to be interpreted as: Match one or more digits that are surrounded by word boundaries, but only if the substring at the end of the match is not preceded by \b41 ((?<!\b41)).
Or you can even use just basic syntax:
/\b(\d|[0-35-9]\d|\d[02-9]|\d{3,})\b/
This matches only sequences of digits surrounded by word boundaries of either:
one single digit
two digits that either do not have a 4 in the first position or do not have a 1 in the second position
three or more digits
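The three variants above can be compared on the same input. A sketch assuming Python's re module, with the delimiting slashes removed and a made-up sample string:

```python
import re

VARIANTS = {
    "look-ahead": re.compile(r"\b(?!41\b)\d+\b"),
    "look-behind": re.compile(r"\b\d+\b(?<!\b41)"),
    "basic": re.compile(r"\b(\d|[0-35-9]\d|\d[02-9]|\d{3,})\b"),
}

sample = "4 1 14 41 410 141"

# Collect the full text of every match for each variant.
results = {
    name: [m.group(0) for m in pattern.finditer(sample)]
    for name, pattern in VARIANTS.items()
}

for name, matches in results.items():
    print(f"{name}: {matches}")
```

All three should yield the same list here, with only the standalone 41 missing.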
This is similar to the question "Regular expression that doesn’t contain certain string", so I'll repeat my answer from there:
^((?!41).)*$
This will work for an arbitrary string, not just 41. See my response there for an explanation.
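Note that this pattern tests a whole string for the substring 41, which is a different check from the word-boundary patterns above. A small sketch assuming Python (the helper name lacks_41 is hypothetical):

```python
import re

# Every character must be at a position where "41" does not start.
PATTERN = re.compile(r"^((?!41).)*$")

def lacks_41(s: str) -> bool:
    """Return True if the string does not contain the substring 41 anywhere."""
    return PATTERN.match(s) is not None

print(lacks_41("14"))   # True
print(lacks_41("41"))   # False
print(lacks_41("141"))  # False, because 41 appears inside 141
```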