Regular Expression that includes plus sign and decimal

I'm having trouble putting together a regular expression for a string that contains a number between 0 and 99999, followed by a plus sign, followed by one or two digits, optionally followed by a decimal and a single digit. For instance:
99999+99.9
This would also be valid:
0+00
This would also be valid:
0+02.5
This would also be valid:
0+2.5
I found this topic: How can I check for a plus sign using regular expressions?
And this one:
Regular Expression for decimal numbers
But I am unable to put the two together and fulfill the other requirements listed above.
Any help you can provide is much appreciated!

This should work:
\d{1,5}\+\d{1,2}(?:\.\d)?
\d{1,5} matches anything between 0 and 99999 but also allows zero padding, e.g. 00000 or 00123 (it'll be a little more complicated if you don't want zero padding).
\+ matches a plus sign.
\d{1,2} matches one or two digits.
(?:\.\d) matches a period followed by a single digit. The (?:) bit indicates a non-capture group.
The ? at the end makes the non-capture group optional.
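You can check the pattern against the examples from the question with a short JavaScript script. The sketch below assumes Node.js or a browser console. The ^ and $ anchors are added here for whole-string validation; the unanchored pattern above would also find a match inside a longer string. The name scorePattern is only illustrative.

```javascript
// The answer's pattern, anchored with ^...$ so the whole string must match.
const scorePattern = /^\d{1,5}\+\d{1,2}(?:\.\d)?$/;

// Inputs from the question (all valid), plus a few that should be rejected.
const validInputs = ["99999+99.9", "0+00", "0+02.5", "0+2.5"];
const invalidInputs = ["123456+1", "1+123", "1+2.55", "1+.5"];

for (const s of [...validInputs, ...invalidInputs]) {
  console.log(`${s}\t${scorePattern.test(s)}`);
}
```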

You need to escape the plus and the ., and make the dot and the digit after it optional together, like so:
\d{1,5}\+\d{1,2}(\.\d)?
Hth!

Here it is
"^[0-9]*([0-9]{0,5}\+[0-9]{1,2}(\.[0-9])?)[0-9]*$"
EDIT: as per your comment, I have modified the expression.

Related

Using regex to match numbers which have 5 increasing consecutive digits somewhere in them

First off, this has sort of been asked before. However, I haven't been able to modify the existing answer to fit my requirement.
In short: I want a regex that matches an expression if and only if it only contains digits, and there are 5 (or more) increasing consecutive digits somewhere in the expression.
I understand the logic of
^(?=\d{5}$)1*2*3*4*5*6*7*8*9*0*$
However, this limits the input to exactly 5 digits. I want to allow other digits before and after the increasing run. So 1111345671111 should match, while 11111 shouldn't.
I thought this might work:
^[0-9]*(?=\d{5}0*1*2*3*4*5*6*7*8*9*)[0-9]*$
which I interpret as:
^$: The entire expression must only contain what's between these 2 symbols
[0-9]*: Any digits between 0-9, 0 or more times followed by:
(?=\d{5}0*1*2*3*4*5*6*7*8*9*): A part where at least 5 increasing digits are found followed by:
[0-9]*: Any digits between 0-9, 0 or more times.
However, this regex is incorrect: for example, 11111 matches. How can I solve this problem using a regex? Examples of expressions that should match:
00001459000
12345
These shouldn't match:
abc12345
9871234444
While this problem can be solved using pure regular expressions (the set of strictly ascending five-digit strings is finite, so you could just enumerate all of them), it's not a good fit for regexes.
That said, here's how I'd do it if I had to:
^\d*(?=\d{5}(\d*)$)0?1?2?3?4?5?6?7?8?9?\1$
Core idea: 0?1?2?3?4?5?6?7?8?9? matches an ascending numeric substring, but it doesn't restrict its length. Every single part is optional, so it can match anything from "" (empty string) to the full "0123456789".
We can force it to match exactly 5 characters by combining a look-ahead of five digits plus an arbitrary suffix (which we capture) with a backreference \1 (which must match exactly the suffix captured by the look-ahead, ensuring we've now walked ahead 5 characters in the string).
Live demo: https://regex101.com/r/03rJET/3
(By the way, your explanation of (?=\d{5}0*1*2*3*4*5*6*7*8*9*) is incorrect: It looks ahead to match exactly 5 digits, followed by 0 or more occurrences of 0, followed by 0 or more occurrences of 1, etc.)
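Here is a minimal JavaScript check of the first pattern against the examples from the question. It assumes Node.js or a browser console, and ascendingRun is only an illustrative name. JavaScript supports the look-ahead and the \1 backreference this pattern relies on.

```javascript
// Pattern from the answer: a digit prefix, a look-ahead that captures
// everything after the next five digits, then five strictly ascending
// digits followed by exactly that captured suffix.
const ascendingRun = /^\d*(?=\d{5}(\d*)$)0?1?2?3?4?5?6?7?8?9?\1$/;

const shouldMatch = ["00001459000", "12345", "1111345671111"];
const shouldNotMatch = ["abc12345", "9871234444", "11111"];

for (const s of [...shouldMatch, ...shouldNotMatch]) {
  console.log(`${s}\t${ascendingRun.test(s)}`);
}
```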
Because the starting position of the increasing digits isn't known in advance, and the increasing run doesn't necessarily end at the end of the string, the linked answer's concise pattern won't work here. I don't think this is possible without being repetitive: alternate between all possibilities of increasing digits. A 0 must be followed by [1-9] (0(?=[1-9])), a 1 must be followed by [2-9], a 2 must be followed by [3-9], and so on. Alternate between these possibilities in a group, repeat that group four times, and then match any digit after that. The lookahead on the last of the four repeated digits ensures that this 5th digit is in sequence as well.
First, look ahead to make sure the string consists only of digits up to its end. Then match the alternations described above, followed by one or more digits:
^(?=\d+$)\d*?(?:0(?=[1-9])|1(?=[2-9])|2(?=[3-9])|3(?=[4-9])|4(?=[5-9])|5(?=[6-9])|6(?=[7-9])|7(?=[89])|8(?=9)){4}\d+
Separated out for better readability:
^(?=\d+$)\d*?
(?:
0(?=[1-9])|
1(?=[2-9])|
2(?=[3-9])|
3(?=[4-9])|
4(?=[5-9])|
5(?=[6-9])|
6(?=[7-9])|
7(?=[89])|
8(?=9)
){4}
\d+
The lazy quantifier \d*? in the first line isn't necessary, but it makes the pattern a bit more efficient. Otherwise it initially greedily matches the whole string, requiring lots of failing alternations and backtracking until it is at least 5 characters before the end of the string.
https://regex101.com/r/03rJET/2
It's ugly, but it works.

What would be the regex to find the bold numbers

I tried \b[0-9]{1,4}\.[0-9]{1,3}\.[0-9]{0,3} but it misses 34.89 and 23.89
I want all the number sequences in the text below except 28.72%
34.89
0105.93.10 ghghghh
0105.93.20 ghghhh
jjjjjhjj 0105.93.30 jsdfsd iksifsdjfk sdfsdk
0105.93.40ierfgg dfgkdfg dfgolgh 23.89
28.72%
Thanks
Paul
Your regex requires two dots to be present. You need to make the last dot-digit sequence optional, and you need to exclude matches where % or another digit follows (otherwise 28.7 within 28.72% would match):
\b[0-9]{1,4}\.[0-9]{1,3}(?:\.[0-9]{1,3})?(?![0-9%])
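To see which numbers this pattern picks out of the question's sample text, here is a small JavaScript sketch, assuming Node.js. The g flag is added so that String.prototype.match returns every occurrence rather than just the first. The variable names are illustrative.

```javascript
// Sample text from the question, one line per array entry.
const text = [
  "34.89",
  "0105.93.10 ghghghh",
  "0105.93.20 ghghhh",
  "jjjjjhjj 0105.93.30 jsdfsd iksifsdjfk sdfsdk",
  "0105.93.40ierfgg dfgkdfg dfgolgh 23.89",
  "28.72%",
].join("\n");

// The answer's pattern with the g flag so every occurrence is returned.
const numberPattern = /\b[0-9]{1,4}\.[0-9]{1,3}(?:\.[0-9]{1,3})?(?![0-9%])/g;

const found = text.match(numberPattern);
console.log(found);
// found: 34.89, 0105.93.10, 0105.93.20, 0105.93.30, 0105.93.40, 23.89 (no 28.72)
```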
Make the last part optional
\b[0-9]{1,4}\.[0-9]{1,3}(?:\.[0-9]{0,3})?
Your original expression required the second period.
You were not very specific with your rules about matching %, so I made this:
\b[0-9]{1,4}\.[0-9]{1,3}(?:\.[0-9]{0,3})?(?=[^%\d]|$)
The last part is a positive lookahead for any character that is neither % nor a digit, or for the end of the input. It must exclude digits as well. Otherwise 28.7 would match the rest of the expression, and the final 2 would satisfy the lookahead as a non-percent character.
This will find all groups of numbers, separated by single dots, that are not followed by % or by another digit. Excluding the digit stops the engine from backtracking to 28.7 inside 28.72%:
(?:\d+\.)+\d+(?![\d%])
It requires at least one digit on each side of a dot. Other than that, it doesn't care how many digits are in each group. It also requires at least one dot in the number.
This would also require each group to have between 2 and 4 digits:
(?:\d{2,4}\.)+\d{2,4}(?![\d%])
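The effect of backtracking on a negative lookahead is easy to miss, so here is a small JavaScript comparison, assuming Node.js. It pits a lookahead that rejects only % against one that also rejects a digit. The input string and the variable names are made up for this illustration.

```javascript
// Two versions of the dotted-number pattern: one whose look-ahead only
// rejects a following "%", and one that also rejects a following digit.
const percentOnly = /(?:\d+\.)+\d+(?!%)/g;
const percentOrDigit = /(?:\d+\.)+\d+(?![\d%])/g;

const input = "0105.93.10 and 28.72%";

// With (?!%), the engine backtracks \d+ from "72" to "7"; the next
// character is then "2", which is not "%", so "28.7" is reported.
console.log(input.match(percentOnly));    // includes "28.7"
console.log(input.match(percentOrDigit)); // only "0105.93.10"
```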

Regex to find integers and decimals in string

I have a string like:
$str1 = "12 ounces";
$str2 = "1.5 ounces chopped";
I'd like to get the amount from the string whether it is a decimal or not (12 or 1.5), and then grab the immediately following measurement (ounces).
I was able to use a pretty rudimentary regex to grab the measurement, but getting the decimal/integer has been giving me problems.
Thanks for your help!
If you just want to grab the data, you can just use a loose regex:
([\d.]+)\s+(\S+)
([\d.]+): [\d.]+ matches a sequence made up only of digits and . characters. So 4.5.6 or .... will also match, but such cases are uncommon, and this is just for grabbing data. The parentheses capture the matched text. The . is inside the character class [], so it needs no escaping.
This is followed by one or more whitespace characters \s+ and then the longest possible sequence (due to the greedy quantifier) of non-whitespace characters \S+. Non-space really means non-space: it matches almost everything in Unicode except whitespace such as space, tab, newline and carriage return.
You can get the number in the first capturing group, and the unit in the 2nd capturing group.
You can be a bit stricter on the number:
(\d+(?:\.\d*)?|\.\d+)\s+(\S+)
The only change is (\d+(?:\.\d*)?|\.\d+), so I will only explain this part. It is a bit stricter, but whether stricter is better depends on the input domain and your requirements. It matches an integer such as 34 and a number with a decimal part such as 3.40000, and it also lets .5 and 34. pass. The matched number can never contain more than one . or consist of a lone .. The | acts as OR, separating two different patterns: \.\d+ and \d+(?:\.\d*)?.
\d+(?:\.\d*)?: This will match and (implicitly) assert at least one digit in integer part, followed by optional . (which needs to be escaped with \ since . means any character) and fractional part (which can be 0 or more digits). The optionality is indicated by ? at the end. () can be used for grouping and capturing - but if capturing is not needed, then (?:) can be used to disable capturing (save memory).
\.\d+: This will match for the case such as .78. It matches . followed by at least one (signified by +) digit.
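Here is a minimal JavaScript sketch of using the stricter pattern to pull the amount and unit out of the question's strings, assuming Node.js. The question's code is PHP, so this only illustrates the regex itself. parseAmount and its return shape are hypothetical, chosen for this example.

```javascript
// The stricter pattern from the answer: group 1 is the amount, group 2 the unit.
const amountAndUnit = /(\d+(?:\.\d*)?|\.\d+)\s+(\S+)/;

// Hypothetical helper: returns { amount, unit } or null when nothing matches.
function parseAmount(str) {
  const m = str.match(amountAndUnit);
  return m ? { amount: parseFloat(m[1]), unit: m[2] } : null;
}

console.log(parseAmount("12 ounces"));          // amount 12, unit "ounces"
console.log(parseAmount("1.5 ounces chopped")); // amount 1.5, unit "ounces"
console.log(parseAmount(".5 cups"));            // amount 0.5, unit "cups"
```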
This is not a good solution if you want to make sure you get something meaningful out of the input string. You need to define all expected units before you can write a regex that only captures valid data.
Use this regular expression: \b\d+([\.,]\d+)?
To get integers and decimals that either use a comma or a dot plus the next word, use the following regex:
/\d+([\.,]\d+)?\s\S+/

Regular Expression - 4 digits in a row, but can't be all zeros

I am looking for a solution that can exclusively be done with a regular expression. I know this would be easy with variables, substrings, etc.
And I am looking for PCRE style regex syntax even though I mention vim.
I need to identify strings with 4 numeric digits, and they can't be all 0's. So the following strings would be a match:
0001
1000
1234
0101
And this would not:
0000
This is a substring that will occur at a set location within a large string, if that matters; I don't think it should. For example
xxxxxxxxxxxx0001xxxxx
xxxxxxxxxxxx1000xxxxx
xxxxxxxxxxxx1234xxxxx
xxxxxxxxxxxx0101xxxxx
xxxxxxxxxxxx0000xxxxx
(?<!\d)(?!0000)\d{4}(?!\d)
or, more kindly/maintainably/sanely:
m{
(?<! \d ) # current point cannot follow a digit
(?! 0000 ) # current point must not precede "0000"
\d{4} # match four digits at this point, provided...
(?! \d ) # that they are not then followed by another digit
}x
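The same pattern can be tried in JavaScript, which has supported look-behind (?<!...) since ES2018. The sketch below assumes a modern Node.js. It runs the pattern over the example lines from the question; the variable names are illustrative.

```javascript
// The answer's pattern: four digits, not preceded or followed by a digit,
// and not "0000".
const fourDigitsNotAllZero = /(?<!\d)(?!0000)\d{4}(?!\d)/;

const lines = [
  "xxxxxxxxxxxx0001xxxxx",
  "xxxxxxxxxxxx1000xxxxx",
  "xxxxxxxxxxxx1234xxxxx",
  "xxxxxxxxxxxx0101xxxxx",
  "xxxxxxxxxxxx0000xxxxx",
];

for (const line of lines) {
  const m = line.match(fourDigitsNotAllZero);
  console.log(`${line}\t${m ? m[0] : "no match"}`);
}
```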
Since I complained that some of the answers here weren't regular expressions, I thought I'd best give you a regex answer. This is primitive, and there's probably a better way, but it does work:
([1-9][0-9][0-9][0-9]|[0-9][1-9][0-9][0-9]|[0-9][0-9][1-9][0-9]|[0-9][0-9][0-9][1-9])
This checks for four characters that are each 0-9, except that one of them must be 1-9, which prevents 0000 from matching. You can probably write this more simply using \d instead of [0-9] if your regex parser supports that metacharacter.
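Because there are only 10,000 four-digit strings, the alternation can be checked exhaustively. This is a sketch assuming Node.js. The ^...$ anchors are added here so the pattern must cover the whole four-character string, and the names are illustrative.

```javascript
// The alternation from the answer, anchored so it must cover the whole string.
const alternation =
  /^(?:[1-9][0-9][0-9][0-9]|[0-9][1-9][0-9][0-9]|[0-9][0-9][1-9][0-9]|[0-9][0-9][0-9][1-9])$/;

// Check every four-digit string from "0000" to "9999".
let accepted = 0;
for (let n = 0; n <= 9999; n++) {
  const s = String(n).padStart(4, "0");
  if (alternation.test(s)) accepted++;
  else console.log(`rejected: ${s}`);
}
console.log(`accepted ${accepted} of 10000`); // only "0000" should be rejected
```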
Just match for 4 digits (\d{4} should do it) and then verify that your match is not equal to '0000'.
Since PCRE supports lookarounds, \d{4}(?<!0000) will find any instance of four consecutive digits that are not all zeros.
If you must make sure the match only occurs in the correct position of the string, you can use ^.{X}\d{4}(?<!0000).{Y}$ instead, where X and Y are the number of preceding and following characters, respectively (12 and 5 in your example.)
Test for a sequence of 3 digits (0-9), then a 4th with only (1-9):
/\d{3}[1-9]/
Note that this also rejects valid inputs whose last digit is 0, such as 1000.

How to validate numeric values which may contain dots or commas?

I need a regular expression to validate one or two digits, then a , or a ., and then one or two digits again.
So, these are valid inputs:
11,11
11.11
1.1
1,1
\d{1,2}[\,\.]{1}\d{1,2}
EDIT: update to meet the new requirements (comments) ;)
EDIT: removed unnecessary quantifier as per Bryan
^[0-9]{1,2}([,.][0-9]{1,2})?$
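Here is a quick JavaScript check of the updated expression, assuming Node.js. The valid inputs are the ones from the question, plus a bare "11", which the updated expression accepts because the decimal part is now optional. The invalid inputs are borrowed from the test list further down this thread. The variable names are illustrative.

```javascript
// The updated expression: one or two digits, optionally followed by
// a comma or dot and one or two more digits.
const twoDigitDecimal = /^[0-9]{1,2}([,.][0-9]{1,2})?$/;

const valid = ["11,11", "11.11", "1.1", "1,1", "11"];
const invalid = ["111,1", "11.111", "11-11", ",11", "11.", "a.11"];

for (const s of [...valid, ...invalid]) {
  console.log(`${s}\t${twoDigitDecimal.test(s)}`);
}
```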
In order to represent a single digit in the form of a regular expression you can use either:
[0-9] or \d
In order to specify how many times the number appears you would add
[0-9]*: the star means there are zero or more digits
[0-9]{2}: {N} means N digits
[0-9]{0,2}: {N,M} N digits to M digits
Let's say I want to represent a number between 0 and 99. I would express it as such:
[0-9]{1,2} or \d{1,2}
Or let's say we were displaying a byte in binary: each digit must be 0 or 1, and there must be 8 of them (the size of a byte), so we would represent it as follows:
[0-1]{8} representation of a binary byte
Then if you want to add a , or a . symbol you would use:
\, or \. or you can use [.] or [,]
You can also state a selection between possible values as such
[.,] means either a dot or a comma symbol
And you just need to concatenate the pieces. So to represent a 1 or 2 digit number, followed by either a comma or a period, followed by one or two more digits, you would write:
[0-9]{1,2}[.,]\d{1,2}
Also note that a regular expression written inside an ordinary C++ string literal must have its backslashes doubled, so every \ becomes \\.
\d means a digit in most languages. You can also use [0-9] in all languages. For the "period or comma" use [\.,]. Depending on your language you may need more backslashes based on how you quote the expression. Ultimately, the regular expression engine needs to see a single backslash.
* means "zero-or-more", so \d* and [0-9]* mean "zero or more digits". ? means "zero-or-one". Neither of those quantifiers means exactly one. Most languages also let you use {m,n} to mean "between m and n" (i.e. {1,2} means "between 1 and 2").
Since the dot or comma and additional numbers are optional, you can put them in a group and use the ? quantifier to mean "zero-or-one" of that group.
Putting that all together you can use:
\d{1,2}([\.,]\d{1,2})?
Meaning, one or two digits \d{1,2}, followed by zero-or-one of a group (...)? consisting of a dot or comma followed by one or two digits [\.,]\d{1,2}
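A common slip with this pattern is writing the trailing digits as [\d{1,2}] inside square brackets. That turns them into a character class matching a single character: a digit, {, , or }. This JavaScript sketch, assuming Node.js, compares the two forms side by side. The ^...$ anchors are added here so that whole-string behaviour is visible.

```javascript
// Inside [...], "\d{1,2}" is not a quantified digit: the class matches ONE
// character that is a digit, "{", "," or "}".
const classVersion = /^\d{1,2}([\.,][\d{1,2}])?$/;
// The intended form: a dot or comma followed by one or two digits.
const groupVersion = /^\d{1,2}([\.,]\d{1,2})?$/;

for (const s of ["11.11", "11.1", "11.{", "11,}"]) {
  console.log(`${s}\tclass=${classVersion.test(s)}\tgroup=${groupVersion.test(s)}`);
}
```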
\d{1,2}[,.]\d{1,2}
\d means a digit, the {1,2} part means 1 or 2 of the previous character (\d in this case) and the [,.] part means either a comma or dot.
Shortest regexp I know (16 char)
^\d\d?[,.]\d\d?$
The ^ and $ mean the beginning and end of the input string. Without them, the 23.45 part of a string like 123.45 would be matched. \d means a digit, \d? means an optional digit, and [,.] means a dot or a comma.
[ // tests
'11,11', // valid
'11.11',
'1.1',
'1,1',
'111,1', // nonvalid
'11.111',
'11-11',
',11',
'11.',
'a.11',
'11,a',
].forEach(n=> console.log(`${n}\t valid: ${ /^\d\d?[,.]\d\d?$/.test(n) }`))
If you want to be very permissive, requiring only two final digits after a comma or dot:
^([,.\d]+)([,.]\d{2})$
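Here is a short JavaScript sketch of what the permissive pattern captures, assuming Node.js. Group 1 holds everything before the final separator, and group 2 holds the separator plus the two final digits. The sample inputs and variable names are illustrative.

```javascript
const permissive = /^([,.\d]+)([,.]\d{2})$/;

for (const s of ["1,234.56", "1.234,56", "12,50", "1.5", "12.345"]) {
  const m = s.match(permissive);
  console.log(`${s}\t${m ? `before: "${m[1]}", last part: "${m[2]}"` : "no match"}`);
}
```

As the answer says, this is very permissive. Because group 1 accepts any mix of digits, commas and dots, a string such as "..,12" is accepted too.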