How to validate numeric values which may contain dots or commas? - regex

I need a regular expression for validation two or one numbers then , or . and again two or one numbers.
So, these are valid inputs:
11,11
11.11
1.1
1,1

\d{1,2}[\,\.]{1}\d{1,2}
EDIT: update to meet the new requirements (comments) ;)
EDIT: remove unnecesary qtfier as per Bryan
^[0-9]{1,2}([,.][0-9]{1,2})?$

In order to represent a single digit in the form of a regular expression you can use either:
[0-9] or \d
In order to specify how many times the number appears you would add
[0-9]*: the star means there are zero or more digits
[0-9]{2}: {N} means N digits
[0-9]{0,2}: {N,M} N digits to M digits
Lets say I want to represent a number between 1 and 99 I would express it as such:
[0-9]{1,2} or \d{1,2}
Or lets say we were working with binary display, displaying a byte size, we would want our digits to be between 0 and 1 and length of a byte size, 8, so we would represent it as follows:
[0-1]{8} representation of a binary byte
Then if you want to add a , or a . symbol you would use:
\, or \. or you can use [.] or [,]
You can also state a selection between possible values as such
[.,] means either a dot or a comma symbol
And you just need to concatenate the pieces together, so in the case where you want to represent a 1 or 2 digit number followed by either a comma or a period and followed by two more digits you would express it as follows:
[0-9]{1,2}[.,]\d{1,2}
Also note that regular expression strings inside C++ strings must be double-back-slashed so every \ becomes \\

\d means a digit in most languages. You can also use [0-9] in all languages. For the "period or comma" use [\.,]. Depending on your language you may need more backslashes based on how you quote the expression. Ultimately, the regular expression engine needs to see a single backslash.
* means "zero-or-more", so \d* and [0-9]* mean "zero or more numbers". ? means "zero-or-one". Neither of those qualifiers means exactly one. Most languages also let you use {m,n} to mean "between m and n" (ie: {1,2} means "between 1 and 2")
Since the dot or comma and additional numbers are optional, you can put them in a group and use the ? quantifier to mean "zero-or-one" of that group.
Putting that all together you can use:
\d{1,2}([\.,][\d{1,2}])?
Meaning, one or two digits \d{1,2}, followed by zero-or-one of a group (...)? consisting of a dot or comma followed by one or two digits [\.,]\d{1,2}

\d{1,2}[,.]\d{1,2}
\d means a digit, the {1,2} part means 1 or 2 of the previous character (\d in this case) and the [,.] part means either a comma or dot.

Shortest regexp I know (16 char)
^\d\d?[,.]\d\d?$
The ^ and $ means begin and end of input string (without this part 23.45 of string like 123.45 will be matched). The \d means digit, the \d? means optional digit, the [,.] means dot or comma. Working example (when you click on left menu> tools> code generator you can gen code for one of 9 popular languages like c#, js, php, java, ...) here.
[ // tests
'11,11', // valid
'11.11',
'1.1',
'1,1',
'111,1', // nonvalid
'11.111',
'11-11',
',11',
'11.',
'a.11',
'11,a',
].forEach(n=> console.log(`${n}\t valid: ${ /^\d\d?[,.]\d\d?$/.test(n) }`))

If you want to be very permissive, required only two final digits with comma or dot:
^([,.\d]+)([,.]\d{2})$

Related

Using regex to match numbers which have 5 increasing consecutive digits somewhere in them

First off, this has sort of been asked before. However I haven't been able to modify this to fit my requirement.
In short: I want a regex that matches an expression if and only if it only contains digits, and there are 5 (or more) increasing consecutive digits somewhere in the expression.
I understand the logic of
^(?=\d{5}$)1*2*3*4*5*6*7*8*9*0*$
however, this limits the expression to 5 digits. I want there to be able to be digits before and after the expression. So 1111345671111 should match, while 11111 shouldn't.
I thought this might work:
^[0-9]*(?=\d{5}0*1*2*3*4*5*6*7*8*9*)[0-9]*$
which I interpret as:
^$: The entire expression must only contain what's between these 2 symbols
[0-9]*: Any digits between 0-9, 0 or more times followed by:
(?=\d{5}0*1*2*3*4*5*6*7*8*9*): A part where at least 5 increasing digits are found followed by:
[0-9]*: Any digits between 0-9, 0 or more times.
However this regex is incorrect, as for example 11111 matches. How can I solve this problem using a regex? So examples of expressions to match:
00001459000
12345
This shouldn't match:
abc12345
9871234444
While this problem can be solved using pure regular expressions (the set of strictly ascending five-digit strings is finite, so you could just enumerate all of them), it's not a good fit for regexes.
That said, here's how I'd do it if I had to:
^\d*(?=\d{5}(\d*)$)0?1?2?3?4?5?6?7?8?9?\1$
Core idea: 0?1?2?3?4?5?6?7?8?9? matches an ascending numeric substring, but it doesn't restrict its length. Every single part is optional, so it can match anything from "" (empty string) to the full "0123456789".
We can force it to match exactly 5 characters by combining a look-ahead of five digits and an arbitrary suffix (which we capture) and a backreference \1 (which must exactly the suffix matched by the look-ahead, ensuring we've now walked ahead 5 characters in the string).
Live demo: https://regex101.com/r/03rJET/3
(By the way, your explanation of (?=\d{5}0*1*2*3*4*5*6*7*8*9*) is incorrect: It looks ahead to match exactly 5 digits, followed by 0 or more occurrences of 0, followed by 0 or more occurrences of 1, etc.)
Because the starting position of the increasing digits isn't known in advance, and the consecutive increasing digits don't end at the end of the string, the linked answer's concise pattern won't work here. I don't think this is possible without being repetitive; alternate between all possibilities of increasing digits. A 0 must be followed by [1-9]. (0(?=[1-9])) A 1 must be followed by [2-9]. A 2 must be followed by [3-9], and so on. Alternate between these possibilities in a group, and repeat that group four times, and then match any digit after that (the lookahead in the last repeated digit in the previous group will ensure that this 5th digit is in sequence as well).
First lookahead for digits followed by the end of the string, then match the alternations described above, followed by one or more digits:
^(?=\d+$)\d*?(?:0(?=[1-9])|1(?=[2-9])|2(?=[3-9])|3(?=[4-9])|4(?=[5-9])|5(?=[6-9])|6(?=[7-9])|7(?=[89])|8(?=9)){4}\d+
Separated out for better readability:
^(?=\d+$)\d*?
(?:
0(?=[1-9])|
1(?=[2-9])|
2(?=[3-9])|
3(?=[4-9])|
4(?=[5-9])|
5(?=[6-9])|
6(?=[7-9])|
7(?=[89])|
8(?=9)
){4}
\d+
The lazy quantifier in the first line there \d*? isn't necessary, but it makes the pattern a bit more efficient (otherwise it initially greedily matches the whole string, requiring lots of failing alternations and backtracking until at least 5 characters before the end of the string)
https://regex101.com/r/03rJET/2
It's ugly, but it works.

Why is my regular Expression that ignore the order of the characters does not work?

I want to make a string pattern that is:
at least 7 characters long
have at least 1 digits, max 5
have at least 3 capital alphabetic characters , max 5
have at least 1 lower alphabetic characters , max 5
have at least 1 special characters , max 5
How to express this in a regular expression?
I can do something like
^((?=.*[A-Z]{3,5})(?=.*[a-z]{1,5})(?=.*[0-9]{1,5})(?=.*[.~!##$%^_&-]{1,5}))(?=.{7,20}).*$
I don't want to require this kind of order. In fact, any mixed order should be accepted, only require the number of characters.
This Match:
PASSW120P45ccb^&#%#
But this one does not
PA12S1SW2045ccb^&#%#
How can i fix this?
P&#Ass120W45ccb^%#
P&#Ass20W45cb^%#
Please have a look at https://regex101.com/r/vF2yO7/51
You need to operate with the contrary character classes, put these into non-capturing groups and repeat these:
^
(?=(?:\D*\d){1,5})
(?=(?:[^A-Z]*[A-Z]){3,5})
(?=(?:[^a-z]*[a-z]){1,5})
(?=(?:[^.\~!##$%^_&-]*[.\~!##$%^_&-]){1,5})
.{7,20}
$
See a demo on regex101.com.
The structure here is always the same, e.g. with the numbers: require anything not a number zero or more times, followed by a number and repeat the whole pattern 1-5 times. In general:
(?=(?:not_what_you_want*what_you_want){min_times, max_times})
In the expression above, all pos. lookaheads follow this scheme, [^...] negates the characters to be matched in the class and \D* is essentially the same as [^\d]*.

Decoding a regex... I know what it's function is but I want to understand exactly what is happening

I have a regular expression that I'm going to be using to verify that an inputted number is in standard U.S. telephone format (i.e (###) ###-####). I am new to regex and still having some trouble figuring out the exact function of each character. If someone would go through this piece by piece/verify that I am understanding I would really appreciate it. Also if the regex is wrong I would obviously like to know that.
\D*?(\d\D*?){10}
What I think is happening:
\D*?( indicates an escape sequence for the parenthesis metacharacter... not sure why the \D*? is necessary
\d indicating digits
\D*? indicating there is a non-digit character (-) followed by the closing parenthesis.
{10} for the 10 digits
I feel very unsure explaining this, like my understanding is very vague in terms of why the regex is in the order that it is etc. Thanks in advance for help/explanations.
EDIT
It seems like this is not the best regex for what I want. Another possibility was [(][0-9]{3}[)] [0-9]{3}-[0-9]{4}, but I was told this would fail. I suppose I'll have to do a little more work with regular expressions to figure this out.
\D matches any non-digit character.
* means that the previous character is repeated 0 or more times.
*? means that the previous character is repeated 0 or more times, but until the match of the following character in the regex. It is a bit difficult perhaps at the start, but in your regex, the next character is \d, meaning \D*? will match the least amount of characters until the next \d character.
( ... ) is a capture group, and is also used to group things. For instance {10} means that the previous character or group is repeated 10 times exactly.
Now, \D*?(\d\D*?){10} will match exactly 10 numbers, starting with non-digit characters or not, with non-digit characters in between the digits if they are present.
[(][0-9]{3}[)] [0-9]{3}-[0-9]{4}
This regex is a bit better since it doesn't just accept anything (like the first regex does) and will match the format (###) ###-#### (notice the space is a character in regex!).
The new things introduced here are the square brackets. These represent character classes. [0-9] means any character between 0 to 9 inclusive, which means it will match 0, 1, 2, 3, 4, 5, 6, 7, 8 or 9. Adding {3} after it makes it match 3 similar character class, and since this character class contains only digits, it will match exactly 3 digits.
A character class can be used to escape certain characters, such as ( or ) (note I mentioned earlier they are for capturing groups, or grouping) and thus, [(] and [)] are literal ( and ) instead of being used for capturing/grouping.
You can also use backslashes (\) to escape characters. Thus:
\([0-9]{3}\) [0-9]{3}-[0-9]{4}
Will also work. I would also recommend the use of line anchors ^ and $ if you're only trying to see if a phone number matches the above format. This ensures that the string has only the phone number, and nothing else. ^ matches the beginning of a line and $ matches the end of a line. Thus, the regex will become:
^\([0-9]{3}\) [0-9]{3}-[0-9]{4}$
However, I don't know all the combinations of the different formats of phone numbers in the US, so this regex might need some tweaking if you have different phone number formats.
\D is "not a digit"; \d is "digit". With that in mind:
This matches zero or more non-digits, then it matches a digit and any number of non-digit characters 10 times. This won't actually verify that the number if formatted properly, just that it contains 10 digits. I suspect that the regex isn't what you want in the first place.
For example, the following will match your regex:
this is some bad text 1 and some more 2 and more 34567890
\D matches a character that is not a digit
* repeats the previous item 0 or more times
? find the first occurrence
\d matches a digit
so your group is matches 10 digits or non digits

Regex to find integers and decimals in string

I have a string like:
$str1 = "12 ounces";
$str2 = "1.5 ounces chopped;
I'd like to get the amount from the string whether it is a decimal or not (12 or 1.5), and then grab the immediately preceding measurement (ounces).
I was able to use a pretty rudimentary regex to grab the measurement, but getting the decimal/integer has been giving me problems.
Thanks for your help!
If you just want to grab the data, you can just use a loose regex:
([\d.]+)\s+(\S+)
([\d.]+): [\d.]+ will match a sequence of strictly digits and . (it means 4.5.6 or .... will match, but those cases are not common, and this is just for grabbing data), and the parentheses signify that we will capture the matched text. The . here is inside character class [], so no need for escaping.
Followed by arbitrary spaces \s+ and maximum sequence (due to greedy quantifier) of non-space character \S+ (non-space really is non-space: it will match almost everything in Unicode, except for space, tab, new line, carriage return characters).
You can get the number in the first capturing group, and the unit in the 2nd capturing group.
You can be a bit stricter on the number:
(\d+(?:\.\d*)?|\.\d+)\s+(\S+)
The only change is (\d+(?:\.\d*)?|\.\d+), so I will only explain this part. This is a bit stricter, but whether stricter is better depending on the input domain and your requirement. It will match integer 34, number with decimal part 3.40000 and allow .5 and 34. cases to pass. It will reject number with excessive ., or only contain a .. The | acts as OR which separate 2 different pattern: \.\d+ and \d+(?:\.\d*)?.
\d+(?:\.\d*)?: This will match and (implicitly) assert at least one digit in integer part, followed by optional . (which needs to be escaped with \ since . means any character) and fractional part (which can be 0 or more digits). The optionality is indicated by ? at the end. () can be used for grouping and capturing - but if capturing is not needed, then (?:) can be used to disable capturing (save memory).
\.\d+: This will match for the case such as .78. It matches . followed by at least one (signified by +) digit.
This is not a good solution if you want to make sure you get something meaningful out of the input string. You need to define all expected units before you can write a regex that only captures valid data.
use this regular expression \b\d+([\.,]\d+)?
To get integers and decimals that either use a comma or a dot plus the next word, use the following regex:
/\d+([\.,]\d+)?\s\S+/

How can I recognize a valid barcode using regex?

I have a barcode of the format 123456########. That is, the first 6 digits are always the same followed by 8 digits.
How would I check that a variable matches that format?
You haven't specified a language, but regexp. syntax is relatively uniform across implementations, so something like the following should work: 123456\d{8}
\d Indicates numeric characters and is typically equivalent to the set [0-9].
{8} indicates repetition of the preceding character set precisely eight times.
Depending on how the input is coming in, you may want to anchor the regexp. thusly:
^123456\d{8}$
Where ^ matches the beginning of the line or string and $ matches the end. Alternatively, you may wish to use word boundaries, to ensure that your bar-code strings are properly separated:
\b123456\d{8}\b
Where \b matches the empty string but only at the edges of a word (normally defined as a sequence consisting exclusively of alphanumeric characters plus the underscore, but this can be locale-dependent).
123456\d{8}
123456 # Literals
\d # Match a digit
{8} # 8 times
You can change the {8} to any number of digits depending on how many are after your static ones.
Regexr will let you try out the regex.
123456\d{8}
should do it. This breaks down to:
123456 - the fixed bit, obviously substitute this for what you're fixed bit is, remember to escape and regex special characters in here, although with just numbers you should be fine
\d - a digit
{8} - the number of times the previous element must be repeated, 8 in this case.
the {8} can take 2 digits if you have a minimum or maximum number in the range so you could do {6,8} if the previous element had to be repeated between 6 and 8 times.
The way you describe it, it's just
^123456[0-9]{8}$
...where you'd replace 123456 with your 6 known digits. I'm using [0-9] instead of \d because I don't know what flavor of regex you're using, and \d allows non-Arabic numerals in some flavors (if that concerns you).