Regex Verification of String in Correct Order with Delimiters in PHP - regex

I'm trying to make a expression to verify that the string supplied is a valid format, but it seems that if I don't use regex in a few months, I forget everything I learned and have to relearn it.
My expression is supposed to match a format like this: 010L0404FFCCAANFFCC00M000000XXXXXX
The four delimiters are (L, N, K, M) which arent in the 0-9A-F hexidecimal range to indicate uniqueness must be in that order or not in the list. Each delimiter can only exist once!
It breaks down to this:
Starts off with a 3 digit numbers, which is simply ^([0-9]{3}) and is always required
Second set begins with L, and must be 2 digits + 2 digits + 6 hexdecimal and is optional
Third set begins with N and must be a 6 digit hexdecimal and is optional
The fourth set K is simply any amount of numbers and is optional
The fifth set is M and can be any 6 hexdecimals or XXXXXX to indicate nothing, it must be in multiples of 6 excluding 0, like 336699 (6) or 336699XXXXXXFFCC00 (18) and is optional
The hardest part I cant figure out making it require it in that order, and in multiples, like the L delimiter must come before and K always if it's there (the reason so I don't get variations of the same string which means the same thing with delimiters swapped). I can already parse it, I just want to verify the string is the correct format.
Any help would be appreciated, thanks.

Requiring the order isn't too bad. Just make each set optional. The regex will still match in order, so if the L section, for example, isn't there and the next character is N, it won't let L occur later since it won't match any of the rest of the regex.
I believe a direct translation of your requirements would be:
^([0-9]{3})(L[0-9]{4}[0-9A-F]{6})?(N[0-9A-F]{6})?(K[0-9]+)?(M([0-9A-F]{6}|X{6})+)?$
No real tricks, just making each group optional except for the first three digits, and adding an internal alternative for the two patterns of six digits in the M block.

^([0-9]{3})(L[0-9]{4}[0-9A-F]{6})?(N[0-9A-F]{6})?(K[0-9]+)?(M([0-9A-F]{6})+|MX{6})$

Related

regular expression that accepts numbers like 1,000.10? [duplicate]

I need regex to validate a number that could contain thousand separators or decimals using javascript.
Max value being 9,999,999.99
Min value 0.01
Other valid values:
11,111
11.1
1,111.11
INVALID values:
1111
1111,11
,111
111,
I've searched all over with no joy.
/^\d{1,3}(,\d{3})*(\.\d+)?$/
About the minimum and maximum values... Well, I wouldn't do it with a regex, but you can add lookaheads at the beginning:
/^(?!0+\.00)(?=.{1,9}(\.|$))\d{1,3}(,\d{3})*(\.\d+)?$/
Note: this allows 0,999.00, so you may want to change it to:
/^(?!0+\.00)(?=.{1,9}(\.|$))(?!0(?!\.))\d{1,3}(,\d{3})*(\.\d+)?$/
which would not allow a leading 0.
Edit:
Tests: http://jsfiddle.net/pKsYq/2/
((\d){1,3})+([,][\d]{3})*([.](\d)*)?
It worked on a few, but I'm still learning regex as well.
The logic should be 1-3 digits 0-1 times, 1 comma followed by 3 digits any number of times, and a single . followed by any number of digits 0-1 times
First, I want to point out that if you own the form the data is coming from, the best way to restrict the input is to use the proper form elements (aka, number field)
<input type="number" name="size" min="0.01" max="9,999,999.99" step="0.01">
Whether "," can be entered will be based on the browser, but the browser will always give you the value as an actual number. (Remember that all form data must be validated/sanitized server side as well. Never trust the client)
Second, I'd like to expand on the other answers to a more robust (platform independent)/modifiable regex.
You should surround the regex with ^ and $ to make sure you are matching against the whole number, not just a subset of it. ex ^<my_regex>$
The right side of the decimal is optional, so we can put it in an optional group (<regex>)?
Matching a literal period and than any chain of numbers is simply \.\d+
If you want to insist the last number after the decimal isn't a 0, you can use [1-9] for "a non-zero number" so \.\d+[1-9]
For the left side of the decimal, the leading number will be non-zero, or the number is zero. So ([1-9]<rest-of-number-regex>|0)
The first group of numbers will be 1-3 digits so [1-9]\d{0,2}
After that, we have to add digits in 3s so (,\d{3})*
Remember ? means optional, so to make the , optional is just (,?\d{3})*
Putting it all together
^([1-9]\d{0,2}(,?\d{3})*|0)(\.\d+[1-9])?$
Tezra's formula fails for '1.' or '1.0'. For my purposes, I allow leading and trailing zeros, as well as a leading + or - sign, like so:
^[-+]?((\d{1,3}(,\d{3})*)|(\d*))(\.|\.\d*)?$
In a recent project we needed to alter this version in order to meet international requirements.
This is what we used: ^-?(\d{1,3}(?<tt>\.|\,| ))((\d{3}\k<tt>)*(\d{3}(?!\k<tt>)[\.|\,]))?\d*$
Creating a named group (?<tt>\.|\,| ) allowed us to use the negative look ahead (?!\k<tt>)[\.|\,]) later to ensure the thousands separator and the decimal point are in fact different.
I have used below regrex for following retrictions -
^(?!0|\.00)[0-9]+(,\d{3})*(.[0-9]{0,2})$
Not allow 0 and .00.
','(thousand seperator) after 3 digits.
'.' (decimal upto 2 decimal places).

Regex expression for date within dates range

I need to validate with regex a date in format yyyy-mm-dd (2019-12-31) that should be within the range 2019-12-20 - 2020-01-10.
What would be the regex for this?
Thanks
Regex only deal with characters. so we have to work out at each position in the date what are the valid characters.
The first part is easy. The first two characters have to be 20
Now it gets complicated the next character can be a 1 or a 2 but what follows depends on the value of that character so we split the rest of the regex into two sections the first if the third character matches 1 and the second if it matches 2
We know that if the third character is a 1 then what must follow is the characters 9-12- as the range starts at 2019-12-20 now for the day part. The 9th character is the tens for the day this can only be 2 or 3 as we are already in the last month and the minimum date is 20. The last character can be any digit 0-9. This gives us a day match of [23][0-9]. Putting this together we now have a pattern for years starting 2019 as 19-12-[23][0-9]
It the third character is a 2 then we can match up to the day part of the date a gain as the range ends in January. This gives us a partial match of 20-01- leaving us to work on the day part. Hear we know that the first character of the day can either be a 1 or 0 however if it's a 1 then the last character must be a 0 and if it's a 0 then the last character can only be in the range 1 to 9. This give us another alteration (?:0[1-9]|10) Putting the second part together we get 20-01-(?:0[1-9]|10).
Combining these together gives the final regex 20(?:19-12-[23][0-9]|20-01-(?:0[1-9]|10))
Note that I'm assuming that the date you are testing against is a validly formatted date.
Try this:
(2019|2020)\-(12|01)\-([0-3][0-9]|[0-9])
But be aware that this will allow number up to where the first digit is between zero and three and the second digit between zero and nine for the dd value. You could specify all numbers you want to allow (from 20 to 10) like this (20|21|22|23|24|25|26|27|28|29|30|31|01|1|02|2|03|3|04|4|05|5|06|6|07|7|08|8|09|9|10).
(2019|2020)\-(12|01)\-(20|21|22|23|24|25|26|27|28|29|30|31|01|1|02|2|03|3|04|4|05|5|06|6|07|7|08|8|09|9|10)
But honestly... Regular-Expressions are not the right tool for this. RegExp gives a mask to something, not a logical context. Use regex to extract the data/value from a string and validate those values using another language.
The above 2nd Regex will, f.e. match your dates, but also values outside of this range since there is no context between 2019|2020 and the second group 12|01 so they match values like 2019-12-11 but also 2020-12-11.
To only match the values you want this will be a really large regex like this (inner brackets only if you need them) ((2019)-(12)-(20)|(2019)-(12)-(21)|(2019)-(12)-(22)|...) and continue with all possible dates - and ask yourself: what would you do if you find such a regex in a project you have to work with ;)
Better solution (quick and dirty, there might be better solutions):
(?<yyyy>20[0-9]{2})\-(?<mm>[01][0-9]|[0-9])\-(?<dd>[0-3][0-9]|[0-9])
This way you have three named groups (yyyy, mm, dd) you can access and validate the matched values... The regex is smaller, you have a better association between code and regex and both are easier to maintain.

Optimization of Regular Expression to match numbers bigger or equal to 50

I want to check if a number is 50 or more using a regular expression. This in itself is no problem but the number field has another regex checking the format of the entered number.
The number will be in the continental format: 123.456,78 (a dot between groups of three digits and always a comma with 2 digits at the end)
Examples:
100.000,00
50.000,00
50,00
34,34
etc.
I want to capture numbers which are 50 or more. So from the four examples above the first three should be matched.
I've come up with this rather complicated one and am wondering if there is an easier way to do this.
^(\d{1,3}[.]|[5-9][0-9]|\d{3}|[.]\d{1,3})*[,]\d{2}$
EDIT
I want to match continental numbers here. The numbers have this format due to internal regulations and specify a price.
Example: 1000 EUR would be written as 1.000,00 EUR
50000 as 50.000,00 and so on.
It's a matter of taste, obviously, but using a negative lookahead gives a simple solution.
^(?!([1-4]?\d),)[1-9](\d{1,2})?(\.\d{3})*,\d{2}\b
In words: starting from a boundary ignore all numbers that start with 1 digit OR 2 digits (the first being a 1,2,3 or 4), followed by a comma.
Check on regex101.com
Try:
EDIT ^(.{3,}|[5-9]\d),\d{2}$
It checks if:
there 3 chars or more before the ,
there are 2 numbers before the , and the first is between 5 and 9
and then a , and 2 numbers
Donno if it answer your question as it'll return true for:
aa50,00
1sdf,54
But this assumes that your original string is a number in the format you expect (as it was not a requirement in your question).
EDIT 3
The regex below tests if the number is valid referring to the continental format and if it's equal or greater than 50. See tests here.
Regex: ^((([1-9]\d{0,2}\.)(\d{3}\.){0,}\d{3})|([1-9]\d{2})|([5-9]\d)),\d{2}$
Explanation (d is a number):
([1-9]\d{0,2}\.): either d., dd. or ddd. one time with the first d between 1 and 9.
(\d{3}\.){0,}: ddd. zero or x time
\d{3}: ddd 3 digit
These 3 parts combined match any numbers equals or greater than 1000 like: 1.000, 22.002 or 100.000.000.
([1-9]\d{2}): any number between 100 and 999.
([5-9]\d)): a number between 5 and 9 followed by a number. Matches anything between 50 and 99.
So it's either the one of the parts above or this one.
Then ,\d{2}$ matches the comma and the two last digits.
I have named all inner groups, for better understanding what part of number is matched by each group. After you understand how it works, change all ?P<..> to ?:.
This one is for any dec number in the continental format.
^(?P<common_int>(?P<int>(?P<int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<int_end>\.\d{3})*|0)(?!,)|(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3})*,)|0,|,)(?=\d))(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_start>\d{3}\.)*(?P<frac_end>\d{1,3})))?$
test
This one is for the same with the limit number>=50
^(?P<common_int>(?P<int>(?P<int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<int_end>\.\d{3})+|(?P<int_short>[1-9]\d{2}|[5-9]\d))(?!,)|(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3})+,)|(?P<dec_short_int>[1-9]\d{2}|[5-9]\d),)(?=\d))(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_start>\d{3}\.)*(?P<frac_end>\d{1,3})))?$
tests
If you always have the integer part under 999.999 and fractal part always 2 digits, it will be a bit more simple:
^(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3})?,)|(?P<dec_short_int>[1-9]\d{2}|[5-9]\d),)(?=\d)(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_end>\d{1,2})))?$
test
If you can guarantee that the number is correctly formed -- that is, that the regex isn't expected to detect that 5,0.1 is invalid, then there are a limited number of passing cases:
ends with \d{3}
ends with [5-9]\d
contains \d{3},
contains [5-9]\d,
It's not actually necessary to do anything with \.
The easiest regex is to code for each of these individually:
(\d{3}$|[5-9]\d$|\d{3},|[5-9]\d)
You could make it more compact and efficient by merging some of the cases:
(\d{3}[$,]|[5-9]\d[$,])
If you need to also validate the format, you will need extra complexity. I would advise against attempting to do both in a single regex.
However unless you have a very good reason for having to do this with a regex, I recommend against it. Parse the string into an integer, and compare it with 50.

Need to capture single character, but ignore digit

I'm parsing out flight info.
Here's the sample data:
E0.777 7 3:09
E0.319 N 1:43
E0.735 8 1:45
E0.735 N 1:48
E0.M80 9 3:21
E0.733 1:48
I need to populate fields like this:
Equipment: 735
On Time: N
Duration: 1:48
Problem I'm having is capturing the Y or N character but ignoring the single digit, then capturing the duration.
This is the expression I have tried:
#"^.{3}(.{3})\s?([N|Y]?)?(?:[0-9]\s+)?(\w{4})"
Edit: I updated the sample data to clarify my question. Equipment is not always three digits, it could be a character and two digits. The data between the equipment and the duration could be a boolean N or Y, a single digit, or white space. Only the boolean should be captured.
Firstly, you mix up the concepts of alternation and character classes [Y|N] would match 3 different characters: Y or | or N. Either use (...) or leave out the pipe.
Secondly your double ? after the character class does not really do anything. Thirdly, at the end you only match consecutive spaces if a digit was found. But if there is no digit, the last ? will ignore the subpattern, thus not allowing spaces either.
Lastly, \w does not match :.
Try this:
#"^.{3}(\d{3})\s?(?:([NY])|\d)\s+(\d:\d\d)"
You should also think about restricting the repeated . at the beginning to a more precise character class (i.e \w{2}\., but I don't know the possibilities there).
#"^..\.(\d{3})\s(?:([YN])|\d)\s*(\S{4})"
Changed .{3} to ..\. which is a bit more specific about there being a literal . for character 3.
(?:([YN])|\d) matches either Y/N or a digit, but only captures a Y or N. Notice that it's [YN] not [Y|N].
Changed \w{4} to \S{4} since \w doesn't match colons :.
This will do it...
^\w\d\.(\d{3})\s(?:([YN])|\d)\s*(\d:\d{2})$
I made some other changes to your regex because it was easier for me to just rewrite it based off your data then to try to modify what you had.
This will capture the Y or N or it won't capture anything in that group. I also tried to be more specific with your duration regex.
Update: This works with your new requirements...
^\w\d\.(\w{3})\s(?:([YN])|\d|\s)\s*(\d:\d{2})$
You can see it working on your data here... http://regexr.com?32j1b
(hover over each line to see the matched groups)
This captures all lines with Y or N and ignores everything else:
^...(\d{3})\s*([YN])\s*(\d+:\d+)

Regular Expression to match pattern once or more with no partial matches

Better explained with examples:
HHH
HHHH
HHHBBHHH
HHHBH
BB
HHBH
I need to come up with a regexp that matches only 3 H's or a multiple of 3 H's (so 6, 9, 12, ... H's are ok as well) and 5 H's are not ok. And if possible I don't want to use Perl regexps.
So for the input above the regexp would match (1), (3) and (6) only.
I'm just starting with regular expressions here so I don't exactly know how I'm supposed to approach this.
edit
Just to clear something up:, an H can only be in one group of 3 H's. The group of 3 H's might be HHH or HHBH.
That's why in example 2 above it is not a match because the last H is not in a group of 3 H's. And you can't take the last 3 H's in a group because the middle 2 H's have already been inside a group before.
You can use the following regular expression:
^([^H]*H[^H]*H[^H]*H[^H]*)+$
It matches any string which contains in total 3 H or any multiple of 3. In between there might be any other character.
Explanation:
^ begin of string
( start of group
[^H]*H any string of characters (or none) not including 'H' plus a single 'H'
[^H]*H any string of characters (or none) not including 'H' plus a single 'H'
[^H]*H any string of characters (or none) not including 'H' plus a single 'H'
[^H]* any string of characters (or none) which is not 'H'
)+ containing the group once or twice or ...
$ end of string
By repeating the subpattern [^H]*H three times we make sure that there are indeed 3 H included, [^H]* allows any separating characters.
Note: use either egrep or run grep with additional argument -E.
Use this to match a multiple of 3 H's:
(H{3})+
Here is a complete regex for your examples:
^(H{3})+B*(H{3})*$
Edit: It looks like you need to count non-consecutive H's. In that case:
^(([^H]*H){3})+[^H]*$
That should match any string with a multiple of 3 H's.
Given the requirement that H's can be arbitrarily interleaved with non-H's, but that the total number of H's must be a non-zero multiple of 3 (so XXX, containing no H's, is not a match), then the total regular expression is anything but trivial. This is not a beginner's regular expression.
I'm going to assume that the dialect of regular expression treats {} and () as metacharacters for counting and grouping, and includes + for one-or-more. If you're using a regular expression system that has a different requirement (\{\}, for example) then adjust accordingly.
You need the regex to match the whole string, so there are no stray H's allowed. So, it must start with ^ and end with $. You need to allow an arbitrary number of non-H's at front and back. The H's may be separated by an arbitrary number of non-H's. That leads to:
^([^H]*H[^H]*H[^H]*H)+[^H]*$
Ouch; that is hard to read! It says the line must consist of 1 or more (+) groups of an arbitrary number of non-H's followed by an H, an arbitrary number of non-H's, another H, an arbitrary number of non-H's and a third H; all of which can be followed by an arbitrary number of non-H's.
Using the {} for counting:
^(([^H]*H){3})+[^H]*$
That's still hard to read. Note that my description said "arbitrary number of non-H's at front and back", but I only use the [^H]* at the back; that's because the repeating pattern allows an arbitrary number of non-H's at the front anyway so there's no need to repeat that fragment.