Regular Expression to match set of arbitrary codes - regex

I am looking for some help on creating a regular expression that would work with a unique input in our system. We already have some logic in our keypress event that will only allow digits, and will allow the letter A and the letter M. Now I need to come up with a RegEx that can match the input during the onblur event to ensure the format is correct.
I have some examples below of what would be valid. The letter A represents an age, so it is always followed by up to 3 digits. The letter M can only occur at the end of the string.
Valid Input
1-M
10-M
100-M
5-7
5-20
5-100
10-20
10-100
A5-7
A10-7
A100-7
A10-20
A5-A7
A10-A20
A10-A100
A100-A102
Invalid Input
a-a
a45
4

This matches all of the samples.
/A?\d{1,3}-A?\d{0,3}M?/
Not sure whether 10-A10M should be legal, or whether M can appear together with numbers at all. If M may only appear on its own, without numbers:
/A?\d{1,3}-(A?\d{1,3}|M)/

Use the brute-force method if you have a small number of well-defined patterns, so you don't get bad corner-case matches:
^(\d+-M|\d+-\d+|A\d+-\d+|A\d+-A\d+)$
Here are the individual regexes broken out:
\d+-M <- matches anything like '1-M'
\d+-\d+ <- 5-7
A\d+-\d+ <- A5-7
A\d+-A\d+ <- A10-A20

/^[A]?[0-9]{1,3}-[A]?[0-9]{1,3}[M]?$/
Matches anything of the form:
A(optional)[1-3 numbers]-A(optional)[1-3 numbers]M(optional)

^A?\d+-(?:A?\d+|M)$
An optional A followed by one or more digits, a dash, and either another optional A and some digits or an M. The '(?: ... )' notation is a Perl 'non-capturing' set of parentheses around the alternatives; it means there will be no '$1' after the regex matches. Clearly, if you wanted to capture the various bits and pieces, you could - and would - do so, and the non-capturing clause might not be relevant any more.
(You could replace the '+' with '{1,3}' as JasonV did to limit the numbers to 3 digits.)

^A?\d{1,3}-(M|A?\d{1,3})$
^ -- the match must be done from the beginning
A? -- "A" is optional
\d{1,3} -- between one and 3 digits; [0-9]{1,3} also works
- -- A "-" character
(...|...) -- Either one of the two expressions
(M|...) -- Either "M" or...
(...|A?\d{1,3}) -- ...an optional "A" followed by at least one and at most three digits
$ -- the match should be done to the end
Some consequences of changing the format. If you do not put "^" at the beginning, the match may ignore an invalid beginning. For example, "MAAMA0-M" would be matched at "A0-M".
If, likewise, you leave $ out, the match may ignore an invalid trail. For example, "A0-MMMMAAMAM" would match "A0-M".
Using \d is usually preferred, as is \w for alphanumerics, \s for spaces, \D for non-digit, \W for non-alphanumeric or \S for non-space. But you must be careful that \d is not being treated as an escape sequence. You might need to write it \\d instead.
{x,y} means the last match must occur between x and y times.
? means the last match must occur once or not at all.
When using (), it is treated as one match. (ABC)? will match ABC or nothing at all.
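The effect of the anchors can be checked with a short script. This is a minimal sketch in Python, which is an assumption, since the question concerns a browser onblur handler. The pattern uses only syntax that Python and JavaScript treat alike for these inputs. The names CORE, ANCHORED, UNANCHORED and is_valid are hypothetical.

```python
import re

# Pattern from the answer above, without and with the ^...$ anchors.
CORE = r"A?\d{1,3}-(M|A?\d{1,3})"
ANCHORED = re.compile("^" + CORE + "$")
UNANCHORED = re.compile(CORE)

def is_valid(code: str) -> bool:
    """Return True only if the whole string has the expected format."""
    return ANCHORED.search(code) is not None

valid = ["1-M", "100-M", "5-7", "A5-7", "A10-A20", "A100-A102"]
invalid = ["a-a", "a45", "4", "MAAMA0-M", "A0-MMMMAAMAM"]

for s in valid:
    print(s, is_valid(s))    # each prints True
for s in invalid:
    print(s, is_valid(s))    # each prints False

# Without anchors, a valid fragment inside an invalid string is still found.
print(UNANCHORED.search("MAAMA0-M").group(0))      # A0-M
print(UNANCHORED.search("A0-MMMMAAMAM").group(0))  # A0-M
```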

I’d use this regular expression:
^(?:[1-9]\d{0,2}-(?:M|[1-9]\d{0,2})|A[1-9]\d{0,2}-A?[1-9]\d{0,2})$
This matches either:
<number>-M or <number>-<number>
A<number>-<number> or A<number>-A<number>
Additionally <number> must not begin with a 0.
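To see how this stricter pattern differs from the shorter ones, here is a small sketch in Python (an assumption; any flavor with non-capturing groups should behave the same). The helper name accepts is hypothetical.

```python
import re

# Pattern from the answer above: every number must start with 1-9.
STRICT = re.compile(
    r"^(?:[1-9]\d{0,2}-(?:M|[1-9]\d{0,2})|A[1-9]\d{0,2}-A?[1-9]\d{0,2})$"
)

def accepts(code: str) -> bool:
    return STRICT.search(code) is not None

print(accepts("1-M"))      # True
print(accepts("A10-A20"))  # True
print(accepts("05-7"))     # False: leading zero
print(accepts("A5-M"))     # False: M is only allowed after a plain number
print(accepts("5-A7"))     # False: A on the second part requires A on the first
```

Note that A5-M and 5-A7 do not appear in the question's list of valid inputs, so rejecting them is consistent with the samples.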

Am I implementing negative lookaheads correctly with my regex?

I'm a beginner with regex and stuck with creating regex with the following conditions:
Minimum of 8 characters
Maximum of 60 characters
Must contain 2 letters
Must contain 1 number
Must contain 1 special character
Special character cannot be the following: & ` ( ) = [ ] | ; " ' < >
So far I have the following...
(?=^.{8,60}$)(?=.*\d)(?=[a-zA-Z]{2,})(?!.*[&`()=[|;"''\]'<>]).*
But my last two tests are failing and I have no idea why...
!##$%^*+-_~?,.{}!HR12345
123456789AB!
If you'd like to see my test and expected results, visit here: https://regexr.com/73m2o
My tests contains acceptable number of characters, appropriate number of alphabetic characters, and supported special characters... I don't know why it's failing!
Using .* to verify a character in the string can be very inefficient, and I would suggest using negated character classes instead, following the principle of contrast.
Apart from that, there is a point in the question Must contain 1 special character that is not addressed yet in the current answers.
You can use a positive lookahead for that to assert one of the characters that you consider special.
^(?=[^\d\n]*\d)(?=[^a-zA-Z\n]*[a-zA-Z][^a-zA-Z\n]*[a-zA-Z])(?=[^!##$%^\n]*[!##$%^])[^&`()=[|;"''\]'<>\n]{8,60}$
Explanation
^ Start of string (Outside of the lookahead)
(?=[^\d\n]*\d) Assert a digit
(?=[^a-zA-Z\n]*[a-zA-Z][^a-zA-Z\n]*[a-zA-Z]) Assert 2 chars a-zA-Z
(?=[^!##$%^\n]*[!##$%^]) Assert a "special" character
[^&`()=[|;"''\]'<>\n]{8,60} Match 8-60 characters except for the ones that you don't want to match
$ End of string
See a regex demo.
Part of the issue is that you're missing the .* in (?=[a-zA-Z]{2,}). However, your implementation of "two or more" letters is not correct unless the letters must be consecutive.
You'll see that the string 1234567B89A! fails to match, even with the correction. You can fix this like so:
(?=^.{8,60}$)(?=.*\d)(?=.*[a-zA-Z].*[a-zA-Z])(?!.*[&`()=[|;"''\]'<>]).*
The part I changed is (?=.*[a-zA-Z].*[a-zA-Z]) asserting that we can match a letter, zero or more other characters, and then another letter.
https://regex101.com/r/jEsK0S/1
Also, there's currently no assertion that there must be a special character, only an assertion of which ones shouldn't match. So I'd suggest adding another lookahead with a list of valid special characters.
Since the 2+ alphabetical characters can appear anywhere in the string, you need to prepend your check for them with .* (as you have with the other character classes you're checking for); otherwise the positive lookaheads will, in this scenario, try to assert their appearance at the beginning of the string (position 0):
(?=^.{8,60}$)(?=.*\d)(?=.*[a-zA-Z]{2,})(?!.*[&`()=[|;"''\]'<>]).*
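The difference between the three letter checks discussed above (letters at the start, consecutive letters anywhere, any two letters anywhere) can be sketched in Python. Python is an assumption here, and FORBIDDEN is a hypothetical, simplified version of the excluded-character class with the brackets escaped. Like the patterns above, none of these variants asserts that a special character is present.

```python
import re

# Hypothetical, simplified forbidden set with the brackets escaped.
FORBIDDEN = r"""(?!.*[&`()=\[\]|;"'<>])"""
PREFIX = r"(?=^.{8,60}$)(?=.*\d)"

# Letters must be the first two characters (the original attempt).
at_start = re.compile(PREFIX + r"(?=[a-zA-Z]{2,})" + FORBIDDEN + ".*")
# Two consecutive letters anywhere.
consecutive = re.compile(PREFIX + r"(?=.*[a-zA-Z]{2,})" + FORBIDDEN + ".*")
# Any two letters anywhere, not necessarily adjacent.
anywhere = re.compile(PREFIX + r"(?=.*[a-zA-Z].*[a-zA-Z])" + FORBIDDEN + ".*")

def ok(pattern, text):
    return pattern.match(text) is not None

print(ok(at_start, "123456789AB!"))     # False: string does not start with letters
print(ok(consecutive, "123456789AB!"))  # True
print(ok(consecutive, "1234567B89A!"))  # False: B and A are not adjacent
print(ok(anywhere, "1234567B89A!"))     # True
print(ok(anywhere, "1234567B89A&"))     # False: & is forbidden
```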

Match 3 and 4 delimiters and between them; not less not more

I have a command-line program whose first argument ( = argv[ 1 ] ) is a regex pattern.
./program 's/one-or-more/anything/gi/digit-digit'
So I need a regex to check whether the input entered by the user is correct. This would be easy to solve, but I use the C++ standard library and std::regex_match, and that function by default behaves as if begin and end assertions (^ and $) were placed around the pattern, so the non-greedy quantifier is effectively ignored.
Let me clarify. If I want to match /anything/, I can use /.*?/, but std::regex_match treats this pattern as ^/.*?/$, so if the user enters /anything/anything/anything/, std::regex_match still returns true even though the input is not correct. std::regex_match only returns true or false, and the text expected from the user must follow the pattern. Since the input can vary, I cannot list every possibility here, but here are some examples.
Should match
/.//
s/.//
/.//g
/.//i
/././gi
/one-or-more/anything/
/one-or-more/anything/g/3
/one-or-more/anything/i
/one-or-more/anything/gi/99
s/one-or-more/anything/g/4
s/one-or-more/anything/i
s/one-or-more/anything/gi/54
and anything else that looks like this pattern
Rules:
delimiters are /|##
s letter at the beginning and g, i and 2 digits at the end are optional
std::regex_match returns true only if the entire target character sequence can be matched; otherwise it returns false
between the first and second delimiter there can be one or more characters (+)
between the second and third delimiter there can be zero or more characters (*)
between the third and fourth delimiter there can be g or i
At least 3 delimiters must be present, so /.// should match but /./ should not
ECMAScript 262 is allowed for the pattern
NOTE
You may need to see my question about std::regex_match:
std::regex_match and lazy quantifier with strange behavior
I don't need any C++ code, just a pattern.
Do not try d?([/|##]).+?\1.*?\1[gi]?[gi]?\1?d?\d?\d?. It fails.
My attempt so far: ^(?!s?([/|##]).+?\1.*?\1.*?\1)s?([/|##]).+?\2.*?\2[gi]?[gi]?\d?\d?$
If you are willing to try, you should put ^ and $ around your pattern
If you need more details please comment me, and I will update the question.
Thanks.
You could use this regular expression:
^s?([/|##])((?!\1).)+\1((?!\1).)*\1((gi?|ig)(\1\d\d?)?|i)?$
See regex101.com
Note how this also rejects these cases:
///anything/
/./anything/gg
/./anything/ii
/./anything/i/12
How it works:
Some explanation of the parts that are different:
((?!\1).): this will match any character that is not the delimiter. This way you can keep track of the exact number of delimiters used. It also prevents the first character after the first delimiter from being that delimiter again, which should not be allowed.
(gi?|ig): matches any of the valid modifier combinations, except a sole i, which is treated separately. So this also excludes gg and ii as valid character sequences.
(\1\d\d?)?: optionally allows for an extra delimiter (after a g modifier -- see previous) to be added with one or two digits following it.
(...|i)?: for the case where there is no g modifier present, but just the i or nothing at all: then no digits are allowed to follow.
This is a tricky one, but I took the challenge - here is what I have ended up with:
^s?([\/|##])(?:(?!\1).)+\1(?:(?!\1).)*\1(?:i|(?:gi?|ig)(\1\d{1,2})?)?$
Pattern breakdown:
^ matches start of string
s? matches an optional 's' character
([\/|##]) matches the delimiter characters and captures as group 1
(?:(?!\1).)+ matches anything other than the delimiter character one or more times (uses negative lookahead to make sure that the character isn't the delimiter matched in group 1)
\1 matches the delimiter character captured in group 1
(?:(?!\1).)* matches anything other than the delimiter character zero or more times
\1 matches the delimiter character captured in group 1
(?: starts a new group
i matches the i character
| or
(?:gi?|ig) matches either g, gi, or ig
(\1\d{1,2})? followed by an optional extra delimiter and 0-9 once or twice
)? closes group and makes it optional
$ matches end of string
I have used non-capturing groups wherever a capture is not needed; these are groups that start with ?:
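The breakdown above can be checked against the question's examples with a short script. This is a sketch in Python rather than C++ (an assumption, since the question asks only for a pattern). re.fullmatch is used because, like std::regex_match, it succeeds only when the whole string matches. The helper name is_valid is hypothetical.

```python
import re

# Pattern copied from the answer above; group 1 captures the delimiter.
PATTERN = re.compile(
    r"^s?([\/|##])(?:(?!\1).)+\1(?:(?!\1).)*\1(?:i|(?:gi?|ig)(\1\d{1,2})?)?$"
)

def is_valid(arg: str) -> bool:
    # fullmatch requires the entire string to match.
    return PATTERN.fullmatch(arg) is not None

should_match = [
    "/.//", "s/.//", "/.//g", "/.//i", "/././gi",
    "/one-or-more/anything/", "/one-or-more/anything/g/3",
    "/one-or-more/anything/i", "/one-or-more/anything/gi/99",
    "s/one-or-more/anything/g/4", "s/one-or-more/anything/i",
    "s/one-or-more/anything/gi/54", "|a|b|g|7",
]
should_fail = [
    "/./", "///anything/", "/./anything/gg", "/./anything/ii",
    "/./anything/i/12", "/anything/anything/anything/", "/a|b/",
]

for arg in should_match:
    print(arg, is_valid(arg))  # each prints True
for arg in should_fail:
    print(arg, is_valid(arg))  # each prints False
```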

Regex with more than one OR/AND operator

I'm trying to match text that is:
a combination of numbers and letters, and might contain [:,.]
OR
a * character plus at least one number OR letter (not necessarily in this order)
Meaning my regex should match all these
Bf1305020008401 6798ubbii230693
Nettbank til: Troij iudh Betalt: 03.05.13
7509*30.04
*87589
but not these:
0205
252,25
Yes, regex alternation with | does not have the meaning in a character group (e.g. [a-z|0-9]) that it does elsewhere in a pattern. (Think of it as implied between characters & character ranges within a character group, making it redundant.)
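A quick way to see this is to compare a character class containing | with a real alternation. The sketch below uses Python (an assumption; the behavior of | inside [...] is the same in the common regex flavors).

```python
import re

# Inside [...] the | is an ordinary character, not alternation.
char_class = re.compile(r"[a-z|0-9]+")
print(char_class.fullmatch("a|1") is not None)   # True: '|' itself is accepted

# Outside a class, | separates alternatives.
alternation = re.compile(r"(?:[a-z]|[0-9])+")
print(alternation.fullmatch("a1") is not None)   # True
print(alternation.fullmatch("a|1") is not None)  # False
```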
Pattern
This pattern should do what you need:
^((?=^.{0,}[0-9])(?=^.{0,}[a-zA-Z])[0-9a-zA-Z :,.]{2,}|(?!^\*$)(?=^[0-9.a-zA-Z]{0,}\*[0-9.a-zA-Z]{0,})(?!^[0-9.a-zA-Z]{0,}\*[0-9.a-zA-Z]{0,}\*)[*0-9.a-zA-Z]{2,})$
It matches...
Bf1305020008401 6798ubbii230693
Nettbank til: Troij iudh Betalt: 03.05.13
7509*30.04
*87589
...and does not match...
0205
252,25
...as you require.
You can try the pattern with the inputs you specified in a regex fiddle.
Explanation
Some explanation for the 1st subpattern (on the left side of the |) matching your 1st set of match criteria:
(?=^.{0,}[0-9]) - Assert that a number appears in the string.
(?=^.{0,}[a-zA-Z]) - Assert that a letter also (i.e. AND) appears in the string.
[0-9a-zA-Z :,.]{2,} - "a combination of numbers and letters, and might contain [ :,.]" (assuming the aforementioned assertions)
Similarly, some explanation for the 2nd subpattern (on the right side of the |) matching your 2nd set of match criteria:
(?!^\*$) - Assert that the string is not just *.
(?=^[0-9.a-zA-Z]{0,}\*[0-9.a-zA-Z]{0,}) - Assert that the string contains *.
(?!^[0-9.a-zA-Z]{0,}\*[0-9.a-zA-Z]{0,}\*) - Assert that the string does not contain more than one *.
[*0-9.a-zA-Z]{2,} - "a * character + atleast one number OR letter (not necessarily in this order)" (assuming the aforementioned assertions)
There is probably room to sand & polish the pattern - especially the lookahead assertions for * in the second subpattern I suspect; but it works and conveys the strategy I employed of multiple lookahead assertions to constrain each of the two subpatterns to fit your requirements.
As you commented below, I think you do want a full-line match, and by "number and letter" I take it that digits and letters must both occur in the match.
And by "a * character + atleast one number OR letter" I assume that "*" occurs only once in the match.
Maybe you could try this one:
(^(?=.*[a-zA-Z]+)(?=.*[0-9]+)[0-9a-zA-Z :,.]+$)|(^[a-zA-Z0-9.]*\*[a-zA-Z0-9.]+$)|(^[a-zA-Z0-9.]+\*[a-zA-Z0-9.]*$)
It matches:
Bf1305020008401 6798ubbii230693
Nettbank til: Troij iudh Betalt: 03.05.13
7509*30.04
*87589
123456*
.*.
test123
123test
But won't match any of:
0205
252,25
*
123*345*789
rebound
test
123
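The two lists above can be reproduced with a short script. This is a sketch in Python (an assumption; the question does not name a language), with the three alternatives of the pattern split across lines for readability. The names PATTERN, matches and rejects are hypothetical.

```python
import re

# The three-alternative pattern from this answer.
PATTERN = re.compile(
    r"(^(?=.*[a-zA-Z]+)(?=.*[0-9]+)[0-9a-zA-Z :,.]+$)"
    r"|(^[a-zA-Z0-9.]*\*[a-zA-Z0-9.]+$)"
    r"|(^[a-zA-Z0-9.]+\*[a-zA-Z0-9.]*$)"
)

matches = [
    "Bf1305020008401 6798ubbii230693",
    "Nettbank til: Troij iudh Betalt: 03.05.13",
    "7509*30.04", "*87589", "123456*", ".*.", "test123", "123test",
]
rejects = ["0205", "252,25", "*", "123*345*789", "rebound", "test", "123"]

for s in matches:
    print(repr(s), PATTERN.search(s) is not None)  # each prints True
for s in rejects:
    print(repr(s), PATTERN.search(s) is not None)  # each prints False
```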
Original:
This should work
(^[A-Za-z0-9 ]*(([A-Za-z]+[ ]*[0-9]+)|([0-9]+[ ]*[A-Za-z]+))[A-Za-z0-9 ]*$)|(^\*[A-Za-z0-9]+$)

Regex to find integers and decimals in string

I have a string like:
$str1 = "12 ounces";
$str2 = "1.5 ounces chopped";
I'd like to get the amount from the string whether it is a decimal or not (12 or 1.5), and then grab the measurement that immediately follows it (ounces).
I was able to use a pretty rudimentary regex to grab the measurement, but getting the decimal/integer has been giving me problems.
Thanks for your help!
If you just want to grab the data, you can just use a loose regex:
([\d.]+)\s+(\S+)
([\d.]+): [\d.]+ will match a sequence of strictly digits and . (it means 4.5.6 or .... will match, but those cases are not common, and this is just for grabbing data), and the parentheses signify that we will capture the matched text. The . here is inside character class [], so no need for escaping.
Followed by arbitrary spaces \s+ and maximum sequence (due to greedy quantifier) of non-space character \S+ (non-space really is non-space: it will match almost everything in Unicode, except for space, tab, new line, carriage return characters).
You can get the number in the first capturing group, and the unit in the 2nd capturing group.
You can be a bit stricter on the number:
(\d+(?:\.\d*)?|\.\d+)\s+(\S+)
The only change is (\d+(?:\.\d*)?|\.\d+), so I will only explain this part. This is a bit stricter, but whether stricter is better depends on the input domain and your requirements. It will match the integer 34 and a number with a decimal part such as 3.40000, and it allows the .5 and 34. cases to pass. It will not accept a number with excess . characters as a single match, nor a lone .. The | acts as OR, which separates 2 different patterns: \.\d+ and \d+(?:\.\d*)?.
\d+(?:\.\d*)?: This will match and (implicitly) assert at least one digit in integer part, followed by optional . (which needs to be escaped with \ since . means any character) and fractional part (which can be 0 or more digits). The optionality is indicated by ? at the end. () can be used for grouping and capturing - but if capturing is not needed, then (?:) can be used to disable capturing (save memory).
\.\d+: This will match for the case such as .78. It matches . followed by at least one (signified by +) digit.
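To see the stricter pattern at work, here is a minimal sketch in Python (an assumption; the question uses PHP variables, and the same pattern should work there with preg_match). The helper name parse_amount is hypothetical.

```python
import re

# Stricter pattern from above: number in group 1, unit in group 2.
STRICT = re.compile(r"(\d+(?:\.\d*)?|\.\d+)\s+(\S+)")

def parse_amount(text):
    """Return (number, unit) as strings, or None if nothing matches."""
    m = STRICT.search(text)
    if m is None:
        return None
    return m.group(1), m.group(2)

print(parse_amount("12 ounces"))           # ('12', 'ounces')
print(parse_amount("1.5 ounces chopped"))  # ('1.5', 'ounces')
print(parse_amount(".5 cups"))             # ('.5', 'cups')
print(parse_amount("34. grams"))           # ('34.', 'grams')
print(parse_amount("ounces"))              # None
```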
This is not a good solution if you want to make sure you get something meaningful out of the input string. You need to define all expected units before you can write a regex that only captures valid data.
Use this regular expression: \b\d+([\.,]\d+)?
To get integers and decimals that either use a comma or a dot plus the next word, use the following regex:
/\d+([\.,]\d+)?\s\S+/

Regex - simple phone number

I know there are a ton of regex examples on how to match certain phone number types. For my example I just want to allow numbers and a few special characters. I am again having trouble achieving this.
Phone numbers that should be allowed could take these forms
5555555555
555-555-5555
(555)5555555
(555)-555-5555
(555)555-5555 and so on
I just want something that will allow [0-9] and also special characters '(' , ')', and '-'
so far my expression looks like this
/^[0-9]*^[()-]*$/
I know this is wrong but logically I believe this means allow numbers 0-9 or and allow characters (, ), and -.
^(\(\d{3}\)|\d{3})-?\d{3}-?\d{4}$
\(\d{3}\)|\d{3} three digits with or without () - The simpler regex would be \(?\d{3}\)? but that would allow (555-5555555 and 555)5555555 etc.
An optional - followed by three digits
An optional - followed by four digits
Note that this would still allow 555555-5555 and 555-5555555 - I don't know if these are covered in your and so on part
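The accepted and rejected forms described above can be confirmed with a short script. This is a sketch in Python (an assumption; the question's /.../ syntax suggests JavaScript, where the same pattern applies). The helper name is_phone is hypothetical.

```python
import re

# Pattern from this answer: area code with or without parentheses.
PHONE = re.compile(r"^(\(\d{3}\)|\d{3})-?\d{3}-?\d{4}$")

def is_phone(text: str) -> bool:
    return PHONE.search(text) is not None

accepted = ["5555555555", "555-555-5555", "(555)5555555",
            "(555)-555-5555", "(555)555-5555"]
rejected = ["(555-5555555", "555)5555555", "555-5555"]
still_accepted = ["555555-5555", "555-5555555"]  # only one of the two dashes

for s in accepted + still_accepted:
    print(s, is_phone(s))  # each prints True
for s in rejected:
    print(s, is_phone(s))  # each prints False
```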
This matches what you want: digits, (, ) and -
/^[0-9()-]+$/
^[0-9-+\s]+$
06754654
+54654654
+546 546 5654 43534 +
+09945 345 3453 45
Why do you have a stray ^ in there? I think you meant [()-]. The stray ^ makes the regex require a second beginning-of-string position, which can only succeed when no digits come before it.
Also, \d is a nice shortcut for [0-9]. In JavaScript they are exactly the same; some other flavors also let \d match non-ASCII digits.
Also, this will only match a bunch of numbers, then a bunch of ( or ) or -. Something like: 1294819024()()()()()-----()- would match. I think you want the whole thing to be able to repeat, something like: ^(\d*[()-]*)*$. Now, you can match repeating sequences of this.
Now, it is important to notice that nested * quantifiers are typically inefficient. Since all we want is to match any digit or the punctuation you listed, a single character class is enough: [\d()-]*
For digits you can use \d. For more than one digit, you can use \d{n}, where n is the number of digits you want to match. Some special characters must be escaped, for example \( matches (. For example: \(\d{3}\)\-\d{3}\-\d{4} matches (555)-555-5555.
The second caret (afaik) is going to break anything you do since it means "start of string".
What you appear to be asking for therefore is:
start of string, followed by...
any number of numeric characters, followed by...
start of string, followed by...
any number of '(',')', or '-' characters, followed by...
end of string
Which won't work even if that second caret does nothing, because you're not accounting for anything after the first '(',')', or '-', and in fact will probably only validate an empty string if that.
You want /^[0-9()-]+$/ for a very crude pattern which will "work".
If you are handling US-only numbers, the best solution is to strip out all the non-digit characters and then just test whether the length == 10.
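A minimal sketch of that strip-and-count approach, in Python (an assumption; the same idea is a one-liner with replace(/\D/g, '') in JavaScript). The function name is_us_phone is hypothetical.

```python
import re

def is_us_phone(text: str) -> bool:
    # Remove everything that is not a digit, then count what is left.
    digits = re.sub(r"\D", "", text)
    return len(digits) == 10

print(is_us_phone("(555)-555-5555"))  # True
print(is_us_phone("555 555 5555"))    # True
print(is_us_phone("555-5555"))        # False: only 7 digits
print(is_us_phone("1-555-555-5555"))  # False: 11 digits
```

This check is deliberately crude: it accepts any text that happens to contain exactly 10 digits, so it may need to be combined with a character whitelist if stray letters should be rejected.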