What is the point of having * in a regular expression - regex

Recently I have been wondering why we need * in regular expressions. For example, if we want to represent A0, A1, ..., Z99, we can do:
[A-Z][0-9][0-9]*
But A0A (which is not what we want) also matches the pattern above. What benefit does * give me?

* is just a quantifier, matching between zero and unlimited times.
[A-Z][0-9][0-9]* matches A0,A1..,Z99 and also A10000,Z123456789...
Remember that if you don't use ^ and $ as anchors, the engine matches just the specified part and returns true even if the input contains more characters, because you haven't said that you want a positive result ONLY if the entire input matches the regex.
If your goal is to match just A0,A1..,Z99, the regex should be:
^[A-Z][0-9][0-9]?$
Or simply:
^[A-Z]\d{1,2}$
\d means 'digit', and is the same as [0-9].
{1,2} means at least once and at most twice.
? is also a quantifier, matching 0 or 1 time.
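To see the effect of the anchors and quantifiers side by side, here is a small sketch using Python's re module (chosen only for illustration; these patterns behave the same way in other PCRE-style engines). re.search looks for the pattern anywhere in the string, which mirrors the unanchored behaviour described above.

```python
import re

# Pattern from the question: no anchors, and * allows any number of trailing digits.
loose = re.compile(r"[A-Z][0-9][0-9]*")
# Anchored pattern: the whole string must be one letter followed by one or two digits.
strict = re.compile(r"^[A-Z]\d{1,2}$")

for s in ["A0", "Z99", "A0A", "A10000"]:
    # re.search reports a match if the pattern fits anywhere in s.
    print(s, bool(loose.search(s)), bool(strict.search(s)))
```

The loose pattern accepts all four strings (it finds A0 inside A0A), while the anchored one accepts only A0 and Z99.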

But A0A (which is not we want) is also valid
No, it is not valid; you just need to use anchors:
^[A-Z][0-9][0-9]*$
^ ensures the match starts at the beginning of the line, and $ ensures it runs to the end of the line.
Also, if only the second digit is optional, it is better to use:
^[A-Z][0-9][0-9]?$
because * matches 0 or more times, whereas ? matches 0 or 1 time.

It seems like you're trying to match strings that start with an uppercase letter followed by a number from 0 to 99:
^[A-Z][1-9]?[0-9]$
^ asserts that we are at the start and $ asserts that we are at the end, so together they force an exact match of the whole string. The pattern won't match a substring in the middle, at the start, or at the end of a longer string or line. That is, [A-Z][1-9]?[0-9] will match A10 in the string fooA10, but ^[A-Z][1-9]?[0-9]$ won't produce a match in fooA10.
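The same kind of check in Python (again only an illustration) shows both the substring behaviour and a side effect of [1-9]?[0-9]: because the optional first digit cannot be 0, a zero-padded value such as A05 is rejected, whereas \d{1,2} would accept it.

```python
import re

exact = re.compile(r"^[A-Z][1-9]?[0-9]$")     # whole string: letter + 0..99, no leading zero
unanchored = re.compile(r"[A-Z][1-9]?[0-9]")  # may match inside a longer string
two_digits = re.compile(r"^[A-Z]\d{1,2}$")    # for comparison: also allows A05

print(unanchored.search("fooA10").group())    # A10
print(exact.search("fooA10"))                 # None
print(bool(exact.search("A05")), bool(two_digits.search("A05")))  # False True
```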

Related

Use RegEx to create an input mask

I'm trying to use RegEx to create an input mask. The first letter can be either A or B, and it must be followed by exactly 5 digits forming a number from 1 to 99999 (00001 to 99999).
For example,
A00001
B20000
B00412
This is what I have so far,
^[S|T]{1}[0-9]{4}[1-9]
but it's not allowing A52210 for example.
Thanks in advance :)
Brief
I'm not sure what [S|T] is supposed to do, but it matches one character from S, |, or T: the pipe has no special meaning inside a character class, and your examples use A and B rather than S and T anyway. Also, the {1} is redundant, and your digits won't cover the range you expect: [0-9]{4}[1-9] spans 00001 to 99999 but rejects every number ending in zero, e.g. 11110.
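Both problems can be reproduced in a few lines of Python (used here only as a convenient test bed):

```python
import re

# Inside [...] the | is an ordinary character, so [S|T] also matches a pipe.
pipe_match = re.fullmatch(r"[S|T]", "|")
print(bool(pipe_match))  # True

# The question's pattern: the final [1-9] rejects any number ending in 0.
question = re.compile(r"^[S|T]{1}[0-9]{4}[1-9]")
print(bool(question.match("S11111")), bool(question.match("S11110")))  # True False
# A is not in the class, and the number ends in 0, so A52210 fails for two reasons.
print(bool(question.match("A52210")))  # False
```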
Code
^[AB](?!0{5})\d{5}$
Note: If this regex is to be used on a longer string that might contain the code you're searching for somewhere inside it, you should replace both position assertions ^ and $ with a word boundary \b, as in the following regex.
\b[AB](?!0{5})\d{5}\b
Variations
Slightly shorter version, but dependent on end of string (won't work if the string is in the middle of a sentence)
^[AB](?!0+$)\d{5}$
Negative lookbehind (won't work in some flavours of regex)
^[AB]\d{5}(?<!0{5})$
Long, cumbersome regex that checks every possibility but ensures that at least one digit is not 0:
^[AB](?:[1-9]\d{4}|\d[1-9]\d{3}|\d{2}[1-9]\d{2}|\d{3}[1-9]\d|\d{4}[1-9])$
Results
Input
A00001
B20000
B00412
A52210
A00001
A00000
S00001
A1000
Output
Note: Below are matches; anything from Input above that isn't here wasn't matched.
A00001
B20000
B00412
A52210
A00001
Explanation
^ Assert position at the start of the line
[AB] Match a character from the set (either A or B literally)
(?!0{5}) Ensure what follows isn't 0 five times
\d{5} Match any digit five times
$ Assert position at the end of the line
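The results above can be reproduced, and the three variants cross-checked, with a short Python sketch. Python's re happens to support the fixed-width look-behind used in the third variant; as noted, other engines may not. The exhaustive loop covers only the A prefix, an arbitrary choice to keep it quick.

```python
import re

main = re.compile(r"^[AB](?!0{5})\d{5}$")
variants = [
    main,
    re.compile(r"^[AB](?!0+$)\d{5}$"),    # shorter, relies on the end of the string
    re.compile(r"^[AB]\d{5}(?<!0{5})$"),  # negative look-behind
]

inputs = ["A00001", "B20000", "B00412", "A52210",
          "A00001", "A00000", "S00001", "A1000"]
print([s for s in inputs if main.match(s)])
# ['A00001', 'B20000', 'B00412', 'A52210', 'A00001']

# Exhaustive cross-check: all variants agree on every A-prefixed 5-digit code.
for n in range(100_000):
    code = f"A{n:05d}"
    results = {bool(v.match(code)) for v in variants}
    assert len(results) == 1, code

# Word-boundary form for finding codes inside longer text.
text = "Orders A00001, A00000 and B20000 shipped."
print(re.findall(r"\b[AB](?!0{5})\d{5}\b", text))  # ['A00001', 'B20000']
```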

AUTOHOTKEY: RegExMatch() a series of numbers and letters

I've tested my regular expression in http://www.regextester.com/
([0-9]{4,4})([A-Z]{2})([0-9]{1,3})
It matches the following strings exactly as I want:
1234AB123
2000AZ20
1000XY753
But when I try it in AutoHotkey, I get 0 as the result:
test := RegExMatch("2000SY155","([0-9]{4,4})([A-Z]{2})([0-9]{1,3})")
MsgBox %test%
testing for:
first 4 characters must be a number
next 2 characters must be caps letters
next 1 to 3 characters must be numbers
You had too many parentheses.
This is the correct implementation:
test := RegExMatch("1234AB123","[0-9]{4,4}([A-Z]{2})[0-9]{1,3}")
Edit:
What I noticed is that you want this pattern to match, but you aren't telling it very much.
Here's what I came up with that matches what you asked for; it's probably not the best solution, but it works:
test := RegExMatch("1234AB567","^[0-9]{4,4}[A-Z]{2}(?![0-9]{4,})[0-9]{1,3}$")
Breaking it down:
RegExMatch(Haystack, NeedleRegEx [, UnquotedOutputVar = "", StartingPosition = 1])
Circumflex (^) and dollar sign ($) are called anchors because they don't consume any characters; instead, they tie the pattern to the beginning or end of the string being searched.
^ may appear at the beginning of a pattern to require the match to occur at the very beginning of a line. For example, ^abc matches abc123 but not 123abc.
$ may appear at the end of a pattern to require the match to occur at the very end of a line. For example, abc$ matches 123abc but not abc123.
So by adding the circumflex, we require our pattern [0-9]{4,4} to be at the beginning of our Haystack.
Look-ahead and look-behind assertions: The groups (?=...), (?!...) are called assertions because they demand a condition to be met but don't consume any characters.
(?!...) is a negative look-ahead because it requires that the specified pattern not exist.
Our next pattern, [A-Z]{2}(?![0-9]{4,}), looks for two uppercase letters that are not followed by four or more digits.
Finally, [0-9]{1,3}$ matches one to three digits that must be the last characters in our Haystack. The $ sits outside the brackets because inside a character class such as [0-9$] it is just a literal dollar sign, not an anchor. (With $ in place, the look-ahead is redundant, but it does no harm.)
test := RegExMatch("2000SY155","([0-9]{4,4})([A-Z]{2})([0-9]{1,3})")
MsgBox %test%
But when I try it in Autohotkey I get 0 result
The message box correctly shows 1 for me, meaning your initial script works fine with my version. Extra parentheses are usually no problem in regexes; you can use as many as you like. Maybe your AutoHotkey version is outdated?
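This can be checked outside AutoHotkey. The sketch below uses Python's re module as a stand-in (an assumption: AutoHotkey uses PCRE, but these particular patterns behave the same in both). It shows that the capture groups don't prevent a match, what the missing anchors let through, and that $ inside a character class is not an anchor.

```python
import re

original = r"([0-9]{4,4})([A-Z]{2})([0-9]{1,3})"
anchored = r"^[0-9]{4}[A-Z]{2}[0-9]{1,3}$"

# The capture groups do not stop the question's pattern from matching.
m = re.search(original, "2000SY155")
# AutoHotkey's RegExMatch reports a 1-based position; Python's start() is 0-based.
print(m.start() + 1, m.groups())  # 1 ('2000', 'SY', '155')

# Without anchors, extra characters around a valid code are silently ignored.
print(bool(re.search(original, "x12345AB1234")), bool(re.search(anchored, "x12345AB1234")))  # True False

# Inside [...], $ is a literal dollar sign rather than an end-of-string anchor.
print(bool(re.search(r"^[0-9]{4}[A-Z]{2}[0-9$]{1,3}", "1234AB123X")))  # True
print(bool(re.search(anchored, "1234AB123X")))                         # False
```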

Why doesn't the regex ^([0|1]1)+$ match the string "111"?

I'm trying to write a regex to match binary strings where every odd character is a 1.
I came up with this:
^([0|1]1)+$
My logic:
^ matches the start of the line
( starts a capture group
[0|1] match a 0 or 1 (since the 0th position is even)
1 the previous character (0 or 1) must be followed by a 1
+ repeat the previous pattern one or more times
$ matches the end of the line
So by my logic, the above regex should match binary strings where every other character (with the first "other" character being the second one in the string) is a 1.
However, it doesn't work correctly. As an example, the string 111 is not matched.
Why isn't it working and what should I change to make it work?
If you need every odd character to be a 1, then you need something more like this:
^([01]1)*[01]?$
The first character can be anything and the next has to be 1; that pair can repeat any number of times, and the last character, if present, can be 0 or 1.
The pipe in your character class is not needed and actually makes your regex also match a pipe character, so remove it entirely. You use the pipe in groups, e.g. (?: ... ) or ( ... ), to denote alternation.
The above will also match an empty string, so you could add (?=.) at the beginning to require at least 1 character, i.e. ^(?=.)([01]1)*[01]?$.
The above will match where you have (where x is either 0 or 1):
x
x1
x1x
x1x1
x1x1x
x1x1x1
etc.
Your current regex, on the other hand, can only match an even number of characters: you repeat the group ([0|1]1), which matches exactly 2 characters (no more, no less), so the length of the whole match will be a multiple of 2.
Adding the optional [01] at the end allows for strings with odd number of characters to match.
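A quick check with Python's re module (used here purely for illustration) confirms both points: the original pattern cannot match an odd-length string, and the stray | lets it match a literal pipe.

```python
import re

original = re.compile(r"^([0|1]1)+$")
fixed = re.compile(r"^(?=.)([01]1)*[01]?$")

print(bool(original.match("111")))  # False: only even lengths can match
print(bool(original.match("|1")))   # True: the | inside [...] is a literal pipe
for s in ["1", "011", "1111", "111"]:
    print(s, bool(fixed.match(s)))  # all True
print(bool(fixed.match("")), bool(fixed.match("100")))  # False False
```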
Your regex is for even-length strings only. [01] and 1 each match a character, therefore your capturing group matches 2 characters.
This modifies your regex to accept odd-length strings:
^([01](1|$))+$
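The (1|$) trick is easy to get subtly wrong, so it is worth comparing against a plain-code reference. The sketch below assumes the intended rule is "non-empty, and every second character (positions 2, 4, ...) is 1", and brute-forces all binary strings up to length 10 in Python.

```python
import re
from itertools import product

pattern = re.compile(r"^([01](1|$))+$")

def every_second_is_one(s: str) -> bool:
    """Reference check: non-empty, and characters at 0-based odd indexes are '1'."""
    return len(s) > 0 and all(c == "1" for c in s[1::2])

# Compare the regex with the reference on every binary string up to length 10.
for length in range(0, 11):
    for bits in product("01", repeat=length):
        s = "".join(bits)
        assert bool(pattern.match(s)) == every_second_is_one(s), s
print("regex agrees with the reference check")
```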
Firstly, the [0|1] should read [01]. Otherwise you have a character class that matches 0, |, or 1.
Now, [01]1 matches exactly two characters. Thus ([01]1)+ cannot match a string whose length is not a multiple of two.
To make it work with inputs of odd length, change the regex to
^(([01]1)+[01]?|1)$
You can use this pattern:
^1?([01]1)+$|^1$
or
^(1?([01]1)+|1)$
To deal with an odd or even number of digits, you need to put an optional 1? at the beginning. To ensure that there is at least one digit, you can't use a * quantifier for the group, otherwise the pattern can match the empty string. This is why you need to use + for the group and add the case of a single 1.

Regex allow a string to only contain numbers 0 - 9 and limit length to 45

I am trying to create a regex that allows a string to contain only the characters 0-9, with a length of at least 1 and at most 45. For example, 00303039 would be a match, and 039330a29 would not.
So far this is what I have but I am not sure that it is correct
[0-9]{1,45}
I have also tried
^[0-9]{45}*$
but that does not seem to work either. I am not very familiar with regex so any help would be great. Thanks!
You are almost there, all you need is start anchor (^) and end anchor ($):
^[0-9]{1,45}$
\d is short for the character class [0-9]. You can use that as:
^\d{1,45}$
The anchors force the pattern to match entire input, not just a part of it.
Your regex [0-9]{1,45} looks for 1 to 45 digits, so a string like foo1 also gets matched, as it contains 1.
^[0-9]{1,45} looks for 1 to 45 digits but these digits must be at the beginning of the input. It matches 123 but also 123foo
[0-9]{1,45}$ looks for 1 to 45 digits but these digits must be at the end of the input. It matches 123 but also foo123
^[0-9]{1,45}$ looks for 1 to 45 digits, but these digits must be both at the start and at the end of the input; effectively, they must make up the entire input.
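The four cases can be compared directly. The Python sketch below (an illustration only, with made-up sample strings) runs each pattern with re.search, which, like most "find" functions, accepts a match anywhere in the input.

```python
import re

samples = ["123", "foo1", "123foo", "foo123"]

def accepted(pattern: str) -> list[str]:
    """Return the samples in which re.search finds a match for the pattern."""
    return [s for s in samples if re.search(pattern, s)]

for pattern in [r"[0-9]{1,45}", r"^[0-9]{1,45}", r"[0-9]{1,45}$", r"^[0-9]{1,45}$"]:
    print(f"{pattern:15} -> {accepted(pattern)}")
```

Only the fully anchored pattern rejects foo1, 123foo and foo123.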
The first matches 1 to 45 digits anywhere within your string (so other characters are allowed too, e.g. "039330a29"). The second allows exactly 45 digits (no fewer). So combine the best of both:
^\d{1,45}$
where \d is the same as [0-9].
Use this regular expression if you don't want the number to start with zero:
^[1-9][0-9]{0,44}$
If you don't mind starting with zero, use:
^[0-9]{1,45}$
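When the first digit is pinned to [1-9], it already uses up one of the 45 characters, so the remaining repetition has to be {0,44}. A small Python comparison with the tempting ^[1-9]([0-9]{1,45}$) shows the difference: that version rejects a single digit and accepts 46 digits.

```python
import re

no_leading_zero = re.compile(r"^[1-9][0-9]{0,44}$")  # total length 1..45
off_by_one = re.compile(r"^[1-9]([0-9]{1,45}$)")     # total length 2..46

for s in ["7", "10", "1" * 45, "1" * 46, "0123"]:
    print(len(s), bool(no_leading_zero.match(s)), bool(off_by_one.match(s)))
```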
codaddict has provided the right answer. As for what you've tried, I'll explain why they don't make the cut:
[0-9]{1,45} is almost there, however it matches a 1-to-45-digit string even if it occurs within another longer string containing other characters. Hence you need ^ and $ to restrict it to an exact match.
^[0-9]{45}*$ matches an exactly-45-digit string, repeated 0 or any number of times (*). That means the length of the string can only be 0 or a multiple of 45 (90, 135, 180...).
A combination of both attempts is probably what you need:
^[0-9]{1,45}$
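Not every engine accepts a quantifier stacked directly on another one, as in {45}*; Python's re module, for example, rejects it at compile time. Wrapping the block in a non-capturing group expresses the "zero or more blocks of 45 digits" reading described above explicitly. A minimal Python sketch:

```python
import re

# Some engines, including Python's re, refuse a quantifier stacked on another one.
error_message = None
try:
    re.compile(r"^[0-9]{45}*$")
except re.error as exc:
    error_message = str(exc)
print("rejected:", error_message)

# An explicit group expresses "zero or more blocks of exactly 45 digits".
blocks_of_45 = re.compile(r"^(?:[0-9]{45})*$")
for n in (0, 44, 45, 90):
    print(n, bool(blocks_of_45.match("1" * n)))  # True False True True
```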
^[0-9]{1,45}$ is correct.
Rails doesn't like ^ and $ for security reasons: in Ruby they always match at line boundaries, so input containing a newline can sneak extra content past the check. It's better to use \A and \z to anchor the beginning and end of the string.
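Ruby's ^ and $ always behave as line anchors, which Python only imitates with re.MULTILINE. The sketch below uses that flag to mimic Ruby (an approximation, not Ruby itself) and shows how a newline lets extra text slip past ^...$ but not past \A...\Z. In Python, \Z is the absolute end of the string, like Ruby's \z.

```python
import re

# re.MULTILINE makes ^ and $ match at every line, roughly like Ruby's defaults.
line_anchored = re.compile(r"^\d{1,45}$", re.MULTILINE)
# \A and \Z anchor to the very start and very end of the whole string.
string_anchored = re.compile(r"\A\d{1,45}\Z")

payload = "12345\n<script>alert(1)</script>"
print(bool(line_anchored.search(payload)))    # True: the first line alone is all digits
print(bool(string_anchored.search(payload)))  # False: the whole string must be digits

# Even without MULTILINE, Python's $ also matches just before a trailing newline.
print(bool(re.search(r"^\d{1,45}$", "123\n")), bool(string_anchored.search("123\n")))  # True False
```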
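For this case, a word boundary (\b) can also be used instead of the start anchor (^) and end anchor ($):
\b\d{1,45}\b
\b is a position between a \w (word) character and a \W (non-word) character, or at the beginning or end of the string when the character there is a word character. Note that it only requires the digits to be delimited, not to form the entire input, so a string like "abc 123" still matches.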
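A short Python check (used only for illustration) puts the boundary-based pattern next to the anchored one on a few sample strings:

```python
import re

boundary = re.compile(r"\b\d{1,45}\b")
anchored = re.compile(r"^\d{1,45}$")

for s in ["00303039", "039330a29", "abc 123", "12 34"]:
    print(repr(s), bool(boundary.search(s)), bool(anchored.search(s)))
```

Both reject 039330a29, since a digit next to a letter is not a word boundary; only the anchored pattern rejects abc 123 and 12 34.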

Regular Expression to match set of arbitrary codes

I am looking for some help creating a regular expression that works with a unique input in our system. We already have logic in our keypress event that only allows digits and the letters A and M. Now I need a RegEx that validates the format of the input during the onblur event.
I have some examples below of what would be valid. The letter A represents an age, so it is always followed by up to 3 digits. The letter M can only occur at the end of the string.
Valid Input
1-M
10-M
100-M
5-7
5-20
5-100
10-20
10-100
A5-7
A10-7
A100-7
A10-20
A5-A7
A10-A20
A10-A100
A100-A102
Invalid Input
a-a
a45
4
This matches all of the samples.
/A?\d{1,3}-A?\d{0,3}M?/
I'm not sure whether 10-A10M should be legal, or even whether M can appear together with numbers. If M only appears without numbers:
/A?\d{1,3}-(A?\d{1,3}|M)/
Use the brute force method if you have a small amount of well defined patterns so you don't get bad corner-case matches:
^(\d+-M|\d+-\d+|A\d+-\d+|A\d+-A\d+)$
Here are the individual regexes broken out:
\d+-M <- matches anything like '1-M'
\d+-\d+ <- 5-7
A\d+-\d+ <- A5-7
A\d+-A\d+ <- A10-A20
/^[A]?[0-9]{1,3}-[A]?[0-9]{1,3}[M]?$/
Matches anything of the form:
A(optional)[1-3 numbers]-A(optional)[1-3 numbers]M(optional)
^A?\d+-(?:A?\d+|M)$
An optional A followed by one or more digits, a dash, and either another optional A and some digits or an M. The '(?: ... )' notation is a Perl 'non-capturing' set of parentheses around the alternatives; it means there will be no '$1' after the regex matches. Clearly, if you wanted to capture the various bits and pieces, you could - and would - do so, and the non-capturing clause might not be relevant any more.
(You could replace the '+' with '{1,3}' as JasonV did to limit the numbers to 3 digits.)
^A?\d{1,3}-(M|A?\d{1,3})$
^ -- the match must be done from the beginning
A? -- "A" is optional
\d{1,3} -- between one and three digits; [0-9]{1,3} also works
- -- A "-" character
(...|...) -- Either one of the two expressions
(M|...) -- Either "M" or...
(...|A?\d{1,3}) -- ...or an optional "A" followed by one to three digits
$ -- the match should be done to the end
Some consequences of changing the pattern: if you do not put "^" at the beginning, the match may ignore an invalid beginning. For example, "MAAMA0-M" would be matched at "A0-M".
Likewise, if you leave out $, the match may ignore an invalid ending. For example, in "A0-MMMMAAMAM" the regex would match "A0-M".
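The breakdown and the two anchor examples can be checked against the question's sample inputs with a short Python sketch (Python's re is used only as a convenient test harness):

```python
import re

pattern = re.compile(r"^A?\d{1,3}-(M|A?\d{1,3})$")

valid = ["1-M", "10-M", "100-M", "5-7", "5-20", "5-100", "10-20", "10-100",
         "A5-7", "A10-7", "A100-7", "A10-20", "A5-A7", "A10-A20", "A10-A100", "A100-A102"]
invalid = ["a-a", "a45", "4"]

print(all(pattern.match(s) for s in valid))    # True
print(any(pattern.match(s) for s in invalid))  # False

# Dropping an anchor lets re.search skip over invalid text at that end.
no_start = re.compile(r"A?\d{1,3}-(M|A?\d{1,3})$")
no_end = re.compile(r"^A?\d{1,3}-(M|A?\d{1,3})")
print(no_start.search("MAAMA0-M").group())    # A0-M
print(no_end.search("A0-MMMMAAMAM").group())  # A0-M
```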
Using \d is usually preferred, as is \w for word characters (letters, digits, and underscore), \s for whitespace, \D for non-digits, \W for non-word characters, and \S for non-whitespace. But be careful that \d is not treated as an escape sequence in your string literal; you might need to write it as \\d instead.
{x,y} means the preceding element must occur between x and y times.
? means the preceding element must occur once or not at all.
A parenthesized group is treated as a single unit: (ABC)? matches ABC or nothing at all.
I’d use this regular expression:
^(?:[1-9]\d{0,2}-(?:M|[1-9]\d{0,2})|A[1-9]\d{0,2}-A?[1-9]\d{0,2})$
This matches either:
<number>-M or <number>-<number>
A<number>-<number> or A<number>-A<number>
Additionally <number> must not begin with a 0.
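Against the same examples, a Python sketch (test harness only) shows that this stricter pattern accepts every valid input and rejects numbers with a leading zero. It also rejects 5-A7 (a plain number on the left with an A on the right), a combination the question's examples don't cover either way.

```python
import re

strict = re.compile(r"^(?:[1-9]\d{0,2}-(?:M|[1-9]\d{0,2})|A[1-9]\d{0,2}-A?[1-9]\d{0,2})$")

valid = ["1-M", "10-M", "100-M", "5-7", "5-20", "5-100", "10-20", "10-100",
         "A5-7", "A10-7", "A100-7", "A10-20", "A5-A7", "A10-A20", "A10-A100", "A100-A102"]
rejected = ["0-M", "05-7", "A5-07", "5-A7", "a-a", "a45", "4"]

print(all(strict.match(s) for s in valid))       # True
print([s for s in rejected if strict.match(s)])  # []
```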