Regular expression with "not character" does not match as expected - regex

I am trying to satisfy the following restrictions:
the line has from 3 to 256 chars, each of which is a-z, 0-9, dash - or dot .
the line cannot start or end with -
I want to get output like the following:
aaa -> good
aaaa -> good
-aaa -> bad
aaa- -> bad
---a -> bad
I have some regexes that don't give the right answer:
1) ^[^-][a-z0-9\-.]{3,256}[^-]$ rejects all test lines as bad;
2) ^[^-]+[a-z0-9\-.]{3,256}[^-]+$ treats the first three lines as one matching string, since [^-] matches a newline, I guess;
3) ^[^-]?[a-z0-9\-.]{3,256}[^-]?$ (? for one or zero matching dash) accepts all test lines as good.
Where is the truth? I'm sensing it's either close to mine or much more complicated.
P.S. I use the Python 3 re module.

This one is almost correct: ^[^-][a-z0-9\-.]{3,256}[^-]$
The [^-] at the start and at the end each match one character already, so you will need to change {3,256} into {1,254}.
Also, you probably only want a-z, 0-9 and . at the start and end (not just anything except -), so the full regex becomes:
^[a-z0-9.][a-z0-9\-.]{1,254}[a-z0-9.]$
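As a quick check of the corrected pattern against the sample lines from the question, here is a minimal Python 3 sketch using the re module the asker mentions. The names ORIGINAL, FIXED, samples and original_hits are illustrative only and do not come from the original posts.

```python
import re

# Attempt 1 from the question: each [^-] consumes one character, so together
# with {3,256} the pattern needs at least 5 characters in total.
ORIGINAL = re.compile(r"^[^-][a-z0-9\-.]{3,256}[^-]$")

# Corrected version: 1 + {1,254} + 1 gives 3 to 256 characters overall.
FIXED = re.compile(r"^[a-z0-9.][a-z0-9\-.]{1,254}[a-z0-9.]$")

samples = ["aaa", "aaaa", "-aaa", "aaa-", "---a"]

for line in samples:
    verdict = "good" if FIXED.match(line) else "bad"
    print(f"{line} -> {verdict}")

# Every sample is shorter than 5 characters, so the original matches none.
original_hits = [line for line in samples if ORIGINAL.match(line)]
print(original_hits)
```

The length arithmetic is the whole fix: the two single-character classes already account for two of the 3 to 256 characters.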

Use a lookahead to confirm that the line matches your basic requirement ((?=^[0-9a-z.-]{3,256}$)) and then apply further restrictions:
^((?=^[0-9a-z.-]{3,256}$)[^-].*[^-])$

You can use this:
^(?!-)[a-z0-9.-]{3,256}(?<!-)$
Where (?!-) is a negative lookahead assertion (not followed by a dash) and (?<!-) is a negative lookbehind (not preceded by a dash).
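A small Python sketch of the lookaround variant, assuming the pattern is applied to one line at a time. The helper name is_valid is hypothetical and only there to make the checks readable.

```python
import re

# The lookarounds inspect the first and last character without consuming
# anything, so {3,256} still counts every character of the line.
LOOKAROUND = re.compile(r"^(?!-)[a-z0-9.-]{3,256}(?<!-)$")


def is_valid(line: str) -> bool:
    """Return True when the line satisfies the length and dash rules."""
    return LOOKAROUND.match(line) is not None


for line in ["aaa", "a-b", "-aaa", "aaa-", "ab", "a" * 256, "a" * 257]:
    label = line if len(line) <= 10 else f"{line[:3]}... ({len(line)} chars)"
    print(f"{label} -> {'good' if is_valid(line) else 'bad'}")
```

Note that in Python `$` also matches just before a trailing newline, so a line read with its `\n` still attached would pass; `re.fullmatch` or `\Z` is one way to rule that out if it matters.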

You don't want {3,256}. You want {1,254}, because each [^-] also matches 1 character at the beginning and end of your string, so you have to subtract them from the total number of characters that you want.
^[a-z0-9.][a-z0-9.-]{1,254}[a-z0-9.]$
Or, if you want to keep your values you can also use lookahead/behinds:
^(?=[a-z0-9.])[a-z0-9.-]{3,256}(?<=[a-z0-9.])$

Related

Am I implementing negative lookaheads correctly with my regex?

I'm a beginner with regex and stuck with creating regex with the following conditions:
Minimum of 8 characters
Maximum of 60 characters
Must contain 2 letters
Must contain 1 number
Must contain 1 special character
Special character cannot be the following: & ` ( ) = [ ] | ; " ' < >
So far I have the following...
(?=^.{8,60}$)(?=.*\d)(?=[a-zA-Z]{2,})(?!.*[&`()=[|;"''\]'<>]).*
But my last two tests are failing and I have no idea why...
!##$%^*+-_~?,.{}!HR12345
123456789AB!
If you'd like to see my test and expected results, visit here: https://regexr.com/73m2o
My tests contain an acceptable number of characters, an appropriate number of alphabetic characters, and supported special characters... I don't know why it's failing!
Using .* to verify a character in the string can be very inefficient, and I would suggest using negated character classes instead, following the principle of contrast.
Apart from that, there is a point in the question, "Must contain 1 special character", that is not addressed yet in the current answers.
You can use a positive lookahead for that to assert one of the characters that you consider special.
^(?=[^\d\n]*\d)(?=[^a-zA-Z\n]*[a-zA-Z][^a-zA-Z\n]*[a-zA-Z])(?=[^!##$%^\n]*[!##$%^])[^&`()=[|;"''\]'<>\n]{8,60}$
Explanation
^ Start of string (Outside of the lookahead)
(?=[^\d\n]*\d) Assert a digit
(?=[^a-zA-Z\n]*[a-zA-Z][^a-zA-Z\n]*[a-zA-Z]) Assert 2 chars a-zA-Z
(?=[^!##$%^\n]*[!##$%^]) Assert a "special" character
[^&`()=[|;"''\]'<>\n]{8,60} Match 8-60 characters except for the ones that you don't want to match
$ End of string
See a regex demo.
Part of the issue is that you're missing the .* in (?=[a-zA-Z]{2,}). However, your implementation of "two or more" letters is not correct unless the letters must be consecutive.
You'll see that the string 1234567B89A! fails to match, even with the correction. You can fix this like so:
(?=^.{8,60}$)(?=.*\d)(?=.*[a-zA-Z].*[a-zA-Z])(?!.*[&`()=[|;"''\]'<>]).*
The part I changed is (?=.*[a-zA-Z].*[a-zA-Z]) asserting that we can match a letter, zero or more other characters, and then another letter.
https://regex101.com/r/jEsK0S/1
Also, there's currently no assertion that there must be a special character, only an assertion of which ones shouldn't match. So I'd suggest adding another lookahead with a list of valid special characters.
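To make the difference between the two letter lookaheads concrete, here is a hedged Python sketch. It assumes Python's re module rather than the regexr/regex101 engines used in the thread, and it builds the forbidden-character class with re.escape instead of copying the class text verbatim. ORIGINAL, FIXED and FORBIDDEN are illustrative names.

```python
import re

# Characters the question forbids; re.escape keeps them literal inside [...].
FORBIDDEN = re.escape("&`()=[]|;\"'<>")

# The asker's version: the letter lookahead has no leading .*, so it only
# succeeds when two letters sit at the very start of the string.
ORIGINAL = re.compile(
    r"(?=^.{8,60}$)(?=.*\d)(?=[a-zA-Z]{2,})(?!.*[" + FORBIDDEN + r"]).*"
)

# The corrected version: two letters anywhere, not necessarily adjacent.
FIXED = re.compile(
    r"(?=^.{8,60}$)(?=.*\d)(?=.*[a-zA-Z].*[a-zA-Z])(?!.*[" + FORBIDDEN + r"]).*"
)

for candidate in ["123456789AB!", "1234567B89A!", "12345678AB(", "12345678AB"]:
    print(candidate, bool(ORIGINAL.match(candidate)), bool(FIXED.match(candidate)))
```

The last candidate has no special character at all and is still accepted by FIXED, which illustrates the point about the missing positive assertion for special characters.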
Since the 2+ alphabetical characters can appear anywhere in the string, you need to prepend your check for them with .* (as you have with the other character classes you're checking for); otherwise the positive lookaheads will, in this scenario, try to assert their appearance at the beginning of the string (position 0):
(?=^.{8,60}$)(?=.*\d)(?=.*[a-zA-Z]{2,})(?!.*[&`()=[|;"''\]'<>]).*

.net Regex to look ahead and eliminate strings in advance that dont contain certain characters

I am using the .NET flavor of regex.
Suppose I have a string 123456789AB
and I want to match AB (could be any two capital letters) only if the part of the string containing numbers (123456789) has 5 and 8 in it.
So what I came up with was
(?=5)(?=8)([A-Z]{2})
But this is not working.
After some trial and error on RegexStorm
I got to
(?=(.*5))(?=(.*8))[A-Z]{2}
What I am expecting is that it will start matching from the start of the string, as a lookahead does not consume any characters.
But the part "[A-Z]{2}" does not move ahead to match AB in the input string.
My question is: why is that so?
I know replacing it with .*[A-Z]{2} will make it move ahead, but then the match contains the entire string.
What is the solution in this case, other than putting the word part ([A-Z]{2}) in a separate group and then catching only that group?
Lookaheads check for the pattern match immediately to the right of the current position in the string. (?=(.*5))(?=(.*8)) matches a location that is immediately followed by any 0 or more chars other than line break chars, as many as possible, and then 5; then, at the same position, another similar check is performed, but requiring 8 after any zero or more chars, as many as possible. Since [A-Z]{2} must then match at that very same location, the pattern can only succeed where two letters are themselves followed by both 5 and 8, which never happens in 123456789AB.
You may use as many lookbehinds as there are required substrings before the two letters:
(?s)(?<=5.*?)(?<=8.*?)[A-Z]{2}
See the regex demo
Details
(?s) - makes the . match newline characters, too
(?<=5.*?) - a location that is immediately preceded with 5 and then 0 or more chars as few as possible
(?<=8.*?) - a location that is immediately preceded with 8 and then 0 or more chars as few as possible
[A-Z]{2} - two ASCII uppercase letters.
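The lookbehind answer above relies on .NET's variable-width lookbehind. Python's re module only accepts fixed-width lookbehinds, so the sketch below swaps in a different technique: lookaheads anchored at the start of the string plus a capturing group. It assumes the input is a run of digits followed by the two letters, as in the question's example; the name pattern is illustrative.

```python
import re

text = "123456789AB"

# Both lookaheads are tested at the same position where [A-Z]{2} must match,
# so no position in this string satisfies all three parts at once.
print(re.search(r"(?=.*5)(?=.*8)[A-Z]{2}", text))

# Python workaround: check the digit run once from the start of the string,
# then consume the digits and capture only the two letters.
pattern = re.compile(r"^(?=\d*5)(?=\d*8)\d+([A-Z]{2})")

for candidate in ["123456789AB", "12346789AB", "1234567AB"]:
    match = pattern.search(candidate)
    print(candidate, match.group(1) if match else None)
```

This does use the separate group the asker hoped to avoid; in .NET the lookbehind version keeps the overall match down to just the two letters.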
An alternative would be to "unfold" what you expect to match using exclusionary character classes and alternation of match order. Not pretty, but pretty fast:
(?<=\b[^58]*?(?:5[^8]*8|8[^5]*5)[^A-Z]*?)[A-Z]{2}

AUTOHOTKEY: RegExMatch() a series of numbers and letters

I've tested my regular expression in http://www.regextester.com/
([0-9]{4,4})([A-Z]{2})([0-9]{1,3})
It matches the following strings perfectly, just as I want.
1234AB123
2000AZ20
1000XY753
But when I try it in Autohotkey I get 0 result
test := RegExMatch("2000SY155","([0-9]{4,4})([A-Z]{2})([0-9]{1,3})")
MsgBox %test%
testing for:
first 4 characters must be a number
next 2 characters must be caps letters
next 1 to 3 characters must be numbers
You had too many ( )
This is the correct implementation:
test := RegExMatch("1234AB123","[0-9]{4,4}([A-Z]{2})[0-9]{1,3}")
Edit:
So what I noticed is that you want this pattern to match, but you aren't really telling it much.
Here's what I was able to come up with that matches what you asked for. It's probably not the best solution, but it works:
test := RegExMatch("1234AB567","^[0-9]{4,4}[A-Z]{2}(?![0-9]{4,})[0-9]{1,3}$")
Breaking it down:
RegExMatch(Haystack, NeedleRegEx [, UnquotedOutputVar = "", StartingPosition = 1])
Circumflex (^) and dollar sign ($) are called anchors because
they don't consume any characters; instead, they tie the pattern to
the beginning or end of the string being searched.
^ may appear at the beginning of a pattern to require the match to occur at
the very beginning of a line. For example, ^abc matches abc123 but not 123abc.
$ may appear at the end of a pattern to require the match to occur at the very end of a line. For example, abc$ matches 123abc but not abc123.
So by adding the circumflex we are requiring that our pattern [0-9]{4,4} be at the beginning of our Haystack.
Look-ahead and look-behind assertions: The groups (?=...), (?!...) are
called assertions because they demand a condition to be met but don't
consume any characters.
(?!...) is a negative look-ahead because it requires that the specified pattern not exist.
Our next pattern is looking for two uppercase alpha characters, [A-Z]{2}(?![0-9]{4,}), that do not have four or more numeric characters after them.
And finally, our last pattern needs to match one to three numeric characters as the last characters in our Haystack: [0-9]{1,3}$. The $ has to sit outside the brackets, because inside a character class it is a literal dollar sign rather than an anchor.
test := RegExMatch("2000SY155","([0-9]{4,4})([A-Z]{2})([0-9]{1,3})")
MsgBox %test%
But when I try it in Autohotkey I get 0 result
The message box correctly returns 1 for me, meaning your initial script works fine with my version. Usually, parentheses are no problem in regexes; you can use as many as you like. Maybe your AutoHotkey version is outdated?
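As a cross-check outside AutoHotkey, the same pattern can be run through Python's re module. This is only a sketch in a different engine and says nothing about any particular AutoHotkey version. Because RegExMatch returns a 1-based position (0 when nothing is found), the code adds 1 to Python's 0-based start index; pattern and position are illustrative names.

```python
import re

# Same pattern as in the question, capturing groups included.
pattern = re.compile(r"([0-9]{4,4})([A-Z]{2})([0-9]{1,3})")

for haystack in ["2000SY155", "1234AB123", "2000AZ20", "1000XY753"]:
    m = pattern.search(haystack)
    # Mimic RegExMatch's convention: 1-based position, 0 for "no match".
    position = m.start() + 1 if m else 0
    print(haystack, position, m.groups() if m else None)
```

The extra parentheses only add capture groups; they do not change whether the pattern matches.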

Regex to match string not ending with pattern

I am trying to find a regex that matches a string only if the string does not end with three or more '0' characters. Intuitively, I tried:
.*[^0]{3,}$
But this does not match when there are one or two zeroes at the end of the string.
If you have to do it without lookbehind assertions (i. e. in JavaScript):
^(?:.{0,2}|.*(?!000).{3})$
Otherwise, use hsz's answer.
Explanation:
^          # Start of string
(?:        # Either match...
 .{0,2}    # a string of up to two characters
|          # or
 .*        # any string
 (?!000)   # (unless followed by three zeroes)
 .{3}      # followed by three characters
)          # End of alternation
$          # End of string
You can try using a negative look-behind, i.e.:
(?<!000)$
Tests:
Test  Target String  Matches
1     654153640      Yes
2     5646549800     Yes
3     848461158000   No
4     84681840000    No
5     35450008748    Yes
Please keep in mind that negative look-behinds aren't supported in every language, however.
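Python's re module supports fixed-width lookbehinds like this one, so the test table can be reproduced with a short sketch. NOT_ENDING_000, targets and results are illustrative names.

```python
import re

# Succeeds at the end of the string only if the three characters before it
# are not all zeroes.
NOT_ENDING_000 = re.compile(r"(?<!000)$")

targets = ["654153640", "5646549800", "848461158000", "84681840000", "35450008748"]

results = {t: bool(NOT_ENDING_000.search(t)) for t in targets}
for target, ok in results.items():
    print(f"{target:<14}{'Yes' if ok else 'No'}")
```

re.search is used rather than re.match because the pattern consumes no characters and has to be tried at the end of the string.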
What's wrong with the no-look-behind, more general-purpose ^(.(?!.*0{3,}$))*$?
The general pattern is ^(.(?!.* + not-ending-with-pattern + $))*$. You don't have to reverse engineer the state machine like Tim's answer does; you just insert the pattern you don't want to match at the end.
This is one of those things that RegExes aren't that great at, because the string isn't very regular (whatever that means). The only way I could come up with was to give it every possibility.
.*[^0]..$|.*.[^0].$|.*..[^0]$
which simplifies to
.*([^0]|[^0].|[^0]..)$
That's fine if you only want strings not ending in three 0s, but strings not ending in ten 0s would be long. But thankfully, this string is a bit more regular than some of these sorts of combinations, and you can simplify it further.
.*[^0].{0,2}$
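The lookbehind-free patterns from this thread can be compared side by side with a small Python sketch. The dictionary keys and the helper verdicts are hypothetical names. The candidates agree on the table's sample strings but, as far as I can tell, differ on very short inputs and on the exact string 000.

```python
import re

CANDIDATES = {
    "alternation": re.compile(r"^(?:.{0,2}|.*(?!000).{3})$"),
    "generic": re.compile(r"^(.(?!.*0{3,}$))*$"),
    "char class": re.compile(r".*[^0].{0,2}$"),
}


def verdicts(text: str) -> dict:
    """Report, per pattern, whether text counts as 'not ending in 000'."""
    return {name: bool(rx.search(text)) for name, rx in CANDIDATES.items()}


for sample in ["5646549800", "848461158000", "00", "000"]:
    print(repr(sample), verdicts(sample))
```

The character-class version needs at least one non-zero character, so it rejects 00; the generic version tests its lookahead only after consuming a character, so a string that is exactly 000 gets through. Whether either edge case matters depends on the input you expect.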

Regular Expression to match set of arbitrary codes

I am looking for some help on creating a regular expression that would work with a unique input in our system. We already have some logic in our keypress event that will only allow digits, and will allow the letter A and the letter M. Now I need to come up with a RegEx that can match the input during the onblur event to ensure the format is correct.
I have some examples below of what would be valid. The letter A represents an age, so it is always followed by up to 3 digits. The letter M can only occur at the end of the string.
Valid Input
1-M
10-M
100-M
5-7
5-20
5-100
10-20
10-100
A5-7
A10-7
A100-7
A10-20
A5-A7
A10-A20
A10-A100
A100-A102
Invalid Input
a-a
a45
4
This matches all of the samples.
/A?\d{1,3}-A?\d{0,3}M?/
Not sure if 10-A10M should or shouldn't be legal, or even if M can appear with numbers. If M is only there without numbers:
/A?\d{1,3}-(A?\d{1,3}|M)/
Use the brute force method if you have a small amount of well defined patterns so you don't get bad corner-case matches:
^(\d+-M|\d+-\d+|A\d+-\d+|A\d+-A\d+)$
Here are the individual regexes broken out:
\d+-M <- matches anything like '1-M'
\d+-\d+ <- 5-7
A\d+-\d+ <- A5-7
A\d+-A\d+ <- A10-A20
/^[A]?[0-9]{1,3}-[A]?[0-9]{1,3}[M]?$/
Matches anything of the form:
A(optional)[1-3 numbers]-A(optional)[1-3 numbers]M(optional)
^A?\d+-(?:A?\d+|M)$
An optional A followed by one or more digits, a dash, and either another optional A and some digits or an M. The '(?: ... )' notation is a Perl 'non-capturing' set of parentheses around the alternatives; it means there will be no '$1' after the regex matches. Clearly, if you wanted to capture the various bits and pieces, you could - and would - do so, and the non-capturing clause might not be relevant any more.
(You could replace the '+' with '{1,3}' as JasonV did to limit the numbers to 3 digits.)
^A?\d{1,3}-(M|A?\d{1,3})$
^ -- the match must be done from the beginning
A? -- "A" is optional
\d{1,3} -- between one and 3 digits; [0-9]{1,3} also works
- -- A "-" character
(...|...) -- Either one of the two expressions
(M|...) -- Either "M" or...
(...|A?\d{1,3}) -- an optional "A" followed by at least one and at most three digits
$ -- the match should be done to the end
Some consequences of changing the format. If you do not put "^" at the beginning, the match may ignore an invalid beginning. For example, "MAAMA0-M" would be matched at "A0-M".
If, likewise, you leave $ out, the match may ignore an invalid trail. For example, "A0-MMMMAAMAM" would match "A0-M".
Using \d is usually preferred, as is \w for alphanumerics, \s for spaces, \D for non-digit, \W for non-alphanumeric or \S for non-space. But you must be careful that \d is not being treated as an escape sequence. You might need to write it \\d instead.
{x,y} means the last match must occur between x and y times.
? means the last match must occur once or not at all.
When using (), it is treated as one match. (ABC)? will match ABC or nothing at all.
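A short Python sketch that runs the anchored pattern over the question's valid and invalid samples, then repeats the two anchor examples described above. The thread does not say which language the onblur handler uses, so Python here is only for illustration; CODE, accepted, no_start and no_end are illustrative names.

```python
import re

CODE = re.compile(r"^A?\d{1,3}-(M|A?\d{1,3})$")

valid = ["1-M", "10-M", "100-M", "5-7", "5-20", "5-100", "10-20", "10-100",
         "A5-7", "A10-7", "A100-7", "A10-20", "A5-A7", "A10-A20", "A10-A100",
         "A100-A102"]
invalid = ["a-a", "a45", "4"]

# Only the valid samples should survive the filter.
accepted = [s for s in valid + invalid if CODE.match(s)]
print(accepted == valid)

# Dropping an anchor lets a valid-looking fragment match inside junk.
no_start = re.search(r"A?\d{1,3}-(M|A?\d{1,3})$", "MAAMA0-M")
no_end = re.search(r"^A?\d{1,3}-(M|A?\d{1,3})", "A0-MMMMAAMAM")
print(no_start.group(), no_end.group())
```

In a raw string (r"...") the \d reaches the regex engine unchanged, which sidesteps the \\d escaping concern mentioned above.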
I’d use this regular expression:
^(?:[1-9]\d{0,2}-(?:M|[1-9]\d{0,2})|A[1-9]\d{0,2}-A?[1-9]\d{0,2})$
This matches either:
<number>-M or <number>-<number>
A<number>-<number> or A<number>-A<number>
Additionally <number> must not begin with a 0.
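For comparison, a minimal Python sketch of this stricter pattern. It assumes the four listed shapes are the only legal ones, so inputs such as A5-M or 5-A7 are rejected along with numbers that start with 0. STRICT and is_code are hypothetical names.

```python
import re

STRICT = re.compile(
    r"^(?:[1-9]\d{0,2}-(?:M|[1-9]\d{0,2})|A[1-9]\d{0,2}-A?[1-9]\d{0,2})$"
)


def is_code(text: str) -> bool:
    """Return True when text has one of the four accepted shapes."""
    return STRICT.match(text) is not None


for sample in ["10-M", "A10-A20", "A100-7", "05-7", "A5-M", "5-A7"]:
    print(sample, is_code(sample))
```

Spelling out each alternative keeps combinations the question never listed from slipping through, at the cost of a longer expression.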