Regex expression not following the given length - regex

I have the following regex:
^[a-z|A-Z]((?!.*--).*[[:alnum:]]|[-]){1,22}[a-z|A-Z|0-9]$
For some reason, strings longer than 24 characters are still accepted. The requirement is: a string of 3-24 alphanumeric characters (hyphens allowed) that must begin with a letter, end with a letter or digit, and cannot contain consecutive hyphens.
Why is the {1,22} quantifier on the middle part not limiting the length?

The main pattern should be just ^[a-z].{1,22}[a-z\d]$ to specify that the whole match must be 3-24 characters and have the required beginning and ending characters. You can use the case-insensitive modifier to make a-z match A-Z as well.
Then add a negative lookahead, (?!.*--), to prohibit two consecutive hyphens anywhere in the string. The final result is:
^(?!.*--)[a-z].{1,22}[a-z\d]$
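The original pattern most likely lets long strings through because the quantified group itself contains .*, so each of the 1 to 22 repetitions can consume any number of characters. Here is a minimal Python sketch of the pattern above, assuming the case-insensitive flag is on. The names NAME, STRICT_NAME and is_valid are made up for illustration, and STRICT_NAME is an optional variation (not part of the answer) that narrows the middle part to letters, digits and hyphens, since . accepts any character.

```python
import re

# Pattern from the answer; re.IGNORECASE makes a-z match A-Z as well.
NAME = re.compile(r"^(?!.*--)[a-z].{1,22}[a-z\d]$", re.IGNORECASE)

# Hypothetical stricter variant: the middle allows only letters, digits and hyphens.
STRICT_NAME = re.compile(r"^(?!.*--)[a-z][a-z\d-]{1,22}[a-z\d]$", re.IGNORECASE)


def is_valid(name, pattern=NAME):
    """Return True if name satisfies the given pattern."""
    return pattern.match(name) is not None


print(is_valid("a" * 24))   # 24 characters: accepted
print(is_valid("a" * 25))   # 25 characters: rejected
print(is_valid("ab--cd"))   # consecutive hyphens: rejected
```

Passing pattern=STRICT_NAME additionally rejects inputs such as "a b c", which the dot-based pattern accepts.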

Related

How can I use a negative lookahead in an anchored regular-expression pattern?

My web-application allows users to specify custom URI path components which comply with the following restrictions:
All characters must be lowercase.
Be at least 2 characters long.
First character must match [a-z].
The last character must match [0-9a-z].
All other characters must match [0-9a-z_\-].
The - and _ characters must not exist as a consecutive run of 2 or more.
i.e. The string must not contain --, __, _-, or -_.
I've implemented the first 5 rules in a regular-expression easily enough:
^[a-z][0-9_a-z\-]*[0-9a-z]$
...however I don't know how to implement the last rule in a single regex.
I thought I'd start by just trying to change the regex so it won't match -- (as in a--b) - and I was thinking it could be a negative-lookahead, as it's asserting that that regex does not contain -- (right?):
Lookahead and lookbehind, collectively called “lookaround”, are zero-length assertions just like the start and end of line, and start and end of word anchors. [...] The difference is that lookaround actually matches characters, but then gives up the match, returning only the result: match or no match. That is why they are called “assertions”. They do not consume characters in the string, but only assert whether a match is possible or not
But adding (?!\-\-) to the regular expression (on regex101.com) in various spots, or as a lookbehind (?<!\-\-), does not stop strings like a--b from matching.
i.e. all of these patterns match foo--bar when they shouldn't.
(?!\-\-)^[a-z][0-9_a-z\-]*[0-9a-z]$
^(?!\-\-)[a-z][0-9_a-z\-]*[0-9a-z]$
^[a-z](?!\-\-)[0-9_a-z\-]*[0-9a-z]$
^[a-z](?!\-\-)(?:[0-9_a-z\-]*)[0-9a-z]$
^[a-z][0-9_a-z\-]*(?!\-\-)[0-9a-z]$
^[a-z][0-9_a-z\-]*(?<!\-\-)[0-9a-z]$
You can place the negative lookahead right after matching a-z at the start of the string.
As you don't want to match any combination of - and _ you can use 2 character classes (?!.*[_-][_-])
As the [_-][_-] part can occur anywhere in the string, you can precede it with .* optionally matching any character.
If you omit .* the assertion only runs on the current position, which in this case would be after matching the a-z at the start of the string.
^[a-z](?!.*[_-][_-])[0-9_a-z-]*[0-9a-z]$
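A quick way to see why the .* inside the lookahead matters is to run both versions side by side. This is only a sketch in Python; the names SLUG and WITHOUT_DOT_STAR are invented here, and the second pattern is the variant without .* described above.

```python
import re

# Pattern from the answer: the lookahead scans the rest of the string.
SLUG = re.compile(r"^[a-z](?!.*[_-][_-])[0-9_a-z-]*[0-9a-z]$")

# Same pattern without .*: the assertion only inspects the two
# characters right after the first letter.
WITHOUT_DOT_STAR = re.compile(r"^[a-z](?![_-][_-])[0-9_a-z-]*[0-9a-z]$")

for candidate in ["foo-bar_baz", "foo--bar", "a--b", "a_-b"]:
    print(candidate,
          bool(SLUG.match(candidate)),
          bool(WITHOUT_DOT_STAR.match(candidate)))
```

With .* omitted, foo--bar still matches because the run of hyphens is not directly after the first character, while a--b is rejected by both.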

Regex pattern to match string that's not followed by a colon

Using regex, I'm trying to match any string of characters that meets the following conditions (in the order displayed):
Contains a dollar sign $; then
at least one letter [a-zA-Z]; then
zero or more letters, numbers, underscores, periods (dots), opening brackets, and/or closing brackets [a-zA-Z0-9_.\[\]]*; then
one pipe character |; then
one hash sign #; then
at least one letter [a-zA-Z]; then
zero or more letters, numbers, and/or underscores [a-zA-Z0-9_]*; then
zero colons :
In other words, if a colon is found at the end of the string, then it should not count as a match.
Here are some examples of valid matches:
$tmp1|#hello
$x2.h|#hi_th3re
Valid match$here|#in_the middle of other characters
And here are some examples of invalid matches:
$tmp2|#not_a_match:"because there is a colon"
$c.4a|#also_no_match:
Here are some of the patterns I've tried:
(\$[a-zA-Z])([a-zA-Z0-9_.\[\]]*)(\|#)([a-zA-Z][a-zA-Z0-9_]*(?!.[:]))
(\$[a-zA-Z])([a-zA-Z0-9_.\[\]]+)?(\|#)([a-zA-Z][a-zA-Z0-9_]*(?![:]))
(\$[a-zA-Z])([a-zA-Z0-9_.\[\]]+)?(\|#)([a-zA-Z][a-zA-Z0-9_]*)([^:])
This pattern will do what you need
\$[A-Za-z]+[\w.\[\]]*[|]#[A-Za-z]+[\w]*+(?!:)
I am using the possessive quantifier [\w]*+ to prevent backtracking; otherwise the engine could give back the last word character so that (?!:) succeeds on a shorter match. You can also use atomic groups instead of possessive quantifiers, like
\$[A-Za-z]+[\w.\[\]]*[|]#[A-Za-z]+(?>[\w]*)(?!:)
NOTE
\w => [A-Za-z0-9_]
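Not every engine has possessive quantifiers or atomic groups (as far as I know, Python's re module only gained them in 3.11). A different technique that gives the same effect is to make the lookahead reject a following word character as well as a colon, so that a shortened match is never accepted. The sketch below uses that substitute, not the possessive pattern itself; TOKEN and find_token are hypothetical names.

```python
import re

# (?![\w:]) replaces the possessive quantifier: after backtracking, the next
# character would be a word character, so the shorter match is rejected too.
# re.ASCII keeps \w equal to [A-Za-z0-9_], as in the note above.
TOKEN = re.compile(r"\$[A-Za-z][\w.\[\]]*\|#[A-Za-z]\w*(?![\w:])", re.ASCII)


def find_token(text):
    """Return the first matching token in text, or None."""
    m = TOKEN.search(text)
    return m.group(0) if m else None


print(find_token("Valid match$here|#in_the middle of other characters"))
print(find_token("$c.4a|#also_no_match:"))
```

The first call prints $here|#in_the and the second prints None.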
I tested your third pattern in Regex 101 and it appears to be working correctly:
^.*(\$[a-zA-Z])([a-zA-Z0-9_.\[\]]+)?(\|#)([a-zA-Z][a-zA-Z0-9_]*)([^:]).*$
The only change I needed to make to the regex to make it work was to add anchors ^ and $ to the start and end of the regex. I also allowed for your pattern to occur as a substring in the middle of a larger string.
By the way, you had the following example as a string which should not match:
$tmp2|#not_a_match:"because there is a colon"
However, even if we remove the colon from this string it will still not match because it contains quotes which are not allowed.

Match a String with optional number of hyphens - Java Regex

I am trying to match Strings with an optional number of hyphens.
For example,
string1-string2,
string1-string2-string3,
string1-string2-string3 and so on.
Right now, I have something which matches one hyphen. How can I make the regex match any number of hyphens?
My current regex is: arn:aws:iam::\d{12}:[a-zA-Z]/?[a-zA-Z]-?[a-zA-Z]*
What do I need to add?
Use this regex:
^\\w+(-\\w+)*$
Explanation:
\\w+ - match one or more word characters [a-zA-Z_0-9]
(-\\w+)* - match a hyphen followed by a string zero or more times
Note that this won't match an empty string, or a string containing weird characters. You could handle these cases manually or you could update the regex.
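For anyone who wants to try the pattern outside Java, here is a small Python sketch. The doubled backslashes above are Java string escaping; a Python raw string needs only one. HYPHENATED and is_hyphenated are names made up for this example.

```python
import re

# One or more word characters, then zero or more "-word" groups.
HYPHENATED = re.compile(r"^\w+(-\w+)*$")


def is_hyphenated(text):
    """Return True for word characters separated by single hyphens."""
    return HYPHENATED.match(text) is not None


for sample in ["string1", "string1-string2-string3", "string1-", "a--b", ""]:
    print(repr(sample), is_hyphenated(sample))
```

The first two samples are accepted; a trailing hyphen, a doubled hyphen and the empty string are rejected.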

Regex - alphabetical with hyphen

I would like to have a regular expression that checks whether a string has up to 14 alphanumeric characters. It can include hyphens, but not at the beginning or end.
This is what I have so far:
var patt = new RegExp("^([a-zA-Z0-9]+(-[a-zA-Z0-9])*){1,14}$");
But it's not working - http://jsfiddle.net/u6cWs/1/
Any idea?
You need a positive lookahead that counts the alphanumeric characters, each optionally followed by a hyphen.
If only single hyphen is allowed:
^(?=([a-zA-Z0-9]-?){1,14}$)[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)?$
If multiple hyphens are allowed:
^(?=([a-zA-Z0-9]-?){1,14}$)[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*$
Additional option:
^[a-zA-Z0-9](?:-?[a-zA-Z0-9]){0,13}$
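One thing worth checking with the lookahead versions is what exactly gets counted. In this Python sketch of the multiple-hyphen pattern (MULTI_HYPHEN and matches are illustrative names only), the limit of 14 applies to the alphanumeric characters, and hyphens between them do not count toward it.

```python
import re

# The lookahead allows at most 14 "alphanumeric + optional hyphen" units;
# the main pattern then forbids leading, trailing and doubled hyphens.
MULTI_HYPHEN = re.compile(
    r"^(?=([a-zA-Z0-9]-?){1,14}$)[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*$"
)


def matches(text):
    return MULTI_HYPHEN.match(text) is not None


spaced = "-".join("abcdefghijklmn")  # 14 letters plus 13 hyphens
print(len(spaced), matches(spaced))  # accepted: only the letters are counted
print(matches("a" * 15))             # rejected: 15 alphanumeric characters
print(matches("abc-"))               # rejected: trailing hyphen
```

If the limit is meant to cover the whole string including hyphens, a pattern with a plain length quantifier is the better fit.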
Here is a simple solution that is faster because it does not use lookaheads:
^[A-Za-z0-9](?:[-A-Za-z0-9]{0,12}[A-Za-z0-9])?$
How does it work?
Like your original pattern, this regex is anchored between ^ and $, enforcing our limit on the number of characters.
The first character has to be a letter or digit.
The rest of the string, included in a (?: non-capturing group, is made optional by the ? at the end. This rest of the string, if it is there (more than one character), must end with a letter or digit. In the middle, you can have between 0 and 12 letters, digits or hyphens.
Optionally
If you want your regex to be a little shorter, turn on the case-insensitive option, and remove either the lower-case chars or the upper-case ones, for instance:
^[a-z0-9](?:[-a-z0-9]{0,12}[a-z0-9])?$
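As a rough check of the case-insensitive version in Python (COMPACT and matches are names chosen for this sketch), note that here the 14-character limit covers the whole string, hyphens included.

```python
import re

# First character: letter or digit. Optional rest: up to 12 letters, digits
# or hyphens, followed by a final letter or digit (14 characters in total).
COMPACT = re.compile(r"^[a-z0-9](?:[-a-z0-9]{0,12}[a-z0-9])?$", re.IGNORECASE)


def matches(text):
    return COMPACT.match(text) is not None


print(matches("A1-b2"))                   # accepted
print(matches("a-"))                      # rejected: ends with a hyphen
print(matches("-".join("abcdefg")))       # 13 characters: accepted
print(matches("-".join("abcdefgh")))      # 15 characters: rejected
```

This pattern does not forbid consecutive hyphens, so a--b is accepted.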
Use two regexes for simplicity and readability.
First check that it matches this:
/^[A-Za-z0-9-]{1,14}$/
then check that it does NOT match this:
/^-|-$/
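The two checks could be combined in a small helper like the following Python sketch. The function name is_valid and the constant names are assumptions for illustration; the two patterns are the ones given above without the slash delimiters.

```python
import re

ALLOWED = re.compile(r"^[A-Za-z0-9-]{1,14}$")  # only allowed characters, length 1-14
EDGE_HYPHEN = re.compile(r"^-|-$")             # hyphen at the start or at the end


def is_valid(text):
    """Pass the first check and fail the second."""
    return ALLOWED.search(text) is not None and EDGE_HYPHEN.search(text) is None


for sample in ["abc-def", "-abc", "abc-", "a" * 15]:
    print(sample, is_valid(sample))
```

Only abc-def passes both checks.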

Regular expression doesn't match if a character participated in a previous match

I have this regex:
(?:\S)\++(?:\S)
Which is supposed to catch all the pluses in a query string like this:
?busca=tenis+nike+categoria:"Tenis+e+Squash"&pagina=4&operador=or
It should have been 4 matches, but there are only 3:
s+n
e+c
s+e
It is missing the last one:
e+S
And it seems to happen because the "e" character has participated in a previous match (s+e), because the "e" character is right in the middle of two pluses (Teni s+e+S quash).
If you test the regex with the following input, it matches the last "+":
?busca=tenis+nike+categoria:"Tenis_e+Squash"&pagina=4&operador=or
(changed "s+e" for "s_e" in order not to cause the "e" character to participate in the match).
Would someone please shed a light on that?
Thanks in advance!
In a consecutive match, the search for the next match starts at the position where the previous match ended. And since the non-whitespace character after the + is matched too, the search for the next match will start after that non-whitespace character. So in a sequence like s+e+S you will only find one match:
s+e+S
\_/
You can fix that by using look-around assertions, which don't consume the characters they check, like:
\S\++(?=\S)
This will match any non-whitespace character followed by one or more + only if it is followed by another non-whitespace character.
But since whitespace is not allowed in a URI query, you don't need the surrounding \S at all as every character is non-whitespace. So the following will already match every sequence of one or more + characters:
\++
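The difference is easy to reproduce. This Python sketch runs the original pattern, the lookahead version and the bare \++ against the query string from the question; the variable names are made up for the example.

```python
import re

QUERY = '?busca=tenis+nike+categoria:"Tenis+e+Squash"&pagina=4&operador=or'

# Original pattern: the character after the plus is consumed,
# so it cannot start the next match.
consuming = re.findall(r"(?:\S)\++(?:\S)", QUERY)

# Lookahead version: the following character is checked but not consumed.
lookahead = re.findall(r"\S\++(?=\S)", QUERY)

# Bare version: just the runs of plus signs.
plus_only = re.findall(r"\++", QUERY)

print(consuming)   # ['s+n', 'e+c', 's+e']
print(lookahead)   # ['s+', 'e+', 's+', 'e+']
print(plus_only)   # ['+', '+', '+', '+']
```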
You are correct: The fourth match doesn't happen because the surrounding character has already participated in the previous match. The solution is to use lookaround (if your regex implementation supports it; older JavaScript engines don't support lookbehind, for example).
Try
(?<!\s)\++(?!\s)
This matches one or more + unless they are surrounded by whitespace. This also works if the plus is at the start or the end of the string.
Explanation:
(?<!\s) # assert that there is no space before the current position
# (but don't make that character a part of the match itself)
\++ # match one or more pluses
(?!\s) # assert that there is no space after the current position
If your regex implementation doesn't support lookbehind, you could also use
\S\++(?!\s)
That way, your match would contain the character before the plus, but not after it, and therefore there will be no overlapping matches (Thanks Gumbo!). This will fail to match a plus at the start of the string, though (because the \S does need to match a character). But this is probably not a problem.
You can use the regex:
(?<=\S)\++(?=\S)
To match only the +'s that are surrounded by non-whitespace.
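The two lookbehind variants in these answers behave the same in the middle of a string but differ at its edges. A short Python sketch (the names NOT_NEAR_SPACE, BETWEEN_NON_SPACE and spans are invented here) makes that visible by listing match positions.

```python
import re

# Negative lookarounds: succeed at the string edges, where there is no character.
NOT_NEAR_SPACE = re.compile(r"(?<!\s)\++(?!\s)")

# Positive lookarounds: require an actual non-whitespace character on both sides.
BETWEEN_NON_SPACE = re.compile(r"(?<=\S)\++(?=\S)")


def spans(pattern, text):
    """Return the (start, end) position of every match."""
    return [m.span() for m in pattern.finditer(text)]


print(spans(NOT_NEAR_SPACE, "s+e+S"))         # [(1, 2), (3, 4)]
print(spans(BETWEEN_NON_SPACE, "s+e+S"))      # [(1, 2), (3, 4)]
print(spans(NOT_NEAR_SPACE, "+a b + c"))      # [(0, 1)]
print(spans(BETWEEN_NON_SPACE, "+a b + c"))   # []
```

Which one is right depends on whether a plus at the very start or end of the input should count.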