RegExr V2.1 character set mismatch

I was working on regex and studying the applications of character sets.
I tried the regex /[64-bit]/g, but the highlighted matches were not what I expected: it highlighted uppercase letters, digits and certain operators.
Why is that?

A character class is the wrong construct if you want to match 64-bit literally. Just use /64-bit/g as your regex in this case.
Character classes (specified by []) have different rules than the rest of the regex. They match a single character listed within (or not listed, if it's a negated char class).
A range of characters can also be specified to match, and that is where you have your problem. According to any online ASCII chart, 4 is #52 in the table, and b is #98. (Note that [4-bit] is actually an equivalent regex.) Between those two points, there are many characters, including the uppercase letters. That is why you are getting unexpected matches.
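To see exactly which characters the range pulls in, one can test every printable ASCII character against the class. This is a minimal sketch in Python; the question does not name a language, so the use of Python's re module is an assumption, although the class is parsed the same way in most flavors.

```python
import re

# The class from the question: the literal 6, the range 4-b, and the literals i and t.
char_class = re.compile(r"[64-bit]")

# Collect every printable ASCII character that the class accepts.
matched = "".join(
    chr(code) for code in range(32, 127) if char_class.fullmatch(chr(code))
)

print(matched)
print(ord("4"), ord("b"))  # code points that bound the range
```

The printed string contains the digits 4 to 9, the punctuation between 9 and A, every uppercase letter, the punctuation between Z and a, then a, b, i and t.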

Related

Regex to have two out of three character types [duplicate]

My client has requested that passwords on their system must follow a specific set of validation rules, and I'm having great difficulty coming up with a "nice" regular expression.
The rules I have been given are...
Minimum of 8 characters
Allow any character
Must have at least one instance from three of the four following character types...
Upper case character
Lower case character
Numeric digit
"Special Character"
When I pressed more, "Special Characters" are literally everything else (including spaces).
I can easily check for at least one instance for all four, using the following...
^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?\d)(?=.*?[^a-zA-Z0-9]).{8,}$
The following works, but it's horrible and messy...
^((?=.*?[A-Z])(?=.*?[a-z])(?=.*?\d)|(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[^a-zA-Z0-9])|(?=.*?[A-Z])(?=.*?\d)(?=.*?[^a-zA-Z0-9])|(?=.*?[a-z])(?=.*?\d)(?=.*?[^a-zA-Z0-9])).{8,}$
So you don't have to work it out yourself, the above is checking for (1,2,3|1,2,4|1,3,4|2,3,4) which are the 4 possible combinations of the 4 groups (where the number relates to the "types" in the set of rules).
Is there a "nicer", cleaner or easier way of doing this?
(Please note, this is going to be used in an <asp:RegularExpressionValidator> control in an ASP.NET website, so therefore needs to be a valid regex for both .NET and javascript.)
It's not much of a better solution, but you can reduce [^a-zA-Z0-9] to [\W_], since a word character is any letter, digit or the underscore character. I don't think you can avoid the alternation when trying to do this in a single regex. I think you pretty much have the best solution.
One slight optimization is that \d*[a-z]\w_*|\d*[A-Z]\w_* ~> \d*[a-zA-Z]\w_* (shorthand, not literal regex: "digit, lowercase, special" or "digit, uppercase, special" collapses to "digit, any letter, special"), so I could remove one of the alternation sets. If exactly 3 out of 4 types were required this wouldn't work, but since \d*[A-Z][a-z]\w_* (all four types) is also allowed, it works.
(?=.{8,})((?=.*\d)(?=.*[a-z])(?=.*[A-Z])|(?=.*\d)(?=.*[a-zA-Z])(?=.*[\W_])|(?=.*[a-z])(?=.*[A-Z])(?=.*[\W_])).*
Extended version:
(?=.{8,})(
(?=.*\d)(?=.*[a-z])(?=.*[A-Z])|
(?=.*\d)(?=.*[a-zA-Z])(?=.*[\W_])|
(?=.*[a-z])(?=.*[A-Z])(?=.*[\W_])
).*
Because of the fourth condition specified by the OP, this regular expression will match even unprintable characters such as new lines. If this is unacceptable, then modify the set that contains \W to allow for a more specific set of special characters.
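One way to gain confidence that the three-branch pattern still covers every "3 of 4" combination is to compare it against a plain count of character types. The sketch below does that in Python, which is an assumption (the question targets .NET and JavaScript). The helper names count_types and is_valid are hypothetical, and re.ASCII is used so that \d and \W stay limited to ASCII.

```python
import re

# Three-branch pattern from the answer above.
THREE_OF_FOUR = re.compile(
    r"(?=.{8,})("
    r"(?=.*\d)(?=.*[a-z])(?=.*[A-Z])|"
    r"(?=.*\d)(?=.*[a-zA-Z])(?=.*[\W_])|"
    r"(?=.*[a-z])(?=.*[A-Z])(?=.*[\W_])"
    r").*",
    re.ASCII,
)


def count_types(password: str) -> int:
    """Hypothetical helper: count how many of the four character types occur."""
    checks = (r"[A-Z]", r"[a-z]", r"[0-9]", r"[^a-zA-Z0-9]")
    return sum(1 for check in checks if re.search(check, password))


def is_valid(password: str) -> bool:
    """Reference rule: at least 8 characters and at least 3 of the 4 types."""
    return len(password) >= 8 and count_types(password) >= 3


samples = ["Password1", "password1", "password1!", "PASSWORD!!",
           "Pass word", "Pa1!", "PASSWORD1!"]
for sample in samples:
    regex_ok = THREE_OF_FOUR.fullmatch(sample) is not None
    assert regex_ok == is_valid(sample)  # both approaches agree
    print(f"{sample!r}: {regex_ok}")
```

The samples are only a spot check, not a proof of equivalence, and none of them contains a newline.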
I'd like to improve the accepted solution with this one
^(?=.{8,})(
(?=.*[^a-zA-Z\s])(?=.*[a-z])(?=.*[A-Z])|
(?=.*[^a-zA-Z0-9\s])(?=.*\d)(?=.*[a-zA-Z])
).*$
The above regex worked well for most scenarios, except for strings such as "AAAAAA1$" and "$$$$$$1a".
This may be an issue only on iOS (Objective-C and Swift), where \d appears to cause problems.
The following fix worked on iOS, i.e. changing \d to [0-9] for digits:
^((?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])|(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[^a-zA-Z0-9])|(?=.*?[A-Z])(?=.*?[0-9])(?=.*?[^a-zA-Z0-9])|(?=.*?[a-z])(?=.*?[0-9])(?=.*?[^a-zA-Z0-9])).{8,}$
Password must meet at least 3 out of the following 4 complexity rules:
at least 1 uppercase character (A-Z)
at least 1 lowercase character (a-z)
at least 1 digit (0-9)
at least 1 special character (do not forget to treat space as a special character too)
In addition, the password must have:
at least 10 characters
at most 128 characters
not more than 2 identical characters in a row (e.g., 111 not allowed)
^(?!.*(.)\1{2})((?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])|(?=.*[a-z])(?=.*[A-Z])(?=.*[^a-zA-Z0-9])|(?=.*[A-Z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])|(?=.*[a-z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])).{10,128}$
Broken down:
(?!.*(.)\1{2})
(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])
(?=.*[a-z])(?=.*[A-Z])(?=.*[^a-zA-Z0-9])
(?=.*[A-Z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])
(?=.*[a-z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])
.{10,128}
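The least familiar piece of this pattern is probably the negative lookahead that rejects three identical characters in a row. Here is a minimal sketch that isolates it, written in Python as an assumption (the answer does not name a language); the helper name has_no_triple is hypothetical.

```python
import re

# (.) captures any character and \1{2} demands two more copies right after it.
# The negative lookahead rejects the string if that happens anywhere.
NO_TRIPLE = re.compile(r"^(?!.*(.)\1{2}).*$")


def has_no_triple(text: str) -> bool:
    """Return True when no character appears three times in a row."""
    return NO_TRIPLE.match(text) is not None


for candidate in ("aabbcc", "a111b", "11a1"):
    print(candidate, has_no_triple(candidate))
```

Only the run "111" in the second sample triggers the rejection; the third sample has three 1s, but they are not consecutive.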

re compile error: sre_constants.error: bad character range [duplicate]

How can I rewrite the [a-zA-Z0-9!$* \t\r\n] pattern to match a hyphen along with the existing characters?
The hyphen is usually a normal character in regular expressions. Only if it’s in a character class and between two other characters does it take a special meaning.
Thus:
[-] matches a hyphen.
[abc-] matches a, b, c or a hyphen.
[-abc] matches a, b, c or a hyphen.
[ab-d] matches a, b, c or d (only here the hyphen denotes a character range).
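The four cases above can be checked mechanically. This is a small sketch using Python's re module, which is an assumption suggested by the sre_constants error in the question title. It probes each class with the characters a, b, c, d and the hyphen.

```python
import re

probe = "abcd-"

# Each class from the list above, with the probe characters it should accept.
expected = {
    r"[-]": "-",
    r"[abc-]": "abc-",
    r"[-abc]": "abc-",
    r"[ab-d]": "abcd",  # here the hyphen forms the range b-d
}

results = {}
for pattern, want in expected.items():
    got = "".join(ch for ch in probe if re.fullmatch(pattern, ch))
    results[pattern] = got
    print(pattern, "->", got)
```

Only the last class rejects the hyphen itself, because there it sits between two characters and is read as a range operator.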
Escape the hyphen.
[a-zA-Z0-9!$* \t\r\n\-]
UPDATE:
Never mind this answer - you can add the hyphen to the group but you don't have to escape it. See Konrad Rudolph's answer instead which does a much better job of answering and explains why.
It’s less confusing to always use an escaped hyphen, so that it doesn't have to be positionally dependent. That’s a \- inside the bracketed character class.
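The "bad character range" error from the question title appears when an unescaped hyphen lands between two characters whose code points are in descending order. The question does not show the pattern that failed, so the failing pattern below is a hypothetical reconstruction; the escaped version is the one suggested above.

```python
import re

# Escaped hyphen: always a literal, wherever it sits in the class.
safe = re.compile(r"[a-zA-Z0-9!$* \t\r\n\-]")
print(bool(safe.fullmatch("-")))

# Hypothetical mistake: an unescaped hyphen between '*' (42) and '\t' (9)
# is parsed as a range whose end comes before its start.
try:
    re.compile(r"[a-zA-Z0-9!$*-\t]")
    error_message = None
except re.error as exc:
    error_message = str(exc)

print(error_message)
```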
But there’s something else to consider. Some of those enumerated characters should possibly be written differently. In some circumstances, they definitely should.
This comparison of regex flavors says that C♯ can use some of the simpler Unicode properties. If you’re dealing with Unicode, you should probably use the general category \p{L} for all possible letters, and maybe \p{Nd} for decimal numbers. Also, if you want to accommodate all that dash punctuation, not just HYPHEN-MINUS, you should use the \p{Pd} property. You might also want to write that sequence of whitespace characters simply as \s, assuming that’s not too general for you.
All together, that works out to a pattern of [\p{L}\p{Nd}\p{Pd}!$*] to match any one character from that set.
I’d likely use that anyway, even if I didn’t plan on dealing with the full Unicode set, because it’s a good habit to get into, and because these things often grow beyond their original parameters. Now when you lift it to use in other code, it will still work correctly. If you hard‐code all the characters, it won’t.
[-a-z0-9]+, [a-z0-9-]+ and [a-z-0-9]+ are all equivalent. A hyphen placed between two ranges is treated as a literal character. The regex [a-z0-9-+()]+ also allows a hyphen.
Use \p{Pd} to match any type of hyphen. The '-' character is just one type of hyphen, which also happens to be a special character in regex.
Is this what you are after?
MatchCollection matches = Regex.Matches(mystring, "-");

RegEx: Non-repeating patterns?

I'm wrestling with how to write a specific regex, and thought I'd come here for a little guidance.
What I'm looking for is an expression that does the following:
Character length of 7 or more
Any single character is one of four patterns (uppercase letters, lowercase letters, numbers and a specific set of special characters. Let's say #$%#).
(Now, here's where I'm having problems):
Another single character would also match with one of the patterns described above EXCEPT for the pattern that was already matched. So, if the first pattern matched is an uppercase letter, the second character match should be a lowercase letter, number or special character from the pattern.
To give you an example, the string AAAAAA# would match, as would the string AAAAAAa. However, the string AAAAAAA would not match, nor would the string AAAAAA& (as the ampersand is not part of the special character pattern).
Any ideas? Thanks!
If you only need two different kinds of characters, you can use the possessive quantifier feature (available in Objective C):
^(?:[a-z]++|[A-Z]++|[0-9]++|[#$%#]++)[a-zA-Z0-9#$%#]+$
or more concise with an atomic group:
^(?>[a-z]+|[A-Z]+|[0-9]+|[#$%#]+)[a-zA-Z0-9#$%#]+$
Since each branch of the alternation is a character class with a possessive quantifier, you can be sure that the first character matched by [a-zA-Z0-9#$%#]+ is from a different class.
About the string size, check it first separately with the appropriate function, if the size is too small, you will avoid the cost of a regex check.
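Python's re module only gained possessive quantifiers and atomic groups in version 3.11, so the sketch below swaps in a different but equivalent trick: a lookahead captures the leading run of same-type characters and a backreference consumes it, which prevents backtracking into the run just as an atomic group would. Python and the helper name has_two_kinds are assumptions; the length is checked separately, as suggested above.

```python
import re

# Emulated atomic group: the lookahead captures the longest leading run of one
# character type, then \1 consumes exactly that run with no backtracking.
TWO_KINDS = re.compile(
    r"^(?=([a-z]+|[A-Z]+|[0-9]+|[#$%#]+))\1[a-zA-Z0-9#$%#]+$"
)


def has_two_kinds(text: str) -> bool:
    """Length first (cheap), then the two-different-types check."""
    return len(text) >= 7 and TWO_KINDS.match(text) is not None


for candidate in ("AAAAAA#", "AAAAAAa", "AAAAAAA", "AAAAAA&"):
    print(candidate, has_two_kinds(candidate))
```

For AAAAAAA the captured run swallows the whole string, so nothing is left for the final character class and the match fails.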
First you need to do a negative lookahead to make sure the entire string doesn't consist of characters from a single group:
(?!(?:[a-z]*|[A-Z]*|[0-9]*|[#$%#]*)$)
Then check that it does contain at least 7 characters from the list of legal characters (and nothing else):
^[a-zA-Z0-9#$%#]{7,}$
Combining them (thanks to Shlomo for pointing that out):
^(?!(?:[a-z]*|[A-Z]*|[0-9]*|[#$%#]*)$)[a-zA-Z0-9#$%#]{7,}$
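The combined pattern can be tried against the four example strings from the question. This is a quick sketch in Python, which is an assumption since the question does not name a language; the pattern uses nothing flavor-specific.

```python
import re

# Negative lookahead rejects single-type strings; the class enforces the
# legal alphabet and the minimum length of 7.
MIXED = re.compile(
    r"^(?!(?:[a-z]*|[A-Z]*|[0-9]*|[#$%#]*)$)[a-zA-Z0-9#$%#]{7,}$"
)

outcomes = {}
for candidate in ("AAAAAA#", "AAAAAAa", "AAAAAAA", "AAAAAA&"):
    outcomes[candidate] = MIXED.match(candidate) is not None
    print(candidate, outcomes[candidate])
```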

htaccess regular expression explanation

I have been tasked with changing an .htaccess file. Unfortunately, I know very little about regular expressions, and so most of the file is unreadable for me. In particular, I have these two REs...
1: ^(?!((www|web3|web4|web5|web6|cm|test)\.mydomain\.com)|(?:(?:\d+\.){3}(?:\d+))$).*$
2: ^/([^/][^/])/([^/][^/])/([^/]+)/Job-Posting/$ /Misc/jobposting\.asp\?country=$1&state=$2&city=$3
For the first one, I understand the first half, more or less. It's trying to match against something that ISN'T www.mydomain.com, or web3.mydomain.com, etc., and that it may match that zero or one times. What I'm not clear on is what the second half of that does. My research suggests that ?: implies some sort of flag, but I didn't see any example that explained what exactly that meant. Please explain what this part means, as well as provide an example that would match it.
For the second one, the comments say this is applicable for a URL containing /US/NY/Rochester/Job-Posting/. From this I can infer that ^/ means one character, but again, I couldn't find that in my research so far. What is the formal definition of ^/? What is the significance of putting it into square brackets, [^/]?
If I can get a handle on these two RE I should be able to adapt them to my needs. Your help is much appreciated.
?: doesn't match anything in particular; it modifies the behavior of the parentheses. The ?: means the parentheses are non-capturing, and thus cannot be referenced in the rule. Non-capturing parentheses are good to use when you don't need to reference the captured text, because the system doesn't have to 'remember' the text, which saves resources.
The code in question:
(?:(?:\d+\.){3}(?:\d+))
matches one or more digits followed by a period, three times over, then one or more digits. This will match IP addresses (e.g. 127.0.0.1). This will also match 123456.1.1.3456789, so you might want to restrict the number of digits allowed with (?:(?:\d{1,3}\.){3}(?:\d{1,3})), though I haven't tested this, so take it with a grain of salt.
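The difference between the loose and the restricted form is easy to check. The sketch below uses Python's re module as an assumption (Apache uses PCRE, but these constructs behave the same way), with fullmatch standing in for the anchors of the original rule. Note that the dot is escaped in both patterns.

```python
import re

# Original group: any number of digits per part.
LOOSE = re.compile(r"(?:(?:\d+\.){3}(?:\d+))")
# Restricted group: one to three digits per part.
TIGHT = re.compile(r"(?:(?:\d{1,3}\.){3}(?:\d{1,3}))")

for candidate in ("127.0.0.1", "123456.1.1.3456789"):
    print(candidate,
          LOOSE.fullmatch(candidate) is not None,
          TIGHT.fullmatch(candidate) is not None)
```

Even the restricted form accepts values such as 999.999.999.999, so it limits the length of each part rather than validating a real address.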
Info on non capturing groupings.
The second item revolves around using square brackets as a character set. Square brackets match any one of the characters listed inside them, with a leading ^ negating the match. So [ad02] will match any of the four characters a, d, 0 or 2, while [^ad02] will match any character that is not a, d, 0 or 2. So [^/] means any character that is not /.
One of the tricky things about square brackets is the number of items they will match. [^/] will match one character, but so does [ad02]. It doesn't matter how many characters you have in the set, it still obeys the modifiers on the brackets. So [^/]{3} will match any series of 3 characters that does not contain a forward slash, while [^/]{2} will match a 2 character string with the same restriction.
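Applied to the second rule from the question, each [^/] consumes exactly one non-slash character, and the parentheses capture the pieces that become $1, $2 and $3. Below is a rough sketch of that behavior in Python, an assumption made for illustration only (Apache performs the actual rewrite); the substitution is imitated with an f-string.

```python
import re

# Pattern half of the rewrite rule from the question.
RULE = re.compile(r"^/([^/][^/])/([^/][^/])/([^/]+)/Job-Posting/$")

match = RULE.match("/US/NY/Rochester/Job-Posting/")
country, state, city = match.groups()  # the equivalents of $1, $2, $3

# Imitation of the substitution half of the rule.
target = f"/Misc/jobposting.asp?country={country}&state={state}&city={city}"
print(target)

# A three-letter country code does not fit [^/][^/], so the rule is skipped.
print(RULE.match("/USA/NY/Rochester/Job-Posting/"))
```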
For more info on character sets see Character Classes or Character Sets

How to include special chars in this regex

First of all I am a total noob to regular expressions, so this may be optimized further, and if so, please tell me what to do. Anyway, after reading several articles about regex, I wrote a little regex for my password matching needs:
(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(^[A-Z]+[a-z0-9]).{8,20}
What I am trying to do is: it must start with an uppercase letter, must contain a lowercase letter, must contain at least one number, must contain at least one special character, and must be between 8 and 20 characters in length.
The above somehow works, but it doesn't force special chars (. seems to match any character, but I don't know how to use it with the positive lookahead), and the min length seems to be 10 instead of 8. What am I doing wrong?
PS: I am using http://gskinner.com/RegExr/ to test this.
Let's strip away the assertions and just look at your base pattern alone:
(^[A-Z]+[a-z0-9]).{8,20}
This will match one or more uppercase Latin letters, followed by a single lowercase Latin letter or decimal digit, followed by 8 to 20 of any character. So yes, at minimum this will require 10 characters, but there's no maximum number of characters it will match (e.g. it will allow 100 uppercase letters at the start of the string). Furthermore, since there's no end anchor ($), this pattern would allow any trailing characters after the matched substring.
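Those three observations can be reproduced with a few probe strings. This is a small sketch in Python, which is an assumption since the question was tested in RegExr; the probe strings are made up for illustration.

```python
import re

# Base pattern from the question, with the lookaheads stripped away.
BASE = re.compile(r"(^[A-Z]+[a-z0-9]).{8,20}")

too_short = "Aa" + "x" * 7            # 9 characters
shortest = "Aa" + "x" * 8             # 10 characters
long_run = "A" * 100 + "a" + "x" * 8  # 109 characters
trailing = "Aa" + "x" * 30            # 32 characters

print(BASE.match(too_short))                 # no match below 10 characters
print(BASE.match(shortest) is not None)      # 10 characters is the minimum
print(BASE.match(long_run) is not None)      # no upper bound on the prefix
print(BASE.match(trailing).end(), len(trailing))  # match stops early
```

In the last case the match ends after 22 characters and the remaining 10 are simply ignored, which is what the missing end anchor allows.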
I'd recommend a pattern like this:
^(?=.*[a-z])(?=.*[0-9])(?=.*[!##$])[A-Z]+[A-Za-z0-9!##$]{7,19}$
Where !##$ is a placeholder for whatever special characters you want to allow. Don't forget to escape special characters if necessary (\, ], ^ at the beginning of the character class, and - in the middle).
Using POSIX character classes, it might look like this:
^(?=.*[[:lower:]])(?=.*[[:digit:]])(?=.*[[:punct:]])[[:upper:]]+[[:alnum:][:punct:]]{7,19}$
Or using Unicode character classes, it might look like this:
^(?=.*[\p{Ll}])(?=.*\d)(?=.*[\p{P}\p{S}])[\p{Lu}]+[\p{L}\d\p{P}\p{S}]{7,19}$
Note: each of these considers a different set of 'special characters', so they aren't identical to the first pattern.
The following should work:
^(?=.*[a-z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])[A-Z].{7,19}$
I removed the (?=.*[A-Z]) because the requirement that you must start with an uppercase character already covers that. I added (?=.*[^a-zA-Z0-9]) for the special characters, this will only match if there is at least one character that is not a letter or a digit. I also tweaked the length checking a little bit, the first step here was to remove the + after the [A-Z] so that we know exactly one character has been matched so far, and then changing the .{8,20} to .{7,19} (we can only match between 7 and 19 more characters if we already matched 1).
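A few probe passwords show each requirement being enforced. This sketch uses Python's re module as an assumption; the pattern is copied unchanged from the answer above and the sample passwords are made up.

```python
import re

PASSWORD = re.compile(
    r"^(?=.*[a-z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])[A-Z].{7,19}$"
)

samples = {
    "Passw0rd!": "meets every rule",
    "passw0rd!": "does not start with an uppercase letter",
    "Password!": "no digit",
    "Passw0rd1": "no special character",
    "Pw0rd!x": "only 7 characters",
    "Pa0!" + "x" * 17: "21 characters",
}

verdicts = {}
for password, note in samples.items():
    verdicts[password] = PASSWORD.match(password) is not None
    print(f"{password!r} ({note}): {verdicts[password]}")
```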
Well, here is how I would write it, if I had such requirements - excepting situations where it's absolutely not possible or practical, I prefer to break up complex regular expressions. Note that this is English-specific, so a Unicode or POSIX character class (where supported) may make more sense:
/^[A-Z]/ && /[a-z]/ && /[0-9]/ && /[whatever special]/ && ofCorrectLength(x)
That is, I would avoid trying to incorporate all the rules at once.
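The same idea could be sketched in Python as one small check per rule. The language, the function name is_valid_password and the definition of "special" as any non-alphanumeric ASCII character are all assumptions; the answer leaves the special set open.

```python
import re

# Assumption: "special" means anything outside ASCII letters and digits.
SPECIALS = r"[^a-zA-Z0-9]"


def is_valid_password(password: str) -> bool:
    """Apply each rule as its own small check instead of one large regex."""
    return (
        8 <= len(password) <= 20                        # correct length
        and re.match(r"[A-Z]", password) is not None    # starts with uppercase
        and re.search(r"[a-z]", password) is not None   # has a lowercase letter
        and re.search(r"[0-9]", password) is not None   # has a digit
        and re.search(SPECIALS, password) is not None   # has a special character
    )


for candidate in ("Passw0rd!", "passw0rd!", "Password!", "Abcdefg0!"):
    print(candidate, is_valid_password(candidate))
```

Keeping the rules separate also makes it straightforward to report which rule failed, something a single combined regex cannot do.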