Regex only allow letters and some characters - regex

I am attempting to create a regex that only allows letters upper or lowercase, and the characters of space, '-', ',' '.', '(', and ')'. This is what I have so far but for some reason it is still letting me enter numbers
^[a-zA-Z -,.()]*$
any help would be great! Thanks.

- is special in character class. It is used to define a range as you've done with a-z.
To match a literal - you need to either escape it or place it such that it'll not function as range operator:
^[a-zA-Z \-,.()]*$
         ^^ escaping it with \
or
^[-a-zA-Z ,.()]*$
  ^ placing it at the beginning.
or
^[a-zA-Z ,.()-]*$
             ^ placing it at the end.
and interestingly
^[a-z-A-Z ,.()]*$
     ^ placing it between two ranges.
In the final case the - is placed between a-z and A-Z. Since both characters surrounding it (z and A) are already part of ranges, the - is treated literally again.
Of all the mentioned methods, the escaping method is recommended as it makes your code easier to read and understand. Anyone seeing the \ would expect that an escape is intended. Placing the - at the beginning (or end) will create problems if you later add a character before (or after) it in the character class without escaping the -, thus forming a range.
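The effect of the accidental range is easy to check by hand. The sketch below assumes Python's re module (the question does not say which engine is in use), and the sample strings are made up for illustration:

```python
import re

# Original class: " -," is read as a range from space (0x20) to comma (0x2C).
original = re.compile(r"^[a-zA-Z -,.()]*$")
# Fixed variants: the hyphen is escaped, placed first, or placed last.
escaped = re.compile(r"^[a-zA-Z \-,.()]*$")
leading = re.compile(r"^[-a-zA-Z ,.()]*$")
trailing = re.compile(r"^[a-zA-Z ,.()-]*$")

def allowed(pattern, text):
    """Return True when every character of text is accepted by pattern."""
    return pattern.fullmatch(text) is not None

for text in ["Smith-Jones, A. (Jr)", "a&b", "abc123"]:
    print(repr(text), allowed(original, text), allowed(escaped, text))
```

With the original class, "a&b" is accepted (the & falls inside the range) while a real hyphen is rejected; neither version accepts digits.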

Well, there is an issue in that -, is being interpreted as a range, like a-z, allowing all characters from space to comma. Escape that and at least some of the bugs should be fixed.
^[a-zA-Z \-,.()]*$
You can also escape the . and (), since those have special meaning elsewhere in regular expressions. Inside a [] context they are taken literally (the JavaScript regex engine, where I was testing, treats them that way, as do most flavors), so the escapes are optional, but being explicit does no harm.
^[a-zA-Z \-,\.\(\)]*$
However, this still shouldn't be allowing 0-9 digits, so your actual code that uses this regular expression probably has an issue, as well.

The " -," in [a-zA-Z -,.()] describes a range from the space character (0x20) to , (0x2C). That is equivalent to [ !"#$%&'()*+,]. You should either escape the - or place it somewhere else where it is not interpreted as a range indicator.
But that’s not the cause of this issue as the digits are from 0x30 to 0x39.
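The range can be enumerated directly to confirm this. A small sketch in Python (chosen here only for illustration):

```python
# Characters covered by the range " -," (code points 0x20 through 0x2C inclusive).
covered = "".join(chr(code) for code in range(0x20, 0x2C + 1))
print(repr(covered))

# Digits occupy 0x30 through 0x39, so none of them fall inside the range.
digits_in_range = [d for d in "0123456789" if d in covered]
print(digits_in_range)
```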

I tried that with JavaScript and it works fine. The others are correct about the hyphen, though. If this is in JavaScript, check that the rest of the script runs without errors, or else the check will not happen at all.

Related

re compile error: sre_constants.error: bad character range [duplicate]

How to rewrite the [a-zA-Z0-9!$* \t\r\n] pattern to match a hyphen along with the existing characters?
The hyphen is usually a normal character in regular expressions. Only if it’s in a character class and between two other characters does it take a special meaning.
Thus:
[-] matches a hyphen.
[abc-] matches a, b, c or a hyphen.
[-abc] matches a, b, c or a hyphen.
[ab-d] matches a, b, c or d (only here the hyphen denotes a character range).
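These four cases can be verified with a short script. The sketch assumes Python's re module rather than the .NET engine the question targets; the helper name matches_all is made up for the example:

```python
import re

def matches_all(char_class, text):
    """Return True when text consists only of characters from char_class."""
    return re.fullmatch(char_class + "+", text) is not None

print(matches_all(r"[-]", "---"))       # True: a lone hyphen is literal
print(matches_all(r"[abc-]", "a-b-c"))  # True: trailing hyphen is literal
print(matches_all(r"[-abc]", "a-b-c"))  # True: leading hyphen is literal
print(matches_all(r"[ab-d]", "abcd"))   # True: b-d is a range covering b, c, d
print(matches_all(r"[ab-d]", "a-d"))    # False: the hyphen itself is not in the set
```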
Escape the hyphen.
[a-zA-Z0-9!$* \t\r\n\-]
UPDATE:
Never mind this answer - you can add the hyphen to the group but you don't have to escape it. See Konrad Rudolph's answer instead which does a much better job of answering and explains why.
It’s less confusing to always use an escaped hyphen, so that it doesn't have to be positionally dependent. That’s a \- inside the bracketed character class.
But there’s something else to consider. Some of those enumerated characters should possibly be written differently. In some circumstances, they definitely should.
This comparison of regex flavors says that C♯ can use some of the simpler Unicode properties. If you’re dealing with Unicode, you should probably use the general category \p{L} for all possible letters, and maybe \p{Nd} for decimal numbers. Also, if you want to accommodate all that dash punctuation, not just HYPHEN-MINUS, you should use the \p{Pd} property. You might also want to write that sequence of whitespace characters simply as \s, assuming that’s not too general for you.
All together, that works out to a pattern of [\p{L}\p{Nd}\p{Pd}!$*] to match any one character from that set.
I’d likely use that anyway, even if I didn’t plan on dealing with the full Unicode set, because it’s a good habit to get into, and because these things often grow beyond their original parameters. Now when you lift it to use in other code, it will still work correctly. If you hard‐code all the characters, it won’t.
[-a-z0-9]+, [a-z0-9-]+ and [a-z-0-9]+ are all the same. A hyphen between two ranges is treated as a literal symbol. The regex [a-z0-9-+()]+ also allows a hyphen.
Use \p{Pd} to match any type of hyphen. The '-' character is just one type of hyphen, which also happens to be a special character in regex.
Is this what you are after?
MatchCollection matches = Regex.Matches(mystring, "-");

Period in .Net 3.5 Regex.IsMatch

I came across this regular expression in vb.net 3.5 code:
Regex.IsMatch(strString, "^[\w\s.+'\-\(\)\/\,\&\#]+$")
What is really confusing me is the ".+" part. I was under the impression that the period means any character and the plus sign means one or more. Following this, I feel like this regular expression should allow anything! But it doesn't, so I must be misunderstanding something. In testing it, it seems like the period and the plus sign are being taken as literals.
Could somebody help explain this to me?
Thanks!
The issue is that all of those characters are enclosed in a [character-group]. The escaping rules are different in character-groups than they are elsewhere in a RegEx expression. For instance, according to the MSDN documentation, \b inside a character-group means a backspace character whereas, outside of a character-group, it is an anchor that matches a word boundary.
According to the Regular-Expressions.info documentation:
In most regex flavors, the only special characters or metacharacters inside a character class are the closing bracket (]), the backslash (\), the caret (^), and the hyphen (-). The usual metacharacters are normal characters inside a character class, and do not need to be escaped by a backslash.
Therefore, in your example RegEx expression, it looks for any one of the characters in that bracketed list, including either the literal . or + character. If you think about it, it wouldn't make any sense to use a . to mean "any character" inside of a character-group. Doing so would make the group, itself, moot. And certainly, using the + character to mean "one or more times" inside of a character-group really makes no sense.
.+ means any character, one or more times. Maybe you need to escape the dot, like \.+?
Within the square brackets, dot and plus don't have their special meaning. The square brackets define a "character class". It does not contain a string but a set of characters allowed at this position.
So the expression [\w\s.+'\-\(\)\/\,\&\#] creates a character class of letters, digits, underscore, whitespace, dots, pluses, single quotes, minuses, opening round brackets, closing round brackets, slashes, commas, ampersands and hash marks.
The + after the square brackets means you expect one or more characters of this character class.
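To see the difference between . and + inside and outside a character class, here is a small sketch. It assumes Python's re module instead of .NET, and drops the escapes that a character class does not need; the sample strings are invented:

```python
import re

# Same set of characters as the VB.NET pattern, with only the hyphen escaped.
pattern = re.compile(r"^[\w\s.+'\-()/,&#]+$")

print(bool(pattern.match("a.b+c")))  # True: dot and plus are ordinary members
print(bool(pattern.match("a*b")))    # False: "*" is not in the class

# Outside a class, ".+" really does mean "one or more of any character".
print(bool(re.match(r"^.+$", "a*b")))  # True
```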

How to include special chars in this regex

First of all I am a total noob to regular expressions, so this may be optimized further, and if so, please tell me what to do. Anyway, after reading several articles about regex, I wrote a little regex for my password matching needs:
(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(^[A-Z]+[a-z0-9]).{8,20}
What I am trying to do is: it must start with an uppercase letter, must contain a lowercase letter, must contain at least one number, must contain at least one special character, and must be between 8 and 20 characters in length.
The above somehow works, but it doesn't force special chars (. seems to match any character but I don't know how to use it with the positive lookahead) and the min length seems to be 10 instead of 8. What am I doing wrong?
PS: I am using http://gskinner.com/RegExr/ to test this.
Let's strip away the assertions and just look at your base pattern alone:
(^[A-Z]+[a-z0-9]).{8,20}
This will match one or more uppercase Latin letters, followed by a single lowercase Latin letter or decimal digit, followed by 8 to 20 of any character. So yes, at minimum this will require 10 characters, but there's no maximum number of characters it will match (e.g. it will allow 100 uppercase letters at the start of the string). Furthermore, since there's no end anchor ($), this pattern would allow any trailing characters after the matched substring.
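The length behaviour described above can be reproduced with a few inputs. This is a sketch assuming Python's re module (the question was tested in RegExr), with invented sample strings:

```python
import re

base = re.compile(r"(^[A-Z]+[a-z0-9]).{8,20}")

print(bool(base.search("Aa1234567")))   # False: 9 characters is too short
print(bool(base.search("Aa12345678")))  # True: 10 characters is the minimum

# No end anchor and an unbounded [A-Z]+ mean very long inputs still match.
long_input = "A" * 100 + "a" + "x" * 50
print(bool(base.search(long_input)))    # True
```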
I'd recommend a pattern like this:
^(?=.*[a-z])(?=.*[0-9])(?=.*[!##$])[A-Z]+[A-Za-z0-9!##$]{7,19}$
Where !##$ is a placeholder for whatever special characters you want to allow. Don't forget to escape special characters if necessary (\, ], ^ at the beginning of the character class, and - in the middle).
Using POSIX character classes, it might look like this:
^(?=.*[[:lower:]])(?=.*[[:digit:]])(?=.*[[:punct:]])[[:upper:]]+[[:alnum:][:punct:]]{7,19}$
Or using Unicode character classes, it might look like this:
^(?=.*[\p{Ll}])(?=.*\d)(?=.*[\p{P}\p{S}])[\p{Lu}]+[\p{L}\d\p{P}\p{S}]{7,19}$
Note: each of these considers a different set of 'special characters', so they aren't identical to the first pattern.
The following should work:
^(?=.*[a-z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])[A-Z].{7,19}$
I removed the (?=.*[A-Z]) because the requirement that you must start with an uppercase character already covers that. I added (?=.*[^a-zA-Z0-9]) for the special characters, this will only match if there is at least one character that is not a letter or a digit. I also tweaked the length checking a little bit, the first step here was to remove the + after the [A-Z] so that we know exactly one character has been matched so far, and then changing the .{8,20} to .{7,19} (we can only match between 7 and 19 more characters if we already matched 1).
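A quick check of that pattern against a few made-up passwords, assuming Python's re module (the lookahead syntax used here is the same):

```python
import re

password = re.compile(
    r"^(?=.*[a-z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])[A-Z].{7,19}$"
)

for candidate in ["Passw0rd!", "Passw0r!", "passw0rd!", "Passw0rd", "Pw0!"]:
    print(candidate, bool(password.match(candidate)))
```

Only the first two candidates pass: the third does not start with an uppercase letter, the fourth has no special character, and the last is too short.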
Well, here is how I would write it, if I had such requirements - excepting situations where it's absolutely not possible or practical, I prefer to break up complex regular expressions. Note that this is English-specific, so a Unicode or POSIX character class (where supported) may make more sense:
/^[A-Z]/ && /[a-z]/ && /[0-9]/ && /[whatever special]/ && ofCorrectLength(x)
That is, I would avoid trying to incorporate all the rules at once.
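As a rough illustration of the split-up approach, here is one way it might look in Python. The SPECIALS set and the function name are hypothetical; substitute whatever special characters and length limits your rules require:

```python
import re

# Hypothetical set of accepted special characters; adjust to your own rules.
SPECIALS = "!@#$%"

def is_valid_password(candidate):
    """Apply each rule as its own small check instead of one large regex."""
    checks = [
        re.match(r"[A-Z]", candidate) is not None,   # starts with uppercase
        re.search(r"[a-z]", candidate) is not None,  # contains lowercase
        re.search(r"[0-9]", candidate) is not None,  # contains a digit
        any(ch in SPECIALS for ch in candidate),     # contains a special char
        8 <= len(candidate) <= 20,                   # length within bounds
    ]
    return all(checks)

print(is_valid_password("Passw0rd!"))  # True
print(is_valid_password("Passw0rd"))   # False: no special character
```

Keeping the rules separate also makes it straightforward to report which rule failed.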

Regex to check if a string contains at least A-Za-z0-9 but not an &

I am trying to check if a string contains at least A-Za-z0-9 but not an &.
My experience with regexes is limited, so I started with the easy part and got:
.*[a-zA-Z0-9].*
However I am having troubling combining this with the does not contain an & portion.
I was thinking along the lines of ^(?=.*[a-zA-Z0-9].*)(?![&()]).* but that does not seem to do the trick.
Any help would be appreciated.
I'm not sure if this is what you meant, but here is a regular expression that will match any string that:
contains at least one alpha-numeric character
does not contain a &
This expression ensures that the entire string is always matched (the ^ and $ at beginning and end), and that none of the characters matched are a "&" sign (the [^&]* sections):
^[^&]*[a-zA-Z0-9][^&]*$
However, it might be clearer in code to simply perform two checks, if you are not limited to a single expression.
Also, check out the \w class in regular expressions (it might be the better solution for catching alphanumeric chars if you want to allow non-ASCII characters).
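A short test of the single-expression version, assuming Python's re module and invented inputs:

```python
import re

# At least one ASCII letter or digit, and no "&" anywhere in the string.
no_ampersand = re.compile(r"^[^&]*[a-zA-Z0-9][^&]*$")

for text in ["abc", "-- x --", "a&b", "---", ""]:
    print(repr(text), bool(no_ampersand.match(text)))
```

The first two inputs match; "a&b" fails because of the ampersand, and the last two fail because they contain no alphanumeric character.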

Regex doesn't recognize underscore as special character

/(?=^.{8,}$)(?=.*[_!##$%^&*-])(?=.*\d)(?=.*\W+)(?![.\n])(?=.*[a-z])(?=.*[A-Z]).*$/
I'm trying to make a regex for password validation such that the password must be at least 8 chars and include one uppercase, one lowercase, one number, and one special char. It works fine except it won't recognize the underscore (_) as a special character. I.e., Pa$$w0rd matches, but Pass_w0rd doesn't. Thoughts?
This portion of the regex seems to be looking for special characters:
(?=.*[!##$%^&*-])
Note that the character class does not include an underscore, try changing this to the following:
(?=.*[_!##$%^&*-])
You will also need to modify or remove this portion of the regex:
(?=.*\W+)
\W is equivalent to [^a-zA-Z0-9_], so if an underscore is your only special character this portion of the regex will cause it to fail. Instead, change it to the following, which accepts either a non-word character or an underscore (or remove it; it is redundant since you already check for special characters earlier):
(?=.*[\W_])
Complete regex:
/(?=^.{8,}$)(?=.*[_!##$%^&*-])(?=.*\d)(?=.*[\W_])(?![.\n])(?=.*[a-z])(?=.*[A-Z]).*$/
This one here works as well. It defines a special character by excluding alphanumeric characters and whitespace, so it includes the underscore:
(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[\d])(?=.*?[^\sa-zA-Z0-9]).{8,}
The problem is that the only thing that could possibly satisfy the \W, by definition, is something other than [a-zA-Z0-9_]. The underscore is specifically not matched by \W, and in Pass_w0rd, nothing else is matched by it, either.
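The behaviour of \W with an underscore can be confirmed directly. The sketch assumes Python's re module, where \W has the same meaning for these ASCII inputs:

```python
import re

# "_" counts as a word character, so \W finds nothing in Pass_w0rd.
underscore_only = re.search(r"\W", "Pass_w0rd")
print(underscore_only)  # None

# "$" is a non-word character, so \W is satisfied by Pa$$w0rd.
dollar = re.search(r"\W", "Pa$$w0rd")
print(dollar.group())  # $

# Adding "_" to the class explicitly makes the underscore count.
with_underscore = re.search(r"[\W_]", "Pass_w0rd")
print(with_underscore.group())  # _
```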
I suspect that having both your specific list of special characters and the \W is overkill. Pick one and you're likely to be happier. I also recommend splitting this whole thing up into several separate tests for much better maintainability.
A much simpler regex that works for you is this:
/(?=.*[_!##$%^&*-])(?=.*\d)(?!.*[.\n])(?=.*[a-z])(?=.*[A-Z])^.{8,}$/
There were a few mistakes in your original regex, e.g.:
You don't need to use lookahead for making sure there are 8 chars in input
negative lookahead [.\n] was missing .*
(?=.*\W+) is superfluous and probably not serving any purpose
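The simplified pattern can be tried against the two passwords from the question plus a few invented ones. This sketch assumes Python's re module rather than JavaScript, and copies the character class exactly as it appears above:

```python
import re

simplified = re.compile(
    r"(?=.*[_!##$%^&*-])(?=.*\d)(?!.*[.\n])(?=.*[a-z])(?=.*[A-Z])^.{8,}$"
)

for candidate in ["Pass_w0rd", "Pa$$w0rd", "Password1", "Pa$$.w0rd", "Pa$1"]:
    print(candidate, bool(simplified.match(candidate)))
```

The first two pass. "Password1" has no special character, "Pa$$.w0rd" contains a dot, and "Pa$1" is shorter than 8 characters.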