Regular expression for alphanumeric and underscores - regex

Is there a regular expression which checks if a string contains only upper and lowercase letters, numbers, and underscores?

To match a string that contains only those characters (or an empty string), try
"^[a-zA-Z0-9_]*$"
This works for .NET regular expressions, and probably a lot of other languages as well.
Breaking it down:
^ : start of string
[ : beginning of character group
a-z : any lowercase letter
A-Z : any uppercase letter
0-9 : any digit
_ : underscore
] : end of character group
* : zero or more of the given characters
$ : end of string
If you don't want to allow empty strings, use + instead of *.
As others have pointed out, some regex languages have a shorthand form for [a-zA-Z0-9_]. In the .NET regex language, you can turn on ECMAScript behavior and use \w as a shorthand (yielding ^\w*$ or ^\w+$). Note that in other languages, and by default in .NET, \w is somewhat broader, and will match other sorts of Unicode characters as well (thanks to Jan for pointing this out). So if you're really intending to match only those characters, using the explicit (longer) form is probably best.

There's a lot of verbosity in here, and I'm deeply against it, so, my conclusive answer would be:
/^\w+$/
\w is equivalent to [A-Za-z0-9_], which is pretty much what you want (unless we introduce Unicode to the mix).
Using the + quantifier you'll match one or more characters. If you want to accept an empty string too, use * instead.

You want to check that each character matches your requirements, which is why we use:
[A-Za-z0-9_]
And you can even use the shorthand version:
\w
Which is equivalent (in some regex flavors, so make sure you check before you use it). Then to indicate that the entire string must match, you use:
^
To indicate the string must start with that character, then use
$
To indicate the string must end with that character. Then use
\w+ or \w*
To indicate "1 or more", or "0 or more". Putting it all together, we have:
^\w*$

Although it's more verbose than \w, I personally appreciate the readability of the full POSIX character class names ( http://www.zytrax.com/tech/web/regex.htm#special ), so I'd say:
^[[:alnum:]_]+$
However, while the documentation at the above links states that \w will "Match any character in the range 0 - 9, A - Z and a - z (equivalent of POSIX [:alnum:])", I have not found this to be true. Not with grep -P anyway. You need to explicitly include the underscore if you use [:alnum:] but not if you use \w. You can't beat the following for short and sweet:
^\w+$
Along with readability, using the POSIX character classes (http://www.regular-expressions.info/posixbrackets.html) means that your regex can work on non ASCII strings, which the range based regexes won't do since they rely on the underlying ordering of the ASCII characters which may be different from other character sets and will therefore exclude some non-ASCII characters (letters such as œ) which you might want to capture.

Um...question: Does it need to have at least one character or no? Can it be an empty string?
^[A-Za-z0-9_]+$
Will do at least one upper or lower case alphanumeric or underscore. If it can be zero length, then just substitute the + for *:
^[A-Za-z0-9_]*$
If diacritics need to be included (such as cedilla - ç) then you would need to use the word character which does the same as the above, but includes the diacritic characters:
^\w+$
Or
^\w*$

Use
^([A-Za-z]|[0-9]|_)+$
...if you want to be explicit, or:
^\w+$
...if you prefer concise (Perl syntax).

In computer science, an alphanumeric value often means the first character is not a number, but it is an alphabet or underscore. Thereafter the character can be 0-9, A-Z, a-z, or underscore (_).
Here is how you would do that:
Tested under PHP:
$regex = '/^[A-Za-z_][A-Za-z\d_]*$/'
Or take
^[A-Za-z_][A-Za-z\d_]*$
and place it in your development language.

Use lookaheads to do the "at least one" stuff. Trust me, it's much easier.
Here's an example that would require 1-10 characters, containing at least one digit and one letter:
^(?=.*\d)(?=.*[A-Za-z])[A-Za-z0-9]{1,10}$
Note: I could have used \w, but then ECMA/Unicode considerations come into play, increasing the character coverage of the \w "word character".

This works for me. I found this in the O'Reilly's "Mastering Regular Expressions":
/^\w+$/
Explanation:
^ asserts position at start of the string
\w+ matches any word character (equal to [a-zA-Z0-9_])
"+" Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of the string
Verify yourself:
const regex = /^\w+$/;
const str = `nut_cracker_12`;
let m;
if ((m = regex.exec(str)) !== null) {
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}

Try these multi-lingual extensions I have made for string.
IsAlphaNumeric - The string must contain at least one alpha (letter in Unicode range, specified in charSet) and at least one number (specified in numSet). Also, the string should consist only of alpha and numbers.
IsAlpha - The string should contain at least one alpha (in the language charSet specified) and consist only of alpha.
IsNumeric - The string should contain at least one number (in the language numSet specified) and consist only of numbers.
The charSet/numSet range for the desired language can be specified. The Unicode ranges are available on Unicode Chart.
API:
public static bool IsAlphaNumeric(this string stringToTest)
{
// English
const string charSet = "a-zA-Z";
const string numSet = #"0-9";
// Greek
//const string charSet = #"\u0388-\u03EF";
//const string numSet = #"0-9";
// Bengali
//const string charSet = #"\u0985-\u09E3";
//const string numSet = #"\u09E6-\u09EF";
// Hindi
//const string charSet = #"\u0905-\u0963";
//const string numSet = #"\u0966-\u096F";
return Regex.Match(stringToTest, #"^(?=[" + numSet + #"]*?[" + charSet + #"]+)(?=[" + charSet + #"]*?[" + numSet + #"]+)[" + charSet + numSet +#"]+$").Success;
}
public static bool IsNumeric(this string stringToTest)
{
//English
const string numSet = #"0-9";
//Hindi
//const string numSet = #"\u0966-\u096F";
return Regex.Match(stringToTest, #"^[" + numSet + #"]+$").Success;
}
public static bool IsAlpha(this string stringToTest)
{
//English
const string charSet = "a-zA-Z";
return Regex.Match(stringToTest, #"^[" + charSet + #"]+$").Success;
}
Usage:
// English
string test = "AASD121asf";
// Greek
//string test = "Ϡϛβ123";
// Bengali
//string test = "শর৩৮";
// Hindi
//string test = #"क़लम३७ख़";
bool isAlphaNum = test.IsAlphaNumeric();

The following regex matches alphanumeric characters and underscore:
^[a-zA-Z0-9_]+$
For example, in Perl:
#!/usr/bin/perl -w
my $arg1 = $ARGV[0];
# Check that the string contains *only* one or more alphanumeric chars or underscores
if ($arg1 !~ /^[a-zA-Z0-9_]+$/) {
print "Failed.\n";
} else {
print "Success.\n";
}

This should work in most of the cases.
/^[\d]*[a-z_][a-z\d_]*$/gi
And by most I mean,
abcd True
abcd12 True
ab12cd True
12abcd True
1234 False
Explanation
^ ... $ - match the pattern starting and ending with
[\d]* - match zero or more digits
[a-z_] - match an alphabet or underscore
[a-z\d_]* - match an alphabet or digit or underscore
/gi - match globally across the string and case-insensitive

For those of you looking for unicode alphanumeric matching, you might want to do something like:
^[\p{L} \p{Nd}_]+$
Further reading is at Unicode Regular Expressions (Unicode Consortium) and at Unicode Regular Expressions (Regular-Expressions.info).

For me there was an issue in that I want to distinguish between alpha, numeric and alpha numeric, so to ensure an alphanumeric string contains at least one alpha and at least one numeric, I used :
^([a-zA-Z_]{1,}\d{1,})+|(\d{1,}[a-zA-Z_]{1,})+$

Here is the regex for what you want with a quantifier to specify at least 1 character and no more than 255 characters
[^a-zA-Z0-9 _]{1,255}

I believe you are not taking Latin and Unicode characters in your matches.
For example, if you need to take "ã" or "ü" chars, the use of "\w" won't work.
You can, alternatively, use this approach:
^[A-ZÀ-Ýa-zà-ý0-9_]+$

^\w*$ will work for the below combinations:
1
123
1av
pRo
av1

For Java, only case insensitive alphanumeric and underscore are allowed.
^ Matches the string starting with any characters
[a-zA-Z0-9_]+ Matches alpha-numeric character and underscore.
$ Matches the string ending with zero or more characters.
public class RegExTest {
public static void main(String[] args) {
System.out.println("_C#".matches("^[a-zA-Z0-9_]+$"));
}
}

To check the entire string and not allow empty strings, try
^[A-Za-z0-9_]+$

This works for me. You can try:
[\\p{Alnum}_]

Required Format
Allow these three:
0142171547295
014-2171547295
123abc
Don't allow other formats:
validatePnrAndTicketNumber(){
let alphaNumericRegex=/^[a-zA-Z0-9]*$/;
let numericRegex=/^[0-9]*$/;
let numericdashRegex=/^(([1-9]{3})\-?([0-9]{10}))$/;
this.currBookingRefValue = this.requestForm.controls["bookingReference"].value;
if(this.currBookingRefValue.length == 14 && this.currBookingRefValue.match(numericdashRegex)){
this.requestForm.controls["bookingReference"].setErrors({'pattern': false});
}else if(this.currBookingRefValue.length ==6 && this.currBookingRefValue.match(alphaNumericRegex)){
this.requestForm.controls["bookingReference"].setErrors({'pattern': false});
}else if(this.currBookingRefValue.length ==13 && this.currBookingRefValue.match(numericRegex) ){
this.requestForm.controls["bookingReference"].setErrors({'pattern': false});
}else{
this.requestForm.controls["bookingReference"].setErrors({'pattern': true});
}
}
<input name="booking_reference" type="text" [class.input-not-empty]="bookingRef.value"
class="glyph-input form-control floating-label-input" id="bookings_bookingReference"
value="" maxlength="14" aria-required="true" role="textbox" #bookingRef
formControlName="bookingReference" (focus)="resetMessageField()" (blur)="validatePnrAndTicketNumber()"/>

Related

Regex: Only matching at the end of String not anywhere in elastic [duplicate]

The following should be matched:
AAA123
ABCDEFGH123
XXXX123
can I do: ".*123" ?
Yes, you can. That should work.
. = any char except newline
\. = the actual dot character
.? = .{0,1} = match any char except newline zero or one times
.* = .{0,} = match any char except newline zero or more times
.+ = .{1,} = match any char except newline one or more times
Yes that will work, though note that . will not match newlines unless you pass the DOTALL flag when compiling the expression:
Pattern pattern = Pattern.compile(".*123", Pattern.DOTALL);
Matcher matcher = pattern.matcher(inputStr);
boolean matchFound = matcher.matches();
Use the pattern . to match any character once, .* to match any character zero or more times, .+ to match any character one or more times.
The most common way I have seen to encode this is with a character class whose members form a partition of the set of all possible characters.
Usually people write that as [\s\S] (whitespace or non-whitespace), though [\w\W], [\d\D], etc. would all work.
.* and .+ are for any chars except for new lines.
Double Escaping
Just in case, you would wanted to include new lines, the following expressions might also work for those languages that double escaping is required such as Java or C++:
[\\s\\S]*
[\\d\\D]*
[\\w\\W]*
for zero or more times, or
[\\s\\S]+
[\\d\\D]+
[\\w\\W]+
for one or more times.
Single Escaping:
Double escaping is not required for some languages such as, C#, PHP, Ruby, PERL, Python, JavaScript:
[\s\S]*
[\d\D]*
[\w\W]*
[\s\S]+
[\d\D]+
[\w\W]+
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegularExpression{
public static void main(String[] args){
final String regex_1 = "[\\s\\S]*";
final String regex_2 = "[\\d\\D]*";
final String regex_3 = "[\\w\\W]*";
final String string = "AAA123\n\t"
+ "ABCDEFGH123\n\t"
+ "XXXX123\n\t";
final Pattern pattern_1 = Pattern.compile(regex_1);
final Pattern pattern_2 = Pattern.compile(regex_2);
final Pattern pattern_3 = Pattern.compile(regex_3);
final Matcher matcher_1 = pattern_1.matcher(string);
final Matcher matcher_2 = pattern_2.matcher(string);
final Matcher matcher_3 = pattern_3.matcher(string);
if (matcher_1.find()) {
System.out.println("Full Match for Expression 1: " + matcher_1.group(0));
}
if (matcher_2.find()) {
System.out.println("Full Match for Expression 2: " + matcher_2.group(0));
}
if (matcher_3.find()) {
System.out.println("Full Match for Expression 3: " + matcher_3.group(0));
}
}
}
Output
Full Match for Expression 1: AAA123
ABCDEFGH123
XXXX123
Full Match for Expression 2: AAA123
ABCDEFGH123
XXXX123
Full Match for Expression 3: AAA123
ABCDEFGH123
XXXX123
If you wish to explore the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link, how it would match against some sample inputs.
RegEx Circuit
jex.im visualizes regular expressions:
There are lots of sophisticated regex testing and development tools, but if you just want a simple test harness in Java, here's one for you to play with:
String[] tests = {
"AAA123",
"ABCDEFGH123",
"XXXX123",
"XYZ123ABC",
"123123",
"X123",
"123",
};
for (String test : tests) {
System.out.println(test + " " +test.matches(".+123"));
}
Now you can easily add new testcases and try new patterns. Have fun exploring regex.
See also
regular-expressions.info/Tutorial
No, * will match zero-or-more characters. You should use +, which matches one-or-more instead.
This expression might work better for you: [A-Z]+123
Specific Solution to the example problem:-
Try [A-Z]*123$ will match 123, AAA123, ASDFRRF123. In case you need at least a character before 123 use [A-Z]+123$.
General Solution to the question (How to match "any character" in the regular expression):
If you are looking for anything including whitespace you can try [\w|\W]{min_char_to_match,}.
If you are trying to match anything except whitespace you can try [\S]{min_char_to_match,}.
Try the regex .{3,}. This will match all characters except a new line.
[^] should match any character, including newline. [^CHARS] matches all characters except for those in CHARS. If CHARS is empty, it matches all characters.
JavaScript example:
/a[^]*Z/.test("abcxyz \0\r\n\t012789ABCXYZ") // Returns ‘true’.
I like the following:
[!-~]
This matches all char codes including special characters and the normal A-Z, a-z, 0-9
https://www.w3schools.com/charsets/ref_html_ascii.asp
E.g. faker.internet.password(20, false, /[!-~]/)
Will generate a password like this: 0+>8*nZ\\*-mB7Ybbx,b>
I work this Not always dot is means any char. Exception when single line mode. \p{all} should be
String value = "|°¬<>!\"#$%&/()=?'\\¡¿/*-+_#[]^^{}";
String expression = "[a-zA-Z0-9\\p{all}]{0,50}";
if(value.matches(expression)){
System.out.println("true");
} else {
System.out.println("false");
}

Powershell find non printing characters [duplicate]

i would appreciate your help on this, since i do not know which range of characters to use, or if there is a character class like [[:cntrl:]] that i have found in ruby?
by means of non printable, i mean delete all characters that are not shown in ie output, when one prints the input string. Please note, i look for a c# regex, i do not have a problem with my code
You may remove all control and other non-printable characters with
s = Regex.Replace(s, #"\p{C}+", string.Empty);
The \p{C} Unicode category class matches all control characters, even those outside the ASCII table because in .NET, Unicode category classes are Unicode-aware by default.
Breaking it down into subcategories
To only match basic control characters you may use \p{Cc}+, see 65 chars in the Other, Control Unicode category. It is equal to a [\u0000-\u0008\u000E-\u001F\u007F-\u0084\u0086-\u009F \u0009-\u000D \u0085]+ regex.
To only match 161 other format chars including the well-known soft hyphen (\u00AD), zero-width space (\u200B), zero-width non-joiner (\u200C), zero-width joiner (\u200D), left-to-right mark (\u200E) and right-to-left mark (\u200F) use \p{Cf}+. The equivalent including astral place code points is a (?:[\xAD\u0600-\u0605\u061C\u06DD\u070F\u08E2\u180E\u200B-\u200F\u202A-\u202E\u2060-\u2064\u2066-\u206F\uFEFF\uFFF9-\uFFFB]|\uD804[\uDCBD\uDCCD]|\uD80D[\uDC30-\uDC38]|\uD82F[\uDCA0-\uDCA3]|\uD834[\uDD73-\uDD7A]|\uDB40[\uDC01\uDC20-\uDC7F])+ regex.
To match 137,468 Other, Private Use control code points you may use \p{Co}+, or its equivalent including astral place code points, (?:[\uE000-\uF8FF]|[\uDB80-\uDBBE\uDBC0-\uDBFE][\uDC00-\uDFFF]|[\uDBBF\uDBFF][\uDC00-\uDFFD])+.
To match 2,048 Other, Surrogate code points that include some emojis, you may use \p{Cs}+, or [\uD800-\uDFFF]+ regex.
You can try with :
string s = "Täkörgåsmrgås";
s = Regex.Replace(s, #"[^\u0000-\u007F]+", string.Empty);
Updated answer after comments:
Documentation about non-printable character:
https://en.wikipedia.org/wiki/Control_character
Char.IsControl Method:
https://msdn.microsoft.com/en-us/library/system.char.iscontrol.aspx
Maybe you can try:
string input; // this is your input string
string output = new string(input.Where(c => !char.IsControl(c)).ToArray());
To remove all control and other non-printable characters
Regex.Replace(s, #"\p{C}+", String.Empty);
To remove the control characters only (if you don't want to remove the emojis 😎)
Regex.Replace(s, #"\p{Cc}+", String.Empty);
you can try this:
public static string TrimNonAscii(this string value)
{
string pattern = "[^ -~]*";
Regex reg_exp = new Regex(pattern);
return reg_exp.Replace(value, "");
}

Regex: not all BLANKS but allow certain characters, with limit

Trying to come up with a Regex, or combination of Regex, that returns False if a) they have only entered only BLANK(s), or they b) entered "non-legal" characters. Lastly, the number of characters has a set limit.
The closest I have thus far is below. Where it fails is that it does not count any leading spaces; only the non-BLANKs are counted, and so it fails. Using js.
const reg = /^(**[ ]***[!-~\u2018-\u201d\u2013\u2014]){1,10}$/;
EDIT: I think the above is incorrect, and I meant to post this:
const re4 = /^(?!\s*$)[!-~\u2018-\u201d\u2013\u2014]{1,10}$/;
EDIT 2: this has less clutter; allow space and all other 'standard' keyboard chars:
const re5 = /^(?!\s*$)[!-~]{1,10}$/;
So, this says you can enter a bunch of spaces, and must include at least 1 other character from the list following; but the {1,10} only counts the non-spaces and so I can end up with too many in total.
EDIT:
So, using re5 above --
s = ' '; // should fail
s = ' blah blah'; // should pass
s = ' blah blah'; // should fail, as there are 11 characters
Try ^(?:\s*\S){1,10}\s*$
Allow 1-10 non whiter, change \S to allow chars
Update 2: After learning that you cannot invert the match result in code, here's one last suggestion using negative lookahead (like you already tried yourself).
This regex matches only strings of 1-10 non-banned characters that are not all whitespace:
const re4 = /^(?!\s+$)[^\!-\~\u2018-\u201d\u2013\u2014]{1,10}$/
Update 1: Use this regex to match all-whitespace string OR strings longer than 10 chars OR strings containing bad characters:
const re4 = /(^\s+$|^.{11,}$|[\!-\~\u2018-\u201d\u2013\u2014])/
I understand that you want to impose a length restriction via regex. I would suggest against that and recommend using str.length instead.
This regex will match whitespace-only strings and strings containing one or more bad characters:
const re4 = /(^\s+$|[\!-\~\u2018-\u201d\u2013\u2014])/;
Regarding prohibition of all-whitespace strings: Instead of packing it into a regex, you might consider using something more explicit like if (s.trim().length == 0). IMO this makes your intention clearer and your code propably more readable, leaving you with this easy to read regex:
# matches any string containing a *bad* character
const re4 = /[\!-\~\u2018-\u201d\u2013\u2014]/;
If you use trim for the all-whitespace check, you might convert your regex into a positive assertion, even with length restriction:
# matches any string consisting of 1-10 characters not considered *bad*
const re4 = /^[^\!-\~\u2018-\u201d\u2013\u2014]{1,10}$/;
To match the input when it’s from 1 to 10 chars long and can't be all blanks, use a negative look ahead to assert not all blanks:
^(?! *$).{1,10}
If you want to restrict allowable chars, change the dot to a suitable character class of allowable chars.

Regular expression for password (at least 2 digits and one special character and minimum length 8)

I have been searching for regular expression which accepts at least two digits and one special character and minimum password length is 8. So far I have done the following: [0-9a-zA-Z!##$%0-9]*[!##$%0-9]+[0-9a-zA-Z!##$%0-9]*
Something like this should do the trick.
^(?=(.*\d){2})(?=.*[a-zA-Z])(?=.*[!##$%])[0-9a-zA-Z!##$%]{8,}
(?=(.*\d){2}) - uses lookahead (?=) and says the password must contain at least 2 digits
(?=.*[a-zA-Z]) - uses lookahead and says the password must contain an alpha
(?=.*[!##$%]) - uses lookahead and says the password must contain 1 or more special characters which are defined
[0-9a-zA-Z!##$%] - dictates the allowed characters
{8,} - says the password must be at least 8 characters long
It might need a little tweaking e.g. specifying exactly which special characters you need but it should do the trick.
There is no reason, whatsoever, to implement all rules in a single regex.
Consider doing it like thus:
Pattern[] pwdrules = new Pattern[] {
Pattern.compile("........"), // at least 8 chars
Pattern.compile("\d.*\d"), // 2 digits
Pattern.compile("[-!"§$%&/()=?+*~#'_:.,;]") // 1 special char
}
String password = ......;
boolean passed = true;
for (Pattern p : pwdrules) {
Matcher m = p.matcher(password);
if (m.find()) continue;
System.err.println("Rule " + p + " violated.");
passed = false;
}
if (passed) { .. ok case.. }
else { .. not ok case ... }
This has the added benefit that passwort rules can be added, removed or changed without effort. They can even reside in some ressource file.
In addition, it is just more readable.
Try this one:
^(?=.*\d{2,})(?=.*[$-/:-?{-~!"^_`\[\]]{1,})(?=.*\w).{8,}$
Here's how it works shortly:
(?=.*\d{2,}) this part saying except at least 2 digits
(?=.*[$-/:-?{-~!"^_[]]{1,})` these are special characters, at least 1
(?=.*\w) and rest are any letters (equals to [A-Za-z0-9_])
.{8,}$ this one says at least 8 characters including all previous rules.
Below is map for current regexp (made with help of Regexper)
UPD
Regexp should look like this ^(?=(.*\d){2,})(?=.*[$-\/:-?{-~!"^_'\[\]]{1,})(?=.*\w).{8,}$
Check out comments for more details.
Try this regex. It uses lookahead to verified there is a least two digits and one of the special character listed by you.
^(?=.*?[0-9].*?[0-9])(?=.*[!##$%])[0-9a-zA-Z!##$%0-9]{8,}$
EXPLANATION
^ #Match start of line.
(?=.*?[0-9].*?[0-9]) #Look ahead and see if you can find at least two digits. Expression will fail if not.
(?=.*[!##$%]) #Look ahead and see if you can find at least one of the character in bracket []. Expression will fail if not.
[0-9a-zA-Z!##$%0-9]{8,} #Match at least 8 of the characters inside bracket [] to be successful.
$ # Match end of line.
Regular expressions define a structure on the string you're trying to match. Unless you define a spatial structure on your regex (e.g. at least two digits followed by a special char, followed by ...) you cannot use a regex to validate your string.
Try this : ^.*(?=.{8,15})(?=.*\d)(?=.*\d)[a-zA-Z0-9!##$%]+$
Please read below link for making password regular expression policy:-
Regex expression for password rules

RegEx for including alphanumeric and special characters

I have requirement to allow alphanumeric and certain other characters for a field. I am using this regular expression:
"^[a-zA-Z0-9!##$&()-`.+,/\"]*$".
The allowed special characters are! # # $ & ( ) - ‘ . / + , “
But when I test the pattern with a string "test_for_extended_alphanumeric" , the string passes the test. I don't have "_" allowed in the pattern. What am I doing wrong?
You need to escape the hyphen:
"^[a-zA-Z0-9!##$&()\\-`.+,/\"]*$"
If you don't escape it then it means a range of characters, like a-z.
In your character class the )-' is interpreted as a range in the same way as e.g. a-z, it therefore refers to any character with a decimal ASCII code from 41 ) to 96 '.
Since _ has code 95, it is within the range and therefore allowed, as are <, =, > etc.
To avoid this you can either escape the -, i.e. \-, or put the - at either the start or end of the character class:
/^[a-zA-Z0-9!##$&()`.+,/"-]*$/
There is no need to escape the ", and note that because you are using the * quantifier, an empty string will also pass the test.
Using this regex you allow all alphanumeric and special characters. Here \w is allowing all digits and \s allowing space
[><?#+'`~^%&\*\[\]\{\}.!#|\\\"$';,:;=/\(\),\-\w\s+]*
The allowed special characters are ! # # $ & ( ) - ‘ . / + , “ = { } [ ] ? / \ |
Hyphens in character classes denote a range unless they are escaped or at the start or end of the character class. If you want to include hyphens, it's typically a good idea to put them at the front so you don't even have to worry about escaping:
^[-a-zA-Z0-9!##$&()`.+,/\"]*$
By the way, _ does indeed fall between ) and the backtick in ASCII:
http://en.wikipedia.org/wiki/ASCII#ASCII_printable_characters
How about this.. which allows special characters and as well as alpha numeric
"[-~]*$"
Because I don't know how many special characters exist, it is difficult to check the string contains special character by white list. It may be more efficient to check the string contains only alphabet or numbers.
for kotlin example
fun String.hasOnlyAlphabetOrNumber(): Boolean {
val p = Pattern.compile("[^a-zA-Z0-9]")
return !(p.matcher(this).matches())
}
for swift4
func hasOnlyAlphabetOrNumber() -> Bool {
if self.isEmpty { return false }
do {
let pattern = "[^a-zA-Z0-9]"
let regex = try NSRegularExpression(pattern: pattern, options: .caseInsensitive)
return regex.matches(in: self, options: [], range: NSRange(location: 0, length: self.count)).count == 0
} catch {
return false
}
}
Regex sucks. Here is mine
/^[a-zA-Z\d-!##$%^&._"'()+,/;<>=|?[]\`~{}]$/
Mine is a little different than others but it is more self explanatory. You use \ in front of any special symbol like ] or . I had issues with -, , and ] so I had to put ], \, and move the - to the left. I also had issues with | but I moved it left and it fixed it.