RegEx for including alphanumeric and special characters - regex

I have a requirement to allow alphanumeric and certain other characters for a field. I am using this regular expression:
"^[a-zA-Z0-9!@#$&()-`.+,/\"]*$".
The allowed special characters are ! @ # $ & ( ) - ` . / + , "
But when I test the pattern with the string "test_for_extended_alphanumeric", the string passes the test. I don't have "_" allowed in the pattern. What am I doing wrong?

You need to escape the hyphen:
"^[a-zA-Z0-9!@#$&()\\-`.+,/\"]*$"
If you don't escape it then it means a range of characters, like a-z.

In your character class, )-` is interpreted as a range in the same way as, e.g., a-z; it therefore refers to any character with a decimal ASCII code from 41 ()) to 96 (`).
Since _ has code 95, it is within the range and therefore allowed, as are <, =, > etc.
To avoid this you can either escape the -, i.e. \-, or put the - at either the start or end of the character class:
/^[a-zA-Z0-9!@#$&()`.+,/"-]*$/
There is no need to escape the ", and note that because you are using the * quantifier, an empty string will also pass the test.
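The range behaviour described above can be checked directly. This is a minimal JavaScript sketch using a character class like the one in the question; the variable names are only illustrative.

```javascript
// ")-`" is read as a range from ")" (code 41) to "`" (code 96).
const buggy = /^[a-zA-Z0-9!@#$&()-`.+,/"]*$/;
// Same set with the hyphen moved to the end, where it is a literal.
const fixed = /^[a-zA-Z0-9!@#$&()`.+,/"-]*$/;

const sample = "test_for_extended_alphanumeric";
console.log("_".charCodeAt(0));  // 95, which lies inside 41..96
console.log(buggy.test(sample)); // true
console.log(fixed.test(sample)); // false
console.log(fixed.test("a-b"));  // true, the hyphen itself is still allowed
```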

This regex allows alphanumeric characters and a broad set of special characters. Here \w allows letters, digits, and the underscore, and \s allows whitespace:
[><?@+'`~^%&\*\[\]\{\}.!#|\\\"$';,:;=/\(\),\-\w\s+]*
The allowed special characters are ! @ # $ & ( ) - ` . / + , " = { } [ ] ? / \ |

Hyphens in character classes denote a range unless they are escaped or at the start or end of the character class. If you want to include hyphens, it's typically a good idea to put them at the front so you don't even have to worry about escaping:
^[-a-zA-Z0-9!@#$&()`.+,/\"]*$
By the way, _ does indeed fall between ) and the backtick in ASCII:
http://en.wikipedia.org/wiki/ASCII#ASCII_printable_characters

How about this, which allows printable special characters as well as alphanumerics? The class is the range from space to ~, which covers every printable ASCII character:
"^[ -~]*$"

Because I don't know how many special characters exist, it is difficult to whitelist them. It may be simpler to check that the string contains only letters or digits.
Kotlin example:
import java.util.regex.Pattern

fun String.hasOnlyAlphabetOrNumber(): Boolean {
    // Look for any character outside a-z, A-Z, 0-9.
    val p = Pattern.compile("[^a-zA-Z0-9]")
    // find() scans the whole string; matches() would only succeed on a single such character.
    return !p.matcher(this).find()
}
Swift 4 example:
import Foundation

extension String {
    func hasOnlyAlphabetOrNumber() -> Bool {
        if self.isEmpty { return false }
        do {
            let pattern = "[^a-zA-Z0-9]"
            let regex = try NSRegularExpression(pattern: pattern, options: [])
            // NSRange counts UTF-16 code units, so use utf16.count rather than count.
            let range = NSRange(location: 0, length: self.utf16.count)
            return regex.firstMatch(in: self, options: [], range: range) == nil
        } catch {
            return false
        }
    }
}
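For comparison, the same "reject anything outside the whitelist" check can be sketched in JavaScript. The function name simply mirrors the examples above, and treating the empty string as invalid follows the Swift version; both are choices made for illustration.

```javascript
// Returns true only when s is non-empty and has no character outside a-z, A-Z, 0-9.
function hasOnlyAlphabetOrNumber(s) {
  if (s.length === 0) return false;
  // test() searches anywhere in the string for a disallowed character.
  return !/[^a-zA-Z0-9]/.test(s);
}

console.log(hasOnlyAlphabetOrNumber("abc123"));  // true
console.log(hasOnlyAlphabetOrNumber("abc_123")); // false
console.log(hasOnlyAlphabetOrNumber(""));        // false
```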

Regex sucks. Here is mine
/^[a-zA-Z\d-!@#$%^&._"'()+,/;<>=|?[\]\\`~{}]$/
Mine is a little different from the others, but it is more self-explanatory. You put a \ in front of any character that is special inside a character class, such as ] or \. I had issues with -, \, and ], so I had to write \], \\, and move the - to the left. I also had issues with |, but moving it left fixed that. Note that without a quantifier after the class, this matches exactly one character.

Related

Regex to match certain characters anywhere between two characters

I want to detect (and return) any punctuation within brackets. The line of text I'm looking at will have multiple sets of brackets (which I can assume are properly formatted). So given something like this:
[abc.] [!bc]. [.e.g] [hi]
I'd want to detect all those cases and return something like [[.], [!], [..]].
I tried to do /{.*?([.,!?]+).*?}/g but then it returns true for [hello], [hi] which I don't want to match!
I'm using JS!
You can match substrings between square brackets and then remove all chars that are not punctuation:
const text = '[abc.] [!bc]. [.e.g]';
const matches = text.match(/\[([^\][]*)]/g).map(x => `[${x.replace(/[^.,?!]/g, '')}]`)
console.log(matches);
If you need to make your regex fully Unicode-aware, you can use an ECMAScript 2018+ compliant solution. Because [ and ] are themselves Unicode punctuation, strip them from the match before filtering:
const text = '[abc.] [!bc、]. [.e.g]';
const matches = text.match(/\[([^\][]*)]/g).map(x => `[${x.slice(1, -1).replace(/[^\p{P}\p{S}]/gu, '')}]`)
console.log(matches);
So,
\[([^\][]*)] matches a string between [ and ] with no other [ and ] inside
.replace(/[^.,?!]/g, '') removes all chars other than ., ,, ? and !
.replace(/[^\p{P}\p{S}]/gu, '') removes all chars other than Unicode punctuation proper and symbols.
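To also skip bracket groups that contain no punctuation at all, such as [hi] in the question, one option is to filter out the empty results. A minimal sketch; the function name punctuationInBrackets is made up for illustration, and matchAll assumes an ES2020+ runtime.

```javascript
// Collects the punctuation found inside each [...] group, dropping groups with none.
function punctuationInBrackets(text) {
  return Array.from(
    text.matchAll(/\[([^\][]*)\]/g),           // group 1 is the text between the brackets
    m => m[1].replace(/[^.,?!]/g, "")          // keep only . , ? !
  )
    .filter(p => p.length > 0)                 // drop groups such as [hi]
    .map(p => `[${p}]`);
}

console.log(punctuationInBrackets("[abc.] [!bc]. [.e.g] [hi]"));
// [ '[.]', '[!]', '[..]' ]
```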

Regular expression not working with \ and ]

I have a regex for validating a password, which has to be at least 8 characters long and must contain a letter (upper and lower case), a number, and a special character from the set ^ $ * . [ ] { } ( ) ? - " ! @ # % & / \ , > < ' : ; | _ ~ ` .
I face two problems: after adding the \ to the regex, it is not recognized (other characters still work OK). If I add the \] as well, the expression no longer works (everything is invalid, though the pattern seems to be OK in the browser debug mode).
The regex string
static get PASSWORD_VALIDATION_REGEX(): string {
return '(?=.*[a-z])(?=.*[0-9])(?=.*[A-Z])' + // contains lowercase number uppercase
'(?=.*[\-~\$@!%#<>\|\`\\\/\[;:=\+\{\}\.\(\)*^\?&\"\,\'])' + // special
'.{8,}'; // more than allowed char
}
I used the regexp as a form validator and as a match in function
password: ['', {validators: [Validators.required,
Validators.pattern(StringUtils.PASSWORD_VALIDATION_REGEX)
],
updateOn: 'change'
}
]
//....
value.match(StringUtils.PASSWORD_VALIDATION_REGEX)
Tried to use only (?=.*[\\]) for the special chars list, in that case I've received a console error
Invalid regular expression: /^(?=.*[a-z])(?=.*[0-9])(?=.*[A-Z])(?=.*[\]).{8,}$/: Unterminated character class
For '(?=.*[\]])' no console error but the following error is present in the form validation 'pattern'
actualValue: "AsasassasaX000[[][]"
requiredPattern: "^(?=.*[a-z])(?=.*[0-9])(?=.*[A-Z])(?=.*[]]).{8,}$"
The same value and pattern fails on https://regex101.com/
Thanks for your help / suggestions in advance!
You have overescaped your pattern and failed to escape the ] char correctly. In JavaScript regex, ] inside a character class must be escaped.
If you are confused with how to define escapes inside a string literal (and it is rather confusing indeed), you should use a regex literal. One thing to remember about the regex use with Validators.pattern is that the string pattern is anchored by the Angular framework by enclosing the whole pattern with ^ and $, so these anchors must be present when you define the pattern as a regex literal.
Use
static get PASSWORD_VALIDATION_REGEX(): RegExp {
return /^(?=.*[a-z])(?=.*[0-9])(?=.*[A-Z])(?=.*[-~$@!%#<>|`\\\/[\];:=+{}.()*^?&",']).{8,}$/;
}
Note the \] that matches a ] char and \\ to match \ inside [...].
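The difference between string escapes and regex escapes can be seen in plain JavaScript, outside Angular. This sketch assumes nothing beyond the standard RegExp object; the variable names are illustrative.

```javascript
// In a string literal, "\]" is just "]", so the class below reaches the engine as "[]]":
// an empty class (which never matches) followed by a literal "]".
const broken = new RegExp('(?=.*[\]])');
// Doubling the backslash keeps the escape for the regex engine.
const escaped = new RegExp('(?=.*[\\]])');

console.log(broken.test("a]"));  // false
console.log(escaped.test("a]")); // true

// With a regex literal there is only one level of escaping to think about.
const passwordPattern =
  /^(?=.*[a-z])(?=.*[0-9])(?=.*[A-Z])(?=.*[-~$@!%#<>|`\\\/[\];:=+{}.()*^?&",']).{8,}$/;

console.log(passwordPattern.test("AsasassasaX000[[][]")); // true
console.log(passwordPattern.test("Passw0rd\\"));          // true, ends with a backslash
console.log(passwordPattern.test("Password1"));           // false, no special character
```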

Fix the regular expression

I need help with writing a regular expression for a string that should contain alphanumeric signs and only one of these: #,?,$,%
for example: abx12A#we is ok but fsa?#ds is not ok
I tried something like /[a-zA-z0-9][#?$%]{0,1}/ but it's not working.
Any ideas?
Thanks
Something like:
^[a-zA-Z0-9]*[#?$%]?[a-zA-Z0-9]*$
should do the trick (and, depending on your regex engine, you may need to escape one or more of those special characters).
It's basically zero or more from the "alpha"-type class, followed by zero or one from the "special"-type class, followed once again by zero or more "alpha".
You can adjust what's contained in each character class as you see fit, but this is the general way to get one of something within a sea of other things.
If you need to match an empty string, too, you need to use
^[a-zA-Z0-9]*(?:[#?$%][a-zA-Z0-9]*)?$
See the regex demo
Details:
^ - start of string
[a-zA-Z0-9]* - zero or more alphanumeric
(?:[#?$%][a-zA-Z0-9]*)? - zero or one occurrence of:
[#?$%] - a char from the set
[a-zA-Z0-9]* - zero or more alphanumeric
$ - end of string
NOTE: [A-z] is a common typo with serious consequences: it also matches [, \, ], ^, _, and `, which sit between Z and a in ASCII.
If you do not want to allow an empty string, replace * with +:
^[a-zA-Z0-9]+(?:[#?$%][a-zA-Z0-9]+)?$
            ^                    ^
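A quick way to check the "at most one special character" pattern, and to see what the [A-z] typo lets through, is a short JavaScript test. This is a sketch with illustrative variable names.

```javascript
// Alphanumerics, optionally one of # ? $ %, then more alphanumerics.
const atMostOneSpecial = /^[a-zA-Z0-9]*(?:[#?$%][a-zA-Z0-9]*)?$/;

console.log(atMostOneSpecial.test("abx12A#we")); // true
console.log(atMostOneSpecial.test("fsa?#ds"));   // false, two special characters
console.log(atMostOneSpecial.test("a#b#c"));     // false
console.log(atMostOneSpecial.test(""));          // true, the empty string is allowed

// The [A-z] typo: the six characters between "Z" and "a" are accepted as well.
console.log(/^[A-z]+$/.test("[\\]^_`"));    // true
console.log(/^[A-Za-z]+$/.test("[\\]^_`")); // false
```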
Try this:
const regex = /^[a-zA-Z0-9]*[#?$%]?[a-zA-Z0-9]*$/
const perfectInputs = [
'abx12A#we',
'a#',
'#a',
'a#a'
]
const failedInputs = [
'fsa?#ds'
]
console.log('=========== test should be success ============')
for (const perfectInput of perfectInputs) {
const result = regex.test(perfectInput)
console.log(`test ${perfectInput}: ${result}`)
console.log()
}
console.log('=========== test should be failed ============')
for (const failedInput of failedInputs) {
const result = regex.test(failedInput)
console.log(`test ${failedInput}: ${result}`)
console.log()
}
If the special character can be at the beginning or at the end of the string, you could use a lookahead:
^(?=[^#?$%]*[#?$%]?[^#?$%]*$)[a-zA-Z0-9#?$%]+$
/^(?=[^#?$%]*[#?$%]?[^#?$%]*$)[a-zA-Z0-9#?$%]*$/
  \__________^^^^^^^_________/ -------------------- not more than once
                              \_____________/ ----- other conditions
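The lookahead variant can be exercised the same way. In this JavaScript sketch the lookahead enforces "at most one special character" while the main class restricts which characters may appear at all; the + form, which rejects the empty string, is used.

```javascript
const oneSpecialAnywhere = /^(?=[^#?$%]*[#?$%]?[^#?$%]*$)[a-zA-Z0-9#?$%]+$/;

console.log(oneSpecialAnywhere.test("#abc")); // true, special at the start
console.log(oneSpecialAnywhere.test("abc#")); // true, special at the end
console.log(oneSpecialAnywhere.test("abc"));  // true, no special at all
console.log(oneSpecialAnywhere.test("a#b?")); // false, two specials
console.log(oneSpecialAnywhere.test("a b"));  // false, space is not in the allowed class
console.log(oneSpecialAnywhere.test(""));     // false with +
```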

Regex for ASCII printable characters excluding some special characters

Can someone please help me with the regex for english characters, numbers and excluding few special characters?
The regex should be between ASCII>=32 and <127 and must not include special characters like
` ~ ! $ % ^ & * ( ) + = [ ] { } < > ? ; : \ |.
I created a simple regex for string only (^\p{L}+$) but how do I include all the characters and numbers but avoid these special ones listed above and others outside that ASCII code?
Thank You
Instead of allowing "everything except x", you should go for a whitelist since you have a defined set of characters that you want to allow.
^[0-9a-zA-Z'"#,\-/_ .@]+$
Please review the regex, I might have missed some special characters. It should give you the right idea!
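Consider the following regex, which matches the printable ASCII range (32 to 126). Note that \x7F is DEL, so the upper bound should be \x7E, and that this range alone does not exclude the listed special characters:
^[\x20-\x7E]+$
Good luck!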
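The two ideas above can also be combined: take the printable ASCII range and subtract the excluded characters with a negative lookahead. This is a sketch of that swapped-in technique in JavaScript, not either answerer's method; the exclusion list is copied from the question.

```javascript
// Each character must be printable ASCII (0x20..0x7E) and must not be one of the excluded ones.
const allowed = /^(?:(?![`~!$%^&*()+=\[\]{}<>?;:\\|])[\x20-\x7E])+$/;

console.log(allowed.test("Hello, World_1 #2 @home.")); // true
console.log(allowed.test("a+b"));                      // false, + is excluded
console.log(allowed.test("price: 5"));                 // false, : is excluded
console.log(allowed.test("caf\u00E9"));                // false, outside ASCII
console.log(allowed.test("a\x7Fb"));                   // false, DEL is not printable
```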

Regular expression for alphanumeric and underscores

Is there a regular expression which checks if a string contains only upper and lowercase letters, numbers, and underscores?
To match a string that contains only those characters (or an empty string), try
"^[a-zA-Z0-9_]*$"
This works for .NET regular expressions, and probably a lot of other languages as well.
Breaking it down:
^ : start of string
[ : beginning of character group
a-z : any lowercase letter
A-Z : any uppercase letter
0-9 : any digit
_ : underscore
] : end of character group
* : zero or more of the given characters
$ : end of string
If you don't want to allow empty strings, use + instead of *.
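The breakdown above translates directly to other flavors. A small JavaScript sketch shows the effect of * versus + and of leaving the anchors out; the variable names are illustrative.

```javascript
const allowEmpty = /^[a-zA-Z0-9_]*$/;
const nonEmpty = /^[a-zA-Z0-9_]+$/;

console.log(allowEmpty.test(""));             // true
console.log(nonEmpty.test(""));               // false
console.log(nonEmpty.test("nut_cracker_12")); // true
console.log(nonEmpty.test("a b"));            // false, space is not in the class

// Without ^ and $, a single allowed character anywhere is enough to match.
console.log(/[a-zA-Z0-9_]+/.test("a b!"));    // true
```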
As others have pointed out, some regex languages have a shorthand form for [a-zA-Z0-9_]. In the .NET regex language, you can turn on ECMAScript behavior and use \w as a shorthand (yielding ^\w*$ or ^\w+$). Note that in other languages, and by default in .NET, \w is somewhat broader, and will match other sorts of Unicode characters as well (thanks to Jan for pointing this out). So if you're really intending to match only those characters, using the explicit (longer) form is probably best.
There's a lot of verbosity in here, and I'm deeply against it, so, my conclusive answer would be:
/^\w+$/
\w is equivalent to [A-Za-z0-9_], which is pretty much what you want (unless we introduce Unicode to the mix).
Using the + quantifier you'll match one or more characters. If you want to accept an empty string too, use * instead.
You want to check that each character matches your requirements, which is why we use:
[A-Za-z0-9_]
And you can even use the shorthand version:
\w
Which is equivalent (in some regex flavors, so make sure you check before you use it). Then, to indicate that the entire string must match, you use:
^
to indicate that the string must start with that character, and
$
to indicate that the string must end with that character. Then use
\w+ or \w*
to indicate "1 or more" or "0 or more". Putting it all together, we have:
^\w*$
Although it's more verbose than \w, I personally appreciate the readability of the full POSIX character class names ( http://www.zytrax.com/tech/web/regex.htm#special ), so I'd say:
^[[:alnum:]_]+$
However, while the documentation at the above links states that \w will "Match any character in the range 0 - 9, A - Z and a - z (equivalent of POSIX [:alnum:])", I have not found this to be true. Not with grep -P anyway. You need to explicitly include the underscore if you use [:alnum:] but not if you use \w. You can't beat the following for short and sweet:
^\w+$
Along with readability, using the POSIX character classes (http://www.regular-expressions.info/posixbrackets.html) means that your regex can work on non ASCII strings, which the range based regexes won't do since they rely on the underlying ordering of the ASCII characters which may be different from other character sets and will therefore exclude some non-ASCII characters (letters such as œ) which you might want to capture.
Um...question: Does it need to have at least one character or no? Can it be an empty string?
^[A-Za-z0-9_]+$
Will do at least one upper or lower case alphanumeric or underscore. If it can be zero length, then just substitute the + for *:
^[A-Za-z0-9_]*$
If diacritics need to be included (such as the cedilla in ç), you can use the word character class \w, which does the same as the above and, in flavors where \w is Unicode-aware (.NET by default, for example), also matches accented letters:
^\w+$
Or
^\w*$
Use
^([A-Za-z]|[0-9]|_)+$
...if you want to be explicit, or:
^\w+$
...if you prefer concise (Perl syntax).
In programming, an identifier is often defined so that the first character is not a digit but a letter or an underscore. Thereafter, each character can be 0-9, A-Z, a-z, or underscore (_).
Here is how you would do that:
Tested under PHP:
$regex = '/^[A-Za-z_][A-Za-z\d_]*$/';
Or take
^[A-Za-z_][A-Za-z\d_]*$
and place it in your development language.
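As one example of placing it in another language, here is the same identifier pattern in JavaScript. This is a sketch; the variable name is illustrative.

```javascript
// First character: letter or underscore. Remaining characters: letters, digits, underscores.
const identifier = /^[A-Za-z_][A-Za-z\d_]*$/;

console.log(identifier.test("_x1"));  // true
console.log(identifier.test("x"));    // true
console.log(identifier.test("1abc")); // false, starts with a digit
console.log(identifier.test("a-b"));  // false
console.log(identifier.test(""));     // false
```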
Use lookaheads to do the "at least one" stuff. Trust me, it's much easier.
Here's an example that would require 1-10 characters, containing at least one digit and one letter:
^(?=.*\d)(?=.*[A-Za-z])[A-Za-z0-9]{1,10}$
Note: I could have used \w, but then ECMA/Unicode considerations come into play, increasing the character coverage of the \w "word character".
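The lookahead pattern above can be verified with a few inputs. In this JavaScript sketch each lookahead checks one "at least one" condition from the start of the string, and the final class with {1,10} enforces the allowed characters and the length.

```javascript
const digitAndLetter = /^(?=.*\d)(?=.*[A-Za-z])[A-Za-z0-9]{1,10}$/;

console.log(digitAndLetter.test("a1"));          // true
console.log(digitAndLetter.test("abc123"));      // true
console.log(digitAndLetter.test("abcdef"));      // false, no digit
console.log(digitAndLetter.test("123456"));      // false, no letter
console.log(digitAndLetter.test("abcdefghij1")); // false, 11 characters
console.log(digitAndLetter.test("a_1"));         // false, underscore not allowed here
```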
This works for me. I found this in the O'Reilly's "Mastering Regular Expressions":
/^\w+$/
Explanation:
^ asserts position at start of the string
\w+ matches any word character (equal to [a-zA-Z0-9_])
"+" Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of the string
Verify yourself:
const regex = /^\w+$/;
const str = `nut_cracker_12`;
let m;
if ((m = regex.exec(str)) !== null) {
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}
Try these multi-lingual extensions I have made for string.
IsAlphaNumeric - The string must contain at least one alpha (letter in Unicode range, specified in charSet) and at least one number (specified in numSet). Also, the string should consist only of alpha and numbers.
IsAlpha - The string should contain at least one alpha (in the language charSet specified) and consist only of alpha.
IsNumeric - The string should contain at least one number (in the language numSet specified) and consist only of numbers.
The charSet/numSet range for the desired language can be specified. The Unicode ranges are available on Unicode Chart.
API:
public static bool IsAlphaNumeric(this string stringToTest)
{
// English
const string charSet = "a-zA-Z";
const string numSet = @"0-9";
// Greek
//const string charSet = @"\u0388-\u03EF";
//const string numSet = @"0-9";
// Bengali
//const string charSet = @"\u0985-\u09E3";
//const string numSet = @"\u09E6-\u09EF";
// Hindi
//const string charSet = @"\u0905-\u0963";
//const string numSet = @"\u0966-\u096F";
return Regex.Match(stringToTest, @"^(?=[" + numSet + @"]*?[" + charSet + @"]+)(?=[" + charSet + @"]*?[" + numSet + @"]+)[" + charSet + numSet + @"]+$").Success;
}
public static bool IsNumeric(this string stringToTest)
{
//English
const string numSet = @"0-9";
//Hindi
//const string numSet = @"\u0966-\u096F";
return Regex.Match(stringToTest, @"^[" + numSet + @"]+$").Success;
}
public static bool IsAlpha(this string stringToTest)
{
//English
const string charSet = "a-zA-Z";
return Regex.Match(stringToTest, @"^[" + charSet + @"]+$").Success;
}
Usage:
// English
string test = "AASD121asf";
// Greek
//string test = "Ϡϛβ123";
// Bengali
//string test = "শর৩৮";
// Hindi
//string test = @"क़लम३७ख़";
bool isAlphaNum = test.IsAlphaNumeric();
The following regex matches alphanumeric characters and underscore:
^[a-zA-Z0-9_]+$
For example, in Perl:
#!/usr/bin/perl -w
my $arg1 = $ARGV[0];
# Check that the string contains *only* one or more alphanumeric chars or underscores
if ($arg1 !~ /^[a-zA-Z0-9_]+$/) {
print "Failed.\n";
} else {
print "Success.\n";
}
This should work in most of the cases.
/^[\d]*[a-z_][a-z\d_]*$/gi
And by most I mean,
abcd True
abcd12 True
ab12cd True
12abcd True
1234 False
Explanation
^ ... $ - anchor the pattern to the start and end of the string
[\d]* - match zero or more digits
[a-z_] - match a letter or underscore
[a-z\d_]* - match zero or more letters, digits, or underscores
/gi - g is the global flag (not needed when testing a single string); i makes the match case-insensitive
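The table above can be reproduced in JavaScript. One caveat worth showing: a regex object with the g flag remembers lastIndex between test() calls, so this sketch drops g for validation and demonstrates the pitfall separately. The variable names are illustrative.

```javascript
// Same pattern without the g flag, which is not needed for a yes/no check.
const pattern = /^\d*[a-z_][a-z\d_]*$/i;

const cases = [
  ["abcd", true],
  ["abcd12", true],
  ["ab12cd", true],
  ["12abcd", true],
  ["1234", false],
];
for (const [input, expected] of cases) {
  console.log(input, pattern.test(input) === expected ? "ok" : "unexpected");
}

// With g, test() is stateful: the second call resumes at lastIndex and fails.
const globalPattern = /^\d*[a-z_][a-z\d_]*$/gi;
console.log(globalPattern.test("abcd")); // true
console.log(globalPattern.test("abcd")); // false
```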
For those of you looking for unicode alphanumeric matching, you might want to do something like:
^[\p{L} \p{Nd}_]+$
Further reading is at Unicode Regular Expressions (Unicode Consortium) and at Unicode Regular Expressions (Regular-Expressions.info).
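In JavaScript, Unicode property escapes need the u flag (ES2018+), and \w stays ASCII-only. This sketch compares the two; unlike the pattern above it leaves the space out of the class, which is a choice made here for illustration.

```javascript
const asciiWord = /^\w+$/;
const unicodeWord = /^[\p{L}\p{Nd}_]+$/u;

const accented = "na\u00EFve_1";             // "naïve_1"
const bengali = "\u09B6\u09B0\u09E9\u09EE";  // two Bengali letters and two Bengali digits

console.log(asciiWord.test(accented));   // false
console.log(unicodeWord.test(accented)); // true
console.log(asciiWord.test(bengali));    // false
console.log(unicodeWord.test(bengali));  // true
console.log(unicodeWord.test("a b"));    // false, space is not in this class
```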
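For me there was an issue in that I wanted to distinguish between alpha, numeric, and alphanumeric, so to ensure an alphanumeric string contains at least one alpha and at least one numeric, I used the following. The outer group makes ^ and $ apply to both alternatives; without it, ^ binds only to the first and $ only to the second:
^(?:([a-zA-Z_]{1,}\d{1,})+|(\d{1,}[a-zA-Z_]{1,})+)$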
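An alternation written without an outer group, in the shape ^A|B$, anchors each side only once, so it is worth testing such patterns against inputs with trailing junk. The JavaScript sketch below shows that effect and a lookahead-based alternative for "at least one letter and at least one digit"; the alternative is a swapped-in technique, not the answerer's pattern.

```javascript
// Ungrouped: ^ applies only to the left branch and $ only to the right one.
const ungrouped = /^([a-zA-Z_]{1,}\d{1,})+|(\d{1,}[a-zA-Z_]{1,})+$/;
console.log(ungrouped.test("a1!!!")); // true, trailing junk is not rejected

// Lookaheads state each requirement once; the class restricts the whole string.
const letterAndDigit = /^(?=.*[a-zA-Z_])(?=.*\d)[a-zA-Z_\d]+$/;
console.log(letterAndDigit.test("a1!!!")); // false
console.log(letterAndDigit.test("a1a"));   // true
console.log(letterAndDigit.test("abc"));   // false, no digit
console.log(letterAndDigit.test("123"));   // false, no letter
```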
Here is the regex for what you want, with a quantifier to specify at least 1 character and no more than 255 characters. The ^ belongs outside the brackets as an anchor; inside them it would negate the class:
^[a-zA-Z0-9 _]{1,255}$
I believe you are not taking accented Latin and other Unicode characters into account in your matches.
For example, if you need to accept "ã" or "ü", "\w" won't work in flavors where it is ASCII-only.
You can, alternatively, use this approach (note that the ranges À-Ý and à-ý also include × and ÷):
^[A-ZÀ-Ýa-zà-ý0-9_]+$
^\w*$ will work for the below combinations:
1
123
1av
pRo
av1
For Java, where only alphanumeric characters (either case) and underscores are allowed:
^ asserts the start of the string.
[a-zA-Z0-9_]+ matches one or more alphanumeric characters or underscores.
$ asserts the end of the string.
public class RegExTest {
public static void main(String[] args) {
System.out.println("_C#".matches("^[a-zA-Z0-9_]+$"));
}
}
To check the entire string and not allow empty strings, try
^[A-Za-z0-9_]+$
This works for me. You can try:
[\\p{Alnum}_]
Required Format
Allow these three:
0142171547295
014-2171547295
123abc
Don't allow other formats:
validatePnrAndTicketNumber(){
let alphaNumericRegex=/^[a-zA-Z0-9]*$/;
let numericRegex=/^[0-9]*$/;
// [0-9]{3} rather than [1-9]{3}, so a prefix containing 0 (as in 014-2171547295) is accepted
let numericdashRegex=/^(([0-9]{3})\-?([0-9]{10}))$/;
this.currBookingRefValue = this.requestForm.controls["bookingReference"].value;
// setErrors(null) clears the errors; passing any object, even {'pattern': false}, marks the control invalid
if(this.currBookingRefValue.length == 14 && this.currBookingRefValue.match(numericdashRegex)){
this.requestForm.controls["bookingReference"].setErrors(null);
}else if(this.currBookingRefValue.length == 6 && this.currBookingRefValue.match(alphaNumericRegex)){
this.requestForm.controls["bookingReference"].setErrors(null);
}else if(this.currBookingRefValue.length == 13 && this.currBookingRefValue.match(numericRegex)){
this.requestForm.controls["bookingReference"].setErrors(null);
}else{
this.requestForm.controls["bookingReference"].setErrors({'pattern': true});
}
}
<input name="booking_reference" type="text" [class.input-not-empty]="bookingRef.value"
class="glyph-input form-control floating-label-input" id="bookings_bookingReference"
value="" maxlength="14" aria-required="true" role="textbox" #bookingRef
formControlName="bookingReference" (focus)="resetMessageField()" (blur)="validatePnrAndTicketNumber()"/>
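The three accepted formats can also be expressed as one anchored alternation, which removes the separate length checks. This is a sketch under the assumption that the formats are exactly 13 digits, 3 digits plus a dash plus 10 digits, or 6 alphanumerics; the variable name is illustrative.

```javascript
// One branch per accepted format; the outer group keeps ^ and $ applied to all branches.
const bookingReference = /^(?:[0-9]{13}|[0-9]{3}-[0-9]{10}|[a-zA-Z0-9]{6})$/;

console.log(bookingReference.test("0142171547295"));  // true
console.log(bookingReference.test("014-2171547295")); // true
console.log(bookingReference.test("123abc"));         // true
console.log(bookingReference.test("12345"));          // false, too short
console.log(bookingReference.test("014_2171547295")); // false, wrong separator
```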