Regular expression to display input excluding special characters and white space using JavaScript?

I want to display whatever was entered in a text field, excluding special characters and white space. Is there a regular expression for that?
For example, if the input is KA13#B74$5, then we need to display
KA13B745

Remove anything other than what you require, using a negated character class regex with the String#replace method.
console.log(
'KA13#B74$5'.replace(/[^a-z\d]+/ig, '')
)

In Java the code is fairly simple:
Scanner sc=new Scanner(System.in);
String input=sc.nextLine();
String newstr="";
for(int i=0;i<input.length();i++)
{
char ch=input.charAt(i);
if(Character.isLetter(ch)|| Character.isDigit(ch))
{
newstr=newstr+ch;
}
}
System.out.print(newstr);//string without spaces and special characters

Try the snippet below:
<input type=text onkeyup="this.value = this.value.replace(/[^a-z\d]+/ig, '')">

'KA13#B74$5'.replace(/[\W]+/ig, '')
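\W matches any non-word character, that is, anything other than a letter, a digit or the underscore. Note that underscores are therefore kept.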
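The two patterns in the answers above differ in one detail that is easy to miss: \W treats the underscore as a word character, so an underscore survives the replacement. A minimal sketch of the difference (the function names are illustrative and not taken from the answers):

```javascript
// Keep only ASCII letters and digits, using a negated character class.
function keepAlphanumerics(input) {
  return input.replace(/[^a-z\d]+/gi, '');
}

// Remove every non-word character; "_" counts as a word character and stays.
function stripNonWord(input) {
  return input.replace(/\W+/g, '');
}

console.log(keepAlphanumerics('KA13#B_74$ 5')); // KA13B745
console.log(stripNonWord('KA13#B_74$ 5'));      // KA13B_745
```

If underscores should also be removed, the negated class is the safer choice.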

Related

Powershell find non printing characters [duplicate]

I would appreciate your help on this, since I do not know which range of characters to use, or whether there is a character class like [[:cntrl:]], which I found in Ruby.
By non-printable, I mean that I want to delete all characters that are not shown in the output when one prints the input string. Please note that I am looking for a C# regex; I do not have a problem with my code.
You may remove all control and other non-printable characters with
s = Regex.Replace(s, @"\p{C}+", string.Empty);
The \p{C} Unicode category class matches all control characters, even those outside the ASCII table because in .NET, Unicode category classes are Unicode-aware by default.
Breaking it down into subcategories
To only match basic control characters you may use \p{Cc}+, see 65 chars in the Other, Control Unicode category. It is equal to a [\u0000-\u0008\u000E-\u001F\u007F-\u0084\u0086-\u009F \u0009-\u000D \u0085]+ regex.
To only match 161 other format chars including the well-known soft hyphen (\u00AD), zero-width space (\u200B), zero-width non-joiner (\u200C), zero-width joiner (\u200D), left-to-right mark (\u200E) and right-to-left mark (\u200F) use \p{Cf}+. The equivalent including astral plane code points is a (?:[\xAD\u0600-\u0605\u061C\u06DD\u070F\u08E2\u180E\u200B-\u200F\u202A-\u202E\u2060-\u2064\u2066-\u206F\uFEFF\uFFF9-\uFFFB]|\uD804[\uDCBD\uDCCD]|\uD80D[\uDC30-\uDC38]|\uD82F[\uDCA0-\uDCA3]|\uD834[\uDD73-\uDD7A]|\uDB40[\uDC01\uDC20-\uDC7F])+ regex.
To match 137,468 Other, Private Use control code points you may use \p{Co}+, or its equivalent including astral plane code points, (?:[\uE000-\uF8FF]|[\uDB80-\uDBBE\uDBC0-\uDBFE][\uDC00-\uDFFF]|[\uDBBF\uDBFF][\uDC00-\uDFFD])+.
To match 2,048 Other, Surrogate code points that include some emojis, you may use \p{Cs}+, or [\uD800-\uDFFF]+ regex.
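The question asks for C#, but the same Unicode categories can be tried out quickly in JavaScript. The sketch below assumes an engine that supports ES2018 Unicode property escapes (they need the "u" flag); the function names and the sample string are made up for illustration.

```javascript
// Remove every code point in the "Other" category
// (control, format, private use, surrogate, unassigned).
function removeOther(s) {
  return s.replace(/\p{C}+/gu, '');
}

// Remove control characters only (category Cc).
function removeControl(s) {
  return s.replace(/\p{Cc}+/gu, '');
}

// Remove format characters only (category Cf), e.g. U+200B and U+00AD.
function removeFormat(s) {
  return s.replace(/\p{Cf}+/gu, '');
}

// NUL (Cc), zero-width space (Cf), soft hyphen (Cf) and tab (Cc) mixed into "abcde".
const sample = 'a\u0000b\u200Bc\u00ADd\te';

// JSON.stringify makes the invisible characters that remain visible as escapes.
console.log(JSON.stringify(removeOther(sample)));   // "abcde"
console.log(JSON.stringify(removeControl(sample)));
console.log(JSON.stringify(removeFormat(sample)));
```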
You can try:
string s = "Täkörgåsmrgås";
s = Regex.Replace(s, @"[^\u0000-\u007F]+", string.Empty);
Updated answer after comments:
Documentation about non-printable character:
https://en.wikipedia.org/wiki/Control_character
Char.IsControl Method:
https://msdn.microsoft.com/en-us/library/system.char.iscontrol.aspx
Maybe you can try:
string input; // this is your input string
string output = new string(input.Where(c => !char.IsControl(c)).ToArray());
To remove all control and other non-printable characters:
Regex.Replace(s, @"\p{C}+", String.Empty);
To remove the control characters only (if you don't want to remove the emojis 😎):
Regex.Replace(s, @"\p{Cc}+", String.Empty);
You can try this:
public static string TrimNonAscii(this string value)
{
string pattern = "[^ -~]*";
Regex reg_exp = new Regex(pattern);
return reg_exp.Replace(value, "");
}

Regular Expression Arabic characters and numbers only

I want Regular Expression to accept only Arabic characters, Spaces and Numbers.
Numbers are not required to be in Arabic.
I found the following expression:
^[\u0621-\u064A]+$
which accepts only Arabic characters, while I need Arabic characters, spaces and numbers.
Just add 0-9 and a space to your character class:
^[\u0621-\u064A0-9 ]+$
OR add \u0660-\u0669, which is the range of the Arabic-Indic digits, to the character class:
^[\u0621-\u064A\u0660-\u0669 ]+$
You can use:
^[\u0621-\u064A\s\p{N}]+$
\p{N} will match any Unicode numeric character.
To match only ASCII digit use:
^[\u0621-\u064A\s0-9]+$
EDIT: Better to use this regex:
^[\p{Arabic}\s\p{N}]+$
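For readers working in JavaScript, the closest equivalent of \p{Arabic} is \p{Script=Arabic}. The sketch below assumes an engine with ES2018 Unicode property escapes (the "u" flag is required); the function names are illustrative.

```javascript
// Arabic-script characters, white space and any Unicode numeric character.
const arabicAnyDigits = /^[\p{Script=Arabic}\s\p{N}]+$/u;

// Arabic letters from the basic range, white space and ASCII digits only.
const arabicAsciiDigits = /^[\u0621-\u064A\s0-9]+$/;

function isArabicWithAnyDigits(text) {
  return arabicAnyDigits.test(text);
}

function isArabicWithAsciiDigits(text) {
  return arabicAsciiDigits.test(text);
}

// "marhaba" written with escapes so the sample does not depend on file encoding.
const word = '\u0645\u0631\u062D\u0628\u0627';

console.log(isArabicWithAnyDigits(word + ' 123'));                // true
console.log(isArabicWithAnyDigits(word + ' \u0661\u0662\u0663')); // true, Arabic-Indic digits
console.log(isArabicWithAsciiDigits(word + ' \u0661\u0662'));     // false, digits are not ASCII
console.log(isArabicWithAnyDigits('hello 123'));                  // false, Latin letters
```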
You can use
[ء-ي]
It worked for me in JavaScript with jQuery form validation rules.
In my example I want to let the user enter English or Arabic letters:
[a-zA-Zء-ي]
Use this:
[\u0600-\u06FF]
It worked for me in Visual Studio.
After a lot of trial and error I got this for Persian names:
[گچپژیلفقهمو ء-ي]+$
^[\u0621-\u064Aa-zA-Z\d\-_\s]+$
This regex accepts Arabic letters, English letters, spaces and numbers, plus hyphens and underscores.
Simple, use this code:
^[؀-ۿ]+$
This works for Arabic and Persian, including their digits.
function HasArabicCharacters(text)
{
var regex = new RegExp(
"[\u0600-\u06ff]|[\u0750-\u077f]|[\ufb50-\ufc3f]|[\ufe70-\ufefc]");
return regex.test(text);
}
To allow Arabic and English letters with a minimum and maximum number of characters in a field, try this:
^[\u0621-\u064A\u0660-\u0669a-zA-Z\-_\s]{4,35}$
A- Arabic and English letters are allowed.
B- ASCII digits are not allowed (the Arabic-Indic digits \u0660-\u0669 are).
C- {4,35} sets the minimum and maximum number of characters allowed.
Update: on submit, English words with spaces were accepted, but Arabic words with spaces could not be submitted!
All cases tested
Regex for English and Arabic numbers only:
function HasArabicEnglishNumbers(text)
{
// A single character class, so ^ and $ anchor the whole pattern instead of one alternative each
var regex = new RegExp(
"^[\u0621-\u064A\u0660-\u06690-9]+$");
return regex.test(text);
}
@Pattern(regexp = "^[\\p{InArabic}\\s]+$")
Accepts Arabic digits and characters.
This one allows Arabic letters, Arabic numbers, English numbers and spaces:
var arabic = RegExp("^[\u0621-\u064A\u0660-\u0669 0-9]+\$");
In PHP, use this:
preg_replace("/\p{Arabic}/u", 'x', 'abc123ابت');// will replace arabic letters with "x".
Note: For \p{Arabic} to match arabic letters, you need to pass u modifier (for unicode) at the end.
The posts above include much more than Arabic (MSA) characters: they also match Persian, Urdu, Quranic symbols and some other symbols. The Arabic MSA characters are only (see Arabic Unicode):
[\u0621-\u063A\u0641-\u0652]
I always use these to control user input in my apps:
public static Regex IntegerString => new(@"^[\s\da-zA-Zء-ي]+[^\.]*$");
public static Regex String => new(@"^[\sa-zA-Zء-ي]*$");
public static Regex Email => new(@"^[\d\@\.a-z]*$");
public static Regex Phone => new(@"^[\d\s\(\)\-\+]+[^\.]*$");
public static Regex Address => new(@"^[\s\d\.\,\،\-a-zA-Zء-ي]*$");
public static Regex Integer => new(@"^[\d]+[^\.]*$");
public static Regex Double => new(@"^[\d\.]*$");
This is a useful example:
public class Test {
public static void main(String[] args) {
String thai = "1ประเทศไทย1ประเทศไทย";
String arabic = "1عربي1عربي";
//correct inputs
System.out.println(thai.matches("[[0-9]*\\p{In" + Character.UnicodeBlock.THAI.toString() + "}*]*"));
System.out.println(arabic.matches("[[0-9]*\\p{In" + Character.UnicodeBlock.ARABIC.toString() + "}*]*"));
//incorrect inputs
System.out.println(arabic.matches("[[0-9]*\\p{In" + Character.UnicodeBlock.THAI.toString() + "}*]*"));
System.out.println(thai.matches("[[0-9]*\\p{In" + Character.UnicodeBlock.ARABIC.toString() + "}*]*"));
}
}
[\p{IsArabic}-[\D]]
An Arabic character that is not a non-digit

MVC Model regex allow five whitespace character

[RegularExpression("^\\d{5}$||d{0}", ErrorMessage = "Girdiğiniz değer 5 karakter uzunluğunda olmalıdır ve rakamlardan oluşmalıdır")]
public string PostaKodu { get; set; }
When I read PostaKodu from the old database, it returns five space characters if the value is null. In the form view this gives a validation error. How can I allow five white space characters in my regular expression?
You should modify your expression to use \s, which is the character class for white space in regular expressions. Put the alternatives in one anchored group with a single | between them, because a doubled || adds an empty alternative. Something like ^(\\d{5}|\\s{5})$ would do the job :D
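One way to express "five digits or five white space characters" is a single anchored group with one alternative per case. The sketch below is in JavaScript rather than the C# attribute from the question, and the function name is hypothetical; it also shows why a doubled || is risky, since the empty alternative it creates matches at the start of any string.

```javascript
// Exactly five digits, or exactly five white space characters.
const postalCode = /^(?:\d{5}|\s{5})$/;

function isValidPostalCode(value) {
  return postalCode.test(value);
}

console.log(isValidPostalCode('34000')); // true
console.log(isValidPostalCode('     ')); // true
console.log(isValidPostalCode('3400'));  // false
console.log(isValidPostalCode('34a00')); // false

// The pattern from the question: the empty alternative between the two bars
// matches the empty string, so an unanchored test succeeds for any input.
const withEmptyAlternative = /^\d{5}$||d{0}/;
console.log(withEmptyAlternative.test('not a code')); // true
```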

Regex to Strip Special Characters

I am trying to use regex.replace to strip out unwanted characters, but I need to account for spaces:
string asdf = "doésn't work?";
string regie = @"([{}\(\)\^$&._%#!#=<>:;,~`'\’ \*\?\/\+\|\[\\\\]|\]|\-)";
Response.Write(Regex.Replace(asdf,regie,"").Replace(" ","-"));
returns doésntwork instead of doésnt-work
Ideas?
Thanks!
Your regular expression includes a space, so the space gets stripped out before the string.Replace is called.
string regie = @"([{}\(\)\^$&._%#!#=<>:;,~`'\’ \*\?\/\+\|\[\\\\]|\]|\-)";
                                              ^ here
Remove it from the regular expression and your code should do what you expect:
string regie = @"([{}\(\)\^$&._%#!#=<>:;,~`'\’\*\?\/\+\|\[\\\\]|\]|\-)";
You have a space inside your regex, right here: \’ \*.
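The same two-step idea (strip unwanted characters but leave spaces alone, then turn spaces into hyphens) can be sketched in JavaScript. Note that this sketch swaps the block list from the question for an allow list of letters, digits and white space, which is a different technique; it assumes ES2018 Unicode property escapes, and the function name is made up.

```javascript
function slugify(text) {
  return text
    .replace(/[^\p{L}\p{N}\s]+/gu, '') // drop everything except letters, digits and white space
    .trim()                            // avoid leading or trailing hyphens
    .replace(/\s+/g, '-');             // each run of white space becomes one hyphen
}

// \u00E9 is "é"; written as an escape so the sample does not depend on file encoding.
console.log(slugify("do\u00E9sn't work?")); // doésnt-work
```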

Capturing a repeated group

I am attempting to parse a string like the following using a .NET regular expression:
H3Y5NC8E-TGA5B6SB-2NVAQ4E0
and return the following using Split:
H3Y5NC8E
TGA5B6SB
2NVAQ4E0
I validate each character against a specific character set (note that the letters 'I', 'O', 'U' & 'W' are absent), so using string.Split is not an option. The number of characters in each group can vary and the number of groups can also vary. I am using the following expression:
([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8}-?){3}
This will match exactly 3 groups of 8 characters each. Any more or less will fail the match.
This works insofar as it correctly matches the input. However, when I use the Split method to extract each character group, I just get the final group. RegexBuddy complains that I have repeated the capturing group itself and that I should put a capture group around the repeated group. However, none of my attempts to do this achieve the desired result. I have been trying expressions like this:
(([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})-?){4}
But this does not work.
Since I generate the regex in code, I could just expand it out by the number of groups, but I was hoping for a more elegant solution.
Please note that the character set does not include the entire alphabet. It is part of a product activation system. As such, any characters that can be accidentally interpreted as numbers or other characters are removed. e.g. The letters 'I', 'O', 'U' & 'W' are not in the character set.
The hyphens are optional since a user does not need to type them in, but they can be there if the user has done a copy & paste.
BTW, you can replace the [ABCDEFGHJKLMNPQRSTVXYZ0123456789] character class with a more readable subtracted character class (.NET writes subtraction as [base-[excluded]]).
[A-Z\d-[IOUW]]
If you just want to match 3 groups like that, why don't you use this pattern 3 times in your regex and just use the captured 1, 2, 3 subgroups to form the new string? The {8} goes inside the parentheses so that each group captures all eight characters:
([A-Z\d-[IOUW]]{8})-([A-Z\d-[IOUW]]{8})-([A-Z\d-[IOUW]]{8})
In PHP I would return (I don't know .NET)
return "$1 $2 $3";
I have discovered the answer I was after. Here is my working code:
static void Main(string[] args)
{
string pattern = @"^\s*((?<group>[ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})-?){3}\s*$";
string input = "H3Y5NC8E-TGA5B6SB-2NVAQ4E0";
Regex re = new Regex(pattern);
Match m = re.Match(input);
if (m.Success)
foreach (Capture c in m.Groups["group"].Captures)
Console.WriteLine(c.Value);
}
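.NET keeps every capture of a repeated group in the Captures collection, which is what the code above relies on. JavaScript, by comparison, keeps only the last capture of a repeated group, so a comparable sketch validates the whole key first and then extracts the blocks with a global match. The function name and its default parameters are hypothetical.

```javascript
// Allowed characters: the letters I, O, U and W are deliberately absent.
const CHARSET = 'ABCDEFGHJKLMNPQRSTVXYZ0123456789';

function parseKey(input, groupCount = 3, groupLength = 8) {
  const block = `[${CHARSET}]{${groupLength}}`;
  // Step 1: validate the whole key, with optional hyphens after each block.
  const whole = new RegExp(`^\\s*(?:${block}-?){${groupCount}}\\s*$`);
  if (!whole.test(input)) {
    return null;
  }
  // Step 2: extract each block; the input is already known to be valid here.
  return input.match(new RegExp(block, 'g'));
}

const groups = parseKey('H3Y5NC8E-TGA5B6SB-2NVAQ4E0');
console.log(groups.join('\n'));
// H3Y5NC8E
// TGA5B6SB
// 2NVAQ4E0

console.log(parseKey('H3Y5NC8E-TGA5B6SB')); // null, only two groups
```

Because the pattern is built from groupCount and groupLength, the same function covers keys with a different number or size of blocks.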
After reviewing your question and the answers given, I came up with this:
RegexOptions options = RegexOptions.None;
Regex regex = new Regex(@"([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})", options);
string input = @"H3Y5NC8E-TGA5B6SB-2NVAQ4E0";
MatchCollection matches = regex.Matches(input);
for (int i = 0; i != matches.Count; ++i)
{
string match = matches[i].Value;
}
Since the "-" is optional, you don't need to include it. I am not sure what you were using the {4} at the end for. This will find the matches based on what you want; then, using the MatchCollection, you can access each match to rebuild the string.
Why use Regex? If the groups are always split by a -, can't you use Split()?
Sorry if this isn't what you intended, but if your string always has the hyphen separating the groups, then instead of using regex couldn't you use the String.Split() method?
Dim stringArray As Array = someString.Split("-")
What are the defining characteristics of a valid block? We'd need to know that in order to really be helpful.
My generic suggestion: validate the charset in a first step, then split and parse in a separate method based on what you expect. If this is in a web site/app, then you can use the ASP Regex validation on the front end and break it up on the back end.
If you're just checking the value of the group, with group(i).value, then you will only get the last one. However, if you want to enumerate over all the times that group was captured, use group(2).captures(i).value, as shown below.
system.text.RegularExpressions.Regex.Match("H3Y5NC8E-TGA5B6SB-2NVAQ4E0","(([ABCDEFGHJKLMNPQRSTVXYZ0123456789]+)-?)*").Groups(2).Captures(i).Value
Mike,
You can use character set of your choice inside character group. All you need is to add "+" modifier to capture all groups. See my previous answer, just change [A-Z0-9] to whatever you need (i.e. [ABCDEFGHJKLMNPQRSTVXYZ0123456789])
You can use this pattern:
Regex.Split("H3Y5NC8E-TGA5B6SB-2NVAQ4E0", "([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8}+)-?")
But you will need to filter out empty strings from resulting array.
Citation from MSDN:
If multiple matches are adjacent to one another, an empty string is inserted into the array.
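JavaScript's String.prototype.split behaves in a similar way when the separator contains a capturing group: the captured text is included in the result, and adjacent matches leave empty strings between them. A small sketch of the split-then-filter approach, using the sample key from the question:

```javascript
const input = 'H3Y5NC8E-TGA5B6SB-2NVAQ4E0';

// The capturing group puts each 8-character block into the result array;
// the text between matches (here always empty) is included as well.
const pieces = input.split(/([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})-?/);

// Drop the empty strings to keep only the blocks.
const blocks = pieces.filter((piece) => piece !== '');

console.log(pieces.length);     // 7
console.log(blocks.join('\n'));
// H3Y5NC8E
// TGA5B6SB
// 2NVAQ4E0
```

Split on its own does not validate the input, so characters outside the set would simply show up as extra pieces.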