Hi, I am learning regex.
I was trying to write a regex for the following condition:
any single letter from the set C-MPSTV-XZ, and the letter must not be repeated.
The letter can have one blank space before or after it, i.e. it can be " C" or "C ".
[C-MPSTV-XZ{1} ]{2}
In the expression above I expected {1} to allow one character only, and the space after it to allow one space only. At the end I put {2} to get exactly 2 characters.
I was expecting regex_match to be false for the input "XX", but it matches.
I'd appreciate your help.
Try \s?[C-MPSTV-XZ]\s?. If you are using std::regex_match,
you shouldn't need anything else, since regex_match requires
a match over the entire string. Note that this also accepts a bare letter, and a letter with a space on both sides.
Your posted regex will match two characters even when neither of them is a space, because you're asking for any two characters from inside the character class. You're also going to accept {, 1 and } as characters, because quantifier syntax is treated as literal characters inside a character class.
The simple alternative is to just spell out the two conditions explicitly:
( [C-MPSTV-XZ]|[C-MPSTV-XZ] )
This assumes that your regex engine is treating whitespace within regexes as significant. If not, or if you don't like that, replace the spaces with a suitable escape sequence.
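Both points are easy to check on a few inputs. The sketch below uses Python's re.fullmatch as a stand-in for std::regex_match (both require the whole string to match). The helper name matches and the sample inputs are made up for illustration, and the letter set is the one from the question.

```python
import re

LETTER = "[C-MPSTV-XZ]"

# The pattern from the question: {, 1, } and the space are just extra class members.
original = re.compile(r"[C-MPSTV-XZ{1} ]{2}")

# Space-then-letter or letter-then-space, spelled out explicitly.
fixed = re.compile(f" {LETTER}|{LETTER} ")


def matches(pattern, text):
    """Return True when the whole string matches, like std::regex_match."""
    return pattern.fullmatch(text) is not None


for sample in ["XX", "{1", " C", "C ", "C", " C "]:
    print(repr(sample), matches(original, sample), matches(fixed, sample))
```

The original pattern accepts "XX" and even "{1", while the explicit alternation accepts only " C"-style and "C "-style inputs.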
I need to write a regular expression for form validation that allows spaces within a string, but doesn't allow only white space.
For example - 'Chicago Heights, IL' would be valid, but if a user just hit the space bar any number of times and hit enter the form would not validate. Preceding the validation, I've tried running an if (foo != null) then run the regex, but hitting the space bar still registers characters, so that wasn't working. Here is what I'm using right now which allows the spaces:
^[-a-zA-Z0-9_:,.' ']{1,100}$
It's very simple: .*\S.*
This requires one non-space character, at any place. The syntax is that of Perl 5 compatible regular expressions; if you use another language, the syntax may differ a bit.
The following will answer your question as written, but see my additional note afterward:
^(?!\s*$)[-a-zA-Z0-9_:,.' ']{1,100}$
Explanation: The (?!\s*$) is a negative lookahead. It means: "The following characters cannot match the subpattern \s*$." When you take the subpattern into account, it means: "The following characters can neither be an empty string, nor a string of whitespace all the way to the end. Therefore, there must be at least one non-whitespace character after this point in the string." Once you have that rule out of the way, you're free to allow spaces in your character class.
Extra note: I don't think your ' ' is doing what you intend. It looks like you were trying to represent a space character, but regex interprets ' as a literal apostrophe. Inside a character class, ' ' would mean "match any character that is either ', a space character, or '" (notice that the second ' character is redundant). I suspect what you want is more like this:
^(?!\s*$)[-a-zA-Z0-9_:,.\s]{1,100}$
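As a rough check of the lookahead version, here is a small Python sketch. The function name is_valid and the sample inputs are assumptions made for illustration; the pattern is the \s variant shown just above.

```python
import re

# Negative lookahead rejects empty and whitespace-only input up front;
# the character class then decides which characters are allowed.
NOT_BLANK = re.compile(r"^(?!\s*$)[-a-zA-Z0-9_:,.\s]{1,100}$")


def is_valid(value):
    """True if value uses only allowed characters and is not all whitespace."""
    return NOT_BLANK.match(value) is not None


print(is_valid("Chicago Heights, IL"))  # letters, a comma and spaces
print(is_valid("     "))                # whitespace only
print(is_valid(""))                     # empty string
```

The first call should succeed and the other two should fail, because the lookahead refuses any input that is whitespace all the way to the end.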
You could simply use:
^(?=.*\S).+$
if your regex engine supports positive lookaheads. This expression requires at least one non-space character.
See it on rubular.
If you want to validate against an allowed character set only, this is what I tried: USERNAME_REGEX = /^(?:\s*[.\-_]*[a-zA-Z0-9]{1,}[.\-_]*\s*)$/;
The string can have any number of spaces at the beginning or end, but must contain at least one alphanumeric character.
Optional ., _ and - characters are also allowed on either side of the alphanumeric run.
Try this regular expression:
^[^\s]+(\s.*)?$
It means one or more characters that are not space, then, optionally, a space followed by anything.
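One side effect worth seeing in action: this pattern also rejects input that starts with whitespace. A minimal Python sketch, where the function name starts_with_text and the inputs are illustrative assumptions:

```python
import re

# One or more non-space characters, then optionally a space and anything else.
STARTS_WITH_TEXT = re.compile(r"^[^\s]+(\s.*)?$")


def starts_with_text(value):
    """True if value begins with at least one non-whitespace character."""
    return STARTS_WITH_TEXT.match(value) is not None


for sample in ["Chicago Heights, IL", "   ", " Chicago", "Chicago "]:
    print(repr(sample), starts_with_text(sample))
```

Whitespace-only input fails as intended, but so does " Chicago", so this fits only if leading spaces should be refused as well.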
Just use \s* to allow for zero or more blank spaces between two words in the regular expression.
For example, "Mozilla/ 4.75" and "Mozilla/4.75" both can be matched by the following regular expression:
[A-Z][a-z]*/\s*[0-9]\.[0-9]{1,2}
Adding \s* matches zero, one or more blank spaces between two words.
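A short Python check of that example, assuming re.search (a match anywhere in the string) is the intended mode; the variable name USER_AGENT is made up here.

```python
import re

# \s* between the slash and the version tolerates any amount of whitespace.
USER_AGENT = re.compile(r"[A-Z][a-z]*/\s*[0-9]\.[0-9]{1,2}")

for sample in ["Mozilla/ 4.75", "Mozilla/4.75", "Mozilla/   4.75", "Mozilla 4.75"]:
    print(repr(sample), USER_AGENT.search(sample) is not None)
```

The last sample has no slash, so it should be the only one that fails.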
I want a regular expression that prevents symbols and only allows letters and numbers. The regex below works great, but it doesn't allow for spaces between words.
^[a-zA-Z0-9_]*$
For example, when using this regular expression "HelloWorld" is fine, but "Hello World" does not match.
How can I tweak it to allow spaces?
tl;dr
Just add a space in your character class.
^[a-zA-Z0-9_ ]*$
Now, if you want to be strict...
The above isn't exactly correct. Due to the fact that * means zero or more, it would match all of the following cases that one would not usually mean to match:
An empty string, "".
A string comprised entirely of spaces, " ".
A string that leads and / or trails with spaces, " Hello World ".
A string that contains multiple spaces in between words, "Hello   World".
Originally I didn't think such details were worth going into, as OP was asking such a basic question that it seemed strictness wasn't a concern. Now that the question's gained some popularity however, I want to say...
...use @stema's answer.
Which, in my flavor (without using \w) translates to:
^[a-zA-Z0-9_]+( [a-zA-Z0-9_]+)*$
(Please upvote @stema regardless.)
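To make the difference concrete, here is a small Python comparison of the loose and the strict pattern on the edge cases listed earlier. The names LOOSE and STRICT and the sample list are illustrative only.

```python
import re

LOOSE = re.compile(r"^[a-zA-Z0-9_ ]*$")
STRICT = re.compile(r"^[a-zA-Z0-9_]+( [a-zA-Z0-9_]+)*$")

samples = ["", "   ", " Hello World ", "Hello   World", "Hello World", "HelloWorld"]

for text in samples:
    # Columns: input, accepted by loose pattern, accepted by strict pattern
    print(repr(text), bool(LOOSE.match(text)), bool(STRICT.match(text)))
```

The loose pattern accepts every sample; the strict one accepts only the last two.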
Some things to note about this (and @stema's) answer:
If you want to allow multiple spaces between words (say, if you'd like to allow accidental double-spaces, or if you're working with copy-pasted text from a PDF), then add a + after the space:
^\w+( +\w+)*$
If you want to allow tabs and newlines (whitespace characters), then replace the space with a \s+:
^\w+(\s+\w+)*$
Here I suggest the + by default because, for example, Windows linebreaks consist of two whitespace characters in sequence, \r\n, so you'll need the + to catch both.
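The effect of the extra + can be checked with a few inputs. This is a sketch in Python using fullmatch, with pattern names chosen for illustration:

```python
import re

SINGLE_SPACE = re.compile(r"^\w+( \w+)*$")   # exactly one space between words
MULTI_SPACE = re.compile(r"^\w+( +\w+)*$")   # one or more spaces between words
SINGLE_WS = re.compile(r"^\w+(\s\w+)*$")     # exactly one whitespace character
ANY_WS = re.compile(r"^\w+(\s+\w+)*$")       # one or more whitespace characters

double_space = "Hello  World"
windows_break = "Hello\r\nWorld"

print(bool(SINGLE_SPACE.fullmatch(double_space)), bool(MULTI_SPACE.fullmatch(double_space)))
print(bool(SINGLE_WS.fullmatch(windows_break)), bool(ANY_WS.fullmatch(windows_break)))
```

In each pair the first pattern fails and the second succeeds, because \r\n is two whitespace characters and a double space is two spaces.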
Still not working?
Check what dialect of regular expressions you're using.* In languages like Java you'll have to escape your backslashes, i.e. \\w and \\s. In older or more basic languages and utilities, like sed, \w and \s may not be defined, so write them out with character classes, e.g. [a-zA-Z0-9_] and [ \f\n\r\t\v], respectively.
* I know this question is tagged vb.net, but based on 25,000+ views, I'm guessing it's not only those folks who are coming across this question. Currently it's the first hit on google for the search phrase, regular expression space word.
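The escaping point can be illustrated in Python, used here only as an example language: an ordinary string literal needs doubled backslashes just as a Java string does, while a raw string does not. The spelled-out pattern is an ASCII-only approximation, since Python's own \w is Unicode-aware by default.

```python
import re

raw = r"^\w+(\s+\w+)*$"          # raw string: backslashes reach the regex engine untouched
escaped = "^\\w+(\\s+\\w+)*$"    # ordinary string: every backslash doubled

print(raw == escaped)

# Spelled-out character classes for tools that lack \w and \s (ASCII only).
spelled_out = re.compile(r"^[a-zA-Z0-9_]+([ \f\n\r\t\v]+[a-zA-Z0-9_]+)*$")
print(bool(spelled_out.match("Hello World")))
```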
One possibility would be to just add the space to your character class, as acheong87 suggested. Whether that is enough depends on how strict your pattern needs to be, because it would also allow a string starting with 5 spaces, or strings consisting only of spaces.
The other possibility is to define a pattern:
I will use \w, which in most regex flavours is the same as [a-zA-Z0-9_] (in some it is Unicode-based):
^\w+( \w+)*$
This will allow a series of at least one word, with the words separated by single spaces.
^ Match the start of the string
\w+ Match a series of at least one word character
( \w+)* is a group that is repeated 0 or more times. In the group it expects a space followed by a series of at least one word character
$ matches the end of the string
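The same breakdown can be written into the pattern itself. This sketch assumes Python's re.VERBOSE flag; in that mode bare spaces are ignored, so the literal space is written as [ ]. The name WORDS is made up.

```python
import re

WORDS = re.compile(
    r"""
    ^          # start of the string
    \w+        # a series of at least one word character
    (          # group, repeated 0 or more times:
      [ ]      #   one literal space
      \w+      #   followed by at least one word character
    )*
    $          # end of the string
    """,
    re.VERBOSE,
)

for sample in ["Hello", "Hello World", " Hello", "Hello  World", "Hello World "]:
    print(repr(sample), WORDS.match(sample) is not None)
```

Only the first two samples should match; leading, trailing and doubled spaces are all rejected.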
This one worked for me
([\w ]+)
Try with:
^(\w+ ?)*$
Explanation:
\w - alias for [a-zA-Z_0-9]
" ?" - an optional space after each word (note that this also permits a trailing space)
I assume you don't want leading/trailing space. This means you have to split the regex into "first character", "stuff in the middle" and "last character":
^[a-zA-Z0-9_][a-zA-Z0-9_ ]*[a-zA-Z0-9_]$
or if you use a perl-like syntax:
^\w[\w ]*\w$
Also: if you intentionally worded your regex so that it also allows empty strings, you have to make the entire thing optional:
^(\w[\w ]*\w)?$
If you want to only allow single space chars, it looks a bit different:
^((\w+ )*\w+)?$
This matches 0..n words followed by a single space, plus one word without space. And makes the entire thing optional to allow empty strings.
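A few inputs show how the three variants differ. This is a Python sketch with illustrative names; note that the first pattern needs at least two characters, because the first and last character are matched separately.

```python
import re

NO_EDGE_SPACES = re.compile(r"^\w[\w ]*\w$")      # no leading/trailing space
OR_EMPTY = re.compile(r"^(\w[\w ]*\w)?$")         # same, but empty string allowed
SINGLE_SPACES = re.compile(r"^((\w+ )*\w+)?$")    # single spaces only, empty allowed

for sample in ["", "a", "a b", "a  b", " ab"]:
    print(
        repr(sample),
        bool(NO_EDGE_SPACES.match(sample)),
        bool(OR_EMPTY.match(sample)),
        bool(SINGLE_SPACES.match(sample)),
    )
```

The single-character input "a" is accepted only by the last pattern, and the double-spaced "a  b" is rejected only by the last pattern.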
This regular expression
^\w+(\s\w+)*$
will only allow a single whitespace character between words and no leading or trailing spaces.
Below is the explanation of the regular expression:
^ Assert position at start of the string
\w+ Match any word character [a-zA-Z0-9_]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
1st Capturing group (\s\w+)*
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
\s Match any white space character [\r\n\t\f ]
\w+ Match any word character [a-zA-Z0-9_]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
$ Assert position at end of the string
Just add a space to end of your regex pattern as follows:
[a-zA-Z0-9_ ]
This does not allow a space at the beginning, but allows spaces between words. It also allows special characters between words. A good regex for FirstName and LastName fields.
\w+.*$
For alphabets only:
^([a-zA-Z])+(\s)+[a-zA-Z]+$
For alphanumeric value and _:
^(\w)+(\s)+\w+$
If you are using JavaScript then you can use this regex:
/^[a-z0-9_.-\s]+$/i
For example:
/^[a-z0-9_.-\s]+$/i.test("") //false
/^[a-z0-9_.-\s]+$/i.test("helloworld") //true
/^[a-z0-9_.-\s]+$/i.test("hello world") //true
/^[a-z0-9_.-\s]+$/i.test("none alpha: ɹqɯ") //false
The only drawback with this regex is a string comprised entirely of spaces. " " will also show as true.
This was my regex: @"^(?=.{3,15}$)(?:(?:\p{L}|\p{N})[._()\[\]-]?)*$"
I just added ([\w ]+) at the end of my regex, before the *:
@"^(?=.{3,15}$)(?:(?:\p{L}|\p{N})[._()\[\]-]?)([\w ]+)*$"
Now the string is allowed to contain spaces.
This regex allows only letters and spaces:
^[a-zA-Z ]*$
Try with this one:
result = re.search(r"\w+( )\w+", text)
I'd like a regular expression to match a string only if it contains a character that occurs a predefined number of times.
For example:
I want to match all strings that contain the character "_" 3 times;
So
"a_b_c_d" would pass
"a_b" would fail
"a_b_c_d_e" would fail
Does someone know a simple regular expression that would satisfy this?
Thank you
For your example, you could do:
\b[a-z]*(_[a-z]*){3}[a-z]*\b
(with an ignore case flag).
You can play with it here
It says "match 0 or more letters, followed by '_[a-z]*' exactly three times, followed by 0 or more letters". The \b means "word boundary", ie "match a whole word".
Since I've used '*' this will match if there are exactly three "_" in the word regardless of whether it appears at the start or end of the word - you can modify it otherwise.
Also, I've assumed you want to match all words in a string with exactly three "_" in it.
That means the string "a_b a_b_c_d" would say that "a_b_c_d" passed (but "a_b" fails).
If you mean that globally across the entire string you only want three "_" to appear, then use:
^[^_]*(_[^_]*){3}[^_]*$
This anchors the regex at the start of the string and goes to the end, making sure there are exactly three occurrences of "_" in it.
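Both readings can be tried against the examples from the question. The sketch below is in Python; the names WORD_LEVEL and WHOLE_STRING are made up for illustration.

```python
import re

# Per-word check: find words that contain exactly three underscores.
WORD_LEVEL = re.compile(r"\b[a-z]*(_[a-z]*){3}[a-z]*\b", re.IGNORECASE)

# Whole-string check: the entire input must contain exactly three underscores.
WHOLE_STRING = re.compile(r"^[^_]*(_[^_]*){3}[^_]*$")

for text in ["a_b_c_d", "a_b", "a_b_c_d_e"]:
    print(text, WHOLE_STRING.match(text) is not None)

# Only the second word has three underscores.
print([m.group(0) for m in WORD_LEVEL.finditer("a_b a_b_c_d")])
```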
Elaborating on Rado's answer, which is so far the most general but could be a pain to write if there are more occurrences to match:
^([^_]*_){3}[^_]*$
It will match entire strings (from the beginning ^ to the end $) in which there are exactly 3 ({3}) repetitions of the pattern consisting of 0 or more (*) characters that are not an underscore ([^_]) followed by one underscore (_), the whole being followed by 0 or more characters other than an underscore ([^_]*, again).
Of course one could alternatively group the other way round, as in our case the pattern is symmetric:
^[^_]*(_[^_]*){3}$
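Since the grouped form scales to any count, it can be generated. The helper below is a hypothetical Python sketch (the name exactly_n is not from any library) that builds the same pattern for a given single character and count.

```python
import re


def exactly_n(ch, n):
    """Build a pattern matching strings that contain ch exactly n times."""
    if len(ch) != 1:
        raise ValueError("ch must be a single character")
    c = re.escape(ch)  # keep characters such as . or ^ literal
    # n repetitions of "non-ch run, then ch", followed by a final non-ch run
    return re.compile(rf"(?:[^{c}]*{c}){{{n}}}[^{c}]*")


three_underscores = exactly_n("_", 3)
for text in ["a_b_c_d", "a_b", "a_b_c_d_e", "___"]:
    print(text, three_underscores.fullmatch(text) is not None)
```

Where a regex is not strictly required, text.count("_") == 3 expresses the same check in Python.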
This should do it:
^[^_]*_[^_]*_[^_]*_[^_]*$
If your examples are the only possibilities (like a_b_c_...), then the others are fine, but I wrote one that will handle some other possibilities, such as:
a__b_adf
a_b_asfdasdfasfdasdfasf_asdfasfd
___
_a_b_b
Etc.
Here's my regex.
\b(_[^_]*|[^_]*_|_){3}\b
I have the following regex:
(?!^[&#]*$)^([A-Za-z0-9-'.,&#:?!()$#/\\]*)$
So allow A-Z, a-z, 0-9, and these special chars '.,&#:?!()$#/\
I want to NOT match if the following set of chars is encountered anywhere in the string in this order:
&#
When I run this regex with just "&#" as input, it does not match my pattern, I get an error, great. When I run the regex with '.,&#:?!()$#/\ABC123 It does match my pattern, no errors.
However when I run it with:
'.,&##:?!()$#/\ABC123
It does not error either. I'm doing something wrong with the check for the &# sequence.
Can someone tell me what I've done wrong? I'm not great with these things.
Borrowing a technique for matching quoted strings, remove & from your character class, add an alternative for & not followed by #, and allow the string to optionally end with &:
^((?:[A-Za-z0-9-'.,#:?!()$#/\\]+|&[^#])*&?)$
I would actually do it in two parts:
Check your allowed character set. To do this I would look for characters that are not allowed, and return false if there's a match. That means I have a nice simple expression:
[^A-Za-z0-9'\.&#:?!()$#^]
Check your banned substring. And since it is just a substring, I probably wouldn't even use a regex for that part.
You didn't mention your language, but if in C#:
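// Requires: using System.Text.RegularExpressions;
bool IsValid(string input)
{
    // Invalid if the banned substring is present or any disallowed character is found.
    return !( input.Contains("&#")
           || Regex.IsMatch(input, @"[^A-Za-z0-9'\.&#:?!()$#^]")
           );
}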
^((?!&#)[A-Za-z0-9-'.,&#:?!()$#/\\])*$
note that the last \ is escaped (doubled)
SO automatically turns \\ into \ if not in backticks
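Here is a quick Python check of this per-character lookahead. The sample inputs are made up for illustration, and the character class is copied from the pattern above.

```python
import re

# Before consuming each character, assert that "&#" does not start here.
NO_AMP_HASH = re.compile(r"^((?!&#)[A-Za-z0-9-'.,&#:?!()$#/\\])*$")

for sample in ["ABC&123", "#&", "A&", "AB&#12", "&#", "hello world"]:
    print(repr(sample), NO_AMP_HASH.match(sample) is not None)
```

The first three samples pass. The next two contain the banned sequence, and the last one contains a space, which is not in the allowed set.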
Assuming Perl compatible RegExp
To not match on the string '&#':
(?!.*&#)^([A-Za-z0-9-'.,&#:?!()$#/\\]*)$
Although you don't need the parentheses, because you are matching the entire string.
Just FYI, although Ben Blank's regex works, it's more complicated than it needs to be. I would do it like this:
^(?:[A-Za-z0-9-'.,#:?!()$#/\\]+|&(?!#))+$
Because I used a negative lookahead instead of a negated character class, the regex doesn't need any extra help to match an ampersand at the end of the string.
I'd recommend using two regular expressions in a conditional:
if (string has sequence "&#")
return false
else
return (string matches sequence "A-Za-z0-9-'.,&#:?!()$#/\")
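The pseudocode above could look like this in Python. This is a sketch only: the function name is_valid is an assumption, and the first step uses a plain substring test instead of a second regex.

```python
import re

# Every character must come from the allowed set (empty input is accepted).
ALLOWED = re.compile(r"[A-Za-z0-9-'.,&#:?!()$#/\\]*")


def is_valid(text):
    """Reject the banned sequence first, then validate the character set."""
    if "&#" in text:
        return False
    return ALLOWED.fullmatch(text) is not None


for sample in ["ABC&123", "AB&#12", "hello world"]:
    print(repr(sample), is_valid(sample))
```

Keeping the two checks separate means the character class never has to encode the ordering rule.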
I believe your second "main" regex of
^([A-Za-z0-9-'.,&#:?!()$#/\])$
has several errors:
It will match only a single character from your set.
The \ character in regular expressions is an escape character: it indicates that the next character is to be treated specially (e.g. \n is the line feed character). The character sequence \] is therefore an escaped bracket, so your bracketed list is never terminated.
You may be better off using
^[A-Za-z0-9-'.,&#:?!()$#/\\]+$
Note that the backslash character is represented by a double backslash.
The + character indicates that at least one character being tested has to match the regex; if it is fine to pass a zero-length string, replace the + with a *.
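The unterminated-class problem is easy to reproduce. In this sketch Python's re module rejects the broken pattern outright; other engines may report it differently. The variable names are illustrative.

```python
import re

broken = r"^([A-Za-z0-9-'.,&#:?!()$#/\])$"   # \] escapes the bracket, class never closes
fixed = r"^[A-Za-z0-9-'.,&#:?!()$#/\\]+$"    # \\ is a literal backslash, ] closes the class

try:
    re.compile(broken)
    broken_compiles = True
except re.error as exc:
    broken_compiles = False
    print("broken pattern rejected:", exc)

# A sample containing a backslash, which the fixed class allows.
print(re.match(fixed, r"C:\temp") is not None)
```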