I have this expression, ([a-zA-Z]|ñ|Ñ)*, which I want to use to block every character except letters and Ñ from being entered in a textbox.
The problem is that it returns a match for A9023 but also for 32""". How can I make it return a match for A9023 but not for 32"""?
Thanks.
You need to add assertions for the start and the end of the string:
^([a-zA-Z]|ñ|Ñ)*$
Otherwise the regular expression matches at any position. Additionally, you can also write ([a-zA-Z]|ñ|Ñ)* as the character class [a-zA-ZñÑ]*:
^[a-zA-ZñÑ]*$
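A quick way to see the difference is to try both forms in Python's re module, used here only as a convenient test bed since the question doesn't name a flavor. re.search succeeds if the pattern matches anywhere, and ([a-zA-Z]|ñ|Ñ)* can always match zero letters at position 0. The helper name accepts is made up for illustration.

```python
import re

UNANCHORED = r"([a-zA-Z]|ñ|Ñ)*"
ANCHORED = r"^[a-zA-ZñÑ]*$"


def accepts(pattern: str, text: str) -> bool:
    """True if re.search finds a match of pattern anywhere in text."""
    return re.search(pattern, text) is not None


# Unanchored: an empty match at position 0 counts, so everything "passes".
print(accepts(UNANCHORED, '32"""'))  # True
print(accepts(UNANCHORED, "A9023"))  # True

# Anchored: the whole string must consist of the allowed letters.
print(accepts(ANCHORED, "España"))  # True
print(accepts(ANCHORED, '32"""'))   # False
print(accepts(ANCHORED, "A9023"))   # False
```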
Are you sure you don't mean ^([a-zA-Z]|ñ|Ñ)*$? You might be finding the characters you want but not excluding the ones you don't. The expression I mentioned pins the match to the beginning (^) and the end ($) of the string, so nothing else can pass. Otherwise:
123ABC456
...will pass your match, because it found zero or more letters, even though the string also contains other characters.
You didn't say which regex flavor (which programming language) you're using, but you might want to consider either
^\p{L}*$
if your regex flavor supports Unicode properties or
^[^\W\d_]*$
if it doesn't.
Reason: your regex allows only unaccented letters plus Ñ. Is there a real language that uses the latter without also having accented letters?
\p{L} means "any letter in any 'language'",
[^\W\d_] means "any character that is neither a non-alphanumeric character, a digit, nor an underscore", which is a roundabout but necessary way of saying "any letter" (\w is shorthand for "letter, digit or underscore", and \W is its inverse).
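Python's built-in re module has no \p{L}, but for str patterns in Python 3, \w, \W and \d are Unicode-aware, so the [^\W\d_] form behaves as described. A minimal sketch; the helper name is_letters_only is hypothetical, and [^\W\d_] is only an approximation of "letter" (it also admits a few numeric characters such as fractions).

```python
import re

# [^\W\d_]: not a non-word char, not a digit, not an underscore -> roughly "a letter".
LETTERS_ONLY = re.compile(r"[^\W\d_]*")


def is_letters_only(text: str) -> bool:
    """True if every character of text is a letter (Unicode-aware for str patterns)."""
    # fullmatch anchors both ends, like ^...$
    return LETTERS_ONLY.fullmatch(text) is not None


for sample in ["Ñandú", "café", "A9023", "snake_case"]:
    print(sample, is_letters_only(sample))
```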
I need to select values that are not in the following list, and I also need to reject all special characters.
Strings and requirements that need to be rejected:
XNIL
SNIL
All special characters
My expression is like this (?!XNIL|SNIL|[\W])\w+
The problem is that if my text contains the word XNIL or SNIL, it still accepts the NIL part. But I have listed XNIL and SNIL to be rejected. What mistake did I make here?
You can check my regex online here -> http://regexr.com/3cdsl
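This seems to work on your test page:
(?!(XNIL|SNIL|\W+))\b\w+
At least it solves the XNIL/SNIL problem.
Your regex matched the NIL inside XNIL because the lookahead is only checked where a match starts: it rejects a match starting at X, but the engine then retries one character later, where the lookahead no longer sees XNIL and \w+ matches NIL. The \b prevents a match from starting in the middle of a word. To see this, take your original regex, change \w+ to \w, and notice the difference.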
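A small Python check of both patterns (Python's re is assumed only as a test bed; lookahead and \b behave the same way in most Perl-style engines). finditer with group(0) is used because findall would return the captured group inside the lookahead instead of the whole match; the helper name matches is made up for illustration.

```python
import re

ORIGINAL = r"(?!XNIL|SNIL|[\W])\w+"
FIXED = r"(?!(XNIL|SNIL|\W+))\b\w+"


def matches(pattern: str, text: str) -> list[str]:
    """Return every whole match (group 0) of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]


text = "XNIL SNIL abc"
# Lookahead rejects the match at X/S, then NIL matches one character later.
print(matches(ORIGINAL, text))  # ['NIL', 'NIL', 'abc']
# \b stops a match from starting inside XNIL/SNIL.
print(matches(FIXED, text))     # ['abc']
```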
UPDATE:
Based on your feedback, you also wish to exclude _.
Because _ is used in programming-language identifiers, and regexes were (arguably) created of, by, and for programmers, _ is considered a "word" character (i.e., it is in \w and therefore not excluded by \W).
From the Perl regex man page:
\w Match a "word" character (alphanumeric plus "_", plus other connector punctuation chars plus Unicode marks)
Your final regex might need to be (?!(XNIL|SNIL|_+|\W+))\b\w+ (note the added _+).
A cleaner way is (?!(XNIL|SNIL|[\W_]+))\b\w+, which produces the same results but is closer in intent to what you wanted.
You may have to adjust the \w+ accordingly as well, since it still matches an underscore inside a word.
If you really want to be sure, at the expense of being slightly more verbose, write out the character class as you choose:
(?!(XNIL|SNIL|[^a-zA-Z0-9]+))\b[a-zA-Z0-9]+
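The three variants can be compared on one test string with a short Python sketch (Python's re assumed as a test bed; finditer with group(0) again avoids findall returning the lookahead's group). The [\W_] version rejects a token that starts with _, but only the explicit [a-zA-Z0-9] version also stops the main match from running through an underscore inside a word.

```python
import re

VARIANTS = {
    "word chars": r"(?!(XNIL|SNIL|\W+))\b\w+",
    "no leading _": r"(?!(XNIL|SNIL|[\W_]+))\b\w+",
    "explicit class": r"(?!(XNIL|SNIL|[^a-zA-Z0-9]+))\b[a-zA-Z0-9]+",
}

text = "XNIL abc _x foo_bar $$"
results = {
    name: [m.group(0) for m in re.finditer(pattern, text)]
    for name, pattern in VARIANTS.items()
}
for name, found in results.items():
    print(f"{name:15} {found}")
# word chars      ['abc', '_x', 'foo_bar']
# no leading _    ['abc', 'foo_bar']
# explicit class  ['abc', 'foo']
```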
Check this regex
[^(XNIL|SNIL|[^\w])]
Explanation
A [] character class with ^ at the beginning matches anything that is not in the list given inside the [].
(XNIL|SNIL|[^\w]) matches the words XNIL or SNIL, and [^\w] matches anything other than word characters (i.e. special characters).
So the whole regex matches anything that is not in (XNIL|SNIL|[^\w]).
This should work
(?m)^(((?!XNIL|SNIL|[\W]).)*)$
Grouping the single-character match with the negative lookahead makes the zero-width assertion run again before every character, all the way to the end of the line (enforced here by $).
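A small Python sketch of this pattern (Python's re assumed as the test bed; the inline (?m) flag works the same way there). Each line must consist only of word characters and must not contain XNIL or SNIL anywhere, so a line containing a space is rejected, while ok_1 passes because _ is a word character.

```python
import re

# (?m): ^ and $ match at the start/end of every line.
# ((?!XNIL|SNIL|[\W]).)* re-checks the lookahead before consuming each character.
PATTERN = re.compile(r"(?m)^(((?!XNIL|SNIL|[\W]).)*)$")

text = "abc\nXNIL\nfooSNILbar\nhello world\nok_1"
accepted = [m.group(0) for m in PATTERN.finditer(text)]
print(accepted)  # ['abc', 'ok_1']
```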
I have the following regex:
/([^\s*][\l\u\w\d\s]+) (\d)/
It should match strings of the form: "some-string digit", e.g. "stackoverflow 1". Those strings cannot have whitespace at the beginning.
It works great, except for the simple strings with one character on the beginning, e.g.: "s 1". How can I fix it? I am using it in boost::regex (PCRE-compatible).
The [^\s*] is eating up your first string character, so when you require one-or-more string characters after it, that'll fail:
/([^\s*][\l\u\w\d\s]+) (\d)/
[^\s*] matches "s"; [\l\u\w\d\s]+ then finds no match; (\d) would match "1".
If you fix your misplaced *:
/([^\s]*[\l\u\w\d\s]+) (\d)/
[^\s]* first matches "s", but that match is then cancelled by backtracking; [\l\u\w\d\s]+ matches "s" instead, and (\d) matches "1".
But in order to avoid the backtracking, I would instead write the regex like this:
/([\l\u\w\d]+[\l\u\w\d\s]*) (\d)/
Note that I am only showing the regex itself: for use in a C++ string literal, drop the Perl-style / delimiters and re-apply your extra backslashes as required; e.g.
const std::string my_regex = "([\\l\\u\\w\\d]+[\\l\\u\\w\\d\\s]*) (\\d)";
This can probably be done more optimally anyway (I'm sure most of those character classes are redundant), but this should fix your immediate problem.
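A minimal Python check of this diagnosis, using Python's re only as a stand-in for boost::regex. Python has no \l or \u escapes, so this sketch drops them (\w already covers upper- and lowercase letters), and fullmatch is used so the whole input has to match. The helper name check is hypothetical.

```python
import re

BROKEN = r"([^\s*][\w\d\s]+) (\d)"  # [^\s*] consumes the first character
FIXED = r"([\w\d]+[\w\d\s]*) (\d)"  # first character must be a word character


def check(pattern: str, text: str):
    """Return the captured (string, digit) pair, or None if there is no full match."""
    m = re.fullmatch(pattern, text)
    return m.groups() if m else None


for sample in ["stackoverflow 1", "s 1", "s  1", " s 1"]:
    print(repr(sample), check(BROKEN, sample), check(FIXED, sample))
```

With two spaces, the broken pattern succeeds because [\w\d\s]+ can take one space and leave the other for the literal space before (\d).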
You can test your regexes here.
The problem is that you have the * in the wrong place: [^\s*] matches exactly one character that is neither whitespace nor an asterisk. (The s in "s 1" qualifies as "neither whitespace nor an asterisk", so it is matched and consumed, and no longer available to serve as a match for the next part, [\l\u\w\d\s]+. Note that "s  1", with two spaces, would succeed.)
You probably meant [^\s]*, which matches any number (including zero) of non-whitespace characters. That small change will fix your regular expression.
However, there are other improvements to be made. First, the backslash+letter sequences that are short for character classes can be negated by capitalizing the letter: the character class "everything that's not in \s" can be written as above, with [^\s], but it can also be written more simply as \S.
Next, I don't know what \l and \u are. You've tagged this c++, so you're presumably using the standard regex library, which uses ECMAScript regex syntax. But the ECMAScript regular expression specification doesn't define those metacharacters.
If you're trying to match "lowercase letters" and "uppercase letters", those are [:lower:] and [:upper:] - but both sets of letters are already included in \w, so you don't need to include them in a character class that also has \w.
Pulling those out leaves a character class of [\w\d\s] - which is still redundant, because \w also includes the digits, so we don't need \d. Removing that, we have [\w\s], which matches "an underscore, letter, digit, space, tab, formfeed, or linefeed (newline)."
That makes the whole regular expression \S*[\s\w]+ (\d): zero or more non-whitespace characters, followed by at least one whitespace or word character, followed by exactly one space, followed by a digit. That seems like an unusual set of criteria to me, but it should definitely match "s 1". And it does, in my testing.
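Checking that claim in Python (assumed here as a stand-in for std::regex; for these inputs both treat \S, \s and \w the same way). fullmatch requires the whole string to match. Note that the pattern also accepts a leading space, which the question wanted to rule out.

```python
import re

# \S*      zero or more non-whitespace characters
# [\s\w]+  at least one whitespace or word character
# " (\d)"  exactly one space, then one digit (captured)
SUGGESTED = re.compile(r"\S*[\s\w]+ (\d)")

results = {s: SUGGESTED.fullmatch(s) is not None for s in ["s 1", "stackoverflow 1", " s 1"]}
print(results)  # {'s 1': True, 'stackoverflow 1': True, ' s 1': True}
```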
I would expect you could do something like this: add {X,}, where X is a number, after the second set of brackets, like below:
([^\s*][\l\u\w\d\s]{2,}) (\d)
Replace 2 with whatever you want your minimum string length to be.
I've been reading some Q&A about regular expressions, but I haven't found one that answers my question. I'll use ra as the search string.
My problem is that I want to find the string 'ra' in any string and replace it with 'RA', but only when it is not part of another word. For example, order_ra should become order_RA, but camera must not become cameRA.
I already tried [\s|_]ra(?:[\s|_]), and it doesn't work, because it only matches something like order_ra or order ra when it is followed by a space. I would like to match order ra or order_ra whether or not it has whitespace after it. Can anyone help me with this? I'm not very literate in regular expressions.
The reason I need this is that I want to capitalize 'ra' dynamically in a string sent by a user interaction, but not when it belongs to a word like camera or radical. I don't know if I have explained myself clearly; excuse me if I haven't.
Usually, you would use word boundaries: \bra\b only matches ra on its own, not inside a word. Unfortunately, the underscore is treated as a word character, so the ra in index_ra would not be matched.
Therefore you need to implement this yourself. Assuming that your regex dialect supports Unicode and lookaround assertions, use
(?<!\p{L})foo(?!\p{L})
This matches foo, but not foobar or bazfoo:
(?<!\p{L}) # Assert that there is no letter before the current position
foo # Match foo
(?!\p{L}) # Assert that there is no letter after the current position
If you can't use Unicode character classes, try this:
(?<![^\W\d_])foo(?![^\W\d_])
This is a bit of contorted logic (triple negative for the win!): [^\W\d_] matches a letter (= a character that is not a non-alphanumeric character and not a digit or underscore), so the negative lookaround assertions make sure that there are no letters around the search string ("not a not a (non-alphanumeric or digit or underscore)"). Twisted, but necessary, because we also want the search string to match at the start and end of the string, where there is no character to test.
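Python's re has no \p{L}, but the second form works there unchanged, since [^\W\d_] is Unicode-aware for str patterns and each lookbehind is one character wide. A minimal sketch applying it to the question's ra; the function name capitalize_ra is made up for illustration.

```python
import re

# No letter immediately before ra, and no letter immediately after it.
# [^\W\d_] = a letter; _ and digits do not count as letters here.
STANDALONE_RA = re.compile(r"(?<![^\W\d_])ra(?![^\W\d_])")


def capitalize_ra(text: str) -> str:
    """Replace ra with RA unless it is part of a longer word."""
    return STANDALONE_RA.sub("RA", text)


print(capitalize_ra("ra order_ra order ra camera radical"))
# RA order_RA order RA camera radical
```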
If I understand what you are looking for, the following will perform the match. The non-capturing groups are specified in parentheses with (?:...). It is similar to the OP's attempt but also includes beginning- and end-of-line anchors.
(?:^|\s|_)ra(?:$|\s|_)
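A quick Python check of which inputs this pattern finds (Python's re assumed as a test bed). Unlike the lookaround approach, the surrounding underscore or whitespace becomes part of the match itself, so a plain substitution would have to put those characters back, for example by making the groups capturing and using backreferences.

```python
import re

# (?:^|\s|_) before ra and (?:$|\s|_) after it; both groups are non-capturing.
DELIMITED_RA = re.compile(r"(?:^|\s|_)ra(?:$|\s|_)")

samples = ["order_ra", "order ra", "order ra now", "camera", "radical"]
found = {s: DELIMITED_RA.search(s) is not None for s in samples}
print(found)
# {'order_ra': True, 'order ra': True, 'order ra now': True, 'camera': False, 'radical': False}

# The delimiters are part of the match:
print(repr(DELIMITED_RA.search("order_ra now").group(0)))  # '_ra '
```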
I came across this regular expression which is used to check for alphabetic strings. Can anyone explain how it works to me?
/^\pL++$/uD
Thanks.
\pL (sometimes written as \p{L}) matches a single Unicode letter, so \pL+ matches one or more. I prefer \p{L} to \pL because other Unicode properties, like \p{Lu} (uppercase letter), only work with the braces: \pLu would mean "a Unicode letter followed by the letter u".
The additional + makes the quantifier possessive, meaning that it will never relinquish any characters it has matched, even if that means an overall match will fail. In the example regex, this is unnecessary and can be omitted.
^ and $ anchor the match at the start and end of the string, ensuring that the entire string has to consist of letters. Without them, the regex would also match a substring surrounded by non-letters.
The entire regex is delimited by slashes (/). After the trailing slash, PHP regex options follow. u is the Unicode option (necessary to handle the Unicode property). D ensures that the $ only matches at the very end of the string (otherwise it would also match right before the final newline in a string if that string ends in a newline).
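Python's re has neither \pL nor a D modifier, but it can illustrate both ideas under two stated substitutions: [^\W\d_] approximates \pL, and \Z (end of string only) behaves like $ under the D option, while Python's plain $ also matches before a final newline. The possessive ++ is dropped since, as noted above, it is unnecessary here.

```python
import re

# [^\W\d_] approximates \pL; the possessive ++ is omitted.
PLAIN_DOLLAR = re.compile(r"^[^\W\d_]+$")  # $ also matches before a final "\n"
STRICT_END = re.compile(r"^[^\W\d_]+\Z")   # \Z matches only at the very end, like $ with /D

for sample in ["Ñandú", "abc\n", "abc1"]:
    print(repr(sample), bool(PLAIN_DOLLAR.search(sample)), bool(STRICT_END.search(sample)))
# 'Ñandú' True True
# 'abc\n' True False
# 'abc1' False False
```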
Looks like PCRE flavor.
According to RegexBuddy:
Assert position at the beginning of the string «^»
A character with the Unicode property “letter” (any kind of letter from any language) «\pL++»
Between one and unlimited times, as many times as possible, without giving back (possessive) «++»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
This looks like Unicode processing. I found a neat article that seems to explain \pL; the rest are anchors and repetition characters, which are also explained on this site:
http://www.regular-expressions.info/unicode.html
Enjoy
In my ASP.NET page, I have an input box that has to have the following validation on it:
Must be alphanumeric, with at least one letter (i.e. can't be ALL numbers).
^\d*[a-zA-Z][a-zA-Z0-9]*$
Basically this means:
Zero or more ASCII digits;
One alphabetic ASCII character;
Zero or more alphanumeric ASCII characters.
Try a few tests and you'll see that this passes any alphanumeric ASCII string that contains at least one ASCII letter.
The key to this is the \d* at the front. Without it the regex gets much more awkward to do.
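A few such tests, sketched in Python (assumed as a test bed; an ASP.NET validator would run the pattern in .NET or JavaScript instead). re.ASCII keeps \d to 0-9, and fullmatch is used because Python's $ would otherwise also accept a trailing newline. The helper name is_valid is hypothetical.

```python
import re

# re.ASCII restricts \d to 0-9, matching the "ASCII digits" reading above.
AT_LEAST_ONE_LETTER = re.compile(r"^\d*[a-zA-Z][a-zA-Z0-9]*$", re.ASCII)


def is_valid(text: str) -> bool:
    """True if text is ASCII-alphanumeric and contains at least one letter."""
    return AT_LEAST_ONE_LETTER.fullmatch(text) is not None


for sample in ["abc", "123", "123a", "a1b2", "", "12_3"]:
    print(repr(sample), is_valid(sample))
```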
Most answers to this question are correct, but there's an alternative, that (in some cases) offers more flexibility if you want to change the rules later on:
^(?=.*[a-zA-Z].*)([a-zA-Z0-9]+)$
This will match any sequence of alphanumeric characters, but only if the lookahead at the start (the first group) also matches, i.e. only if there is at least one letter somewhere in the string. It's a little-known trick in regular expressions that lets you handle some very difficult validation problems.
For example, say you need to add another constraint: the string should be between 6 and 12 characters long. The obvious solutions posted here wouldn't work, but using the look-ahead trick, the regex simply becomes:
^(?=.*[a-zA-Z].*)([a-zA-Z0-9]{6,12})$
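A short Python sketch of the length-bounded version (Python's re assumed as a test bed; is_valid_id is a hypothetical helper name). The lookahead only checks that a letter occurs somewhere, while the main group enforces both the character set and the 6 to 12 length.

```python
import re

# Lookahead: a letter somewhere. Group: 6-12 ASCII alphanumerics, whole string.
LETTER_AND_LENGTH = re.compile(r"^(?=.*[a-zA-Z].*)([a-zA-Z0-9]{6,12})$")


def is_valid_id(text: str) -> bool:
    return LETTER_AND_LENGTH.fullmatch(text) is not None


for sample in ["abc123", "abc12", "123456", "a" * 12, "a" * 13]:
    print(repr(sample), is_valid_id(sample))
```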
^[\p{L}\p{N}]*\p{L}[\p{L}\p{N}]*$
Explanation:
[\p{L}\p{N}]* matches zero or more Unicode letters or numbers
\p{L} matches one letter
[\p{L}\p{N}]* matches zero or more Unicode letters or numbers
^ and $ anchor the string, ensuring the regex matches the entire string. You may be able to omit these, depending on which regex matching function you call.
Result: you can have any alphanumeric string except there's got to be a letter in there somewhere.
\p{L} is similar to [A-Za-z] except it will include all letters from all alphabets, with or without accents and diacritical marks. It is much more inclusive, using a larger set of Unicode characters. If you don't want that flexibility substitute [A-Za-z]. A similar remark applies to \p{N} which could be replaced by [0-9] if you want to keep it simple. See the MSDN page on character classes for more information.
The less fancy non-Unicode version would be
^[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*$
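A sketch comparing the ASCII version with a Unicode-aware approximation in Python. Python's re has no \p{L} or \p{N}, so this assumes [^\W\d_] as a stand-in for a letter and [^\W_] for a letter or digit; that is close to, but not exactly the same as, \p{L} and [\p{L}\p{N}].

```python
import re

ASCII_VERSION = re.compile(r"^[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*$")
# Approximation of ^[\p{L}\p{N}]*\p{L}[\p{L}\p{N}]*$ for Python's re:
# [^\W_] ~ letter or digit, [^\W\d_] ~ letter (Unicode-aware for str patterns).
UNICODE_APPROX = re.compile(r"^[^\W_]*[^\W\d_][^\W_]*$")

for sample in ["abc123", "123", "Straße1", "abc_1"]:
    print(repr(sample), bool(ASCII_VERSION.fullmatch(sample)), bool(UNICODE_APPROX.fullmatch(sample)))
```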
^[0-9]*[A-Za-z][0-9A-Za-z]*$
is the regex that will do what you're after. The ^ and $ match the start and end of the input, so no other characters are allowed. You could replace the [0-9A-Za-z] block with \w (although \w also allows the underscore), but I prefer the more verbose form because it's easier to extend with other characters if you want.
Add a regular expression validator to your asp.net page as per the tutorial on MSDN: http://msdn.microsoft.com/en-us/library/ms998267.aspx.
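Two details are easy to get wrong when shortening this character class, sketched here in Python's re (used only as a test bed): a range written A-z is not the same as A-Za-z, and \w also admits the underscore.

```python
import re

# A-z covers ASCII 65-122, which includes [ \ ] ^ _ and ` between Z and a.
RANGE_TYPO = re.compile(r"[A-z]")
RANGE_OK = re.compile(r"[A-Za-z]")

# \w also allows the underscore, so it is not a drop-in for [0-9A-Za-z].
WITH_W = re.compile(r"^[0-9]*[A-Za-z]\w*$")
VERBOSE = re.compile(r"^[0-9]*[A-Za-z][0-9A-Za-z]*$")

print(bool(RANGE_TYPO.fullmatch("_")), bool(RANGE_OK.fullmatch("_")))  # True False
print(bool(WITH_W.fullmatch("a_1")), bool(VERBOSE.fullmatch("a_1")))   # True False
```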
^\w*[\p{L}]\w*$
This one's not that hard. The regular expression reads: match a line that starts with any number of word characters (letters, digits, and connector punctuation such as the underscore, which you might not want), then contains one letter (that's the [\p{L}] part in the middle), followed by any number of word characters again.
If you want to exclude punctuation, you'll need a heftier expression:
^[\p{L}\p{N}]*[\p{L}][\p{L}\p{N}]*$
And if you don't care about Unicode you can use a boring expression:
^[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*$
^[0-9]*[a-zA-Z][a-zA-Z0-9]*$
It can match:
any number of digits followed by a letter,
or an alphanumeric string that starts with a letter,
or an alphanumeric string that starts with digits, followed by a letter and ending with an alphanumeric substring.