Regular expression doesn't work as expected - regex

How can it be that this regular expression also returns strings that have a _ underscore as their last character?
It should only return strings with alphabetical characters, mixed lower- and uppercase.
However, the regular expression returns: 'action_'
$regEx = '/^([a-zA-Z])[a-zA-Z]*[\S]$|^([a-zA-Z])*[\S]$|^[a-zA-Z]*[\S]$/';

Because \S means "not whitespace character", \S matches _
The capture groups themselves cannot contain an underscore, though, so if that is what you meant, you may be looking at the whole match rather than just the first group.
Please show how you are using the regex to clarify that, if needed.

The [\S] will match everything that is not whitespace, including underscore.
Also, your expression is very odd!
If you want a string that only contains letters, then use ^[a-zA-Z]*$ or ^[a-zA-Z]+$ (depending on if blank is allowed or not).
If you're trying to do something else, you will need to expand on what that is.
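A quick way to check the point above, sketched in Python (the question looks like PHP, so the language and the helper name is_letters_only are assumptions made purely for illustration):

```python
import re

# The pattern from the question, unchanged.
original = re.compile(r'^([a-zA-Z])[a-zA-Z]*[\S]$|^([a-zA-Z])*[\S]$|^[a-zA-Z]*[\S]$')

# The letters-only pattern suggested in the answer (+ means empty is rejected).
letters_only = re.compile(r'^[a-zA-Z]+$')

def is_letters_only(text):
    # Hypothetical helper: True when text consists of ASCII letters only.
    return letters_only.match(text) is not None

print(original.match('action_') is not None)  # True: the trailing [\S] accepts '_'
print(is_letters_only('action_'))             # False
print(is_letters_only('Action'))              # True
```

Swap the + for * in letters_only if an empty string should be accepted.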

\S matches any non-whitespace char - thus _

You should show the text and which part of it you want to extract.
A regular expression shouldn't need to be as big as yours.
Work on small pieces of the expression at a time. At this size, it is very difficult to help you.

Regex to match other than listed string

I need to match a value that is not in the following list and that contains no special characters.
Strings and characters that need to be rejected:
XNIL
SNIL
All special characters
My expression is (?!XNIL|SNIL|[\W])\w+
The problem is that if my text contains the word XNIL or SNIL, the regex still matches NIL, even though I have listed XNIL and SNIL to be rejected. What mistake did I make here?
You can check my regex online here -> http://regexr.com/3cdsl
This seems to work on your test page: (?!(XNIL|SNIL|\W+))\b\w+ At least it solves the XNIL/SNIL problem.
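Your regex was matching NIL inside XNIL because the lookahead only fails at the X. The engine then retries one character later, at N, where the lookahead succeeds and \w+ matches NIL. The \b stops a match from starting in the middle of a word. To see the effect, take your original and change \w+ to \w and notice the difference.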
UPDATE:
Based on your feedback, you also wish to exclude _.
Because _ is used in programming language symbols, and [arguably] regexes were created of, by, and for programmers, _ is considered a "word" character (i.e. it is in \w and therefore not matched by \W).
From the [perl] regex man page:
\w Match a "word" character (alphanumeric plus "_", plus other connector punctuation chars plus Unicode marks)
Your final regex might need to be: (?!(XNIL|SNIL|_+|\W+))\b\w+. (Note: the _+)
A cleaner way: (?!(XNIL|SNIL|[\W_]+))\b\w+ which produces the same results yet is closer in intent to what you wanted.
You may have to adjust \w+ accordingly as well, since \w still matches _ inside a word.
If you really want to be sure, at the expense of being slightly more verbose, write out the character class as you choose:
(?!(XNIL|SNIL|[^a-zA-Z0-9]+))\b[a-zA-Z0-9]+
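To see the difference between the original and the \b versions, here is a small sketch in Python (an assumption; the question was tested on regexr). The sample string and the helper whole_matches are made up for illustration.

```python
import re

# Patterns copied from the question and the answer above.
original = re.compile(r'(?!XNIL|SNIL|[\W])\w+')
with_boundary = re.compile(r'(?!(XNIL|SNIL|[\W_]+))\b\w+')
explicit_class = re.compile(r'(?!(XNIL|SNIL|[^a-zA-Z0-9]+))\b[a-zA-Z0-9]+')

def whole_matches(pattern, text):
    # finditer + group(0) returns the full match even when the pattern has groups.
    return [m.group(0) for m in pattern.finditer(text)]

sample = 'XNIL foo SNIL bar'  # hypothetical input

print(whole_matches(original, sample))        # ['NIL', 'foo', 'NIL', 'bar']
print(whole_matches(with_boundary, sample))   # ['foo', 'bar']
print(whole_matches(explicit_class, sample))  # ['foo', 'bar']
```

Without \b the engine is free to start a match at the N of XNIL; with \b it can only start at the beginning of a word, where the lookahead then rejects XNIL and SNIL.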
Check this regex
[^(XNIL|SNIL|[^\w])]
Explanation
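A [] with ^ at the beginning says that anything not in the list given in [] should be matched.
(XNIL|SNIL|[^\w]) is meant to match the words XNIL or SNIL, with [^\w] matching anything other than word characters (i.e. special chars).
So the whole regex is meant to match anything that is not in (XNIL|SNIL|[^\w]). Be aware, though, that inside [] the characters (, | and ) are literal and a class only ever matches a single character, so this cannot reject whole words such as XNIL.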
This should work
(?m)^(((?!XNIL|SNIL|[\W]).)*)$
Repeating the negative lookahead together with the single-character match (the . inside the group) applies the zero-length assertion at every position until the match finishes (in this case at the end of the line, due to $ with the (?m) flag).
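A rough illustration of that line-based pattern in Python (the language and the sample text are assumptions). Because the lookahead is checked before every character, a line is only accepted if no position in it starts XNIL, SNIL or a non-word character.

```python
import re

# Pattern from the answer above; (?m) makes ^ and $ work per line.
pattern = re.compile(r'(?m)^(((?!XNIL|SNIL|[\W]).)*)$')

text = 'hello\nabXNILcd\nfoo bar\nworld'  # hypothetical input, one value per line

# group(0) is the whole accepted line.
clean_lines = [m.group(0) for m in pattern.finditer(text)]
print(clean_lines)  # ['hello', 'world']
```

Note that the * also lets an empty line through; use + instead if that is not wanted.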

Regular Expression for re-verification

I am trying to validate a verification question, and this is the regular expression I have. I am not sure what it means, but this expression is not allowing spaces
^\S+$
For example, if I enter 'Test Me', this expression says it is not valid. How do I fix this to allow spaces?
What exactly are you trying to match?
^ matches the beginning of the string
$ matches the end of the string
+ allows one or more occurrences of the preceding expression
\S stands for anything but a whitespace
\s stands for white-spaces
The expression you have will match any string containing only non-white-space characters. If you could express what exactly you're trying to match, I could help you with it.
                                  ^\S+$
                                  ^^ ^^
                                  || ||
^  start of string----------------+| ||
\S anything but a whitespace-------+ ||
+  one or more of what precedes------+|
$  end of string----------------------+
(visit regular-expressions.info for a larger reference)
Not sure what you want to change, really, since this regular expression seems to have been written for the sole purpose of not allowing spaces.
^ means "start of the string"
\S is a special keyword in Regex that denotes "non-white space characters"
+ means match the preceding item one or more times
$ means "end of the string"
So in English, this Regex says: starting at the start of the string, find me ONLY non-white space characters one or more times before the end of the string. This is why it doesn't permit white space.
The reason it does not match is that \S does not allow whitespace characters in your string.
Something that might serve you better is:
^[\w\s]+$
\w is equivalent to [A-Za-z0-9_]
\s matches whitespace
Keep in mind that this regex will not allow punctuation. If you want that, you may be better off using ^.+$
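The three candidates can be compared side by side. This is only a sketch in Python (the question does not say which language is in use), and the variable names are made up.

```python
import re

no_whitespace = re.compile(r'^\S+$')         # the original expression
words_and_spaces = re.compile(r'^[\w\s]+$')  # letters, digits, _ and whitespace
anything = re.compile(r'^.+$')               # any non-empty single line

for candidate in ('TestMe', 'Test Me', 'Test Me?'):
    print(candidate,
          bool(no_whitespace.match(candidate)),
          bool(words_and_spaces.match(candidate)),
          bool(anything.match(candidate)))
# TestMe True True True
# Test Me False True True
# Test Me? False False True
```

Which one fits depends on whether punctuation such as ? should be accepted in the verification question.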

Get text using Regular Expression

I have the sentence as below:
First learning of regular expression.
And I want to extract only First learning and expression by means of regular expressions.
Where would I start?
Regular expressions are for pattern matching, which means we'd need to know a pattern that is to be matched.
If you literally just want those strings, you'd just use First learning and expression as your patterns.
As @orique says, this is kind of pointless; you don't need RegEx for that. If you want something more complicated, you'd need to explain what you're trying to match.
Regex is not usually used to match literal text like what you're doing, but instead is used to match patterns of text. If you insist on using regex, you'll have to match the trivial expression
(First learning|expression)
As already pointed out, it is unusual to match a literal string like you are asking, but more common to match patterns such as several word characters followed by a space character etc...
Here is a pattern to match several word characters (which are a-z, A-Z, 0-9 and _) followed by a space, followed by several more word characters, and so on. It captures three groups: the first matches the first two words, the second the next two words, and the third the fifth word and its preceding space.
$words = "First learning of regular expression.";
// The pattern must be a quoted string with delimiters.
preg_match('/(\w+\s\w+)\s(\w+\s\w+)(\s\w+)/', $words, $matches);
// Concatenate with "." in PHP: gives "First learning expression".
$result = $matches[1] . $matches[3];
I hope this matches your requirement.
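For comparison, roughly the same idea in Python (an assumption on my part, since the answer above is PHP), using the same pattern and sentence:

```python
import re

words = "First learning of regular expression."

# Group 1: first two words, group 2: next two words, group 3: space + fifth word.
match = re.search(r'(\w+\s\w+)\s(\w+\s\w+)(\s\w+)', words)

# Join the first and third groups; the third already starts with a space.
result = match.group(1) + match.group(3)
print(result)  # First learning expression
```

This only works for sentences with the same five-word shape; re.search returns None otherwise, so real code would need to check for that.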

regular expression no characters

I have this regular expression
([A-Z], )*
which should match something like
test, (with a space after the comma)
How do I change the regex so that it doesn't match if there are any characters after the space?
For example if I had:
test, test
I'm looking to do something similar to
([A-Z], ~[A-Z])*
Cheers
Use the following regular expression:
^[A-Za-z]*, $
Explanation:
^ matches the start of the string.
[A-Za-z]* matches 0 or more letters (case-insensitive) -- replace * with + to require 1 or more letters.
, matches a comma followed by a space.
$ matches the end of the string, so if there's anything after the comma and space then the match will fail.
As has been mentioned, you should specify which language you're using when you ask a Regex question, since there are many different varieties that have their own idiosyncrasies.
^([A-Z]+, )?$
The difference between mine and Donut's is that his will match , (a comma and space with no letters) and fail for the empty string, while mine will match the empty string and fail for , on its own. (Also, his accepts both cases while mine does not. With mine you'll have to add case-insensitivity to the options of your regex function, but it follows your example.)
I am not sure which regex engine/language you are using, but there is often something like a negated character class, e.g. [^a-z], meaning "any character other than a lowercase letter".
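The two anchored answers behave differently on the edge cases. A small sketch in Python (an assumption, since the question names no language; the variable names are made up):

```python
import re

letters_then_comma = re.compile(r'^[A-Za-z]*, $')   # first answer
optional_upper = re.compile(r'^([A-Z]+, )?$')       # second answer
optional_any_case = re.compile(r'^([A-Z]+, )?$', re.IGNORECASE)

for candidate in ('test, ', 'test, test', ', ', ''):
    print(repr(candidate),
          bool(letters_then_comma.match(candidate)),
          bool(optional_upper.match(candidate)),
          bool(optional_any_case.match(candidate)))
# 'test, ' True False True
# 'test, test' False False False
# ', ' True False False
# '' False True True
```

In all three variants the $ anchor is what rejects 'test, test'.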

Perl matching characters bigger than a given length

I have been struggling to write a regex that matches words longer than a given length within parentheses. First I thought I could do this with \(\w{a,}\), but I realized that it doesn't match words containing white space, such as (ab cd ef). All I want is to find any run of characters within parentheses longer than, for instance, 3 characters. How can I resolve this problem?
What is a word with white space?
If you want to match any character, then use .
\(.{3,}\)
. matches any character except newlines
But be careful, this is greedy. It will also match, for example,
(a)123(b)
To avoid this you could do something like
\([^)]{3,}\)
[^)] means any character except a )
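The greedy behaviour is easy to reproduce. Here is a sketch in Python rather than Perl (an assumption made for illustration; both patterns behave the same way in the two languages for these inputs):

```python
import re

greedy = re.compile(r'\(.{3,}\)')       # . can run across closing parentheses
bounded = re.compile(r'\([^)]{3,}\)')   # stops at the first closing parenthesis

print(greedy.findall('(a)123(b)'))        # ['(a)123(b)']
print(bounded.findall('(a)123(b)'))       # []
print(greedy.findall('(ab cd ef) (x)'))   # ['(ab cd ef) (x)']
print(bounded.findall('(ab cd ef) (x)'))  # ['(ab cd ef)']
```

With [^)] each match is confined to a single pair of parentheses.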
You could use a character class that includes both \w and \s:
\([\w\s]{a,}\)
Maybe you mean:
\([\w\s]{a,}\)
If it has a space in it, it's not a word anymore.
Is matching any character fine, \(.{a,}\)? Or do you just need the whitespace, \((\w|\s){a,}\)?
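The a in {a,} stands for the minimum length, so the pattern has to be built with a concrete number. A minimal sketch in Python (not Perl, which is an assumption; the function name long_parenthesized and the sample text are hypothetical):

```python
import re

def long_parenthesized(text, min_len):
    # Build \([\w\s]{min_len,}\) with the requested minimum length.
    pattern = re.compile(r'\([\w\s]{%d,}\)' % min_len)
    return pattern.findall(text)

sample = '(ab) (ab cd ef) (x-y-z-w)'  # hypothetical input
print(long_parenthesized(sample, 3))  # ['(ab cd ef)']
```

Here (ab) is too short and (x-y-z-w) contains characters outside [\w\s]. For "longer than 3" in the strict sense, pass 4 as min_len.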