The title explains the question itself.
More specifically, I need to write a regex that accepts a "question", something like "how are you today?", so the last character must be a "?".
I tried something like this:
m/[^a-zA-Z0-9\W{1}]/
but it accepts any input with one or more \W characters.
The regex you gave in your question does not do what you think it does.
m/[^a-zA-Z0-9\W{1}]/
This will match any single character that is not a-z, A-Z, 0-9, a non-word character (\W), {, 1, or }. For ASCII input that leaves only the underscore. The ^ at the start of the square brackets negates the contents of the character class. It's not the beginning-of-line anchor if it's in there!
If you need to validate any input that has a question mark at the end, all you need is the question mark and the end-of-line metacharacter.
/\?$/
The ? is a metacharacter itself, so you need to escape it with a backslash (\).
If you want to match a whole sentence with the question mark at the end, think about what kinds of characters could be in the sentence. It will probably not be only \w.
Play around with your input and your regex on http://regex101.com/, that will make it easier because it explains what's going on.
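To make the difference concrete, here is a minimal sketch in Python rather than Perl (the two regex dialects agree on these patterns for ASCII input). The names negated_class, ends_with_qmark and is_question are made up for illustration and are not part of the original question.

```python
import re

# The asker's pattern: ^ inside [...] negates the class, it is not an anchor.
negated_class = re.compile(r"[^a-zA-Z0-9\W{1}]")

# The answer's pattern: an escaped, literal ? followed by the end anchor.
ends_with_qmark = re.compile(r"\?$")


def is_question(text):
    """Return True when text ends with a literal question mark."""
    return ends_with_qmark.search(text) is not None


print(is_question("how are you today?"))          # True
print(is_question("how are you today"))           # False
# For ASCII input the negated class only ever finds the underscore.
print(negated_class.findall("how_are you?"))      # ['_']
```

Note that $ in Python (as in Perl) also matches just before a trailing newline, so "how are you?\n" is accepted as well.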
accept a "question", something like:"how are you today?"
How about:
$string =~ /^(?:[a-z0-9]+\s*)+\?$/i;
This may work:
if( $question =~ m!([\w\s]+)\?$! ) {
    print "question text: $1\n";
}
The regex looks for the \w and \s characters (spaces, tabs, ...) that you often have in a text, followed by the question mark in the last position.
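A short sketch of the same capture in Python, assuming the helper name question_text (hypothetical). It also shows a limitation of the [\w\s]+ class: punctuation inside the sentence is not part of the class, so only the tail after the last such character is captured.

```python
import re

# Same idea as the Perl pattern above: capture word/space characters
# that sit directly in front of a final question mark.
QUESTION = re.compile(r"([\w\s]+)\?$")


def question_text(line):
    """Return the captured text before the trailing '?', or None."""
    match = QUESTION.search(line)
    return match.group(1) if match else None


print(question_text("how are you today?"))   # how are you today
print(question_text("no question here"))     # None
# The apostrophe is neither \w nor \s, so the capture starts after it.
print(question_text("what's up?"))           # s up
```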
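Try this, assuming you want to match any characters preceding the ?:
m/.+\?$/
. matches any character of the string (except a newline), and + repeats it one or more times.
\ removes the special meaning of the ? (match the preceding item 0 or 1 times), and $ then anchors the match at the end of the string. Do not wrap the pattern in square brackets: inside a character class ., +, ? and $ are all literal characters.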
So I have the following:
^[a-zA-Z]+\b(myword+-)\b*
which I thought would match
^ start of string
[a-zA-Z] any alpha character
+ of one or more characters
\b followed by a word break
(myword+-) followed by myword which could include one or more special characters
\b followed by a word break
\* followed by anything at all
One: it does not work - it does not match anything
Two: any special characters included in (myword+-) throw an error
I could escape the special characters, but I don't know in advance what they might be, so I would have to escape all the possibilities, or perhaps I could just escape every character in (\m\y\w\o\r\d\\+\\-)
Edited to add:
Sorry, I knew I should have given more information
I have a series of strings to search through in the form:
extra android-sdk and more that is of no interest
extra android-ndk and more that is of no interest
extra anjuta-extra and more that is of no interest
community c++-gtk-utils and more that is of no interest
and I have a list of items to search for in the strings:
android-sdk
android-ndk
extra
c++-gtk-utils
The item should only match if the second word in the string is an exact match to the item, so:
android-sdk will match the first string
android-ndk will match the second string
extra will NOT match the third string
c++-gtk-utils will match the fourth string
So (myword+-) is the item I am searching for "which could include one or more special characters"
Thanks for the help
Andrew
OK, with the help from above I worked it out.
This regex does exactly what I wanted, bear in mind that I am working in tcl (note the spaces to delimit the search word):
^[a-zA-Z]+\y extra \y *
where the search word is "extra".
It is necessary to escape any characters in the search string which may be interpreted by the regex engine as quantifiers etc., e.g. +
So this will also work:
^[a-zA-Z]+\y dbus-c\+\+ \y *
Andrew
Strong recommendation: if you want to match literal strings, don't use regular expressions.
If we have this sample data:
set strings {
    {extra android-sdk and more that is of no interest}
    {extra android-ndk and more that is of no interest}
    {extra anjuta-extra and more that is of no interest}
    {community c++-gtk-utils and more that is of no interest}
}
set search_strings {
    android-sdk
    android-ndk
    extra
    c++-gtk-utils
}
Then, to find matches in the 2nd word of each string, we'll just use the eq string equality operator
foreach string $strings {
    foreach search $search_strings {
        if {[lindex [split $string] 1] eq $search} {
            puts "$search matches $string"
        }
    }
}
outputs
android-sdk matches extra android-sdk and more that is of no interest
android-ndk matches extra android-ndk and more that is of no interest
c++-gtk-utils matches community c++-gtk-utils and more that is of no interest
If you insist on regular expression matching, you can escape any special characters to take away their usual regex meaning. Here, we'll take the brute force approach: any non-word chars will get escaped, so that the pattern may look like ^\S+\s+c\+\+\-gtk\-utils
foreach string $strings {
    foreach search $search_strings {
        set pattern "^\\S+\\s+[regsub -all {\W} $search {\\&}]"
        if {[regexp $pattern $string]} {
            puts "$search matches $string"
        }
    }
}
I was hoping to be able to make a portion of a regular expression to be a literal string, like
set pattern "^\\S+\\s+(***=$string)"
set pattern "^\\S+\\s+((?q)$string)"
but both failed.
Tcl regular expressions are documented at
https://www.tcl.tk/man/tcl8.6/TclCmd/re_syntax.htm
Also note your pattern ^[a-zA-Z]+\b(myword+-)\b* does not provide for any whitespace between the first and second words.
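For comparison only, here is how the two approaches above might look in Python, where re.escape does the "treat this part as a literal string" job. This is a sketch, not Tcl; the function names are made up, and the trailing (?:\s|$) is an added assumption so that a search item has to cover the whole second word rather than just its beginning.

```python
import re

strings = [
    "extra android-sdk and more that is of no interest",
    "extra android-ndk and more that is of no interest",
    "extra anjuta-extra and more that is of no interest",
    "community c++-gtk-utils and more that is of no interest",
]
search_strings = ["android-sdk", "android-ndk", "extra", "c++-gtk-utils"]


def second_word_equals(line, item):
    """Plain string comparison on the second whitespace-separated word."""
    words = line.split()
    return len(words) > 1 and words[1] == item


def second_word_matches(line, item):
    """Regex version: re.escape() neutralises +, -, . and friends."""
    pattern = r"^\S+\s+" + re.escape(item) + r"(?:\s|$)"
    return re.search(pattern, line) is not None


matches = [(item, line)
           for line in strings
           for item in search_strings
           if second_word_matches(line, item)]

for item, line in matches:
    print(f"{item} matches {line}")
```

Both helpers pick out the same three lines for this sample data; "extra" matches nothing because it never appears as the second word.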
Disclaimer: Since your question does not say what input and output are expected, I will try to explain why your regex isn't working at all. Since this is not a full answer, you might not want to mark it as accepted, and instead wait for someone to post a working solution once you provide the necessary information.
Notes:
Quantifier characters (*, +, ?, etc.) apply to the single preceding element: a literal character, a character class (a.k.a. character group, namely the characters/ranges inside [ ]), or a group in ( ). When you write (myword+-) in your regex, the only thing the + sign applies to is the letter 'd', nothing else.
What is myword in your regex? If you want a set of characters, use [ ] combined with character ranges and/or character tokens such as \w (word characters: letters, digits, and the underscore) or \d (digit characters).
You also seem to confuse groups ("( )"), character classes ("[ ]"), and quantifier notation ("{ }").
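The first note can be checked quickly. The sketch below uses Python's re module for illustration (the question itself is about Tcl); the results dictionary and its keys are just labels invented for this example.

```python
import re

results = {
    # + binds only to the preceding 'd': one or more d's, then a hyphen.
    "repeated d": bool(re.fullmatch(r"myword+-", "myworddd-")),
    # The literal text "myword+-" is therefore NOT matched by the raw pattern.
    "literal text, raw pattern": bool(re.fullmatch(r"myword+-", "myword+-")),
    # Escaping turns + into an ordinary character.
    "literal text, escaped": bool(
        re.fullmatch(re.escape("myword+-"), "myword+-")),
    # A group makes the quantifier apply to the whole word.
    "repeated word": bool(re.fullmatch(r"(?:myword)+-", "mywordmyword-")),
}

for label, matched in results.items():
    print(f"{label}: {matched}")
```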
I am using Notepad++ to remove some unwanted strings from the end of a pattern and this for the life of me has got me.
I have the following sets of strings:
myApp.ComboPlaceHolderLabel,
myApp.GridTitleLabel);
myApp.SummaryLabel + '</b></div>');
myApp.NoneLabel + ')') + '</label></div>';
I would like to leave just myApp.[variable] and get rid of, e.g. ,, );, + '...', etc.
Using Notepad++, I can match the strings themselves using ^myApp.[a-zA-Z0-9].*?\b (it's a bit messy, but it works for what I need).
But in reality, I need to negate that regex to match everything at the end, so I can replace it with a blank.
You don't need negation. Put your regex inside a capturing group and append .*$ to it; $ matches the end of a line. The whole line is matched and then replaced by the contents of the first capturing group. Note that . matches any character, so you need to escape the dot to match a literal dot.
^(myApp\.[a-zA-Z0-9].*?\b).*$
Replacement string:
\1
OR
Match only the trailing characters and then replace them with an empty string.
\b[,); +]+.*$
I think this works equally well (with the dot escaped so that it only matches a literal dot):
^(myApp\.\w+).*$
Replacement string:
\1
From difference between \w and \b regular expression meta characters:
\w stands for "word character", usually [A-Za-z0-9_]. Notice the inclusion of the underscore and digits.
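The capture-and-replace idea can be tried outside Notepad++ as well. Below is a minimal Python sketch using the sample lines from the question; KEEP_PREFIX and cleaned are names chosen for this example, and Python writes the backreference as \1 just like the replacement string above.

```python
import re

lines = [
    "myApp.ComboPlaceHolderLabel,",
    "myApp.GridTitleLabel);",
    "myApp.SummaryLabel + '</b></div>');",
    "myApp.NoneLabel + ')') + '</label></div>';",
]

# Group 1 keeps "myApp.<identifier>"; .*$ swallows the rest of the line.
KEEP_PREFIX = re.compile(r"^(myApp\.\w+).*$")

cleaned = [KEEP_PREFIX.sub(r"\1", line) for line in lines]

for line in cleaned:
    print(line)
# myApp.ComboPlaceHolderLabel
# myApp.GridTitleLabel
# myApp.SummaryLabel
# myApp.NoneLabel
```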
(^.*?\.[a-zA-Z]+)(.*)$
Use this. Replace with
$1
See demo.
http://regex101.com/r/lU7jH1/5
My requirement is
"A string should not be blank or empty"
E.g., a string can contain any number of characters, including special characters, but should never be empty. For example, a string can contain "a,b,c" or "xyz123abc" or "12!#$#%&*()9" or " aa bb cc ".
So, this is what I tried
Regex for blank or space:-
^\s*$
^ is the beginning of string anchor
$ is the end of string anchor
\s is the whitespace character class
* is zero-or-more repetition of the preceding element
I'm stuck on how to negate the regex ^\s*$ so that it accepts any string like "a,b,c" or "xyz" or "12!#$#%&*()9"
Any help is appreciated.
No need for a regex. In Groovy you have the isAllWhitespace method:
groovy:000> "".allWhitespace
===> true
groovy:000> " \t\n ".allWhitespace
===> true
groovy:000> "something".allWhitespace
===> false
So asking !yourString.allWhitespace should tell you if your string is something other than empty or blank :)
\S
\S matches any non-white space character
Each shorthand character class has its own negated counterpart, so for \w you have \W, for \s you have \S, for \d you have \D, etc.
http://www.regular-expressions.info/charclass.html
Your regex engine may not support \S. If that is the case, you can use [^ \t\v]. If you support Unicode (which you should), there are more space types that you should watch for.
If both your regex engine and you support Unicode AND \S is not supported by your regex engine, then you'll probably want to use the following (if you care about people entering different Unicode space types):
[^ \r\f\t\v\u0085\u00A0\u1680\u180E\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u200B\u2028\u2029\u202F\u205F\u3000\uFEFF]
http://www.cs.tut.fi/~jkorpela/chars/spaces.html
http://en.wikipedia.org/wiki/Whitespace_character#Unicode
To me, two simple ways to express it are (neither needs anchoring):
s.trim() =~ /.+/
or
s =~ /\S+/
The first assumes you know how trim() works; the second assumes you know the meaning of \S.
Of course
!s.allWhitespace
is perfect, again if you know it exists.
The following regular expression will ensure that a string contains at least 1 non-whitespace character.
^(?!\s*$).+
Note: I am not familiar with Groovy, but I would imagine there are native functions (trim, empty, etc.) that test this more naturally than a regular expression.
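As a rough cross-check of the approaches in this thread, here is a sketch in Python rather than Groovy. It compares the negative-lookahead pattern, a plain search for \S, and a trim-style test; the names NOT_BLANK_LOOKAHEAD, HAS_NON_SPACE and checks are invented for the example.

```python
import re

NOT_BLANK_LOOKAHEAD = re.compile(r"^(?!\s*$).+")   # pattern from the answer above
HAS_NON_SPACE = re.compile(r"\S")                  # at least one non-whitespace char

samples = ["", "   ", " \t\n ", "a,b,c", "xyz123abc", "12!#$#%&*()9", " aa bb cc "]


def checks(s):
    """Return (lookahead result, \\S result, strip-based result) for s."""
    return (
        NOT_BLANK_LOOKAHEAD.search(s) is not None,
        HAS_NON_SPACE.search(s) is not None,
        bool(s.strip()),
    )


for s in samples:
    print(repr(s), checks(s))
```

For these samples all three tests agree: the first three strings are rejected and the rest accepted. One caveat with the lookahead form: . does not match a newline by default, so a string such as "\nabc" is rejected by it even though it is not blank, while the \S search accepts it.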
Is this in a Grails domain class?
If so, just use the blank constraint.
In perl I want to substitute any character not [A-Z]i or [0-9] and replace it with "_" but only if this non alphanumerical character occurs between two alphanumerical characters. I do not want to touch non-alphanumericals at the beginning or end of the string.
I know enough regex to replace them, just not to only replace ones in the middle of the string.
s/(\p{Alnum})\P{Alnum}(\p{Alnum})/${1}_${2}/g;
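Of course that would hurt your chances with "#A#B%C", because each match consumes the alphanumeric characters on both sides, so B cannot also start the next match. You might use look-arounds instead: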
s/(?<=\p{Alnum})\P{Alnum}(?=\p{Alnum})/_/g;
That way you isolate it to just the non "alnum" character.
Or you could use \K (the "keep" escape) and get the same thing done.
s/\p{Alnum}\K\P{Alnum}(?=\p{Alnum})/_/g;
EDIT based on input:
To not eat a newline, you could do the following:
s/\p{Alnum}\K[^\p{Alnum}\n](?=\p{Alnum})/_/g;
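The difference between consuming the neighbours and only looking at them shows up with overlapping candidates. The sketch below is in Python, whose re module has no \p{Alnum} or \K, so it assumes plain ASCII classes instead; the two function names are hypothetical.

```python
import re


def replace_consuming(text):
    """Capture both neighbours; each match eats three characters."""
    return re.sub(r"([A-Za-z0-9])[^A-Za-z0-9]([A-Za-z0-9])", r"\1_\2", text)


def replace_lookaround(text):
    """Only assert the neighbours; each match is the single middle character."""
    return re.sub(r"(?<=[A-Za-z0-9])[^A-Za-z0-9](?=[A-Za-z0-9])", "_", text)


print(replace_consuming("#A#B%C"))        # #A_B%C   (the % is missed)
print(replace_lookaround("#A#B%C"))       # #A_B_C
print(replace_lookaround("a-2=c+a()_"))   # a_2_c_a()_
```

Leading and trailing non-alphanumerics stay untouched in both versions because they lack an alphanumeric neighbour on one side.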
Try this:
my $str = 'a-2=c+a()_';
$str =~ s/(?<=[A-Z0-9])[^A-Z0-9](?=[A-Z0-9])/_/gi;  # look-arounds capture nothing, so the replacement is just "_"
I have a piece of Perl code (pattern matching) like this,
$var = "<AT>this is an at command</AT>";
if ($var =~ /<AT>([\s\w]*)<\/AT>/i)
{
    print "Matched in AT command\n";
    print "$var\n\n";
}
It works fine if the content in between the tags has no hyphen. It does not work if a hyphen is inserted into the string between the tags, like this: <AT>this is an at-command</AT>.
Can anyone fix this regex so that it also matches when a hyphen is inserted?
help me pls
Senthil
On character class
Your pattern contains this subpattern:
[\s\w]*
The […] is a character class. Something like [aeiou] matches one of any of the lowercase vowels. [^…] is a negated character class. [^aeiou] matches one of anything but the lowercase vowels.
\s is the shorthand for whitespace character class; \w for word character class. Neither contains the hyphen.
The * is the zero-or-more repetition specifier.
Now you should understand why this pattern does not match a hyphen: it matches zero or more characters, each of which is either a whitespace or a word character. If you want to match a hyphen, then you can include it in the character class.
[\s\w-]*
If you also want to include the period, question mark, and exclamation mark, for example, then you can simply add them in as well:
[\s\w.!?-]*
Special note on hyphen
BE CAUTIOUS when including the hyphen in a character class. It is used as a regex metacharacter in character class definition to define character range. For example,
[a-z]
matches any one character in the range between 'a' and 'z', inclusive. By contrast,
[az-]
matches one of exactly 3 characters, 'a', 'z', and '-'. When you put - as the last element in a character class, it becomes a literal hyphen instead of range definition. You can also put it as the first element, or escape it (by preceding with backslash, which is the way you escape all other regex metacharacters too).
That is, the following 3 character classes are equivalent:
[az-] [-az] [a\-z]
Related questions
Regex: why doesn't [01-12] range work as expected?
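The points about the hyphen can be verified with a few lines. This is a sketch in Python, not Perl; the character-class rules shown here are the same in both. The variable names are made up for the example.

```python
import re

text = "a-z b m"

# Hyphen last, first, or escaped: always the three literals a, z and -.
literal_hyphen = [re.findall(cls, text) for cls in (r"[az-]", r"[-az]", r"[a\-z]")]
print(literal_hyphen)            # three times ['a', '-', 'z']

# Hyphen between two characters: a range from a to z.
print(re.findall(r"[a-z]", text))    # ['a', 'z', 'b', 'm']

tag = "<AT>this is an at-command</AT>"
without_hyphen = re.search(r"<AT>([\s\w]*)</AT>", tag, re.IGNORECASE)
with_hyphen = re.search(r"<AT>([\s\w-]*)</AT>", tag, re.IGNORECASE)
print(without_hyphen)            # None
print(with_hyphen.group(1))      # this is an at-command
```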
You can just add a hyphen in the char class as:
if ($var =~ /<AT>([\s\w-]*)<\/AT>/i)
Also since your regex has a / in it you can use a different delimiter, this way you can avoid escaping /:
if ($var =~m{<AT>([\s\w-]*)</AT>}i)
Use \S instead of \w.
if ($var =~ /<AT>([\s\S]*)<\/AT>/i) {
If you want to capture everything between <AT> and </AT>, you can use
if ($var =~ /<AT>((?:(?!<AT>).)*)<\/AT>/i)
The negative lookahead keeps the match from running across another opening <AT> tag.
You need to add more characters to your class like [\s\w-]* (as codaddict told you).
Moreover, you could use a lookahead to match the end of your command ("I want to match that only if it is followed by the ending statement"), like:
if ($var =~ /<AT>([^<]*)(?=<\/AT>)/i)
[^<] stands for "any character (including the hyphen) except <".
You could even add a lookbehind:
if ($var =~ /(?<=<AT>)([^<]*)(?=<\/AT>)/i)
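Here is a small sketch of the [^<]* idea with and without look-arounds, written in Python for illustration (the original code is Perl). With look-arounds on both sides the tags are only asserted, so the whole match is already the tag content and no capture group is needed.

```python
import re

text = "<AT>this is an at-command</AT>"

# Tags consumed by the match; the content is in group 1.
captured = re.search(r"<AT>([^<]*)</AT>", text, re.IGNORECASE)

# Tags only asserted; the match itself is the content.
asserted = re.search(r"(?<=<AT>)[^<]*(?=</AT>)", text, re.IGNORECASE)

print(captured.group(0))   # <AT>this is an at-command</AT>
print(captured.group(1))   # this is an at-command
print(asserted.group(0))   # this is an at-command
```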
For more complex things (since you seem to want a little parser), you should look at grammar theory and at lex/yacc.