Allowing words picked up in regex in certain cases only - regex

I have a regex expression to look for people just sticking "N/A" or similar into a form field.
^(?!(\b(N/A|NA|n/a|na|Yes|yes|YES|No|no|NO)\b))
Probably not the most elegant I am sure. However I cannot for the life of me get it to allow the above words if followed by something.
So if someone just types "yes" then I want it to fail the regex check. But if someone types "yes, I have blah blah etc etc" I want it to pass.
The expression I have allows the word to be used as long as it isn't the first word in the sentence. I just want to disallow the listed words as the ONLY words in the field.
Any ideas?
Thanks

You may remove the first \b (it is redundant between the start of string and a word char) and replace the second one with $ (end of string):
^(?!(?:N/A|NA|n/a|na|Yes|yes|YES|No|no|NO)$)
See the regex demo
With a case insensitive option, you may reduce the pattern to
^(?!(?:n/?a|yes|no)$)
See another regex demo
Details
^ - start of string, then...
(?!(?:n/?a|yes|no)$) - a negative lookahead that fails the match if the current position is immediately followed by n/?a (na or n/a), yes, or no and then the end of the string.
In plain words, only the start of the string is matched, and only if the whole string is not equal to one of the alternatives inside the group.
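As a rough illustration of how the lookahead behaves, here is a minimal Python sketch. The function name is_acceptable is made up for this example, and Python's re module is assumed as the engine; other flavors may treat $ slightly differently around trailing newlines.

```python
import re

# Case-insensitive version of the pattern from the answer above.
PLACEHOLDER_ONLY = re.compile(r"^(?!(?:n/?a|yes|no)$)", re.IGNORECASE)


def is_acceptable(value: str) -> bool:
    """Return True unless the whole value is just n/a, na, yes, or no.

    is_acceptable is a hypothetical helper name, not part of any library.
    """
    # The pattern only asserts; on success it yields an empty match at position 0.
    return PLACEHOLDER_ONLY.match(value) is not None


for sample in ("yes", "N/A", "na", "yes, I have blah blah", "Nobody"):
    print(repr(sample), is_acceptable(sample))
```

Note that surrounding whitespace is not handled: a value such as "yes " would pass, so stripping the input first may be worth considering.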

The easiest way would be to match all the forbidden strings exactly and invert the result.
Try ^(n/?a|yes|no)$ with a case-insensitive option and invert the result.
^ matches the beginning of the string. $ matches the end of the string.
When you don't have a case-insensitive option, use ^([nN]/?[aA]|[yY][eE][sS]|[nN][oO])$.
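The "match and invert" idea could look roughly like this in Python. This is only a sketch: passes_check is a hypothetical helper name, and the second pattern is included just to show that the explicit character classes give the same result without the case-insensitive flag.

```python
import re

# Matches only when the entire string is one of the forbidden words.
FORBIDDEN = re.compile(r"^(n/?a|yes|no)$", re.IGNORECASE)

# Same idea without a case-insensitive option.
FORBIDDEN_NO_FLAG = re.compile(r"^([nN]/?[aA]|[yY][eE][sS]|[nN][oO])$")


def passes_check(value: str) -> bool:
    """Invert the result: the value passes when the forbidden pattern does NOT match."""
    return FORBIDDEN.search(value) is None


for sample in ("Yes", "n/a", "no", "no, thanks", "yesterday"):
    # Both patterns should agree on every sample.
    agree = (FORBIDDEN.search(sample) is None) == (FORBIDDEN_NO_FLAG.search(sample) is None)
    print(repr(sample), passes_check(sample), agree)
```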

Related

Regex: ignore characters that follow

I'd like to know how I can ignore characters that follow a particular pattern in a regex.
I tried positive lookaheads, but they do not work because they preserve those characters for other matches, while I want them to be just... discarded.
For example, a part of my regex is: (?<DoubleQ>\"\".*?\"\")|(?<SingleQ>\".*?\")
in order to match some "key-parts" of this string:
This is a ""sample text"" just for "testing purposes": not to be used anywhere else.
I want to capture the entire ""sample text"", but then I want to "extract" only sample text and the same with testing purposes. That is, I want the group to match to be ""sample text"", but then I want the full match to be sample text. I partially achieved that with the use of the \K option:
(?<DoubleQ>\"\"\K.*?\"\")|(?<SingleQ>\"\K.*?\")
Which ignores the first "" (or ") from the full match but takes it into account when matching the group. How can I ignore the following "" (")?
Note: positive lookahead does not work: it does not ignore characters from the following matches, it just does not include them in the current match.
Thanks a lot.
I hope I understood your question correctly: you want to match the whole string including the quotes, but replace or extract only the expression without the quotes, right?
You typically can use the regex replace functionality to extract just a part of the match.
This is the regex expression:
""?(.*?)""?
And this the replace expression:
$1
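To make the capture-group idea concrete, here is a small Python sketch run against the sample sentence from the question. Python is an assumption here (the question does not name its regex flavor), and in Python the replacement reference is written \1 rather than $1.

```python
import re

text = 'This is a ""sample text"" just for "testing purposes": not to be used anywhere else.'

# One or two opening quotes, a lazily captured body, one or two closing quotes.
QUOTED = re.compile(r'""?(.*?)""?')

# group(0) is the full match with quotes, group(1) is the text between them.
full_matches = [m.group(0) for m in QUOTED.finditer(text)]
inner_texts = [m.group(1) for m in QUOTED.finditer(text)]

# Replacing each match with its first group strips the quotes in place.
stripped = QUOTED.sub(r"\1", text)

print(full_matches)
print(inner_texts)
print(stripped)
```

The full match still contains the quotes, so this does not make the overall match equal to the inner text; it only makes the inner text available as a group.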

Extend a regex with logical AND in a non-capturing group

I want to extend an existing regex string:
((?:street)|(?:addr)|(?:straße)|(?:strasse)|(?:adr))
It basically matches strings like street or address.
So now I want to add that if the strings 'addressAdd' or 'streetnr' exist, it doesn't match anything anymore (not even street).
I tried
((?:street)|(?:addr)|(?:straße)|(?:strasse)|(?:adr))(^(?:addressAdd))(^(?:streetnr))
and several variations thereof, but didn't succeed. Does anyone know how to negate strings?
Update: Some clarification: If a string like addressAdd exists, I don't want any string to match. The Java code for this would look like this:
String toCheck = "some string to match";
if ((!toCheck.equals("streetnr") && !toCheck.equals("addressAdd")) && (toCheck.equals("street") || toCheck.equals("strasse") || toCheck.equals("adr"))) {
    // toCheck counts as a match
}
I'd rather remove unnecessary grouping constructs and add a negative lookahead with these 2 exceptions:
(?!addressAdd|streetnr)(?:street|addr|straße|strasse|adr)
See the regex demo
To match whole words:
\b(?!(?:addressAdd|streetnr)\b)(?:street|addr|straße|strasse|adr)\b
See another demo
In short: (?!addressAdd|streetnr) checks that neither addressAdd nor streetnr starts at the current position, and only then can the regex engine go on to match one of the alternatives listed in the (?:street|addr|straße|strasse|adr) non-capturing group. With word boundaries (\b(?!(?:addressAdd|streetnr)\b)), only those exceptions that are whole words are rejected (so the lookahead does not reject streetnrs, for example).
Answer to the update:
To match strings (or lines, if the DOTALL option is not used) that contain specific substrings and do not contain the disallowed whole words, put the negative lookahead at the beginning of the pattern, right after ^:
^(?!.*\b(?:addressAdd|streetnr)\b).*(?:street|addr|straße|strasse|adr).*
See another regex demo
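The lookahead patterns above can be tried out with a short Python sketch. The question uses Java, so Python's re module is only a stand-in here; the constant names SIMPLE and LINE_LEVEL are invented for the example.

```python
import re

# Lookahead with two exceptions, then one of the allowed alternatives.
SIMPLE = re.compile(r"(?!addressAdd|streetnr)(?:street|addr|straße|strasse|adr)")

# Whole-line variant: reject the line if a forbidden whole word appears anywhere.
LINE_LEVEL = re.compile(
    r"^(?!.*\b(?:addressAdd|streetnr)\b).*(?:street|addr|straße|strasse|adr).*"
)

for candidate in ("street", "address", "streetnr", "addressAdd"):
    print(candidate, SIMPLE.search(candidate) is not None)

for line in ("my street", "streetnr street", "streetnrs street", "nothing here"):
    print(repr(line), LINE_LEVEL.match(line) is not None)
```

In the second loop, "streetnrs street" still matches because streetnrs is not the whole word streetnr, while "streetnr street" is rejected outright.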

PostgreSQL check constraint not working

I am trying to get the constraint below to work in PostgreSQL; it checks the column wcode against the pattern. If the pattern doesn't match, it should throw an error.
CONSTRAINT wcoding CHECK (wcode::text ~ '[\w]{4,4}-[\w]{2,2}-[\w]{1,1}'::text);
A genuine input string is "AA14-AM-1", which works. But the problem is that if I enter "AA14-AM-14" or "AA14-AM-1444", it doesn't throw an error. I want to restrict input to this ("AA14-AM-1") pattern.
You have an "unbounded" regex (not sure if that is the correct technical term), which essentially means the pattern only has to occur somewhere inside the input string. To match the input string against the exact pattern, you need an "anchored" regex:
CONSTRAINT wcoding CHECK (wcode::text ~ '^[\w]{4,4}-[\w]{2,2}-[\w]{1,1}$');
The ^ and $ "anchor" the pattern at start and ending which results in the fact that the input string must match the pattern exactly (not permitting the pattern as a sub-string of a longer input value).
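The difference between the two forms can be seen in a few lines of Python. This is only an illustration: Python's re.search is assumed here as a stand-in for the "match anywhere" behavior of the PostgreSQL ~ operator, and the two engines are different regex flavors.

```python
import re

# Unanchored: succeeds if the pattern occurs anywhere in the input.
UNANCHORED = re.compile(r"\w{4}-\w{2}-\w")

# Anchored: the pattern has to cover the input from start to end.
ANCHORED = re.compile(r"^\w{4}-\w{2}-\w$")

for code in ("AA14-AM-1", "AA14-AM-14", "AA14-AM-1444"):
    print(code, bool(UNANCHORED.search(code)), bool(ANCHORED.search(code)))
```

All three inputs satisfy the unanchored pattern, but only "AA14-AM-1" satisfies the anchored one.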
@a_horse clarifies the role of ^ and $. But you can simplify overall:
ALTER TABLE tbl ADD CONSTRAINT wcoding  -- tbl is a placeholder for your table name
CHECK (wcode::text ~ '^\w{4}-\w\w-\w$');
You don't need a character class for class shorthands like \w.
And why is there a cast to text? Might be redundant.
SQL Fiddle.
Malav, in PostgreSQL, this compact regex does what you want:
First Method
[\w]{4}-[\w]{2}-\w(?!\w)
Second Method
[\w]{4}-[\w]{2}-\w\y
Please note that instead of {4,4} you can write {4} to mean "exactly four times".
How does this work?
After the last word character, we check that there is no other word character. For this, in the first method, we use a negative lookahead (?!\w).
In the second method, we use a word boundary \y (in most regex flavors I would add a word boundary \b at the end, but in PostgreSQL it is \y).
This is why in the first version I used a negative lookahead instead (more portable). Use whichever version you like.
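For comparison, here is how the two trailing checks behave in Python, where the word boundary is written \b rather than \y. This is a sketch in a different regex flavor than PostgreSQL, so it only illustrates the idea of "no further word character after the last one".

```python
import re

# Negative lookahead: the final word character must not be followed by another one.
LOOKAHEAD = re.compile(r"\w{4}-\w{2}-\w(?!\w)")

# Word boundary: Python spells it \b (PostgreSQL uses \y).
BOUNDARY = re.compile(r"\w{4}-\w{2}-\w\b")

for code in ("AA14-AM-1", "AA14-AM-14", "XAA14-AM-1", "AA14-AM-1-extra"):
    print(code, bool(LOOKAHEAD.search(code)), bool(BOUNDARY.search(code)))
```

Both variants reject "AA14-AM-14", but neither anchors the start of the string or forbids a trailing non-word character, so inputs like "XAA14-AM-1" and "AA14-AM-1-extra" still match. Anchoring with ^ and $ is the stricter option.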

Exclude strings of pattern "abba"

For example, I want to exclude 'fitting', 'hollow', 'trillion'
but not 'hello' or 'pattern'
I already got the following to work
(.)(.)\2\1
which matches 'hollow' or 'fitting', but I have trouble negating this.
the closest thing I get is
^.(?!(.)(.)\2\1)
which excludes 'fitting' and 'hollow' but not 'trillion'
It's a little different from what you have. Your current regex checks for the abba pattern only at the second character. Since you want to check the whole string, you need to change it a little to:
^(?!.*(.)(.)\2\1)
The first anchor will ensure that the check is made only at the beginning (otherwise, the regex can claim a match at the end of the string).
Then the .* within the negative lookahead will enable the check to be done anywhere within the string. If there's any match, fail the entire match.
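A quick way to check this against the words from the question is a small Python sketch. The helper name is_allowed is invented for this example, and Python's re module is assumed as the engine.

```python
import re

# Anchored negative lookahead: fail if an "abba" run occurs anywhere in the string.
NO_ABBA = re.compile(r"^(?!.*(.)(.)\2\1)")

# Same lookahead without the anchor, to show why ^ matters.
UNANCHORED = re.compile(r"(?!.*(.)(.)\2\1)")


def is_allowed(word: str) -> bool:
    """Return True when the word contains no abba-style sequence (hypothetical helper)."""
    return NO_ABBA.search(word) is not None


for word in ("fitting", "hollow", "trillion", "hello", "pattern"):
    print(word, is_allowed(word))

# Without ^ the engine simply finds a later position with no abba run after it.
print(UNANCHORED.search("hollow") is not None)
```

The last line shows the unanchored version still reporting a match for "hollow", because the lookahead succeeds once the search has moved past the "ollo" run.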
It doesn't exclude trillion because your pattern starts with ^. which means exactly one character must come before the abba sequence. In your first two cases that character is h or f. If you changed this to ^..(?!(.)(.)\2\1), it would work for trillion.
So in general the regex will be:
^(?!.*(.)(.)\2\1)
    ^^ any number of characters (other than \n)

How do I exclude a word from a regular expression search?

How I can create a regular expression for the following problem:
I have a string,
name1=value1;name2=value2;.....;
Somewhere, there exists a pair,
"begin=10072011;"
I need, with regular expressions, to parse from the string all name=value; pairs, where the value is a number. However, I want to ignore the name begin
Currently I have the following regexp:
([\\w]+)=([\\d]+);
Mine selects the begin name. How can I change it to not include begin?
(?!begin)\b(\w+)=(\d+);
This uses a negative lookahead, so it will not match a name that starts with "begin". The \b is necessary so that the regex does not just skip the "b" and match "egin=...".
Note that when describing a regex you should use only a single backslash for escapes, although in some languages you will need to double the backslashes inside string literals.
This should do it:
\b(?!begin=)(\w+)=(\d+)\b
As a C++ string literal it would look like this:
"\\b(?!begin=)(\\w+)=(\\d+)\\b"
\b is a word boundary; you use it to make sure you're matching a whole word (as "word" is defined in the context of regexes; read that page carefully). For example, without the first \b the regex would correctly fail to match
begin=1234 // OK
...but then it would skip ahead one position and match:
egin=1234 // oops!
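The effect of the leading \b can be reproduced with a short Python sketch. The input string below is made up for illustration (only the begin=10072011 pair comes from the question), and Python raw strings are used so single backslashes suffice.

```python
import re

# With the leading word boundary, a match can only start at the beginning of a name.
PAIR = re.compile(r"\b(?!begin=)(\w+)=(\d+)\b")

# Without it, the engine can start one character into "begin".
NO_BOUNDARY = re.compile(r"(?!begin=)(\w+)=(\d+)")

# Hypothetical sample data in the name=value; format from the question.
data = "name1=123;begin=10072011;name2=456;label=abc;"

pairs = PAIR.findall(data)
leaky = NO_BOUNDARY.findall("begin=1234;")

print(pairs)
print(leaky)
```

Here pairs holds only the numeric name1 and name2 entries, while leaky shows the unwanted egin capture that the word boundary prevents.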
I think (?<=begin=)\d+(?=;) will be a better choice.
If you keep all the information in XML format, the work will be much easier than now.