PostgreSQL check constraint not working - regex

I am trying to get the constraint below to work in PostgreSQL. It checks the column wcode against a pattern and should throw an error if the pattern doesn't match.
CONSTRAINT wcoding CHECK (wcode::text ~ '[\w]{4,4}-[\w]{2,2}-[\w]{1,1}'::text);
A genuine input string is "AA14-AM-1", which works. The problem is that if I enter "AA14-AM-14" or "AA14-AM-1444", it doesn't throw an error. I want to restrict input to exactly this pattern ("AA14-AM-1").

You have an "unanchored" regex, which means the pattern only has to occur somewhere inside the input string. To match the whole input string against the pattern, you need an "anchored" regex:
CONSTRAINT wcoding CHECK (wcode::text ~ '^[\w]{4,4}-[\w]{2,2}-[\w]{1,1}$');
The ^ and $ "anchor" the pattern at the start and end, so the input string must match the pattern exactly (the pattern is not permitted as a substring of a longer input value).
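The difference is easy to reproduce outside the database. The sketch below uses Python's re module as a stand-in for PostgreSQL's ~ operator (an assumption for illustration only; the two regex engines differ in other details). Like ~, re.search accepts a match anywhere in the string, so only the anchors force a whole-string match.

```python
import re

# Pattern from the question, without and with anchors.
UNANCHORED = re.compile(r"[\w]{4,4}-[\w]{2,2}-[\w]{1,1}")
ANCHORED = re.compile(r"^[\w]{4,4}-[\w]{2,2}-[\w]{1,1}$")

def accepts(pattern, value):
    """Return True if the pattern is found anywhere in value."""
    return pattern.search(value) is not None

# Print how each sample fares under both patterns.
for value in ("AA14-AM-1", "AA14-AM-14", "AA14-AM-1444"):
    print(value, accepts(UNANCHORED, value), accepts(ANCHORED, value))
```

The unanchored pattern accepts all three samples, while the anchored one accepts only "AA14-AM-1".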

@a_horse clarifies the role of ^ and $. But you can simplify overall:
ALTER TABLE tbl ADD CONSTRAINT wcoding  -- tbl is a placeholder for your table name
CHECK (wcode::text ~ '^\w{4}-\w\w-\w$');
You don't need a character class for class shorthands like \w.
And why is there a cast to text? Might be redundant.

Malav, in PostgreSQL, this compact regex does what you want:
First Method
[\w]{4}-[\w]{2}-\w(?!\w)
Second Method
[\w]{4}-[\w]{2}-\w\y
Please note that instead of {4,4} you can write {4} to mean "exactly four times".
How does this work?
After the last word character, we check that there is no other word character. For this, in the first method, we use a negative lookahead (?!\w).
In the second method, we use a word boundary \y. (In most regex flavors I would add a word boundary \b at the end, but in PostgreSQL it is \y.)
This is why in the first version I used a negative lookahead instead (more portable). Use whichever version you like.
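Both end-of-pattern guards can be tried in Python as a rough check. This is a sketch under the assumption that Python's re is an acceptable stand-in: it has no \y, so the word boundary is written \b, and the negative lookahead is written (?!\w).

```python
import re

# Negative lookahead: the final \w must not be followed by another word character.
LOOKAHEAD = re.compile(r"[\w]{4}-[\w]{2}-\w(?!\w)")
# Word boundary: Python spells it \b where PostgreSQL uses \y.
BOUNDARY = re.compile(r"[\w]{4}-[\w]{2}-\w\b")

def found(pattern, value):
    """Return True if the pattern matches somewhere in value."""
    return pattern.search(value) is not None

for value in ("AA14-AM-1", "AA14-AM-14", "XAA14-AM-1"):
    print(value, found(LOOKAHEAD, value), found(BOUNDARY, value))
```

Both variants reject "AA14-AM-14", but neither guards the start of the string, so "XAA14-AM-1" is still accepted. A leading ^ would be needed to rule that out.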

Related

Match a specific regex using matches()

Trying to match a specific word using matches()
*//id[matches(.,lower-case('*\s?Xander\s?*'))]
Examples:
Set of Xanderous- No match
Xander Tray of 6- Match
Tray of 6 pieces Xander- Match
Set of 6 Xander pieces- Match
Any instance of the exact word 'Xander' match is the objective.
The reason the XPath regex dialect doesn't handle word boundaries is that to do it properly, you need to be language-sensitive - a "word" is a cultural artefact.
You could do tokenize(., '\P{L}+') = 'Xander' which tokenizes treating any sequence of non-letters as a separator and then tests if one of the tokens is 'Xander'.
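The tokenizing idea can be sketched in Python to check it against the examples from the question. Two assumptions apply: the helper name contains_word is made up for illustration, and the character class [^A-Za-z] is only an ASCII approximation of \P{L}, which covers all Unicode letters in XPath.

```python
import re

def contains_word(text, word):
    """True if word is one of the tokens; tokens are maximal runs of ASCII letters."""
    tokens = re.split(r"[^A-Za-z]+", text)
    return word in tokens

for sample in ("Set of Xanderous", "Xander Tray of 6",
               "Tray of 6 pieces Xander", "Set of 6 Xander pieces"):
    print(sample, contains_word(sample, "Xander"))
```

Only "Set of Xanderous" is rejected, because "Xanderous" is a single token that is not equal to "Xander".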
I have been running some tests and it seems word boundaries are not integrated into the XML/XPATH vocabulary. So the next best thing IMO is to test for a whitespace or start/end string anchors surrounding zero or more characters. Therefore, I ended up with:
*//id[matches(lower-case(.),'.*(^|\s)xander($|\s).*')]
Even better would be to drop lower-case altogether and use the third matches parameter (flags), setting it to case-insensitive matching:
*//id[matches(.,'.*(^|\s)xander($|\s).*','i')]
Roughly, if you want to match the full line when it contains the exact word Xander, you can use \b, which marks a word boundary, plus a greedy .* on either side:
^.*\bXander\b.*$
Demo: https://regex101.com/r/PvKptN/1
Or if you don't need the whole line, you can simply check if it contains Xander:
\bXander\b
Demo: https://regex101.com/r/PvKptN/2
I hope it satisfies the regex flavor you're using
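For flavors that do support \b, the two approaches in this thread are not quite equivalent. The sketch below compares them in Python (chosen here only for illustration; the question itself is about XPath): the whitespace-delimited pattern with a case-insensitive flag, and the \b pattern as written above.

```python
import re

# Whitespace or string start/end on both sides, case-insensitive.
WHITESPACE_DELIMITED = re.compile(r"(^|\s)xander($|\s)", re.IGNORECASE)
# Word boundaries on both sides, case-sensitive as in the answer.
WORD_BOUNDARY = re.compile(r"\bXander\b")

def hit(pattern, text):
    """Return True if the pattern is found in text."""
    return pattern.search(text) is not None

for text in ("Set of Xanderous", "Xander Tray of 6", "Xander, tray of 6"):
    print(text, hit(WHITESPACE_DELIMITED, text), hit(WORD_BOUNDARY, text))
```

The two agree on the examples from the question, but punctuation separates them: "Xander, tray of 6" matches with \b and not with the whitespace test.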

Allowing words picked up in regex in certain cases only

I have a regex to catch people just sticking "N/A" or similar into a form field.
^(?!(\b(N/A|NA|n/a|na|Yes|yes|YES|No|no|NO)\b))
Probably not the most elegant I am sure. However I cannot for the life of me get it to allow the above words if followed by something.
So if someone just types "yes" then I want it to fail the regex check. But if someone types "yes, I have blah blah etc etc" I want it to pass.
The expression I have allows the word to be used as long as it isn't the first word in the sentence. I just want to disallow the listed words as the ONLY words in the field.
Any ideas?
Thanks
You may remove the first \b (it is redundant between the start of string and a word char) and replace the second one with $ (end of string):
^(?!(?:N/A|NA|n/a|na|Yes|yes|YES|No|no|NO)$)
See the regex demo
With a case insensitive option, you may reduce the pattern to
^(?!(?:n/?a|yes|no)$)
See another regex demo
Details
^ - start of string, then...
(?!(?:n/?a|yes|no)$) - a location in string that is not immediately followed with n/?a (na, n/a), yes or no that are followed with the end of string.
In human words, only the start of string is matched if the whole string is not equal to the alternatives inside the alternation group.
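A quick way to check the lookahead version is to run it over a few sample field values. This is a sketch in Python (the question does not say which regex flavor the form uses, so that is an assumption), and is_acceptable is a hypothetical helper name.

```python
import re

# Succeeds at the start of the string unless the whole string is n/a, na, yes or no.
NOT_ONLY_FILLER = re.compile(r"^(?!(?:n/?a|yes|no)$)", re.IGNORECASE)

def is_acceptable(value):
    """True if the field holds something other than a bare filler word."""
    return NOT_ONLY_FILLER.match(value) is not None

for value in ("yes", "N/A", "No", "yes, I have blah blah", "nah"):
    print(repr(value), is_acceptable(value))
```

Note that an empty string also passes this check, so a separate "required" rule is still needed if empty fields should be rejected.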
The easiest way would be to match all the forbidden strings exactly and invert the result.
Try ^(n/?a|yes|no)$ with a case-insensitive option and invert the result.
^ matches the beginning of the string. $ matches the end of the string.
When you don't have a case-insensitive option, use ^([nN]/?[aA]|[yY][eE][sS]|[nN][oO])$.
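The match-and-invert approach might look like this in Python, used here as an assumed host language. The sketch also checks that the flag-free character-class variant gives the same verdicts as the case-insensitive one on a few samples.

```python
import re

# Matches only when the whole string is one of the forbidden filler words.
FILLER_ONLY = re.compile(r"^(n/?a|yes|no)$", re.IGNORECASE)
# Same idea without a case-insensitive option.
FILLER_ONLY_NO_FLAG = re.compile(r"^([nN]/?[aA]|[yY][eE][sS]|[nN][oO])$")

def is_acceptable(value, pattern=FILLER_ONLY):
    """Invert the match: acceptable means the forbidden pattern did NOT match."""
    return pattern.search(value) is None

samples = ("yes", "YES", "n/A", "NA", "no", "yes please", "banana")
for value in samples:
    print(value, is_acceptable(value), is_acceptable(value, FILLER_ONLY_NO_FLAG))
```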

Exclude strings of pattern "abba"

For example, I want to exclude 'fitting', 'hollow', 'trillion'
but not 'hello' or 'pattern'
I already got the following to work
(.)(.)\2\1
which matches 'hollow' or 'fitting', but I have trouble negating this.
the closest thing I get is
^.(?!(.)(.)\2\1)
which excludes 'fitting' and 'hollow' but not 'trillion'
It's a little different from what you have. Your current regex only checks for the palindromic pattern starting at the second character. Since you want to check the whole string, you need to change it a little to:
^(?!.*(.)(.)\2\1)
The first anchor will ensure that the check is made only at the beginning (otherwise, the regex can claim a match at the end of the string).
Then the .* within the negative lookahead will enable the check to be done anywhere within the string. If there's any match, fail the entire match.
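The behaviour of both patterns can be checked against the words from the question. The sketch below uses Python, which is an assumption since the question does not name a flavor; backreferences inside a lookahead work the same way in most backtracking engines.

```python
import re

# Attempt from the question: only looks for abba right after the first character.
ATTEMPT = re.compile(r"^.(?!(.)(.)\2\1)")
# Fixed version: .* lets the lookahead find abba anywhere in the string.
NO_ABBA = re.compile(r"^(?!.*(.)(.)\2\1)")

def passes(pattern, word):
    """True if the word is NOT excluded by the pattern."""
    return pattern.match(word) is not None

for word in ("fitting", "hollow", "trillion", "hello", "pattern"):
    print(word, passes(ATTEMPT, word), passes(NO_ABBA, word))
```

With the fixed pattern, 'fitting', 'hollow' and 'trillion' are all excluded, while 'hello' and 'pattern' pass. The original attempt lets 'trillion' through because its abba run ("illi") starts at the third character.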
It doesn't exclude trillion because you put ^. in front, which means exactly one character must come before the abba pattern. In your first two cases that character is h and f. If you change the regex to ^..(?!(.)(.)\2\1), it will work for trillion.
So in general the regex will be:
(?!.*(.)(.)\2\1)
   ^^ any number of characters (other than \n)

Need help with Regular Expression to Match Blood Group

I'm trying to come up with a regex that helps me validate a Blood Group field - which should accept only A[+-], B[+-], AB[+-] and O[+-].
Here's the regex I came up with (and tested using Regex Tester):
[A|B|AB|O][\+|\-]
Now this pattern successfully matches A,B,O[+-] but fails against AB[+-].
Can anyone please suggest a regex that'll serve my purpose?
Thanks,
m^e
Try:
(A|B|AB|O)[+-]
Using square brackets defines a character class, which can only be a single character. The parentheses create a grouping which allows it to do what you want. You also don't need to escape the +- in the character class, as they don't have their regexy meaning inside of it.
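To see what the character-class version really accepts, it helps to test it against whole strings. This is a sketch in Python (an assumption; the question used an online regex tester), using fullmatch so that only complete matches count.

```python
import re

# Pattern from the question: both bracket pairs are character classes,
# so | is just another literal character inside them.
CHAR_CLASS = re.compile(r"[A|B|AB|O][\+|\-]")
# Corrected pattern: parentheses group the alternatives.
GROUP = re.compile(r"(A|B|AB|O)[+-]")

def whole(pattern, value):
    """True if the pattern matches the entire value."""
    return pattern.fullmatch(value) is not None

for value in ("A+", "O-", "AB+", "|-", "A|"):
    print(value, whole(CHAR_CLASS, value), whole(GROUP, value))
```

The character-class version rejects "AB+" and accepts the nonsense inputs "|-" and "A|", while the grouped version does the opposite.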
As you mentioned in the comments, if it is a string you want to match against that has the exact values you are looking for, you might want to do this:
^(A|B|AB|O)[+-]$
Without the start of string and end of string anchors, things like "helloAB+asdads" would match.
The brackets [] denote a character class, meaning "any of the characters herein". You want the parentheses () for grouping:
(A|B|AB|O)(\+|-)
When you are building an alternation (e.g. (A|B|AB|O)), you should be careful with the ordering of the elements. Many regex engines will stop at the first alternate that matches (rather than the longest). If it weren't for the [-+] forcing a backtrack, (A|B|AB|O)[-+] would not work for "AB+". It is probably better to say (AB|A|B|O)[-+] (but you should check the docs for your regex engine).
Also, if you do not intend to capture the antigen for later use, you should use the non-capturing grouping parentheses: (?:AB|A|B|O)[-+].
Furthermore, if you want to ensure that the only thing in the string is a blood type, then you need anchors to prevent it from matching only part of the string: ^(?:AB|A|B|O)[-+]$. A quick note on anchors: depending on your regex engine, ^ may match the beginning of a line rather than the beginning of the string if you pass it a multiline-match option. Similarly, $ may match the end of a line rather than the end of the string. For this reason there are three other anchors in common (but not 100%) usage: \A, \Z, and \z. If your regex engine supports them, \A always matches the start of the string, \Z matches the end of the string or a newline just before the end of the string, and \z matches only the end of the string.
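The points about alternation order and anchors can be illustrated in Python, taken here as one example engine. One engine-specific caveat: in Python's re module, \Z matches only at the very end of the string (the behaviour described above for \z), and $ without the multiline flag also matches just before a trailing newline.

```python
import re

# Alternation order: the first alternative that matches wins, not the longest.
print(re.match(r"(A|B|AB|O)", "AB+").group())   # finds "A"
print(re.match(r"(AB|A|B|O)", "AB+").group())   # finds "AB"

DOLLAR = re.compile(r"^(?:AB|A|B|O)[-+]$")
STRICT = re.compile(r"\A(?:AB|A|B|O)[-+]\Z")
MULTILINE = re.compile(r"^(?:AB|A|B|O)[-+]$", re.MULTILINE)

def ok(pattern, value):
    """True if the pattern is found in value."""
    return pattern.search(value) is not None

# A trailing newline slips past $ but not past \Z.
print(ok(DOLLAR, "AB+\n"), ok(STRICT, "AB+\n"))
# With the multiline flag, ^ and $ match around any line.
print(ok(MULTILINE, "junk\nO-\njunk"), ok(STRICT, "junk\nO-\njunk"))
```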
For case insensitive within html pattern attribute you may try this
([AaBbOo]|[Aa][Bb])[\+-]
<input type="text" maxlength="3" pattern="([AaBbOo]|[Aa][Bb])[\+-]" required />
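^(A|B|AB|O)[+-]?$
This will produce the correct output. Note that the trailing ? makes the sign optional, so a bare A, B, AB or O also passes; drop the ? if the sign is required.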

Need a simple RegEx to find a number in a single word

I've got the following URL route and I want to make sure that a segment of the route will only accept numbers. To do that, I can provide a regex that checks the segment.
/page/{currentPage}
So, can someone give me a regex that matches when the word is a number (any int) greater than 0 (i.e. 1 to int.max)?
/^[1-9][0-9]*$/
Problems with other answers:
/([1-9][0-9]*)/ // Will match -1 and foo1bar
#[1-9]+# // Will not match 10, same problems as the first
[1-9] // Will only match one digit, same problems as first
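These claims are easy to verify. The sketch below uses Python as an assumed test bed, with the delimiters (/.../ and #...#) stripped because Python takes the pattern as a plain string.

```python
import re

ANCHORED = re.compile(r"^[1-9][0-9]*$")
UNANCHORED = re.compile(r"([1-9][0-9]*)")
NO_ZERO = re.compile(r"[1-9]+")

def found(pattern, value):
    """True if the pattern is found somewhere in value."""
    return pattern.search(value) is not None

for value in ("1", "10", "0", "03", "-1", "foo1bar"):
    print(value, found(ANCHORED, value), found(UNANCHORED, value))

# [1-9]+ stops at the first zero, so it only picks up "1" out of "10".
print(NO_ZERO.search("10").group())
```

Only the anchored pattern rejects "-1", "foo1bar", "0" and "03" while accepting "1" and "10".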
If you want it greater than 0, use this regex:
/([1-9][0-9]*)/
This'll work as long as the number doesn't have leading zeros (like '03').
However, I recommend just using a simple [0-9]+ regex, and validating the number in your actual site code.
This one would address your specific problem. This expression
/\/page\/(0*[1-9][0-9]*)/ or "Perl-compatible" /\/page\/(0*[1-9]\d*)/
should capture any non-zero number, even 0-filled. And because it doesn't even look for a sign, - after the slash will not fit the pattern.
The problem that I have with eyelidlessness' expression is that, likely you do not already have the number isolated so that ^ and $ would work. You're going to have to do some work to isolate it. But a general solution would not be to assume that the number is all that a string contains, as below.
/(^|[^0-9-])(0*[1-9][0-9]*)([^0-9]|$)/
And the two tail-end groups, you could replace with word boundary marks (\b), if the RE language had those. Failing that you would put them into non-capturing groups, if the language had them, or even lookarounds if it had those--but it would more likely have word boundaries before lookarounds.
Full Perl-compatible version:
/(?<![\d-])(0*[1-9]\d*)\b/
I chose a negative lookbehind instead of a word boundary, because '-' is not a word-character, and so -1 will have a "word boundary" between the '-' and the '1'. And a negative lookbehind will match the beginning of the string--there just can't be a digit character or '-' in front.
You could say that the zero-width assertion ^ is just one of the cases that satisfies the zero-width assertion (?<![\d-]).
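The Perl-compatible pattern also runs unchanged under Python's re module, which makes it easy to try on a few inputs. The helper name first_positive_number is hypothetical and used only for this sketch.

```python
import re

# Not preceded by a digit or '-', optional leading zeros, at least one
# non-zero digit, and a word boundary after the last digit.
NUMBER = re.compile(r"(?<![\d-])(0*[1-9]\d*)\b")

def first_positive_number(text):
    """Return the first unsigned, non-zero number found in text, or None."""
    match = NUMBER.search(text)
    return match.group(1) if match else None

for text in ("/page/100", "/page/007", "/page/0", "/page/-5", "-1", "abc 42 def"):
    print(text, first_positive_number(text))
```

The lookbehind also succeeds at the very start of the string, which is why a bare "100" would match without any ^ anchor.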
string testString = @"/page/100";
string pageNumber = Regex.Match(testString, "/page/([1-9][0-9]*)").Groups[1].Value;
If not matched pageNumber will be ""
While Jeremy's regex isn't perfect (should be tested in context, against leading characters and such), his advice is good: go for a generic, simple regex (eg. if you must use it in Apache's mod_rewrite) but by any means, handle the final redirect in server's code (if you can) and do a real check of parameter's validity there.
Otherwise, I would improve Jeremy's expression with bounds: /\b([1-9][0-9]*)$/
Of course, a regex cannot provide a check against any max int, at best you can control the number of digits: /\b([1-9][0-9]{0,2})$/ for example.
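The advice to keep the regex simple and do the real check in code could look like the sketch below. It is written in Python rather than the asker's framework, parse_page is a hypothetical helper, and INT_MAX assumes a 32-bit signed integer (2147483647, the value of C#'s int.MaxValue).

```python
import re

INT_MAX = 2**31 - 1  # assumption: 32-bit signed int upper bound

# Deliberately simple: digits only. The range check happens in code.
DIGITS = re.compile(r"[0-9]+")

def parse_page(segment):
    """Return the page number as an int, or None if the segment is invalid."""
    if DIGITS.fullmatch(segment) is None:
        return None
    page = int(segment)
    return page if 1 <= page <= INT_MAX else None

for segment in ("1", "03", "0", "-1", "abc", "2147483647", "2147483648"):
    print(segment, parse_page(segment))
```

With this split, leading zeros such as "03" are accepted and parsed as 3, which is a design choice; rejecting them would need the stricter [1-9][0-9]* pattern.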
This will match any string such that, if it contains /page/, it must be followed by a number, not consisting of only zeros.
^(?!.*?/page/([0-9]*[^0-9/]|0*/))
(?! ) is a negative look-ahead. It will match an empty string, only if its contained pattern does not match from the current position.