Postgres regex issue

I need to find all records stored in Postgres that match the following regexp:
^((8|\+7)[\- ]?)?(\(?\d{3}\)?[\- ]?)?[\d\- ]{7,10}$
Something like this:
SELECT * FROM users WHERE users.phone ~ '^((8|\+7)[\- ]?)?(\(?\d{3}\)?[\- ]?)?[\d\- ]{7,10}$'
But this one fails with the error:
invalid regular expression: quantifier operand invalid
Why won't Postgres work with this regex?
Using the same one in plain Ruby works just fine.
UPDATE
The problem is only with WHERE. When I try:
SELECT '+79637434199' ~ '^((8|\+7)[\- ]?)(\(?\d{3}\)?[\- ]?)[\d\- ]{7,10}'
Postgres returns true. But when I try:
SELECT * FROM users WHERE users.phone ~ '^((8|\+7)[\- ]?)(\(?\d{3}\)?[\- ]?)[\d\- ]{7,10}'
Result: "invalid regular expression: quantifier operand invalid".

You don't need to escape - inside a character class when you put it at the first or last position, because it cannot be misread as a range there:
[\- ] → [- ]
[\d\- ] → [\d -]
As written, the upper bound 10 at the end is futile: the pattern is not anchored at the end, so any number of extra characters can still follow a match.
Add $ at the end to disallow trailing characters.
Or \D to disallow trailing digits (but require a non-digit).
Or ($|\D) to either end the string there or have a non-digit follow.
Put together:
SELECT '+79637434199' ~ '^(8|\+7)[ -]?(\(?\d{3}\)?[ -]?)[\d -]{7,10}($|\D)'
Otherwise your expression is just fine and it works for me on PostgreSQL 9.1.4. It should not make any difference whatsoever whether you use it in a WHERE clause or in a SELECT list - unless you are running into a bug with some old version (like @kgrittn commented).
If I prepend the string literal with E, I can provoke the error message that you get. This cannot explain your problem, because you stated that the expression works fine as SELECT item.
But, as Sherlock Holmes is quoted, "when you have excluded the impossible, whatever remains, however improbable, must be the truth."
Maybe you ran one test with standard_conforming_strings = on and the other one with standard_conforming_strings = off. The latter was the default before 9.1 and makes Postgres process backslash escapes in ordinary string literals. Maybe you used two different clients that have different settings for this.
Read more in the chapter String Constants with C-style Escapes in the manual.
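A rough way to see why escape processing leads to this particular error: if the backslashes are consumed before the regex engine sees the pattern, \+ arrives as a bare +, which has nothing to quantify right after the |. The sketch below is in Python rather than SQL. strip_backslashes is a hypothetical and deliberately crude stand-in for what an E'' literal (or standard_conforming_strings = off) does to the string, and Python's re reports the same kind of failure with different wording.

```python
import re

PATTERN = r'^((8|\+7)[\- ]?)?(\(?\d{3}\)?[\- ]?)?[\d\- ]{7,10}$'

def strip_backslashes(literal: str) -> str:
    # Crude simulation of escape-string processing: drop each backslash and
    # keep the character after it. Real escape handling also interprets
    # sequences such as newline or tab escapes; none occur in PATTERN.
    return re.sub(r'\\(.)', r'\1', literal)

def compiles(pattern: str) -> bool:
    # True if the regex engine accepts the pattern, False if it rejects it.
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

mangled = strip_backslashes(PATTERN)
print(mangled)             # the pattern as the regex engine would receive it
print(compiles(PATTERN))   # the intended pattern is accepted
print(compiles(mangled))   # the mangled one is rejected: "+" follows "|"
```

If the same statement behaves differently in two clients, comparing the output of SHOW standard_conforming_strings in both is a cheap first check.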

Related

What is the regular expression for all pages except "/"?

I am using NextAuth for Next.js for session management. In addition, I am using the middleware.js to protect my routes from unauthenticated users.
According to https://nextjs.org/docs/advanced-features/middleware#matcher,
if we want to exclude a path, we do something like
export const config = {
matcher: [
/*
* Match all request paths except for the ones starting with:
* - api (API routes)
* - static (static files)
* - favicon.ico (favicon file)
*/
'/((?!api|static|favicon.ico).*)',
],
}
In this example, we exclude /api, /static, and /favicon.ico. However, I want to match all paths except the home page, "/". What is the regular expression for that? I tried '/(*)', but it doesn't seem to work.

The regular expression which matches everything but a specific one-character string / is constructed as follows:
we need to match the empty string: empty regex.
we need to match all strings two characters long or longer: ..+
we need to match one-character strings which are not that character: [^/].
Combining these three together with the | branching operator: "|..+|[^/]".
If we are using a regular expression tool that performs substring searching rather than a full match, we need to use its anchoring features; perhaps it supports the ^ and $ notation for that: "^(|..+|[^/])$".
I'm guessing that you might not want to match empty strings; in which case, revise your requirement and drop that branch from the expression.
Suppose we wanted to match all strings, except for a specific fixed word like abc. Without negation support in the regex language, we can use a generalization of the above trick.
Match the empty string, like before, if desired.
Match all one-character strings: .
Match all two-character strings: ..
Match all strings longer than three characters: ....+
Those simple cases taken care of, we focus on matching just those three-symbol strings that are not abc. How can we do that?
Match all three-character strings that don't start with a: [^a]..
Match all three-character strings that don't have a b in the middle: .[^b].
Match all three-character strings that don't end in c: ..[^c]
Combine it all together: "|.|..|....+|[^a]..|.[^b].|..[^c]".
For longer words, we might want to take advantage of the {m,n} notation, if available, to express "match from zero to nine characters" and "match eleven or more characters".
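The construction above can be sketched as a small generator. This is only an illustration in Python: all_but and is_allowed are hypothetical helper names, the {m,n} notation replaces the hand-written runs of dots, and re.DOTALL is an added assumption so that "." really means any character.

```python
import re

def all_but(word: str) -> str:
    """Build a lookaround-free regex that fully matches any string except `word`."""
    n = len(word)
    branches = []
    if n > 0:
        branches.append(".{0,%d}" % (n - 1))   # every shorter string, including the empty one
    branches.append(".{%d,}" % (n + 1))        # every longer string
    for i, ch in enumerate(word):
        # same length as `word`, but position i holds a different character
        branches.append("." * i + "[^" + re.escape(ch) + "]" + "." * (n - i - 1))
    return "|".join(branches)

def is_allowed(word: str, candidate: str) -> bool:
    # fullmatch anchors the alternation at both ends, like ^(...)$
    return re.fullmatch(all_but(word), candidate, flags=re.DOTALL) is not None

print(all_but("/"))
print(all_but("abc"))
print(is_allowed("abc", "abc"), is_allowed("abc", "abd"))
```

To stop matching the empty string, raise the lower bound of the first branch from 0 to 1 (and drop that branch entirely for one-character words).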

I also need to exclude the signin and register pages. If the signin page is not excluded, the redirect causes an infinite loop and an error. If the register page is not excluded, you are redirected to the signin page and cannot register.
So "/", "/auth/signin", and "/auth/register" will be excluded. Here is what I needed:
export const config = {
matcher: [
'/((?!auth).*)(.+)'
]
}
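To check which paths this matcher covers, the pattern can be tried outside Next.js. The sketch below uses Python's re, whose lookahead syntax is the same. It assumes the matcher has to cover the whole path (emulated with fullmatch), and middleware_runs is a hypothetical helper, not part of Next.js or NextAuth.

```python
import re

MATCHER = r'/((?!auth).*)(.+)'

def middleware_runs(path: str) -> bool:
    # Assumption: the matcher is applied to the full path, which fullmatch emulates.
    return re.fullmatch(MATCHER, path) is not None

for path in ['/', '/auth/signin', '/auth/register', '/dashboard', '/authors']:
    print(path, middleware_runs(path))
```

Under that assumption, "/" is skipped because (.+) needs at least one character after the slash, and anything starting with /auth is skipped by the lookahead. That also covers unrelated paths such as /authors, so a stricter lookahead like (?!auth/) may be worth considering.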

Regexp_like vs regex validators online - different results

I have a regex expression for email validation using plsql that is giving me some headaches... :)
This is the condition I'm using for an email (rercear12345@gmail.com) validation:
IF NOT REGEXP_LIKE (user_email, '^([\w\-\.]+)@((\[([0-9]{1,3}\.){3}[0-9]{1,3}\])|(([\w\-]+\.)+)([a-zA-Z]{2,4}))$') THEN
control := FALSE;
dbms_output.put_line('EMAIL '||C.user_email||' not according to regex');
END IF;
If I make a select based on the expression I don't get any values either:
Select * from TABLE_X where REGEXP_LIKE (user_email, '^([\w\-\.]+)@((\[([0-9]{1,3}\.){3}[0-9]{1,3}\])|(([\w\-]+\.)+)([a-zA-Z]{2,4}))$');
Using regex101.com I get full match with this email: rercear12345@gmail.com
Any idea?

The regular expression syntax that Oracle supports is in the documentation.
It seems Oracle doesn't understand the \w inside the []. You can expand that to:
with table_x (user_email) as (
select 'rercear12345@gmail.com' from dual
union all
select 'bad name@gmail.com' from dual
)
Select * from TABLE_X
where REGEXP_LIKE (user_email, '^[a-zA-Z_0-9.-]+@((\[([0-9]{1,3}\.){3}[0-9]{1,3}\])|([a-zA-Z_0-9-]+\.)+[a-zA-Z]{2,4})$');
USER_EMAIL
----------------------
rercear12345@gmail.com
You don't need to escape the . or - inside the square brackets. If you do, the backslash is taken literally, so you would allow literal backslashes to be matched.
This sort of requirement has come up before - e.g. here - but you seem to be allowing IP address octets instead of FQDNs, enclosed in literal square brackets, which is unusual.
As @BobJarvis said, you could also use [:alnum:], but you would still need to include the underscore. That could allow non-ASCII 'letter' characters you aren't expecting; though they may be valid, as are other symbols you exclude. You seem to be following the 'common advice' mentioned in that article, though.
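For a quick check outside the database, the expanded pattern can be run against the two sample rows. This is a sketch in Python, not Oracle: it assumes the separator is the usual @ and that the dot before the top-level domain is meant literally, and the third address is a made-up example of the bracketed-IP form.

```python
import re

EMAIL = re.compile(
    r'^[a-zA-Z_0-9.-]+@'
    r'((\[([0-9]{1,3}\.){3}[0-9]{1,3}\])'     # bracketed IPv4-style literal
    r'|([a-zA-Z_0-9-]+\.)+[a-zA-Z]{2,4})$'    # or dotted host name plus 2-4 letter TLD
)

samples = [
    'rercear12345@gmail.com',
    'bad name@gmail.com',      # the space is not in the local-part class
    'user@[192.168.0.1]',      # hypothetical address using the IP branch
]
for address in samples:
    print(address, bool(EMAIL.search(address)))
```

Python accepts \w inside brackets, so this only confirms what the spelled-out classes accept; it does not reproduce the Oracle behaviour described above.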

Using set operators with python regex module

I'm having trouble getting set operators to work in the regex module (regex 2013-11-29) in python-3.x. For example, to match ASCII characters minus punctuation I have tried:
import regex as rx
data = '(foo)'
for m in rx.finditer(r'[\p{ASCII}--\p{P}]+',data):
    print(m.group(0)) # expect 'foo', getting '(foo)'
The documentation gives this example:
[\p{N}--[0-9]] # Set containing all numbers except '0' .. '9'
Am I missing something here?

It sounds like you need to explicitly opt into Version 1 behavior so that the -- is interpreted as a set operator and not as characters to include in the class.
From the module web page:
Version 1 behaviour (new behaviour, different from the current re
module):
Indicated by the VERSION1 or V1 flag, or (?V1) in the pattern.
.split will split a string at a zero-width match.
Inline flags apply to the end of the group or pattern, and they can be
turned off.
Nested sets and set operations are supported.
Case-insensitive matches in Unicode use full case-folding by default.
If no version is specified, the regex module will default to
regex.DEFAULT_VERSION. In the short term this will be VERSION0, but in
the longer term it will be VERSION1.
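With the regex module, the opt-in described above amounts to prefixing the pattern with (?V1) or passing the V1 flag. Where that module is not available, the same character set can be assembled by hand with the standard library. The sketch below is that substitute technique, not the regex module's set operator: it builds "ASCII minus punctuation" from unicodedata and feeds it to re, on the assumption that \p{P} means general category P*.

```python
import re
import unicodedata

# ASCII characters whose Unicode general category is not P* (punctuation).
ascii_minus_punct = ''.join(
    chr(cp) for cp in range(128)
    if not unicodedata.category(chr(cp)).startswith('P')
)

# re.escape keeps characters such as "^" literal inside the brackets.
pattern = re.compile('[' + re.escape(ascii_minus_punct) + ']+')

data = '(foo)'
matches = [m.group(0) for m in pattern.finditer(data)]
print(matches)
```

Note that symbols such as + or $ are in category S, not P, so they stay in the set under either approach.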

Why is it selecting this file?

I have the following statement:
Directory.GetFiles(filePath, "A*.pdf")
.Where(file => Regex.IsMatch(Path.GetFileName(file), "[Aa][i-lI-L].*"))
.Skip((pageNum - 1) * pageSize)
.Take(pageSize)
.Select(path => new FileInfo(path))
.ToArray()
My problem is that the above statement also finds the file "Adali.pdf", which it should not, but I cannot figure out why.
The above statement should only select files starting with a, and where the second letter is in the range i-l.

Because the pattern is not anchored, it matches the 3rd and 4th characters of Adali (al):
Adali
  --
Try using ^ in your regex, which anchors the match to the start of the string (regex cheatsheet):
Regex.IsMatch(..., "^[Aa][i-lI-L].*")
Also I doubt you need asterisk at all.
PS: As a side note, this question doesn't seem to be written that well. You should try debugging the code yourself, and in particular test your regex against your cases without LINQ. I'm sure this has nothing to do with LINQ (the tag on your question); the issue is about regular expressions (which you didn't mention in the tags at all).

You are not anchoring the string. This makes the regex match the al in Adali.pdf.
Change the regex to ^[Aa][i-lI-L].* or, if you only need to test for a match, just ^[Aa][i-lI-L].
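The difference is easy to reproduce in isolation. The sketch below uses Python's re.search, which, like Regex.IsMatch, looks for a match anywhere in the input; the file names other than Adali.pdf are made up for illustration.

```python
import re

names = ['Adali.pdf', 'Alpha.pdf', 'aKten.pdf', 'Abc.pdf']

unanchored = re.compile(r'[Aa][i-lI-L].*')
anchored = re.compile(r'^[Aa][i-lI-L]')

# Without ^ the engine is free to start at the third character.
m = unanchored.search('Adali.pdf')
print(m.group(0))   # → ali.pdf

# With ^ only names whose first two characters qualify are kept.
kept = [name for name in names if anchored.search(name)]
print(kept)         # → ['Alpha.pdf', 'aKten.pdf']
```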

You should do this:
var f = Directory.GetFiles(tb_Path.Text, "A*.pdf").Where(file => Regex.IsMatch(Path.GetFileName(file), "[Aa][i-lI-L].pdf")).ToArray();
When you use ".*", Adali is accepted by the regex.

RegEx check if string contains certain value

I need some help with writing a regex validation to check for a specific value.
Here is what I have, but it doesn't work:
Regex exists = new Regex(@"MyWebPage.aspx");
Match m = exists.Match(pageUrl);
if(m)
{
//perform some action
}
So I basically want to know when the variable pageUrl contains the value MyWebPage.aspx.
Also, is it possible to combine this check to cover several cases, for instance MyWebPage.aspx, MyWebPage2.aspx, MyWebPage3.aspx?
Thanks!

Try this:
"MyWebPage\d*\.aspx$"
This will allow for any pages called MyWebPage#.aspx where # is zero or more digits, so plain MyWebPage.aspx matches too. The trailing $ requires the URL to end with .aspx.

if (Regex.IsMatch(url, "MyWebPage[^/]*?\\.aspx")) ....
This will match any form of MyWebPageXXX.aspx (where XXX is zero or more characters other than /). It will not match MyWebPage/test.aspx, however.

That RegEx should work in the case that MyWebPage.aspx is in your pageUrl, albeit by accident. You really need to replace the dot (.) with \. to escape it.
Regex exists = new Regex(@"MyWebPage\.aspx");
If you want to optionally match a single digit after the MyWebPage bit, then look for the (optional) presence of \d:
Regex exists = new Regex(@"MyWebPage\d?\.aspx");

I won't post a regex, as others have good ones going, but one thing that may be an issue is character case. Regexes are case-sensitive by default. The Regex class has a static overload of the Match function (as well as of Matches and IsMatch) that takes a RegexOptions parameter, which lets you ignore case.
For example, I don't know how you are getting your pageUrl variable but depending on how the user typed the URL in their browser, you may get different casings, which could cause your Regex to not find a match.
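The effect of ignoring case can be sketched as follows. This is Python rather than C#, with re.IGNORECASE standing in for RegexOptions.IgnoreCase; is_my_page and the example.com URLs are hypothetical.

```python
import re

# Escaped dot, optional digits after the page name, case ignored.
PAGE = re.compile(r'MyWebPage\d*\.aspx', re.IGNORECASE)

def is_my_page(url: str) -> bool:
    # search looks for the page name anywhere in the URL
    return PAGE.search(url) is not None

for url in [
    'http://example.com/MyWebPage.aspx',
    'http://example.com/mywebpage2.ASPX?id=1',
    'http://example.com/MyWebPageXaspx',
    'http://example.com/MyWebPage/test.aspx',
]:
    print(url, is_my_page(url))
```

Leaving out a trailing $ keeps URLs with a query string matching; add it back if the page name must be the last thing in the URL.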
