Regex in PostgreSQL

I'm ultimately trying to use the following regular expression.
SELECT *
into table
FROM table2
Where
(Description ~ '\bD\s*(&|AND|&|N|AMP|\*|\+)\s*B.*')
However, this returns the following error:
[XX000] ERROR: Invalid preceding regular expression prior to repetition operator. The error occured while parsing the regular expression fragment: 'P;|N|AMP|>>>HERE>>>|+)sB.'. Detail: ----------------------------------------------- error: Invalid preceding regular expression prior to repetition operator. The error occured while parsing the regular expression fragment: 'P;|N|AMP|>>>HERE>>>|+)sB.'. code: 8 ...
Any idea on the fix?

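Replace \b with \y (or \m) to fix the pattern: in PostgreSQL regular expressions \b denotes a backspace character, while \y matches at a word boundary and \m matches only at the start of a word. You can also move the single-character alternatives out of the capturing group into a character class, where they need no escaping: (&|\*|\+) -> [*+&]. Finally, you do not need .* at the end: the ~ operator only checks whether the pattern matches somewhere in the string, so the trailing .* matters only if you are extracting the matched text.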
Use
'\yD\s*(AND|&|N|AMP|[*+&])\s*B'
See the online demo:
CREATE TABLE tb1
(website character varying)
;
INSERT INTO tb1
(website)
VALUES
('D AND B...'),
('ROCK''N''ROLL'),
('www.google.com'),
('More text here'),
('D N Brother')
;
SELECT * FROM tb1 WHERE website ~ '\yD\s*(AND|&|N|AMP|[*+&])\s*B';
Output: the rows 'D AND B...' and 'D N Brother'.
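For readers who want to try the pattern outside the database, here is a minimal sketch in Python. It assumes Python's re module, where the word boundary is written \b rather than PostgreSQL's \y; the sample rows are the ones from the demo table, and the names PATTERN, rows and matching are only illustrative.

```python
import re

# Python's re module spells the word boundary \b; PostgreSQL spells it \y.
PATTERN = re.compile(r"\bD\s*(AND|&|N|AMP|[*+&])\s*B")

# Same sample rows as the demo table above.
rows = ["D AND B...", "ROCK'N'ROLL", "www.google.com", "More text here", "D N Brother"]

# Keep the rows in which the pattern matches somewhere, like the ~ operator.
matching = [row for row in rows if PATTERN.search(row)]
```

Because the search is unanchored on the right, no trailing .* is needed here either.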


Why do I get empty response for regexp_matches function while using positive lookahead (?=...)

Why does the following code return just empty strings in braces, {""}? How can I make it return the matching strings?
SELECT regexp_matches('ATGCATGCATGCCAACAACAACCTGTCAAGTGAGT','(?=..CAA)','g');
Expected output is:
regexp_matches
----------------
{GCCAA}
{AACAA}
{AACAA}
{GTCAA}
(4 rows)
but instead it returns the following:
regexp_matches
----------------
{""}
{""}
{""}
{""}
(4 rows)
My actual query is a bit more complicated and requires a positive lookahead in order to cover all occurrences of the pattern in the string, even when they overlap.
Well, it's not pretty, but you can do it without regular expressions or custom functions.
WITH data(d) as (
SELECT * FROM (VALUES ('ATGCATGCATGCCAACAACAACCTGTCAAGTGAGT')) v
)
SELECT substr(d, x, 5) AS match
FROM data
JOIN LATERAL (SELECT generate_series(1, length(d))) g(x) ON TRUE
WHERE substr(d, x, 5) LIKE '__CAA'
;
match
-------
GCCAA
AACAA
AACAA
GTCAA
(4 rows)
Basically, get each five letter slice of the string and see if it matches __CAA.
You could change generate_series(1, length(d)) to generate_series(1, length(d)-4) because the last ones will never match, but you would have to remember to update this if the length of your matching string changes.
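The same slice-and-test idea can be sketched outside SQL. This is a rough Python illustration, not a translation of the query planner's behaviour; the function name window_matches and its parameters are made up for the example. It already stops at the last full-width slice, which corresponds to the length(d)-4 variant.

```python
def window_matches(text, width=5, suffix="CAA"):
    """Return every `width`-character slice of `text` that ends with `suffix`."""
    # Stop at len(text) - width so that only full-width slices are examined.
    return [
        text[i:i + width]
        for i in range(len(text) - width + 1)
        if text[i:i + width].endswith(suffix)
    ]


dna = "ATGCATGCATGCCAACAACAACCTGTCAAGTGAGT"
matches = window_matches(dna)
```

Deriving the upper bound from width avoids the maintenance problem mentioned above: changing the slice length updates the bound automatically.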
Using a lookahead has the drawback that the lookahead itself is not part of the match, but it does allow overlapping searches.
Without a lookahead, you lose the ability to do overlapping searches.
Using PowerShell, you can loop over the indexes returned by the lookahead matches and use each one as an index into your search string to get the matches:
$string = 'ATGCATGCATGCCAACAACAACCTGTCAAGTGAGT'
$r = [regex]::new('(?=..CAA)')
$r.Matches($string) | % {$string.Substring($_.Index, 5)}
returns
GCCAA
AACAA
AACAA
GTCAA
I don't know how to translate this to PostgreSQL (or whether that's even possible).
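The index-then-slice approach from the PowerShell snippet carries over to other regex engines. A minimal Python sketch, assuming the re module (this is not PostgreSQL code, and the variable names are illustrative):

```python
import re

dna = "ATGCATGCATGCCAACAACAACCTGTCAAGTGAGT"

# The lookahead matches an empty string at each position where "..CAA" begins,
# so each match object carries a start index but no text.
starts = [m.start() for m in re.finditer(r"(?=..CAA)", dna)]

# Slice five characters from each start index to recover the overlapping hits.
hits = [dna[i:i + 5] for i in starts]
```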
Update:
Apparently it won't capture inside an assertion. That's OK, because
what you really need is the first 2 characters, which can safely be
consumed. It will only give you the first 2 characters per row, but
since you know the last 3, you can easily join the set elements
with the CAA constant.
Try this
..(?=CAA)
and you're done.
If I knew the bizarre SQL language, I could show you how to do the join.
Output should now be
match
-------
GC
AA
AA
GT
(4 rows)
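The join with the constant can be sketched as follows. This is a Python illustration of the idea only, assuming the re module; prefixes and full are hypothetical names. Consuming two characters is safe for this particular pattern because two occurrences of CAA cannot overlap each other.

```python
import re

dna = "ATGCATGCATGCCAACAACAACCTGTCAAGTGAGT"

# Consume only the two leading characters; CAA is checked but not consumed.
prefixes = re.findall(r"..(?=CAA)", dna)

# Append the known constant to rebuild the five-character matches.
full = [prefix + "CAA" for prefix in prefixes]
```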
This is the regex you need for overlapped matches.
(?=(..CAA))
https://regex101.com/r/eJ36zb/1
I think you just need this SQL statement, which captures group 1:
SELECT regexp_matches('ATGCATGCATGCCAACAACAACCTGTCAAGTGAGT','(?=(..CAA))','g');
Formatted regex
(?=
( . . CAA ) # (1)
)
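The reason you got empty strings in your result is that
you didn't give the expression anything to consume and
nothing to capture.
That is, it matched at the right places, but nothing was consumed or captured.
Doing it this way allows both the overlap and the capture, so the
matches should now show up in the output.
One caveat: the PostgreSQL documentation states that parentheses inside
a lookahead constraint are non-capturing, so verify the result on your server.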
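To see the difference between a bare lookahead and a lookahead wrapping a capture group, here is a small sketch in Python. It relies on Python's re module, where groups inside a lookahead do capture; that behaviour is an engine-specific assumption and may not hold in PostgreSQL.

```python
import re

dna = "ATGCATGCATGCCAACAACAACCTGTCAAGTGAGT"

# Bare lookahead: four zero-width matches, each an empty string.
bare = re.findall(r"(?=..CAA)", dna)

# Capture group inside the lookahead: findall returns group 1 for each match,
# so overlapping five-character hits are reported.
overlapping = re.findall(r"(?=(..CAA))", dna)
```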
A lookahead is a zero-width assertion: it checks a position but consumes no characters, so the match itself is empty. If you change your regular expression to a regular match/capture, you'll get a result. For matching any two characters that are followed by CAA, as in your case, a lookahead probably isn't necessary.

Postgres Regex Negative Lookahead

Scenario: Match any string that starts with "J01" except the string "J01FA09".
I'm baffled why the following code returns nothing:
SELECT 1
WHERE
'^J01(?!FA09).*' ~ 'J01FA10'
when I can see on regexr.com that it's working (I realize there are different flavors of regex and that could be the reason for the site working).
I have confirmed in the postgres documentation that negative look aheads are supported though.
Table 9-15. Regular Expression Constraints
(?!re) negative lookahead matches at any point where no substring
matching re begins (AREs only). Lookahead constraints cannot contain
back references (see Section 9.7.3.3), and all parentheses within them
are considered non-capturing.
Match any string that starts with "J01" except the string "J01FA09".
You can do without a regex using
WHERE s LIKE 'J01%' AND s != 'J01FA09'
Here, LIKE 'J01%' requires a string to start with J01 and then may have any chars after, and s != 'J01FA09' will filter out the matches.
If you want to achieve the same with a regex, use
WHERE s ~ '^J01(?!FA09$)'
The ^ matches the start of the string, J01 matches the literal J01 substring, and (?!FA09$) asserts that J01 is not immediately followed by FA09 and then the end of the string. If FA09 appears with the end of the string right after it, no match is returned.
See the online demo:
CREATE TABLE table1
(s character varying)
;
INSERT INTO table1
(s)
VALUES
('J01NNN'),
('J01FFF'),
('J01FA09'),
('J02FA09')
;
SELECT * FROM table1 WHERE s ~ '^J01(?!FA09$)';
SELECT * FROM table1 WHERE s LIKE 'J01%' AND s != 'J01FA09';
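The two filters can be compared side by side in a short Python sketch. It assumes Python's re module, whose negative lookahead syntax matches the one used here; the helper names and the extra sample 'J01FA090' are hypothetical additions for illustration.

```python
import re

# ^J01 anchors the prefix; (?!FA09$) rejects only the exact remainder "FA09".
PATTERN = re.compile(r"^J01(?!FA09$)")


def matches_regex(s):
    return PATTERN.search(s) is not None


def matches_like(s):
    # Counterpart of: s LIKE 'J01%' AND s != 'J01FA09'
    return s.startswith("J01") and s != "J01FA09"


# Demo rows plus one hypothetical value that merely starts with J01FA09.
samples = ["J01NNN", "J01FFF", "J01FA09", "J02FA09", "J01FA090"]
kept = [s for s in samples if matches_regex(s)]
```

The $ inside the lookahead is what lets a longer value such as 'J01FA090' through while the exact string 'J01FA09' is rejected.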
The regular expression is the right-hand operand of ~, so swap the operands:
SELECT 1
WHERE 'J01FA10' ~ '^J01(?!FA09)';
?column?
----------
1
(1 row)

ERROR: invalid regular expression: quantifier operand invalid

I am trying to write a regular expression to get the "url" from the following text:
[![title](thumbnail_url?height=240&width=320)](url)
using the following postgresql query:
SELECT (SELECT m[1] FROM regexp_matches(text, '^\[\!\[^*\]\(^ *\)\]\(((^*))') AS r(m) LIMIT 1) FROM texts;
I am getting the following error while I execute the above query:
ERROR: invalid regular expression: quantifier operand invalid
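The error comes from ^*: outside a bracket expression ^ is an anchor, and a quantifier such as * cannot be applied to an anchor (a negated class has to be written as [^...]). If you want just the final "url", then this expression would do the trick: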
.+\)\]\((\S+)\)
Throw away everything up to and including the literal )](, then capture everything until the literal ).
SELECT (regexp_matches(text, '.+\)\]\((\S+)\)'))[1] AS m
FROM texts
LIMIT 1;
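The expression can be tried on the sample text with a quick Python sketch. It assumes Python's re module, which accepts this pattern unchanged; the variable names are illustrative.

```python
import re

# Sample text in the shape shown in the question.
text = "[![title](thumbnail_url?height=240&width=320)](url)"

# Skip everything up to the literal ")](", then capture non-space characters
# up to the closing ")".
match = re.search(r".+\)\]\((\S+)\)", text)
url = match.group(1) if match else None
```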

Why do I get the error "Unmatched ( in regex" when trying to match something with Net::Telnet?

I'm attempting to match the string
Save settings:([Y]/[N]):
that the telnet client returns after issuing
print("exit")
(I'm using Net::Telnet.)
I have tried several regexes, including this one:
waitfor(“/^\s+ Save settings:([Y]/[N]):\s$/”
but I continue to receive the error:
bad match operator: Unmatched ( in regex; marked by <-- HERE in m/\s+Save settings?( <-- HERE [Y]/ <$data> line 1. at printer_config_test.pl line 36
How can I fix this?
You are providing the following regex match operator:
/^s+ Save settings:([Y]/ (followed by junk)
If you want to match the following string:
Save settings:([Y]/[N]):
You want the following regex pattern:
^\s*Save settings:\(\[Y\]/\[N\]\):
But waitfor wants a string containing a match operator. The following is the desired match operator:
/^\s*Save settings:\(\[Y\]\/\[N\]\):/
And the following is a string literal that creates that string:
"/^\\s*Save settings:\\(\\[Y\\]\\/\\[N\\]\\):/"
So:
waitfor("/^\\s*Save settings:\\(\\[Y\\]\\/\\[N\\]\\):/")
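The effect of escaping the brackets and parentheses can be checked in isolation. Here is a rough Python sketch, assuming the re module; it has no / delimiters, so only the metacharacter escaping is illustrated, not the Perl quoting layer.

```python
import re

prompt = "Save settings:([Y]/[N]):"

# Unescaped, ( ) [ ] are regex syntax: a group holding two character classes.
unescaped = re.compile(r"^\s*Save settings:([Y]/[N]):")

# Escaped, each bracket and parenthesis matches itself literally.
escaped = re.compile(r"^\s*Save settings:\(\[Y\]/\[N\]\):")

unescaped_hit = unescaped.search(prompt) is not None
escaped_hit = escaped.search(prompt) is not None
```

The unescaped form would match the text 'Save settings:Y/N:' instead, which is why it misses the real prompt.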

Regex and textmatching issue

I am doing some basic text matching in Postgres 9.3.5.0.
Here is my code so far:
Select text from eightks
WHERE other_events = true and
keywordRegexs = [\y(director and member \s+ and resigned)\y/ix];
I am getting the following errors
psql:test3.sql:3: invalid command \y(director
psql:test3.sql:5: ERROR: syntax error at or near "["
LINE 3: keywordRegexs = [
I am trying to find documents which contain those exact phrases.
The regular expression match operator in Postgres is ~.
The case insensitive variant is ~*.
Branches are enclosed in ().
SELECT text
FROM eightks
WHERE other_events = true
AND keywordregexs ~* '(\y(director | member \s+ |resigned)\y)';
The meaning of "those exact phrases" is not clear in the question.
Details in the manual.
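As a rough illustration of a case-insensitive whole-word alternation, here is a Python sketch. It assumes that "those exact phrases" means any one of the three words, which the question does not confirm, and it uses Python's \b where PostgreSQL would use \y. KEYWORDS and mentions_keyword are hypothetical names.

```python
import re

# Assumption: any one of these whole words counts as a hit.
KEYWORDS = re.compile(r"\b(director|member|resigned)\b", re.IGNORECASE)


def mentions_keyword(text):
    """Return True when any keyword appears as a whole word, ignoring case."""
    return KEYWORDS.search(text) is not None
```

The word boundaries keep 'member' from matching inside a longer word such as 'remembered'.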