escaping bracket in postgresql query - regex

I am trying to escape a bracket in a pattern matching expression for PostgreSQL 8.2
The clause looks something like:
WHERE field SIMILAR TO '%UPC=\[ R%%(\mLE)%'
but I keep getting:
ERROR: invalid regular expression: brackets [] not balanced

Try this:
select '%UPC=\[ R%%(\mLE)%';
WARNING: nonstandard use of escape in a string literal
LINE 1: select '%UPC=\[ R%%(\mLE)%';
^
HINT: Use the escape string syntax for escapes, e.g., E'\r\n'.
?column?
------------------
%UPC=[ R%%(mLE)%
(1 row)
You need to set Postgres in standard conforming strings mode instead of backward compatible mode.
set standard_conforming_strings=1;
select '%UPC=\[ R%%(\mLE)%';
?column?
--------------------
%UPC=\[ R%%(\mLE)%
(1 row)
Or you need to use escape string syntax which works regardless of mode:
set standard_conforming_strings=1;
select E'%UPC=\\[ R%%(\\mLE)%';
?column?
--------------------
%UPC=\[ R%%(\mLE)%
(1 row)
set standard_conforming_strings=0;
select E'%UPC=\\[ R%%(\\mLE)%';
?column?
--------------------
%UPC=\[ R%%(\mLE)%
(1 row)
You can set this setting in postgresql.conf for all databases, using alter database for single database, using alter user for single user or group of users or using set for current connection.

Related

How to replace a period within string and not in Numeric using singlestore (MemSQL) DB REGEXP_REPLACE function

I have a scenario wherein I want to replace a period when its surrounded by Alphabets and not when surrounded by Numbers. I figured out a Regular Expression pattern that can identify only the periods in Key names but the pattern is not working in SQL
SELECT REGEXP_REPLACE("Amount.fee:0.75,Amount.tot:645.55","(?<!\d)(\.)(?!\d)","_","ig");
Expected output: Amount_fee:0.75,Amount_tot:645.55
Note, I am trying this because, In MemSQL I couldn't access JSON key when it has period in it.
Also verified the pattern "(?<!\d)(.)(?!\d)" using https://coding.tools/regex-replace and it working fine. But, SQL is not working. Am using MemSQL 7.1.9 and POSIX Enhanced Regular expression are supposed to be work. Any help is much appreciated.
Since it looks like you are trying to workaround accessing a JSON key with a period, I will show you how to do that.
This can be done by either surrounding the json key name with backtics while using the shorthand json extract syntax:
select col::%`Amount.fee` from (select '{"Amount.fee":0.75,"Amount.tot":645.55}' col);
+--------------------+
| col::%`Amount.fee` |
+--------------------+
| 0.75 |
+--------------------+
or by using the json_extract_ builtins directly:
select json_extract_double('{"Amount.fee":0.75,"Amount.tot":645.55}', 'Amount.fee');
+------------------------------------------------------------------------------+
| json_extract_double('{"Amount.fee":0.75,"Amount.tot":645.55}', 'Amount.fee') |
+------------------------------------------------------------------------------+
| 0.75 |
+------------------------------------------------------------------------------+
Assuming you only want to target dots that are in between two non digit characters, where the dot is not the first or last character in the string, you may match on ([^\d])\.([^\d]) and replace with \1_\2:
SELECT REGEXP_REPLACE("Amount.fee:0.75,Amount.tot:645.55", "([^\d])\.([^\d])", "\1_\2", "ig");
Here is a regex demo showing that the replacement is working. Note that you might have to use $1_$2 instead of \1_\2 as the replacement, depending on the regex flavor of your SQL tool.

Use Regex from a column in Redshift

I have 2 tables in Redshift, one of them has a column containing Regex strings. And I want to join them like so:
select *
from one o
join two t
on o.value ~ t.regex
But this query throws an error:
[Amazon](500310) Invalid operation: The pattern must be a valid UTF-8 literal character expression
Details:
-----------------------------------------------
error: The pattern must be a valid UTF-8 literal character expression
code: 8001
context:
query: 412993
location: cgx_impl.cpp:1911
process: padbmaster [pid=5211]
-----------------------------------------------;
As far as I understood from searching in the docs, the right side of a regex operator ~ must be a string literal.
So this would work:
select *
from one o
where o.value ~ 'regex'
And this would fail:
select *
from one o
where 'regex' ~ o.value
Is there any way around this? Anything I missed?
Thanks!
Here's a workaround I am using. Maybe it's not super fast, but it works:
First create a function:
CREATE FUNCTION is_regex_match(pattern text, s text) RETURNS BOOLEAN IMMUTABLE AS $$
import re
return True if re.search(pattern, s) else False
$$ LANGUAGE plpythonu;
Then use it like this (o.value contains a regex pattern):
select *
from one o
where is_regex_match(o.value, 'some string');
You could try using the built-in function regexp_substr()
https://docs.aws.amazon.com/redshift/latest/dg/REGEXP_SUBSTR.html
select *
from one o
join two t
on regexp_substr(o.value, t.regex) <> ''
Edit example added of raw query
It appears that the fields must be explicitly cast as varchars when built.
with fake_table as (
SELECT 'sample value'::varchar as value, '[a-z]'::varchar as pattern
)
SELECT *
, regexp_substr(value, pattern)
FROM
fake_table
WHERE
regexp_substr(value, pattern) <>''

complex regex to identify parameters with brackets only

I'm trying to build a regex that will identify parameters inside brackets and ignore pl/sql comments (single line --, and multiple lines /* */)
For instance:
create or replace table_name ---sfjdslkfjslkfjslkfjdsfsdf
**(var1 in out number, var2 number)**
/* sdfls
sfdsd jfs
sfs f
sd f
sfsf */
AS
BEGIN
(var1 in out number, var2 number) should be matched only. It should also account for cases where:
There are not comments (single or multiple lines)
There are only single line comments either before or after the parameters
There are only multiple line comments either before or after the parameters
There are no parameters
Assumptions:
Parameters are always enclosed in brackets ()
Procedures can sometimes have no parameters but have comments (either single or multiple line comments) before the AS BEGIN clause
Procedures start with create or replace table_name
We're only interested to read the until the AS BEGIN clause
In other words, I need to find the index of the first opening bracket '(' that is outside any comments (single or multiple lines) and that comes before the AS BEGIN clause.
UPDATE:
I have managed to match the comments using the following regex:
(?:\/\*(?:[\s\S]*?)\*\/)|(?:\-\-(?:.*)$)
For instance here it will match all the comments:
create or replace table_name
-- sdlfksl kjs slkjslds js
/* lsdjfdkj
s fskjfs
sf sf
sdf;;''
sfs fs
*/
(hello number, var2 number)
--sdflksf
/*sl --sdflks s kdjfls())({fsfs */
AS
BEGIN
I can do this now in Java to identify the first opening bracket outside of any matching group. However it would be easier if I could just ignore the matching group and match the one parameters in the brackets only instead.
EDIT
This is not asking for a solution with pl/sql or sqlplus or whatever. I have a few pl/sql procedures stored in files that I need to modify and add new parameters to. I'm using Java to do that and inside Java im using a combination of loops and regexs.
Not sure why you want to reinvent the wheel. In Oracle, you could use the *_ARGUMENTS view to get the list of all the arguments.
For example,
SQL> CREATE OR REPLACE
2 PROCEDURE p(
3 in_date DATE,
4 out_text VARCHAR2)
5 AS
6 BEGIN
7 NULL;
8 END;
9 /
Procedure created.
SQL>
SQL> SELECT object_name,
2 package_name,
3 argument_name,
4 position,
5 data_type
6 FROM USER_ARGUMENTS
7 WHERE OBJECT_NAME = 'P';
OBJECT_NAM PACKAGE_NA ARGUMENT_NAME POSITION DATA_TYPE
---------- ---------- --------------- ---------- ---------
P OUT_TEXT 2 VARCHAR2
P IN_DATE 1 DATE
SQL>
I'm unfamiliar with Oracle's flavor of regexes, but I made a pattern that works in most regex-engines. I know MySQL has a wildly different syntax so this is quite possibly also the case in Oracle, but the general idea is this:
create or replace table_name(?:(?!AS\s+BEGIN)(?:--[^\n]*(?:\n|$)|\/\*(?:[^*]|\*(?!\/))*\*\/|(?!\/\*|--)[^(]))*\(([^)]+)
Debuggex Demo
It's a "match this except in contexts A, B or C" approach also used here and explained in depth in this SO answer. Combined with the requirement that whatever we're matching isn't followed by AS\s+BEGIN.
You can see in the Debuggex Demo that the correct part between brackets is matched in capture group 1. As well, the 2nd create doesn't have parameters, and indeed nothing is matched (even though there's brackets in comments and after the AS BEGIN.
The task at hand is porting this to Oracle's regex flavor.

Postgres Query with Regex

I'm trying to create a regex to find (and then eventually replace) parts of strings in a PG DB. I'm using PSQL 9.0.4
I've tested my regex outside of PG and it works perfectly. However, it isn't playing well with PG. If anyone can help me understand what I'm doing wrong it would me much appreciated.
Regex:
{php}.*\n.*\n.*'mister_xx']\)\);.*\n} \n{\/php}
Postgres Query:
SELECT COUNT(*) FROM (SELECT * FROM "Table" WHERE "Column" ~ '{php}.*\n.*\n.*'mister_xx']\)\);.*\n} \n{\/php}') as x;
Postgres Response:
WARNING: nonstandard use of escape in a string literal
LINE 1: ...M (SELECT * FROM "Table" WHERE "Column" ~ '{php}.*\n...
^
HINT: Use the escape string syntax for escapes, e.g., E'\r\n'.
ERROR: syntax error at or near "mister_xx"
LINE 1: ..."Table" WHERE "Column" ~ '{php}.*\n.*\n.*'mister_x...
In SQL, quotes are delimited as two quotes, for example:
'Child''s play'
Applying this to your regex makes it work:
SELECT COUNT(*)
FROM "Table"
WHERE "Column" ~ '{php}.*\n.*\n.*''mister_xx'']\)\);.*\n} \n{\/php}' as x;
Note also how the redundant subquery .
You need to double escape the backslashes and add an E before the statement:
SELECT * FROM "Table" WHERE "Column" ~ E'{php}\n.\n.*''mister_xx'']\)\);.*\n} \n{\/php}'

Postgresql regex to match uppercase, Unicode-aware

The title sums it up pretty well. I'm looking for a regular expression matching Unicode uppercase character for the Postgres ~ operator.
The obvious way doesn't work:
=> select 'A' ~ '[[:upper:]]';
?column?
----------
t
(1 row)
=> select 'Ó' ~ '[[:upper:]]';
?column?
----------
t
(1 row)
=> select 'Ą' ~ '[[:upper:]]';
?column?
----------
f
(1 row)
I'm using Postgresql 9.1 and my locale is set to pl_PL.UTF-8. The ordering works fine.
=> show LC_CTYPE;
lc_ctype
-------------
pl_PL.UTF-8
(1 row)
The regexp engine of PG 9.1 and older versions does not correctly classify characters whose codepoint doesn't fit it one byte.
The codepoint of 'Ó' being 211 it gets it right, but the codepoint of 'Ą' is 260, beyond 255.
PG 9.2 is better at this, though still not 100% right for all alphabets. See this commit in PostgreSQL source code, and particularly these parts of the comment:
remove the hard-wired limitation to not consider wctype.h results for
character codes above 255
and
Still, we can push it up to U+7FF (which I chose as the limit of
2-byte UTF8 characters), which will at least make Eastern Europeans
happy pending a better solution
Unfortunately this was not backported to 9.1
I've found that perl regular expressions handles Unicode perfectly.
create extension plperl;
create function is_letter_upper(text) returns boolean
immutable strict language plperl
as $$
use feature 'unicode_strings';
return $_[0] =~ /^\p{IsUpper}$/ ? "true" : "false";
$$;
Tested on postgres 9.2 with perl 5.16.2.