Postgres regexp replace not working - regex

Im trying to create a regexp for this query:
SELECT gruppo
FROM righe_conto_ready
WHERE regexp_replace(gruppo,'(\[{1})|(\].*?\[)|(\].*$)','','g') = '[U6][U53]'
LIMIT 10
This is an example of 'gruppo' column:
[U6] CAFFETTERIA [U43] THE E TISANE
Im currently using this query for testing:
SELECT regexp_replace(gruppo,'(\[{1})|(\].*?\[)|(\].*$)','','g') FROM ....
and it returns just U6
How can i change the regexp to remove everything outside brackets?

You can use regexp_matches() with the much simpler regular expression:
with righe_conto_ready(gruppo) as (
select '[U6] CAFFETTERIA [U43] THE E TISANE'::text
)
select gruppo
from righe_conto_ready,
lateral regexp_matches(gruppo, '\[.+?\]', 'g') matches
group by 1
having string_agg(matches[1], '') = '[U6][U43]'
gruppo
-----------------------------------------
[U6] CAFFETTERIA [U43] THE E TISANE
(1 row)
When you are looking for multiple matches of some pattern, regexp_matches() seems more natural than regexp_replace().
You can also search for first two substrings in brackets (without the g flag the function yields no more than one row):
select gruppo
from righe_conto_ready,
lateral regexp_matches(gruppo, '(\[.+?\]).*(\[.+?\])') matches
where concat(matches[1], matches[2]) = '[U6][U43]'

Related

Regular expression to get specific pattern in snowflake

I have a table with column data like below.
Column5 :
1) ["[\"( "ABC12345678", "ABC00123451","ABC00543211")\"]"]
2) ["[\"( ABC87654321\"]"]
I just need to clean this column and fetch it like below.
1) ABC12345678,ABC00123451,ABC00543211
2) ABC87654321
Currently I am using replace function repeatedly to clean the data
replace( replace (replace (replace(replace(replace(replace (Column5,'[',''),']',''),'',''),'"',''),'\'',''),')',''),'(','') as column5list
is there any regular expression which can I use for the purpose to clean the data.
Pattern remains same ABC followed by 8 digits
Nested REPLACE could be simplfiied with TRANSLATE:
SELECT Column5, TRANSLATE(Column5, $$[]"'()$$, '') AS result
FROM tab;
Sample data:
CREATE OR REPLACE TABLE tab AS
SELECT '["[\"( "ABC12345678", "ABC00123451","ABC00543211")\"]"]' AS column5;
Output:

Oracle REGEXP_SUBSTR - SEMICOLON STRING EXTRACTION

I have below input and need mentioned output. How I can get it. I tried different pattern but could not get through it.
so in brief, any value having all three 1#2#3 parts(if it is present) or first value should be returned
2#9#;2#37#65 -> 2#37#65
2#9#;2#37#65;2#37# -> 2#37#65
2#9#;2#37#65;2#37#;2#37#56 -> 2#37#65 or 2#37#56
2#37#65;2#99 -> 2#37#65
3#9#;3#37#65;3#37#36;2#37#56 -> 3#37#65 or 3#37#36 or 2#37#56
2#37#;2#99# -> 2#37 or 2#99# ( in this case any value)
I tried few patterns and other pattern but no help.
regexp_substr('2#9#;2#37#65;2#37#','#[^;]+',1)
SUBSTR(REGEXP_SUBSTR(SUBSTR(uo_filiere,1,INSTR(uo_filiere,';',1)-1), '#[^#]+$'),2)
You can use a REGEXP_REPLACE here:
REGEXP_REPLACE(uo_filiere, '^(.*;)?([0-9]+(#[0-9]+){2,}).*|^([^;]+).*', '\2\4')
See the regexp demo
Details:
^ - start of string
(.*;)? - an optional Group 1 capturing any text and then a ;
([0-9]+(#[0-9]+){2,}) - Group 2 (\2): one or more digits, and then two or more occurrences of # followed with one or more digits
.* - the rest of the string
| - or
^([^;]+).* - start of string, Group 4 capturing one or more chars other than ; and then any text till end of string.
The replacement is Group 2 + Group 4 values.
Here is a simple-minded way to solve this. It may prove more efficient than other approaches, given the particular nature of the problem.
First, use a regular expression to find the first token that has all three parts. This part of the solution should be the most efficient approach for those strings that do have a three-part token, and it performs work that must be performed on all input strings in any case.
In the second part, wrap within nvl - if no three-part token is found, select the first token regardless of how many parts are present. This part uses only substr and instr in a trivial manner, so it should be very fast too.
Here's the query, run on a few more sample inputs to test those cases too.
with
sample_data (uo_filiere) as (
select '2#9#;2#37#65' from dual union all
select '2#9#;2#37#65;2#37#' from dual union all
select '2#9#;2#37#65;2#37#;2#37#56' from dual union all
select '2#37#65;2#99' from dual union all
select '3#9#;3#37#65;3#37#36;2#37#56' from dual union all
select '2#37#;2#99#' from dual union all
select '1#22#333' from dual union all
select '33#444#' from dual
)
select uo_filiere,
nvl(regexp_substr(uo_filiere, '(;|^)(([^;#]+#){2}[^;]+)', 1, 1, null, 2)
, substr(uo_filiere, 1, instr(uo_filiere || ';', ';') - 1)
) as first_value
from sample_data
;
UO_FILIERE FIRST_VALUE
---------------------------- ----------------------------
2#9#;2#37#65 2#37#65
2#9#;2#37#65;2#37# 2#37#65
2#9#;2#37#65;2#37#;2#37#56 2#37#65
2#37#65;2#99 2#37#65
3#9#;3#37#65;3#37#36;2#37#56 3#37#65
2#37#;2#99# 2#37#
1#22#333 1#22#333
33#444# 33#444#

Vertica REGEXP_SUBSTR use /g flag

I am trying to extract all occurrences of a word before '=' in a string, i tried to use this regex '/\w+(?=\=)/g' but it returns null, when i remove the first '/' and the last '/g' it returns only one occurrence that's why i need the global flag, any suggestions?
As Wiktor pointed out, by default, you only get the first string in a REGEXP_SUBSTR() call. But you can get the second, third, fourth, etc.
Embedded into SQL, you need to treat regular expressions differently from the way you would treat them in perl, for example. The pattern is just the pattern, modifiers go elsewhere, you can't use $n to get the n-th captured sub-expression, and you need to proceed in a specific way to get the n-th match of a pattern, etc.
The trick is to CROSS JOIN your queried table with an in-line created index table, consisting of as many consecutive integers as you expect occurrences of your pattern - and a few more for safety. And Vertica's REGEXP_SUBSTR() call allows for additional parameters to do that. See this example:
WITH
-- one exemplary input row; concatenating substrings for
-- readability
input(s) AS (
SELECT 'DRIVER={Vertica};COLUMNSASCHAR=1;CONNECTIONLOADBALANCE=True;'
||'CONNSETTINGS=set+search_path+to+public;DATABASE=sbx;'
||'LABEL=dbman;PORT=5433;PWD=;SERVERNAME=127.0.0.1;UID=dbadmin;'
)
,
-- an index table to CROSS JOIN with ... maybe you need more integers ...
loop_idx(i) AS (
SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
UNION SELECT 10
)
,
-- the query containing the REGEXP_SUBSTR() call
find_token AS (
SELECT
i -- the index from the in-line index table, needed
-- for ordering the outermost SELECT
, REGEXP_SUBSTR (
s -- the input string
, '(\w+)=' -- the pattern - a word followed by an equal sign; capture the word
, 1 -- start from pos 1
, i -- the i-th occurrence of the match
, '' -- no modifiers to regexp
, 1 -- the first and only sub-pattern captured
) AS token
FROM input CROSS JOIN loop_idx -- the CROSS JOIN with the in-line index table
)
-- the outermost query filtering the non-matches - the empty strings - away...
SELECT
token
FROM find_token
WHERE token <> ''
ORDER BY i
;
The result will be one row per found pattern:
token
DRIVER
COLUMNSASCHAR
CONNECTIONLOADBALANCE
CONNSETTINGS
DATABASE
LABEL
PORT
PWD
SERVERNAME
UID
You can do all sorts of things in modern SQL - but you need to stick to the SQL and to the relational paradigm - that's all ...
Happy playing ...
Marco

Selecting for a Jsonb array contains regex match

Given a data structure as follows:
{"single":"someText", "many":["text1", text2"]}
I can query a regex on single with
WHERE JsonBColumn ->> 'single' ~ '^some.*'
And I can query a contains match on the Array with
WHERE JsonBColumn -> 'many' ? 'text2'
What I would like to do is to do a contains match with a regex on the JArray
WHERE JsonBColumn -> 'many' {Something} '.*2$'
I found that it is also possible to convert the entire JSONB array to a plain text string and simply perform the regular expression on that. A side effect though is that a search on something like
xt 1", "text
would end up matching.
This approach isn't as clean since it doesn't search each element individually but it gets the job done with a visually simpler statement.
WHERE JsonBColumn ->>'many' ~ 'text2'
Use jsonb_array_elements_text() in lateral join.
with the_data(id, jsonbcolumn) as (
values
(1, '{"single":"someText", "many": ["text1", "text2"]}'::jsonb)
)
select distinct on (id) d.*
from
the_data d,
jsonb_array_elements_text(jsonbcolumn->'many') many(elem)
where elem ~ '^text.*';
id | jsonbcolumn
----+----------------------------------------------------
1 | {"many": ["text1", "text2"], "single": "someText"}
(1 row)
See also this answer.
If the feature is used frequently, you may want to write your own function:
create or replace function jsonb_array_regex_like(json_array jsonb, pattern text)
returns boolean language sql as $$
select bool_or(elem ~ pattern)
from jsonb_array_elements_text(json_array) arr(elem)
$$;
The function definitely simplifies the code:
with the_data(id, jsonbcolumn) as (
values
(1, '{"single":"someText", "many": ["text1", "text2"]}'::jsonb)
)
select *
from the_data
where jsonb_array_regex_like(jsonbcolumn->'many', '^text.*');

Regex non-capturing parenthesis issue

I have a database query which looks like this
select * from students join (select * from teachers) join (select * from workers
I had a requirement to tokenize this string based on 'select'.
I am trying regex (select)(.*?)((?:select)|$), ut it is matching only 2 times.
Request some pointers on how to achieve this.
I need the 3 output tokens as below
select * from students join (
select * from teachers) join (
select * from workers
I think this regex will work:
select.*?(?=select|$)
The regex matches the word select, then any text (not including new lines) up until right before the next select or the end of the string.
Demonstration here: http://regex101.com/r/sR3gV1
If you are trying to parse the select queries from the string then you can use this regex. Assuming you are not doing select from multiple tables(i.e. not doing select * from x,y,z)
(select.*?from\\s+\\w+)