Separating Text by Delimiter using regexp_substr - regex

I am using plsql to separate parts of my text.
The text:
'a^b^c^d^e'
declare
test varchar2(10);
begin
select 'a^b^c^d^e' into test from dual;
dbms_output.put_line('1. '|| regexp_substr(test, '[^^]+', 1, 1));
dbms_output.put_line('2. '|| regexp_substr(test, '[^^]+', 1, 2));
dbms_output.put_line('3. '|| regexp_substr(test, '[^^]+', 1, 3));
dbms_output.put_line('4. '|| regexp_substr(test, '[^^]+', 1, 4));
dbms_output.put_line('5. '|| regexp_substr(test, '[^^]+', 1, 5));
end;
Output:
1. a
2. b
3. c
4. d
5. e
This works as expected, until a null is found in the middle (i.e. 'a^b^^d^e').
I expect that output to be:
1. a
2. b
3.
4. d
5. e
but the actual output is:
1. a
2. b
3. d
4. e
5.
I'm not very good at regex, but I am most of the way there.
Any help would be appreciated.

See my answer here for more detail. Don't use the format '[^^]+' for parsing strings! It returns unexpected results when there is a NULL element in the list and will get you in big trouble as it will return the wrong element. Instead use this form of REGEXP_SUBSTR() as it handles NULL list elements:
REGEXP_SUBSTR('a^b^^d^e', '(.*?)(\^|$)', 1, 4, NULL, 1)
Run it and you'll see you will get 'd' returned as expected.
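To see why the two patterns disagree, here is a minimal Python sketch of the same idea. It uses Python's re module rather than Oracle's regex engine, and the helper names nth_token_skipping_empties and nth_field are made up for illustration, so treat it as an analogy and not as Oracle's implementation. The empty field comes back as an empty string here, where Oracle would return NULL.

```python
import re


def nth_token_skipping_empties(text, n, delim="^"):
    """Analogue of the '[^^]+' form: runs of non-delimiter characters only."""
    tokens = re.findall("[^" + re.escape(delim) + "]+", text)
    # Empty fields never produce a token, so later fields shift left.
    return tokens[n - 1] if 1 <= n <= len(tokens) else None


def nth_field(text, n, delim="^"):
    """Analogue of the '(.*?)(\\^|$)' form: each match consumes one delimiter."""
    pattern = "(.*?)(" + re.escape(delim) + "|$)"
    matches = list(re.finditer(pattern, text))
    # group(1) is the field itself, without the trailing delimiter.
    return matches[n - 1].group(1) if 1 <= n <= len(matches) else None


sample = "a^b^^d^e"
print(nth_token_skipping_empties(sample, 4))  # field positions have shifted
print(nth_field(sample, 4))                   # position 4 is preserved
```

Because the second pattern consumes exactly one delimiter per match, the occurrence number always lines up with the field position.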

Related

Is there a way to transpose two strings and get a table as the result?

I have the next two strings:
String_Cod = 14521;65412;65845
String_Flags = 1;0;1
for code 14521 the flag is 1
for code 65412 the flag is 0
for code 65845 the flag is 1
in this order always
The result must be something like
I started with this query:
select regexp_substr(to_char(:STRING_COD),'[^;]+', 1, level)
from dual
connect BY regexp_substr(to_char(:STRING_COD), '[^;]+', 1, level)
is not null
select regexp_substr(to_char(:STRING_FLAGS),'[^;]+', 1, level)
from dual
connect BY regexp_substr(to_char(:STRING_FLAGS), '[^;]+', 1, level)
is not null
But I don't know how to join the two and get the result I need.
Can somebody give me some advice?
Regards
You could add the level as another column in each query, and join them together:
select c.cod, f.flag
from (
select level as n, regexp_substr(to_char('14521;65412;65845'),'[^;]+', 1, level) as cod
from dual
connect BY regexp_substr(to_char('14521;65412;65845'), '[^;]+', 1, level)
is not null
) c
join (
select level as n, regexp_substr(to_char('1;0;1'),'[^;]+', 1, level) as flag
from dual
connect BY regexp_substr(to_char('1;0;1'), '[^;]+', 1, level)
is not null
) f
on f.n = c.n
With outer joins, this would allow for different numbers of elements. Or, more simply, since you say they will always match, use the same level for both extracts:
select regexp_substr(to_char('14521;65412;65845'),'[^;]+', 1, level) as cod,
regexp_substr(to_char('1;0;1'),'[^;]+', 1, level) as flag
from dual
connect BY regexp_substr(to_char('14521;65412;65845'), '[^;]+', 1, level)
is not null
COD | FLAG
:---- | :---
14521 | 1
65412 | 0
65845 | 1
db<>fiddle
This method of expanding a list of values also assumes you can never have null elements, in either list. Read more.
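Outside the database, the same positional pairing can be sketched in a few lines of Python. This is only an illustration of the idea, assuming both strings use ';' as the delimiter; pair_codes_with_flags is a hypothetical helper name. Unlike the '[^;]+' pattern, str.split keeps empty elements, so positions stay aligned even when an element is missing, and zip_longest plays the role of the outer join when the lists differ in length.

```python
from itertools import zip_longest


def pair_codes_with_flags(codes, flags, delim=";"):
    """Pair the n-th code with the n-th flag, keeping empty elements in place."""
    code_list = codes.split(delim)
    flag_list = flags.split(delim)
    # zip_longest pads the shorter list with None, like an outer join would.
    return list(zip_longest(code_list, flag_list))


for cod, flag in pair_codes_with_flags("14521;65412;65845", "1;0;1"):
    print(cod, flag)
```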

Django filtering with list and specific data

I'm sorry for the weird title. I don't know how to explain my problem in a short sentence. I'm trying to filter my model with a list but sometimes query returns multiple rows. For example:
all_pos = [1,2,3]
query = MyModel.objects.filter(pos__in=all_pos)
The query above returns a list of rows from the database, but the second item in the list matches two rows, with B and C in the second column.
1, A, word
2, B, word
2, C, word
3, A, word
4, C, word
But I only want the row with B for the second item, without losing the 4th row with C. How can I filter this further to achieve the result below?
1, A, word
2, B, word
3, A, word
4, C, word
Your specifications are still not very clear. You need to specify the logic by which you keep one result over another.
From what I can deduce, you want the results ordered by col 1 ascending, then by col 2 alphabetically, and distinct on col 1.
all_pos = [1, 2, 3, 4]
query = MyModel.objects.filter(pos__in=all_pos).order_by('col1', 'col2').distinct('col1')
Otherwise you may need two passes, something like:
all_pos = [1, 2, 3, 4]
final_results = []
qs = MyModel.objects.filter(pos__in=all_pos)
for row in qs:
    if custom_logic_with(row.col2):
        final_results.append(row)
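The two-pass idea can be sketched without a database. The snippet below is a hedged illustration in plain Python: rows are plain tuples rather than model instances, the rule "keep the alphabetically first col2 per pos" is an assumption about what the question wants, and first_row_per_pos is a hypothetical helper name.

```python
def first_row_per_pos(rows):
    """Keep one row per pos: the one whose second column sorts first."""
    best = {}
    for row in rows:
        pos, col2 = row[0], row[1]
        # Replace the stored row only if this one sorts earlier on col2.
        if pos not in best or col2 < best[pos][1]:
            best[pos] = row
    return [best[pos] for pos in sorted(best)]


rows = [
    (1, "A", "word"),
    (2, "B", "word"),
    (2, "C", "word"),
    (3, "A", "word"),
    (4, "C", "word"),
]
for row in first_row_per_pos(rows):
    print(row)
```

If the rule for choosing between duplicates is something other than alphabetical order, only the comparison inside the loop needs to change.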

extract all numbers in a string

How can I extract all numbers in a string?
Sample inputs:
7nr-6p
12c-18L
12nr-24L
11nr-12p
Expected Outputs:
{7,6}
{12,18}
{12,24}
etc...
The following is tested with the first one, 7nr-6p:
select regexp_split_to_array('7nr-6p', '[^0-9]') AS new_volume from mytable;
Gives: {7,"","",6,""} // Why is a numeric-only match returning spaces?
select regexp_matches('7nr-6p', '[0-9]*'::text) from mytable;
Gives: {7} // Why isn't this continuing?
select regexp_matches('7nr-6p', '\d'::text) from mytable;
Gives: {7}
select NULLIF(regexp_replace('7nr-6p', '\D',',','g'), '')::text from mytable;
Gives: 7,,,6,
The following query:
select regexp_split_to_array(regexp_replace('7nr-6p', '^[^0-9]*|[^0-9]*$', '', 'g'), '[^0-9]+')
AS new_volume from mytable;
"Trims" the prefix and suffix non-numbers and splits by the remaining non-numbers.
select regexp_matches('7nr-6p', '[0-9]*'::text) from mytable;
Gives: {7} // Why isn't this continuing?
Because without the 'g' flag, the regex stops at the first match.
Add the 'g' flag:
select regexp_matches('7nr-6p', '[0-9]+'::text, 'g') from mytable;
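The same distinction between "first match" and "all matches" exists in most regex libraries. As a rough analogy in Python (not PostgreSQL; the variable names are illustrative only), re.search stops at the first match while re.findall keeps going, and splitting on a single non-digit yields an empty string between each pair of adjacent delimiters, which is where the "" entries in the question come from:

```python
import re

text = "7nr-6p"

# Like regexp_matches without the 'g' flag: only the first match is reported.
first_only = re.search(r"[0-9]+", text).group()

# Like regexp_matches with the 'g' flag: every non-overlapping match.
all_numbers = re.findall(r"[0-9]+", text)

# Splitting on ONE non-digit at a time leaves empty pieces between
# consecutive delimiters ('n' then 'r' then '-') and after the final 'p'.
split_single = re.split(r"[^0-9]", text)

print(first_only)
print(all_numbers)
print(split_single)
```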
You can replace all text and then split:
SELECT regexp_split_to_array(
regexp_replace('7nr-6p', '[a-zA-Z]', '','g'),
'[^0-9]'
)
This returns {7,6}
SELECT id, (regexp_matches(string, '\d+', 'g'))[1]::int AS nr
FROM (
VALUES
(1, '7nr-6p')
, (2, '12c-18L')
, (3, '12nr-24L')
, (4, '11nr-12p')
) tbl(id, string);
Result:
id | nr
----+----
1 | 7
1 | 6
2 | 12
2 | 18
3 | 12
3 | 24
4 | 11
4 | 12
I wanted them in a single cell so I could extract them as needed
SELECT id, trim(regexp_replace(string, '\D+', ',', 'g'), ',') AS nrs
FROM (
VALUES
(1, '7nr-6p')
, (2, '12c-18L')
, (3, '12nr-24L')
, (4, '11nr-12p')
) tbl(id, string);
Result:
id | nrs
----+-------
1 | 7,6
2 | 12,18
3 | 12,24
4 | 11,12
dbfiddle here
Here is a more robust solution
CREATE OR REPLACE FUNCTION get_ints_from_text(TEXT) RETURNS int[] AS $$
select array_remove(regexp_split_to_array($1,'[^0-9]+','i'),'')::int[];
$$ LANGUAGE SQL IMMUTABLE;
Example
select get_ints_from_text('7nr-6p'); -- 7,6
-- also resilient in situations like
select get_ints_from_text('-7nr--6p'); -- 7,6
Here is a link to try
http://sqlfiddle.com/#!17/c6ac7/2
I feel that wrapping this functionality into an immutable function is prudent. This is a pure function: it does not mutate data, and it returns the same result given the same input. Functions marked as IMMUTABLE can also have performance benefits.
By using a function we also benefit from abstraction. There is one source to update should this functionality need to improve in the future.
For more information about immutable functions see
https://www.postgresql.org/docs/10/static/sql-createfunction.html
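For comparison, the same "split on runs of non-digits and drop the empty pieces" logic can be sketched in Python. This is only an analogue of the SQL function above, written under the assumption that signs are ignored in the same way; it reuses the name get_ints_from_text purely for readability.

```python
import re


def get_ints_from_text(text):
    """Return every run of digits in text as an int, ignoring everything else."""
    # Splitting on runs of non-digits can leave empty strings at either end
    # (for example when the text starts or ends with a non-digit); drop them.
    return [int(piece) for piece in re.split(r"[^0-9]+", text) if piece]


print(get_ints_from_text("7nr-6p"))
print(get_ints_from_text("-7nr--6p"))
```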

PL/SQL split one to many rows

I have a table like this.
|PARAMKEY | PARAMVALUE
----------+------------
KEY |[["PAR_A",2,"SCH_A"],["PAR_B",4,"SCH_B"],["PAR_C",3,"SCH_C"]]
I need to split the values into three columns and I use REGEXP_SUBSTR. Here is my code.
SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+', 1,1 ) PARAMETER
,REGEXP_SUBSTR(paramvalue, '[^],[",]+', 1, 2) VERSION
,REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 3) SCHEMA
FROM tmp_param_table
where paramkey = 'KEY'
UNION ALL
SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 4 ) PARAMETER
,REGEXP_SUBSTR(paramvalue, '[^],[",]+', 1, 5) VERSION
,REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 6) SCHEMA
FROM tmp_param_table
where paramkey = 'KEY'
UNION ALL
SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 7 ) PARAMETER
,REGEXP_SUBSTR(paramvalue, '[^],[",]+', 1, 8) VERSION
,REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 9) SCHEMA
FROM tmp_param_table
where paramkey = 'KEY';
and this is the result that i need.
PARAMETER | VERSION | SCHEMA
---------+---------+-------
PAR_A |2 |SCH_A
PAR_B |4 |SCH_B
PAR_C |3 |SCH_C
But the value is too long, and I hope there is another way to make it simpler by using a loop or anything else.
Thanks
Try something like this:
with tmp_param_table as
(
select 'KEY' as PARAMKEY , '[["PAR_A",2,"SCH_A"],["PAR_B",4,"SCH_B"],["PAR_C",3,"SCH_C"],["PAR_D",4,"SCH_D"]]' as PARAMVALUE from dual
),
levels as (select level as lv from dual connect by level <= 156),
steps as (select lv-2 as step from levels where MOD(lv,3)=0)
select step, (SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+',1, step ) PARAMETER FROM tmp_param_table where paramkey = 'KEY') parameter,
(SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+',1, step+1 ) PARAMETER FROM tmp_param_table where paramkey = 'KEY') version,
(SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+',1, step+2 ) PARAMETER FROM tmp_param_table where paramkey = 'KEY') schema
from steps
Here
levels - returns the numbers from 1 to 156 (52*3) (or whatever you need)
steps - are the numbers 1, 4, 7 etc with step 3
Results:
1 PAR_A 2 SCH_A
4 PAR_B 4 SCH_B
7 PAR_C 3 SCH_C
10 PAR_D 4 SCH_D
13
etc..
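The levels/steps arithmetic amounts to walking the token list three at a time. A minimal Python sketch of that grouping follows, assuming every record has exactly three fields and no field is empty; the function name split_param_rows and the width parameter are hypothetical.

```python
import re


def split_param_rows(paramvalue, width=3):
    """Tokenise on ] , [ and " and regroup the tokens into rows of `width`."""
    # Same character class as in the SQL: anything except ] , [ and ".
    tokens = re.findall(r'[^\],\["]+', paramvalue)
    # Start indexes 0, 3, 6, ... play the role of steps 1, 4, 7, ...
    return [
        tuple(tokens[i:i + width])
        for i in range(0, len(tokens) - width + 1, width)
    ]


value = '[["PAR_A",2,"SCH_A"],["PAR_B",4,"SCH_B"],["PAR_C",3,"SCH_C"]]'
for parameter, version, schema in split_param_rows(value):
    print(parameter, version, schema)
```

Stopping the range at len(tokens) - width + 1 drops an incomplete trailing group, so no half-empty rows are produced.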
I tried using a regular expression to first split the paramvalue column
into comma-separated groups, and then split each group into columns:
SELECT
REGEXP_SUBSTR(COL, '[^],["]+', 1, 1) PARAMETER,
REGEXP_SUBSTR(COL, '[^],[",]+', 1, 2) VERSION,
REGEXP_SUBSTR(COL, '[^],["]+', 1, 3) SCHEMA
FROM
(
SELECT paramkey,REGEXP_SUBSTR(to_char(paramvalue),'[^][^]+',1,level ) COL
from tmp_param_table
connect by regexp_substr(to_char(paramvalue),'[^][^]+',1, level) is not null
)
WHERE COL <>','
I hope this may help.

Oracle REGEXP_SUBSTR Not Honoring null values

I have an issue with REGEXP_SUBSTR not honoring null values.
select
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 1) AS phn_nbr,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 2) AS phn_pos,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 3) AS phn_typ,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 4) AS phn_strt_dt,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 5) AS phn_end_dt,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 6) AS pub_indctr
from dual;
If the phn_end_dt is null and pub_indctr is not null, the values of pub_indctr are shifted to phn_end_dt.
Result:-
PHN_NBR    PHN_POS PHN_TYP PHN_STRT_DT PHN_END_DT PUB_INDCTR
---------- ------- ------- ----------- ---------- ------------
2035197553 2       S       14-JUN-14   P
While it should be
PHN_NBR    PHN_POS PHN_TYP PHN_STRT_DT PHN_END_DT PUB_INDCTR
---------- ------- ------- ----------- ---------- ------------
2035197553 2       S       14-JUN-14              P
Any suggestions ?
I'm afraid your accepted answer does not handle the case where you need the value after the null position (try to get the 6th field):
SQL> select REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]*', 1, 6) phn_end_dt
2 from dual;
P
-
You need to do this instead I believe (works on 11g):
SQL> select REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '([^,]*)(,|$)', 1, 6, NULL, 1) phn_end_dt
2 from dual;
P
-
P
I just discovered this after posting my own question: REGEX to select nth value from a list, allowing for nulls
You can solve your task like this:
with t(val) as (
select '2035197553,2,S,14-JUN-14,,P' from dual
), t1 (val) as (
select ',' || val || ',' from t
)
select substr(val, REGEXP_INSTR(val, ',', 1, 1) + 1, REGEXP_INSTR(val, ',', 1, 1 + 1) - REGEXP_INSTR(val, ',', 1, 1) - 1) a
, substr(val, REGEXP_INSTR(val, ',', 1, 2) + 1, REGEXP_INSTR(val, ',', 1, 2 + 1) - REGEXP_INSTR(val, ',', 1, 2) - 1) b
, substr(val, REGEXP_INSTR(val, ',', 1, 3) + 1, REGEXP_INSTR(val, ',', 1, 3 + 1) - REGEXP_INSTR(val, ',', 1, 3) - 1) c
, substr(val, REGEXP_INSTR(val, ',', 1, 4) + 1, REGEXP_INSTR(val, ',', 1, 4 + 1) - REGEXP_INSTR(val, ',', 1, 4) - 1) d
, substr(val, REGEXP_INSTR(val, ',', 1, 5) + 1, REGEXP_INSTR(val, ',', 1, 5 + 1) - REGEXP_INSTR(val, ',', 1, 5) - 1) e
, substr(val, REGEXP_INSTR(val, ',', 1, 6) + 1, REGEXP_INSTR(val, ',', 1, 6 + 1) - REGEXP_INSTR(val, ',', 1, 6) - 1) f
from t1
A B C D E F
-------------------------------------
2035197553 2 S 14-JUN-14 - P
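The position arithmetic in that query can be hard to read, so here is a hedged Python sketch of the same technique: wrap the string in delimiters, find the n-th and (n+1)-th delimiter, and take what lies between them. The helper names nth_delim_pos and field_by_position are made up for this example, and a single-character delimiter is assumed.

```python
def nth_delim_pos(text, delim, n):
    """Return the index of the n-th occurrence of delim, or -1 if absent."""
    pos = -1
    for _ in range(n):
        pos = text.find(delim, pos + 1)
        if pos == -1:
            return -1
    return pos


def field_by_position(csv_str, n, delim=","):
    """Return field n (1-based), using delimiter positions instead of regex."""
    wrapped = delim + csv_str + delim  # every field now sits between two delimiters
    start = nth_delim_pos(wrapped, delim, n)
    end = nth_delim_pos(wrapped, delim, n + 1)
    if start == -1 or end == -1:
        return None  # fewer than n fields
    return wrapped[start + 1:end]


line = "2035197553,2,S,14-JUN-14,,P"
print([field_by_position(line, i) for i in range(1, 7)])
```

An empty field simply yields an empty slice, so nothing shifts.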
The typical csv parsing approach is as follows:
WITH t(csv_str) AS
( SELECT '2035197553,2,S,14-JUN-14,,P' FROM dual
UNION ALL
SELECT '2035197553,2,S,14-JUN-14,,' FROM dual
)
SELECT LTRIM(REGEXP_SUBSTR (','
|| csv_str, ',[^,]*', 1, 1), ',') AS phn_nbr,
LTRIM(REGEXP_SUBSTR (','
|| csv_str, ',[^,]*', 1, 2), ',') AS phn_pos,
LTRIM(REGEXP_SUBSTR (','
|| csv_str, ',[^,]*', 1, 3), ',') AS phn_typ,
LTRIM(REGEXP_SUBSTR (','
|| csv_str, ',[^,]*', 1, 4), ',') AS phn_strt_dt,
LTRIM(REGEXP_SUBSTR (','
|| csv_str, ',[^,]*', 1, 5), ',') AS phn_end_dt,
LTRIM(REGEXP_SUBSTR (','
|| csv_str, ',[^,]*', 1, 6), ',') AS pub_indctr
FROM t
I like to place a comma preceding my csv and then I would count the commas with the non-comma pattern.
Explanation of the search pattern
The search pattern looks for the nth substring (nth corresponds with the nth element in the csv) which has the following:
-The pattern begins with a ','
-Next, it is followed by the pattern, '[^,]'. This is just a non-matching list expression. The caret, ^, conveys that the characters following in the list should not be matched.
-This non-matching list of characters has the quantifier, *, which means this can occur 0 or more times.
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Once a match is found, I would also use the LTRIM function to remove the comma after I used the regular expression.
What is nice about this approach is that the occurrence of the search pattern will always correspond with the occurrences of the comma.
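The leading-comma trick can be sketched in Python as well. This is an illustration with Python's re module, not Oracle's engine; split_with_leading_delim is a hypothetical name and a single-character delimiter is assumed.

```python
import re


def split_with_leading_delim(csv_str, delim=","):
    """Prefix the string with a delimiter, then match 'delimiter + non-delimiters'."""
    d = re.escape(delim)
    # Each match starts with exactly one delimiter, so the n-th match
    # always corresponds to the n-th field, empty or not.
    matches = re.findall(d + "[^" + d + "]*", delim + csv_str)
    # Drop the leading delimiter from each match (the LTRIM step).
    return [m[len(delim):] for m in matches]


print(split_with_leading_delim("2035197553,2,S,14-JUN-14,,P"))
print(split_with_leading_delim("2035197553,2,S,14-JUN-14,,"))
```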
You need to change this line,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 5) AS phn_end_dt,
to,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]*', 1, 5) AS phn_end_dt,
^
[^,]+ matches any character other than a comma, one or more times. [^,]* matches any character other than a comma, zero or more times. So [^,]+ assumes that at least one non-comma character is present. In the fifth field there isn't one, so changing + to * lets the regex engine match an empty string there.
Thanks for pointing me in the right direction, I have used this to solve the issue.
SELECT REGEXP_SUBSTR (val, '([^,]*),|$', 1, 1, NULL, 1) phn_nbr ,
REGEXP_SUBSTR (val, '([^,]*),|$', 1, 2, NULL, 1) phn_pos ,
REGEXP_SUBSTR (val, '([^,]*),|$', 1, 3, NULL, 1) phn_typ ,
REGEXP_SUBSTR (val, '([^,]*),|$', 1, 4, NULL, 1) phn_strt_dt ,
REGEXP_SUBSTR (val, '([^,]*),|$', 1, 5, NULL, 1) phn_end_dt ,
REGEXP_SUBSTR (val
|| ',', '([^,]*),|$', 1, 6, NULL, 1) pub_indctr
FROM
(SELECT '2035197553,2,S,14-JUN-14,,P' val FROM dual
);
Oracle Version:- Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
I have a generic use case where I don't know the exact columns coming in the string, so I used the code below, which served the purpose.
function substring_specific_occurence(p_string varchar2
,p_delimiter varchar2
,p_occurence number) return varchar2
is
l_output varchar2(2000);
g_miss_char varchar2(20) := 'fdkjkjhkuhhf7';
l_string varchar2(10000) := replace(p_string,p_delimiter||p_delimiter,''||p_delimiter||g_miss_char||p_delimiter||'' );
begin
while (l_string like '%'||p_delimiter||p_delimiter||'%' )
loop
l_string := replace(l_string,p_delimiter||p_delimiter,''||p_delimiter||g_miss_char||p_delimiter||'');
end loop;
select regexp_substr(l_string,'[^'||p_delimiter||']+',1,p_occurence)
into l_output
from dual;
return replace(l_output,g_miss_char);
end substring_specific_occurence;
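The placeholder technique in that function can be sketched in Python as follows. This is an analogue and not a translation of the PL/SQL: the sentinel value MISSING is an assumption (any string that cannot occur in the data would do), and a single-character delimiter is assumed.

```python
import re

MISSING = "\x00"  # assumed sentinel; must never appear in real data


def substring_specific_occurrence(text, delim, n):
    """Return field n (1-based) by filling empty fields with a sentinel first."""
    doubled = delim + delim
    # Repeat until no two delimiters are adjacent; one pass is not enough
    # for three or more delimiters in a row.
    while doubled in text:
        text = text.replace(doubled, delim + MISSING + delim)
    tokens = re.findall("[^" + re.escape(delim) + "]+", text)
    if n < 1 or n > len(tokens):
        return None
    # Turn the sentinel back into an empty value.
    return tokens[n - 1].replace(MISSING, "")


line = "2035197553,2,S,14-JUN-14,,P"
print(repr(substring_specific_occurrence(line, ",", 5)))
print(repr(substring_specific_occurrence(line, ",", 6)))
```

Like the function above, this only repairs empty fields in the middle of the string: an empty first or last field has no doubled delimiter, so it is still skipped.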