How can I extract all numbers in a string?
Sample inputs:
7nr-6p
12c-18L
12nr-24L
11nr-12p
Expected Outputs:
{7,6}
{12,18}
{12,24}
etc...
The following is tested with the first one, 7nr-6p:
select regexp_split_to_array('7nr-6p', '[^0-9]') AS new_volume from mytable;
Gives: {7,"","",6,""} // Why is a numeric-only match returning spaces?
select regexp_matches('7nr-6p', '[0-9]*'::text) from mytable;
Gives: {7} // Why isn't this continuing?
select regexp_matches('7nr-6p', '\d'::text) from mytable;
Gives: {7}
select NULLIF(regexp_replace('7nr-6p', '\D',',','g'), '')::text from mytable;
Gives: 7,,,6,
The following query:
select regexp_split_to_array(regexp_replace('7nr-6p', '^[^0-9]*|[^0-9]*$', 'g'), '[^0-9]+')
AS new_volume from mytable;
"Trims" the prefix and suffix non-numbers and splits by the remaining non-numbers.
select regexp_matches('7nr-6p', '[0-9]*'::text) from mytable;
Gives: {7} // Why isn't this continuing?
Because without the 'g' flag, the regex stops at the first match.
Add the 'g' flag:
select regexp_matches('7nr-6p', '[0-9]*'::text, 'g') from mytable;
You can replace all text and then split:
SELECT regexp_split_to_array(
regexp_replace('7nr-6p', '[a-zA-Z]', '','g'),
'[^0-9]'
)
This returns {7,6}
SELECT id, (regexp_matches(string, '\d+', 'g'))[1]::int AS nr
FROM (
VALUES
(1, '7nr-6p')
, (2, '12c-18L')
, (3, '12nr-24L')
, (4, '11nr-12p')
) tbl(id, string);
Result:
id | nr
----+----
1 | 7
1 | 6
2 | 12
2 | 18
3 | 12
3 | 24
4 | 11
4 | 12
I wanted them in a single cell so I could extract them as needed
SELECT id, trim(regexp_replace(string, '\D+', ',', 'g'), ',') AS nrs
FROM (
VALUES
(1, '7nr-6p')
, (2, '12c-18L')
, (3, '12nr-24L')
, (4, '11nr-12p')
) tbl(id, string);
Result:
id | nrs
----+-------
1 | 7,6
2 | 12,18
3 | 12,24
4 | 11,12
dbfiddle here
Here is a more robust solution
CREATE OR REPLACE FUNCTION get_ints_from_text(TEXT) RETURNS int[] AS $$
select array_remove(regexp_split_to_array($1,'[^0-9]+','i'),'')::int[];
$$ LANGUAGE SQL IMMUTABLE;
Example
select get_ints_from_text('7nr-6p'); -- 7,6
-- also resilient in situations like
select get_ints_from_text('-7nr--6p'); -- 7,6
Here is a link to try
http://sqlfiddle.com/#!17/c6ac7/2
I feel that wrapping this functionality into an immutable function is prudent. This is a pure function, one that will not mutate data and one that returns the same result given the same input. Immutable functions marked as "immutable" have performance benefits.
By using a function we also benefit from abstraction. There is one source to update should this functionality need to improve in the future.
For more information about immutable functions see
https://www.postgresql.org/docs/10/static/sql-createfunction.html
I have a table like this.
|PARAMKEY | PARAMVALUE
----------+------------
KEY |[["PAR_A",2,"SCH_A"],["PAR_B",4,"SCH_B"],["PAR_C",3,"SCH_C"]]
I need to split the values into three columns and I use REGEXP_SUBSTR. Here is my code.
SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+', 1,1 ) PARAMETER
,REGEXP_SUBSTR(paramvalue, '[^],[",]+', 1, 2) VERSION
,REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 3) SCHEMA
FROM tmp_param_table
where paramkey = 'KEY'
UNION ALL
SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 4 ) PARAMETER
,REGEXP_SUBSTR(paramvalue, '[^],[",]+', 1, 5) VERSION
,REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 6) SCHEMA
FROM tmp_param_table
where paramkey = 'KEY'
UNION ALL
SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 7 ) PARAMETER
,REGEXP_SUBSTR(paramvalue, '[^],[",]+', 1, 8) VERSION
,REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 9) SCHEMA
FROM tmp_param_table
where paramkey = 'KEY';
and this is the result that i need.
PARAMETER | VERSION | SCHEMA
---------+---------+-------
PAR_A |2 |SCH_A
PAR_B |4 |SCH_B
PAR_C |3 |SCH_C
But the value is too long and I hope there is another way to make it simplier by using loop or anything.
Thanks
Try something like this:
with tmp_param_table as
(
select 'KEY' as PARAMKEY , '[["PAR_A",2,"SCH_A"],["PAR_B",4,"SCH_B"],["PAR_C",3,"SCH_C"]],["PAR_D",4,"SCH_D"]]' as PARAMVALUE from dual
),
levels as (select level as lv from dual connect by level <= 156),
steps as (select lv-2 as step from levels where MOD(lv,3)=0)
select step, (SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+',1, step ) PARAMETER FROM tmp_param_table where paramkey = 'KEY') parameter,
(SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+',1, step+1 ) PARAMETER FROM tmp_param_table where paramkey = 'KEY') version,
(SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+',1, step+2 ) PARAMETER FROM tmp_param_table where paramkey = 'KEY') schema
from steps
Here
levels - returns numbers form 1 till 156 (52*3) (or whatever you need)
steps - are the numbers 1, 4, 7 etc with step 3
Results:
1 PAR_A 2 SCH_A
4 PAR_B 4 SCH_B
7 PAR_C 3 SCH_C
10 PAR_D 4 SCH_D
13
etc..
I have tried using regular expression
and part paramvalue column value into common separated value
SELECT
REGEXP_SUBSTR(COL, '[^],["]+', 1, 1) PARAMETER,
REGEXP_SUBSTR(COL, '[^],[",]+', 1, 2) VERSION,
REGEXP_SUBSTR(COL, '[^],["]+', 1, 3) SCHEMA
FROM
(
SELECT paramkey,REGEXP_SUBSTR(to_char(paramvalue),'[^][^]+',1,level ) COL
from tmp_param_table
connect by regexp_substr(to_char(paramvalue),'[^][^]+',1, level) is not null
)
WHERE COL <>','
I hope this may help.
I have an issue of regex_substr not honoring the null value.
select
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 1) AS phn_nbr,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 2) AS phn_pos,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 3) AS phn_typ,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 4) AS phn_strt_dt,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 5) AS phn_end_dt,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 6) AS pub_indctr
from dual;
If the phn_end_dt is null and pub_indctr is not null, the values of pub_indctr are shifted to phn_end_dt.
Result:-
PHN_NBR PHN_POS PHN_TYP PHN_STRT_DT PHN_END_DT PUB_INDCTR
---------- ------- ------- ----------- ---------- ------------
2035197553 2 S 14-JUN-14 P
While it should be
PHN_NBR PHN_POS PHN_TYP PHN_STRT_DT PHN_END_DT PUB_INDCTR
---------- ------- ------- ----------- ---------- ------------
2035197553 2 S 14-JUN-14 P
Any suggestions ?
I'm afraid your accepted answer does not handle the case where you need the value after the null position (try to get the 6th field):
SQL> select REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]*', 1, 6) phn_end
_dt
2 from dual;
P
-
You need to do this instead I believe (works on 11g):
SQL> select REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '([^,]*)(,|$)', 1, 6,
NULL, 1) phn_end_dt
2 from dual;
P
-
P
I just discovered this after posting my own question: REGEX to select nth value from a list, allowing for nulls
You can solve your task like this:
with t(val) as (
select '2035197553,2,S,14-JUN-14,,P' from dual
), t1 (val) as (
select ',' || val || ',' from t
)
select substr(val, REGEXP_INSTR(val, ',', 1, 1) + 1, REGEXP_INSTR(val, ',', 1, 1 + 1) - REGEXP_INSTR(val, ',', 1, 1) - 1) a
, substr(val, REGEXP_INSTR(val, ',', 1, 2) + 1, REGEXP_INSTR(val, ',', 1, 2 + 1) - REGEXP_INSTR(val, ',', 1, 2) - 1) b
, substr(val, REGEXP_INSTR(val, ',', 1, 3) + 1, REGEXP_INSTR(val, ',', 1, 3 + 1) - REGEXP_INSTR(val, ',', 1, 3) - 1) c
, substr(val, REGEXP_INSTR(val, ',', 1, 4) + 1, REGEXP_INSTR(val, ',', 1, 4 + 1) - REGEXP_INSTR(val, ',', 1, 4) - 1) d
, substr(val, REGEXP_INSTR(val, ',', 1, 5) + 1, REGEXP_INSTR(val, ',', 1, 5 + 1) - REGEXP_INSTR(val, ',', 1, 5) - 1) e
, substr(val, REGEXP_INSTR(val, ',', 1, 6) + 1, REGEXP_INSTR(val, ',', 1, 6 + 1) - REGEXP_INSTR(val, ',', 1, 6) - 1) f
from t1
A B C D E F
-------------------------------------
2035197553 2 S 14-JUN-14 - P
The typical csv parsing approach is as follows:
WITH t(csv_str) AS
( SELECT '2035197553,2,S,14-JUN-14,,P' FROM dual
UNION ALL
SELECT '2035197553,2,S,14-JUN-14,,' FROM dual
)
SELECT LTRIM(REGEXP_SUBSTR (','
|| csv_str, ',[^,]*', 1, 1), ',') AS phn_nbr,
LTRIM(REGEXP_SUBSTR (','
|| csv_str, ',[^,]*', 1, 2), ',') AS phn_pos,
LTRIM(REGEXP_SUBSTR (','
|| csv_str, ',[^,]*', 1, 3), ',') AS phn_typ,
LTRIM(REGEXP_SUBSTR (','
|| csv_str, ',[^,]*', 1, 4), ',') AS phn_strt_dt,
LTRIM(REGEXP_SUBSTR (','
|| csv_str, ',[^,]*', 1, 5), ',') AS phn_end_dt,
LTRIM(REGEXP_SUBSTR (','
|| csv_str, ',[^,]*', 1, 6), ',') AS pub_indctr
FROM t
I like to place a comma preceeding my csv and then I would count the commas with the non-comma pattern.
Explanation of the search pattern
The search pattern looks for the nth substring (nth corresponds with the nth element in the csv) which has the following:
-The pattern begins with a ','
-Next, it is followed by the pattern, '[^,]'. This is just a non-matching list expression. The caret, ^, conveys that the characters following in the list should not be matched.
-This non-matching list of characters has the quantifier, *, which means this can occur 0 or more times.
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Once a match is found, I would also use the LTRIM function to remove the comma after I used the reg expression.
What is nice about this approach is the occurrence of the search pattern will always correspond with the occurences of the comma.
You need to change this line,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 5) AS phn_end_dt,
to,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]*', 1, 5) AS phn_end_dt,
^
[^,]+ means it matches any character not of , one or more times. [^,]* means it matches any character not of , zero or more times. So [^,]+ assumes that there must be a single character not of , would present. But really there isn't , by changing + to * makes the regex engine to match a empty character.
Thanks for pointing me in the right direction, I have used this to solve the issue.
SELECT REGEXP_SUBSTR (val, '([^,]*),|$', 1, 1, NULL, 1) phn_nbr ,
REGEXP_SUBSTR (val, '([^,]*),|$', 1, 2, NULL, 1) phn_pos ,
REGEXP_SUBSTR (val, '([^,]*),|$', 1, 3, NULL, 1) phn_typ ,
REGEXP_SUBSTR (val, '([^,]*),|$', 1, 4, NULL, 1) phn_strt_dt ,
REGEXP_SUBSTR (val, '([^,]*),|$', 1, 5, NULL, 1) phn_end_dt ,
REGEXP_SUBSTR (val
|| ',', '([^,]*),|$', 1, 6, NULL, 1) pub_indctr
FROM
(SELECT '2035197553,2,S,14-JUN-14,,P' val FROM dual
);
Oracle Version:- Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
I have a generic use case where I don't know the exact columns coming in the string. I thus used below code which solved the purpose.
function substring_specific_occurence(p_string varchar2
,p_delimiter varchar2
,p_occurence number) return varchar2
is
l_output varchar2(2000);
g_miss_char varchar2(20) := 'fdkjkjhkuhhf7';
l_string varchar2(10000) := replace(p_string,p_delimiter||p_delimiter,''||p_delimiter||g_miss_char||p_delimiter||'' );
begin
while (l_string like '%'||p_delimiter||p_delimiter||'%' )
loop
l_string := replace(l_string,p_delimiter||p_delimiter,''||p_delimiter||g_miss_char||p_delimiter||'');
end loop;
select regexp_substr(l_string,'[^'||p_delimiter||']+',1,p_occurence)
into l_output
from dual;
return replace(l_output,g_miss_char);
end substring_specific_occurence;