PL/SQL split one to many rows - regex

I have a table like this.
PARAMKEY | PARAMVALUE
---------+--------------------------------------------------------------
KEY      | [["PAR_A",2,"SCH_A"],["PAR_B",4,"SCH_B"],["PAR_C",3,"SCH_C"]]
I need to split the values into three columns and I use REGEXP_SUBSTR. Here is my code.
SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+', 1,1 ) PARAMETER
,REGEXP_SUBSTR(paramvalue, '[^],[",]+', 1, 2) VERSION
,REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 3) SCHEMA
FROM tmp_param_table
where paramkey = 'KEY'
UNION ALL
SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 4 ) PARAMETER
,REGEXP_SUBSTR(paramvalue, '[^],[",]+', 1, 5) VERSION
,REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 6) SCHEMA
FROM tmp_param_table
where paramkey = 'KEY'
UNION ALL
SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 7 ) PARAMETER
,REGEXP_SUBSTR(paramvalue, '[^],[",]+', 1, 8) VERSION
,REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, 9) SCHEMA
FROM tmp_param_table
where paramkey = 'KEY';
And this is the result that I need.
PARAMETER | VERSION | SCHEMA
---------+---------+-------
PAR_A |2 |SCH_A
PAR_B |4 |SCH_B
PAR_C |3 |SCH_C
But the value is too long, and I hope there is another way to make this simpler, by using a loop or something similar.
Thanks

Try something like this:
with tmp_param_table as
(
select 'KEY' as PARAMKEY , '[["PAR_A",2,"SCH_A"],["PAR_B",4,"SCH_B"],["PAR_C",3,"SCH_C"],["PAR_D",4,"SCH_D"]]' as PARAMVALUE from dual
),
levels as (select level as lv from dual connect by level <= 156),
steps as (select lv-2 as step from levels where MOD(lv,3)=0)
select step, (SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+',1, step ) PARAMETER FROM tmp_param_table where paramkey = 'KEY') parameter,
(SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+',1, step+1 ) PARAMETER FROM tmp_param_table where paramkey = 'KEY') version,
(SELECT REGEXP_SUBSTR(paramvalue, '[^],["]+',1, step+2 ) PARAMETER FROM tmp_param_table where paramkey = 'KEY') schema
from steps
Here
levels - returns the numbers from 1 to 156 (52*3), or however many you need
steps - the numbers 1, 4, 7, etc., i.e. the position of the first token of each group of three
Results:
1 PAR_A 2 SCH_A
4 PAR_B 4 SCH_B
7 PAR_C 3 SCH_C
10 PAR_D 4 SCH_D
13
etc..
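The step arithmetic above (tokens 1 to 3 form the first row, tokens 4 to 6 the second, and so on) can be checked outside the database. This is a minimal Python sketch, not Oracle code; it assumes Python's re module tokenizes this input the same way as the pattern in the query, and the names TOKEN and split_rows are made up for illustration.

```python
import re

# Same idea as REGEXP_SUBSTR(paramvalue, '[^],["]+', 1, n): a token is a run
# of characters that are not ']', ',', '[' or '"'.
TOKEN = re.compile(r'[^\],\["]+')

def split_rows(paramvalue, width=3):
    """Group the flat token list into rows of `width` columns."""
    tokens = TOKEN.findall(paramvalue)
    # Row k starts at token index k * width (the 1, 4, 7, ... steps, zero-based).
    return [tuple(tokens[i:i + width]) for i in range(0, len(tokens), width)]

value = '[["PAR_A",2,"SCH_A"],["PAR_B",4,"SCH_B"],["PAR_C",3,"SCH_C"]]'
rows = split_rows(value)
for row in rows:
    print(row)
```

Unlike the fixed 156 levels in the query, the sketch stops when the tokens run out, so no empty rows are produced.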

I tried using a regular expression to first split the paramvalue column value into comma-separated groups,
and then split each group into its columns:
SELECT
REGEXP_SUBSTR(COL, '[^],["]+', 1, 1) PARAMETER,
REGEXP_SUBSTR(COL, '[^],[",]+', 1, 2) VERSION,
REGEXP_SUBSTR(COL, '[^],["]+', 1, 3) SCHEMA
FROM
(
SELECT paramkey,REGEXP_SUBSTR(to_char(paramvalue),'[^][^]+',1,level ) COL
from tmp_param_table
connect by regexp_substr(to_char(paramvalue),'[^][^]+',1, level) is not null
)
WHERE COL <>','
I hope this may help.
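The two-stage idea in this answer (first cut the value into bracketed groups, then cut each group into fields) can be sketched in Python to see what each stage produces. This only illustrates the logic and assumes Python's re behaves like the Oracle patterns for this input; the variable names are hypothetical.

```python
import re

value = '[["PAR_A",2,"SCH_A"],["PAR_B",4,"SCH_B"],["PAR_C",3,"SCH_C"]]'

# Stage 1: runs of characters that are not '[', ']' or '^' (mirrors '[^][^]+').
# The lone ',' between groups is dropped, like WHERE COL <> ',' in the query.
groups = [g for g in re.findall(r'[^\]\[^]+', value) if g != ',']

# Stage 2: split each group into its three fields (mirrors '[^],["]+').
records = [tuple(re.findall(r'[^\],\["]+', g)) for g in groups]

for record in records:
    print(record)
```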


Reg Exp don't select if more than one group matches (multiple XOR)

The data are held in an Oracle 12c database, one row per ICD-10-CM code, with a patient ID (foreign key) like so (note that there could be many other codes, the following are just the ones pertinent to this question):
ID ICD10CODE
1 S72.91XB
1 S72.92XB
2 S72.211A
3 S72.414A
3 S72.415A
4 S32.509A
5 S32.301A
5 S32.821A
6 S32.421A
6 S32.422A
7 S32.421A
8 S32.421A
8 S32.509A
The task at hand is to select distinct patients that match only one of the following points (using standard regular expression syntax):
Any number of: S32\.1\w\w\w, S32\.2\w\w\w, S32\.3\w\w\w, S32\.5\w\w\w, S32\.6\w\w\w, S32\.7\w\w\w, S32\.8\w\w\w
Any number of: S32\.4\w1\w, S32\.4\w3\w, S32\.4\w4\w, S32\.4\w6\w, S32\.4\w7\w, S32\.4\w8\w, S32\.4\w9\w
Any number of: S32\.4\w2\w, S32\.4\w3\w, S32\.4\w5\w, S32\.4\w6\w, S32\.4\w7\w, S32\.4\w8\w, S32\.4\w9\w
Any number of: S72\.[0-8]\w1\w, S72\.[0-8]\w3\w, S72\.[0-8]\w4\w, S72\.[0-8]\w6\w, S72\.[0-8]\w7\w, S72\.[0-8]\w8\w, S72\.[0-8]\w9\w
Any number of: S72\.[0-8]\w2\w, S72\.[0-8]\w3\w, S72\.[0-8]\w5\w, S72\.[0-8]\w6\w, S72\.[0-8]\w7\w, S72\.[0-8]\w8\w, S72\.[0-8]\w9\w
Any number of: S72\.91\w\w, S72\.93\w\w, S72\.94\w\w, S72\.96\w\w, S72\.97\w\w, S72\.98\w\w, S72\.99\w\w
Any number of: S72\.92\w\w, S72\.93\w\w, S72\.95\w\w, S72\.96\w\w, S72\.97\w\w, S72\.98\w\w, S72\.99\w\w
Any permutation or combination (including repetitions) of codes listed within a bullet are permitted for each patient, but permutations or combinations across rows should occur mutually exclusively for a patient. My method is to apply LISTAGG on GROUP BY ID:
ID LISTAGG(ICD10CODE, ',')
1 S72.91XB,S72.92XB
2 S72.211A
3 S72.414A,S72.415A
4 S32.509A
5 S32.301A,S32.821A
6 S32.421A,S32.422A
7 S32.421A
8 S32.421A,S32.509A
Then filter using this regular expression, (S32\.(([1-3]|[5-8])|(4\w((1|4)|(2|5)|(3)|([5-9]))))\w+)|(S72\.(([0-8]\w((1|4)|(2|5)|(3)|([5-9])))|(9((1|4)|(2|5)|(3)|([5-9]))))\w+), which is almost a literal representation of the bullets above. My expression is adapted from the idea in this answer, where it seems that, ((RB\s+)+|(JJ\s+)+) automatically selects either "RB" or "JJ", but not both.
I cannot get it to work. The answer should contain only IDs 2, 4, 5, and 7. But, the expression I developed matches all IDs.
What is a solution to this problem?
[Edit] Some more information:
All these S codes above relate to injuries to the bones in the lower extremity: S32 is for fractures of the pelvis (hip bone), S72 is for fractures of the femur (thigh bone). Note that we have two femurs and two acetabula (the sockets of the pelvis where the femurs connect). The S32.4 code denotes the acetabulum (the rest of the S32.[1235678]\w{3} series denotes other parts of the pelvis). Right and left femur and acetabulum are denoted by 1|4 or 2|5 in the 6th character, respectively, unless the code starts with S72.9, in which case those numbers appear in the 5th character.
The patients to be included in the study population should have only one of the bones broken. That means one of the two femurs, one of the acetabula, or the pelvis, but not a combination of them. Combinations of fractures of a single bone do not matter. For example, the right single femur can be broken in 10 different places and ways (the knee area, the middle shaft, the head, etc., each generating a different S72.\w[1|4]\w{2} code), and should still be selected.
Option 1:
You can do it with a single regular expression:
SELECT t.id,
t.icd10codes
FROM ( SELECT id,
LISTAGG(icd10code, ',') WITHIN GROUP (ORDER BY icd10code)
AS icd10codes
FROM table_name
GROUP BY id
) t
WHERE REGEXP_LIKE(
t.icd10codes,
'^(S32\.[1235678]\w\w\w(,|$))+$'
|| '|^(S32\.4\w[1346789]\w(,|$))+$'
|| '|^(S32\.4\w[2356789]\w(,|$))+$'
|| '|^(S72\.[0-8]\w[1346789]\w(,|$))+$'
|| '|^(S72\.[0-8]\w[2356789]\w(,|$))+$'
|| '|^(S72\.9[1346789]\w\w(,|$))+$'
|| '|^(S72\.9[2356789]\w\w(,|$))+$'
)
Which, for your sample data:
CREATE TABLE table_name (ID, ICD10CODE) AS
SELECT 1, 'S72.91XB' FROM DUAL UNION ALL
SELECT 1, 'S72.92XB' FROM DUAL UNION ALL
SELECT 2, 'S72.211A' FROM DUAL UNION ALL
SELECT 3, 'S72.414A' FROM DUAL UNION ALL
SELECT 3, 'S72.415A' FROM DUAL UNION ALL
SELECT 4, 'S32.509A' FROM DUAL UNION ALL
SELECT 5, 'S32.301A' FROM DUAL UNION ALL
SELECT 5, 'S32.821A' FROM DUAL UNION ALL
SELECT 6, 'S32.421A' FROM DUAL UNION ALL
SELECT 6, 'S32.422A' FROM DUAL UNION ALL
SELECT 7, 'S32.421A' FROM DUAL UNION ALL
SELECT 8, 'S32.421A' FROM DUAL UNION ALL
SELECT 8, 'S32.509A' FROM DUAL;
Outputs:
ID | ICD10CODES
---+------------------
2  | S72.211A
4  | S32.509A
5  | S32.301A,S32.821A
7  | S32.421A
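To see why the anchoring matters, here is a small Python sketch (an illustration only, not the Oracle query) that applies the same seven alternatives to the aggregated strings from the question, once anchored to the whole list and once unanchored. The names GROUPS, ANCHORED, UNANCHORED and AGGREGATED are hypothetical.

```python
import re

# The seven alternatives from the query above.
GROUPS = [
    r'S32\.[1235678]\w\w\w',
    r'S32\.4\w[1346789]\w',
    r'S32\.4\w[2356789]\w',
    r'S72\.[0-8]\w[1346789]\w',
    r'S72\.[0-8]\w[2356789]\w',
    r'S72\.9[1346789]\w\w',
    r'S72\.9[2356789]\w\w',
]

# Anchored: every code in the list must come from the same alternative.
ANCHORED = re.compile('|'.join(r'^(%s(,|$))+$' % g for g in GROUPS))
# Unanchored: one matching code anywhere in the list is enough.
UNANCHORED = re.compile('|'.join(GROUPS))

AGGREGATED = {
    1: 'S72.91XB,S72.92XB',
    2: 'S72.211A',
    3: 'S72.414A,S72.415A',
    4: 'S32.509A',
    5: 'S32.301A,S32.821A',
    6: 'S32.421A,S32.422A',
    7: 'S32.421A',
    8: 'S32.421A,S32.509A',
}

selected = [pid for pid, codes in AGGREGATED.items() if ANCHORED.search(codes)]
loose = [pid for pid, codes in AGGREGATED.items() if UNANCHORED.search(codes)]
print(selected)
print(loose)
```

The unanchored pattern accepts every patient because a single matching code satisfies it, which is consistent with the behaviour described in the question.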
Option 2:
You can put the regular expressions into a table:
CREATE TABLE matches (id, match) AS
SELECT 1, 'S32\.[1235678]\w\w\w' FROM DUAL UNION ALL
SELECT 2, 'S32\.4\w[1346789]\w' FROM DUAL UNION ALL
SELECT 3, 'S32\.4\w[2356789]\w' FROM DUAL UNION ALL
SELECT 4, 'S72\.[0-8]\w[1346789]\w' FROM DUAL UNION ALL
SELECT 5, 'S72\.[0-8]\w[2356789]\w' FROM DUAL UNION ALL
SELECT 6, 'S72\.9[1346789]\w\w' FROM DUAL UNION ALL
SELECT 7, 'S72\.9[2356789]\w\w' FROM DUAL;
Then you can use the query:
SELECT t.id,
m.id AS match_id,
LISTAGG(t.icd10code, ',') WITHIN GROUP (ORDER BY t.icd10code)
AS icd10codes
FROM table_name t
LEFT OUTER JOIN matches m
PARTITION BY (m.id)
ON (REGEXP_LIKE(t.icd10code, '^' || m.match || '$'))
GROUP BY
t.id,
m.id
HAVING
COUNT(m.match) = COUNT(t.id);
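The HAVING condition can be read as "every code of the patient matches the same group". A minimal Python sketch of that reading, working on the individual rows rather than on an aggregated string, follows; it illustrates the logic only and does not reproduce the partitioned outer join. MATCHES and ROWS are hypothetical stand-ins for the two tables.

```python
import re
from collections import defaultdict

# Stand-in for the matches table: group id -> pattern.
MATCHES = {
    1: r'S32\.[1235678]\w\w\w',
    2: r'S32\.4\w[1346789]\w',
    3: r'S32\.4\w[2356789]\w',
    4: r'S72\.[0-8]\w[1346789]\w',
    5: r'S72\.[0-8]\w[2356789]\w',
    6: r'S72\.9[1346789]\w\w',
    7: r'S72\.9[2356789]\w\w',
}

# Stand-in for table_name: (patient id, code) rows.
ROWS = [
    (1, 'S72.91XB'), (1, 'S72.92XB'), (2, 'S72.211A'), (3, 'S72.414A'),
    (3, 'S72.415A'), (4, 'S32.509A'), (5, 'S32.301A'), (5, 'S32.821A'),
    (6, 'S32.421A'), (6, 'S32.422A'), (7, 'S32.421A'), (8, 'S32.421A'),
    (8, 'S32.509A'),
]

codes_by_patient = defaultdict(list)
for patient_id, code in ROWS:
    codes_by_patient[patient_id].append(code)

# Keep (patient, group) pairs where all of the patient's codes match the group.
result = sorted(
    (patient_id, match_id)
    for patient_id, codes in codes_by_patient.items()
    for match_id, pattern in MATCHES.items()
    if all(re.fullmatch(pattern, code) for code in codes)
)
print(result)
```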
Option 3:
Similar to the first option, but you can put the matches into a table and you can determine which match has been used:
SELECT t.id,
m.id AS match_id,
t.icd10codes
FROM ( SELECT id,
LISTAGG(icd10code, ',') WITHIN GROUP (ORDER BY icd10code)
AS icd10codes
FROM table_name
GROUP BY id
) t
INNER JOIN matches m
ON (REGEXP_LIKE(t.icd10codes, '^(' || m.match || '(,|$))+$' ))
Options 2 & 3 both output:
ID | MATCH_ID | ICD10CODES
---+----------+------------------
4  | 1        | S32.509A
5  | 1        | S32.301A,S32.821A
7  | 2        | S32.421A
2  | 4        | S72.211A
Option 4:
You can also get rid of the (slow) regular expressions and use LIKE if you store the matches as:
CREATE TABLE matches (id, match) AS
SELECT 1, 'S32.1___' FROM DUAL UNION ALL
SELECT 1, 'S32.2___' FROM DUAL UNION ALL
SELECT 1, 'S32.3___' FROM DUAL UNION ALL
SELECT 1, 'S32.5___' FROM DUAL UNION ALL
SELECT 1, 'S32.6___' FROM DUAL UNION ALL
SELECT 1, 'S32.7___' FROM DUAL UNION ALL
SELECT 1, 'S32.8___' FROM DUAL UNION ALL
SELECT 2, 'S32.4_1_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_3_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_4_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_6_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_7_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_8_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_9_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_2_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_3_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_5_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_6_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_7_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_8_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_9_' FROM DUAL UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_1_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_3_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_4_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_6_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_7_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_8_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_9_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_2_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_3_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_5_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_6_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_7_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_8_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_9_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 6, 'S72.91__' FROM DUAL UNION ALL
SELECT 6, 'S72.93__' FROM DUAL UNION ALL
SELECT 6, 'S72.94__' FROM DUAL UNION ALL
SELECT 6, 'S72.96__' FROM DUAL UNION ALL
SELECT 6, 'S72.97__' FROM DUAL UNION ALL
SELECT 6, 'S72.98__' FROM DUAL UNION ALL
SELECT 6, 'S72.99__' FROM DUAL UNION ALL
SELECT 7, 'S72.92__' FROM DUAL UNION ALL
SELECT 7, 'S72.93__' FROM DUAL UNION ALL
SELECT 7, 'S72.95__' FROM DUAL UNION ALL
SELECT 7, 'S72.96__' FROM DUAL UNION ALL
SELECT 7, 'S72.97__' FROM DUAL UNION ALL
SELECT 7, 'S72.98__' FROM DUAL UNION ALL
SELECT 7, 'S72.99__' FROM DUAL;
Then use the query:
SELECT t.id,
m.id AS match_id,
LISTAGG(t.icd10code, ',') WITHIN GROUP (ORDER BY t.icd10code)
AS icd10codes
FROM table_name t
LEFT OUTER JOIN matches m
PARTITION BY (m.id)
ON (t.icd10code LIKE m.match)
GROUP BY
t.id,
m.id
HAVING
COUNT(m.match) = COUNT(t.id);
Ok, I've added your broken bones codes to the S32 and S72 series.
That's all that needed to be done really.
Feel free to change ,? to (,|$) but don't change anything else.
Let me know if the broken bones codes are about right.
S32\.\w[25]\w\w, S32.1\w\w\w, S32.2\w\w\w, S32.3\w\w\w, S32.5\w\w\w, S32.6\w\w\w, S32.7\w\w\w, S32.8\w\w\w
S32\.\w[25]\w\w, S32.4\w1\w, S32.4\w3\w, S32.4\w4\w, S32.4\w6\w, S32.4\w7\w, S32.4\w8\w, S32.4\w9\w
S32\.\w[25]\w\w, S32.4\w2\w, S32.4\w3\w, S32.4\w5\w, S32.4\w6\w, S32.4\w7\w, S32.4\w8\w, S32.4\w9\w
S72\.\w[14]\w\w, S72.[0-8]\w1\w, S72.[0-8]\w3\w, S72.[0-8]\w4\w, S72.[0-8]\w6\w, S72.[0-8]\w7\w, S72.[0-8]\w8\w, S72.[0-8]\w9\w
S72\.\w[14]\w\w, S72.[0-8]\w2\w, S72.[0-8]\w3\w, S72.[0-8]\w5\w, S72.[0-8]\w6\w, S72.[0-8]\w7\w, S72.[0-8]\w8\w, S72.[0-8]\w9\w
S72\.[14]\w\w\w, S72.91\w\w, S72.93\w\w, S72.94\w\w, S72.96\w\w, S72.97\w\w, S72.98\w\w, S72.99\w\w
S72\.[14]\w\w\w, S72.92\w\w, S72.93\w\w, S72.95\w\w, S72.96\w\w, S72.97\w\w, S72.98\w\w, S72.99\w\w
The new regex is
^((S32\.(\w[25]|[1-35-8]\w)\w\w,?)+|(S32\.(\w[25]\w|4\w[1346-9])\w,?)+|(S32\.(\w[25]\w|4\w[235-9])\w,?)+|(S72\.(\w[14]\w|[0-8]\w[1346-9])\w,?)+|(S72\.(\w[14]\w|[0-8]\w[235-9])\w,?)+|(S72\.([14]\w|9[1346-9])\w\w,?)+|(S72\.([14]\w|9[235-9])\w\w,?)+)$
https://regex101.com/r/OAHdCO/1
^
(
( S32 \. ( \w [25] | [1-35-8] \w ) \w\w ,? )+
| ( S32 \. ( \w [25] \w | 4 \w [1346-9] ) \w ,? )+
| ( S32 \. ( \w [25] \w | 4 \w [235-9] ) \w ,? )+
| ( S72 \. ( \w [14] \w | [0-8] \w [1346-9] ) \w ,? )+
| ( S72 \. ( \w [14] \w | [0-8] \w [235-9] ) \w ,? )+
| ( S72 \. ( [14] \w | 9 [1346-9] ) \w\w ,? )+
| ( S72 \. ( [14] \w | 9 [235-9] ) \w\w ,? )+
)
$

Oracle REGEXP_SUBSTR will not match the dot character

I'm trying to extract information from strings like:
FOO-BAR-AUDIT-DATABASE.NUPKG
FOO.BAR.DATABASE-2.0.0.NUPKG
to info like:
'FOO.BAR.DATABASE'   '2.0.0'
        |               |
   module_name       version
Currently I'm not able to parse correctly when the module_name part contains . chars. See table below.
The example below shows how I extract the information.
The first group of the regexp, (.*?), is the one that does not work correctly; the remaining groups handle the cases of varying version information.
select case module_name when expected then 'pass' else 'fail' end as test, y.* from(
select lower(regexp_substr(t.pck, g.regex, 1, 1, '', 1)) as module_name,
t.expected,
to_number(regexp_substr(t.pck, g.regex, 1, 1, '', 3)) as major,
to_number(regexp_substr(t.pck, g.regex, 1, 1, '', 5)) as minor,
to_number(regexp_substr(t.pck, g.regex, 1, 1, '', 7)) as patch,
(t.pck) as package_name
from (select 'FUNKY_LOG_DATABASE-1.0.0.NUPKG' as pck, 'funky_log_database' as expected from dual
union select 'FOO.BAR.DATABASE-2.0.0.NUPKG', 'foo.bar.database' from dual
union select 'FOO-BAR-AUDIT-DATABASE.NUPKG', 'foo-bar-audit-database' from dual
union select 'funk-database-1.nupkg', 'funk-database' from dual
union select 'funk-database-1.2.nupkg', 'funk-database' from dual
union select 'baz-database-1.0.1.nupkg', 'baz-database' from dual) t
cross join (select '(.*?)(-(\d+)(\.(\d+))?(\.(\d+))?)?(\..*)' as regex from dual) g
)y;
The query above yields the following (Oracle 19c):
test | module_name            | expected               | major | minor | patch | package_name
-----+------------------------+------------------------+-------+-------+-------+-------------------------------
pass | foo-bar-audit-database | foo-bar-audit-database |       |       |       | FOO-BAR-AUDIT-DATABASE.NUPKG
fail | foo                    | foo.bar.database       |       |       |       | FOO.BAR.DATABASE-2.0.0.NUPKG
pass | funky_log_database     | funky_log_database     | 1     | 0     | 0     | FUNKY_LOG_DATABASE-1.0.0.NUPKG
pass | baz-database           | baz-database           | 1     | 0     | 1     | baz-database-1.0.1.nupkg
pass | funk-database          | funk-database          | 1     | 2     |       | funk-database-1.2.nupkg
pass | funk-database          | funk-database          | 1     |       |       | funk-database-1.nupkg
I've tried using ([[:alnum:]._-]*?) as the first group, but it yields the same result. Switching to a greedy match matches too much.
Any good suggestions out there?
You can match from the end to get the version then extract the sub-string before the version to get the module name:
select case module_name when expected then 'pass' else 'fail' end as test,
y.*
from (
select lower(
substr(
t.pck,
1,
REGEXP_INSTR(t.pck, g.regex) - 1
)
) as module_name,
t.expected,
to_number(regexp_substr(t.pck, g.regex, 1, 1, '', 2)) as major,
to_number(regexp_substr(t.pck, g.regex, 1, 1, '', 3)) as minor,
to_number(regexp_substr(t.pck, g.regex, 1, 1, '', 4)) as patch,
t.pck as package_name
from (
select 'FUNKY_LOG_DATABASE-1.0.0.NUPKG' as pck, 'funky_log_database' as expected from dual
union select 'FOO.BAR.DATABASE-2.0.0.NUPKG', 'foo.bar.database' from dual
union select 'FOO-BAR-AUDIT-DATABASE.NUPKG', 'foo-bar-audit-database' from dual
union select 'funk-database-1.nupkg', 'funk-database' from dual
union select 'funk-database-1.2.nupkg', 'funk-database' from dual
union select 'baz-database-1.0.1.nupkg', 'baz-database' from dual
) t
cross join (
select '(-(\d+)\.?(\d+)?\.?(\d+)?)?\.[^.]+$' as regex from dual
) g
)y;
Outputs:
TEST | MODULE_NAME            | EXPECTED               | MAJOR | MINOR | PATCH | PACKAGE_NAME
-----+------------------------+------------------------+-------+-------+-------+-------------------------------
pass | foo-bar-audit-database | foo-bar-audit-database |       |       |       | FOO-BAR-AUDIT-DATABASE.NUPKG
pass | foo.bar.database       | foo.bar.database       | 2     | 0     | 0     | FOO.BAR.DATABASE-2.0.0.NUPKG
pass | funky_log_database     | funky_log_database     | 1     | 0     | 0     | FUNKY_LOG_DATABASE-1.0.0.NUPKG
pass | baz-database           | baz-database           | 1     | 0     | 1     | baz-database-1.0.1.nupkg
pass | funk-database          | funk-database          | 1     | 2     |       | funk-database-1.2.nupkg
pass | funk-database          | funk-database          | 1     |       |       | funk-database-1.nupkg
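The "match from the end, then take everything before the match" idea can be tried quickly in Python. This is a sketch under the assumption that Python's re handles this pattern like Oracle does for these inputs; VERSION_TAIL and parse_package are hypothetical names.

```python
import re

# Same pattern as in the query: an optional "-major[.minor[.patch]]" followed
# by the final ".extension", anchored at the end of the string.
VERSION_TAIL = re.compile(r'(-(\d+)\.?(\d+)?\.?(\d+)?)?\.[^.]+$')

def parse_package(pck):
    """Return (module_name, major, minor, patch); missing parts are None."""
    m = VERSION_TAIL.search(pck)
    if m is None:
        return None
    # Everything before the match start is the module name (SUBSTR ... - 1).
    name = pck[:m.start()].lower()
    major, minor, patch = (
        int(g) if g is not None else None for g in m.group(2, 3, 4)
    )
    return name, major, minor, patch

for pck in ('FOO.BAR.DATABASE-2.0.0.NUPKG', 'FOO-BAR-AUDIT-DATABASE.NUPKG',
            'funk-database-1.2.nupkg'):
    print(pck, parse_package(pck))
```

Because the pattern ends in $, dots inside the module name cannot satisfy it on their own, which is what lets names such as FOO.BAR.DATABASE survive intact.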
Would this do? It isn't sophisticated, but it returns the data you wanted (at least, I think so).
lines #1 - 8 - sample data
temp CTE: removes extension (.nupkg), for simplicity
final query:
line #18 is the module name; if the value contains digits, take the substring that ends just before the separator preceding the first digit. Otherwise, return the whole PCK value
lines #20 - 22 return version: if there are no digits, return NULL. Otherwise, return substring from the first digit onwards
SQL> with
2 test as
3 (select 'FUNKY_LOG_DATABASE-1.0.0.NUPKG' as pck, 'funky_log_database' as expected from dual
4 union select 'FOO.BAR.DATABASE-2.0.0.NUPKG', 'foo.bar.database' from dual
5 union select 'FOO-BAR-AUDIT-DATABASE.NUPKG', 'foo-bar-audit-database' from dual
6 union select 'funk-database-1.nupkg', 'funk-database' from dual
7 union select 'funk-database-1.2.nupkg', 'funk-database' from dual
8 union select 'baz-database-1.0.1.nupkg', 'baz-database' from dual),
9 temp as
10 -- remove extension
11 (select pck pck_old, expected,
12 replace(lower(pck), '.nupkg', '') pck
13 from test
14 )
15 select pck_old,
16 expected,
17 --
18 nvl(substr(pck, 1, regexp_instr(pck, '\d') - 2), pck) module_name,
19 --
20 case when regexp_instr(pck, '\d') = 0 then null
21 else substr(pck, regexp_instr(pck, '\d'))
22 end version
23 from temp;
PCK_OLD                        EXPECTED               MODULE_NAME             VERSION
------------------------------ ---------------------- ----------------------- --------
FOO-BAR-AUDIT-DATABASE.NUPKG   foo-bar-audit-database foo-bar-audit-database
FOO.BAR.DATABASE-2.0.0.NUPKG   foo.bar.database       foo.bar.database        2.0.0
FUNKY_LOG_DATABASE-1.0.0.NUPKG funky_log_database     funky_log_database      1.0.0
baz-database-1.0.1.nupkg       baz-database           baz-database            1.0.1
funk-database-1.2.nupkg        funk-database          funk-database           1.2
funk-database-1.nupkg          funk-database          funk-database           1
6 rows selected.
SQL>

Is there a way to transpose two strings and get a table as the result?

I have the following two strings:
String_Cod = 14521;65412;65845
String_Flags = 1;0;1
for code 14521 the flag is 1
for code 65412 the flag is 0
for code 65845 the flag is 1
always in this order.
The result must be a table with one row per code and its flag.
I started with these queries:
select regexp_substr(to_char(:STRING_COD),'[^;]+', 1, level)
from dual
connect BY regexp_substr(to_char(:STRING_COD), '[^;]+', 1, level)
is not null
select regexp_substr(to_char(:STRING_FLAGS),'[^;]+', 1, level)
from dual
connect BY regexp_substr(to_char(:STRING_FLAGS), '[^;]+', 1, level)
is not null
But I don't know how to continue from here to join both and get the result I need.
Can somebody give me some advice?
Regards
You could add the level as another column in each query, and join them together:
select c.cod, f.flag
from (
select level as n, regexp_substr(to_char('14521;65412;65845'),'[^;]+', 1, level) as cod
from dual
connect BY regexp_substr(to_char('14521;65412;65845'), '[^;]+', 1, level)
is not null
) c
join (
select level as n, regexp_substr(to_char('1;0;1'),'[^;]+', 1, level) as flag
from dual
connect BY regexp_substr(to_char('1;0;1'), '[^;]+', 1, level)
is not null
) f
on f.n = c.n
With outer joins, that would also allow for different numbers of elements. Or, more simply, since you say they will always match, use the same level for both extracts:
select regexp_substr(to_char('14521;65412;65845'),'[^;]+', 1, level) as cod,
regexp_substr(to_char('1;0;1'),'[^;]+', 1, level) as flag
from dual
connect BY regexp_substr(to_char('14521;65412;65845'), '[^;]+', 1, level)
is not null
COD | FLAG
:---- | :---
14521 | 1
65412 | 0
65845 | 1
This method of expanding a list of values also assumes you can never have null elements, in either list.
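The pairing by position, and the caveat about null elements, can be illustrated with a few lines of Python. This is only a sketch of the logic, not of the Oracle query; the variable names are made up.

```python
import re

cods = '14521;65412;65845'
flags = '1;0;1'

# Pair the n-th code with the n-th flag, as the shared LEVEL does in the query.
pairs = list(zip(cods.split(';'), flags.split(';')))
print(pairs)

# '[^;]+' only matches non-empty runs, so an empty element is skipped and the
# later elements shift one position to the left.
with_gap = '1;;1'
tokens_regex = re.findall(r'[^;]+', with_gap)  # the empty element disappears
tokens_split = with_gap.split(';')             # the empty element is kept
print(tokens_regex, tokens_split)
```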

Extract Numbers from String - Custom

I'd like to extract most of the numbers from a string and add "JW" at the end.
My values look like:
RFID_DP_IDS339020JW3_IDMsg --> 339020JW
RFID_DP_IDSA72130JW_IDMsg --> 72130JW
RFID_DP_IDS337310JW1_IDMsg --> 337310JW
Basically, I want to remove all the leading letters and keep the numbers and JW.
So far I have this:
regexp_replace(Business_CONTEXT, '[^0-9]', '')||'JW' RegistrationPoint
But that also includes the digits after 'JW'.
Any ideas?
How about this?
result would return the run of digits followed by exactly two letters
result2 would return digits + JW
Pick the one you find the most appropriate.
with test (col) as
(select 'RFID_DP_IDS339020JW3_IDMsg' from dual union all
select 'RFID_DP_IDSA72130JW_IDMsg' from dual union all
select 'RFID_DP_IDS337310JW1_IDMsg' from dual
)
select col,
regexp_substr(col, '\d+[[:alpha:]]{2}') result,
regexp_substr(col, '\d+JW') result2
from test;
COL                        RESULT   RESULT2
-------------------------- -------- --------
RFID_DP_IDS339020JW3_IDMsg 339020JW 339020JW
RFID_DP_IDSA72130JW_IDMsg  72130JW  72130JW
RFID_DP_IDS337310JW1_IDMsg 337310JW 337310JW
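Both patterns can be checked against the sample values with a short Python sketch. Python's re has no [[:alpha:]] class, so [A-Za-z] is used here as an assumed equivalent for these ASCII inputs; the variable names are hypothetical.

```python
import re

samples = [
    'RFID_DP_IDS339020JW3_IDMsg',
    'RFID_DP_IDSA72130JW_IDMsg',
    'RFID_DP_IDS337310JW1_IDMsg',
]

# First run of digits followed by exactly two letters (the "result" column).
digits_plus_two_letters = [re.search(r'\d+[A-Za-z]{2}', s).group() for s in samples]

# First run of digits followed by the literal JW (the "result2" column).
digits_plus_jw = [re.search(r'\d+JW', s).group() for s in samples]

print(digits_plus_two_letters)
print(digits_plus_jw)
```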
If you really want to extract the longest digit string out of your given strings you can use the following:
WITH test (Business_CONTEXT) AS
(SELECT 'RFID_DP_IDS339020JW3_I9DMsg' from dual union all
SELECT 'RFID_DP_IDSA72130JW_IDMsg' from dual union all
SELECT 'RFID_DP_IDS337310JW1_IDMsg' from dual
)
SELECT Business_CONTEXT
, (SELECT MAX(regexp_substr(Business_CONTEXT, '\d+', 1, LEVEL))
KEEP (dense_rank last ORDER BY LENGTH(regexp_substr(Business_CONTEXT, '\d+', 1, LEVEL)))
FROM dual
CONNECT BY regexp_substr(Business_CONTEXT, '\d+', 1, LEVEL) IS NOT NULL) num
FROM test
Result:
Business_CONTEXT | NUM
----------------------------+-----
RFID_DP_IDS339020JW3_I9DMsg | 339020
RFID_DP_IDSA72130JW_IDMsg | 72130
RFID_DP_IDS337310JW1_IDMsg | 337310
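The "longest digit run" logic is easy to sanity-check in Python. This is a sketch of the idea rather than a translation of the query, and longest_digit_run is a hypothetical helper name.

```python
import re

def longest_digit_run(text):
    """Return the longest run of digits in text, or None if there is none."""
    runs = re.findall(r'\d+', text)
    # On a length tie max() keeps the first run, whereas the query above
    # keeps the greatest string among the longest runs.
    return max(runs, key=len) if runs else None

samples = [
    'RFID_DP_IDS339020JW3_I9DMsg',
    'RFID_DP_IDSA72130JW_IDMsg',
    'RFID_DP_IDS337310JW1_IDMsg',
]
registration_points = [longest_digit_run(s) + 'JW' for s in samples]
print(registration_points)
```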

Google BigQuery - Execute dynamically generated queries from a select statement

Have a huge table in Google BigQuery with following structure (> 100 million rows):
name | departments
abc | 1,2,5,6
xyz | 4,5
pqr | 3,4,6
Want to convert the data into following format:
name | 1 | 2 | 3 | 4 | 5 | 6
abc | 1 | 1 | | | 1 | 1
xyz | | | | 1 | 1 |
pqr | | | 1 | 1 | | 1
As of now, I am able to generate the queries required to prepare the dataset in this format by using the CONCAT and REGEXP_REPLACE functions:
SELECT ' insert into dataset.output ( name, ' +
CONCAT(
'_' , replace(departments,',',',_') )
+ ' ) values( \'' + name +'\','+ REGEXP_REPLACE(departments, "([^,\n]+)", "1") +')'
FROM (
select name, departments from dataset.input )
This generates the output with the 100 M insert queries which can be used to create the data in the required structure.
However, now below are my questions:
Can we execute the output of this query (100 M insert queries) directly by using BigQuery SQL, or would we need to fire each insert one by one?
I believe there is no way of pivoting or transposing the data in a column with multiple comma-separated values. Is that right?
Is there a more optimal way of achieving this using BigQuery SQL and not writing custom Java code?
Thanks.
Below is an example for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'abc' name, '1,2,5,6' departments UNION ALL
SELECT 'xyz', '4,5' UNION ALL
SELECT 'pqr', '3,4,6'
)
SELECT
name,
IF(departments LIKE '%1%', 1, 0) AS d1,
IF(departments LIKE '%2%', 1, 0) AS d2,
IF(departments LIKE '%3%', 1, 0) AS d3,
IF(departments LIKE '%4%', 1, 0) AS d4,
IF(departments LIKE '%5%', 1, 0) AS d5,
IF(departments LIKE '%6%', 1, 0) AS d6
FROM `project.dataset.table`
with result as
Row name d1 d2 d3 d4 d5 d6
1 abc 1 1 0 0 1 1
2 xyz 0 0 0 1 1 0
3 pqr 0 0 1 1 0 1
So you need to run the above with the destination set to whatever new table you prepared
Note, the above assumes you have just 6 departments and, most importantly, that there is no ambiguity in the numbers; for example, 1 must not conflict with 10
If you do have such a case, you need to transform lines like
IF(departments LIKE '%2%', 1, 0) AS d2,
into
IF(CONCAT(',', departments, ',') LIKE '%,2,%', 1, 0) AS d2 ...
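The ambiguity and the delimiter-wrapping fix can be seen in a few lines of Python. This sketch only mirrors the two LIKE conditions with substring tests; the function names are made up for illustration.

```python
def has_department_naive(departments, department):
    # Mirrors: departments LIKE '%N%'
    return str(department) in departments

def has_department(departments, department):
    # Mirrors: CONCAT(',', departments, ',') LIKE '%,N,%'
    return f',{department},' in f',{departments},'

# Department 1 is not in the list, but the naive test finds the "1" in "10".
print(has_department_naive('10,12', 1))
print(has_department('10,12', 1))

# One pivoted row for the sample value 'abc' | '1,2,5,6'.
row = [1 if has_department('1,2,5,6', n) else 0 for n in range(1, 7)]
print(row)
```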
And of course, you can use just one simple INSERT statement
INSERT `project.dataset.new_table` (name, d1, d2, d3, d4, d5, d6)
SELECT
name,
IF(departments LIKE '%1%', 1, 0) AS d1,
IF(departments LIKE '%2%', 1, 0) AS d2,
IF(departments LIKE '%3%', 1, 0) AS d3,
IF(departments LIKE '%4%', 1, 0) AS d4,
IF(departments LIKE '%5%', 1, 0) AS d5,
IF(departments LIKE '%6%', 1, 0) AS d6
FROM `project.dataset.table`
So, the final point of all this is:
instead of generating an INSERT statement for each and every row in the original table, you should generate one simple SELECT statement that does the "pivoting"
Update for "extreme" minimizing generated code
See an example:
#standardSQL
CREATE TEMP FUNCTION c(departments STRING, department INT64) AS (
IF(departments LIKE CONCAT('%',CAST(department AS STRING),'%'), 1, 0)
);
WITH `project.dataset.table` AS (
SELECT 'abc' name, '1,2,5,6' departments UNION ALL
SELECT 'xyz', '4,5' UNION ALL
SELECT 'pqr', '3,4,6'
), temp AS (
SELECT name, departments AS d
FROM `project.dataset.table`
)
SELECT
name,
c(d,1)d1,
c(d,2)d2,
c(d,3)d3,
c(d,4)d4,
c(d,5)d5,
c(d,6)d6
FROM temp
As you can see, each of your 10000 lines will now look like c(d,N)dN, with the longest being c(d,10000)d10000, so you have a chance of fitting within the query size limit