Extract Numbers from String - Custom

Extract Numbers from String - Custom - regex

I'd like to extract "Most" numbers from a string and Add "JW" at the end.
My values look like:
RFID_DP_IDS339020JW3_IDMsg - Result = 339020JW
RFID_DP_IDSA72130JW_IDMsg --> 72130JW
RFID_DP_IDS337310JW1_IDMsg --> 337310JW
Basically I would remove all first letters, keep all numbers and JW
For now I had this
regexp_replace(Business_CONTEXT, '[^0-9]', '')||'JW' RegistrationPoint
But that would include the numbers AFTER 'JW'
Any idea?

How about this?
result would return exactly two letters after bunch of digits
result2 would return digits + JW
Pick the one you find the most appropriate.
SQL> with test (col) as
2 (select 'RFID_DP_IDS339020JW3_IDMsg' from dual union all
3 select 'RFID_DP_IDSA72130JW_IDMsg' from dual union all
4 select 'RFID_DP_IDS337310JW1_IDMsg' from dual
5 )
6 select col,
7 regexp_substr(col, '\d+[[:alpha:]]{2}') result,
8 regexp_substr(col, '\d+JW') result2
9 from test;
COL RESULT RESULT2
-------------------------- -------------------------- --------------------------
RFID_DP_IDS339020JW3_IDMsg 339020JW 339020JW
RFID_DP_IDSA72130JW_IDMsg 72130JW 72130JW
RFID_DP_IDS337310JW1_IDMsg 337310JW 337310JW
SQL>

If you really want to extract the longest digit string out of your given strings you can use the following:
WITH test (Business_CONTEXT) AS
(SELECT 'RFID_DP_IDS339020JW3_I9DMsg' from dual union all
SELECT 'RFID_DP_IDSA72130JW_IDMsg' from dual union all
SELECT 'RFID_DP_IDS337310JW1_IDMsg' from dual
)
SELECT Business_CONTEXT
, (SELECT MAX(regexp_substr(Business_CONTEXT, '\d+', 1, LEVEL))
KEEP (dense_rank last ORDER BY LENGTH(regexp_substr(Business_CONTEXT, '\d+', 1, LEVEL)))
FROM dual
CONNECT BY regexp_substr(Business_CONTEXT, '\d+', 1, LEVEL) IS NOT NULL) num
FROM test
Result:
Business_CONTEXT | NUM
----------------------------+-----
RFID_DP_IDS339020JW3_I9DMsg | 339020
RFID_DP_IDSA72130JW_IDMsg | 72130
RFID_DP_IDS337310JW1_IDMsg | 337310

Related

Reg Exp don't select if more than one group matches (multiple XOR)

The data are held in an Oracle 12c database, one row per ICD-10-CM code, with a patient ID (foreign key) like so (note that there could be many other codes, the following are just the ones pertinent to this question):
ID ICD10CODE
1 S72.91XB
1 S72.92XB
2 S72.211A
3 S72.414A
3 S72.415A
4 S32.509A
5 S32.301A
5 S32.821A
6 S32.421A
6 S32.422A
7 S32.421A
8 S32.421A
8 S32.509A
The task at hand is to select distinct patients that match only one of the following points (using standard regular expression syntax):
Any number of: S32\.1\w\w\w, S32\.2\w\w\w, S32\.3\w\w\w, S32\.5\w\w\w, S32\.6\w\w\w, S32\.7\w\w\w, S32\.8\w\w\w
Any number of: S32\.4\w1\w, S32\.4\w3\w, S32\.4\w4\w, S32\.4\w6\w, S32\.4\w7\w, S32\.4\w8\w, S32\.4\w9\w
Any number of: S32\.4\w2\w, S32\.4\w3\w, S32\.4\w5\w, S32\.4\w6\w, S32\.4\w7\w, S32\.4\w8\w, S32\.4\w9\w
Any number of: S72\.[0-8]\w1\w, S72\.[0-8]\w3\w, S72\.[0-8]\w4\w, S72\.[0-8]\w6\w, S72\.[0-8]\w7\w, S72\.[0-8]\w8\w, S72\.[0-8]\w9\w
Any number of: S72\.[0-8]\w2\w, S72\.[0-8]\w3\w, S72\.[0-8]\w5\w, S72\.[0-8]\w6\w, S72\.[0-8]\w7\w, S72\.[0-8]\w8\w, S72\.[0-8]\w9\w
Any number of: S72\.91\w\w, S72\.93\w\w, S72\.94\w\w, S72\.96\w\w, S72\.97\w\w, S72\.98\w\w, S72\.99\w\w
Any number of: S72\.92\w\w, S72\.93\w\w, S72\.95\w\w, S72\.96\w\w, S72\.97\w\w, S72\.98\w\w, S72\.99\w\w
Any permutation or combination (including repetitions) of codes listed within a bullet are permitted for each patient, but permutations or combinations across rows should occur mutually exclusively for a patient. My method is to apply LISTAGG on GROUP BY ID:
ID LISTAGG(ICD10CODE, ',')
1 S72.91XB,S72.92XB
2 S72.211A
3 S72.414A,S72.415A
4 S32.509A
5 S32.301A,S32.821A
6 S32.421A,S32.422A
7 S32.421A
8 S32.421A,S32.509A
Then filter using this regular expression, (S32\.(([1-3]|[5-8])|(4\w((1|4)|(2|5)|(3)|([5-9]))))\w+)|(S72\.(([0-8]\w((1|4)|(2|5)|(3)|([5-9])))|(9((1|4)|(2|5)|(3)|([5-9]))))\w+), which is almost a literal representation of the bullets above. My expression is adapted from the idea in this answer, where it seems that, ((RB\s+)+|(JJ\s+)+) automatically selects either "RB" or "JJ", but not both.
I cannot get it to work. The answer should contain only IDs 2, 4, 5, and 7. But, the expression I developed matches all IDs.
What is a solution to this problem?
[Edit] Some more information:
All these S codes above relate to injuries to the bones in the lower extremity: S32 is for fractures of the pelvis (hip bone), S72 is for fractures of the femur (thigh bone). Note that we have two femurs, and two acetabulum (socket of the pelvis where the femur connects). The S32.4 code denotes the acetabulum (the rest of the S32.[1235678]\w{3} series denotes other parts of the pelvis). Right and left femur and acetabulum are denoted by 1|4 or 2|5 in the 6th character, respectively, unless the code starts with S72.9 when those numbers appear in the 5th character.
The patients to be included in the study population should only have one of the bones broken. That means, one of the two femurs, one of the acetabulum, or the pelvis, but not a combination of them. Combinations of fractures of a single bone do not matter. For example, the right single femur can be broken in 10 different places and ways (the knee area, the middle shaft, the head, etc., each generating a different S72.\w[1|4]\w{2} code), and should still be selected.

Option 1:
You can do it with a single regular expression:
SELECT t.id,
t.icd10codes
FROM ( SELECT id,
LISTAGG(icd10code, ',') WITHIN GROUP (ORDER BY icd10code)
AS icd10codes
FROM table_name
GROUP BY id
) t
WHERE REGEXP_LIKE(
t.icd10codes,
'^(S32\.[1235678]\w\w\w(,|$))+$'
|| '|^(S32\.4\w[1346789]\w(,|$))+$'
|| '|^(S32\.4\w[2356789]\w(,|$))+$'
|| '|^(S72\.[0-8]\w[1346789]\w(,|$))+$'
|| '|^(S72\.[0-8]\w[2356789]\w(,|$))+$'
|| '|^(S72\.9[1346789]\w\w(,|$))+$'
|| '|^(S72\.9[2356789]\w\w(,|$))+$'
)
Which, for your sample data:
CREATE TABLE table_name (ID, ICD10CODE) AS
SELECT 1, 'S72.91XB' FROM DUAL UNION ALL
SELECT 1, 'S72.92XB' FROM DUAL UNION ALL
SELECT 2, 'S72.211A' FROM DUAL UNION ALL
SELECT 3, 'S72.414A' FROM DUAL UNION ALL
SELECT 3, 'S72.415A' FROM DUAL UNION ALL
SELECT 4, 'S32.509A' FROM DUAL UNION ALL
SELECT 5, 'S32.301A' FROM DUAL UNION ALL
SELECT 5, 'S32.821A' FROM DUAL UNION ALL
SELECT 6, 'S32.421A' FROM DUAL UNION ALL
SELECT 6, 'S32.422A' FROM DUAL UNION ALL
SELECT 7, 'S32.421A' FROM DUAL UNION ALL
SELECT 8, 'S32.421A' FROM DUAL UNION ALL
SELECT 8, 'S32.509A' FROM DUAL;
Outputs:
ID
ICD10CODES
2
S72.211A
4
S32.509A
5
S32.301A,S32.821A
7
S32.421A
Option 2:
You can put the regular expressions into a table:
CREATE TABLE matches (id, match) AS
SELECT 1, 'S32\.[1235678]\w\w\w' FROM DUAL UNION ALL
SELECT 2, 'S32\.4\w[1346789]\w' FROM DUAL UNION ALL
SELECT 3, 'S32\.4\w[2356789]\w' FROM DUAL UNION ALL
SELECT 4, 'S72\.[0-8]\w[1346789]\w' FROM DUAL UNION ALL
SELECT 5, 'S72\.[0-8]\w[2356789]\w' FROM DUAL UNION ALL
SELECT 6, 'S72\.9[1346789]\w\w' FROM DUAL UNION ALL
SELECT 7, 'S72\.9[2356789]\w\w' FROM DUAL;
Then you can use the query:
SELECT t.id,
m.id AS match_id,
LISTAGG(t.icd10code, ',') WITHIN GROUP (ORDER BY t.icd10code)
AS icd10codes
FROM table_name t
LEFT OUTER JOIN matches m
PARTITION BY (m.id)
ON (REGEXP_LIKE(t.icd10code, '^' || m.match || '$'))
GROUP BY
t.id,
m.id
HAVING
COUNT(m.match) = COUNT(t.id);
Option 3:
Similar to the first option, but you can put the matches into a table and you can determine which match has been used:
SELECT t.id,
m.id AS match_id,
t.icd10codes
FROM ( SELECT id,
LISTAGG(icd10code, ',') WITHIN GROUP (ORDER BY icd10code)
AS icd10codes
FROM table_name
GROUP BY id
) t
INNER JOIN matches m
ON (REGEXP_LIKE(t.icd10codes, '^(' || m.match || '(,|$))+$' ))
Options 2 & 3 both output:
ID
MATCH_ID
ICD10CODES
4
1
S32.509A
5
1
S32.301A,S32.821A
7
2
S32.421A
2
4
S72.211A
Option 4:
You can also get rid of the (slow) regular expressions and use LIKE if you store the matches as:
CREATE TABLE matches (id, match) AS
SELECT 1, 'S32.1___' FROM DUAL UNION ALL
SELECT 1, 'S32.2___' FROM DUAL UNION ALL
SELECT 1, 'S32.3___' FROM DUAL UNION ALL
SELECT 1, 'S32.5___' FROM DUAL UNION ALL
SELECT 1, 'S32.6___' FROM DUAL UNION ALL
SELECT 1, 'S32.7___' FROM DUAL UNION ALL
SELECT 1, 'S32.8___' FROM DUAL UNION ALL
SELECT 2, 'S32.4_1_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_3_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_4_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_6_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_7_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_8_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_9_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_2_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_3_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_5_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_6_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_7_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_8_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_9_' FROM DUAL UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_1_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_3_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_4_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_6_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_7_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_8_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_9_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_2_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_3_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_5_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_6_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_7_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_8_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_9_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 6, 'S72.91__' FROM DUAL UNION ALL
SELECT 6, 'S72.93__' FROM DUAL UNION ALL
SELECT 6, 'S72.94__' FROM DUAL UNION ALL
SELECT 6, 'S72.96__' FROM DUAL UNION ALL
SELECT 6, 'S72.97__' FROM DUAL UNION ALL
SELECT 6, 'S72.98__' FROM DUAL UNION ALL
SELECT 6, 'S72.99__' FROM DUAL UNION ALL
SELECT 7, 'S72.92__' FROM DUAL UNION ALL
SELECT 7, 'S72.93__' FROM DUAL UNION ALL
SELECT 7, 'S72.95__' FROM DUAL UNION ALL
SELECT 7, 'S72.96__' FROM DUAL UNION ALL
SELECT 7, 'S72.97__' FROM DUAL UNION ALL
SELECT 7, 'S72.98__' FROM DUAL UNION ALL
SELECT 7, 'S72.99__' FROM DUAL;
Then use the query:
SELECT t.id,
m.id AS match_id,
LISTAGG(t.icd10code, ',') WITHIN GROUP (ORDER BY t.icd10code)
AS icd10codes
FROM table_name t
LEFT OUTER JOIN matches m
PARTITION BY (m.id)
ON (t.icd10code LIKE m.match)
GROUP BY
t.id,
m.id
HAVING
COUNT(m.match) = COUNT(t.id);
db<>fiddle here

Ok, I've added your broken bones codes to the S32 and S72 series.
That's all that needed to be done really.
Feel free to change ,? to (,|$) but don't change anything else.
Let me know if the broken bones codes is about right.
S32\.\w[25]\w\w, S32.1\w\w\w, S32.2\w\w\w, S32.3\w\w\w, S32.5\w\w\w, S32.6\w\w\w, S32.7\w\w\w, S32.8\w\w\w
S32\.\w[25]\w\w, S32.4\w1\w, S32.4\w3\w, S32.4\w4\w, S32.4\w6\w, S32.4\w7\w, S32.4\w8\w, S32.4\w9\w
S32\.\w[25]\w\w, S32.4\w2\w, S32.4\w3\w, S32.4\w5\w, S32.4\w6\w, S32.4\w7\w, S32.4\w8\w, S32.4\w9\w
S72\.\w[14]\w\w, S72.[0-8]\w1\w, S72.[0-8]\w3\w, S72.[0-8]\w4\w, S72.[0-8]\w6\w, S72.[0-8]\w7\w, S72.[0-8]\w8\w, S72.[0-8]\w9\w
S72\.\w[14]\w\w, S72.[0-8]\w2\w, S72.[0-8]\w3\w, S72.[0-8]\w5\w, S72.[0-8]\w6\w, S72.[0-8]\w7\w, S72.[0-8]\w8\w, S72.[0-8]\w9\w
S72\.[14]\w\w\w, S72.91\w\w, S72.93\w\w, S72.94\w\w, S72.96\w\w, S72.97\w\w, S72.98\w\w, S72.99\w\w
S72\.[14]\w\w\w, S72.92\w\w, S72.93\w\w, S72.95\w\w, S72.96\w\w, S72.97\w\w, S72.98\w\w, S72.99\w\w
The new regex is
^((S32\.(\w[25]|[1-35-8]\w)\w\w,?)+|(S32\.(\w[25]\w|4\w[1346-9])\w,?)+|(S32\.(\w[25]\w|4\w[235-9])\w,?)+|(S72\.(\w[14]\w|[0-8]\w[1346-9])\w,?)+|(S72\.(\w[14]\w|[0-8]\w[235-9])\w,?)+|(S72\.([14]\w|9[1346-9])\w\w,?)+|(S72\.([14]\w|9[235-9])\w\w,?)+)$
https://regex101.com/r/OAHdCO/1
^
(
( S32 \. ( \w [25] | [1-35-8] \w ) \w\w ,? )+
| ( S32 \. ( \w [25] \w | 4 \w [1346-9] ) \w ,? )+
| ( S32 \. ( \w [25] \w | 4 \w [235-9] ) \w ,? )+
| ( S72 \. ( \w [14] \w | [0-8] \w [1346-9] ) \w ,? )+
| ( S72 \. ( \w [14] \w | [0-8] \w [235-9] ) \w ,? )+
| ( S72 \. ( [14] \w | 9 [1346-9] ) \w\w ,? )+
| ( S72 \. ( [14] \w | 9 [235-9] ) \w\w ,? )+
)
$

Oracle REGEXP_SUBSTR will not match the dot character

I'm trying to extract information from strings like:
FOO-BAR-AUDIT-DATABASE.NUPKG
FOO.BAR.DATABASE-2.0.0.NUPKG
to info like:
'FOO.BAR.DATABASE' '2.0.0'
| |
module_name version
Currently I'm not able to parse correctly when the module_name part contains . chars. See table below.
The example below show how I extract the information.
The first group of the regexp is the one that do not work correctly '(.*?), the remaining groups handle the cases of varying version information.
select case module_name when expected then 'pass' else 'fail' end as test, y.* from(
select lower(regexp_substr(t.pck, g.regex, 1, 1, '', 1)) as module_name,
t.expected,
to_number(regexp_substr(t.pck, g.regex, 1, 1, '', 3)) as major,
to_number(regexp_substr(t.pck, g.regex, 1, 1, '', 5)) as minor,
to_number(regexp_substr(t.pck, g.regex, 1, 1, '', 7)) as patch,
(t.pck) as package_name
from (select 'FUNKY_LOG_DATABASE-1.0.0.NUPKG' as pck, 'funky_log_database' as expected from dual
union select 'FOO.BAR.DATABASE-2.0.0.NUPKG', 'foo.bar.database' from dual
union select 'FOO-BAR-AUDIT-DATABASE.NUPKG', 'foo-bar-audit-database' from dual
union select 'funk-database-1.nupkg', 'funk-database' from dual
union select 'funk-database-1.2.nupkg', 'funk-database' from dual
union select 'baz-database-1.0.1.nupkg', 'baz-database' from dual) t
cross join (select '(.*?)(-(\d+)(\.(\d+))?(\.(\d+))?)?(\..*)' as regex from dual) g
)y;
The query above yields the following (Oracle 19c):
test
module_name
expected
major
minor
patch
package_name
pass
foo-bar-audit-database
foo-bar-audit-database
FOO-BAR-AUDIT-DATABASE.NUPKG
fail
foo
foo.bar.database
FOO.BAR.DATABASE-2.0.0.NUPKG
pass
funky_log_database
funky_log_database
1
0
0
FUNKY_LOG_DATABASE-1.0.0.NUPKG
pass
baz-database
baz-database
1
0
1
baz-database-1.0.1.nupkg
pass
funk-database
funk-database
1
2
funk-database-1.2.nupkg
pass
funk-database
funk-database
1
funk-database-1.nupkg
I've tried use ([[:alnum:]._-]*?) as the first group, but it yield the same result. Switching to a greedy match matches too much.
Any good suggestions out there?

You can match from the end to get the version then extract the sub-string before the version to get the module name:
select case module_name when expected then 'pass' else 'fail' end as test,
y.*
from (
select lower(
substr(
t.pck,
1,
REGEXP_INSTR(t.pck, g.regex) - 1
)
) as module_name,
t.expected,
to_number(regexp_substr(t.pck, g.regex, 1, 1, '', 2)) as major,
to_number(regexp_substr(t.pck, g.regex, 1, 1, '', 3)) as minor,
to_number(regexp_substr(t.pck, g.regex, 1, 1, '', 4)) as patch,
t.pck as package_name
from (
select 'FUNKY_LOG_DATABASE-1.0.0.NUPKG' as pck, 'funky_log_database' as expected from dual
union select 'FOO.BAR.DATABASE-2.0.0.NUPKG', 'foo.bar.database' from dual
union select 'FOO-BAR-AUDIT-DATABASE.NUPKG', 'foo-bar-audit-database' from dual
union select 'funk-database-1.nupkg', 'funk-database' from dual
union select 'funk-database-1.2.nupkg', 'funk-database' from dual
union select 'baz-database-1.0.1.nupkg', 'baz-database' from dual
) t
cross join (
select '(-(\d+)\.?(\d+)?\.?(\d+)?)?\.[^.]+$' as regex from dual
) g
)y;
Outputs:
TEST
MODULE_NAME
EXPECTED
MAJOR
MINOR
PATCH
PACKAGE_NAME
pass
foo-bar-audit-database
foo-bar-audit-database
FOO-BAR-AUDIT-DATABASE.NUPKG
pass
foo.bar.database
foo.bar.database
2
0
0
FOO.BAR.DATABASE-2.0.0.NUPKG
pass
funky_log_database
funky_log_database
1
0
0
FUNKY_LOG_DATABASE-1.0.0.NUPKG
pass
baz-database
baz-database
1
0
1
baz-database-1.0.1.nupkg
pass
funk-database
funk-database
1
2
funk-database-1.2.nupkg
pass
funk-database
funk-database
1
funk-database-1.nupkg
db<>fiddle here

Would this do? It isn't sophisticated, but - returns data you wanted (at least, I think so).
lines #1 - 8 - sample data
temp CTE: removes extension (.nupkg), for simplicity
final query:
line #18 is module name; if it contains numbers, then get substring up to the first digit. Otherwise, remove the whole PCT value
lines #20 - 22 return version: if there are no digits, return NULL. Otherwise, return substring from the first digit onwards
SQL> with
2 test as
3 (select 'FUNKY_LOG_DATABASE-1.0.0.NUPKG' as pck, 'funky_log_database' as expected from dual
4 union select 'FOO.BAR.DATABASE-2.0.0.NUPKG', 'foo.bar.database' from dual
5 union select 'FOO-BAR-AUDIT-DATABASE.NUPKG', 'foo-bar-audit-database' from dual
6 union select 'funk-database-1.nupkg', 'funk-database' from dual
7 union select 'funk-database-1.2.nupkg', 'funk-database' from dual
8 union select 'baz-database-1.0.1.nupkg', 'baz-database' from dual),
9 temp as
10 -- remove extension
11 (select pck pck_old, expected,
12 replace(lower(pck), '.nupkg', '') pck
13 from test
14 )
15 select pck_old,
16 expected,
17 --
18 nvl(substr(pck, 1, regexp_instr(pck, '\d') - 2), pck) module_name,
19 --
20 case when regexp_instr(pck, '\d') = 0 then null
21 else substr(pck, regexp_instr(pck, '\d'))
22 end version
23 from temp;
PCK_OLD EXPECTED MODULE_NAME VERSION
------------------------------ ---------------------- ----------------------- --------
FOO-BAR-AUDIT-DATABASE.NUPKG foo-bar-audit-database foo-bar-audit-database
FOO.BAR.DATABASE-2.0.0.NUPKG foo.bar.database foo.bar.database 2.0.0
FUNKY_LOG_DATABASE-1.0.0.NUPKG funky_log_database funky_log_database 1.0.0
baz-database-1.0.1.nupkg baz-database baz-database 1.0.1
funk-database-1.2.nupkg funk-database funk-database 1.2
funk-database-1.nupkg funk-database funk-database 1
6 rows selected.
SQL>

extract all numbers in a string

How can I extract all numbers in a string?
Sample inputs:
7nr-6p
12c-18L
12nr-24L
11nr-12p
Expected Outputs:
{7,6}
{12,18}
{12,24}
etc...
The following is tested with the first one, 7nr-6p:
select regexp_split_to_array('7nr-6p', '[^0-9]') AS new_volume from mytable;
Gives: {7,"","",6,""} // Why is a numeric-only match returning spaces?
select regexp_matches('7nr-6p', '[0-9]*'::text) from mytable;
Gives: {7} // Why isn't this continuing?
select regexp_matches('7nr-6p', '\d'::text) from mytable;
Gives: {7}
select NULLIF(regexp_replace('7nr-6p', '\D',',','g'), '')::text from mytable;
Gives: 7,,,6,

The following query:
select regexp_split_to_array(regexp_replace('7nr-6p', '^[^0-9]*|[^0-9]*$', 'g'), '[^0-9]+')
AS new_volume from mytable;
"Trims" the prefix and suffix non-numbers and splits by the remaining non-numbers.
select regexp_matches('7nr-6p', '[0-9]*'::text) from mytable;
Gives: {7} // Why isn't this continuing?
Because without the 'g' flag, the regex stops at the first match.
Add the 'g' flag:
select regexp_matches('7nr-6p', '[0-9]*'::text, 'g') from mytable;

You can replace all text and then split:
SELECT regexp_split_to_array(
regexp_replace('7nr-6p', '[a-zA-Z]', '','g'),
'[^0-9]'
)
This returns {7,6}

SELECT id, (regexp_matches(string, '\d+', 'g'))[1]::int AS nr
FROM (
VALUES
(1, '7nr-6p')
, (2, '12c-18L')
, (3, '12nr-24L')
, (4, '11nr-12p')
) tbl(id, string);
Result:
id | nr
----+----
1 | 7
1 | 6
2 | 12
2 | 18
3 | 12
3 | 24
4 | 11
4 | 12
I wanted them in a single cell so I could extract them as needed
SELECT id, trim(regexp_replace(string, '\D+', ',', 'g'), ',') AS nrs
FROM (
VALUES
(1, '7nr-6p')
, (2, '12c-18L')
, (3, '12nr-24L')
, (4, '11nr-12p')
) tbl(id, string);
Result:
id | nrs
----+-------
1 | 7,6
2 | 12,18
3 | 12,24
4 | 11,12
dbfiddle here

Here is a more robust solution
CREATE OR REPLACE FUNCTION get_ints_from_text(TEXT) RETURNS int[] AS $$
select array_remove(regexp_split_to_array($1,'[^0-9]+','i'),'')::int[];
$$ LANGUAGE SQL IMMUTABLE;
Example
select get_ints_from_text('7nr-6p'); -- 7,6
-- also resilient in situations like
select get_ints_from_text('-7nr--6p'); -- 7,6
Here is a link to try
http://sqlfiddle.com/#!17/c6ac7/2
I feel that wrapping this functionality into an immutable function is prudent. This is a pure function, one that will not mutate data and one that returns the same result given the same input. Immutable functions marked as "immutable" have performance benefits.
By using a function we also benefit from abstraction. There is one source to update should this functionality need to improve in the future.
For more information about immutable functions see
https://www.postgresql.org/docs/10/static/sql-createfunction.html

regular expression to solve for the following

Example 1
asdk[wovkd'vk'psacxu5=205478499|205477661zamd;amd;a;d
Example 2
sadlmdlmdadsldu5=205478499|205477661|234567899amsd/samdamd
u5 can have multiple values separated by |
How can I capture all u5 values from a long string I have?

Below is for BigQuery Standard SQL
#standardSQL
WITH data AS (
SELECT 1 AS id, "asdk[wovkd'vk'psacxu5=205478499|205477661zamd;amd;a;d" AS junk UNION ALL
SELECT 2, "sadlmdlmdadsldu5=205478499|205477661|234567899amsd/samdamd"
)
SELECT id, SPLIT(REGEXP_EXTRACT(junk, r'(?i)u5=([\d|]*)'), '|') AS value
FROM data
with output as below
id value
1 205478499
205477661
2 205478499
205477661
234567899

ORACLE:SQL REGEXP_SUBSTR that returns the column value after 3rd semicolon and before the pipe whose value starts with D

ORACLE:SQL REGEXP_SUBSTR that returns the column value after 3rd semicolon and before the pipe whose value starts with D
example:
column value: 'D:5:testps:12345|blah blah/blah'
expected value: 12345
regex that would filter values which start with D and returns value after 3rd semicolon and before pipe

select column_value,
regexp_substr(column_value, '([^:]*:){3}([^|]*)\|', 1, 1, null, 2) as str
from (
select 'D:5:testps:12345|blah blah/blah' as column_value from dual union all
select 'XD:5:testps:12345|blahblah/blah' as column_value from dual
)
where column_value like 'D%'
;
COLUMN_VALUE STR
------------------------------- -----
D:5:testps:12345|blah blah/blah 12345

You can use regex_replace:
select case when col like 'D%'
then regexp_replace(col, '^([^:]*:){3}(\d+)\|.*', '\2') num
end
from t;
Produces:
12345
it produces null if the col doesn't start with D

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Extract Numbers from String - Custom - regex

Related

Reg Exp don't select if more than one group matches (multiple XOR)

Oracle REGEXP_SUBSTR will not match the dot character

extract all numbers in a string

regular expression to solve for the following

ORACLE:SQL REGEXP_SUBSTR that returns the column value after 3rd semicolon and before the pipe whose value starts with D

Categories

Resources