Oracle REGEXP_SUBSTR - SEMICOLON STRING EXTRACTION - regex

I have the input below and need the output shown. How can I get it? I tried different patterns but could not get it to work.
In brief: if any value has all three parts (1#2#3), one such value should be returned; otherwise the first value should be returned.
2#9#;2#37#65 -> 2#37#65
2#9#;2#37#65;2#37# -> 2#37#65
2#9#;2#37#65;2#37#;2#37#56 -> 2#37#65 or 2#37#56
2#37#65;2#99 -> 2#37#65
3#9#;3#37#65;3#37#36;2#37#56 -> 3#37#65 or 3#37#36 or 2#37#56
2#37#;2#99# -> 2#37# or 2#99# (in this case, any value)
I tried a few patterns, including the ones below, but none of them helped.
regexp_substr('2#9#;2#37#65;2#37#','#[^;]+',1)
SUBSTR(REGEXP_SUBSTR(SUBSTR(uo_filiere,1,INSTR(uo_filiere,';',1)-1), '#[^#]+$'),2)

You can use a REGEXP_REPLACE here:
REGEXP_REPLACE(uo_filiere, '^(.*;)?([0-9]+(#[0-9]+){2,}).*|^([^;]+).*', '\2\4')
See the regexp demo
Details:
^ - start of string
(.*;)? - an optional Group 1 capturing any text and then a ;
([0-9]+(#[0-9]+){2,}) - Group 2 (\2): one or more digits, and then two or more occurrences of # followed with one or more digits
.* - the rest of the string
| - or
^([^;]+).* - start of string, Group 4 capturing one or more chars other than ; and then any text till end of string.
The replacement is Group 2 + Group 4 values.
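To check the pattern's logic outside the database, here is a minimal sketch using Python's re module rather than Oracle. That is an assumption: the two engines differ in places, although this pattern only uses constructs both support. The function name pick_value is made up for illustration.

```python
import re

# Same pattern as the REGEXP_REPLACE call above, checked with Python's re module.
PATTERN = re.compile(r'^(.*;)?([0-9]+(#[0-9]+){2,}).*|^([^;]+).*')

def pick_value(s):
    # Only one of groups 2 and 4 participates in a match; Python 3.5+
    # substitutes an empty string for the group that did not participate.
    return PATTERN.sub(r'\2\4', s)

for sample in ('2#9#;2#37#65', '2#37#65;2#99', '2#37#;2#99#'):
    print(sample, '->', pick_value(sample))
```

Because (.*;)? is greedy, this returns the last three-part value when there are several (for example 2#37#56 for 2#9#;2#37#65;2#37#;2#37#56), which the question allows.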

Here is a simple-minded way to solve this. It may prove more efficient than other approaches, given the particular nature of the problem.
First, use a regular expression to find the first token that has all three parts. This part of the solution should be the most efficient approach for those strings that do have a three-part token, and it performs work that must be performed on all input strings in any case.
In the second part, wrap within nvl - if no three-part token is found, select the first token regardless of how many parts are present. This part uses only substr and instr in a trivial manner, so it should be very fast too.
Here's the query, run on a few more sample inputs to test those cases too.
with
sample_data (uo_filiere) as (
select '2#9#;2#37#65' from dual union all
select '2#9#;2#37#65;2#37#' from dual union all
select '2#9#;2#37#65;2#37#;2#37#56' from dual union all
select '2#37#65;2#99' from dual union all
select '3#9#;3#37#65;3#37#36;2#37#56' from dual union all
select '2#37#;2#99#' from dual union all
select '1#22#333' from dual union all
select '33#444#' from dual
)
select uo_filiere,
nvl(regexp_substr(uo_filiere, '(;|^)(([^;#]+#){2}[^;]+)', 1, 1, null, 2)
, substr(uo_filiere, 1, instr(uo_filiere || ';', ';') - 1)
) as first_value
from sample_data
;
UO_FILIERE                   FIRST_VALUE
---------------------------- ----------------------------
2#9#;2#37#65                 2#37#65
2#9#;2#37#65;2#37#           2#37#65
2#9#;2#37#65;2#37#;2#37#56   2#37#65
2#37#65;2#99                 2#37#65
3#9#;3#37#65;3#37#36;2#37#56 3#37#65
2#37#;2#99#                  2#37#
1#22#333                     1#22#333
33#444#                      33#444#
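The two-step logic (first three-part token, else first token) can also be sketched in Python for quick experimentation. This is an illustration with Python's re module, not Oracle code, and the name first_value is only borrowed from the column alias above.

```python
import re

# A token preceded by ';' or the start of the string, made of two
# '#'-terminated parts followed by a non-empty third part.
THREE_PART = re.compile(r'(?:;|^)((?:[^;#]+#){2}[^;]+)')

def first_value(s):
    m = THREE_PART.search(s)
    if m:
        return m.group(1)        # first token that has all three parts
    return s.split(';', 1)[0]    # fallback: the first token, whatever it is

for sample in ('2#9#;2#37#65;2#37#;2#37#56', '2#37#;2#99#', '33#444#'):
    print(sample, '->', first_value(sample))
```

The fallback here plays the role of the substr/instr branch inside nvl.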

Related

Oracle SQL String Manipulation

My field contains short codes that I want to access, such as C-COR3.
The issue is that some records have additional information (F and H with numbers). An example is C-COR3 F1.54H19, where I only care about C-COR3. I want to ignore the "F" and anything after it.
The code below works, but only if I hard-code the full F1.54H19. I want to use wildcards to generalize this to other values that have F and H info in the field (e.g. C-R3 F0.18H18 -> C-R3, or C-COR3 F0.23H8.5 -> C-COR3). Note the varying lengths of the short code.
/* Translates C-COR3 F1.54H19 to C-COR3. */
select distinct SUBSTR(lud_code_short,1,INSTR(lud_code_short, 'F1.54H19')-2)
from rep_dba.mytable
I've read that SUBSTR does not allow wildcards, but I have had no luck trying my hand at REGEXP_INSTR and REGEXP_SUBSTR instead. Any help appreciated.
Assuming that the "code" is always the first continuous sequence of non-space characters (and that there are no leading spaces - if there are, that's easy to handle), you could do something like this. Note the str || ' ' in the call to instr() - that takes care of the case when the input string has no spaces in it to begin with. Also notice the last input - since there are no spaces anywhere, the output is the same as the input. (Showing that if the "code" is not always separated from the "additional information" by at least one space, the solution would not work.)
with
test_data (str) as (
select 'C-COR3 F14H2.5' from dual union all
select 'C-AB3' from dual union all
select null from dual union all
select 'C-AB2F14H2.5' from dual
)
select str, substr(str, 1, instr(str || ' ', ' ') - 1) as code
from test_data
;
STR            CODE
-------------- --------------
C-COR3 F14H2.5 C-COR3
C-AB3          C-AB3
C-AB2F14H2.5   C-AB2F14H2.5
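The "append a space before searching" trick is not specific to SQL. Below is a small Python sketch of the same idea; the function name first_code is hypothetical, and None stands in for a SQL NULL.

```python
def first_code(s):
    """Return everything before the first space, or the whole string if there is none."""
    if s is None:                 # mirror the NULL row in the test data
        return None
    # Appending a space guarantees index() finds one, just like
    # instr(str || ' ', ' ') in the query above.
    return s[:(s + ' ').index(' ')]

for sample in ('C-COR3 F14H2.5', 'C-AB3', None, 'C-AB2F14H2.5'):
    print(repr(sample), '->', repr(first_code(sample)))
```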
Try using regexp_replace in your query, as below:
SELECT
regexp_replace('C-COR3 F14H2.5', '(C-[[:alnum:]]+) [FH].*', '\1')
FROM dual;

regex to find alphanumeric combination of number+text only no special

I have to find a fixed pattern of 4-character alphanumeric tokens in the input string.
I have tried numeric-only and alnum classes, but I can't figure out how to limit the match to letter+digit combinations only, with no special characters and no purely numeric tokens.
WITH tab AS (
SELECT '''1234,4565,1212,7658''' AS str FROM dual UNION ALL
SELECT '''abce,dddd,jdjd,rdrd,dder''' AS str FROM dual UNION ALL
SELECT '''123m,d565,1dd2,7fur' AS str FROM dual UNION ALL
SELECT '''1m#4,4u#5,1212,abcd' AS str FROM dual UNION ALL
SELECT '''abcd,456a,d212,7658''' AS str FROM dual UNION ALL
SELECT '''1234,4565,1212'',7658''' AS str FROM dual
)
SELECT * FROM tab t
WHERE REGEXP_LIKE(t.str ,'^['']([[:alnum:]]{4},)+([[:alnum:]]{4})['']$')
AND NOT REGEXP_LIKE(t.str ,'^['']([[:digit:]]{4},)+([[:digit:]]{4})['']$')
Expected
abce,dddd,jdjd,rdrd,dder
123m,d565,1dd2,7fur
Not expected
1m#4,4u#5,1212,abcd' --since this one has only 'abcd' valid but not others
abcd,456a,d212,7658 --since this one has '7658' which is invalid but others are
1234,4565,1212 --all numeric should be ignored
A regular expression similar to this will capture what you have outlined in words:
^(([[:alpha:]][[:alnum:]]{3}|[[:alnum:]][[:alpha:]][[:alnum:]]{2}|[[:alnum:]]{2}[[:alpha:]][[:alnum:]]|[[:alnum:]]{3}[[:alpha:]]),)*([[:alpha:]][[:alnum:]]{3}|[[:alnum:]][[:alpha:]][[:alnum:]]{2}|[[:alnum:]]{2}[[:alpha:]][[:alnum:]]|[[:alnum:]]{3}[[:alpha:]])$
SELECT * FROM tab WHERE REGEXP_LIKE(str, '^(([[:alpha:]][[:alnum:]]{3}|[[:alnum:]][[:alpha:]][[:alnum:]]{2}|[[:alnum:]]{2}[[:alpha:]][[:alnum:]]|[[:alnum:]]{3}[[:alpha:]]),)*([[:alpha:]][[:alnum:]]{3}|[[:alnum:]][[:alpha:]][[:alnum:]]{2}|[[:alnum:]]{2}[[:alpha:]][[:alnum:]]|[[:alnum:]]{3}[[:alpha:]])$', 'i');
However I can't work out your use of single quotes in your example, so you'll need to modify this to handle your quotes.
I would recommend updating your question to be more clear about quotes.
Also note that I'm not especially familiar with PL/SQL; this was written with MySQL in mind.
All you need in the second REGEXP is to reject rows that contain any character that is neither alphanumeric nor a comma, as well as rows that contain a group of four digits. This is necessary because Oracle does not support positive lookahead, according to this web site.
The solution that I propose is...
SELECT * FROM tab t
WHERE REGEXP_LIKE(t.str ,'^(([[:alnum:]]{4}),)*([[:alnum:]]{4})$')
AND NOT REGEXP_LIKE(t.str ,'[^[:alnum:],]|[0-9]{4}');
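Here is a rough Python sketch of the same "accept the shape, then reject the bad cases" idea, for anyone who wants to try the two checks quickly. It makes some assumptions: the surrounding single quotes are already stripped, [[:alnum:]] is approximated by ASCII letters and digits, and the name is_valid is made up.

```python
import re

# Check 1: comma-separated tokens of exactly four alphanumeric characters.
SHAPE = re.compile(r'(?:[A-Za-z0-9]{4},)*[A-Za-z0-9]{4}')
# Check 2: anything that is neither alphanumeric nor a comma, or four digits in a row.
BAD = re.compile(r'[^A-Za-z0-9,]|[0-9]{4}')

def is_valid(s):
    return SHAPE.fullmatch(s) is not None and BAD.search(s) is None

for sample in ('abce,dddd,jdjd,rdrd,dder', '123m,d565,1dd2,7fur',
               '1234,4565,1212,7658', 'abcd,456a,d212,7658'):
    print(sample, '->', is_valid(sample))
```

Since every token is exactly four characters long, four consecutive digits can only occur when a whole token is numeric, which is why the second check is enough.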

Vertica REGEXP_SUBSTR use /g flag

I am trying to extract all occurrences of a word before '=' in a string. I tried the regex '/\w+(?=\=)/g', but it returns null. When I remove the leading '/' and the trailing '/g', it returns only one occurrence, which is why I need the global flag. Any suggestions?
As Wiktor pointed out, by default, you only get the first string in a REGEXP_SUBSTR() call. But you can get the second, third, fourth, etc.
Embedded into SQL, you need to treat regular expressions differently from the way you would treat them in perl, for example. The pattern is just the pattern, modifiers go elsewhere, you can't use $n to get the n-th captured sub-expression, and you need to proceed in a specific way to get the n-th match of a pattern, etc.
The trick is to CROSS JOIN your queried table with an in-line created index table, consisting of as many consecutive integers as you expect occurrences of your pattern - and a few more for safety. And Vertica's REGEXP_SUBSTR() call allows for additional parameters to do that. See this example:
WITH
-- one exemplary input row; concatenating substrings for
-- readability
input(s) AS (
SELECT 'DRIVER={Vertica};COLUMNSASCHAR=1;CONNECTIONLOADBALANCE=True;'
||'CONNSETTINGS=set+search_path+to+public;DATABASE=sbx;'
||'LABEL=dbman;PORT=5433;PWD=;SERVERNAME=127.0.0.1;UID=dbadmin;'
)
,
-- an index table to CROSS JOIN with ... maybe you need more integers ...
loop_idx(i) AS (
SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
UNION SELECT 6
UNION SELECT 7
UNION SELECT 8
UNION SELECT 9
UNION SELECT 10
)
,
-- the query containing the REGEXP_SUBSTR() call
find_token AS (
SELECT
i -- the index from the in-line index table, needed
-- for ordering the outermost SELECT
, REGEXP_SUBSTR (
s -- the input string
, '(\w+)=' -- the pattern - a word followed by an equal sign; capture the word
, 1 -- start from pos 1
, i -- the i-th occurrence of the match
, '' -- no modifiers to regexp
, 1 -- the first and only sub-pattern captured
) AS token
FROM input CROSS JOIN loop_idx -- the CROSS JOIN with the in-line index table
)
-- the outermost query filtering the non-matches - the empty strings - away...
SELECT
token
FROM find_token
WHERE token <> ''
ORDER BY i
;
The result will be one row per found pattern:
token
DRIVER
COLUMNSASCHAR
CONNECTIONLOADBALANCE
CONNSETTINGS
DATABASE
LABEL
PORT
PWD
SERVERNAME
UID
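For comparison, here is how the "i-th occurrence" idea looks in a language with a built-in global match. This is a Python sketch, not Vertica SQL; the helper names all_tokens and nth_token are made up, and nth_token returns an empty string when there is no i-th match, to mirror the filter in the outermost query.

```python
import re

S = ('DRIVER={Vertica};COLUMNSASCHAR=1;CONNECTIONLOADBALANCE=True;'
     'CONNSETTINGS=set+search_path+to+public;DATABASE=sbx;'
     'LABEL=dbman;PORT=5433;PWD=;SERVERNAME=127.0.0.1;UID=dbadmin;')

# A word followed by an equal sign; only the word is captured.
TOKEN = re.compile(r'(\w+)=')

def all_tokens(s):
    # findall() plays the role of the /g flag: it returns every match.
    return TOKEN.findall(s)

def nth_token(s, i):
    # 1-based occurrence, like the fourth argument of REGEXP_SUBSTR().
    tokens = all_tokens(s)
    return tokens[i - 1] if 1 <= i <= len(tokens) else ''

print(all_tokens(S))
print(nth_token(S, 4))
```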
You can do all sorts of things in modern SQL - but you need to stick to the SQL and to the relational paradigm - that's all ...
Happy playing ...
Marco

Find a string with or without space in oracle using like or regex

I have a string containing a specific 'winner code' that needs to be matched exactly, but in the database some records contain spaces and extra characters within the winner code, and a LIKE query only returns the rows that match its pattern literally. I want a single, simple query that returns every record containing the winner code. My query and details are below.
Winner code - أ4 ب3 ج10
Records with spaces - أ4 ب 3 ج 10
Records with extra character - (أ(4)
ب(3)
ج(10
My Query -
SELECT COLUMN_NAME
FROM TABLE_NAME
WHERE
((COLUMN_NAME LIKE '%أ4%ب3%ج10%') or(COLUMN_NAME LIKE '%أ 4%ب 3%ج 10%'))
The above query returns the records with and without spaces, since they match the criteria.
Thanks
If I understand your need correctly, you may try:
with test(str) as (
select '10X3Y4Z' from dual union all
select '10 X 3 Y 4 Z' from dual union all
select '(10)X(3)Y(4)Z' from dual union all
select '10#X3Y4 Z' from dual union all
select '10 # X3Y4Z' from dual )
select str
from test
where regexp_instr(str, '10[ |\)]{0,1}X[ |\(]{0,1}3[ |\)]{0,1}Y[ |\(]{0,1}4[ |\)]{0,1}Z') != 0
This matches your "winner code" (I used different characters to simplify my test) even if the numbers are surrounded by '()' or a single space.
This can be rewritten more compactly, but I believe this form is clear enough; it uses bracket expressions like [ |\)]{0,1} to match zero or one occurrence of a space or a parenthesis. Note that | is a literal character inside brackets, not an alternation, so as written the class also accepts a literal |; [ )]? would be a tighter equivalent.
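A quick way to experiment with this kind of "optional separator" pattern is a small Python sketch like the one below. It is not Oracle code, it uses the tighter classes [ )]? and [ (]? instead of the original ones (my substitution), and the name has_winner_code is hypothetical.

```python
import re

# Each number may be followed by one space or ')', and each letter
# by one space or '('; otherwise the code must appear in order.
WINNER = re.compile(r'10[ )]?X[ (]?3[ )]?Y[ (]?4[ )]?Z')

def has_winner_code(s):
    return WINNER.search(s) is not None

for sample in ('10X3Y4Z', '10 X 3 Y 4 Z', '(10)X(3)Y(4)Z',
               '10#X3Y4 Z', '10 # X3Y4Z'):
    print(sample, '->', has_winner_code(sample))
```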

Remove substrings that vary in value in Oracle

I have a column in Oracle which can contain up to 5 separate values, each separated by a '|'. Any of the values can be present or missing. Here are some examples of how the data might look:
100-1
10-3|25-1|120/240
15-1|15-3|15-2|120/208
15-1|15-3|15-2|120/208|STA-2
112-123|120/208|STA-3
The values are arbitrary except for the order. The numerical values separated by dashes always come first. There can be 1 to 3 of these values present. The numerical values separated by a slash (if it is present) is next. The string, 'STA', and a numerical value separated by a dash is always last, if it is present.
What I would like to do is reformat this column to only ever include the first three possible values, those being the numerical values separated by dashes. Afterwards, I want to replace the second number in each value (the number after the dash) using the following mapping:
1 = A
2 = B
3 = C
I would also like to remove the dash afterwards, but not the '|' that separates the values unless there is a trailing '|'.
To give you an idea, here's how the values at the beginning of the post would look after the reformatting:
100A
10C|25A
15A|15C|15B
15A|15C|15B
112ABC
I'm thinking this can be done with regex expressions but it's got me a little confused. Does anyone have a solution?
If I had to solve this problem, I would approach it as follows.
SELECT
REGEXP_REPLACE(column,'\|\d+\/\d+(\|STA-\d+)?',''),
REGEXP_REPLACE(column,'(\d+)-(1)([^\d])','\1A\3'),
REGEXP_REPLACE(column,'(\d+)-(2)([^\d])','\1B\3'),
REGEXP_REPLACE(column,'(\d+)-(3)([^\d])','\1C\3'),
REGEXP_REPLACE(column,'(\d+)-(123)([^\d])','\1ABC')
FROM table;
Explanation: Let us break down each REGEXP_REPLACE statement one by one.
REGEXP_REPLACE(column,'\|\d+\/\d+(\|STA-\d+)?','')
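This replaces the trailing part, such as |120/208|STA-2, with an empty string so that further processing is easier.
Finding the match was easy, but replacing 1 with A, 2 with B and 3 with C in a single call was not possible (as far as I know), so I did those matches and replacements separately.
In each regex from the second statement onward, (\d+)-(yourNumber)([^\d]), the first group is the number before the -, yourNumber is 1, 2, 3 or 123, and the last group is the non-digit character that follows (such as |). Because ([^\d]) must consume a character, a value at the very end of the string is not matched by these patterns as written.
The replacement is then chosen according to yourNumber.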
All demos here from version 1 to 5.
Note: I have only done the replacement for the combinations of yourNumber that appear in the question. You can do likewise for other combinations.
You can do this in a single expression, although you could also write a simple function for it:
SELECT str, REGEXP_REPLACE(str,'(\|\d+\/\d+)?(\|STA-\d+)?','') cut
, REGEXP_REPLACE(REGEXP_REPLACE(str,'(\|\d+\/\d+)?(\|STA-\d+)?',''), '(\-)([1,2]*)(3)([1,2]*)', '\1\2C\4') rep3toC
, REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(str,'(\|\d+\/\d+)?(\|STA-\d+)?',''), '(\-)([1,2]*)(3)([1,2]*)', '\1\2C\4'), '(\-)([1,C]*)(2)([1,C]*)', '\1\2B\4') rep2toB
, REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(str,'(\|\d+\/\d+)?(\|STA-\d+)?',''), '(\-)([1,2]*)(3)([1,2]*)', '\1\2C\4'), '(\-)([1,C]*)(2)([1,C]*)', '\1\2B\4'), '(\-)([B,C]*)(1)([B,C]*)', '\1\2A\4') rep1toA
, REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(REGEXP_REPLACE(str,'(\|\d+\/\d+)?(\|STA-\d+)?',''), '(\-)([1,2]*)(3)([1,2]*)', '\1\2C\4'), '(\-)([1,C]*)(2)([1,C]*)', '\1\2B\4'), '(\-)([B,C]*)(1)([B,C]*)', '\1\2A\4'), '-', '') "rep-"
FROM (
SELECT '100-1' str FROM dual UNION
SELECT '10-3|25-1|120/240' str FROM dual UNION
SELECT '15-1|15-3|15-2|120/208' str FROM dual UNION
SELECT '15-1|15-3|15-2|120/208|STA-2' str FROM dual UNION
SELECT '112-123|120/208|STA-3' FROM dual
) tab
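To make the target transformation concrete, here is a minimal Python sketch of the whole reformatting (keep the dashed numeric values, map 1/2/3 to A/B/C, drop the dash). It is not Oracle code; the function name reformat is made up, and it assumes the digits after the dash are always 1, 2 or 3 as in the question.

```python
import re

DASHED = re.compile(r'(\d+)-([123]+)')     # e.g. 15-3 or 112-123
TO_LETTER = str.maketrans('123', 'ABC')    # 1 -> A, 2 -> B, 3 -> C

def reformat(value):
    kept = []
    for token in value.split('|'):
        m = DASHED.fullmatch(token)
        if m:                               # skips 120/208 and STA-2 style tokens
            kept.append(m.group(1) + m.group(2).translate(TO_LETTER))
    return '|'.join(kept)                   # join() never leaves a trailing '|'

for sample in ('100-1', '10-3|25-1|120/240',
               '15-1|15-3|15-2|120/208|STA-2', '112-123|120/208|STA-3'):
    print(sample, '->', reformat(sample))
```

Splitting on '|' first avoids the nested REGEXP_REPLACE calls, at the cost of needing a function rather than a single expression.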