regexp_substr skips over empty positions - regex

With this code to return the nth value in a pipe delimited string...
regexp_substr(int_record.interfaceline, '[^|]+', 1, i)
it works fine when all values are present
Mike|Male|Yes|20000|Yes so the 3rd value is Yes (correct)
but if the string is
Mike|Male||20000|Yes, the 3rd value is 20000 (not what I want)
How can I tell the expression to not skip over the empty values?
TIA
Mike

The regexp_substr works this way:
If occurrence is greater than 1, then the database searches for the
second occurrence beginning with the first character following the
first occurrence of pattern, and so forth. This behavior is different
from the SUBSTR function, which begins its search for the second
occurrence at the second character of the first occurrence.
So the pattern [^|] will look for NON pipes, meaning it will skip consecutive pipes ("||") looking for a non-pipe char.
You might try:
select trim(regexp_substr(replace('A|test||string', '|', '| '), '[^|]+', 1, 4)) from dual;
This will replace a "|" with a "| " and allow you to match based on the pattern [^|]

I had a similar problem with a CSV file thus my separator was the semicolon (;)
So I started with an expression like the following one:
select regexp_substr(';2;;4;', '[^;]+', 1, i) from dual
letting i iterate from 1 to 5.
And of course it didn't work either.
To get the empty parts I just say they could be at the beginning (^;), or in the middle (;;) or at the end (;$). And or-ing all of this together gives:
select regexp_substr(';2;;4;', '[^;]+|^;|;;|;$', 1, i) from dual
And believe me or not: testing for i from 1 to 5 it works!
But let's not forgot the last details: with this approach you get ; for fields that are empty originally.
The next lines are showing how to get rid of them easily replacing them by empty strings(nulls):
with stage1 as (
select regexp_substr(';2;;4;', '[^;]+|^;|;;|;$', 1, 2) as F from dual
)
select case when F like '%;' then '' else F end from stage1

OK. This should be the best solution for you.
SELECT
REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
'^([^|]*\|){2}([^|]*).*$',
'\2' )
TEXT
FROM
DUAL;
So for your problem
SELECT
REGEXP_REPLACE ( INCOMINGSTREAMOFSTRINGS,
'^([^|]*\|){N-1}([^|]*).*$',
'\2' )
TEXT
FROM
DUAL;
--INCOMINGSTREAMOFSTRINGS is your complete string with delimiter
--You should pass n-1 to obtain nth position
ALTERNATE 2:
WITH T AS (SELECT 'Mike|Male||20000|Yes' X FROM DUAL)
SELECT
X,
REGEXP_REPLACE ( X,
'^([^|]*).*$',
'\1' )
Y1,
REGEXP_REPLACE ( X,
'^[^|]*\|([^|]*).*$',
'\1' )
Y2,
REGEXP_REPLACE ( X,
'^([^|]*\|){2}([^|]*).*$',
'\2' )
Y3,
REGEXP_REPLACE ( X,
'^([^|]*\|){3}([^|]*).*$',
'\2' )
Y4,
REGEXP_REPLACE ( X,
'^([^|]*\|){4}([^|]*).*$',
'\2' )
Y5
FROM
T;
ALTERNATE 3:
SELECT
REGEXP_SUBSTR ( REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
'\|',
';' ),
'(^|;)([^;]*)',
1,
1,
NULL,
2 )
AS FIRST,
REGEXP_SUBSTR ( REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
'\|',
';' ),
'(^|;)([^;]*)',
1,
2,
NULL,
2 )
AS SECOND,
REGEXP_SUBSTR ( REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
'\|',
';' ),
'(^|;)([^;]*)',
1,
3,
NULL,
2 )
AS THIRD,
REGEXP_SUBSTR ( REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
'\|',
';' ),
'(^|;)([^;]*)',
1,
4,
NULL,
2 )
AS FOURTH,
REGEXP_SUBSTR ( REGEXP_REPLACE ( 'Mike|Male||20000|Yes',
'\|',
';' ),
'(^|;)([^;]*)',
1,
5,
NULL,
2 )
AS FIFTH
FROM
DUAL;

You can use the following :
with l as (select 'Mike|Male||20000|Yes' str from dual)
select regexp_substr(str,'(".*"|[^|]*)(\||$)',1,level,null,1)
from dual,l
where level=3/*use any position*/ connect by level <= regexp_count(str,'([^|]*)(\||$)')

As an complement to #tbone response...
Oddly, my oracle didn't recognize the blank space character in this list: [^|]
In this cases can be confusing and hard to realize what is going wrong.
Try with this regex ([^|]| )+. Also, to detect a posible first blank item, it is better to replace the separator with the space blank before, and not after it:
' |'
trim(regexp_substr(replace('A|test||string', '|', ' |'), '([^|]| )+', 1, 4))

Related

Oracle PLSQL regexp_substr separate comma separated string with double quotes

I've seen examples of how to separate comma-separated strings into rows like this:
select distinct id, trim(regexp_substr(value,'[^,]+', 1, level) ) value, level
from tbl1
connect by regexp_substr(value, '[^,]+', 1, level) is not null
order by id, level;
but, my question is, how do I do this on double quote and comma delimited strings?
Ex: the above works for strings like "1,2,3,4,5,6,7", but what about "1","2","3","4,5","6,7,8","9" so that the rows end up like:
1
2
3
4,5
6,7,8
9
edit: I'm on Oracle 11.2.0.4, 11gR2.
There is a hack. Replace the pattern "," with # and use it in regular expression.It works like a charm.
Input String : "1","2","3","4,5","6,7,8","9","Ant,B","Gradle","E,F","G"(Can be number/Character doesn't matter)
with temp as (
select replace(replace('"1","2","3","4,5","6,7,8","9","Ant,B","Gradle","E,F","G"','","','#'),'"') Colmn from dual
)
SELECT trim(regexp_substr(str, '[^#]+', 1, level)) str
FROM (SELECT Colmn str FROM temp) t
CONNECT BY instr(str, '#', 1, level - 1) > 0
Output :
STR
1
2
3
4,5
6,7,8
9
Ant,B
Gradle
E,F
G
10 rows
Refer DBFiddle link for demo.
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=d09c326f614d10f5d3c407fdfd3a44c5
here is the another solution.it must be working all string with have number.
i used number values index as base.
with temp as (
select '"1","2","3",x"4,5","6,73,8","9"' Colmn from dual
)
SELECT regexp_substr(Colmn, '\d{1}', REGEXP_INSTR(Colmn, '\d{1}', REGEXP_INSTR(Colmn, '\d{1}') ,level),1 ) from temp
CONNECT BY REGEXP_COUNT (Colmn,'\d{1}')+1> level

Regular expression - Remove special characters except single white space

From stack overflow, I got the standard reg expression
to eliminate -
a) special characters
b) digits
c) more than 2 spaces to single space
to include -
d) - (hyphen)
e) ' (single quote)
SELECT ID, REGEXP_REPLACE(REGEXP_REPLACE(forenames, '[^A-Za-z-]', ' '),'\s{2,}',' ') , REGEXP_REPLACE(REGEXP_REPLACE(surname, '[^A-Za-z-]', ' '),'\s{2,}',' ') , forenames, surname from table1;
Instead of 2 functions how to get the result in single function?
to include '(single quote) \' is not working in regexp_replace.
Thanks.
Oracle Setup:
CREATE TABLE test_data ( id, value ) AS
SELECT 1, '123a45b£$- ''c45d#{e''' FROM DUAL
Query:
SELECT ID,
REGEXP_REPLACE(
value,
'[^a-zA-Z'' -]| +( )',
'\1'
)
FROM test_data
Output:
ID | REGEXP_REPLACE(VALUE,'[^A-ZA-Z''-]|+()','\1')
-: | :--------------------------------------------
1 | ab- 'cde'
db<>fiddle here

Regex - Don't match nulls after words

I'm splitting a string on pipes (|) and the regex [^|]* is working fine when there are missing fields, but it's matching null characters after words:
GARBAGE|||GA|30604
yields matches
GARBAGE, null, null, null, GA, null, 30604, null
I've also tried [^|]+ which yields matches
GARBAGE, GA, 30604
EDIT: What I want is GARBAGE, null, null, GA, 30604. And |||| would yield matches null, null, null, null, null.
I'm referencing matches by index, so how can I fix the regex so it matches field by field, including nulls if there is no other data in the field?
This is how split works. You should use a split type function.
There is always a bias split uses.
Your case is simple in that it splits on a single character, in normal cases a regex is not needed.
And in this case, using a regex, the bias cannot be achieved without lookahead/lookbehind.
# (?:^|(?<=\|))([^|]*)(?=\||$)
(?:
^ # BOS
| # or
(?<= \| ) # Pipe behind
)
( [^|]* ) # (1), Optional non-pipe chars
(?=
\| # Pipe ahead
| # or
$ # EOS
)
While not exactly what you want, perhaps you could turn it into rows and work with it that way:
select nvl(regexp_substr( str, '([^|]*)\|{0,1}', 1, level, 'i', 1 ), 'null') part
from ( select 'GARBAGE|||GA|30604' str from dual )
connect by level <= regexp_count( str, '\|' ) + 1;
Specify which row (field) by adding a where clause where level equals the row (field) you want:
select nvl(regexp_substr( str, '([^|]*)\|{0,1}', 1, level, 'i', 1 ), 'null') part
from ( select 'GARBAGE|||GA|30604' str from dual )
where level = 4
connect by level <= regexp_count( str, '\|' ) + 1;

Query to replace a string with matching pattern

I have a string like below.
'comp' as "COMPUTER",'ms' as "MOUSE" ,'keybr' as "KEYBOARD",'MONT' as "MONITOR",
Is it possible to write a query so that i will get the result as
'comp' ,'ms' ,'keybr' ,'MONT' ,
I can replace the string "as" with empty string using REPLACE query.
But how do I remove the string inside double quote?
Can anyone help me doing this?
Thanks in advance.
select replace(
regexp_replace('''comp'' as "COMPUTER"'
, '(".*")'
,null)
,' as '
,null)
from dual
The regex '(".*")' selects text whatever in double quotes.
EDIT:
the regex replaced the entire length of the matching pattern. So, we might need to tokenise the string first using comma as delimiter and apply the regex. Later join it.(LISTAGG)
WITH str_tab(str1, rn) AS
(SELECT regexp_substr(str, '[^,]+', 1, LEVEL), -- delimts
LEVEL
FROM (SELECT '''comp'' as "computer",''comp'' as "computer"' str
FROM dual) tab
CONNECT BY LEVEL <= LENGTH(str) - LENGTH(REPLACE(str, ',')) + 1)
SELECT listagg(replace(
regexp_replace(str1
, '(".*")'
,null)
,' as '
,null), ',') WITHIN GROUP (ORDER BY rn) AS new_text
FROM str_tab;
EDIT2:
A cleaner approach from #EatAPeach
with x(y) as (
select q'<'comp' as "COMPUTER",'ms' as "MOUSE" ,'keybr' as "KEYBOARD",'MONT' as "MONITOR",>'
from dual
)
select y,
regexp_replace(y, 'as ".*?"' ,null)
from x;

how to get out string oracle regex

I have the following string my trying get out the 1111111 and 33333333333 with out the |
character
SELECT regexp_substr('7|1111111|2222222|33333333333|0||20140515|||false|0|0|0|0|0|','*[|]*[|][0-9]*')FROM dual
Using REGEXP_REPLACE may be a bit simpler;
SELECT REGEXP_REPLACE('7|1111111|2222222|33333333333|0||20140515|||false|0|0|0|0|0|',
'^([^|]*[|]){1}([^|]*).*$', '\2') FROM dual;
> 1111111
SELECT REGEXP_REPLACE('7|1111111|2222222|33333333333|0||20140515|||false|0|0|0|0|0|',
'^([^|]*[|]){3}([^|]*).*$', '\2') FROM dual;
> 33333333333
You can choose column by choosing how many pipes to skip in the {1} part.
A simple SQLfiddle to test with.
A short explanation of the regexp;
([^|]+[|]){3} -- Matches 3 groups of {optional characters}{pipe}
(\d*) -- Matches the next digit group (the one we want)
.* -- Matches the rest of the expression
What we want is the second paranthesized group, that is, we replace the whole string by the back reference \2.
Because "|" separators always present it's simpler to extract fields with simple substring function rather than using regular expressions.
Just find positions of corresponding separators in source string and extract content between them:
with test_data as (
select
'7|1111111|2222222|33333333333|0||20140515|||false|0|0|0|0|0|ABC' as s,
8 as field_number -- test 1, 3, 8, 10 and 16
from dual
)
select
field_number,
substr(
s,
decode( field_number,
1,1,
instr(s,'|',1,field_number - 1) + 1
),
(
decode( instr(s,'|',1,field_number),
0, length(s)+ 1,
instr(s,'|',1,field_number)
)
-
decode( field_number,
1, 1,
instr(s,'|',1,field_number - 1) + 1
)
)
) as field_value
from
test_data
SQLFiddle
This variant works with empty fields, non-numeric fields and so on.
Possible simplification with appending additional separators to the start and the end of the string:
with test_data as (
select
(
'|' ||
'7|1111111|2222222|33333333333|0||20140515|||false|0|0|0|0|0|ABC' ||
'|'
) as s, -- additional separators appended before and after original string
10 as field_number -- test 1, 3, 8, 10 and 16
from dual
)
select
field_number,
substr(
s,
instr(s, '|', 1, field_number) + 1,
(
instr(s, '|', 1, field_number + 1)
-
(instr(s, '|', 1, field_number) + 1)
)
) as field_value
from
test_data
;
SQLFiddle