Oracle 11g: Substring between nth & mth occurrence - regex

I have to SELECT a substring from a table between nth and mth occurrences of a special character (say -).
For example: if the column data is 'a-b-c-d-e-f-g-h', n is 2 and m is 5, my select statement should return 'c-d-e'.
I tried various regex combinations but I think '\K' cannot be used.
Please help.

You might want to try using instr like this:
with PARAM as (select 'a-b-c-d-e-f-g-h' as S,
'-' as D, 2 as N, 5 as M from dual)
select -- start just after the Nth delimiter; the length runs up to the Mth delimiter
substr(S, instr(S, D, 1, N) + 1,
instr(S, D, 1, M) - instr(S, D, 1, N) - 1) as RANGE
from PARAM;
The with statement is just there to make the expression clearer.
S is the input string. D is the delimiter.
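For readers who want to check the position arithmetic outside the database, here is a minimal TypeScript sketch of the same idea. The helper names nthIndex and between are hypothetical (they are not part of the answer), and the sketch assumes a single-character delimiter, as in the question.

```typescript
// Hypothetical helper: 1-based position of the nth occurrence of `delim` in `s`,
// or 0 when there are fewer than n occurrences (the convention Oracle's instr uses).
// Assumption: `delim` is a single character.
function nthIndex(s: string, delim: string, n: number): number {
  let pos = -1;
  for (let i = 0; i < n; i++) {
    pos = s.indexOf(delim, pos + 1);
    if (pos === -1) return 0;
  }
  return pos + 1;
}

// Text strictly between the nth and the mth delimiter.
// Returns "" when either delimiter is missing.
function between(s: string, delim: string, n: number, m: number): string {
  const start = nthIndex(s, delim, n); // 1-based position of the nth delimiter
  const end = nthIndex(s, delim, m);   // 1-based position of the mth delimiter
  if (start === 0 || end === 0) return "";
  // As a 0-based index, `start` is the first character after the nth delimiter,
  // and `end - 1` is the 0-based index of the mth delimiter (exclusive bound).
  return s.slice(start, end - 1);
}

const rangeExample = between("a-b-c-d-e-f-g-h", "-", 2, 5); // "c-d-e"
```

Using fields of different lengths, for example 'aaaa-b-c-d-e-f', is a useful check that the length is computed from both delimiter positions rather than from the field sizes.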
If you want to use regular expressions then you might try this:
with PARAM as (select 'a-b-c-d-e-f-g-h' as S,
'-' as D, 2 as N, 5 as M from dual)
select -- ^([^D]*D){N}([^D]*(D[^D]*){M-N-1}).*$
regexp_replace(S, '^([^'||D||']*'||D||'){'||N||'}([^'||D||']*('||D||'[^'||D||']*){'||(M-N-1)||'}).*$', '\2') as RANGE
from PARAM;
The regular expression first skips N groups of text followed by a delimiter, ([^D]*D){N}, and then captures as group \2 one piece of text followed by M-N-1 groups of delimiter plus text, ([^D]*(D[^D]*){M-N-1}).
In the regex approach, a delimiter that has a special meaning in regular expressions (such as | or .) has to be escaped.
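The escaping point can be illustrated with a small TypeScript sketch that builds the same pattern from a delimiter. The function names escapeRegExp and betweenRegex are made up for this illustration, and the sketch assumes a single-character delimiter so that it can be placed inside a character class.

```typescript
// Escape characters that are special in a regular expression so that the
// delimiter is matched literally (for example "|" or ".").
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds ^([^D]*D){N}([^D]*(D[^D]*){M-N-1}).*$ and returns group 2.
// Assumption: `delim` is a single character and n < m.
// If the pattern does not match, the input is returned unchanged,
// which is also how a non-matching regexp_replace behaves.
function betweenRegex(s: string, delim: string, n: number, m: number): string {
  const d = escapeRegExp(delim);
  const pattern = new RegExp(
    `^([^${d}]*${d}){${n}}([^${d}]*(${d}[^${d}]*){${m - n - 1}}).*$`
  );
  return s.replace(pattern, "$2");
}

const dashRange = betweenRegex("a-b-c-d-e-f-g-h", "-", 2, 5); // "c-d-e"
const pipeRange = betweenRegex("7|1|2|3|4", "|", 1, 3);       // "1|2"
```

Without the escaping step, a delimiter such as | would be read as alternation and the pattern would no longer describe the intended split.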

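You can try the approach below. Note that the length (5) is hard-coded, so it only works when the length of the result is known in advance: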
Select substr('a-b-c-d-e-f-g-h',instr('a-b-c-d-e-f-g-h','-',1,2)+1,5) as occurence from dual;


Oracle PLSQL regexp_substr separate comma separated string with double quotes

I've seen examples of how to separate comma-separated strings into rows like this:
select distinct id, trim(regexp_substr(value,'[^,]+', 1, level) ) value, level
from tbl1
connect by regexp_substr(value, '[^,]+', 1, level) is not null
order by id, level;
but, my question is, how do I do this on double quote and comma delimited strings?
Ex: the above works for strings like "1,2,3,4,5,6,7", but what about "1","2","3","4,5","6,7,8","9" so that the rows end up like:
1
2
3
4,5
6,7,8
9
edit: I'm on Oracle 11.2.0.4, 11gR2.
There is a hack: replace the pattern "," with # and split on # in the regular expression. It works like a charm, as long as # itself never appears in the data.
Input string: "1","2","3","4,5","6,7,8","9","Ant,B","Gradle","E,F","G" (the values can be numbers or characters, it doesn't matter)
with temp as (
select replace(replace('"1","2","3","4,5","6,7,8","9","Ant,B","Gradle","E,F","G"','","','#'),'"') Colmn from dual
)
SELECT trim(regexp_substr(str, '[^#]+', 1, level)) str
FROM (SELECT Colmn str FROM temp) t
CONNECT BY instr(str, '#', 1, level - 1) > 0
Output :
STR
1
2
3
4,5
6,7,8
9
Ant,B
Gradle
E,F
G
10 rows
Refer to the DBFiddle link for a demo.
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=d09c326f614d10f5d3c407fdfd3a44c5
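The replace-then-split idea above is not specific to Oracle. As a rough sketch under the same assumption (the placeholder character never occurs in the data), it could look like this in TypeScript; the function name splitQuotedList is hypothetical.

```typescript
// Splits a list such as "1","2","4,5" into its quoted items.
// Assumption: `placeholder` does not occur anywhere in the data.
function splitQuotedList(input: string, placeholder: string = "#"): string[] {
  return input
    .split('","').join(placeholder) // turn every "," separator into the placeholder
    .split('"').join("")            // drop the remaining outer quotes
    .split(placeholder)             // now the placeholder is the only separator
    .map(item => item.trim());
}

const items = splitQuotedList('"1","2","3","4,5","6,7,8","9"');
// ["1", "2", "3", "4,5", "6,7,8", "9"]
```

Commas inside the quotes survive because only the three-character sequence quote, comma, quote is treated as a separator.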
Here is another solution. It should work for any string that contains numbers.
I used the positions of the digits as the base.
with temp as (
select '"1","2","3",x"4,5","6,73,8","9"' Colmn from dual
)
SELECT regexp_substr(Colmn, '\d{1}', REGEXP_INSTR(Colmn, '\d{1}', REGEXP_INSTR(Colmn, '\d{1}') ,level),1 ) from temp
CONNECT BY REGEXP_COUNT (Colmn,'\d{1}')+1> level

How can I split by commas while ignoring any comma that's inside quotes? [duplicate]

I have a Typescript file that takes a csv file and splits it using the following code:
var cells = rows[i].split(",");
I now need to fix this so that any comma that's inside quotes does not result in a split. For example, The,"quick, brown fox", jumped should split into The, quick, brown fox, and jumped instead of also splitting quick and brown fox. What is the proper way to do this?
Update:
I think the final version in a line should be:
var cells = (rows[i] + ',').split(/(?: *?([^",]+?) *?,|" *?(.+?)" *?,|( *?),)/).slice(1).reduce((a, b) => (a.length > 0 && a[a.length - 1].length < 4) ? [...a.slice(0, a.length - 1), [...a[a.length - 1], b]] : [...a, [b]], []).map(e => e.reduce((a, b) => a !== undefined ? a : b, undefined))
or put it more beautifully:
var cells = (rows[i] + ',')
.split(/(?: *?([^",]+?) *?,|" *?(.+?)" *?,|( *?),)/)
.slice(1)
.reduce(
(a, b) => (a.length > 0 && a[a.length - 1].length < 4)
? [...a.slice(0, a.length - 1), [...a[a.length - 1], b]]
: [...a, [b]],
[],
)
.map(
e => e.reduce(
(a, b) => a !== undefined ? a : b, undefined,
),
)
;
This is rather long, but still looks purely functional. Let me explain it:
First, the regular expression part. Basically, a segment you want may fall into 3 possibilities:
1. / *?([^",]+?) *?,/ matches a string without " or , in it, optionally surrounded by spaces and followed by a ,.
2. /" *?(.+?)" *?,/ matches a string surrounded by a pair of quotes, with any number of spaces after the closing quote, followed by a ,.
3. /( *?),/ matches any number of spaces followed by a ,.
So splitting by a non-capturing group of a union of these three will basically get us to the answer.
Recall that when splitting with a regular expression, the resulting array consists of:
Strings separated by the separator (the regular expression)
All the capturing groups in the separator
In our case, the separators fill the whole string, so the strings separated are all empty strings, except that last desired part, which is left out because there is no , following it. Thus the resulting array should be like:
An empty string
Three strings, representing the three capturing groups of the first separator matched
An empty string
Three strings, representing the three capturing groups of the second separator matched
...
An empty string
The last desired part, left alone
So why not simply add a , at the end, so that we get a perfectly regular pattern? This is how (rows[i] + ',') comes about.
In this case the resulting array becomes capturing groups separated by empty strings. After removing the first empty string, the elements appear in groups of 4 as [ 1st capturing group, 2nd capturing group, 3rd capturing group, empty string ].
What the reduce block does is exactly grouping them into groups of 4:
.reduce(
(a, b) => (a.length > 0 && a[a.length - 1].length < 4)
? [...a.slice(0, a.length - 1), [...a[a.length - 1], b]]
: [...a, [b]],
[],
)
Finally, we take the first non-undefined element of each group, which is precisely the desired part. (An unmatched capturing group appears as undefined, and our three patterns are mutually exclusive, so exactly one of the three capturing groups in each group of 4 is defined.)
.map(
e => e.reduce(
(a, b) => a !== undefined ? a : b, undefined,
),
)
This completes the solution.
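To try the pipeline on the sample row from the question, it can be wrapped in a function. This is only a sketch: the name splitCsvRow and the explicit type annotations are additions made so that the code compiles under a strict TypeScript configuration; the regular expression and the reduce/map steps are the ones shown above.

```typescript
function splitCsvRow(row: string): string[] {
  // split() places undefined in the slots of capturing groups that did not
  // participate in a match, so the element type is widened explicitly.
  const parts = (row + ",")
    .split(/(?: *?([^",]+?) *?,|" *?(.+?)" *?,|( *?),)/) as (string | undefined)[];

  return parts
    .slice(1) // drop the leading empty string
    .reduce<(string | undefined)[][]>(
      // collect the flat array into groups of 4
      (a, b) => (a.length > 0 && a[a.length - 1].length < 4)
        ? [...a.slice(0, a.length - 1), [...a[a.length - 1], b]]
        : [...a, [b]],
      [],
    )
    // first defined element of each group; fall back to "" defensively
    .map(e => e.reduce<string | undefined>((a, b) => a !== undefined ? a : b, undefined) ?? "");
}

const sampleCells = splitCsvRow('The,"quick, brown fox", jumped');
```

For this input the result has three cells: The, then quick, brown fox, then jumped. The last cell keeps its leading space, because the lazy / *?/ at the start of the first alternative matches zero spaces and leaves the space to the capturing group; call trim() on each cell if that is not wanted.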
I think the following should suffice:
var cells = rows[i].split(/([^",]+?|".+?") *, */).filter(e => e)
or if you don't want the quotes:
var cells = rows[i].split(/(?:([^",]+?)|"(.+?)") *, */).filter(e => e)

Filter a string using regular expression

I tried the following code. However, the result is not what I want.
$strLine = "100.11 Q9"
$sortString = StringRegExp ($strLine,'([0-9\.]{1,7})', $STR_REGEXPARRAYMATCH)
MsgBox(0, "", $sortString[0],2)
The output shows 100.11, but I want 100.11 9. How could I display it this way using a regular expression?
$sPattern = "([0-9\.]+)\sQ(\d+)"
$strLine = "100.11 Q9"
$sortString = StringRegExpReplace($strLine, $sPattern, '\1 \2')
MsgBox(0, "$sortString", $sortString, 2)
$strLine = "100.11 Q9"
$sortString = StringRegExp($strLine, $sPattern, 3); array of global matches.
For $i1 = 0 To UBound($sortString) -1
MsgBox(0, "$sortString[" & $i1 & "]", $sortString[$i1], 2)
Next
The pattern captures two groups, 100.11 and 9.
The first group matches any run of digits and dots, up to the \s, which
matches the space. The pattern then matches the literal Q, and the second
group matches the digits that follow.
StringRegExpReplace replaces the whole match with the first and second groups
separated by a space.
StringRegExp returns the two groups as two array elements.
Choose whichever of the two approaches above you prefer.
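The same pattern behaves the same way in other regex engines. As an illustration only (this is TypeScript, not AutoIt, and the variable names are made up), both approaches look like this:

```typescript
const inputLine = "100.11 Q9";
const valuePattern = /([0-9.]+)\sQ(\d+)/;

// Approach 1: replace the whole match with "group 1, space, group 2".
const replacedLine = inputLine.replace(valuePattern, "$1 $2"); // "100.11 9"

// Approach 2: read the two captured groups as separate array elements.
const found = inputLine.match(valuePattern);
const capturedGroups = found ? [found[1], found[2]] : []; // ["100.11", "9"]
```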

A little misunderstanding about this regex pattern

Let H be column 1, E be column 2, L column 3, P 4
I understand where the H comes from.
I also see how the L works.
But I am a bit confused on E and P.
If we look horizontally, the regex HE|LL|O+ only matches {HE, LL, O (1 or more times)}
The regex EP|IP|EF matches {EP, IP, EF}
How is it that the string E matches both of these conditions?
Similarly with [PLEASE], which matches {P, L, E, A, S, E} (any combination of these letters), only matches with EP from the vertical regex, then why is there just a P?
Am I reading this incorrectly? This was taken from regexcrossword
I think you misunderstand the nature of the crossword.
The string HE matches HE|LL|O+
The string LP matches [PLEASE]+
The string HL matches [^SPEAK]+
The string EP matches EP|IP|EF
Each row and column matches its regex, so the solution is valid.
Like, the following statement doesn't make sense...
How is it that the string E matches both of these conditions?
There is no string E. There are two strings, HE and EP.
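The row and column reading can be checked mechanically. The sketch below assumes the 2x2 grid H E / L P and assumes that each clue must match its whole row or column, which is why every pattern is anchored with ^ and $; the variable names are illustrative.

```typescript
// Rows of the solved grid, read left to right.
const grid = ["HE", "LP"];

// Clues, anchored so that the whole row or column has to match.
const rowPatterns = [/^(?:HE|LL|O+)$/, /^[PLEASE]+$/];
const colPatterns = [/^[^SPEAK]+$/, /^(?:EP|IP|EF)$/];

// Columns are read top to bottom: "HL" and "EP".
const columns = [0, 1].map(c => grid.map(row => row[c]).join(""));

const rowsOk = grid.every((row, i) => rowPatterns[i].test(row));    // true
const colsOk = columns.every((col, i) => colPatterns[i].test(col)); // true
```

Each clue is tested against a two-letter string, never against a single cell such as E on its own.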

how to get out string oracle regex

I have the following string and I am trying to extract 1111111 and 33333333333 without the |
character:
SELECT regexp_substr('7|1111111|2222222|33333333333|0||20140515|||false|0|0|0|0|0|','*[|]*[|][0-9]*')FROM dual
Using REGEXP_REPLACE may be a bit simpler;
SELECT REGEXP_REPLACE('7|1111111|2222222|33333333333|0||20140515|||false|0|0|0|0|0|',
'^([^|]*[|]){1}([^|]*).*$', '\2') FROM dual;
> 1111111
SELECT REGEXP_REPLACE('7|1111111|2222222|33333333333|0||20140515|||false|0|0|0|0|0|',
'^([^|]*[|]){3}([^|]*).*$', '\2') FROM dual;
> 33333333333
You can choose column by choosing how many pipes to skip in the {1} part.
A simple SQLfiddle to test with.
A short explanation of the regexp;
([^|]*[|]){3} -- Matches 3 groups of {optional characters}{pipe}
([^|]*) -- Matches the next field (the one we want)
.* -- Matches the rest of the string
What we want is the second parenthesized group, so we replace the whole string by the back reference \2.
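The pattern itself is portable, so it can be tried outside Oracle as well. A minimal TypeScript sketch follows, where the function name nthField and its skip parameter (the number of pipes to skip) are hypothetical:

```typescript
// Returns the field that follows `skip` pipe-terminated fields.
// If the pattern does not match, the input is returned unchanged.
function nthField(s: string, skip: number): string {
  const pattern = new RegExp(`^([^|]*[|]){${skip}}([^|]*).*$`);
  return s.replace(pattern, "$2");
}

const record = "7|1111111|2222222|33333333333|0||20140515|||false|0|0|0|0|0|";
const second = nthField(record, 1); // "1111111"
const fourth = nthField(record, 3); // "33333333333"
```

Because the field group is [^|]* rather than \d*, the same call also returns empty or non-numeric fields.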
Because the "|" separators are always present, it's simpler to extract fields with the plain substring function than with regular expressions.
Just find the positions of the corresponding separators in the source string and extract the content between them:
with test_data as (
select
'7|1111111|2222222|33333333333|0||20140515|||false|0|0|0|0|0|ABC' as s,
8 as field_number -- test 1, 3, 8, 10 and 16
from dual
)
select
field_number,
substr(
s,
decode( field_number,
1,1,
instr(s,'|',1,field_number - 1) + 1
),
(
decode( instr(s,'|',1,field_number),
0, length(s)+ 1,
instr(s,'|',1,field_number)
)
-
decode( field_number,
1, 1,
instr(s,'|',1,field_number - 1) + 1
)
)
) as field_value
from
test_data
SQLFiddle
This variant works with empty fields, non-numeric fields and so on.
A possible simplification is to append an additional separator to the start and to the end of the string:
with test_data as (
select
(
'|' ||
'7|1111111|2222222|33333333333|0||20140515|||false|0|0|0|0|0|ABC' ||
'|'
) as s, -- additional separators appended before and after original string
10 as field_number -- test 1, 3, 8, 10 and 16
from dual
)
select
field_number,
substr(
s,
instr(s, '|', 1, field_number) + 1,
(
instr(s, '|', 1, field_number + 1)
-
(instr(s, '|', 1, field_number) + 1)
)
) as field_value
from
test_data
;
SQLFiddle
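The padding trick carries over directly to other languages. Below is a small TypeScript sketch of the same position arithmetic; the function name fieldByPosition is hypothetical, and field numbers are 1-based as in the query above.

```typescript
// Extracts field `fieldNumber` (1-based) by locating the separators around it.
// A separator is added before and after the string, so the first and last
// fields need no special case.
function fieldByPosition(s: string, fieldNumber: number, sep: string = "|"): string {
  const padded = sep + s + sep;
  let start = -1;
  // Find the separator that opens the requested field.
  for (let i = 0; i < fieldNumber; i++) {
    start = padded.indexOf(sep, start + 1);
    if (start === -1) return "";
  }
  // The next separator closes the field.
  const end = padded.indexOf(sep, start + 1);
  return end === -1 ? "" : padded.slice(start + 1, end);
}

const row = "7|1111111|2222222|33333333333|0||20140515|||false|0|0|0|0|0|ABC";
const tenth = fieldByPosition(row, 10);     // "false"
const sixteenth = fieldByPosition(row, 16); // "ABC"
```

A field number beyond the last field yields an empty string in this sketch.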