how to find consecutive repetitive characters in oracle column - regex

Is there any way to find consecutive repeated character sequences such as 1414 or 200200 in a VARCHAR column of an Oracle table?
How can this be achieved with a regular expression?
My attempts with regexp have failed so far.
In my example I can detect a single character repeated consecutively, but not a repeated pattern:
select regexp_substr('4120066' ,'([[:alnum:]])\1', 7,1,'i') from dual; -- getting output as expected
select regexp_substr('6360360' ,'([[:alnum:]])\1', 7,1,'i') from dual; -- i want to select this also as i have 360 followed by 360

You should be able to use something like this:
[...] WHERE REGEXP_LIKE(field, '(\d+?)\1')
If you're looking for any repetition of characters, or:
[...] WHERE REGEXP_LIKE(field, '^(\d+?)\1$')
If you want to check the whole string in the field.
\d+? matches one or more digits, as few as possible (non-greedy).
( ... ) captures those digits.
\1 refers back to the captured digits, so the same sequence must occur again immediately.
Note: Change \d to . if you are not checking digits only.
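To see how the two patterns above behave, here is a minimal sketch that runs them against a few literal values. The samples CTE, its val column and the output column aliases are made up for illustration; it assumes a release that supports \d, non-greedy quantifiers and a column list in the WITH clause.

```sql
-- Hypothetical sample values; the first two come from the question.
WITH samples (val) AS (
  SELECT '4120066' FROM dual UNION ALL
  SELECT '6360360' FROM dual UNION ALL
  SELECT '1414'    FROM dual UNION ALL
  SELECT '200200'  FROM dual UNION ALL
  SELECT '12345'   FROM dual
)
SELECT val,
       -- first run of digits that is immediately followed by itself
       REGEXP_SUBSTR(val, '(\d+?)\1') AS first_repeat,
       -- 'Y' only when the whole value is one digit sequence written twice
       CASE WHEN REGEXP_LIKE(val, '^(\d+?)\1$') THEN 'Y' ELSE 'N' END AS whole_value_repeats
FROM   samples;
```

The unanchored pattern is meant to pick up a repeat anywhere in the value (such as 360 followed by 360), while the anchored one only accepts values like 1414 or 200200.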

Try this :
WITH t AS
     (SELECT '1414,200200,11,12,33,33,1234,1234' test_string
        FROM DUAL)
SELECT LTRIM (SYS_CONNECT_BY_PATH (test_string, ','), ',') names
  FROM (SELECT ROW_NUMBER () OVER (ORDER BY test_string) rno, test_string
          FROM (SELECT DISTINCT REGEXP_SUBSTR (test_string, '[^,]+', 1, LEVEL) test_string
                  FROM t
               CONNECT BY LEVEL <= LENGTH (test_string)
                                   - LENGTH (REPLACE (test_string, ',', NULL))
                                   + 1))
 WHERE CONNECT_BY_ISLEAF = 1 AND ROWNUM = 1
CONNECT BY rno = PRIOR rno + 1;
NAMES
--------------------------------------------------------------------------------
11,12,1234,1414,200200,33
None of the delimited strings repeats!

Related

Extract Date Along with Am or pm in oracle

I want to get the time pattern along with AM or PM from the given string Aaaaa_gggg_ne_A030_66788_Abcd_Oct_24_0329PM.csv
I tried the following:
Select regexp_substr(filename,'\d{4}',1,3)
From
(Select 'Aaaaa_gggg_ne_A030_66788_Abcd_Oct_24_0329PM.csv' filename from dual);
which only gives me the last number, e.g. 0329, but I need 0329PM.
This form of REGEXP_SUBSTR() gets what you need in one call. It returns the first capture group: one or more digits followed by A or P and then M, where that group is preceded by an underscore and followed by a literal period.
with tbl(filename) as (
Select 'Aaaaa_gggg_ne_A030_66788_Abcd_Oct_24_0329PM.csv'
from dual
)
select regexp_substr(filename, '_(\d+[AP]M)\.', 1, 1, NULL, 1)
From tbl;
Actually, to tighten up the match you could make it case-insensitive and add the extension:
select regexp_substr(filename, '_(\d+[AP]M)\.csv', 1, 1, 'i', 1)
From tbl;
Note if a match is not found NULL will be returned.
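As a rough illustration of the case-insensitive variant and the NULL result, the sketch below runs it over three file names. Only the first name comes from the question; the other two are invented for the example, and the six-argument form of REGEXP_SUBSTR (with the subexpression parameter) is assumed to be available, which means 11g or later.

```sql
WITH tbl (filename) AS (
  SELECT 'Aaaaa_gggg_ne_A030_66788_Abcd_Oct_24_0329PM.csv' FROM dual UNION ALL
  SELECT 'report_Oct_24_1105am.CSV' FROM dual UNION ALL  -- hypothetical: lower-case am, upper-case extension
  SELECT 'no_time_here.csv'         FROM dual            -- hypothetical: no time part at all
)
SELECT filename,
       -- group 1 = digits followed by AM/PM, matched case-insensitively
       REGEXP_SUBSTR(filename, '_(\d+[AP]M)\.csv', 1, 1, 'i', 1) AS time_part
FROM   tbl;
-- time_part: 0329PM, 1105am, NULL
```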
Nested substr is one option (if data always looks like this; you didn't say it doesn't):
SQL> with test (col) as
2 (select 'Aaaaa_gggg_ne_A030_66788_Abcd_Oct_24_0329PM.csv' from dual)
3 select substr(substr(col, -10), 1, 6) result from test
4 /
RESULT
------
0329PM
SQL>
the inner substr returns the last 10 characters (0329PM.csv)
the outer substr returns the first 6 characters out of it (0329PM)
Or, using regular expressions:
SQL> with test (col) as
2 (select 'Aaaaa_gggg_ne_A030_66788_Abcd_Oct_24_0329PM.csv' from dual)
3 select regexp_substr(translate(col, '_.', '  '), '\S+',
4 1,
5 regexp_count(translate(col, '_.', '  '), '\S+') - 1
6 ) result
7 from test;
RESULT
------
0329PM
SQL>
line #3: translate replaces each underscore and dot with a space
line #4: start from the beginning
line #5: return substring which is one before the last one

Simpler regular expression for oracle regexp_replace

I have a |-separated string with 20 |s, like 123|1|42|13||94123|2983191|2|98863|...|211|, stored in an Oracle DB column. The string is just 20 numbers, each followed by |.
I am trying to derive a string from it in which the numbers at positions 4, 6, 8, 9, 11, 12 and 13 are removed. I also need to move the number at position 16 to position 4. So far I have a regex like
select regexp_replace(col1, '^((\d*\|){4})(\d*\|)(\d*\|)(\d*\|)(\d*\|)((\d*\|){2})(\d*\|)((\d*\|){3})((\d*\|){2})(\d*\|)(.*)$', '\1|\4|\6||\9||||||||') as cc from table
This is where I get stuck, since Oracle only supports backreferences up to 9 groups. Is there any way to simplify this regex so that it has fewer groups and fits into the replace? Any alternative solutions/suggestions are also welcome.
Note - The position counter begins at 0, so 123 in the above string is the 0th number.
Edit: Example -
Source string
|||14444|10107|227931|10115||10118||11361|11485||10110||11512|16666|||
Expected result
|||16666|10107||10115||||||11512||||
You can get the result you want by removing capture groups for the numbers you are removing from the string anyway, and writing (for example) ((\d*\|){2}) as (\d*\|\d*\|). This reduces the number of capture groups to 7, allowing your code to work as is:
select regexp_replace(col1,
'^(\d*\|\d*\|\d*\|\d*\|)\d*\|(\d*\|)\d*\|(\d*\|)\d*\|\d*\|(\d*\|)\d*\|\d*\|\d*\|(\d*\|\d*\|)(\d*\|)(.*)$',
'\1\6\2|\3||\4|||\5|\7') as cc
from table
Output (for your test data and also @Littlefoot's good column example):
CC
|||14444|16666|227931|||||11361|||||11512|||||
0|1|2|3|16|5||7|||10||||14|15||17|18|19|
Demo on dbfiddle
As there's a unique column (ID, as you said), see if this helps:
split every column into rows
compose them back (using listagg) which uses 2 CASEs:
one to remove values you don't want
another to properly sort them ("put value at position 16 to position 4")
Note that my result differs from yours; if I counted it correctly, 16666 isn't at position 16 but 17 so - 11512 has to be moved to position 4.
I also added another dummy row which is here to confirm whether I counted positions correctly, and to show why you have to use lines #10-12 (because of duplicates).
OK, here you are:
SQL> with test (id, col) as
2 (
3 select 1, '|||14444|10107|227931|10115||10118||11361|11485||10110||11512|16666|||' from dual union all
4 select 2, '1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20' from dual
5 ),
6 temp as
7 (select replace(regexp_substr(col, '(.*?)(\||$)', 1, column_value), '|', '') val,
8 column_value lvl,
9 id
10 from test cross join table(cast(multiset(select level from dual
11 connect by level <= regexp_count(col, '\|') + 1
12 ) as sys.odcinumberlist))
13 )
14 select id,
15 listagg(case when lvl in (4, 6, 8, 9, 11, 12, 13) then '|'
16 else val || case when lvl = 20 then '' else '|' end
17 end, '')
18 within group (order by case when lvl = 16 then 4
19 when lvl = 4 then 16
20 else lvl
21 end) result
22 from temp
23 group by id;
ID RESULT
---------- ------------------------------------------------------------
1 |||11512|10107||10115|||||||10110|||16666|||
2 1|2|3|16|5||7|||10||||14|15||17|18|19|20
SQL>

I am trying to use REGEXP_REPLACE to replace/remove the first hyphen and 3 subsequent characters

I am trying to use REGEXP_REPLACE to replace/remove the first hyphen and the 3 characters that follow it. The input values are not of fixed length. I am trying to come up with a working expression that will handle values such as:
5F9B9C7F-ABC-40F4
CODE-AXF 2014 CODE
ADSHLHSALK
Expected results:
5F9B9C7F-ABC-40F4 ==> 5F9B9C7F-40F4
CODE-AXF-2014 CODE ==> CODE- 2014 CODE
ADSHLHSALK ==> ADSHLHSALK
Query:
SELECT text, column,
REGEXP_REPLACE( text,'[-]',NULL )
FROM TABLE
where column= '5';
You may use
REGEXP_REPLACE('5F9B9C7F-ABC-40F4','^([^-]*)-.{3}','\1')
If you mean 3 letters, then replace . with [a-zA-Z].
Details
^ - start of string
([^-]*) - Group 1: any 0+ chars other than -
- - a hyphen
.{3} - any 3 chars (or [a-zA-Z]{3} will match 3 ASCII letters).
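A small sketch of that expression applied to the three values from the question (the samples CTE and the cleaned alias are only there for illustration):

```sql
WITH samples (text) AS (
  SELECT '5F9B9C7F-ABC-40F4'  FROM dual UNION ALL
  SELECT 'CODE-AXF 2014 CODE' FROM dual UNION ALL
  SELECT 'ADSHLHSALK'         FROM dual
)
SELECT text,
       -- keep everything before the first hyphen, drop the hyphen and the next 3 characters
       REGEXP_REPLACE(text, '^([^-]*)-.{3}', '\1') AS cleaned
FROM   samples;
-- cleaned: 5F9B9C7F-40F4, CODE 2014 CODE, ADSHLHSALK
```

Because the pattern is anchored at the start of the string, a value without any hyphen does not match and comes back unchanged.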
You do not need regular expressions:
SELECT text,
       column,
       CASE
         WHEN pos > 0 THEN SUBSTR( text, 1, pos - 1 ) || SUBSTR( text, pos + 4 )
         ELSE text  -- no hyphen found: leave the value unchanged
       END
FROM (
  SELECT text,
         column,
         INSTR( text, '-' ) AS pos
  FROM TABLE
  WHERE column = '5'
);

Regex to find 9 to 11 digit integer occuring anywhere closest to a keyword

In simple terms, what I am looking for is this: if a string contains the keyword ZTFN00, the regex should return the 9 to 11 digit number closest to the keyword, on either its left or its right side.
I want to do this in REGEXP_REPLACE function of oracle.
Below are some of the sample strings:
The following error occurred in the SAP UPDATE_BP service as part of the combine:
(error:653, R11:186:Number 867278489 Already Exists for ID Type ZTFN00)
Expected result: 867278489
The following error occurred in the SAP UPDATE_BP service as part of the combine
(error:653, R11:186:Number ZTFN00 identification number 123456778 already exist)
Expected result: 123456778
I could not find a way to easily do this with regular expressions, but if you want to do the task without PL/SQL, you can do something like the following.
It's a little bit tricky, combining many calls to regexp functions to evaluate, for each occurrence of digit string, the distance from your keyword and then pick the nearest one.
with test(string, keyWord) as
( select
'(error:653, R11:186: 999999999 Number 0000000000 Already Exists for ID Type ZTFN00 hjhk 11111111111 kjh k222222222)',
'ZTFN00'
from dual)
select numberString
from (
select numberString,
decode (greatest (numberPosition, keyWordPosition),
keyWordPosition,
keyWordPosition - numberPosition - numberLength,
numberPosition,
numberPosition - keyWordPosition - keyWordLength
) as distance
from (
select regexp_instr(string, '[0-9]{9,11}', 1, level) as numberPosition,
instr( string, keyWord) as keyWordPosition,
length(regexp_substr(string, '[0-9]{9,11}', 1, level)) as numberLength,
regexp_substr(string, '[0-9]{9,11}', 1, level) as numberString,
length(keyWord) as keyWordLength
from test
connect by regexp_instr(string, '[0-9]{9,11}', 1, level) != 0
)
order by distance
) where rownum = 1
Looking at the single parts:
SQL> with test(string, keyWord) as
2 ( select
3 '(error:653, R11:186: 999999999 Number 0000000000 Already Exists for ID Type ZTFN00 hjhk 11111111111 kjh k222222222)',
4 'ZTFN00'
5 from dual)
6 select regexp_instr(string, '[0-9]{9,11}', 1, level) as numberPosition,
7 instr( string, keyWord) as keyWordPosition,
8 length(regexp_substr(string, '[0-9]{9,11}', 1, level)) as numberLength,
9 regexp_substr(string, '[0-9]{9,11}', 1, level) as numberString,
10 length(keyWord) as keyWordLength
11 from test
12 connect by regexp_instr(string, '[0-9]{9,11}', 1, level) != 0;
NUMBERPOSITION KEYWORDPOSITION NUMBERLENGTH NUMBERSTRING KEYWORDLENGTH
-------------- --------------- ------------ ---------------- -------------
22 77 9 999999999 6
39 77 10 0000000000 6
91 77 11 11111111111 6
108 77 9 222222222 6
This scans the whole string and iterates while regexp_instr(...) != 0, that is, while there are occurrences; level is used to look for the first, second, ... occurrence, so that row 1 gives the first occurrence, row 2 the second, and so on for as long as an nth occurrence exists.
That part only computes some useful fields, which the next part uses to look both to the right and to the left of your keyword and to compute the exact distance between the number string and the keyword:
select numberString,
decode (greatest (numberPosition, keyWordPosition),
keyWordPosition,
keyWordPosition - numberPosition - numberLength,
numberPosition,
numberPosition - keyWordPosition - keyWordLength
) as distance
The inner query is ordered by distance, so that the first row contains the nearest string; that's why the outermost query extracts only the row with rownum = 1 to get the nearest one.
It can be re-written in a more compact way, but this is a bit more readable.
This should even work when you have multiple occurrences of the digit string, even on both sides of your keyword.
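To make the distance arithmetic concrete, the sketch below takes the four rows printed above as literal values and applies the same rule, written as a CASE instead of DECODE (equivalent here, since a number never starts at the keyword's position). The occurrences CTE is purely illustrative.

```sql
-- Positions and lengths copied from the intermediate output shown earlier.
WITH occurrences (numberString, numberPosition, numberLength, keyWordPosition, keyWordLength) AS (
  SELECT '999999999',    22,  9, 77, 6 FROM dual UNION ALL
  SELECT '0000000000',   39, 10, 77, 6 FROM dual UNION ALL
  SELECT '11111111111',  91, 11, 77, 6 FROM dual UNION ALL
  SELECT '222222222',   108,  9, 77, 6 FROM dual
)
SELECT numberString,
       CASE
         WHEN numberPosition < keyWordPosition
           -- number is left of the keyword: gap between its end and the keyword's start
           THEN keyWordPosition - numberPosition - numberLength
         -- number is right of the keyword: gap between the keyword's end and its start
         ELSE numberPosition - keyWordPosition - keyWordLength
       END AS distance
FROM   occurrences
ORDER  BY distance;
-- distance: 11111111111 -> 8, 222222222 -> 25, 0000000000 -> 28, 999999999 -> 46
```

So for this test string the nearest number is 11111111111, which is the row the rownum = 1 filter keeps.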
This regex works for me in RegexBuddy with Oracle mode selected (10g, 11g and 12c):
SELECT REGEXP_SUBSTR(mycolumn,
'\(error:[0-9]+,[ ]+
(
(
([0-9]{9,11})()
|
ZTFN00()
|
[^ ),]+
)
[ ),]+
)+
\4\5',
1, 1, 'cx', 3) FROM mytable;
The regex treats the main body of the string as a series of tokens matching the general pattern [^ ),]+ (one or more of any characters except space, right parenthesis, or comma). But there are two specific tokens that it tries to match first: the keyword (ZTFN00) and a valid ID number ([0-9]{9,11}).
The empty groups at the end of the first two alternatives serve as check boxes; the corresponding backreferences at the end (\4 and \5) will only succeed if those groups participated in the match, meaning both an ID number and the keyword were seen.
(This is an obscure "feature" that definitely doesn't work in many flavors, so I can't be positive it will work in Oracle. Please let me know if it doesn't.)
The ID number is captured in group #3, and that's what the REGEXP_SUBSTR command returns. (Since you only want to retrieve the number, there is no call for REGEXP_REPLACE.)

Use regexp_instr to get the last number in a string

If I used the following expression, the result should be 1.
regexp_instr('500 Oracle Parkway, Redwood Shores, CA','[[:digit:]]')
Is there a way to make this look for the last number in the string? If I were to look for the last number in the above example, it should return 3.
If you were using 11g, you could use regexp_count to determine the number of times that a pattern exists in the string and feed that into the regexp_instr
regexp_instr( str,
'[[:digit:]]',
1,
regexp_count( str, '[[:digit:]]')
)
Since you're on 10g, however, the simplest option is probably to reverse the string and subtract the position that is found from the length of the string
length(str) - regexp_instr(reverse(str),'[[:digit:]]') + 1
Both approaches should work in 11g
SQL> with x as (
       select '500 Oracle Parkway, Redwood Shores, CA' str
       from dual
     )
     select length(str) - regexp_instr(reverse(str),'[[:digit:]]') + 1,
            regexp_instr( str,
                          '[[:digit:]]',
                          1,
                          regexp_count( str, '[[:digit:]]')
                        )
     from x;
LENGTH(STR)-REGEXP_INSTR(REVERSE(STR),'[[:DIGIT:]]')+1
------------------------------------------------------
REGEXP_INSTR(STR,'[[:DIGIT:]]',1,REGEXP_COUNT(STR,'[[:DIGIT:]]'))
-----------------------------------------------------------------
3
3
Another solution with less effort is
SELECT regexp_instr('500 Oracle Parkway, Redwood Shores, CA','[^[:digit:]]*$')-1
FROM dual;
This can be read as: find the run of non-digits at the end of the string and subtract 1 from its starting position, which gives the position of the last digit in the string.
REGEXP_INSTR('500ORACLEPARKWAY,REDWOODSHORES,CA','[^[:DIGIT:]]*$')-1
--------------------------------------------------------------------
3
which I think is what you want.
(tested on 11g)
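As a quick cross-check, here is a sketch that runs two of the approaches above on a string whose digits are not all at the start. The address is invented for the example, and regexp_count assumes 11g or later.

```sql
WITH x (str) AS (
  SELECT '500 Oracle Parkway, Suite 42, CA' FROM dual  -- hypothetical address with a second number
)
SELECT -- position of the last single-digit match, located via the total digit count
       regexp_instr(str, '[[:digit:]]', 1, regexp_count(str, '[[:digit:]]')) AS via_count,
       -- start of the trailing run of non-digits, minus 1
       regexp_instr(str, '[^[:digit:]]*$') - 1                               AS via_trailing_run
FROM   x;
-- both columns: 28 (the position of the 2 in 42)
```

Note that the trailing-run expression relies on the string ending with at least one non-digit; a string that ends in a digit is a case worth testing separately before relying on it.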