Extract Date Along with Am or pm in oracle - regex

I want to get the time pattern along with AM or PM from the given string Aaaaa_gggg_ne_A030_66788_Abcd_Oct_24_0329PM.csv
I tried the following:
Select regexp_substr(filename,'\d{4}',1,3)
From
(Select 'Aaaaa_gggg_ne_A030_66788_Abcd_Oct_24_0329PM.csv' filename from dual);
which only gives me the last number, e.g. 0329, but I need 0329PM.

Using this form of REGEXP_SUBSTR() will get what you need in one call. It returns the first group, which is the set of characters after the last underscore and before the literal period of 1 or more numbers followed by an A or P then an M.
with tbl(filename) as (
Select 'Aaaaa_gggg_ne_A030_66788_Abcd_Oct_24_0329PM.csv'
from dual
)
select regexp_substr(filename, '_(\d+[AP]M)\.', 1, 1, NULL, 1)
From tbl;
Actually, to tighten up the match you could make it case-insensitive and add the extension:
select regexp_substr(filename, '_(\d+[AP]M)\.csv', 1, 1, 'i', 1)
From tbl;
Note if a match is not found NULL will be returned.

Nested substr is one option (if data always looks like this; you didn't say it doesn't):
SQL> with test (col) as
2 (select 'Aaaaa_gggg_ne_A030_66788_Abcd_Oct_24_0329PM.csv' from dual)
3 select substr(substr(col, -10), 1, 6) result from test
4 /
RESULT
------
0329PM
SQL>
the inner substr returns the last 10 characters (0329PM.csv)
the outer substr returns the first 6 characters out of it (0329PM)
Or, using regular expressions:
SQL> with test (col) as
2 (select 'Aaaaa_gggg_ne_A030_66788_Abcd_Oct_24_0329PM.csv' from dual)
3 select regexp_substr(translate(col, '_.', ' '), '\S+',
4 1,
5 regexp_count(translate(col, '_.', ' '), '\S+') - 1
6 ) result
7 from test;
RESULT
------
0329PM
SQL>
line #3: translate replaces underlines and dots with a space
line #4: start from the beginning
line #5: return substring which is one before the last one

Related

Regex to find 9 to 11 digit integer occuring anywhere closest to a keyword

In simple term, what I am looking for is this If there is a string, which has a keyword ZTFN00, then the regex shall be able to return the closest 9 to 11 digit number to the left or right side of the string.
I want to do this in REGEXP_REPLACE function of oracle.
Below are some of the sample strings:
The following error occurred in the SAP UPDATE_BP service as part of the combine:
(error:653, R11:186:Number 867278489 Already Exists for ID Type ZTFN00)
Expected result: 867278489
The following error occurred in the SAP UPDATE_BP service as part of the combine
(error:653, R11:186:Number ZTFN00 identification number 123456778 already exist)
Expected result: 123456778
I could not find a way to easily do this with regular expressions, but if you want to do the task without PL/SQL, you can do something like the following.
It's a little bit tricky, combining many calls to regexp functions to evaluate, for each occurrence of digit string, the distance from your keyword and then pick the nearest one.
with test(string, keyWord) as
( select
'(error:653, R11:186: 999999999 Number 0000000000 Already Exists for ID Type ZTFN00 hjhk 11111111111 kjh k222222222)',
'ZTFN00'
from dual)
select numberString
from (
select numberString,
decode (greatest (numberPosition, keyWordPosition),
keyWordPosition,
keyWordPosition - numberPosition - numberLength,
numberPosition,
numberPosition - keyWordPosition - keyWordLength
) as distance
from (
select regexp_instr(string, '[0-9]{9,11}', 1, level) as numberPosition,
instr( string, keyWord) as keyWordPosition,
length(regexp_substr(string, '[0-9]{9,11}', 1, level)) as numberLength,
regexp_substr(string, '[0-9]{9,11}', 1, level) as numberString,
length(keyWord) as keyWordLength
from test
connect by regexp_instr(string, '[0-9]{9,11}', 1, level) != 0
)
order by distance
)where rownum = 1
Looking at the single parts:
SQL> with test(string, keyWord) as
2 ( select
3 '(error:653, R11:186: 999999999 Number 0000000000 Already Exists for ID Type ZTFN00 hjhk 11111111111 kjh k222222222)',
4 'ZTFN00'
5 from dual)
6 select regexp_instr(string, '[0-9]{9,11}', 1, level) as numberPosition,
7 instr( string, keyWord) as keyWordPosition,
8 length(regexp_substr(string, '[0-9]{9,11}', 1, level)) as numberLength,
9 regexp_substr(string, '[0-9]{9,11}', 1, level) as numberString,
10 length(keyWord) as keyWordLength
11 from test
12 connect by regexp_instr(string, '[0-9]{9,11}', 1, level) != 0;
NUMBERPOSITION KEYWORDPOSITION NUMBERLENGTH NUMBERSTRING KEYWORDLENGTH
-------------- --------------- ------------ ---------------- -------------
22 77 9 999999999 6
39 77 10 0000000000 6
91 77 11 11111111111 6
108 77 9 222222222 6
This scans all the string, and iterates while insrt (...) != 0, that is while there are occurrences; the level is used to look for the first, second, ... occurrence, so that row 1 gives the first occurrence, row two the second and so on, while exists the nth occurrence.
This part is only used to evaluate some useful fields, tha we use to look both to the right and to the left of you keyword, exactly evaluating the distance between the string number and the keyword:
select numberString,
decode (greatest (numberPosition, keyWordPosition),
keyWordPosition,
keyWordPosition - numberPosition - numberLength,
numberPosition,
numberPosition - keyWordPosition - keyWordLength
) as distance
The inner query is ordered by distance, so that the first row contains the nearest string; that's why in the outermost query we only extract the row with
rownum = 1 to get the nearest row.
It can be re-written in a more compact way, but this is a bit more readable.
This should even work when you have multiple occurrences of the digit string, even on both sides of your keyword.
This regex works for me in RegexBuddy with Oracle mode selected (10g, 11g and 12c):
SELECT REGEXP_SUBSTR(mycolumn,
'\(error:[0-9]+,[ ]+
(
(
([0-9]{9,11})()
|
ZTFN00()
|
[^ ),]+
)
[ ),]+
)+
\4\5',
1, 1, 'cx', 3) FROM mytable;
The regex treats the main body of the string as a series of tokens matching the general pattern [^ ),]+ (one or more of any characters except space, right parenthesis, or comma). But there are two specific tokens that it tries to match first: the keyword (ZTFN00) and a valid ID number ([0-9]{9,11}).
The empty groups at the end of the first two alternatives serve as check boxes; the corresponding backreferences at the end (\4 and \5) will only succeed if those groups participated in the match, meaning both an ID number and the keyword were seen.
(This is an obscure "feature" that definitely doesn't work in many flavors, so I can't be positive it will work in Oracle. Please let me know if it doesn't.)
The ID number is captured in group #3, and that's what the REGEXP_SUBSTR command returns. (Since you only want to retrieve the number, there no call for REGEXP_REPLACE.)

Oracle REGEXP_REPLACE replace space in middle with empty string

I'm trying to use the Oracle REGEXP_REPLACE function to replace a whitespace (which is in the middle of a string) with an empty string.
One of my columns contains strings like the following one.
[alphanumeric][space][digits][space][alpha] (eg. R4SX 315 GFX)
Now, I need to replace ONLY the second whitespace (the whitespace after the digits) with an empty string (i.e. R4SX 315 GFX --> R4SX 315GFX)
To achieve this, I tried the following code:
SELECT REGEXP_REPLACE(
'R4SX 315 GFX',
'([:alphanum:])\s(\d)\s([:alpha:])',
'\1 \2\3') "REPLACED"
FROM dual;
However, the result that I get is the same as my input (i.e. R4SX 315 GFX).
Can someone please tell me what I have done wrong and please point me in the right direction.
Thanks in advance.
[:alphanum:]
alphanum is incorrrect. The alphanumeric character class is [[:alnum:]].
You could use the following pattern in the REGEXP_REPLACE:
([[:alnum:]]{4})([[:space:]]{1})([[:digit:]]{3})([[:space:]]{1})([[:alpha:]]{3})
Using REGEXP
SQL> SELECT REGEXP_REPLACE('R4SX 315 GFX',
2 '([[:alnum:]]{4})([[:space:]]{1})([[:digit:]]{3})([[:space:]]{1})([[:alpha:]]{3})',
3 '\1\2\3\5')
4 FROM DUAL;
REGEXP_REPL
-----------
R4SX 315GFX
SQL>
If you are not sure about the number of characters in each expression of the pattern, then you could do:
SQL> SELECT REGEXP_REPLACE('R4SX 315 GFX',
2 '([[:alnum:]]+[[:blank:]]+[[:digit:]]+)[[:blank:]]+([[:alpha:]]+)',
3 '\1\2')
4 FROM dual;
REGEXP_REPL
-----------
R4SX 315GFX
SQL>
Using SUBSTR and INSTR
The same could be done with substr and instr which wouldbe less resource consuming than regexp.
SQL> WITH DATA AS
2 ( SELECT 'R4SX 315 GFX' str FROM DUAL
3 )
4 SELECT SUBSTR(str, 1, instr(str, ' ', 1, 2) -1)
5 ||SUBSTR(str, instr(str, ' ', 1, 2) +1, LENGTH(str)-instr(str, ' ', 1, 2)) new_str
6 FROM DATA;
NEW_STR
-----------
R4SX 315GFX
SQL>
Your regex contains an invalid class alphanum. Also, these classes must be used inside character classes [...]. Instead of \s, you need to use a supported [:blank:] class. More details on the regex syntax in MySQL can be found here.
I recommend using
SELECT REGEXP_REPLACE(
'R4SX 315 GFX',
'([[:alnum:]]+[[:blank:]]+[[:digit:]]+)[[:blank:]]+([[:alpha:]]+)'
, '\1\2') "REGEXP_REPLACE"
FROM dual;
This way you will use just 2 capturing groups. The less we have the better is for performance. Here you can see more details on REGEXP_REPLACE function.

how to find consecutive repetitive characters in oracle column

Is there any way to find consecutive repetitive characters like 1414, 200200 in a varchar column of an oracle table.
how can we achieve it with regexp ?
Im failing to achieve it with regexp
im my example i can get a consecutive repetition of a number but not a pattern
select regexp_substr('4120066' ,'([[:alnum:]])\1', 7,1,'i') from dual; -- getting output as expected
select regexp_substr('6360360' ,'([[:alnum:]])\1', 7,1,'i') from dual; -- i want to select this also as i have 360 followed by 360
You should be able to use something like this:
[...] WHERE REGEXP_LIKE(field, '(\d+?)\1')
If you're looking for any repetition of characters, or:
[...] WHERE REGEXP_LIKE(field, '^(\d+?)\1$')
If you want to check the whole string in the field.
\d+? will match digits.
( ... ) will store those digits.
\1 refers to the captured digits.
Note: Change to \d to . if you are not checking digits only.
Try this :
SQL> WITH t AS
2 (SELECT '1414,200200,11,12,33,33,1234,1234' test_string
3 FROM DUAL)
4 SELECT LTRIM (SYS_CONNECT_BY_PATH (test_string, ','), ',') names
5 FROM (SELECT ROW_NUMBER () OVER (ORDER BY test_string) rno, test_string
6 FROM (SELECT DISTINCT REGEXP_SUBSTR (test_string,
7 '[^,]+',
8 1,
9 LEVEL
10 ) test_string
11 FROM t
12 CONNECT BY LEVEL <=
13 LENGTH (test_string)
14 - LENGTH (REPLACE (test_string,
15 ',',
16 NULL
17 )
18 )
19 + 1))
20 WHERE CONNECT_BY_ISLEAF = 1 AND ROWNUM = 1
21 CONNECT BY rno = PRIOR rno + 1;
NAMES
--------------------------------------------------------------------------------
11,12,1234,1414,200200,33
None of the delimited string repeat !

REGEXP_SUBSTR desired output

SELECT REGEXP_SUBSTR('one,two,three',',[^,]+') AS reg_result FROM DUAL;
REG_RESULT
,two
SELECT REGEXP_SUBSTR('eight,nineteen,five',',[^,]+') AS reg_result FROM DUAL;
REG_RESULT
,nineteen
I have to remove the " , " from the result. Also I want the last string
as the output. i.e. three from 'one,two,three' & five from 'eight,nineteen,five'.
How can I do it??
If want to just get the last word without checking if your string meets particular pattern:
SQL> with t1 as(
2 select 'one,two,three' as str from dual
3 )
4 select regexp_substr(str, '([[:alpha:]]+)$') last_word
5 from t1
6 ;
LAST_WORD
---------
three
Response to the comment
how to get string two from the first one & nineteen from the second??
Fourth parameter of the regexp_substr function is the occurrence of a pattern. So to get the second word in the string we can use regexp_substr(str, '[^,]+', 1, 2)
SQL> with t1 as(
2 select 'one,two,three' as str from dual
3 )
4 select regexp_substr(str, '[^,]+', 1, 2) as Second_Word
5 from t1;
Second_Word
---------
two
If there is a need to extract every word from your strings:
-- sample of data from your question
SQL> with t1 as(
2 select 'one,two,three' as str from dual union all
3 select 'eight,nineteen,five' from dual
4 ), -- occurrences of the pattern
5 occurrence as(
6 select level as ps
7 from ( select max(regexp_count(str, '[^,]+')) mx
8 from t1
9 ) s
10 connect by level <= s.mx
11 ) -- the query
12 select str
13 , regexp_substr(str, '[^,]+', 1, o.ps) word
14 , o.ps as word_num
15 from t1 t
16 cross join occurrence o
17 order by str
18 ;
STR WORD WORD_NUM
------------------- ----------- ----------
eight,nineteen,five eight 1
eight,nineteen,five nineteen 2
eight,nineteen,five five 3
one,two,three three 3
one,two,three one 1
one,two,three two 2
6 rows selected
SELECT REGEXP_SUBSTR(REGEXP_SUBSTR('one,two,three',',[^,]+$'),'[^,]+') AS reg_result FROM DUAL;
I'm not certain if Oracle has lookbehinds, but you can try this as well:
SELECT REGEXP_SUBSTR('one,two,three','(?<=,)[^,]+$') AS reg_result FROM DUAL;

Use regexp_instr to get the last number in a string

If I used the following expression, the result should be 1.
regexp_instr('500 Oracle Parkway, Redwood Shores, CA','[[:digit:]]')
Is there a way to make this look for the last number in the string? If I were to look for the last number in the above example, it should return 3.
If you were using 11g, you could use regexp_count to determine the number of times that a pattern exists in the string and feed that into the regexp_instr
regexp_instr( str,
'[[:digit:]]',
1,
regexp_count( str, '[[:digit:]]')
)
Since you're on 10g, however, the simplest option is probably to reverse the string and subtract the position that is found from the length of the string
length(str) - regexp_instr(reverse(str),'[[:digit:]]') + 1
Both approaches should work in 11g
SQL> ed
Wrote file afiedt.buf
1 with x as (
2 select '500 Oracle Parkway, Redwood Shores, CA' str
3 from dual
4 )
5 select length(str) - regexp_instr(reverse(str),'[[:digit:]]') + 1,
6 regexp_instr( str,
7 '[[:digit:]]',
8 1,
9 regexp_count( str, '[[:digit:]]')
10 )
11* from x
SQL> /
LENGTH(STR)-REGEXP_INSTR(REVERSE(STR),'[[:DIGIT:]]')+1
------------------------------------------------------
REGEXP_INSTR(STR,'[[:DIGIT:]]',1,REGEXP_COUNT(STR,'[[:DIGIT:]]'))
-----------------------------------------------------------------
3
3
Another solution with less effort is
SELECT regexp_instr('500 Oracle Parkway, Redwood Shores, CA','[^[:digit:]]*$')-1
FROM dual;
this can be read as.. find the non-digits at the end of the string. and subtract 1. which will give the position of the last digit in the string..
REGEXP_INSTR('500ORACLEPARKWAY,REDWOODSHORES,CA','[^[:DIGIT:]]*$')-1
--------------------------------------------------------------------
3
which i think is what you want.
(tested on 11g)