I am trying to extract a date that follows a specific string (we will call it IMP for now) in a text field. It may appear in upper or lower case and show up as IMP 1/1/10, IMP happened on 1/1/10, or IMP-1/1/10.
For example, the code below:
SELECT
REGEXP_SUBSTR('abc 3/4/16 blah blah IMP 3/7/16',
'(\d{1,2}/\d{1,2}/\d{2,4})') "REGEXP_SUBSTR" from dual
returns the first date, but not the one I want.
I have tried
'(IMP) (.|(a-z){1-10}) (\d{1,2}/\d{1,2}/\d{2,4})'
and other permutations.
SELECT
REGEXP_SUBSTR('abc 3/4/16 blah blah IMP 3/7/16',
'(\d{1,2}/\d{1,2}/\d{2,4})') "REGEXP_SUBSTR" from dual
If I include (IMP) (.|(a-z){1-10}) I get null results; if I just use
'(\d{1,2}/\d{1,2}/\d{2,4})' I get the first date that appears.
Something like this?
SQL> with test (id, col) as
2 (select 1, 'abc 3/4/16 blah blah IMP 3/7/16' from dual union all
3 select 2, 'abc 3/4/16 blah blah 3/7/16 imp 2/8/15 xxx cc2' from dual union all
4 select 3, 'xxx 3/5/18 ccdd 234 imp happened on 5/8/19 some 23f' from dual union all
5 select 4, '3/10/18 bla bla imp-3/9/17 xfe 334 3/4/13 x' from dual
6 )
7 select id,
8 regexp_substr(substr(col, instr(lower(col), 'imp') + 3), '\d+/\d+/\d+') result
9 from test;
ID RESULT
---------- --------------------
1 3/7/16
2 2/8/15
3 5/8/19
4 3/9/17
SQL>
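If a single regular expression is preferred, a minimal sketch along these lines might work. It assumes Oracle 11g or later (for the subexpression argument of REGEXP_SUBSTR) and assumes that no digits appear between IMP and its date. The 'i' match parameter makes the search case-insensitive, and the last argument returns only the parenthesised date.

```sql
-- Sketch: return the date that follows IMP, ignoring case.
-- Assumption: only non-digit characters sit between IMP and the date.
with test (id, col) as
  (select 1, 'abc 3/4/16 blah blah IMP 3/7/16' from dual union all
   select 2, 'abc 3/4/16 blah blah 3/7/16 imp 2/8/15 xxx cc2' from dual union all
   select 3, 'xxx 3/5/18 ccdd 234 imp happened on 5/8/19 some 23f' from dual union all
   select 4, '3/10/18 bla bla imp-3/9/17 xfe 334 3/4/13 x' from dual
  )
select id,
       regexp_substr(col,
                     'imp[^0-9]*(\d{1,2}/\d{1,2}/\d{2,4})',
                     1,    -- start at the first character
                     1,    -- first occurrence
                     'i',  -- case-insensitive match
                     1     -- return subexpression 1 (the date)
                    ) result
  from test;
```

Like the SUBSTR version, this also matches imp inside a longer word such as "important", so a stricter pattern may be needed for real data.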
Related
I want to find strings in an Oracle query that contain characters above chr(127).
I see a lot of suggestions that '['||chr(128)||'-'||chr(255)||']' should work, but it doesn't.
So the next query should return OK, but it doesn't:
select 'OK' as result from dual where regexp_like('why Ä ?', '['||chr(128)||'-'||chr(255)||']')
and the next one should not return OK, but it does:
select 'OK' as result from dual where regexp_like('why - ?', '['||chr(128)||'-'||chr(255)||']')
UPD: Sorry, capital A umlaut in my case is \xC4 (ISO 8859 Latin 1), but here it turns into Unicode chr(50052).
How about a different approach? Split string into characters and check whether maximum value is higher than 127.
For example:
SQL> with test (col) as
2 (select 'why Ä ?' from dual)
3 select substr(col, level, 1) one_character,
4 ascii(substr(col, level, 1)) ascii_of_one_character
5 from test
6 connect by level <= length(col);
ONE_ ASCII_OF_ONE_CHARACTER
---- ----------------------
w 119
h 104
y 121
32
Ä 50621 --> here it is!
32
? 63
7 rows selected.
SQL>
Now, move it into a subquery and fetch the result:
SQL> with test (col) as
2 (select 'why Ä ?' from dual)
3 select case when max(ascii_of_one_character) > 127 then 'OK'
4 else 'Not OK'
5 end result
6 from (select substr(col, level, 1) one_character,
7 ascii(substr(col, level, 1)) ascii_of_one_character
8 from test
9 connect by level <= length(col)
10 );
RESULT
------
OK
Or:
SQL> with test (col) as
2 (select 'why - ?' from dual)
3 select case when max(ascii_of_one_character) > 127 then 'OK'
4 else 'Not OK'
5 end result
6 from (select substr(col, level, 1) one_character,
7 ascii(substr(col, level, 1)) ascii_of_one_character
8 from test
9 connect by level <= length(col)
10 );
RESULT
------
Not OK
Millions of rows? Well, the queries I posted wouldn't work properly even for two rows. Switch to:
SQL> with test (col) as
2 (select 'why - ?' from dual union all
3 select 'why Ä ?' from dual
4 )
5 select col,
6 case when max(ascii_of_one_character) > 127 then 'OK'
7 else 'Not OK'
8 end result
9 from (select col,
10 substr(col, column_value, 1) one_character,
11 ascii(substr(col, column_value, 1)) ascii_of_one_character
12 from test cross join table(cast(multiset(select level from dual
13 connect by level <= length(col)
14 ) as sys.odcinumberlist))
15 )
16 group by col;
COL RESULT
-------- ------
why - ? Not OK
why Ä ? OK
SQL>
How will it behave? I don't know, try it and tell us. Note that for large data sets regular expressions might actually be slower than a simple substr option.
Yet another option: how about TRANSLATE? You don't have to split anything in that case. For example:
SQL> with test (col) as
2 (select 'why - ?' from dual union all
3 select 'why Ä ?' from dual
4 )
5 select col,
6 case when nvl(length(res), 0) > 0 then 'OK'
7 else 'Not OK'
8 end result
9 from (select col,
10 translate
11 (col,
12 '!"#$%&''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ',
13 '!') res
14 from test
15 );
COL RESULT
-------- ------
why - ? Not OK
why Ä ? OK
SQL>
There is also another approach:
with t(str) as (
select 'why Ä ?' from dual union all
select 'why - ?' from dual union all
select 'why - ? Ä' from dual union all
select 'why' from dual
)
select
str,
case
when regexp_like(str, '[^'||chr(1)||'-'||chr(127)||']')
then 'Ok'
else 'Not ok'
end as res,
xmlcast(
xmlquery(
'count(string-to-codepoints(.)[. > 127])'
passing t.str
returning content)
as int) cnt_over_127
from t;
Results:
STR RES CNT_OVER_127
---------- ------ ------------
why Ä ? Ok 1
why - ? Not ok 0
why - ? Ä Ok 1
why Not ok 0
As you can see, I've used xmlquery() with the string-to-codepoints XPath function, kept only the codepoints above 127, and returned their count().
You can also use the dump or utl_raw.cast_to_raw() functions, but that is a bit more complex, so I have not written full solutions using them.
Here is just a small draft:
with t(str) as (
select 'why Ä ?' from dual union all
select 'why - ?' from dual union all
select 'why - ? Ä' from dual union all
select 'why' from dual
)
select
str,
case
when regexp_like(str, '[^'||chr(1)||'-'||chr(127)||']')
then 'Ok'
else 'Not ok'
end as res,
dump(str,1016) dmp,
dump(str,1015) dmp,
utl_raw.cast_to_raw(str) as_row,
regexp_count(dump(str,1016)||',', '[89a-f][0-9a-f],') xs
from t;
Results:
STR RES DMP DMP AS_ROW XS
---------- ------ ------------------------------------------------------------------- ----------------------------------------------------------------------- -------------------- --
why Ä ? Ok Typ=1 Len=8 CharacterSet=AL32UTF8: 77,68,79,20,c3,84,20,3f Typ=1 Len=8 CharacterSet=AL32UTF8: 119,104,121,32,195,132,32,63 77687920C384203F 2
why - ? Not ok Typ=1 Len=7 CharacterSet=AL32UTF8: 77,68,79,20,2d,20,3f Typ=1 Len=7 CharacterSet=AL32UTF8: 119,104,121,32,45,32,63 776879202D203F 0
why - ? Ä Ok Typ=1 Len=10 CharacterSet=AL32UTF8: 77,68,79,20,2d,20,3f,20,c3,84 Typ=1 Len=10 CharacterSet=AL32UTF8: 119,104,121,32,45,32,63,32,195,132 776879202D203F20C384 2
why Not ok Typ=1 Len=3 CharacterSet=AL32UTF8: 77,68,79 Typ=1 Len=3 CharacterSet=AL32UTF8: 119,104,121 776879 0
Note: since this is Unicode (UTF-8), every byte of a multibyte character is above 127, so 'Ä' is counted twice: both of its bytes, c3 and 84, are higher than 127.
I don't know why you want to use codepoints instead of character sets, but you can invert the logic and match anything outside 1-127, i.e. [^1-127]:
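Because the DUMP output above reports the AL32UTF8 character set, where every character above 127 occupies more than one byte, comparing the byte length with the character length may be a simpler test. This is a minimal sketch under that assumption; it would not detect anything in a single-byte character set such as WE8ISO8859P1.

```sql
-- Sketch: in AL32UTF8 a string holds a character above 127
-- exactly when it has more bytes than characters.
with t(str) as (
  select 'why Ä ?' from dual union all
  select 'why - ?' from dual union all
  select 'why - ? Ä' from dual union all
  select 'why' from dual
)
select str,
       case when lengthb(str) > length(str) then 'Ok' else 'Not ok' end as res,
       lengthb(str) - length(str) as extra_bytes  -- bytes beyond one per character
from t;
```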
DBFiddle
select 'OK' as result
from dual
where regexp_like('why Ä ?', '[^'||chr(1)||'-'||chr(127)||']');
select regexp_substr('why Ä ?', '[^'||chr(1)||'-'||chr(127)||']') x from dual;
And do not forget that some characters can be special characters, like ], or even non-printable.
I'm looking for a regexp to get the correct output.
For my example:
SELECT regexp_substr('brablcdefghig', '[^(bl)]+$') FROM dual;
I expect everything that follows 'bl', i.e. cdefghig, and that works.
But when I modify the input and add a 'b' character at the end, I get NULL in the output. Why?
SELECT regexp_substr('brablcdefghigb', '[^(bl)]+$') FROM dual;
That's a simple substr + instr; you don't need regular expressions. If it has to be regexp, see lines #8 and 9
SQL> with test (id, col) as
2 (select 1, 'brablcdefghig' from dual union all
3 select 2, 'brablcdefghigb' from dual
4 )
5 select id,
6 col,
7 substr(col, instr(col, 'bl') + 2) result,
8 regexp_substr(replace(col, 'bl', '#'), '[^#]+$') result2,
9 regexp_replace(col, '.+bl', '') result3
10 from test;
ID COL RESULT RESULT2 RESULT3
---------- -------------- ---------- ---------- ----------
1 brablcdefghig cdefghig cdefghig cdefghig
2 brablcdefghigb cdefghigb cdefghigb cdefghigb
SQL>
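As for why the original pattern returned NULL: inside square brackets the parentheses are ordinary characters, so '[^(bl)]' is a set of single characters, not the string bl. The short sketch below illustrates this with the two inputs from the question; the column aliases are made up for the example.

```sql
-- '[^(bl)]' means "any one character except (, b, l or )".
-- It does not mean "anything that is not the string bl".
select regexp_substr('brablcdefghig',  '[^(bl)]+$') as with_parens,  -- cdefghig
       regexp_substr('brablcdefghig',  '[^bl()]+$') as same_set,     -- cdefghig
       regexp_substr('brablcdefghigb', '[^(bl)]+$') as trailing_b    -- NULL: the last character is b
  from dual;
```

With a trailing b, no run of allowed characters reaches the end of the string, so the $ anchor can never be satisfied.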
I want to write an Oracle query to extract only the last substring of a comma-separated string like the one below:
DEST = "1,MMA SALAI,ARIANKUPAM,CITY CENTRE,G12 47H"
I am interested in only G12. How do I get it in an Oracle query?
Thanks
Try
REGEXP_SUBSTR('1,MMA SALAI,ARIANKUPAM,CITY CENTRE,G12 47H', '[^,]+$')
But that will fetch G12 47H. You may consider
REGEXP_SUBSTR('1,MMA SALAI,ARIANKUPAM,CITY CENTRE,G12 47H', '([^, ]+)( +[^,]*)?$', 1,1,NULL,1)
This will give G12.
A little bit of substringing (see comments within the code):
SQL> with test (dest) as
2 (select '1,MMA SALAI,ARIANKUPAM,CITY CENTRE,G12 47H' from dual)
3 select
4 regexp_substr(dest, --> out of the DEST, give me ...
5 '\w+', --> ... the first word that begins right after ...
6 instr(dest, ',', 1, regexp_count(dest, ',')) + 1 --> ... position of the last
7 ) result --> comma in the source string
8 from test;
RESULT
--------------------
G12
SQL>
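The position of the last comma can also be found without REGEXP_COUNT, because INSTR searches backwards when its start position is negative. A minimal sketch of that variant, with column aliases chosen only for illustration:

```sql
with test (dest) as
  (select '1,MMA SALAI,ARIANKUPAM,CITY CENTRE,G12 47H' from dual)
select substr(dest, instr(dest, ',', -1) + 1) last_element,  -- G12 47H
       -- first run of non-space characters within the last element
       regexp_substr(substr(dest, instr(dest, ',', -1) + 1), '[^ ]+') first_word  -- G12
  from test;
```

If the string contains no comma, INSTR returns 0 and the whole string is treated as the last element.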
Or, by splitting the comma-separated values into rows:
SQL> with test (dest) as
2 (select '1,MMA SALAI,ARIANKUPAM,CITY CENTRE,G12 47H' from dual)
3 select regexp_substr(col, '\w+') result
4 from (select regexp_substr(dest, '[^,]+', 1, level) col, --> split column to rows
5 row_number() over (order by level desc) rn --> the last row will be RN = 1
6 from test
7 connect by level <= regexp_count(dest, ',') + 1
8 )
9 where rn = 1;
RESULT
--------------------
G12
SQL>
I have the following statement in Oracle SQL and I want to run it in Google BigQuery:
CONNECT BY REGEXP_SUBSTR(VALUE, '[^,]+', 1, LEVEL) IS NOT NULL)
How can I run the above code in BigQuery?
I am guessing here, but this construct is usually used for so-called string decomposition.
So, in BigQuery you can use SPLIT(value) or REGEXP_EXTRACT_ALL(value, r'[^,]+') for this, as in the examples below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, '1,2,3,4,5,6,7' AS value UNION ALL
SELECT 2, 'a,b,c,d'
)
SELECT id, SPLIT(value) value
FROM `project.dataset.table`
or
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, '1,2,3,4,5,6,7' AS value UNION ALL
SELECT 2, 'a,b,c,d'
)
SELECT id, REGEXP_EXTRACT_ALL(value, r'[^,]+') value
FROM `project.dataset.table`
Both of the above queries return:
Row id value
1 1 1
2
3
4
5
6
7
2 2 a
b
c
d
Here, as you can see, the value in each row gets split into an array of elements, but they stay in the same row.
To flatten the result you can further use UNNEST(), as in the examples below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, '1,2,3,4,5,6,7' AS value UNION ALL
SELECT 2, 'a,b,c,d'
)
SELECT id, value
FROM `project.dataset.table`,
UNNEST(SPLIT(value)) value
or
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, '1,2,3,4,5,6,7' AS value UNION ALL
SELECT 2, 'a,b,c,d'
)
SELECT id, value
FROM `project.dataset.table`,
UNNEST(REGEXP_EXTRACT_ALL(value, r'[^,]+')) value
Both return the result below (with each extracted element in a separate row):
Row id value
1 1 1
2 1 2
3 1 3
4 1 4
5 1 5
6 1 6
7 1 7
8 2 a
9 2 b
10 2 c
11 2 d
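If the position that LEVEL supplied in the Oracle query is also needed, UNNEST can return it through WITH OFFSET. The sketch below is one way this might look; the aliases element, pos and lvl are made up for the example, and the offset is zero-based, so 1 is added to mimic LEVEL.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 id, '1,2,3,4,5,6,7' AS value UNION ALL
  SELECT 2, 'a,b,c,d'
)
SELECT id, pos + 1 AS lvl, element
FROM `project.dataset.table`,
  UNNEST(SPLIT(value)) AS element WITH OFFSET AS pos
ORDER BY id, lvl
```

One difference to keep in mind: SPLIT keeps empty elements (for example from 'a,,b'), whereas the pattern '[^,]+' skips them.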
I'm using the regexp_like function in Oracle to match the following number format: xxxyxxx
I'm trying this:
select 1 "val"
from dual
where regexp_like('5553555','^(\d){3}(?!\1)\d\1{3}$')
But as I realized, negative lookahead is not supported in Oracle.
How can I do it without negative lookahead?
Indeed, lookaround is not supported. Please note that you also have another issue: (\d){3} also matches three different digits. You would need (\d)\1\1 to match only three of the same digit.
For your particular case you could still use a regular expression. What I could think of is using a particular property: numbers made of the same digit seven times (xxxxxxx) are divisible by 1111111.
With regexp_like and an additional modulo test:
with tbl(val) as (
select '5555555' from dual union
select '5553555' from dual union
select 'nothing' from dual
)
select val
from tbl
where regexp_like(val,'^(\d)\1\1\d\1{3}$') and mod(val, 1111111) > 0;
Or you could use two regexes:
with tbl(val) as (
select '5555555' from dual union
select '5553555' from dual union
select 'nothing' from dual
)
select val
from tbl
where regexp_like(val,'^(\d)\1\1\d\1{3}$') and not regexp_like(val,'^(\d)..\1');
Admittedly, neither is really elegant, nor the most efficient. For more efficiency you should not use regular expressions.
Maybe old-fashioned SUBSTR might help. Something like this: split the input string (COL) into two equal pieces and compare them. LEN is used to distinguish odd from even lengths, which determines where the second part of the string starts.
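The second condition can also be written as a plain SUBSTR comparison of the first and fourth characters. This is only a sketch of that variant; it avoids applying mod() to the column, which could raise a conversion error if Oracle happens to evaluate it for a non-numeric value such as 'nothing'.

```sql
with tbl(val) as (
  select '5555555' from dual union all
  select '5553555' from dual union all
  select 'nothing' from dual
)
select val
from tbl
where regexp_like(val, '^(\d)\1\1\d\1{3}$')        -- shape: xxx?xxx, all x the same digit
  and substr(val, 4, 1) <> substr(val, 1, 1);      -- middle digit must differ from x
```

For the three sample values only 5553555 satisfies both conditions.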
A few examples:
SQL> WITH test (col) AS (SELECT '5554555' FROM DUAL),
2 len AS (SELECT LENGTH (col) len FROM test)
3 SELECT CASE
4 WHEN SUBSTR (col, 1, TRUNC (LENGTH (col) / 2)) =
5 SUBSTR (
6 col,
7 TRUNC (LENGTH (col) / 2)
8 + CASE WHEN MOD (l.len, 2) = 0 THEN 1 ELSE 2 END)
9 THEN
10 'OK'
11 ELSE
12 'Not OK'
13 END
14 result
15 FROM test t, len l;
RESULT
------
OK
SQL> l1
1* WITH test (col) AS (SELECT '5554555' FROM DUAL),
SQL> c/5554/2234/
1* WITH test (col) AS (SELECT '2234555' FROM DUAL),
SQL> /
RESULT
------
Not OK
SQL> l1
1* WITH test (col) AS (SELECT '2234555' FROM DUAL),
SQL> c/2234555/1221/
1* WITH test (col) AS (SELECT '1221' FROM DUAL),
SQL> /
RESULT
------
Not OK
SQL> l1
1* WITH test (col) AS (SELECT '1221' FROM DUAL),
SQL> c/1221/8888/
1* WITH test (col) AS (SELECT '8888' FROM DUAL),
SQL> /
RESULT
------
OK
SQL>
Use of the Trim Function to Trim Off The 'X' Values from 'Y'
This is just another approach to solving this subset of numeric palindrome problems.
If this were just a numeric palindrome, the undocumented function, reverse, could be used. Since we have a Y for the midvalue and we are testing to make sure that Y is not equal to X, the reverse function does not help us a lot here.
Borrowing the subexpression (character grouping) approach that Trincot uses, I create a second subexpression for the mid value and then trim that value off both ends of the string. If the trimmed result equals the original value, we can be sure that Y != X.
SCOTT@db>WITH tst ( val ) AS (
2 SELECT '5555555' FROM DUAL UNION ALL
3 SELECT '12121' FROM DUAL UNION ALL
4 SELECT '5553555' FROM DUAL UNION ALL
5 SELECT 'amanaplanpanama' FROM DUAL UNION ALL
6 SELECT '' FROM DUAL
7 ) SELECT
8 val,
9 REGEXP_SUBSTR(val,'^(\d)\1\1(\d)\1{3}$',1,1,NULL,2) midval,
10 TRIM(BOTH REGEXP_SUBSTR(val,'^(\d)\1\1(\d)\1{3}$',1,1,NULL,2) FROM val) trim_midval
11 FROM
12 tst
13 WHERE
14 1 = 1
15 AND val = TRIM(BOTH regexp_substr(val,'^(\d)\1\1(\d)\1{3}$',1,1,NULL,2) FROM val);
-----------------------------
VAL MIDVAL TRIM_MIDVAL
5553555 3 5553555
-----------------------------
Littlefoot's non-regular expression solution appears to be the most straightforward here.