REGEXP_SUBSTR Is taking more time for execution in Oracle

I am trying to split a comma-separated string of email IDs into individual email IDs, each enclosed in single quotes and separated by commas.
My input is: 'one#gmail.com,two#gamil.com,three#gmail.com,four#gmail.com'
My output should be: 'one#gmail.com','two#gamil.com','three#gmail.com','four#gmail.com'
I am going to use the output string in the WHERE clause of an Oracle query, like this:
WHERE EmailId IN ( 'one#gmail.com','two#gamil.com','three#gmail.com','four#gmail.com');
I am using the following code to achieve this:
WHERE EMAIL IN
(REGEXP_SUBSTR('one#gmail.com,two#gamil.com,three#gmail.com,four#gmail.com' ,'[^,]+', 1, LEVEL))
CONNECT BY LEVEL <= LENGTH('one#gmail.com,two#gamil.com,three#gmail.com,four#gmail.com' ) - LENGTH(REPLACE('one#gmail.com,two#gamil.com,three#gmail.com,four#gmail.com' , ',', '')) +1;
But the above query takes 60 seconds to return only 16 records. Can anyone suggest a better approach?

Try generating the list in a subquery against dual, so that CONNECT BY runs over the literal string only:
WHERE email IN (
select regexp_substr('one#gmail.com,two#gamil.com,three#gmail.com,four#gmail.com','[^,]+', 1, level) from dual
connect by regexp_substr('one#gmail.com,two#gamil.com,three#gmail.com,four#gmail.com', '[^,]+', 1, level) is not null );
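If the list of addresses is assembled in application code, another option is to split the string there and pass each address as a bind variable, so the query needs no regular expression at all. The following is a minimal sketch in Python rather than Oracle SQL. It uses the standard-library sqlite3 module as a stand-in for an Oracle driver, and the users table, its columns, and the helper names are hypothetical. Placeholder syntax (here ?) differs between drivers.

```python
import sqlite3

def split_emails(csv_string):
    """Split a comma-separated string into a list of trimmed, non-empty items."""
    return [item.strip() for item in csv_string.split(",") if item.strip()]

def find_by_emails(conn, csv_string):
    """Return rows of the (hypothetical) users table whose email is in the list."""
    emails = split_emails(csv_string)
    if not emails:
        return []
    # One bind placeholder per address; the values never enter the SQL text.
    placeholders = ", ".join("?" for _ in emails)
    sql = f"SELECT id, email FROM users WHERE email IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, emails).fetchall()

# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("one#gmail.com",), ("five#gmail.com",), ("three#gmail.com",)],
)

rows = find_by_emails(
    conn, "one#gmail.com,two#gamil.com,three#gmail.com,four#gmail.com"
)
print(rows)
```

Only the two addresses present in the sample table come back; the others simply do not match.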

Related

Replace brackets and splitting a column into multiple rows based on a delimiter in Postgres

I have a table with a column whose values are separated by ';'. The data looks like this:
row_id col
1 p.[D389R;D393_W394delinsRD]
2 p.[D390R;D393_W394delinsRD]
3 p.D389R
4 p.[D370R;D393_W394delinsRD]
I would like to remove the '[]' brackets wherever they occur and fetch the text. Then I would like to split the string on ';', prepend 'p.' to each resulting piece (if it is not already there), and create a new row for each piece.
The expected output is:
row_id new_col
1 p.D389R
2 p.D393_W394delinsRD
3 p.D390R
4 p.D393_W394delinsRD
5 p.D389R
6 p.D370R
7 p.D393_W394delinsRD
I have tried the query below to get the desired output.
SELECT *,
CASE
WHEN regexp_split_to_table(regexp_replace(col, '\[|\]', '', 'g'), E';') NOT LIKE 'p.[%'
THEN 'p.' || (regexp_split_to_table(regexp_replace(col, '\[|\]', '', 'g'), E';'))[1]
ELSE regexp_split_to_table(regexp_replace(col, '\[|\]', '', 'g'), E';')[2]
END AS new_col
FROM table;
Any suggestions would be really helpful.

I would first remove the constant parts (the leading p.[ and the trailing ]) from the string and then unnest it.
with clean as (
select row_id, regexp_replace(col, '^p\.(\[){0,1}|\]$', '', 'g') as col
from the_table
)
select row_id, 'p.'|| t.c
from clean c
cross join unnest(string_to_array(c.col, ';')) as t(c)
The CTE (with ...) isn't really necessary, but that way the unnest(...) stays readable.
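To see what the clean-then-split steps produce, the same transformation can be sketched outside the database. This is an illustration in Python, not PostgreSQL; the function name split_variants is made up for the example, and the pattern is a simplified equivalent of the one in the CTE.

```python
import re

def split_variants(col):
    """Strip the leading 'p.' (plus optional '[') and a trailing ']',
    split on ';', then put 'p.' back in front of every piece."""
    cleaned = re.sub(r"^p\.\[?|\]$", "", col)
    return ["p." + part for part in cleaned.split(";")]

# Two of the sample values from the question.
samples = ["p.[D389R;D393_W394delinsRD]", "p.D389R"]
for value in samples:
    print(value, "->", split_variants(value))
```

Each input row yields one output row per ';'-separated piece, which is what the cross join with unnest does in the query.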

Inconsistent results from Oracle's REGEXP_SUBSTR

Given a string of key-value pairs: /* USER='Administrator'; UNV='Universe'; DOC='WebIntellignceReport'; */
My goal is to extract values associated with the USER, UNV, and DOC keys.
Using a pattern of (?<=UNV=')(.*?)(?='), I get the expected value of Universe associated with the UNV key in an online regex tester.
However, when I use the pattern with REGEXP_SUBSTR, I get a NULL:
SELECT text
,REGEXP_SUBSTR(text,'(?<=UNV='')(.*?)(?='')') UNV
FROM (
SELECT '/* USER=''Administrator''; UNV=''Universe''; DOC=''WebIntellignceReport''; */' as text
FROM dual
) v
What am I missing?

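Oracle's regular expression functions do not support lookbehind or lookahead, so (?<=...) and (?=...) do not behave as they do in other engines. Instead, you may match the whole pair and extract the contents of group 1: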
SELECT text, REGEXP_SUBSTR(text,'UNV=''(.*?)''', 1, 1 ,NULL, 1) UNV
FROM (
SELECT '/* USER=''Administrator''; UNV=''Universe''; DOC=''WebIntellignceReport''; */' as text
FROM dual
) v
With UNV='(.*?)', you extract just what is between the closest pair of single quotes after UNV=.
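The capture-group technique carries over to all three keys. The sketch below shows it in Python rather than Oracle, purely as an illustration; value_for is a hypothetical helper name. Python's re module does support lookbehind, but the group form shown here is the one that also works with REGEXP_SUBSTR's subexpression argument.

```python
import re

text = "/* USER='Administrator'; UNV='Universe'; DOC='WebIntellignceReport'; */"

def value_for(key, s):
    """Return the quoted value that follows key=, or None if the key is absent."""
    # Match the key literally, then capture lazily up to the next single quote.
    match = re.search(re.escape(key) + r"='(.*?)'", s)
    return match.group(1) if match else None

for key in ("USER", "UNV", "DOC"):
    print(key, value_for(key, text))
```

The lazy quantifier stops at the first closing quote, so each key yields only its own value.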

I think the easiest thing to do is just grab the whole key-value pair using REGEXP_SUBSTR, and then do another substr to pull out the value you want.
with v as (select '/* USER=''Administrator''; UNV=''Universe''; DOC=''WebIntellignceReport''; */' as text from dual)
select text, key_val, substr(key_val, instr(key_val, '''')+1, length(key_val)-instr(key_val, '''')-2) val
from (
select text,
regexp_substr(text, ' UNV=''[^'']*'';') key_val
from v);
Output:
TEXT KEY_VAL VAL
----------------------------------------------------------------------- ----------------------------------------------------------------------- -----------------------------------------------------------------------
/* USER='Administrator'; UNV='Universe'; DOC='WebIntellignceReport'; */ UNV='Universe'; Universe

sqlite regex: How return count of 'X' from values in column

I am using pandasql to transform data. Inside a query I would like to pull out, for example, the number of periods ('.') in the email address. SQLite does not seem to support regex.
In SQL I could write:
length(regexp_replace(email, '[^.]', '', 'g')) as email_period
Applying this to the email my.first_name#abc.com would return 2.
I look forward to a solution that works with SQLite. Thank you in advance.

You can use plain REPLACE and LENGTH to calculate the number of periods:
CREATE TABLE tab(email VARCHAR(100));
INSERT INTO tab(email)
VALUES ('my.first_name#abc.com'),('my.first.name#abc.com.ru');
SELECT email, LENGTH(email) - LENGTH(REPLACE(email, '.', '')) AS email_period
FROM tab;
EDIT:
Counting digits:
SELECT email,
LENGTH(email) -
LENGTH(REPLACE( REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(email, '0', '')
,'1', ''),'2', ''),'3', ''),'4', '')
,'5', ''),'6', ''),'7', ''),'8', '')
,'9', '')) AS email_digits
FROM tab
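The period count can be checked directly from Python with the standard-library sqlite3 module. The sketch below also registers a small user-defined function, regex_count, as an alternative to the nested REPLACE calls. That function name and the third sample address are made up for this example, and whether pandasql lets you register such a function on its internal connection is not assumed here.

```python
import re
import sqlite3

def regex_count(pattern, value):
    """Number of non-overlapping matches of pattern in value (None stays None)."""
    if value is None:
        return None
    return len(re.findall(pattern, value))

conn = sqlite3.connect(":memory:")
# Make the Python function callable from SQL as regex_count(pattern, value).
conn.create_function("regex_count", 2, regex_count)

conn.execute("CREATE TABLE tab (email VARCHAR(100))")
conn.executemany(
    "INSERT INTO tab (email) VALUES (?)",
    [
        ("my.first_name#abc.com",),
        ("my.first.name#abc.com.ru",),
        ("user123#abc.com",),  # hypothetical row, added to exercise the digit count
    ],
)

rows = conn.execute(
    "SELECT email, "
    "       LENGTH(email) - LENGTH(REPLACE(email, '.', '')) AS email_period, "
    "       regex_count('[0-9]', email) AS email_digits "
    "FROM tab ORDER BY rowid"
).fetchall()

for row in rows:
    print(row)
```

The REPLACE/LENGTH expression needs no extension at all, which makes it the safer choice when the SQL has to run unchanged inside pandasql.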

Oracle How do I transform this string field into structured data using regular expressions?

I did start at this answer:
Oracle 11g get all matched occurrences by a regular expression
But it didn't get me far enough. I have a string field that looks like this:
A=&token1&token2&token3,B=&token2&token3&token5
It could have any number of tokens and any number of keys. The desired output is a set of rows looking like this:
Key | Token
A | &token1
A | &token2
A | &token3
B | &token2
B | &token3
B | &token5
This is proving rather difficult to do.
I started here:
SELECT token from
(SELECT REGEXP_SUBSTR(str, '[A-Z=&]+', 1, LEVEL) AS token
FROM (SELECT 'A=&token1&token2&token3,B=&token2&token3&token5' str from dual)
CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(str, '[A-Z=&]+', ',')))
Where token is not null
But that yields:
A=&
&
&
B=&
&
&
which is getting me nowhere. I'm thinking I need to do a nested clever select where the first one gets me
A=&token1&token2&token3
B=&token2&token3&token5
And a subsequent select might be able to do a clever extract to get the final result.
Stumped. I'm trying to do this without using procedural or function code -- I would like the set to be something I can union with other queries so if it's possible to do this with nested selects that would be great.
UPDATE:
SET DEFINE OFF
SELECT SUBSTR(token,1,1) as Key, REGEXP_SUBSTR(token, '&\w+', 1, LEVEL) AS token2
FROM
(
-- 1 row per key/value pair
SELECT token from
(SELECT REGEXP_SUBSTR(str, '[^,]+', 1, LEVEL) AS token
FROM (SELECT 'A=&token1&token2&token3,B=&token2&token3&token5' str from dual)
CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(str, '[^,]+', ',')))
Where token is not null
)
CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(token, '&\w+'))
This gets me
A | &token1
A | &token2
B | &token3
B | &token2
A | &token2
B | &token3
Which is fantastic formatting except for the small problem that it's wrong (A should have a token3, and token4 and token5 are nowhere to be seen).

Great question! Thanks for it!
select distinct k, regexp_substr(v, '[^&]+', 1, level) t
from (
select substr(regexp_substr(val,'^[^=]+=&'),1,length(regexp_substr(val,'^[^=]+=&'))-2) k, substr(regexp_substr(val,'=&.*'),3) v
from (
select regexp_substr(str, '[^,]+', 1, level) val
from (select 'A=&token1&token2&token3,B=&token2&token3&token5' str from dual)
connect by level <= length(str) - length(replace(str,','))+1
)
) connect by level <= length(v) - length(replace(v,'&'))+1
It is an answer, and one that seems to work. But I don't like the middle step that splits val into k and v; there must be a better way (if the key is always one character, that makes it easy though). And having to add a DISTINCT to get rid of duplicates is horrible. Maybe with further playing you can clean it up (or someone else might).
EDIT based on keeping the leading & and the key being a single character:
select distinct k, regexp_substr(v, '&[^&]+', 1, level) t
from (
select substr(val,1,1) k
, substr(regexp_substr(val,'=&.*'),1) v
from (
select regexp_substr(str, '[^,]+', 1, level) val
from (select 'A=&token1&token2&token3,B=&token2&token3&token5' str from dual)
connect by level <= length(str) - length(replace(str,','))+1
)
) connect by level < length(v) - length(replace(v,'&'))+1
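As a cross-check on the expected rows, the same two-level split (first on ',', then on '&' within each pair) can be sketched procedurally. This is Python rather than Oracle SQL and is meant only to verify the output of the queries above; split_pairs is a hypothetical helper name.

```python
def split_pairs(s):
    """Turn 'A=&t1&t2,B=&t3' into a list of (key, '&token') rows."""
    rows = []
    for pair in s.split(","):                 # level 1: one piece per key
        key, _, tokens = pair.partition("=")  # key is everything before '='
        for token in tokens.split("&"):       # level 2: one piece per token
            if token:                         # skip the empty piece before the first '&'
                rows.append((key, "&" + token))
    return rows

for key, token in split_pairs("A=&token1&token2&token3,B=&token2&token3&token5"):
    print(key, "|", token)
```

Unlike the single-character substr in the edited query, this version does not assume that keys are one character long.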

What is the best way to populate a load file for a date lookup dimension table?

Informix 11.70.TC4:
I have an SQL dimension table which is used for looking up a date (pk_date) and returning another date (plus1, plus2 or plus3_months) to the client, depending on whether the user selects a "1","2" or a "3".
The table schema is as follows:
TABLE date_lookup
(
pk_date DATE,
plus1_months DATE,
plus2_months DATE,
plus3_months DATE
);
UNIQUE INDEX on date_lookup(pk_date);
I have a load file (pipe delimited) containing dates from 01-28-2012 to 03-31-2014.
The following is an example of the load file:
01-28-2012|02-28-2012|03-28-2012|04-28-2012|
01-29-2012|02-29-2012|03-29-2012|04-29-2012|
01-30-2012|02-29-2012|03-30-2012|04-30-2012|
01-31-2012|02-29-2012|03-31-2012|04-30-2012|
...
03-31-2014|04-30-2014|05-31-2014|06-30-2014|
........................................................................................
EDIT: Sir Jonathan's SQL statement using DATE(pk_date + n UNITS MONTH) on 11.70.TC5 worked!
I generated a load file with pk_date's from 01-28-2012 to 12-31-2020, and plus1, plus2 & plus3_months NULL. Loaded this into date_lookup table, then executed the update statement below:
UPDATE date_lookup
SET plus1_months = DATE(pk_date + 1 UNITS MONTH),
plus2_months = DATE(pk_date + 2 UNITS MONTH),
plus3_months = DATE(pk_date + 3 UNITS MONTH);
Apparently, DATE() was able to convert pk_date to DATETIME, do the math with TC5's new algorithm, and return the result in DATE format!
.........................................................................................
The rules for this dimension table are:
If pk_date has 31 days in its month and plus1, plus2 or plus3_months only have 28, 29, or 30 days, then let plus1, plus2 or plus3 equal the last day of that month.
If pk_date has 30 days in its month and plus1, plus2 or plus3 has 28 or 29 days in its month, let them equal the last valid date of those months, and so on.
All other dates fall on the same day of the following month.
My question is: What is the best way to automatically generate pk_dates past 03-31-2014 following the above rules? Can I accomplish this with an SQL script, "sed", C program?
EDIT: I mentioned sed because I already have more than two years' worth of data and could perhaps model the rest after this data, or perhaps a tool like awk is better?

The best technique would be to upgrade to 11.70.TC5 (on 32-bit Windows; generally to 11.70.xC5 or later) and use an expression such as:
SELECT DATE(given_date + n UNITS MONTH)
FROM Wherever
...
The DATETIME code was modified between 11.70.xC4 and 11.70.xC5 to generate dates according to the rules you outline when the dates are as described and you use the + n UNITS MONTH or equivalent notation.
This obviates the need for a table at all. Clearly, though, all your clients would also have to be on 11.70.xC5.
Maybe you can update your development machine to 11.70.xC5 and then use this property to generate the data for the table on your development machine, and distribute the data to your clients.
If upgrading at least someone to 11.70.xC5 is not an option, then consider the Perl script suggestion.

Can it be done with SQL? Probably, but it would be excruciating. Ditto for C, and I think 'no' is the answer for sed.
However, a couple of dozen lines of perl seems to produce what you need:
#!/usr/bin/perl
use strict;
use warnings;
use DateTime;
my @dates;
# parse arguments
while (my $datep = shift){
    my ($m,$d,$y) = split('-', $datep);
    push(@dates, DateTime->new(year => $y, month => $m, day => $d))
        || die "Cannot parse date $!\n";
}
open(STDOUT, ">", "output.unl") || die "Unable to create output file.";
my ($date, $end) = @dates;
while( $date < $end ){
    my @row = ($date->mdy('-')); # start with pk_date
    for my $mth ( qw[ 1 2 3 ] ){
        my $fut_d = $date->clone->add(months => $mth);
        until (
            ($fut_d->month == $date->month + $mth
                && $fut_d->year == $date->year) ||
            ($fut_d->month == $date->month + $mth - 12
                && $fut_d->year > $date->year)
        ){
            $fut_d->subtract(days => 1); # step back until criteria met
        }
        push(@row, $fut_d->mdy('-'));
    }
    print STDOUT join("|", @row, "\n");
    $date->add(days => 1);
}
Save that as futuredates.pl, chmod +x it and execute like this:
$ futuredates.pl 04-01-2014 12-31-2020
That seems to do the trick for me.
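For comparison, the end-of-month rule from the question (keep the same day of the month, but clamp to the last valid day when the target month is shorter) can also be written with nothing beyond a standard library. The sketch below is in Python rather than Perl; add_months and load_rows are names chosen for this example, and treating the end date as inclusive is an assumption.

```python
import calendar
from datetime import date, timedelta

def add_months(d, n):
    """Add n months to d, clamping the day to the last day of the target month."""
    month_index = d.year * 12 + (d.month - 1) + n
    year, month0 = divmod(month_index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]  # number of days in that month
    return date(year, month, min(d.day, last_day))

def load_rows(start, end):
    """Yield pipe-delimited load-file rows for every date from start to end, inclusive."""
    d = start
    while d <= end:
        cols = [d] + [add_months(d, n) for n in (1, 2, 3)]
        # Trailing '|' matches the load-file sample in the question.
        yield "|".join(c.strftime("%m-%d-%Y") for c in cols) + "|"
        d += timedelta(days=1)

for row in load_rows(date(2012, 1, 28), date(2012, 1, 31)):
    print(row)
```

Run over the first four dates of the sample load file, this reproduces the rows shown in the question, including the clamping of 01-31-2012 to 02-29-2012 and 04-30-2012.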
