PL/SQL optimize searching a date in varchar - regex

I have a table, that contains date field (let it be date s_date) and description field (varchar2(n) desc). What I need is to write a script (or a single query, if possible), that will parse the desc field and if it contains a valid oracle date, then it will cut this date and update the s_date, if it is null.
But there are one more condition - there are must be exactly one occurence of a date in the desc. If there are 0 or >1 - nothing should be updated.
By the time I came up with this pretty ugly solution using regular expressions:
----------------------------------------------
create or replace function to_date_single( p_date_str in varchar2 )
return date
is
l_date date;
pRegEx varchar(150);
pResStr varchar(150);
begin
pRegEx := '((0[1-9]|[12][0-9]|3[01])[.](0[1-9]|1[012])[.](19|20)\d\d)((.|\n|\t|\s)*((0[1-9]|[12][0-9]|3[01])[.](0[1-9]|1[012])[.](19|20)\d\d))?';
pResStr := regexp_substr(p_date_str, pRegEx);
if not (length(pResStr) = 10)
then return null;
end if;
l_date := to_date(pResStr, 'dd.mm.yyyy');
return l_date;
exception
when others then return null;
end to_date_single;
----------------------------------------------
update myTable t
set t.s_date = to_date_single(t.desc)
where t.s_date is null;
----------------------------------------------
But it's working extremely slow (more than a second for each record and i need to update about 30000 records). Is it possible to optimize the function somehow? Maybe it is the way to do the thing without regexp? Any other ideas?
Any advice is appreciated :)
EDIT:
OK, maybe it'll be useful for someone. The following regular expression performs check for valid date (DD.MM.YYYY) taking into account the number of days in a month, including the check for leap year:
(((0[1-9]|[12]\d|3[01])\.(0[13578]|1[02])\.((19|[2-9]\d)\d{2}))|((0[1-9]|[12]\d|30)\.(0[13456789]|1[012])\.((19|[2-9]\d)\d{2}))|((0[1-9]|1\d|2[0-8])\.02\.((19|[2-9]\d)\d{2}))|(29\.02\.((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))))
I used it with the query, suggested by #David (see accepted answer), but I've tried select instead of update (so it's 1 regexp less per row, because we don't do regexp_substr) just for "benchmarking" purpose.
Numbers probably won't tell much here, cause it all depends on hardware, software and specific DB design, but it took about 2 minutes to select 36K records for me. Update will be slower, but I think It'll still be a reasonable time.

I would refactor it along the lines of a single update query.
Use two regexp_instr() calls in the where clause to find rows for which a first occurrence of the match occurs and a second occurrence does not, and regexp_substr() to pull the matching characters for the update.
update my_table
set my_date = to_date(regexp_subtr(desc,...),...)
where regexp_instr(desc,pattern,1,1) > 0 and
regexp_instr(desc,pattern,1,2) = 0
You might get even better performance with:
update my_table
set my_date = to_date(regexp_subtr(desc,...),...)
where case regexp_instr(desc,pattern,1,1)
when 0 then 'N'
else case regexp_instr(desc,pattern,1,2)
when 0 then 'Y'
else 'N'
end
end = 'Y'
... as it only evaluates the second regexp if the first is non-zero. The first query might also do that but the optimiser might choose to evaluate the second predicate first because it is an equality condition, under the assumption that it's more selective.
Or reordering the Case expression might be better -- it's a trade-off that's difficult to judge and probably very dependent on the data.

I think there's no way to improve this task. Actually, in order to achieve what you want it should get even slower.
Your regular expression matches text like 31.02.2013, 31.04.2013 outside the range of the month. If you put year in the game,
it gets even worse. 29.02.2012 is valid, but 29.02.2013 is not.
That's why you have to test if the result is a valid date.
Since there isn't a full regular expression for that, you would have to do it by PLSQL really.
In your to_date_single function you return null when a invalid date is found.
But that doesn't mean there won't be other valid dates forward on the text.
So you have to keep trying until you either find two valid dates or hit the end of the text:
create or replace function fn_to_date(p_date_str in varchar2) return date is
l_date date;
pRegEx varchar(150);
pResStr varchar(150);
vn_findings number;
vn_loop number;
begin
vn_findings := 0;
vn_loop := 1;
pRegEx := '((0[1-9]|[12][0-9]|3[01])[.](0[1-9]|1[012])[.](19|20)\d\d)';
loop
pResStr := regexp_substr(p_date_str, pRegEx, 1, vn_loop);
if pResStr is null then exit; end if;
begin
l_date := to_date(pResStr, 'dd.mm.yyyy');
vn_findings := vn_findings + 1;
-- your crazy requirement :)
if vn_findings = 2 then
return null;
end if;
exception when others then
null;
end;
-- you have to keep trying :)
vn_loop := vn_loop + 1;
end loop;
return l_date;
end;
Some tests:
select fn_to_date('xxxx29.02.2012xxxxx') c1 --ok
, fn_to_date('xxxx29.02.2012xxx29.02.2013xxx') c2 --ok, 2nd is invalid
, fn_to_date('xxxx29.02.2012xxx29.02.2016xxx') c2 --null, both are valid
from dual
As you are going to have to do try and error anyway one idea would be to use a simpler regular expression.
Something like \d\d[.]\d\d[.]\d\d\d\d would suffice. That would depend on your data, of course.
Using #David's idea you could filter the ammount of rows to apply your to_date_single function (because it's slow),
but regular expressions alone won't do what you want:
update my_table
set my_date = fn_to_date( )
where regexp_instr(desc,patern,1,1) > 0

Related

How to transfer data with multiple conditions with pgsql?

I would like to create a trigger with I can check the uploaded data and I can insert into another table as well. In the initial table (capture) I have 15 columns, but I would like to transfer only 5 columns (ring_number, code, year, date, species, location) to another table (ring).
The ring table is a background table in which I am collecting the combinations of ring_number and code, more specifically one ring_number could be paired only with one code. There is one exception, when the code include "X", than it can be changed later, and in this case this code can be paired with more ring_number, and if originally belongs to the ring_number a code with "X" it can be changed later.
In the capture table, could be possible to upload the same combination of code and ring_number multiple times with a condition of a third column. But still the ring_number can be paired only with one code, with exceptions of codes included "X". The name of the conditional column is recapture. If recapture (boolean column type) is "true", then you can upload the combination of code and ring_number again. If it is "empty" or "no" you can upload only new combinations of code and ring_number. If somebody uploads old combinations then the following error message has to raise: this combination already exists, please check your data and if it is a recapture, then set the recapture column to yes.
Additionally: ring_number is a not null column, but code can be empty. And different ring_number can be paired with empty code than later can be paired with actual value.
I have several problems with my code:
1: I would like to define the exception to X with regex, and the X can be anywhere in the code. But can not manage the regex in a good way. It does just not work.
2: I write conditional checkpoint with recapture column and if I have an old combination this is work on the right way and say please set the recapture column to yes. But! If I set the recapture column to yes I get the same error message.
Could you help to solve these issues?
Here is my code:
Declare
a integer := 0;
b integer := 0;
c integer := 0;
d integer := 0;
Begin
IF new.code <> '' THEN
--Az 'a' means whether the given ring_number already exist in the database with a code, which is not empty
SELECT INTO a COUNT(*) FROM plover_captures PC WHERE PC.ring_number = new.ring_number AND PC.code <> new.code AND PC.code <> '' AND PC.code ~ '[X]{2}[\.]{1}[X]{2}[|]{1}[X]{2}[\.]{1}[X]{2}';
--Az 'b' means the given code already exist in the database with a ring_number
SELECT INTO b COUNT(*) FROM plover_captures PC WHERE PC.ring_number <> new.ring_number AND PC.code = new.code AND PC.code ~ '[X]{2}[\.]{1}[X]{2}[|]{1}[X]{2}[\.]{1}[X]{2}';
--Az 'c' how much times exist the given ring_number with the given code in the database
SELECT INTO c COUNT(*) FROM plover_captures PC WHERE PC.ring_number = new.ring_number AND PC.code = new.code AND PC.code ~ '[X]{2}[\.]{1}[X]{2}[|]{1}[X]{2}[\.]{1}[X]{2}';
--Az 'd' means the given combination already exist in ring table or not
SELECT INTO d COUNT(*) FROM plover_rings PC WHERE PC.ring_number = new.ring_number AND PC.code = new.code;
IF a > 0 THEN
raise exception 'This ring_number is already paired with another code before. %', new.ring_number;
END IF;
IF b > 0 THEN
raise exception 'This code is already paired with another ring_number before. %', new.code;
END IF;
IF c > 0 AND (new.rettrap IS null OR new.rettrap IS false) THEN
raise exception 'This ring_number and code pair is already in the database. So it is a rettrap but the rettrap attribute set to false or null. %, %, %', new.ring_number, new.code, new.rettrap;
END IF;
IF c = 0 AND new.rettrap IS true THEN
raise exception 'The rettrap attribute set to true but this ring_number and code pair is not in this database yet. %, %, %', new.ring_number, new.code, new.rettrap;
END IF;
IF c = 0 AND d = 0 THEN
Insert into plover_rings values(new.ring_number,new.code,new.species,new.location,new.year, new.date);
END IF;
END IF;
Return new;
End

Date format substitution in PL/SQL. Example: from 5y 6m 20d to 050620

I am writing a query where I need to perform a date format transformation to meet the specified requirements.
In the database which I have to search, the date format looks like the one in the example: 5y 6m 10d with spaces in between and with optional digits (10y 30d; 1m 23d; 6m are also valid) and they are always ordered (first years, then month and then days).
The format transformation should be the following:
10y 6m 10d => 100610
1y 10m 1d => 011001
6m 2d => 000602
So that the output is always a 6-digit number.
I tried writing regular expressions within REGEX_SUBSTR to isolate the tokens and then concatenate them together in the type of SELECT REGEXP_SUBSTR(text_source, '(\d+)*y') FROM database and I also tried using the REGEX_REPLACE function. Nevertheless, I am not able to perform the transformation to two digits per token without spaces, nor replace one pattern by another, I can only replace the pattern by another string.
Although I am able to output the token separation without spaces by writing the function above. I am not able to get the whole transformation. Is there any possibility of writing a RegEx and combining it with any of the PL/SQL functions in order to transform the dates stated on the list above ? I am also open to hear any other solutions not involving RegEx, I just thought it was sensible to make a proper use of them here.
Here is a simple solution in SQL.
you get the values for year, month and day e.g. with regexp_substr.
with nvl you set the value to 0 if there it is null.
lpad it with 0
with tab as(
select '10y 6m 10d' as str from dual union all
select '1y 10m 1d ' as str from dual union all
select '6m 2d ' as str from dual
)
select lpad(nvl(y,0), 2,'0') ||lpad(nvl(m,0), 2,'0')|| lpad(nvl(d,0), 2,'0')
from (
select rtrim(regexp_substr(str, '[0-9]{1,2}y', 1),'y') as y
,rtrim(regexp_substr(str, '[0-9]{1,2}m', 1),'m') as m
,rtrim(regexp_substr(str, '[0-9]{1,2}d', 1),'d') as d
from tab
)
;
LPAD(N
------
100610
011001
000602
I hope it works
declare
myDate_ varchar2(50) := REPLACE('1y 10m 81d',' ','');
year_ varchar2(50);
month_ varchar2(50);
day_ varchar2(50);
begin
if instrb(myDate_,'y',1,1)>0 then
year_ := lpad(regexp_substr(substr(myDate_,0,instrb(myDate_,'y',1,1)), '[^y]+',1 , 1),2,0);
end if;
if instrb(myDate_,'m',1,1)>0 then
month_ := lpad(regexp_substr(substr(myDate_,instrb(myDate_,'y',1,1)+1,instrb(myDate_,'m',1,1)), '[^m]+',1 , 1),2,0);
end if;
if instrb(myDate_,'d',1,1)>0 then
day_ := lpad(regexp_substr(substr(myDate_,instrb(myDate_,'m',1,1)+1,instrb(myDate_,'d',1,1)), '[^d]+',1 , 1),2,0);
end if;
dbms_output.put_line(year_||month_||day_);
end;

display the content of a file split by a delimiter character

I am trying to display the content of a file, split by a delimiter character.
More exactly, starting from this topic, I am trying to display the result as:
bbb
aaa
qqq
ccc
but the data source to be taken from a file.
Until now, I tried:
DECLARE
l_bfile bfile;
BEGIN
l_bfile := bfilename(my_dir, my_file);
dbms_lob.fileopen(l_bfile);
FOR i IN
(SELECT TRIM(regexp_substr(TO_CHAR(l_bfile),'[^;]+',1,level) ) AS q
FROM dual
CONNECT BY regexp_substr(TO_CHAR(l_bfile),'[^;]+',1,level) IS NOT NULL
ORDER BY level
)
LOOP
dbms_output.put_line(i.q);
END LOOP;
EXCEPTION
WHEN No_Data_Found THEN
NULL;
END;
As result, I got
PL/SQL: ORA-00932: inconsistent datatypes: expected NUMBER got FILE
Can anyone give me a hint, please?
Have to write this as a new answer since this is too big for a comment to #SmartDumb:
Be advised the regex of the form '[^;]+' (commonly used for parsing delimited lists) fails when NULL elements are found in the list. Please see this post for more information: https://stackoverflow.com/a/31464699/2543416
Instead please use this form of the call to regexp_substr (note I removed the second element):
SELECT TRIM(regexp_substr('bbb;;qqq;ccc','(.*?)(;|$)',1,level, null, 1) ) AS q
FROM dual
CONNECT BY regexp_substr('bbb;;qqq;ccc','(.*?)(;|$)',1,level) IS NOT NULL
ORDER BY level
It may or may not be important in this example, it depends on if the order of the element in the string has importance to you or if you need to preserve the NULL. i.e. if you need to know the second element is NULL then this will work.
P.S. Do a search for external tables and see if that is a solution you could use. That would let you query a file as if it were a table.
You could possible try this if your file contains single line (hence the question about file structure):
DECLARE
utlFileHandle UTL_FILE.FILE_TYPE;
vLine varchar2(100);
BEGIN
utlFileHande := UTL_FILE.FOPEN(my_dir, my_file, 'r');
utl_file.get_line(utlFileHande, vLine);
FOR i IN
(SELECT TRIM(regexp_substr(vLine,'[^;]+',1,level) ) AS q
FROM dual
CONNECT BY regexp_substr(vLine,'[^;]+',1,level) IS NOT NULL
ORDER BY level
)
LOOP
dbms_output.put_line(i.q);
END LOOP;
utl_file.fclose(utlFileHande);
EXCEPTION
WHEN No_Data_Found THEN
utl_file.fclose(utlFileHande);
null;
END;

Crystal Reports Else If statement

I can't figure out why this if statement won't work.
I have a DateTime field DATEFROM and a String parameter (it HAS to be String) periodEnd.
I would like to calculate percentages depending if these two dates have 1, 2, 3 or more years difference.
When I use this formula I get either "100%" or "-" and never the other two options. It's like CR calculates the first IF and if it's true then: "100%" but if it's false, it never checks for the rest of the Else Ifs and goes dirreclty to Else
StringVar percentage:="";
If (cDate({?periodEnd})-{TABLE.DATEFROM})<=1 Then
percentage:="100%"
Else If (cDate({?periodEnd})-{TABLE.DATEFROM})<=2 Then
percentage:="66%"
Else If (cDate({?periodEnd})-{TABLE.DATEFROM})<=3 Then
percentage:="33%"
Else
percentage:="-"
Any idea?
a) I assume you already made sure your periodend format matches with what cdate interprets?
If you just subtract them, Crystal by default returns the number of days between.
Two ways you can do year:
datediff("yyyy",cDate({?periodEnd},{TABLE.DATEFROM})
or
year(cDate({?periodEnd})-year({TABLE.DATEFROM})
b) You've also no doubt accounted for when your {TABLE.DATEFROM} is greater than cDate({?periodEnd} and you have a negative result?
Not sure if the following is the behavior you would want, but I'm throwing it in for your information
ABS(datediff("yyyy",cDate({?periodEnd},{TABLE.DATEFROM}))
**make sure you check the help file for the datediff codes ("yyyy" etc) as they are not quite instinctive and can trip you up when you think you're using the obvious one

pl sql How to make my code less

Can anyone help me to make my code less? (If you can notice to both if-elsif statements I make the same Select.. so I wish there was a way to make this select once. and update with 1 or 0 depending on the pilot_action).
Below its my code.
create or replace
PROCEDURE F_16 (TRK_ID NUMBER, pilot_action NUMBER) IS
BEGIN
BEGIN
IF pilot_action=0 THEN
UPDATE "ControlTow"
SET "Intention"=0
WHERE "Id" IN (
SELECT "Id" FROM "ControlTow" WHERE "Id"=TRK_ID );
ELSIF pilot_action=1 THEN
UPDATE "ControlTow"
SET "Intention"=1
WHERE "Id" IN (
SELECT "Id" FROM "ControlTow" WHERE "Id"=TRK_ID );
END IF;
EXCEPTION
WHEN NO_DATA_FOUND THEN dbms_output.put_line('False Alarm');
COMMIT;
END;
END F_16;
thank you , in advance.
Your code has several issues I have addressed in the comments below. Note that transaction management is not discussed as it's not clear based on the question when commit/rollback should take place.
-- #1 use of explicit parameter mode
create or replace procedure f_16(p_trk_id in number, p_pilot_action in number) is
begin
-- #2 use of in
if p_pilot_action in (0, 1)
then
-- #3 unnecessary subquery removed
update controltow
set intention = p_pilot_action
where id = p_trk_id;
-- #4 use pl/sql implicit cursor attribute to check the number of affected rows
if sql%rowcount = 0
then
dbms_output.put_line('false alarm');
end if;
end if;
end;
Since you seem to be assigning pilot_action to Intention, I would do following:
create or replace
PROCEDURE F_16 (TRK_ID NUMBER, pilot_action NUMBER) IS
BEGIN
BEGIN
IF pilot_action IN (0, 1) THEN
-- if the only condition in subselect is the ID then use it directly
UPDATE "ControlTow"
SET "Intention"= pilot_action
WHERE "Id"=TRK_ID;
-- if there are more conditions than just the ID then subselect may be the way to go
--(hard to say without more information)
-- WHERE "Id" IN (
-- SELECT "Id" FROM "ControlTow" WHERE "Id"=TRK_ID AND ... )
ELSE
Null; -- do whatever you need in this case. Raise exception?
END IF;
EXCEPTION
WHEN NO_DATA_FOUND THEN dbms_output.put_line('False Alarm');
COMMIT;
END;
END F_16;
EDIT: As #user272735 said, there was room for more improvement on the code. Specifically rewriting the if condition to use in and simplifying the where clause (supposing Id is really the only condition to select rows to be updated).