display the content of a file split by a delimiter character

I am trying to display the content of a file, split by a delimiter character.
More exactly, starting from this topic, I am trying to display the result as:
bbb
aaa
qqq
ccc
but with the data source taken from a file.
So far, I have tried:
DECLARE
l_bfile bfile;
BEGIN
l_bfile := bfilename(my_dir, my_file);
dbms_lob.fileopen(l_bfile);
FOR i IN
(SELECT TRIM(regexp_substr(TO_CHAR(l_bfile),'[^;]+',1,level) ) AS q
FROM dual
CONNECT BY regexp_substr(TO_CHAR(l_bfile),'[^;]+',1,level) IS NOT NULL
ORDER BY level
)
LOOP
dbms_output.put_line(i.q);
END LOOP;
EXCEPTION
WHEN No_Data_Found THEN
NULL;
END;
As a result, I got:
PL/SQL: ORA-00932: inconsistent datatypes: expected NUMBER got FILE
Can anyone give me a hint, please?

I have to write this as a new answer since it is too big for a comment to @SmartDumb:
Be advised that a regex of the form '[^;]+' (commonly used for parsing delimited lists) fails when NULL elements are present in the list. Please see this post for more information: https://stackoverflow.com/a/31464699/2543416
Instead, please use this form of the call to regexp_substr (note that I removed the second element):
SELECT TRIM(regexp_substr('bbb;;qqq;ccc','(.*?)(;|$)',1,level, null, 1) ) AS q
FROM dual
CONNECT BY regexp_substr('bbb;;qqq;ccc','(.*?)(;|$)',1,level) IS NOT NULL
ORDER BY level
Whether this matters in your example depends on whether the position of each element in the string is important to you, or whether you need to preserve the NULLs. For example, if you need to know that the second element is NULL, this form will work.
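To see the difference concretely, here is a small sketch in Python (not Oracle) that applies both patterns to the same 'bbb;;qqq;ccc' string. Two details are assumptions meant to mimic Oracle. First, the loop stops at the first empty overall match, which imitates the CONNECT BY ... IS NOT NULL condition, because Oracle treats an empty string as NULL. Second, the empty element is returned as None for the same reason.

```python
import re

def split_skipping_nulls(s):
    # Like '[^;]+': each match is a run of non-';' characters,
    # so the empty element between ';;' is never matched at all.
    return [part.strip() for part in re.findall(r"[^;]+", s)]

def split_keeping_nulls(s):
    # Like '(.*?)(;|$)' with subexpression 1: each match is an element
    # followed by either a ';' or the end of the string.
    result = []
    for m in re.finditer(r"(.*?)(;|$)", s):
        if m.group(0) == "":
            # An empty overall match plays the role of Oracle's NULL,
            # which ends the CONNECT BY ... IS NOT NULL recursion.
            break
        element = m.group(1).strip()
        result.append(element if element else None)  # '' is NULL in Oracle
    return result

print(split_skipping_nulls("bbb;;qqq;ccc"))  # ['bbb', 'qqq', 'ccc']
print(split_keeping_nulls("bbb;;qqq;ccc"))   # ['bbb', None, 'qqq', 'ccc']
```

The first result has silently lost the empty second element. The second result keeps it, so the positions of 'qqq' and 'ccc' stay correct.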
P.S. Do a search for external tables and see if that is a solution you could use. That would let you query a file as if it were a table.

You could try this if your file contains a single line (hence my question about the file structure):
DECLARE
utlFileHandle UTL_FILE.FILE_TYPE;
vLine varchar2(4000); -- must be large enough to hold the whole line
BEGIN
utlFileHandle := UTL_FILE.FOPEN(my_dir, my_file, 'r');
utl_file.get_line(utlFileHandle, vLine);
FOR i IN
(SELECT TRIM(regexp_substr(vLine,'[^;]+',1,level) ) AS q
FROM dual
CONNECT BY regexp_substr(vLine,'[^;]+',1,level) IS NOT NULL
ORDER BY level
)
LOOP
dbms_output.put_line(i.q);
END LOOP;
utl_file.fclose(utlFileHandle);
EXCEPTION
WHEN No_Data_Found THEN
-- get_line raises NO_DATA_FOUND when there is no line to read
utl_file.fclose(utlFileHandle);
null;
END;

Related

How to transfer data with multiple conditions with pgsql?

I would like to create a trigger with which I can check the uploaded data and insert it into another table as well. In the initial table (capture) I have 15 columns, but I would like to transfer only 6 of them (ring_number, code, year, date, species, location) to another table (ring).
The ring table is a background table in which I collect the combinations of ring_number and code; more specifically, one ring_number can be paired with only one code. There is one exception: when the code includes "X", it can be changed later. In this case such a code can be paired with more than one ring_number, and if a ring_number was originally paired with a code containing "X", that code can be changed later.
In the capture table, it should be possible to upload the same combination of code and ring_number multiple times, depending on a third column. But the ring_number can still be paired with only one code, except for codes that include "X". The conditional column is named recapture. If recapture (a boolean column) is "true", then the combination of code and ring_number can be uploaded again. If it is "empty" or "no", only new combinations of code and ring_number can be uploaded. If somebody uploads an existing combination, the following error message has to be raised: this combination already exists, please check your data and if it is a recapture, then set the recapture column to yes.
Additionally, ring_number is a NOT NULL column, but code can be empty. Different ring_numbers can be paired with an empty code, which can later be replaced by an actual value.
I have several problems with my code:
1: I would like to define the exception for X with a regex, where the X can be anywhere in the code. But I cannot get the regex right; it just does not work.
2: I wrote a conditional check on the recapture column. For an existing combination it works correctly and tells me to set the recapture column to yes. But if I set the recapture column to yes, I get the same error message.
Could you help to solve these issues?
Here is my code:
Declare
a integer := 0;
b integer := 0;
c integer := 0;
d integer := 0;
Begin
IF new.code <> '' THEN
--Az 'a' means whether the given ring_number already exist in the database with a code, which is not empty
SELECT INTO a COUNT(*) FROM plover_captures PC WHERE PC.ring_number = new.ring_number AND PC.code <> new.code AND PC.code <> '' AND PC.code ~ '[X]{2}[\.]{1}[X]{2}[|]{1}[X]{2}[\.]{1}[X]{2}';
--Az 'b' means the given code already exist in the database with a ring_number
SELECT INTO b COUNT(*) FROM plover_captures PC WHERE PC.ring_number <> new.ring_number AND PC.code = new.code AND PC.code ~ '[X]{2}[\.]{1}[X]{2}[|]{1}[X]{2}[\.]{1}[X]{2}';
--Az 'c' how much times exist the given ring_number with the given code in the database
SELECT INTO c COUNT(*) FROM plover_captures PC WHERE PC.ring_number = new.ring_number AND PC.code = new.code AND PC.code ~ '[X]{2}[\.]{1}[X]{2}[|]{1}[X]{2}[\.]{1}[X]{2}';
--Az 'd' means the given combination already exist in ring table or not
SELECT INTO d COUNT(*) FROM plover_rings PC WHERE PC.ring_number = new.ring_number AND PC.code = new.code;
IF a > 0 THEN
raise exception 'This ring_number is already paired with another code before. %', new.ring_number;
END IF;
IF b > 0 THEN
raise exception 'This code is already paired with another ring_number before. %', new.code;
END IF;
IF c > 0 AND (new.rettrap IS null OR new.rettrap IS false) THEN
raise exception 'This ring_number and code pair is already in the database. So it is a rettrap but the rettrap attribute set to false or null. %, %, %', new.ring_number, new.code, new.rettrap;
END IF;
IF c = 0 AND new.rettrap IS true THEN
raise exception 'The rettrap attribute set to true but this ring_number and code pair is not in this database yet. %, %, %', new.ring_number, new.code, new.rettrap;
END IF;
IF c = 0 AND d = 0 THEN
Insert into plover_rings values(new.ring_number,new.code,new.species,new.location,new.year, new.date);
END IF;
END IF;
Return new;
End
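For point 1, it may help to test the patterns outside the database first. The Python sketch below is illustrative only, and the code values in it are made up. It shows that the pattern used in the trigger matches only the literal text XX.XX|XX.XX, whereas an unanchored search for a single X matches an X at any position. PostgreSQL's ~ operator is also unanchored. As an assumption you should verify in your own database, PC.code ~ 'X' should therefore express "contains an X anywhere".

```python
import re

# Pattern copied from the trigger above: it requires the literal
# substring "XX.XX|XX.XX", not "an X somewhere in the code".
ORIGINAL_PATTERN = r"[X]{2}[\.]{1}[X]{2}[|]{1}[X]{2}[\.]{1}[X]{2}"

# An unanchored search for one "X" succeeds wherever the X appears.
ANY_X_PATTERN = r"X"

def has_x(code, pattern=ANY_X_PATTERN):
    """True if `code` is not None and contains a match for `pattern` anywhere."""
    return code is not None and re.search(pattern, code) is not None

print(has_x("AB.XC"))                      # True
print(has_x("AB.XC", ORIGINAL_PATTERN))    # False
```

The search is case-sensitive, as ~ is. PostgreSQL's case-insensitive variant is ~*.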

Can you reference an aggregate function on a temporary row in an insert statement within a stored procedure in postgresql?

I am writing a postgres stored procedure that loops through the rows returned from a select statement. For each row it loops through, it inserts values from that select statement into a new table. One of the values I need to insert into the second table is the average of a column. However, when I call the stored procedure, I get an error that the temporary row has no attribute for the actual column that I am averaging. See stored procedure and error below.
Stored Procedure:
create or replace procedure sendToDataset(sentence int)
as $$
declare temprow peoplereviews%rowtype;
BEGIN
FOR temprow IN
select rulereviewid, avg(rulereview)
from peoplereviews
where sentenceid = sentence
group by rulereviewid
loop
insert into TrainingDataSet(sentenceId, sentence, ruleCorrectId, ruleCorrect, dateAdded)
values(sentence, getSentenceFromID(sentence), tempRow.rulereviewid, tempRow.avg(rulereview), current_timestamp);
END LOOP;
END
$$
LANGUAGE plpgsql;
Error:
ERROR: column "rulereview" does not exist
LINE 2: ...omID(sentence), tempRow.rulereviewid, tempRow.avg(rulereview...
^
QUERY: insert into TrainingDataSet(sentenceId, sentence, ruleCorrectId, ruleCorrect, dateAdded)
values(sentence, getSentenceFromID(sentence), tempRow.rulereviewid, tempRow.avg(rulereview), current_timestamp)
CONTEXT: PL/pgSQL function sendtodataset(integer) line 11 at SQL statement
SQL state: 42703
Basically, I am wondering whether it is possible to use that aggregate function in the insert statement and, if not, whether there is another way around it.

You don't need a slow and inefficient loop for this:
insert into TrainingDataSet(sentenceId, sentence, ruleCorrectId, ruleCorrect, dateAdded)
select sentence, getSentenceFromID(sentence), rulereviewid, avg(rulereview), current_timestamp
from peoplereviews
where sentenceid = sentence
group by rulereviewid;
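The set-based INSERT ... SELECT is plain SQL, so its effect can be tried in a throwaway database. This sketch uses Python's built-in sqlite3 module rather than PostgreSQL. The tables and sample rows are made up, and a hypothetical sentences table joined in the query stands in for getSentenceFromID.

```python
import sqlite3

def build_training_set(conn, sentence_id):
    """Insert one averaged row per rulereviewid for one sentence, in a single statement."""
    conn.execute(
        """
        INSERT INTO TrainingDataSet (sentenceId, sentence, ruleCorrectId, ruleCorrect, dateAdded)
        SELECT pr.sentenceid, s.text, pr.rulereviewid, AVG(pr.rulereview), CURRENT_TIMESTAMP
        FROM peoplereviews pr
        JOIN sentences s ON s.id = pr.sentenceid   -- hypothetical stand-in for getSentenceFromID
        WHERE pr.sentenceid = ?
        GROUP BY pr.sentenceid, s.text, pr.rulereviewid
        """,
        (sentence_id,),
    )

def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sentences (id INTEGER PRIMARY KEY, text TEXT);
        CREATE TABLE peoplereviews (sentenceid INTEGER, rulereviewid INTEGER, rulereview REAL);
        CREATE TABLE TrainingDataSet (sentenceId INTEGER, sentence TEXT,
                                      ruleCorrectId INTEGER, ruleCorrect REAL, dateAdded TEXT);
        INSERT INTO sentences VALUES (1, 'first sentence'), (2, 'second sentence');
        INSERT INTO peoplereviews VALUES (1, 10, 1.0), (1, 10, 0.0), (1, 20, 1.0), (2, 10, 0.0);
        """
    )
    build_training_set(conn, 1)
    return conn.execute(
        "SELECT sentenceId, sentence, ruleCorrectId, ruleCorrect "
        "FROM TrainingDataSet ORDER BY ruleCorrectId"
    ).fetchall()

print(demo())  # [(1, 'first sentence', 10, 0.5), (1, 'first sentence', 20, 1.0)]
```

The database computes the averages and does the inserts in one round trip, with no per-row loop in procedural code.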
To answer the original question: you need to give the aggregate a proper alias. Also declare temprow as a plain record instead of peoplereviews%rowtype. A row-type variable only has the table's own columns as fields, so it would have no avg_views field:
-- in the DECLARE section: temprow record;
FOR temprow IN
select rulereviewid, avg(rulereview) as avg_views
from peoplereviews
where sentenceid = sentence
group by rulereviewid
loop
insert into TrainingDataSet(sentenceId, sentence, ruleCorrectId, ruleCorrect, dateAdded)
values(sentence, getSentenceFromID(sentence), temprow.rulereviewid,
temprow.avg_views, current_timestamp);
END LOOP;

PL/SQL optimize searching a date in varchar

I have a table that contains a date field (let's call it s_date, of type date) and a description field (desc, of type varchar2(n)). I need to write a script (or a single query, if possible) that parses the desc field. If desc contains a valid Oracle date, the script should extract that date and use it to update s_date when s_date is null.
There is one more condition: there must be exactly one occurrence of a date in desc. If there are 0 or more than 1, nothing should be updated.
So far I have come up with this rather ugly solution using regular expressions:
----------------------------------------------
create or replace function to_date_single( p_date_str in varchar2 )
return date
is
l_date date;
pRegEx varchar(150);
pResStr varchar(150);
begin
pRegEx := '((0[1-9]|[12][0-9]|3[01])[.](0[1-9]|1[012])[.](19|20)\d\d)((.|\n|\t|\s)*((0[1-9]|[12][0-9]|3[01])[.](0[1-9]|1[012])[.](19|20)\d\d))?';
pResStr := regexp_substr(p_date_str, pRegEx);
if not (length(pResStr) = 10)
then return null;
end if;
l_date := to_date(pResStr, 'dd.mm.yyyy');
return l_date;
exception
when others then return null;
end to_date_single;
----------------------------------------------
update myTable t
set t.s_date = to_date_single(t.desc)
where t.s_date is null;
----------------------------------------------
But it runs extremely slowly (more than a second per record, and I need to update about 30000 records). Is it possible to optimize the function somehow? Maybe there is a way to do this without regexp? Any other ideas?
Any advice is appreciated :)
EDIT:
OK, maybe this will be useful for someone. The following regular expression checks for a valid date (DD.MM.YYYY), taking into account the number of days in each month, including the leap-year check:
(((0[1-9]|[12]\d|3[01])\.(0[13578]|1[02])\.((19|[2-9]\d)\d{2}))|((0[1-9]|[12]\d|30)\.(0[13456789]|1[012])\.((19|[2-9]\d)\d{2}))|((0[1-9]|1\d|2[0-8])\.02\.((19|[2-9]\d)\d{2}))|(29\.02\.((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))))
I used it with the query suggested by @David (see the accepted answer). Just for "benchmarking" purposes, I tried a select instead of an update, which means one regexp fewer per row because we don't call regexp_substr.
The numbers probably won't tell much here, because it all depends on the hardware, software and specific DB design, but it took about 2 minutes to select 36K records for me. The update will be slower, but I think it will still take a reasonable time.
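The validating expression can also be exercised outside Oracle. This Python sketch copies it verbatim, split across string literals, and checks it against a few dates. Python's re and Oracle's regex engine are different engines, so treat this only as a check of the pattern's logic. fullmatch anchors the test to the whole string; inside longer text you would search for the pattern instead.

```python
import re

# The DD.MM.YYYY expression from the edit above, unchanged, split for readability.
VALID_DATE = re.compile(
    r"(((0[1-9]|[12]\d|3[01])\.(0[13578]|1[02])\.((19|[2-9]\d)\d{2}))"
    r"|((0[1-9]|[12]\d|30)\.(0[13456789]|1[012])\.((19|[2-9]\d)\d{2}))"
    r"|((0[1-9]|1\d|2[0-8])\.02\.((19|[2-9]\d)\d{2}))"
    r"|(29\.02\.((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))))"
)

def is_valid_date(text):
    """True if the whole string is a valid DD.MM.YYYY date according to the pattern."""
    return VALID_DATE.fullmatch(text) is not None

for d in ["31.12.2013", "31.04.2013", "29.02.2012", "29.02.2013", "29.02.2000", "29.02.1900"]:
    print(d, is_valid_date(d))
```

2000 passes and 1900 fails the 29 February test because of the "divisible by 400" century branch at the end of the pattern.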

I would refactor it along the lines of a single update query.
Use two regexp_instr() calls in the where clause to find rows for which a first occurrence of the match occurs and a second occurrence does not, and regexp_substr() to pull the matching characters for the update.
update my_table
set my_date = to_date(regexp_substr(desc,...),...)
where regexp_instr(desc,pattern,1,1) > 0 and
regexp_instr(desc,pattern,1,2) = 0
You might get even better performance with:
update my_table
set my_date = to_date(regexp_substr(desc,...),...)
where case regexp_instr(desc,pattern,1,1)
when 0 then 'N'
else case regexp_instr(desc,pattern,1,2)
when 0 then 'Y'
else 'N'
end
end = 'Y'
... as it only evaluates the second regexp if the first is non-zero. The first query might also do that but the optimiser might choose to evaluate the second predicate first because it is an equality condition, under the assumption that it's more selective.
Or reordering the Case expression might be better -- it's a trade-off that's difficult to judge and probably very dependent on the data.
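The "first occurrence exists, second does not" test can be sketched in Python as follows. single_date is a hypothetical helper that uses the question's simpler date pattern; it is not Oracle code. The second search starts only after the first match is found, which mirrors the short-circuit the CASE version aims for. Like to_date, strptime still raises an error (ValueError) for an impossible date such as 31.02.2013. That is the weakness discussed in the next answer.

```python
import re
from datetime import datetime

# The question's simpler DD.MM.YYYY pattern (no per-month validation).
DATE_PATTERN = re.compile(r"(0[1-9]|[12][0-9]|3[01])[.](0[1-9]|1[012])[.](19|20)\d\d")

def single_date(desc):
    """Return the date if desc contains exactly one match, otherwise None."""
    first = DATE_PATTERN.search(desc)
    if first is None:                          # like regexp_instr(desc, pattern, 1, 1) = 0
        return None
    if DATE_PATTERN.search(desc, first.end()):  # like regexp_instr(desc, pattern, 1, 2) > 0
        return None
    return datetime.strptime(first.group(0), "%d.%m.%Y").date()

print(single_date("paid on 15.03.2013, thanks"))  # 2013-03-15
print(single_date("01.01.2013 and 02.01.2013"))   # None
```
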

I don't think there is a way to speed this task up. In fact, to achieve exactly what you want, it will have to get even slower.
Your regular expression matches text like 31.02.2013 or 31.04.2013, where the day is outside the range of the month. If you bring the year into play, it gets even worse: 29.02.2012 is valid, but 29.02.2013 is not.
That's why you have to test whether the result is a valid date.
Since your regular expression doesn't check for that, you would really have to do it in PL/SQL.
In your to_date_single function you return null when an invalid date is found.
But that doesn't mean there won't be other valid dates further on in the text.
So you have to keep trying until you either find two valid dates or hit the end of the text:
create or replace function fn_to_date(p_date_str in varchar2) return date is
l_date date;
pRegEx varchar(150);
pResStr varchar(150);
vn_findings number;
vn_loop number;
begin
vn_findings := 0;
vn_loop := 1;
pRegEx := '((0[1-9]|[12][0-9]|3[01])[.](0[1-9]|1[012])[.](19|20)\d\d)';
loop
pResStr := regexp_substr(p_date_str, pRegEx, 1, vn_loop);
if pResStr is null then exit; end if;
begin
l_date := to_date(pResStr, 'dd.mm.yyyy');
vn_findings := vn_findings + 1;
-- your crazy requirement :)
if vn_findings = 2 then
return null;
end if;
exception when others then
null;
end;
-- you have to keep trying :)
vn_loop := vn_loop + 1;
end loop;
return l_date;
end;
Some tests:
select fn_to_date('xxxx29.02.2012xxxxx') c1 --ok
, fn_to_date('xxxx29.02.2012xxx29.02.2013xxx') c2 --ok, 2nd is invalid
, fn_to_date('xxxx29.02.2012xxx29.02.2016xxx') c3 --null, both are valid
from dual
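A rough Python port of fn_to_date may make the control flow easier to follow. It is a sketch, not a drop-in replacement: datetime.strptime stands in for to_date. Its ValueError for impossible dates such as 29.02.2013 plays the role of the exception that the PL/SQL block swallows. The port reproduces the three test cases above.

```python
import re
from datetime import datetime

DATE_PATTERN = re.compile(r"(0[1-9]|[12][0-9]|3[01])[.](0[1-9]|1[012])[.](19|20)\d\d")

def fn_to_date(text):
    """Return the only valid date in text; None if there are zero or two or more valid dates."""
    found = None
    findings = 0
    for m in DATE_PATTERN.finditer(text):  # like regexp_substr(..., 1, vn_loop)
        try:
            candidate = datetime.strptime(m.group(0), "%d.%m.%Y").date()
        except ValueError:                 # invalid date: keep looking
            continue
        findings += 1
        if findings == 2:                  # a second valid date disqualifies the text
            return None
        found = candidate
    return found

print(fn_to_date("xxxx29.02.2012xxxxx"))                # 2012-02-29
print(fn_to_date("xxxx29.02.2012xxx29.02.2013xxx"))     # 2012-02-29
print(fn_to_date("xxxx29.02.2012xxx29.02.2016xxx"))     # None
```
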
As you are going to have to use trial and error anyway, one idea would be to use a simpler regular expression.
Something like \d\d[.]\d\d[.]\d\d\d\d would suffice. That would depend on your data, of course.
Using @David's idea, you could reduce the number of rows you apply your to_date_single function to (because it is slow),
but regular expressions alone won't do what you want:
update my_table
set my_date = fn_to_date(desc)
where regexp_instr(desc,pattern,1,1) > 0

Oracle (pl-sql) - how to extract particular data from text into object type variable?

I have text provided as a varchar2 variable, for example:
EXER/ATP-45/-//
MSGID/BIOCHEM3/-/-/-/-/-//
GEODATUM/UTM//
PAPAA/1KM/-//15KM/-//
The line separator is // (but there can also be spaces, newlines, etc., which should be ignored). '-' indicates a blank field and should be ignored. I have also defined a new type as follows:
TYPE t_promien IS RECORD(
EXER VARCHAR2(1000),
MSGID VARCHAR2(1000),
PAPAA t_papaa
......
)
I need to extract data from the corresponding lines into a new variable of type t_promien and set its fields. For example, EXER should have the value 'ATP-45', MSGID should have 'BIOCHEM3', and PAPAA should have the value ('1KM','15KM'). t_papaa is my custom type too, and it contains 2 varchar fields.
What is the best way to do this inside an Oracle PL/SQL procedure? I need to extract the needed data into an out parameter. Can I use regex for this (and how)? Unfortunately, I'm a total newbie with Oracle, so...
Can you give me some tips? Thanks.

You can do this with REGEXP_SUBSTR using something like this:
SELECT REGEXP_SUBSTR('EXER/ATP-45/-//
MSGID/BIOCHEM3/-/-/-/-/-//
GEODATUM/UTM//
PAPAA/1KM/-//15KM/-//', 'EXER/[^/]+/', 1, 1) AS EXER
FROM DUAL;
The important bit above is 'EXER/[^/]+/', which looks for a string that starts with the literal EXER/, followed by a sequence of characters that are not / and ended by a final /.
The above query will return EXER/ATP-45/, but you can use standard string functions like SUBSTR, LTRIM or RTRIM to remove the bits you don't need.
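A capturing group can return just the field value, which avoids the trimming step. This is a quick prototype of that pattern in Python rather than PL/SQL; first_field is a hypothetical name. In Oracle, the counterpart would be the subexpression argument of REGEXP_SUBSTR, as used with regexp_substr earlier on this page.

```python
import re

def first_field(msg, name):
    """Return the first field after 'NAME/', like REGEXP_SUBSTR(msg, 'NAME/[^/]+/') without the slashes."""
    m = re.search(re.escape(name) + r"/([^/]+)/", msg)
    return m.group(1) if m else None

MSG = "EXER/ATP-45/-//\nMSGID/BIOCHEM3/-/-/-/-/-//\nGEODATUM/UTM//\nPAPAA/1KM/-//15KM/-//"
print(first_field(MSG, "EXER"))   # ATP-45
print(first_field(MSG, "MSGID"))  # BIOCHEM3
```

Fields with several values, like PAPAA, still need their own pattern or a loop over occurrences.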

A simple demonstration of the use of REGEXP_SUBSTR in PL/SQL.
CREATE OR REPLACE PROCEDURE TEST_REGEXP_PROC(VAR_PI_MSG IN VARCHAR2,
T_PO_PAPAA OUT T_PROMIEN) AS
VAR_L_EXER VARCHAR2(1000);
VAR_L_MSGID VARCHAR2(1000);
BEGIN
SELECT SUBSTR(REPLACE(REGEXP_SUBSTR(VAR_PI_MSG, 'EXER/[^/]+/', 1, 1),'/'),5)
INTO VAR_L_EXER
FROM DUAL;
T_PO_PAPAA.EXER := VAR_L_EXER;
SELECT SUBSTR(REPLACE(REGEXP_SUBSTR(VAR_PI_MSG, 'MSGID/[^/]+/', 1, 1),'/'),6)
INTO VAR_L_MSGID
FROM DUAL;
T_PO_PAPAA.MSGID := VAR_L_MSGID;
END;
Hope this will get you started.

Delphi - User specified string manipulation

I have a problem in Delphi 7. My application creates mpg video files according to a fixed naming convention, i.e.:
\000_A_Title_YYYY-MM-DD_HH-mm-ss_Index.mpg
In this filename the following rules are enforced:
The 000 is the video sequence. It is incremented whenever the user presses stop.
The A (or B,C,D) specifies the recording camera - so video files are linked with up to four video streams all played simultaneously.
Title is a variable length string. In my application it cannot contain a _.
The YYYY-MM-DD_HH-mm-ss is the starting time of the video sequence (not the single file)
The Index is the zero-based ordering index and is incremented within one video sequence. Video files are at most 15 minutes long; once this limit is reached, a new video file is started with the same sequence number but the next index. Using this, we can calculate the actual start time of the file (decoded filename time + 15 minutes * Index).
Using this method my application can extract the starting time that the video file started recording.
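The arithmetic for the fixed convention is easy to sketch. This Python version (not Delphi) assumes three things: names follow 000_A_Title_YYYY-MM-DD_HH-mm-ss_Index.mpg exactly, Title contains no underscore, and files are 15 minutes long as stated above. file_start_time is a hypothetical name.

```python
from datetime import datetime, timedelta

FILE_MINUTES = 15  # maximum length of one file, per the naming rules

def file_start_time(filename):
    """Start time of one file named [\\]000_A_Title_YYYY-MM-DD_HH-mm-ss_Index.mpg."""
    stem = filename.rsplit("\\", 1)[-1]    # drop a leading "\" or directory part
    if stem.lower().endswith(".mpg"):
        stem = stem[:-4]
    # Title never contains "_", so the name splits into exactly six fields.
    _seq, _camera, _title, day, clock, index = stem.split("_")
    sequence_start = datetime.strptime(day + "_" + clock, "%Y-%m-%d_%H-%M-%S")
    return sequence_start + timedelta(minutes=FILE_MINUTES * int(index))

print(file_start_time("\\003_B_Harbour_2013-05-10_23-50-00_2.mpg"))  # 2013-05-11 00:20:00
```
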
Now we have a further requirement to handle arbitrarily named video files. The only thing I know for certain is that there will be a YYYY-MM-DD HH-mm-ss somewhere in the filename.
How can I allow the user to specify the filename convention for the files they are importing? Something like regular expressions? I understand there must be a pattern to the naming scheme.
So if the user enters ?_(Camera)_*_YYYY-MM-DD_HH-mm-ss_(Index).mpg into a text box, how would I go about getting the start time? Is there a better solution? Or do I just have to handle every single possibility as we come across them?
(I know this is probably not the best way to handle such a problem, but we cannot change it: the new video files are recorded by another company.)

I'm not sure if you're trying to parse the user input '?_(Camera)_*_YYYY-MM-DD_HH-mm-ss_(Index).mpg' into components. If you're just trying to grab the date and time, something like this should work; the date is in group 1 and the time in group 2:
(\d{4}-\d{2}-\d{2})_(\d{2}-\d{2}-\d{2})
Otherwise, I'm not sure what you're trying to do.
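Here is a short Python sketch of that regex in use, with "\d" restored in the time group. It assumes the date and time are separated by "_" as in the pattern; a name that uses a space instead would need [_ ] in that position. stamp_from_filename is a hypothetical name.

```python
import re
from datetime import datetime

# Date in group 1, time in group 2.
STAMP_RE = re.compile(r"(\d{4}-\d{2}-\d{2})_(\d{2}-\d{2}-\d{2})")

def stamp_from_filename(filename):
    """Return the first YYYY-MM-DD_HH-mm-ss found anywhere in the name, or None."""
    m = STAMP_RE.search(filename)
    if m is None:
        return None
    return datetime.strptime(m.group(1) + " " + m.group(2), "%Y-%m-%d %H-%M-%S")

print(stamp_from_filename("Cam2 export 2012-11-05_14-30-00.mpg"))  # 2012-11-05 14:30:00
```
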

Possibly you can use the underscores "_" as your positional indicator since you smartly don't allow them in the title.
In your example of a filename convention:
?_(Camera)_*_YYYY-MM-DD_HH-mm-ss_(Index).mpg
you can parse this user-specified string to see that the date YYYY-MM-DD is always between the 3rd and 4th underscore and the time HH-mm-ss is between the 4th and 5th.
Then, for actual filenames following this convention, it becomes a simple matter of finding the 3rd underscore, knowing that the date and time follow it.
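The positional idea can be sketched in a few lines of Python (not Delphi). It assumes every "_"-separated token of the user's template maps to exactly one "_"-separated field of the real filename, which holds only while titles contain no underscores. stamp_positions and start_time are hypothetical names.

```python
from datetime import datetime

def stamp_positions(template):
    """Indexes of the '_'-separated fields holding YYYY-MM-DD and HH-mm-ss (ValueError if absent)."""
    fields = template.split("_")
    return fields.index("YYYY-MM-DD"), fields.index("HH-mm-ss")

def start_time(template, filename):
    """Apply the positions found in the template to an actual filename."""
    date_pos, time_pos = stamp_positions(template)
    fields = filename.split("_")
    return datetime.strptime(fields[date_pos] + " " + fields[time_pos], "%Y-%m-%d %H-%M-%S")

template = "?_(Camera)_*_YYYY-MM-DD_HH-mm-ss_(Index).mpg"
print(stamp_positions(template))  # (3, 4): between the 3rd/4th and 4th/5th underscores
print(start_time(template, "000_A_Title_2012-11-05_14-30-00_0.mpg"))  # 2012-11-05 14:30:00
```
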

If you want phone calls 24/7, then you should go for the RegEx thing and let the user freely enter some cryptography in a TEdit.
If you want happy users and a good night's sleep, then be creative and drop the boring RegEx approach. Create your own filename decoder using an Angry Birds approach.
Here's the idea:
Create some birds with different string manipulation personalities.
Let the user select and arrange these birds.
Execute the user generated string manipulation.
Sample code:
program AngryBirdFilenameDecoder;
{$APPTYPE CONSOLE}
uses
SysUtils;
procedure PerformEatUntilDash(var aStr: String);
begin
if Pos('-', aStr) > 0 then
Delete(aStr, 1, Pos('-', aStr));
WriteLn(':-{ > ' + aStr);
end;
procedure PerformEatUntilUnderscore(var aStr: String);
begin
if Pos('_', aStr) > 0 then
Delete(aStr, 1, Pos('_', aStr));
WriteLn(':-/ > ' + aStr);
end;
function FetchDate(var aStr: String): String;
begin
Result := Copy(aStr, 1, 10);
Delete(aStr, 1, 10);
WriteLn(':-) > ' + aStr);
end;
var
i: Integer;
FileName: String;
TempFileName: String;
SelectedBirds: String;
MyDate: String;
begin
Write('Enter a filename to decode (eg. ''01-ThisIsAText-Img_01-Date_2011-03-08.png''): ');
ReadLn(FileName);
if FileName = '' then
FileName := '01-ThisIsAText-Img_01-Date_2011-03-08.png';
repeat
TempFileName := FileName;
WriteLn('Now, select some birds:');
WriteLn('Bird No.1 :-{ ==> I''ll eat letters until I find a dash (-)');
WriteLn('Bird No.2 :-/ ==> I''ll eat letters until I find an underscore (_)');
WriteLn('Bird No.3 :-) ==> I''ll remember the date before I eat it');
WriteLn;
Write('Choose your birds: (eg. 112123):');
ReadLn(SelectedBirds);
if SelectedBirds = '' then
SelectedBirds := '112123';
for i := 1 to Length(SelectedBirds) do
case SelectedBirds[i] of
'1': PerformEatUntilDash(TempFileName);
'2': PerformEatUntilUnderscore(TempFileName);
'3': MyDate := FetchDate(TempFileName);
end;
WriteLn('Bird No.3 found this date: ' + MyDate);
WriteLn;
WriteLn;
Write('Check filename with some other birds? (Y/N): ');
ReadLn(SelectedBirds);
until (Length(SelectedBirds)=0) or (Uppercase(SelectedBirds[1])<>'Y');
end.
When you do this in Delphi with a GUI, you'll of course add more birds and more checks, and find some nice bird glyphs.
Use two list boxes: one on the left with all possible birds, and one on the right with all the selected birds. Drag'n'drop birds from left to right, and rearrange (and remove) birds in the list on the right.
The user should be able to test the setup by entering a filename and seeing the result of the process. Internally, you store the script using enumerations, etc.
