Oracle regular expression catching specific character - regex

Here is what I am trying to do, for better understanding. I just want to find a way to get the substring and put it into a variable.
DECLARE
v_file_type thufitab.file_type%TYPE;
v_filename thufitab.filename%TYPE;
v_status thufitab.status%TYPE;
V_seq_FILENAME NUMBER (4);
CURSOR List_FILENAME_cur
IS
SELECT FILENAME
FROM thufitab
WHERE status = 2 AND ROWNUM <= 100;
BEGIN
FOR List_FILENAME_rec IN List_FILENAME_cur
LOOP
-- Use the current cursor row; SELECT ... INTO over the whole table raises TOO_MANY_ROWS
V_seq_FILENAME := REGEXP_SUBSTR (List_FILENAME_rec.FILENAME, '([1-9][0-9]{0,3})');
DBMS_OUTPUT.PUT_LINE (V_seq_FILENAME);
END LOOP;
END;

I'm not sure I fully understand, but would this pattern work for you?
'CDR-([1-9][0-9]{0,3})_[0-9]{2}_[0-9]{2}_[0-9]{2}_[0-9]{4}_UK1\.FCDR'
     ^_______________^
          group 1
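To illustrate how group 1 isolates the sequence number, here is a minimal Python sketch using the same pattern. The function name `extract_sequence` and the sample file name are invented for illustration; they do not come from the question. In Oracle itself (11g and later), the equivalent would presumably be the sixth `subexpr` argument of REGEXP_SUBSTR, e.g. REGEXP_SUBSTR(filename, pattern, 1, 1, NULL, 1).

```python
import re
from typing import Optional

# Pattern from the answer above; group 1 is the 1-4 digit sequence number.
FILENAME_RE = re.compile(
    r"CDR-([1-9][0-9]{0,3})_[0-9]{2}_[0-9]{2}_[0-9]{2}_[0-9]{4}_UK1\.FCDR"
)


def extract_sequence(filename: str) -> Optional[int]:
    """Return the number captured by group 1, or None if the name does not match."""
    match = FILENAME_RE.search(filename)
    if match is None:
        return None
    return int(match.group(1))


if __name__ == "__main__":
    # Hypothetical file name, made up for this example.
    print(extract_sequence("CDR-123_01_02_03_2015_UK1.FCDR"))
```

Anchoring on the surrounding literal text (CDR-, the date-like fields, UK1.FCDR) is what keeps the capture from picking up digits elsewhere in the name.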

Related

Using PRXNEXT to capture all instances of a keyword

I'm searching through medical notes to capture all instances of a phrase, in particular 'carbapenemase producing'. At times this phrasing can occur > 1 time in a string. From some research I think PRXNEXT would make the most sense but I'm having difficulty getting it to do what I want to. As an example for this string:
if amikacin results are needed please notify microbiology lab at ext
for further testing the organism will be held until meropenem result
obtained by disc diffusion presumptive carbapenemase producing cre see
spmi for carba r pcr results not confirmed carbapenemase producing cre
From this comment above, I'd like to extract the phrases
presumptive carbapenemase producing
and
not confirmed carbapenemase producing
I realize I probably can't extract those exact phrases, only some variation of them using a substring. I found the code I've been using here. This is what I have so far, but it only captures the first phrase:
carba_cnt = count(as_comments,'carba','i');
if _n_ = 1 then do;
retain reg1 neg1;
reg1 = prxparse("/ca[bepr]\w+ prod/");
end;
start = 1;
stop = length(as_comments);
position = 0;
length = 0;
/* Use PRXNEXT to find the first instance of the pattern, */
/* then use DO WHILE to find all further instances. */
/* PRXNEXT changes the start parameter so that searching */
/* begins again after the last match. */
call prxnext(reg1, start, stop, as_comments, position, length);
lastpos = 0;
do while (position > 0);
if lastpos then do;
length found $200;
found = substr(as_comments,lastpos,position-lastpos);
put found=;
output;
end;
lastpos = position;
call prxnext(reg1, start, stop, as_comments, position, length);
end;
if lastpos then do;
found = substr(as_comments,lastpos);
put found=;
output;
end;
You are correct to use PRXNEXT for locating each occurrence of a regex match in a source string. The regex pattern can be modified to use a capture group that searches for an optional leading "not confirmed". The approach least prone to coding errors is to build the loop and the extraction around a single call to PRXNEXT.
This example uses the pattern /((not confirmed\s*)?(ca[bepr]\w+ prod))/ and outputs one row per match.
data have;
id + 1;
length comment $2000;
infile datalines eof=done;
do until (_infile_ = '----');
input;
if _infile_ ne '----' then
comment = catx(' ',comment,_infile_);
end;
done:
if not missing(comment);
datalines4;
if amikacin results are needed please notify microbiology lab at ext
for further testing the organism will be held until meropenem result
obtained by disc diffusion presumptive carbapenemase producing cre
see spmi for carba r pcr results not confirmed carbapenemase producing cre
----
if amikacin results are needed please notify microbiology lab at ext
for further testing the organism will be held until meropenem result
obtained by disc diffusion conjectured carbapenems producing cre
see spmi for carba r pcr results not confirmed carbapenemase producing cre
----
;;;;
run;
data want;
set have;
prx = prxparse('/((not confirmed\s*)?(ca[bepr]\w+ prod))/');
_start_inout = 1;
do hitnum = 1 by 1 until (pos=0);
call prxnext (prx, _start_inout, length(comment), comment, pos, len);
if len then do;
content = substr(comment,pos,len);
output;
end;
end;
keep id hitnum content;
run;
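For readers who want to check the pattern outside SAS, the same idea (one loop around a single "find next match" call) can be sketched in Python. This is only an illustration under assumptions: `find_phrases` is a hypothetical helper name, and `re.finditer` stands in for the PRXNEXT loop.

```python
import re
from typing import List

# Same pattern as in the SAS example: an optional "not confirmed" prefix
# followed by a word starting with "ca" + one of b/e/p/r, then " prod".
PHRASE_RE = re.compile(r"((not confirmed\s*)?(ca[bepr]\w+ prod))")


def find_phrases(comment: str) -> List[str]:
    """Return every match of the pattern, in order of appearance."""
    return [m.group(1) for m in PHRASE_RE.finditer(comment)]


if __name__ == "__main__":
    note = (
        "obtained by disc diffusion presumptive carbapenemase producing cre "
        "see spmi for carba r pcr results not confirmed carbapenemase producing cre"
    )
    for hit_number, phrase in enumerate(find_phrases(note), start=1):
        print(hit_number, phrase)
```

Note that "presumptive" is not part of the pattern, so the first hit contains only the keyword itself; extending the optional group would be needed to capture that qualifier as well.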
Bonus info: the PRXPARSE call does not need to be inside an IF _N_ = 1 block. From the PRXPARSE documentation:
If perl-regular-expression is a constant or if it uses the /o option, the Perl regular expression is compiled only once. Successive calls to PRXPARSE do not cause a recompile, but returns the regular-expression-id for the regular expression that was already compiled. This behavior simplifies the code because you do not need to use an initialization block (IF _N_ = 1) to initialize Perl regular expressions.

PostgreSQL return an Array or Record as a Row

I'm trying to return a variable with a PostgreSQL function that returns row/rows so I can use libpqxx on the client side to iterate over it for example using:
for (pqxx::result::const_iterator row = result.begin(); row != result.end(); row++)
{
for (pqxx::const_row_iterator field = row.begin(); field != row.end(); field++)
{
cout << field << '\n';
}
}
This is my PostgreSQL function:
CREATE OR REPLACE FUNCTION seal_diff_benchmark_pgsql(sealparams CHARACTER VARYING) RETURNS RECORD AS $outputVar$
DECLARE
tempVar1 CHARACTER VARYING;
tempVar2 CHARACTER VARYING;
outputVar1 TEXT[];
outputVar record;
sealArray TEXT[];
execTime NUMERIC[];
BEGIN
FOR i IN 1..2 LOOP
SELECT "Pickup_longitude", "Dropoff_longitude" INTO tempVar1, tempVar2 FROM public.nyc2015_09_enc WHERE id=i;
sealArray := (SELECT public.seal_diff_benchmark(tempVar1, tempVar2, sealparams));
outputVar1[i] := sealArray[1];
execTime[i] := sealArray[2];
END LOOP;
SELECT UNNEST(outputVar1) INTO outputVAR;
RETURN outputVar;
END;
$outputVar$ LANGUAGE plpgsql;
I also tried returning outputVar1 as TEXT[]. On the client side, my field variable holds {foo, bar} if the function returns TEXT[], or (foo) if it returns RECORD. But this is not what I need: I want a row-like result from the TEXT[] array or the RECORD variable, without any (), [], or {} characters at the beginning and end of the output.
How can I change my PostgreSQL function to make it work? I think I'm missing something but I can't see what.
There are many approaches to do what you want.
If it really is just one column that you want, then you can simply do:
CREATE OR REPLACE FUNCTION seal_diff_benchmark_pgsql(sealparams CHARACTER VARYING)
RETURNS SETOF TEXT AS $outputVar$
DECLARE
tempVar1 CHARACTER VARYING;
tempVar2 CHARACTER VARYING;
sealArray TEXT[];
execTime NUMERIC[];
outputVar text;
BEGIN
FOR i IN 1..2 LOOP
SELECT "Pickup_longitude", "Dropoff_longitude" INTO tempVar1, tempVar2
FROM public.nyc2015_09_enc WHERE id=i;
sealArray := (SELECT public.seal_diff_benchmark(tempVar1, tempVar2, sealparams));
execTime[i] := sealArray[2];
RETURN NEXT sealArray[1]; -- sealArray[1] is a single text value (not an array), so emit it directly as one row
END LOOP;
END;
$outputVar$ LANGUAGE plpgsql;
The returned column will be named after the function.
SELECT seal_diff_benchmark_pgsql FROM seal_diff_benchmark_pgsql('stuff');
-- alternative
SELECT seal_diff_benchmark_pgsql('stuff');
You can also specify columns in function parameters:
CREATE OR REPLACE FUNCTION seal_diff_benchmark_pgsql(sealparams CHARACTER VARYING, OUT outputVar text)
The returned column will then be named outputVar. When returning just one column, Postgres requires RETURNS to match that column's type, so in this case SETOF TEXT, or just TEXT if a single row is expected. If you return more than one column, you need to use RETURNS SETOF RECORD.
When you use named columns in the function parameters, you assign values to them just as you would to variables from the DECLARE section:
LOOP
outputVar := 'some value';
outputVar2 := 'some value';
outputVar3 := 'some value';
RETURN NEXT;
END LOOP;
There are a few other examples on how to return sets from functions in my old answer here: How to return rows of query result in PostgreSQL's function?

Regular Expression - Match all between BEGIN and END

I'm trying to find out how to make a regular expression to match all text between the first "BEGIN" and the last "END" of a procedure block.
Here's the text which I want to filter:
PROCEDURE MyFirstFunction()#12345
VAR
TESTVAR#1 : Record 1;
TESTVAR#2 : Record 2;
BEGIN
// Here begins the code
IF 1 = 1 THEN BEGIN
IF 2 <> 1 THEN BEGIN
MESSAGE('2 is not equal to 1');
END;
MESSAGE('1 is equal to 1');
END;
END;
PROCEDURE MySecondFunction()#123456
VAR
TESTVAR#1 : Record 1;
TESTVAR#2 : Record 2;
BEGIN
// Here begins the code
IF 1 = 1 THEN BEGIN
IF 2 <> 1 THEN BEGIN
MESSAGE('2 is not equal to 1');
END;
MESSAGE('1 is equal to 1');
END;
END;
PROCEDURE MyThirdFunction()#123457
VAR
TESTVAR#1 : Record 1;
TESTVAR#2 : Record 2;
BEGIN
// Here begins the code
IF 1 = 1 THEN BEGIN
IF 2 <> 1 THEN BEGIN
MESSAGE('2 is not equal to 1');
END;
MESSAGE('1 is equal to 1');
END;
END;
I already tried it with a recursive regular expression, but this didn't work.
Here's the regular expression I worked on:
BEGIN(((?!BEGIN|END;).)|(?R))*END;
But I only get the second beginning of the first function.
Here's the link to regex101.com to test the regular expression:
https://regex101.com/r/ZoBm6h/1
I think the logic you want for the negative lookahead is this: greedily consume everything after BEGIN until the last END;, provided the match never sees the text PROCEDURE, which would mean it has gone too far and entered the next procedure block. Note that . must be allowed to match newlines (the single-line/dotall flag) for this to span multiple lines.
BEGIN((?!PROCEDURE).)*END;
Demo
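As a rough check of that pattern, here is a small Python sketch. The helper name `procedure_bodies` and the shortened sample text are assumptions made for illustration; `re.DOTALL` plays the role of the dot-matches-newline flag.

```python
import re
from typing import List

# Greedy match from BEGIN to the last END; that is reached
# without crossing the keyword PROCEDURE.
BLOCK_RE = re.compile(r"BEGIN(?:(?!PROCEDURE).)*END;", re.DOTALL)

# Shortened, made-up sample in the style of the question's input.
SAMPLE = """PROCEDURE First()
BEGIN
  IF 1 = 1 THEN BEGIN
    MESSAGE('inner');
  END;
END;
PROCEDURE Second()
BEGIN
  MESSAGE('only');
END;
"""


def procedure_bodies(source: str) -> List[str]:
    """Return one BEGIN ... END; span per procedure block."""
    return [m.group(0) for m in BLOCK_RE.finditer(source)]


if __name__ == "__main__":
    for body in procedure_bodies(SAMPLE):
        print(body)
        print("---")
```

One limitation of this approach: if the word PROCEDURE appears inside a string literal or comment within a block, the match will stop early, so it is a heuristic rather than a real parser.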
If you want to match all blocks, you can also use this regex:
BEGIN((?!^(?!PROCEDURE)$).)*END

SAS: No matching %MACRO statement

I am following a published method to identify matched cases. I am getting the following error
ERROR: No matching %MACRO statement for this %MEND statement.
WARNING: Apparent invocation of macro MATCH not resolved.
137 %MEND MATCH;
138
139 %MATCH (g.ps_match,Match4,scase4,scontrol4, abuser, 0.0001);
_
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
How do I correctly call the macro?
I am using SAS University Edition.
The method is from
http://www2.sas.com/proceedings/sugi25/25/po/25p225.pdf
Part 2: Perform the Match
The next part of the macro program performs the match and
outputs the matched pairs. First, the cases data set is
selected. Curob is used to keep track of the current case.
Matchto is used to identify matched pairs of cases and
controls. Start and oldi are initialized to control processing of
the controls data set DO loop.
data &lib..&matched.
(drop=Cmatch randnum aprob cprob start
oldi curctrl matched);
set &lib..&SCase. ;
curob + 1;
matchto = curob;
if curob = 1 then do;
start = 1;
oldi = 1;
end;
Next, the controls data set is selected. Processing starts at
the first unmatched observation. The data set is searched
until a match is found, or it is determined no match can be
made. Error checking is performed to avoid an infinite loop.
Curctrl is used to keep track of current control.
DO i = start to n;
set &lib..&Scontrol. point = i nobs = n;
if i gt n then goto startovr;
if _Error_ = 1 then abort;
curctrl = i;
If the propensity score of the current case (aprob) matches the
propensity score of the current control (cprob), then a match
was found. Update Cmatch to 1=Yes. Output the control.
Update matched to keep track of last matched control. Exit
the DO loop. If the propensity score of the current control is
greater than the propensity score of the current case, then no
match will be found for the current case. Stop the DO loop
processing.
if aprob = cprob then
do;
Cmatch = 1;
output &lib..&matched.;
matched = curctrl;
goto found;
end;
else if cprob gt aprob then
goto nextcase;
startovr: if i gt n then
goto nextcase;
END;
/* end of DO LOOP */
nextcase:
if Cmatch=0 then start = oldi;
found:
if Cmatch = 1 then do;
oldi = matched + 1;
start = matched + 1;
set &lib..&SCase. point = curob;
output &lib..&matched.;
end;
retain oldi start;
if _Error_=1 then _Error_=0;
run;
%MEND MATCH;
MACRO MATCH CALL STATEMENT
The following are call statements to the macro
program MATCH. The first performs a 4-digit match;
the second performs a 3-digit match.
%MATCH(STUDY,Propen,Match4,SCase4,
SContrl4,Interven,.0001);
%MATCH(STUDY,Propen,Match3,SCase3,
SContrl3,Interven,.001);
Presumably, you didn't include the beginning of the macro (i.e., the %MACRO MATCH(... portion from earlier in the paper). This is a macro, and it is not intended to be run in pieces the way it is printed. You need to include all of the code from %MACRO MATCH through %MEND, and then the calls.

Regex: How to remove English words from sentences using Regex?

I have a number of rows in SQLite; each row has one column that contains data like this:
prosperکامیاب شدن ، موفق شدن ، رونق یافتن
As you can see, the sentence starts with English words. Now I want to remove the English words at the start of each sentence. Is there any way to do that via a T-SQL query (using regex)?
You may try this :) I have written it as a function you can call:
create function dbo.RemoveEngChars (@Unicode_string nvarchar(max))
returns nvarchar(max) as
begin
declare @i int = 1; -- must start from 1, as SubString is 1-based
-- nvarchar(max) avoids truncating inputs longer than 100 characters
declare @OriginalString nvarchar(max) = @Unicode_string collate SQL_Latin1_General_Cp1256_CS_AS;
declare @ModifiedString nvarchar(max) = N'';
while @i <= Len(@OriginalString)
begin
-- keep only the characters that fall outside the [a-Z] range
if SubString(@OriginalString, @i, 1) not like '[a-Z]'
begin
set @ModifiedString = @ModifiedString + SubString(@OriginalString, @i, 1);
end
set @i = @i + 1;
end
return @ModifiedString;
end
-- To call the function, run the following script and pass the Unicode string with the N' prefix
select dbo.RemoveEngChars(N'prosperکامیاب شدن ، موفق شدن ، رونق یافتن');
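Since the question mentions SQLite, which has no T-SQL functions, one possible alternative is to register a small regex helper from the host language. The sketch below assumes Python's standard sqlite3 module is available; the table name `words`, the column name `entry`, and the function name `strip_leading_english` are all hypothetical. Unlike the T-SQL function above, it removes only the leading run of Latin letters and whitespace, which is what the question describes.

```python
import re
import sqlite3

# Leading run of ASCII letters and whitespace at the start of the string.
LEADING_LATIN_RE = re.compile(r"^[A-Za-z\s]+")


def strip_leading_english(text):
    """Remove English letters/spaces from the start; pass NULL through."""
    if text is None:
        return None
    return LEADING_LATIN_RE.sub("", text)


def demo():
    """Register the helper as an SQL function and apply it in a query."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("strip_leading_english", 1, strip_leading_english)
    conn.execute("CREATE TABLE words (entry TEXT)")  # hypothetical table
    conn.execute("INSERT INTO words VALUES (?)", ("prosperکامیاب شدن",))
    rows = conn.execute("SELECT strip_leading_english(entry) FROM words").fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(demo())
```

The same registered function could be used in an UPDATE statement to rewrite the column in place, if that is the goal.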