Oracle Regexp - replace surrounding brackets, but not inner string - regex

I'm trying to take the following string...
v VARCHAR2(100) := '<LABEL> FOR ABC';
And turn it into...
'LABEL FOR ABC'
This seemed easy, but I can't get it done in one regexp statement. The <...> block will always start the string.
My best attempt so far:
v := REGEXP_SUBSTR(v, '[^\<][A-Z]+[^\>][A-Z ]+')
Yields 'LABEL'.
I did manage to get this working by doing the following...
v := REGEXP_REPLACE(v, '^\<');
v := REGEXP_REPLACE(v, '\>', NULL, 1, 1); -- replace_string NULL, position 1, occurrence 1: drop only the first '>'
But I was wondering whether this could be done in a single regexp statement.
Thanks
Edit: I forgot to mention, I only want to remove the leading <...>. If the string was something like
<LABEL> FOR ABC <LEAVE ME>
I would want it to be...
LABEL FOR ABC <LEAVE ME>

You can use REGEXP_REPLACE('<LABEL> FOR ABC <LEAVE ME>','^<(.*?)>','\1').
Pattern:
^< --matches '<' at the beginning of the string.
(.*?) --non-greedy quantifier matching 0 or more characters between '<' and '>'.
--It is enclosed in parentheses to form the first capture group,
--which is then referenced in replace_string as \1.
> --Matches the first '>' after the first '<'.
Example:
SELECT '<LABEL> FOR ABC <LEAVE ME>' str,
REGEXP_REPLACE ('<LABEL> FOR ABC <LEAVE ME>', '^<(.*?)>', '\1') replaced_str
FROM DUAL;
str replaced_str
----------------------------------------------------
<LABEL> FOR ABC <LEAVE ME> LABEL FOR ABC <LEAVE ME>
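For comparison, the same anchored, non-greedy replacement can be sketched with Python's re module. This is only an illustration of the pattern; Oracle's regex engine is a different implementation, and the helper name strip_leading_brackets is hypothetical.

```python
import re


def strip_leading_brackets(s: str) -> str:
    # ^<     : a '<' only at the very start of the string
    # (.*?)  : non-greedy group 1, stops at the first '>'
    # >      : the first closing bracket after the leading '<'
    # \1     : keep only the text that was inside the brackets
    return re.sub(r"^<(.*?)>", r"\1", s)


print(strip_leading_brackets("<LABEL> FOR ABC <LEAVE ME>"))  # LABEL FOR ABC <LEAVE ME>
```

Because of the ^ anchor, a string that does not start with '<' comes back unchanged. With a greedy .* instead of .*?, group 1 would run to the last '>' and mangle the trailing <LEAVE ME>.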

You just want to replace all instances of < and > with an empty string? Then use this:
SELECT
REGEXP_REPLACE('<LABEL> blah', '[<>]', '') FROM DUAL
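Note that the character class [<>] removes every bracket in the string, not only the leading pair, so it does not meet the requirement from the question's edit. A quick Python sketch (illustration only, not Oracle) contrasts the two approaches on the edited sample string:

```python
import re

s = "<LABEL> FOR ABC <LEAVE ME>"

# Character class: every '<' and '>' anywhere in the string is removed.
all_removed = re.sub(r"[<>]", "", s)

# Anchored, non-greedy group: only the leading <...> is unwrapped.
leading_only = re.sub(r"^<(.*?)>", r"\1", s)

print(all_removed)   # LABEL FOR ABC LEAVE ME
print(leading_only)  # LABEL FOR ABC <LEAVE ME>
```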

Related

PostgreSQL replace all occurrences of string

I have this string:
this is the abcd xxx
string I want to abcd yyy
replace in my text abcd zzz
Now I want to replace abcd and anything after it on each line with an empty string.
I want this result:
this is the
string I want to
replace in my text
I tried:
select regexp_replace(str, 'abcd.*','','gi')
But it just removed everything after the first match. I also tried other combinations without luck.
What am I missing?
Thanks!
Use the flag n (newline-sensitive matching) in regexp_replace():
with my_table(str) as (
values(
'this is the abcd xxx
string I want to abcd yyy
replace in my text abcd zzz')
)
select regexp_replace(str, 'abcd.*','','gin')
from my_table
regexp_replace
-----------------
this is the +
string I want to +
replace in my text
(1 row)
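Python's re module has the opposite default: '.' does not match a newline unless re.DOTALL is passed. The following sketch, assuming the same three-line sample, shows both behaviors side by side; re.sub replaces every match by default, which plays the role of the 'g' flag.

```python
import re

text = (
    "this is the abcd xxx\n"
    "string I want to abcd yyy\n"
    "replace in my text abcd zzz"
)

# Default: '.' stops at '\n', so each 'abcd.*' match ends at its own line break
# (similar in effect to PostgreSQL's 'n' flag).
per_line = re.sub(r"abcd.*", "", text, flags=re.IGNORECASE)

# DOTALL: '.' also matches '\n', so the first match consumes the rest of the text
# (similar to the PostgreSQL default the question ran into).
everything = re.sub(r"abcd.*", "", text, flags=re.IGNORECASE | re.DOTALL)

print(per_line)
print(repr(everything))  # 'this is the '
```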

R regular expression to obtain all text before the second underscore

s <- "1-343-43Hello_2_323.14_fdh-99H"
In R I want to use a regex to get the substring before, say, the 2nd underscore. How can this be done with one regex? The alternative would be to split by '_' and then paste the first two parts back together, something along these lines:
paste(sapply(strsplit(s, "_"),"[", 1:2), collapse = "_")
Gives:
[1] "1-343-43Hello_2"
But how can I make a regex expression to do the same ?
In general, to answer the question in the title, use
sub("^(([^_]*_){n}[^_]*).*", "\\1", s)
where n is the number of underscores to keep (write a literal number in place of n; n = 1 gives the text before the second underscore).
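Since n has to be a literal number inside the pattern, it is convenient to build the pattern from a variable. A minimal Python sketch of that idea (the function name keep_n_underscores is hypothetical; in R the same pattern string could be assembled with paste0 or sprintf):

```python
import re


def keep_n_underscores(s: str, n: int) -> str:
    # (?:[^_]*_){n} : n fields, each followed by its underscore
    # [^_]*         : the field after the n-th underscore
    # .*            : everything else, dropped by the \1 replacement
    pattern = rf"^((?:[^_]*_){{{n}}}[^_]*).*"
    # DOTALL lets '.*' cross newlines, like TRE's '.' in R
    return re.sub(pattern, r"\1", s, flags=re.DOTALL)


s = "1-343-43Hello_2_323.14_fdh-99H"
print(keep_n_underscores(s, 1))  # 1-343-43Hello_2
```

If the string has fewer than n underscores, the pattern does not match and the input is returned unchanged.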
You can use a sub:
sub("^([^_]*_[^_]*).*", "\\1", s)
R code demo:
s <- "1-343-43Hello_2_323.14_fdh-99H"
sub("^([^_]*_[^_]*).*", "\\1", s)
## => [1] "1-343-43Hello_2"
Pattern details:
^ - start of string
([^_]*_[^_]*) - Group 1 capturing 0+ characters other than _, then a _ and again 0+ non-_s
.* - rest of the string (note that the TRE regex . matches newlines, too).
The \\1 replacement only returns the value inside Group 1.
echo preg_replace("/^([^_]*)_([^_]*).*/" , "$1_$2" , "1-343-43Hello_2_323.14_fdh-99H");
Or, if you are just looking to match it, /^[^_]*_[^_]*/ is the regex to use:
<?php
echo preg_match("/^[^_]*_[^_]*/" , "1-343-43Hello_2_323.14_fdh-99H" , $test );
var_dump( $test );
?>
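When only the prefix is needed, an anchored match is enough and no replacement is involved. A Python sketch of the same idea (illustration only), which yields None when the string contains no underscore:

```python
import re

s = "1-343-43Hello_2_323.14_fdh-99H"

# re.match anchors at the start of the string, like the leading ^ above;
# the match itself is the result, no replacement needed.
m = re.match(r"[^_]*_[^_]*", s)
print(m.group(0) if m else None)  # 1-343-43Hello_2
```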
Or in JavaScript:
"1-343-43Hello_2_323.14_fdh-99H".match(/^[^_]*_[^_]*/);
sub('\\_\\d+\\..*$','',s)
#[1] "1-343-43Hello_2"
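This shorter pattern relies on the data: it cuts at the first underscore that is followed by digits and a dot, which happens to be the third field here. A Python sketch (the helper name drop_from_numeric_field is hypothetical) makes the dependency visible; a string without such a field is returned unchanged:

```python
import re


def drop_from_numeric_field(s: str) -> str:
    # Removes the first "_<digits>." and everything after it.
    # Works only if the field to cut starts with digits followed by a dot.
    return re.sub(r"_\d+\..*$", "", s)


print(drop_from_numeric_field("1-343-43Hello_2_323.14_fdh-99H"))  # 1-343-43Hello_2
print(drop_from_numeric_field("a_b_c_d"))                         # a_b_c_d
```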
Here it is with sub() in a data.table, next to stringr::str_match() for comparison; sub() also accepts perl = TRUE if you need PCRE features such as look-ahead and look-behind:
dtx[, var_stringr := stringr::str_match(string, '([^_]+)(?:_[^_]+){5}$')[,2]][]
dtx[
# only rows containing '_' are assigned; the other rows stay NA
grepl('_', string),
var_gsub := sub('(.*_)([^_]+)(_[^_]+){5}$', '\\2', string)][]
The disadvantage of this method is that if you ask for an occurrence beyond the last one in the string, sub() returns the whole string unchanged instead of NA, as str_match does.
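The NA-versus-whole-string difference can be reproduced with Python's re module, where re.search returns None on failure and re.sub leaves a non-matching string alone. This sketch assumes the same "field followed by exactly five more fields" pattern; the helper names with_search and with_sub are hypothetical:

```python
import re

FIELDS_AFTER = 5  # the wanted field has exactly five '_'-separated fields after it


def with_search(s):
    # Like str_match: no match gives None (R's NA).
    m = re.search(rf"([^_]+)(?:_[^_]+){{{FIELDS_AFTER}}}$", s)
    return m.group(1) if m else None


def with_sub(s):
    # Like sub(): no match leaves the input untouched.
    return re.sub(rf"(.*_)([^_]+)(_[^_]+){{{FIELDS_AFTER}}}$", r"\2", s)


print(with_search("a_b_c_d_e_f_g"))  # b
print(with_sub("a_b_c_d_e_f_g"))     # b
print(with_search("a_b"))            # None
print(with_sub("a_b"))               # a_b
```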

Need to form pattern for regexp_replace

I have input strings like these:
1.2.3.4_abc_4.2.1.44_1.3.4.23
100.11.11.22_xyz-abd_10.2.1.2_12.2.3.4
100.11.11.22_xyz_123_10.2.1.2_1.2.3.4
I need to replace the first substring between the two IP addresses, which is delimited by _; however, in some strings the _ is also part of the substring to be replaced (xyz_123).
I need to find abc, xyz-abd and xyz_123 in the strings above so that I can replace them with another column from that table.
_.*?_(?=\d+\.)
matches _abc_, _xyz-abd_ and _xyz_123_ in your examples. Is this working for you?
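A quick Python check of the pattern against the three sample strings (Python's re is used only for illustration; PostgreSQL's regex engine applies its own greediness rules to non-greedy quantifiers, so test it there as well):

```python
import re

samples = [
    "1.2.3.4_abc_4.2.1.44_1.3.4.23",
    "100.11.11.22_xyz-abd_10.2.1.2_12.2.3.4",
    "100.11.11.22_xyz_123_10.2.1.2_1.2.3.4",
]

# The lazy '.*?' grows one character at a time until the '_' it reaches
# is followed by digits and a dot, i.e. the start of the next IP address.
pattern = re.compile(r"_.*?_(?=\d+\.)")

for s in samples:
    print(pattern.search(s).group(0), pattern.sub("_foo_", s, count=1))
```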
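DO $body$
DECLARE
  subject text := '100.11.11.22_xyz_123_10.2.1.2_1.2.3.4'; -- sample input
  result  text;
BEGIN
  result := regexp_replace(subject, $$_.*?_(?=\d+\.)$$, $$_foo_$$);
  RAISE NOTICE '%', result;
END
$body$;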
Probably this is enough:
_[^.]+_
and replace with
_Replacement_
[^.]+ uses a negated character class to match a sequence of one or more (the + quantifier) characters other than ".".
I am also matching a leading and a trailing "_", so you have to put it in again in the replacement string.
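The negated-class version can be sketched the same way in Python (illustration only). It assumes the middle field never contains a dot; count=1 mirrors regexp_replace, which replaces only the first match unless the 'g' flag is given:

```python
import re

samples = [
    "1.2.3.4_abc_4.2.1.44_1.3.4.23",
    "100.11.11.22_xyz-abd_10.2.1.2_12.2.3.4",
    "100.11.11.22_xyz_123_10.2.1.2_1.2.3.4",
]

# [^.]+ cannot cross a dot, so the match stays between the two dotted
# addresses; backtracking gives back the leading digits of the next
# address until a '_' follows.
pattern = re.compile(r"_[^.]+_")

results = [pattern.sub("_Replacement_", s, count=1) for s in samples]
for r in results:
    print(r)
```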
PostgreSQL supports lookahead and (since version 9.6) lookbehind assertions, so it is possible to avoid the "_" in the replacement string:
(?<=_)[^.]+(?=_)
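With lookarounds the underscores are checked but not consumed, so the replacement text no longer needs them. A Python sketch (illustration only; Python's re supports fixed-width lookbehind such as (?<=_)):

```python
import re

# (?<=_) and (?=_) are zero-width: the underscores stay in the string.
pattern = re.compile(r"(?<=_)[^.]+(?=_)")

first = pattern.sub("Replacement", "1.2.3.4_abc_4.2.1.44_1.3.4.23", count=1)
third = pattern.sub("Replacement", "100.11.11.22_xyz_123_10.2.1.2_1.2.3.4", count=1)

print(first)  # 1.2.3.4_Replacement_4.2.1.44_1.3.4.23
print(third)  # 100.11.11.22_Replacement_10.2.1.2_1.2.3.4
```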
To match the first two "_", the regex from @stema and @Tim Pietzcker works. To put the "_" back around the new value, which is what I was struggling with, use the || operator, as in this example:
update table1 set column1 = regexp_replace(column1, '_.*?_(?=\d+\.)', '_' || column2 || '_')
To take the replacement value from another table in the update query, use UPDATE ... FROM, for example:
update table1 as t set column1 = regexp_replace(t.column1, '_.*?_(?=\d+\.)', '_' || t2.column2 || '_') from table2 as t2 where t.id = t2.id [other criteria]
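The per-row idea (build the replacement from another column, then substitute it into the first match) can be sketched in Python on hypothetical in-memory rows standing in for the joined tables; the row data and column values are made up for illustration:

```python
import re

# Hypothetical rows standing in for table1 joined to table2 on id.
rows = [
    {"id": 1, "column1": "1.2.3.4_abc_4.2.1.44_1.3.4.23", "column2": "new1"},
    {"id": 2, "column1": "100.11.11.22_xyz_123_10.2.1.2_1.2.3.4", "column2": "new2"},
    {"id": 3, "column1": "1.2.3.4_abc_4.2.1.44_1.3.4.23", "column2": r"back\slash"},
]

pattern = re.compile(r"_.*?_(?=\d+\.)")

for row in rows:
    # Equivalent of '_' || column2 || '_'.
    replacement = "_" + row["column2"] + "_"
    # A function as the replacement inserts the value literally, so
    # backslashes in column2 are not treated as escapes or group references.
    row["column1"] = pattern.sub(lambda _m: replacement, row["column1"], count=1)

for row in rows:
    print(row["id"], row["column1"])
```

With a plain replacement string instead of the function, Python's re.sub would try to interpret backslash sequences in column2 (for example \s raises an error).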

R regex: specifying output selections from wider string matches

One for the regex enthusiasts. I have a vector of strings in the format:
<TEXTFORMAT LEADING="2"><P ALIGN="LEFT"><FONT FACE="Verdana" STYLE="font-size: 10px" size="10" COLOR="#FF0000" LETTERSPACING="0" KERNING="0">Desired output string containing any symbols</FONT></P></TEXTFORMAT>
I'm aware of the perils of parsing this sort of stuff with regex. It would however be useful to know how to efficiently extract an output sub-string of a larger string match - i.e. the contents of angle quotes >...< of the font tag. The best I can do is:
require(stringr)
strng = str_extract(strng, "<FONT.*FONT>") # select font statement
strng = str_extract(strng, ">.*<") # select inside tags
strng = str_extract(strng, "[^/</>]+") # remove angle quote symbols
What would be the simplest formula to achieve this in R?
Use str_match, not str_extract (or perhaps str_match_all). Wrap the part you want to extract in parentheses; str_match returns a matrix whose second column holds that captured group.
str_match(strng, "<FONT[^<>]*>([^<>]*)</FONT>")
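The capture-group idea carries over to other engines. A Python sketch (illustration only, using the question's sample string): group(0) corresponds to the first column of str_match's result and group(1) to the second.

```python
import re

strng = (
    '<TEXTFORMAT LEADING="2"><P ALIGN="LEFT">'
    '<FONT FACE="Verdana" STYLE="font-size: 10px" size="10" COLOR="#FF0000" '
    'LETTERSPACING="0" KERNING="0">'
    "Desired output string containing any symbols</FONT></P></TEXTFORMAT>"
)

# group(0): the whole <FONT ...>...</FONT> match
# group(1): only the parenthesized text between the tags
m = re.search(r"<FONT[^<>]*>([^<>]*)</FONT>", strng)
print(m.group(1))  # Desired output string containing any symbols
```

Because of [^<>]*, text that itself contains < or > would not match.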
Or parse the document and extract the contents that way.
library(XML)
doc <- htmlParse(strng)
fonts <- xpathSApply(doc, "//font")
sapply(fonts, function(x) as(xmlChildren(x)$text, "character"))
As agstudy mentioned, xpathSApply takes a function argument that makes things easier.
xpathSApply(doc, "//font", xmlValue)
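For comparison, a parser-based sketch using only Python's standard-library html.parser (not the R XML package; the class name FontText is hypothetical). It collects the text inside <font> elements and does not try to handle nested <font> tags specially:

```python
from html.parser import HTMLParser


class FontText(HTMLParser):
    """Collects the text found inside <font> elements."""

    def __init__(self):
        super().__init__()
        self.inside = 0   # how many <font> tags are currently open
        self.texts = []   # one entry started per <font> start tag

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case, so FONT arrives as "font".
        if tag == "font":
            self.inside += 1
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "font" and self.inside:
            self.inside -= 1

    def handle_data(self, data):
        if self.inside:
            self.texts[-1] += data


strng = (
    '<TEXTFORMAT LEADING="2"><P ALIGN="LEFT">'
    '<FONT FACE="Verdana" STYLE="font-size: 10px" size="10" COLOR="#FF0000" '
    'LETTERSPACING="0" KERNING="0">'
    "Desired output string containing any symbols</FONT></P></TEXTFORMAT>"
)

parser = FontText()
parser.feed(strng)
parser.close()
print(parser.texts)  # ['Desired output string containing any symbols']
```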
You can also do it with gsub, but I think your input vector may have too many variations for this to be robust...
gsub( "^.*(?<=>)(.*)(?=</FONT>).*$" , "\\1" , x , perl = TRUE )
#[1] "Desired output string containing any symbols"
Explanation
^.* - match any characters from the start of the string
(?<=>) - positive lookbehind, a zero-width assertion: the subsequent match only succeeds if it is preceded by a >
(.*) - then match any characters (this is now a numbered capture group)...
(?=</FONT>) - ...until you match "</FONT>"
.*$ - then match any characters to the end of the string
In the replacement we replace all matched stuff by numbered capture group \\1, and there is only one capture group which is everything between > and </FONT>.
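The same lookaround pattern in Python's re module, as a rough cross-check (illustration only; R's perl = TRUE uses PCRE, while Python has its own engine with similar syntax for these constructs):

```python
import re

x = (
    '<TEXTFORMAT LEADING="2"><P ALIGN="LEFT">'
    '<FONT FACE="Verdana" STYLE="font-size: 10px" size="10" COLOR="#FF0000" '
    'LETTERSPACING="0" KERNING="0">'
    "Desired output string containing any symbols</FONT></P></TEXTFORMAT>"
)

# ^.*          greedy prefix; backtracks until the lookbehind can succeed
# (?<=>)       zero-width: the position must be preceded by '>'
# (.*)         group 1, the text that is kept
# (?=</FONT>)  zero-width: group 1 must be followed by </FONT>
# .*$          the rest of the string, discarded
result = re.sub(r"^.*(?<=>)(.*)(?=</FONT>).*$", r"\1", x)
print(result)  # Desired output string containing any symbols
```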
Use at your peril.

how to replace parts of search string in R

string <- c("tyuynmklabcwsqzp")
If my task is to substitute every "abc" with "abc123", the code is,
gsub("abc", "\\1123", string)
But, if I have to search for "abc" and then replace it with "c123", then how should I do it? Is there a way to divide the regular expression into parts so that I can have \2 like \1?
If it's possible, then my command would be,
gsub("abc", "\\2123", string).
Please help.
You can use parentheses to group together parts of a regular expression, subsequently applying a repetition operator or backreference to the matched group.
In your case, try this:
string <- c("tyuynmklabcwsqzp")
gsub("(ab)(c)", "\\2123", string)
# [1] "tyuynmklc123wsqzp"
Try:
gsub("(abc)", "\\1123", string) # abc → abc123
and
gsub("ab(c)", "\\1123", string) # abc → c123
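A note if you port this: R's sub() and gsub() recognize only single-digit backreferences \1 to \9, which is why "\\2123" reads as group 2 followed by 123. Python's re does not read \2123 that way (it consumes more than one digit after the backslash), so a Python sketch of the three replacements uses the explicit \g<N> form:

```python
import re

string = "tyuynmklabcwsqzp"

# Two groups: (ab) is group 1, (c) is group 2; only group 2 is kept.
two_groups = re.sub(r"(ab)(c)", r"\g<2>123", string)

# One group around the whole match: abc -> abc123
whole = re.sub(r"(abc)", r"\g<1>123", string)

# One group around just the "c": abc -> c123
just_c = re.sub(r"ab(c)", r"\g<1>123", string)

print(two_groups)  # tyuynmklc123wsqzp
print(whole)       # tyuynmklabc123wsqzp
print(just_c)      # tyuynmklc123wsqzp
```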