Postgresql replace all occurrences of string+

I have this string:
this is the abcd xxx
string I want to abcd yyy
replace in my text abcd zzz
Now I want to replace abcd and everything after it on the same line with an empty string.
I want this result:
this is the
string I want to
replace in my text
I tried:
select regexp_replace(str, 'abcd.*','','gi')
But it removed everything after the first match, including the following lines. Other flag combinations did not help either.
What am I missing?
Thanks!

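By default, . in a PostgreSQL regular expression also matches a newline, so abcd.* consumes the rest of the whole string. Add the flag n (newline-sensitive matching) to regexp_replace() so that . stops at the end of each line: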
with my_table(str) as (
    values (
'this is the abcd xxx
string I want to abcd yyy
replace in my text abcd zzz')
)
select regexp_replace(str, 'abcd.*', '', 'gin')
from my_table;
regexp_replace
-----------------
this is the +
string I want to +
replace in my text
(1 row)
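The effect of the flag can also be reproduced outside the database. The following is a minimal sketch in Python, not PostgreSQL itself: it assumes Python's re module as a stand-in, where re.DOTALL imitates the PostgreSQL default (. also matches a newline) and leaving it out imitates the n flag. re.sub replaces every match, which corresponds to the g flag. The variable names are made up for illustration.

```python
import re

text = (
    "this is the abcd xxx\n"
    "string I want to abcd yyy\n"
    "replace in my text abcd zzz"
)

# '.' also matches newlines (like PostgreSQL without the n flag):
# the first match runs to the end of the whole string.
without_n = re.sub(r"abcd.*", "", text, flags=re.IGNORECASE | re.DOTALL)

# '.' stops at each newline (like PostgreSQL with the n flag):
# there is one match per line, each ending at the line break.
with_n = re.sub(r"abcd.*", "", text, flags=re.IGNORECASE)

print(repr(without_n))
print(repr(with_n))
```

Note that the space in front of abcd is kept on every line, just as in the PostgreSQL output above.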

Related

Strange behaviour of Regexp_replace in a Hive SQL query

I have some input IDs and I'm trying to remove the trailing .0 from every ID string that ends with .0.
select student_id, regexp_replace(student_id, '.0','') from school_result.credit_records where student_id like '%.0';
Input:
01-0230984.03
12345098.0
34567.0
Expected output:
01-0230984.03
12345098
34567
But the result I'm getting is as follows. It removes any character that is followed by a 0, instead of removing only a trailing .0:
0129843
123498
34567
What am I doing wrong? Can someone please help?

A dot in a regexp has a special meaning: it matches any character. To match a literal dot (.), escape it with a double backslash (in Hive). Also add the end-of-line anchor ($):
with mydata as (
select stack(3,
'01-0230984.03',
'12345098.0',
'34567.0'
) as str
)
select regexp_replace(str,'\\.0$','') from mydata;
Result:
01-0230984.03
12345098
34567
The regexp '\\.0$' matches a literal dot followed by zero (.0) at the end of the string ($).
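The difference between the two patterns can be checked with a small Python sketch. This is an illustration with Python's re module rather than Hive, so the pattern is written as a raw string with a single backslash instead of Hive's doubled one. The list and variable names are assumptions for the example.

```python
import re

ids = ["01-0230984.03", "12345098.0", "34567.0"]

# Unescaped dot: '.0' means "any character followed by 0",
# so pairs such as "-0", "30" and "50" are removed as well.
unescaped = [re.sub(r".0", "", s) for s in ids]

# Escaped dot plus end anchor: only a trailing literal ".0" is removed.
anchored = [re.sub(r"\.0$", "", s) for s in ids]

print(unescaped)
print(anchored)
```

The first list reproduces the unexpected output from the question, and the second matches the expected output.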

Match the string using inputs regex

I have a column which has values like :
col1
ABB
CDD
EFF
GHH
IJJ
KLL
If I input A,D then it should return
ABB
CDD
On inputting J,K it should return
IJJ
KLL
I'm trying to do this using Regex

If you have to use regex, remove the commas and wrap the remaining characters in square brackets to build a character class as the search expression:
A,D ---> [AD]
J,K ---> [JK]
If this expression matches anywhere in the string from your list, add the matched string to the output.
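The steps above can be sketched in Python as follows. The question does not say which language or database is used, so this is only an assumption-laden illustration: build_pattern and filter_values are hypothetical helper names, and the column is modelled as a plain list.

```python
import re

def build_pattern(user_input):
    """Turn an input such as 'A,D' into the compiled character class [AD]."""
    letters = [part.strip() for part in user_input.split(",") if part.strip()]
    if not letters:
        raise ValueError("no search characters given")
    # re.escape keeps characters such as ']' or '^' literal inside the class.
    return re.compile("[" + "".join(re.escape(ch) for ch in letters) + "]")

def filter_values(values, user_input):
    """Return the values that contain at least one of the input characters."""
    pattern = build_pattern(user_input)
    return [v for v in values if pattern.search(v)]

col1 = ["ABB", "CDD", "EFF", "GHH", "IJJ", "KLL"]

print(filter_values(col1, "A,D"))
print(filter_values(col1, "J,K"))
```

Escaping the input is a design choice for safety: without it, an input character like ] would end the character class early.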

querydsl - remove whitespace in the middle of a COLUMN

Is there a way to do this query in querydsl?
SELECT *
FROM table
WHERE replace(column_name, ' ', '') = 'someValue';
The StringPath from the QClass has no .replace() function and it's necessary for some characters (specifically, spaces in the middle) to be removed from column_name before testing it with someValue.
Sample column_name contents: ABC, DEF, AB *
If someValue is ABC, ABC and AB* should appear.

You can express the replace invocation via
Expressions.stringTemplate("replace({0},' ','')", columnPath)

Oracle Regexp - replace surrounding brackets, but not inner string

I'm trying to take the following string...
v VARCHAR2(100) := '<LABEL> FOR ABC';
And turn it into...
'LABEL FOR ABC'
Which seemed easy, but I can't get this done in one regexp statement. The <...> block will always start the string.
For example, my best attempt....
v := REGEXP_SUBSTR(v, '[^\<][A-Z]+[^\>][A-Z ]+')
Yields 'LABEL'.
I did manage to get this working by doing the following...
v := REGEXP_REPLACE(v, '^\<');
v := REGEXP_REPLACE(v, '\>', '', 1, 1);
But I was wondering if this would be possible to be done in a single regexp statement.
Thanks
Edit: I forgot to mention, I only want to remove the leading <...>. If the string was something like
<LABEL> FOR ABC <LEAVE ME>
I would want it to be...
LABEL FOR ABC <LEAVE ME>

You can use REGEXP_REPLACE('<LABEL> FOR ABC <LEAVE ME>','^<(.*?)>','\1').
Pattern:
^<    -- matches '<' at the beginning of the string.
.*?   -- non-greedy quantifier: matches 0 or more characters between '<' and '>'.
      -- It is enclosed in parentheses to form the first capture group,
      -- which is then referenced in replace_string as \1.
>     -- matches the first '>' after the leading '<'.
Example:
SELECT '<LABEL> FOR ABC <LEAVE ME>' str,
REGEXP_REPLACE ('<LABEL> FOR ABC <LEAVE ME>', '^<(.*?)>', '\1') replaced_str
FROM DUAL;
str                        replaced_str
-------------------------- ------------------------
<LABEL> FOR ABC <LEAVE ME> LABEL FOR ABC <LEAVE ME>

You just want to replace all instances of < and > with an empty string? Then this:
SELECT
REGEXP_REPLACE('<LABEL> blah', '[<>]', '') FROM DUAL
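The two answers above behave differently on the edited example from the question. The sketch below compares them in Python; it uses Python's re module as a stand-in for Oracle's regex functions, and the greedy variant is added only for contrast (it does not appear in either answer). All variable names are illustrative.

```python
import re

s = "<LABEL> FOR ABC <LEAVE ME>"

# Non-greedy group: stops at the first '>' after the leading '<'.
lazy = re.sub(r"^<(.*?)>", r"\1", s)

# Greedy group (for contrast): runs up to the last '>' in the string.
greedy = re.sub(r"^<(.*)>", r"\1", s)

# Character class: removes every '<' and '>' wherever it occurs.
strip_all = re.sub(r"[<>]", "", s)

print(lazy)
print(greedy)
print(strip_all)
```

Only the non-greedy pattern leaves the trailing <LEAVE ME> intact, which is what the edit to the question asks for.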

How to make regular expression correctly?

I need to get the data between the third and the fourth occurrence of "*". This is what I do:
with t as (select 'T*76031*12558*test*received percents' as txt from dual)
select regexp_replace(txt, '.*(.{4})[*][^*].*$', '\1')
from t
I get "test", which is right, but how do I get any number of characters, not just 4?

This should work given the example you have used:
REGEXP_REPLACE( txt, '(^.*\*.*\*.*\*)([[:alnum:]]*)(\*.*$)', '\2')
So the SELECT would be:
WITH t
AS (SELECT 'T*76031*12558*test*received percents' AS txt FROM DUAL)
SELECT REGEXP_REPLACE( txt, '(^.*\*.*\*.*\*)([[:alnum:]]*)(\*.*$)', '\2')
FROM t;
The regex looks for:
Group 1:
The start of the string, then any number of characters up to a '*', any further characters up to another '*', and any further characters up to the third '*'.
Group 2:
Any alphanumeric characters.
Group 3:
A '*' followed by any other characters up to the end of the string.
It then replaces all of the above with whatever was found in Group 2.
Hope this helps.
EDIT:
Following on from a great answer from another thread by Rob van Wijk here:
Extracting substring from given string
WITH t
AS (SELECT 'T*76031*12558*test*received percents' AS txt FROM DUAL)
SELECT REGEXP_SUBSTR(txt, '[^*]+', 1, 4)
FROM t;
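Taking the fourth occurrence of a run of non-* characters can be mimicked in Python with re.findall. This is a sketch with Python's re module, not Oracle, and the second input string is a made-up example to show one caveat: runs are counted, not positions, so an empty field shifts the result.

```python
import re

txt = "T*76031*12558*test*received percents"

# Every maximal run of characters that are not '*'.
fields = re.findall(r"[^*]+", txt)
fourth = fields[3]  # the fourth occurrence (index 3)

# Caveat: an empty field between two stars produces no run at all,
# so the fourth run is no longer the text after the third '*'.
sparse_fields = re.findall(r"[^*]+", "T**12558*test*x")

print(fourth)
print(sparse_fields)
```

If empty fields can occur in the data, a pattern anchored on the delimiters is the safer choice.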

How about the following?
^([^*]*[*]){3}([^*]*)
The first part matches three runs of non-* characters, each followed by a *, and the second group captures everything up to the next * or the end of the line.
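A minimal Python sketch of this pattern, assuming Python's re module rather than Oracle and a hypothetical helper named fourth_field, shows that the second capture group holds the wanted text:

```python
import re

# Three "field plus star" repetitions, then capture the next field.
PATTERN = re.compile(r"^([^*]*[*]){3}([^*]*)")

def fourth_field(txt):
    """Return the text between the third and fourth '*', or None if
    the string contains fewer than three '*' characters."""
    match = PATTERN.match(txt)
    return match.group(2) if match else None

print(fourth_field("T*76031*12558*test*received percents"))
print(fourth_field("T**12558*test*x"))
print(fourth_field("a*b"))
```

Because [^*]* may match zero characters, the pattern counts delimiters and still works when one of the earlier fields is empty.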

You are assuming that the last * in your text is also the fourth. If that assumption holds, then this:
\b\w*\b(?=\*[^*]*$)
will get you what you want. Note that it only matches the run of word characters immediately before the last *, which is test in this case.

Note: 10g REGEXP_SUBSTR doesn't support returning subexpressions, see comments below.
If you are really only selecting a part of the string I recommend using REGEXP_SUBSTR instead. I don't know if it's more efficient, but it will better document your intent:
SQL> select regexp_substr('T*76031*12558*test*received percents',
'^([^*]*[*]){3}([^*]*)', 1, 1, '', 2) from dual;
REGEXP_SUBST
------------
test
Above I have used the regexp provided by Pieter-Bas.
See also http://www.regular-expressions.info/oracle.html
