psql uppercase backreferenced string in regexp_replace

I have a string that was previously processed with initcap(), and I want to uppercase part of it.
To be specific, I want to uppercase basic Roman numerals that might occur in it.
To be even more specific, I'd like to replace
Jana Iii Sobieskiego
to
Jana III Sobieskiego
I suppose I could use some kind of upper/substring subquery combination to achieve this, but I am trying to make it work in a single regexp_replace, like this:
SELECT
ulica
--, regexp_matches(ulica , '((^|\s)([XxIiVv]+)(\s|$))', 'g')
, regexp_replace(ulica, '((^|\s)([XxIiVv]+)(\s|$))', '\2'||upper('q\3q')||'\4' , 'g')
FROM (
SELECT unnest(ARRAY['Jana Iii Sobieskiego', 'Xx Lecia', 'Xxx Lecia Panowania Zygmunta Iii Wazy'])::text AS ulica
) AS src
What happens is that upper works on the 'static' part of the replacement string (q...q), but not on the backreference.
I get
Jana QIiiQ Sobieskiego
Does anyone have an idea how to do this?
PostgreSQL 9.1

SHORT ANSWER
Unfortunately, what you have tried is not possible with regexp_replace.
LONG ANSWER
INTRO
This line
regexp_replace(ulica, '((^|\s)([XxIiVv]+)(\s|$))', '\2'||upper('q\3q')||'\4' , 'g')
is equivalent to
regexp_replace(ulica, '((^|\s)([XxIiVv]+)(\s|$))', '\2Q\3Q\4' , 'g')
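As you can see, upper() is applied to the literal string 'q\3q' before regexp_replace ever runs, so only the static characters are uppercased. regexp_replace itself won't uppercase any backreferences.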
WORKAROUND
You can create your own function that takes an ulica as a parameter and returns ulica with basic Roman numerals uppercased.
Step 1
In the first step, this function would mark the part of ulica to be uppercased (I chose $$ as the marker, but you can use any) like this:
regexp_replace(ulica, '((^|\s)([XxIiVv]+)(\s|$))', '\2$$\3$$\4' , 'g')
Step 2
In a second step, go through the resulting string and uppercase each char located between two markers.
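The two steps can be tried outside the database first. Below is a minimal Python sketch of the same idea, not PostgreSQL code: the function name uppercase_roman and the use of Python's re module are assumptions made for illustration, while the pattern and the $$ marker are taken from the answer above.

```python
import re

# Same pattern as in the regexp_replace call; group 3 is the Roman numeral.
PATTERN = re.compile(r'((^|\s)([XxIiVv]+)(\s|$))')
MARKER = '$$'  # any marker that cannot occur in the data would do


def uppercase_roman(ulica):
    """Hypothetical helper: mark candidates, then uppercase the marked parts."""
    # Step 1: wrap each numeral in markers, keeping the surrounding whitespace.
    marked = PATTERN.sub(r'\2' + MARKER + r'\3' + MARKER + r'\4', ulica)
    # Step 2: after splitting on the marker, the odd-numbered pieces are
    # exactly the ones that sat between two markers.
    pieces = marked.split(MARKER)
    return ''.join(p.upper() if i % 2 else p for i, p in enumerate(pieces))


for name in ['Jana Iii Sobieskiego', 'Xx Lecia',
             'Xxx Lecia Panowania Zygmunta Iii Wazy']:
    print(uppercase_roman(name))
```

The marker must be a string that never appears in the real street names, otherwise step 2 would uppercase the wrong pieces.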

Related

Oracle regex and replace

I have a varchar field in the database that contains text. I need to replace every occurrence of any string of 2 letters + 8 digits with a link, so that VA12345678 becomes /cs/page.asp?id=VA12345678
I have a regex that replaces the string but how can I replace it with a string where part of it is the string itself?
SELECT REGEXP_REPLACE ('test PI20099742', '[A-Z]{2}[0-9]{8}$', 'link to replace with')
FROM dual;
I can have more than one of these strings in one varchar field and ideally I would like to have them replaced in one statement instead of a loop.

As mathguy had said, you can use backreferences for your use case. Try a query like this one.
SELECT REGEXP_REPLACE ('test PI20099742', '([A-Z]{2}[0-9]{8})', '/cs/page.asp?id=\1')
FROM DUAL;

For such cases, you may want to keep the "text to add" somewhere at the top of the query, so that if you ever need to change it, you don't have to hunt for it.
You can do that with a with clause, as shown below. I also put some input data for testing in the with clause, but you should remove that and reference your actual table in your query.
I used the [:alpha:] character class, to match all letters - upper or lower case, accented or not, etc. [A-Z] will work until it doesn't.
with
text_to_add (link) as (
select '/cs/page.asp?id=' from dual
)
, sample_strings (str) as (
select 'test VA12398403 and PI83048203 to PT3904' from dual
)
select regexp_replace(str, '([[:alpha:]]{2}\d{8})', link || '\1')
as str_with_links
from sample_strings cross join text_to_add
;
STR_WITH_LINKS
------------------------------------------------------------------------
test /cs/page.asp?id=VA12398403 and /cs/page.asp?id=PI83048203 to PT3904

Oracle 11g - REGEXP_REPLACE - Subexpressions/different matches

SQLFiddle: http://sqlfiddle.com/#!4/db1bd/49/0
I'm working on a query that returns an object's DN:(cn=name,ou=folder,dc=hostname,dc=com)
My goal is to return this information in a "prettier" output akin to AD:(name\folder\hostname.com)
I've accomplished this in a clunky way:
REGEXP_REPLACE(REGEXP_REPLACE(TEST, '.*CN=(.+?),DC=.*', '\1', 1, 1, 'i'), ',OU=', '\', 1, 0, 'i') -- grab everything between CN= and DC=, replace with \'s --
|| '\' ||
REGEXP_REPLACE(SUBSTR(TEST, REGEXP_INSTR(TEST, ',DC=', 1, 1, 0, 'i')+4),',DC=','.', 1, 0, 'i') -- grab everything after DC=, replace with .'s --
While that works I'm not thrilled with how overly complicated it is (and that it involves having to stitch two regex'd strings together).
I started clean and realized I was doing too much to get what I wanted and my starting point is now here:
REGEXP_REPLACE(test, '(,?(cn=|ou=)(.+?),)', '\3\')
I think I have a good understanding of how this one works but if I add an additional (...) it breaks what I already have working and returns the entire string. I've read that Oracle's regex engine is not as advanced as some others, but I'm struggling to grasp the order of how things are evaluated.
Example Input (can have multiple OUs/DCs):
cn=name,ou=subgroup,ou=group,dc=accounts,dc=hostname,dc=com
cn=name,ou=group,dc=hostname,dc=com
Expected Output
name\subgroup\group\accounts.hostname.com
name\group\hostname.com
The data coming in is dynamic and never a set number of OUs or DCs.

You may use
SELECT REPLACE(
REGEXP_REPLACE(
test,
'(^|,)(cn|ou)=([^,]*)(,dc=)?',
'\3\\'),
',dc=',
'.')
FROM regexTest
See the SQLFiddle.
The first (^|,)(cn|ou)=([^,]*)(,dc=)? regex matches , or start of string, then cn or ou, then =, then captures into Group 3 zero or more chars other than a comma, and then matches an optional ,dc= substring (thus, removing the first instance of ,dc=). The replacement is Group 3 contents and a backslash.
So, the second operation is easy, just replace all ,dc= with ., you do not even need a regex for this.
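To see how the two passes interact, here is a small Python sketch of the same logic. It is only an illustration under assumptions: dn_to_path is a made-up name, and Python's re module stands in for Oracle's regex engine; the pattern and the sample inputs come from the question and answer above.

```python
import re


def dn_to_path(dn):
    """Hypothetical helper mirroring REPLACE(REGEXP_REPLACE(...)) above."""
    # Pass 1: each cn=/ou= component becomes "<value>\". The optional
    # (,dc=)? swallows the first ",dc=" so no separator is left behind.
    step1 = re.sub(r'(^|,)(cn|ou)=([^,]*)(,dc=)?', r'\3\\', dn)
    # Pass 2: the remaining ",dc=" separators become dots; no regex needed.
    return step1.replace(',dc=', '.')


print(dn_to_path('cn=name,ou=subgroup,ou=group,dc=accounts,dc=hostname,dc=com'))
print(dn_to_path('cn=name,ou=group,dc=hostname,dc=com'))
```

Like the SQL it mirrors, the sketch is case-sensitive, so it expects lowercase cn=, ou= and dc= prefixes.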

Maybe something like this:
SELECT nvl(regexp_replace(
regexp_replace(
nullif(
regexp_replace(test, '^cn=(.+?),DC=(.+?)$', '\1 \2',1,1,'i')
, test
) , ' |,(CN|OU)=', '\\', 1, 0,'i'
), ',DC=', '.', 1, 0,'i'
),test) result
FROM regexTest
This query does not change the input if there is no DC=.

Replace pair of % in oracle

I have these texts in an Oracle table (as 2 records):
"Sample text with replace parameter %1%"
"You reached 90% of your limit"
I need to replace %1% with specific text from an input parameter in an Oracle function. In fact, I can have more than one replace parameter. I also have a record with "Replace this %12% with real value".
This is the functionality I have programmed:
IF poc > 0 THEN
FOR i in 1 .. poc LOOP
p := get_param(mString => mbody);
mbody := replace(mbody,
'%' || p || '%', parameters(to_number(p, '99')));
END LOOP;
END IF;
But in this case I have a problem with text number 2. This code tries to replace "90%" as well, and then I get this error:
ORA-06502: PL/SQL: numeric or value error: NULL index table key value
Is it possible to avoid trying to replace "90%"? Many thanks for any advice.
Best regards
PS: Oracle version: 10g (OCI Version: 10.2)

Regular expressions can work here. Try the following and build them into your script.
SELECT REGEXP_REPLACE( 'Sample text with replace parameter %1%',
'\%[0-9]+\%',
'db_size' )
FROM DUAL
and
SELECT REGEXP_REPLACE( 'Sample text with replace parameter 1%',
'\%[0-9]+\%',
'db_size' )
FROM DUAL
The pattern is pretty simple: look for a '%' followed by 1 or more digits followed by a '%'. The second query has no opening '%', so nothing is replaced.
The only issue here will be if you have more than one replacement to make in each string and each replacement is different. In that case you will need to loop over the string, replacing the next parameter each time. To do this, add the position and occurrence parameters to REGEXP_REPLACE after the replacement string (position must be 1 or greater), e.g.
REGEXP_REPLACE( 'Sample text with replace parameter %88888888888%','\%[0-9]+\%','db_size',1,1 )
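The matching rule itself (a '%', digits, then a closing '%') can be checked quickly outside Oracle. The following Python sketch is an illustration only: fill_placeholders and the params dictionary are hypothetical stand-ins for the PL/SQL loop and the parameters collection, and a callback replaces all placeholders in one pass instead of looping.

```python
import re


def fill_placeholders(text, params):
    """Hypothetical helper: replace every %N% with params[N] in one pass."""
    def lookup(match):
        key = int(match.group(1))
        # Leave placeholders without a matching parameter untouched.
        return params.get(key, match.group(0))
    return re.sub(r'%([0-9]+)%', lookup, text)


params = {1: 'db_size', 12: 'real value'}  # assumed sample parameters
print(fill_placeholders('Sample text with replace parameter %1%', params))
print(fill_placeholders('You reached 90% of your limit', params))
```

Because a lone "90%" has no opening '%', the second string comes back unchanged and no lookup is attempted for it.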

You are getting the error at parameters(to_number(p, '99')). Can you please check the value of p?
Also, if p = 90 then REPLACE will not try to replace "90%". It will replace "%90%". How can you be sure that it's trying to replace "90%"?

Extract numbers from a field in PostgreSQL

I have a table with a column po_number of type varchar in Postgres 8.4. It stores alphanumeric values with some special characters. I want to ignore the characters [/alpha/?/$/encoding/.] and check whether the column contains a number or not. If it is a number, it needs to be cast to a number; otherwise null should be passed, as my output field po_number_new is a number field.
Below is the example:
SQL Fiddle.
I tried this statement:
select
(case when regexp_replace(po_number,'[^\w],.-+\?/','') then po_number::numeric
else null
end) as po_number_new from test
But I got an error for explicit cast:

Simply:
SELECT NULLIF(regexp_replace(po_number, '\D','','g'), '')::numeric AS result
FROM tbl;
\D being the class shorthand for "not a digit".
And you need the 4th parameter 'g' (for "globally") to replace all occurrences.
Details in the manual.
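For readers who want to check the strip-then-NULLIF logic on a few sample values before running it against the table, here is a rough Python equivalent. It is a sketch under assumptions: po_number_to_int is a made-up name, the sample values are invented, and int stands in for the numeric cast since only digits remain after the replacement.

```python
import re


def po_number_to_int(po_number):
    """Hypothetical helper mirroring NULLIF(regexp_replace(...), '')::numeric."""
    if po_number is None:
        return None  # NULL in, NULL out
    # Remove everything that is not a digit, like '\D' with the 'g' flag.
    digits = re.sub(r'[^0-9]', '', po_number)
    # An empty result plays the role of NULLIF(..., ''): nothing to cast.
    return int(digits) if digits else None


for value in ['PO-123/45', 'abc?$', '', None]:
    print(repr(value), '->', po_number_to_int(value))
```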
For a known, limited set of characters to replace, plain string manipulation functions like replace() or translate() are substantially cheaper. Regular expressions are just more versatile, and we want to eliminate everything but digits in this case. Related:
Regex remove all occurrences of multiple characters in a string
PostgreSQL SELECT only alpha characters on a row
Is there a regexp_replace equivalent for postgresql 7.4?
But why Postgres 8.4? Consider upgrading to a modern version.
Consider pitfalls for outdated versions:
Order varchar string as numeric
WARNING: nonstandard use of escape in a string literal

I think you want something like this:
select (case when regexp_replace(po_number, '[^\w],.-+\?/', '') ~ '^[0-9]+$'
then regexp_replace(po_number, '[^\w],.-+\?/', '')::numeric
end) as po_number_new
from test;
That is, you need to do the conversion on the string after replacement.
Note: This assumes that the "number" is just a string of digits.

The logic I would use to determine if the po_number field contains numeric digits is that its length should decrease when attempting to remove numeric digits.
If so, then all non numeric digits ([^\d]) should be removed from the po_number column. Otherwise, NULL should be returned.
select case when char_length(regexp_replace(po_number, '\d', '', 'g')) < char_length(po_number)
then regexp_replace(po_number, '[^0-9]', '', 'g')
else null
end as po_number_new
from test

If you want to extract floating numbers try to use this:
SELECT NULLIF(regexp_replace(po_number, '[^\.\d]','','g'), '')::numeric AS result FROM tbl;
It's the same as Erwin Brandstetter's answer but with a different expression:
[^...] - match any character except a list of excluded characters; put the excluded characters in place of ...
\. - point character (also you can change it to , char)
\d - digit character

Since version 12 (that's 2 years + 4 months ago at the time of writing, but after the last edit that I can see on the accepted answer), you could use a generated column to do this quite easily on a one-time basis rather than having to calculate it each time you wish to SELECT a new po_number.
Furthermore, you can use the TRANSLATE function to extract your digits, which is less expensive than the REGEXP_REPLACE solution proposed by @ErwinBrandstetter!
I would do this as follows (all of the code below is available on the fiddle here):
CREATE TABLE s
(
num TEXT,
new_num INTEGER GENERATED ALWAYS AS
(NULLIF(TRANSLATE(num, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ. ', ''), '')::INTEGER) STORED
);
You can add to the 'ABCDEFG... string in the TRANSLATE function as appropriate - I have decimal point (.) and a space ( ) at the end - you may wish to have more characters there depending on your input!
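The effect of the character list is easy to experiment with before committing to a table definition. This Python sketch imitates the TRANSLATE expression with str.translate; the helper name to_int_or_none and the sample inputs are assumptions for illustration, and the removal list is the one from the CREATE TABLE above.

```python
# Characters to delete, copied from the TRANSLATE call above.
STRIP = str.maketrans('', '', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ. ')


def to_int_or_none(num):
    """Hypothetical helper mirroring NULLIF(TRANSLATE(num, ..., ''), '')::INTEGER."""
    if num is None:
        return None
    cleaned = num.translate(STRIP)
    # An empty string after stripping corresponds to NULLIF returning NULL.
    return int(cleaned) if cleaned else None


for value in ['2', 'AB12. ', '', ' ', None]:
    print(repr(value), '->', to_int_or_none(value))
```

Any character missing from the list (a lowercase letter, for instance) survives the stripping and makes the integer conversion fail, which is why the list has to cover the real input.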
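And checking (table t, whose definition is not shown here, is the counterpart whose generated column uses the REGEXP_REPLACE expression):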
INSERT INTO s VALUES ('2'), (''), (NULL), (' ');
INSERT INTO t VALUES ('2'), (''), (NULL), (' ');
SELECT * FROM s;
SELECT * FROM t;
Result (same for both):
num new_num
2 2
NULL
NULL
NULL
So, I wanted to check how efficient my solution was, so I ran the following test inserting 10,000 records into both tables s and t as follows (from here):
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
INSERT INTO t
with symbols(characters) as
(
VALUES ('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
)
select string_agg(substr(characters, (random() * length(characters) + 1) :: INTEGER, 1), '')
from symbols
join generate_series(1,10) as word(chr_idx) on 1 = 1 -- word length
join generate_series(1,10000) as words(idx) on 1 = 1 -- # of words
group by idx;
The differences weren't that huge but the regex solution was consistently slower by about 25% - even changing the order of the tables undergoing the INSERTs.
However, where the TRANSLATE solution really shines is when doing a "raw" SELECT as follows:
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT
NULLIF(TRANSLATE(num, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ. ', ''), '')::INTEGER
FROM s;
and the same for the REGEXP_REPLACE solution.
The differences were very marked, the TRANSLATE taking approx. 25% of the time of the other function. Finally, in the interests of fairness, I also did this for both tables:
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT
num, new_num
FROM t;
Both extremely quick and identical!

Need to parse the string separated by colon and eliminate part of the string if found the match in plsql

I know this topic was discussed multiple times, I looked at multiple posts and answers, but could not find exactly what I need to do.
I am trying to search a string that holds multiple varchar2 values separated by ':', and if a match with another string is found, delete the part of the string that matched and update the table with the rest of the string.
I wrote the code using a combination of str and instr functions, but I am looking for a more elegant solution using regexp or collections.
For example, the input string looks like this: (abc:defg:klmnp). I need to find a piece of the string, for example defg (it could be at any position), and remove it, so that the result looks like this: (abc:klmnp).
EDIT - copied from comment:
The input string is (abc:defg:klmn:defgb). Let's say I am looking for defg; only defg has to be removed, not defgb. As I mentioned before, next time around I might be looking for the value in position 1, or in the last position. So the part of the string to be removed might not always be wrapped in ':' on both sides; depending on where it is in the string, the colon may be on the right, on the left, or on both sides.

You can do this with a combination of LIKE, REPLACE and TRIM functions.
select trim(':' from
replace(':'||your_column||':',':'||search_string||':',':')
) from table_name
where ':'||your_column||':' like '%:'||search_string||':%';
The idea is:
Surround the column with colons and use LIKE function to find the match.
And on such matched rows, use REPLACE to replace the search string along with surrounding colons, with a single colon.
And then use TRIM to remove the surrounding colons.
Demo at sqlfiddle
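The surround, replace and trim steps can be sketched in a few lines of Python to check the edge cases (first, middle and last position, and the defgb look-alike). This is only an illustration: remove_item is a made-up name and Python string methods stand in for the Oracle functions.

```python
def remove_item(value, search):
    """Hypothetical helper: drop `search` from a colon-separated list."""
    wrapped = ':' + value + ':'    # surround the list with colons
    needle = ':' + search + ':'    # a whole item always has a colon on each side
    if needle not in wrapped:      # plays the role of the LIKE filter
        return value
    # Replace the item and its colons with a single colon, then trim the ends.
    return wrapped.replace(needle, ':').strip(':')


for sample in ['abc:defg:klmn:defgb', 'defg:klmnop:abc', 'abc:klmnop:defg']:
    print(remove_item(sample, 'defg'))
```

In this sketch two adjacent occurrences share a colon, so a single pass removes only the first of them.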

EDIT (simplified) Perhaps this is what you need:
SELECT REGEXP_REPLACE(REPLACE('abc:defg:klmnop', ':defg:', ':'), '(^defg:|:defg$)', '')
, REGEXP_REPLACE(REPLACE('defg:klmnop:abc', ':defg:', ':'), '(^defg:|:defg$)', '')
, REGEXP_REPLACE(REPLACE('abc:klmnop:defg', ':defg:', ':'), '(^defg:|:defg$)', '')
, REGEXP_REPLACE(REPLACE('abc:klmnop:defgb:defg', ':defg:', ':'), '(^defg:|:defg$)', '')
FROM DUAL
;
which removes defg from start, middle, and end, and ignores defgb, giving:
abc:klmnop
klmnop:abc
abc:klmnop
abc:klmnop:defgb
And to update the table, you could:
UPDATE my_table
SET value = REGEXP_REPLACE(REGEXP_REPLACE(value, ':defg:', ':'), '(^defg:|:defg$)', '')
-- WHERE REGEXP_LIKE(value, '(^|.*:)defg(:.*|$)')
WHERE value LIKE '%defg%'
;
(though that final regex for the where may need to be tweaked to match, hard to test...)
