Postgres substring regex multiple results

I'm trying to find the URLs of image tags in a text field that contains multiple instances.
I'm currently using this code to extract the URL from the text field:
SUBSTRING(text_field FROM 'src="([^"]*).*')
The problem is that it only returns the first instance of an image tag.
Is there a way to return every match from a single query?

Use the function regexp_matches() with the 'g' flag, which returns one row per match. Example:
with my_table(text_field) as (
values ('src="first";src="second"')
)
select match[1] as result
from my_table
cross join lateral regexp_matches(text_field, 'src="([^"]*)', 'g') as match
result
--------
first
second
(2 rows)
Read about POSIX Regular Expressions in the documentation.
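For comparison, here is a minimal Python sketch of the same idea outside the database. It assumes Python's re module as a stand-in for PostgreSQL's regex engine (the two are not identical); re.findall plays the role of regexp_matches() with the 'g' flag, returning every match instead of only the first. The variable names are only illustrative.

```python
import re

# Same sample value as in the query above.
text_field = 'src="first";src="second"'

# With exactly one capture group, re.findall returns a list holding
# that group's text for every non-overlapping match.
urls = re.findall(r'src="([^"]*)', text_field)

for url in urls:
    print(url)
```

This prints first and second on separate lines, mirroring the two rows returned by the query.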

Related

How to use Postgres Regex Replace with a capture group

As the title says, I am trying to reference a capture group in a regex replace in a Postgres query. I have read that regexp_replace does not support using regex capture groups. The regex I am using is
r"(?:[\s\(\)\=\)\,])(username)(?:[\s\(\)\=\)\,])?"gm
The above regex almost does what I need, but I need to find out how to allow a match only if the surrounding groups also match something. A "username" should never be matched if it just happens to be a substring of a word. By ensuring it's surrounded by one of the characters above, I can be much more confident that it's a username.
An example application of the regex would be something like this in postgres (of course I would be doing an update vs a select):
select *, REGEXP_REPLACE(reqcontent,'(?:[\s\(\)\=\)\,])(username)(?:[\s\(\)\=\)\,])?' ,'NEW-VALUE', 'gm') from table where column like '%username%' limit 100;
If any more context would help, please let me know. I have also found similar posts (postgresql regexp_replace: how to replace captured group with evaluated expression (adding an integer value to capture group)), but that one is more about splicing values back in and I don't think it quite answers my question.
More context and example values for the regex to work against. The text below may look familiar: these are JQL filters in Jira. We are looking to update our usernames and all their occurrences in the table that contains the filters. Below are a few examples of filters. We originally were just doing a find and replace, but that doesn't work because some usernames are only two characters long and it was matching on non-usernames (e.g. the username je would be replaced inside the word project, which completely malforms the JQL string, resulting in something like proNEW-VALUEct = blah blah).
type = bug AND status not in (Closed, Executed) AND assignee in (test, username)
assignee=username
assignee = username
Definition of Answered:
A regex that will only match a 'username' if it's surrounded by one of the special characters
A way to regex/replace that username in a postgres query.
Capturing groups are used to keep the important bits of information matched with a regex.
Either put capturing groups around the parts of the string you want to keep in the result and reference them with placeholders in the replacement:
REGEXP_REPLACE(reqcontent,'([\s\(\)\=\)\,])username([\s\(\)\=\)\,])?' ,'\1NEW-VALUE\2', 'gm')
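Or use lookarounds (a lookaround constraint cannot be followed by a quantifier in PostgreSQL, so the optional trailing delimiter is written as an alternation with the end of the string):
REGEXP_REPLACE(reqcontent,'(?<=[\s\(\)\=\)\,])(username)(?=[\s\(\)\=\)\,]|$)' ,'NEW-VALUE', 'gm')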
Or, in this case, use word boundaries so that username is replaced only when it appears as a whole word (\y matches only at the beginning or end of a word in PostgreSQL):
REGEXP_REPLACE(reqcontent,'\yusername\y' ,'NEW-VALUE', 'g')
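The word-boundary approach can be tried out quickly in Python before running an update. This is only a sketch under assumptions: Python's \b is used where PostgreSQL uses \y, rename_user is a hypothetical helper name, and the sample filters are adapted from the question with je as the short username.

```python
import re

def rename_user(jql: str, old: str, new: str) -> str:
    """Replace `old` only where it stands as a whole word (hypothetical helper)."""
    # \b is Python's word-boundary assertion; PostgreSQL spells it \y.
    pattern = r"\b" + re.escape(old) + r"\b"
    # A function replacement keeps backslashes in `new` from being interpreted.
    return re.sub(pattern, lambda m: new, jql)

filters = [
    "project = X AND assignee in (test, je)",
    "assignee=je",
    "assignee = je",
]
renamed = [rename_user(f, "je", "NEW-VALUE") for f in filters]
for line in renamed:
    print(line)
```

The je inside project is left alone because there is no word boundary on either side of it. A word boundary also sits next to characters such as - or ., so a value like je-smith would still be touched; the explicit delimiter class from the first variant is stricter in that respect.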

How to combine multiple RegEx commands for Notepad++ using capture groups and alternations?

I am converting exported SQL views as files to a different syntax using a separate specialized conversion tool. This tool can't handle certain commands and formatting so I'm using Notepad++ with RegEx to alter the files ahead of time.
So far I am getting the results that I want, but it takes three separate Find/Replace actions. I'd like to reduce these three RegEx actions down to one if possible.
Find: (.*)(CREATE VIEW.*\nGO)(.*)
Replace: \2
Find: (CREATE VIEW )(.*)(\r\nAS)
Replace: \1"\2"\3
Find: (oldschema1\.|\[oldschema1\]\.|\[|\]|TOP \(100\) PERCENT|oldschema2\.)|(^GO$)|(\A^(.*?))
Replace: (?1)(?2\;)(?3SET SCHEMA schemaname\; \n\n\1)
I'm using Notepad++ 7.7.1 64-bit, Find/Replace with Regular Expression search mode - ". matches newline" check on.
You'll see in my code that I'm already using capture groups with alternation. I thought I could combine the first two RegEx steps as additional capture groups to Step 3 but it doesn't work out, possibly because they are nested.
I tried referencing the nested groups by incrementing the referencing number accordingly, but it doesn't work (blanks out the result).
Here is an example SQL view file. It's not a working view because I added "oldschema2" so the RegEx would have something to find for one of the replacements, but it's representative as an example here.
garbage
text
beforehand
CREATE VIEW [oldschema1].[viewname]
AS
SELECT DISTINCT
TOP (100) PERCENT oldschema1.TABLENAME.FIELD1, oldschema1.TABLENAME.FIELD2
FROM oldschema1.TABLENAME
WHERE (oldschema1.TABLENAME.FIELD3 = N'Z003') AND oldschema2.TABLENAME.FIELD2 = 1
ORDER BY oldschema1.TABLENAME.FIELD1
GO
garbage
text
after
Here are some additional details of what I'm trying to achieve with each pass.
Notepad++ RegEx Step 1 - isolate view block from CREATE VIEW to GO
Find:
(.*)(CREATE VIEW.*\nGO)(.*)
Replace:
\2
Step 2 - put quotes around view name
Find:
(CREATE VIEW )(.*)(\r\nAS)
Replace:
\1"\2"\3
Step 3 - remove/replace various texts and insert a line at the beginning of the file
Find:
(oldschema1\.|\[oldschema1\]\.|\[|\]|TOP \(100\) PERCENT|oldschema2\.)|(^GO$)|(\A^(.*?))
Replace:
(?1)(?2\;)(?3SET SCHEMA schemaname\; \n\n\1)
The expected output from the above example would be:
SET SCHEMA schemaname;
CREATE VIEW "viewname"
AS
SELECT DISTINCT
TABLENAME.FIELD1, TABLENAME.FIELD2
FROM TABLENAME
WHERE (TABLENAME.FIELD3 = N'Z003') AND TABLENAME.FIELD2 = 1
ORDER BY TABLENAME.FIELD1
;
which I achieve with the above three steps, but I'd like to do it in one Find/Replace if possible.
I'm pretty new to RegEx, and StackOverflow for that matter. Your help is greatly appreciated.
Step 1
I'm not sure about it, but we might want an expression similar to:
[\s\S]*?(CREATE VIEW[\s\S]*GO\s*)[\s\S]*
to be replaced with $1, where our desired data is in this capturing group:
(CREATE VIEW[\s\S]*GO\s*)
and we can even remove \s*:
(CREATE VIEW[\s\S]*GO)
and just try:
[\s\S]*?(CREATE VIEW[\s\S]*GO)[\s\S]*
with an m flag.
Step 2
We can likely try:
(CREATE VIEW)(.*)
and replace with:
SET SCHEMA schemaname;\n\n$1 "viewname"
Step 3
This step would probably be done with an expression similar to:
TOP \(100\) PERCENT |oldschema1\.
being replaced with an empty string.
Step 4
Replace \s*GO with \n; (or just ;), which should likely give the desired output, though I'm not sure.
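If running a script is acceptable instead of a single Notepad++ pass, the steps can be chained in one function. The following is a rough Python sketch, not the Notepad++ answer itself: convert_view is a hypothetical name, the patterns are adapted rather than copied from the steps above, and it assumes \n line endings and a GO that sits alone on its line.

```python
import re

SAMPLE = """garbage
text
beforehand
CREATE VIEW [oldschema1].[viewname]
AS
SELECT DISTINCT
TOP (100) PERCENT oldschema1.TABLENAME.FIELD1, oldschema1.TABLENAME.FIELD2
FROM oldschema1.TABLENAME
WHERE (oldschema1.TABLENAME.FIELD3 = N'Z003') AND oldschema2.TABLENAME.FIELD2 = 1
ORDER BY oldschema1.TABLENAME.FIELD1
GO
garbage
text
after
"""

def convert_view(sql: str, schema: str = "schemaname") -> str:
    # Isolate the block from CREATE VIEW through the first line that is just GO.
    match = re.search(r"CREATE VIEW.*?^GO$", sql, flags=re.DOTALL | re.MULTILINE)
    if match is None:
        raise ValueError("no CREATE VIEW ... GO block found")
    block = match.group(0)
    # Drop old schema prefixes, TOP (100) PERCENT and any leftover square brackets.
    block = re.sub(r"\[?oldschema[12]\]?\.|TOP \(100\) PERCENT ?|[\[\]]", "", block)
    # Put double quotes around the view name on the first line.
    block = re.sub(r"^(CREATE VIEW )(\S+)", r'\1"\2"', block, count=1)
    # Turn the GO line into a semicolon.
    block = re.sub(r"^GO$", ";", block, flags=re.MULTILINE)
    # Prepend the SET SCHEMA line.
    return f"SET SCHEMA {schema};\n\n{block}"

print(convert_view(SAMPLE))
```

Doing the cleanup as separate substitutions inside one function keeps each pattern simple while still giving a single action to run per file.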

Mariadb: Regexp_substr not working with non-matching group regular expression

I am using a query to pull a user ID from a column that contains text. This is for a forum system I am using, and I want to get the user ID portion out of a text field that contains the full message. The query I am using is
SELECT REGEXP_SUBSTR(message, '(?:member: )(\d+)'
) AS user_id
from posts
where message like '%quote%';
Now, ignoring the fact that's ugly SQL and not final, I just need to get to the point where it reads the user ID. The following is an example of the text that you would see in the message column:
[QUOTE="Moony, post: 967760, member: 22665"]I'm underwhelmed...[/QUOTE]
Hopefully we aren’t done yet and this is nothing but a depth signing!
Is there something different about regular expressions when used in MariaDB's REGEXP_SUBSTR? This should be PCRE; it works in regex testers and should read correctly. It should look for the group "member: ", then take the digits after it, giving a single match for each of those posts.
This is an ugly hack/workaround that uses a lookahead for the following "]; however, it will not work if there are multiple quotes in a post:
(?<=member: ).+(?="])
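One likely explanation is that MariaDB's REGEXP_SUBSTR returns the whole matched substring rather than a capture group, so the non-capturing prefix still ends up in the result; in addition, a backslash inside a MariaDB string literal generally has to be doubled (\\d). The Python sketch below only illustrates the first point with Python's re module as a stand-in; it is not MariaDB code, and the variable names are made up for the example.

```python
import re

message = '[QUOTE="Moony, post: 967760, member: 22665"]I\'m underwhelmed...[/QUOTE]'

# The whole match still contains the non-capturing prefix "member: ".
whole = re.search(r"(?:member: )(\d+)", message).group(0)

# A lookbehind keeps the prefix out of the match, leaving only the digits.
only_id = re.search(r"(?<=member: )\d+", message).group(0)

# Matching digits instead of .+ also copes with several quotes in one post.
two_quotes = message + '[QUOTE="Other, post: 1, member: 42"]hi[/QUOTE]'
all_ids = re.findall(r"(?<=member: )\d+", two_quotes)

print(whole, only_id, all_ids)
```

Under that assumption, a pattern such as (?<=member: )\\d+ would be the thing to try in the REGEXP_SUBSTR call.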

how to extract out a string with SYMBOLS after a pattern in a URL string in Google BigQuery

I have two possible forms of a URL string:
http://www.abcexample.com/landpage/?pps=[Y/lyPw==;id_1][Y/lyP2ZZYxi==;id_2];[5403;ord];
http://www.abcexample.com/landpage/?pps=Y/lyPw==;id_1;unknown;ord;
I want to extract the Y/lyPw== in both examples, that is, everything before ;id_1 (inside the brackets in the first form). It will always come after the ?pps= part.
What is the best way to approach this? I want to use the BigQuery query language, as this is where my data sits.
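Here is one way to build a regular expression to do it (the sample URLs below have a ; immediately after ?pps=, which the pattern expects; drop that ; from the pattern if your URLs do not have it):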
SELECT REGEXP_EXTRACT(url, r'\?pps=;[\[]?([^;]*);') FROM
(SELECT "http://www.abcexample.com/landpage/?pps=;[XYZXYZ;id_1][XYZZZZ;id_2];[5403;ord];"
AS url),
(SELECT "http://www.abcexample.com/landpage/?pps=;XYZXYZ;id_1;unknown;ord;"
AS url)
You can use this regex:
pps=\[?([^;]+)
The idea behind this regex is:
pps= -> Look for the pps= pattern
\[? -> might have a [ or not
([^;]+) -> store the content up to the first semicolon
So, for both of your URLs, this regex matches from pps= up to the first semicolon and captures Y/lyPw==.
For BigQuery you have to use
REGEXP_EXTRACT('str', 'reg_exp')
Quoting its documentation:
REGEXP_EXTRACT: Returns the portion of str that matches the capturing group within the regular expression.
You can use code like this:
SELECT
REGEXP_EXTRACT(word,r'pps=\[?([^;]+)') AS fragment
FROM
...
For a working example you can use:
SELECT
REGEXP_EXTRACT(url,r'pps=\[?([^;]+)') AS fragment
FROM
(SELECT "http://www.abcexample.com/landpage/?pps=[XYZXYZ;id_1][XYZZZZ;id_2];[5403;ord];"
AS url),
(SELECT "http://www.abcexample.com/landpage/?pps=XYZXYZ;id_1;unknown;ord;"
AS url)
This regex should work for you
(\w+);id_1
It will extract XYZXYZ from the sample URLs above, using a capture group.
Note that \w does not match / or =, so it will not capture a value such as Y/lyPw==.
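The suggested patterns can be checked against the two URL forms from the question. This is a small Python sketch that uses Python's re module as a stand-in for BigQuery's REGEXP_EXTRACT (an assumption; the two engines differ in general, but these patterns use no engine-specific features).

```python
import re

urls = [
    "http://www.abcexample.com/landpage/?pps=[Y/lyPw==;id_1][Y/lyP2ZZYxi==;id_2];[5403;ord];",
    "http://www.abcexample.com/landpage/?pps=Y/lyPw==;id_1;unknown;ord;",
]

# pps=, an optional [, then capture everything up to the first semicolon.
pattern = re.compile(r"pps=\[?([^;]+)")
fragments = [pattern.search(url).group(1) for url in urls]
print(fragments)

# The (\w+);id_1 pattern finds nothing here, because = directly
# precedes ;id_1 and \w does not match it.
word_only = re.search(r"(\w+);id_1", urls[0])
print(word_only)
```

Both URLs yield Y/lyPw== with the first pattern, while the \w-based pattern returns no match on the bracketed form.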

Notepad++ Regular Expression Condition Replacement

I have a set of SQL scripts in which I want to change the schema.
create table Service.Table1 (col1 varchar(100));
create table Operation.Table2 (col1 varchar(100));
create table Support.Table3 (col1 varchar(100));
However, the schema names are going to change:
Service -> Sev
Operation -> Opn
Support -> Spt
The search regular expression is easy: ([A-Za-z0-9_]+)\.([A-Za-z0-9_]+)
However, how can I do the conditional replacement in Notepad++, or in other tools if they can do it?
Thanks!
If you have a predefined set of the schemas, you may use the conditional replacement in Notepad++ like this:
Find: (?:(?<a>Service)|(?<b>Operation)|(?<c>Support))\.(?<n>[A-Z0-9_]+)
Replace: (?{a}Sev:(?{b}Opn:Spt)).$+{n}
Match case must be unchecked (the table-name group only lists uppercase letters), and Regular expression must be selected.
I would run replace 3 times, once for each schema name:
Find:
create table Service\.
Replace with:
create table Sev.
Find:
create table Support\.
Replace with:
create table Spt.
Find:
create table Operation\.
Replace with:
create table Opn.
Or here is one that uses groups references:
Find:
Service(\.[^\s]+)(.*)
Replace with:
Sev\1\2
Here \1 will hold the dot operator and the table name and \2 holds the rest of the line.
The Notepad++ regex implementation is not really powerful; so, regarding
"other tools if they can?"
Here is a way to do it with Perl:
perl -pi.back -e '%tr=(Service=>"Sev",Operation=>"Opn",Support=>"Spt");s/(?<=create table )(\w+)/$tr{$1}/e;' TheFile
You can add as many Original => 'Modified' pairs as you want to the hash %tr.
TheFile will be backed up to TheFile.back before processing.
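The same lookup-table idea can be sketched in Python. This is an illustration under assumptions, not a drop-in replacement for the one-liner: SCHEMA_MAP and rename_schemas are hypothetical names, and unlike the Perl version it only touches the schema names listed in the table and works on a string rather than editing a file in place.

```python
import re

# Old schema name -> new schema name, taken from the question.
SCHEMA_MAP = {"Service": "Sev", "Operation": "Opn", "Support": "Spt"}

def rename_schemas(sql: str) -> str:
    # Build one alternation from the known names; require a following
    # ".name" so that only schema qualifiers are replaced.
    names = "|".join(re.escape(name) for name in SCHEMA_MAP)
    pattern = re.compile(r"\b(" + names + r")(?=\.\w)")
    # The replacement is looked up per match, which is the "conditional" part.
    return pattern.sub(lambda m: SCHEMA_MAP[m.group(1)], sql)

script = (
    "create table Service.Table1 (col1 varchar(100));\n"
    "create table Operation.Table2 (col1 varchar(100));\n"
    "create table Support.Table3 (col1 varchar(100));\n"
)
print(rename_schemas(script))
```

Adding another schema is then a matter of adding one entry to the dictionary.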