Trimming text from a multi-line entry in postgres with regex - regex

I have a column "verbatim" where each entry contains multiple lines. Here's an example:
Dummy field1:Text
Tell Us More:Text to capture
Dummy field2:Text
I'd like to capture only Text to capture text in the second line Tell Us More: and put that value into the column verbatim_scrubbed. In the example above, Text to capture would be the entry in verbatim_scrubbed.
I'm not that great with postgres and regexp, so I was hoping somebody could help me out here. Was thinking of something similar to the following:
update TABLE
set verbatim_trimmed = array_to_string(regexp_matches(verbatim,'tell us more:(.*)','gi'));
This doesn't work, but I have a feeling something similar may work.

Perhaps there is a direct way to capture the: Text to capture without the cariage return \r and the new line \n charracters (without using the regexp_replace).
Here is what you can do:
select regexp_replace(array_to_string(regexp_matches(verbatim, '^Tell Us More:(.*)$','n'),'',''), E'[\\r\\n]', '' ) from my_table;

Related

How to combine multiple RegEx commands for Notepad++ using capture groups and alternations?

I am converting exported SQL views as files to a different syntax using a separate specialized conversion tool. This tool can't handle certain commands and formatting so I'm using Notepad++ with RegEx to alter the files ahead of time.
So far I am getting the results that I want, but it takes three separate Find/Replace actions. I'd like to reduce these three RegEx actions down to one if possible.
Find: (.*)(CREATE VIEW.*\nGO)(.*)
Replace: \2
Find: (CREATE VIEW )(.*)(\r\nAS)
Replace: \1"\2"\3
Find: (oldschema1\.|\[oldschema1\]\.|\[|\]|TOP \(100\) PERCENT|oldschema2\.)|(^GO$)|(\A^(.*?))
Replace: (?1)(?2\;)(?3SET SCHEMA schemaname\; \n\n\1)```
I'm using Notepad++ 7.7.1 64-bit, Find/Replace with Regular Expression search mode - ". matches newline" check on.
You'll see in my code that I'm already using capture groups with alternation. I thought I could combine the first two RegEx steps as additional capture groups to Step 3 but it doesn't work out, possibly because they are nested.
I tried referencing the nested groups by incrementing the referencing number accordingly, but it doesn't work (blanks out the result).
Here is an example SQL view file. It's not a working view because I added "oldschema2" so the RegEx would have something to find for one of the replacements, but it's representative as an example here.
garbage
text
beforehand
CREATE VIEW [oldschema1].[viewname]
AS
SELECT DISTINCT
TOP (100) PERCENT oldschema1.TABLENAME.FIELD1, oldschema1.TABLENAME.FIELD2
FROM oldschema1.TABLENAME
WHERE (oldschema1.TABLENAME.FIELD3 = N'Z003') AND oldschema2.TABLENAME.FIELD2 = 1
ORDER BY oldschema1.TABLENAME.FIELD1
GO
garbage
text
after
Here is some additional details of what I'm trying to achieve with each pass.
Notepad++ RegEx Step 1 - isolate view block from CREATE VIEW to GO
Find:
(.*)(CREATE VIEW.*\nGO)(.*)
Replace:
\2
Step 2 - put quotes around view name
Find:
(CREATE VIEW )(.*)(\r\nAS)
Replace:
\1"\2"\3
Step 3 - remove/replace various texts and insert a line at the beginning of the file
Find:
(oldschema1\.|\[oldschema1\]\.|\[|\]|TOP \(100\) PERCENT|oldschema2\.)|(^GO$)|(\A^(.*?))
Replace:
(?1)(?2\;)(?3SET SCHEMA schemaname\; \n\n\1)
The expected output from the above example would be:
SET SCHEMA schemaname;
CREATE VIEW "viewname"
AS
SELECT DISTINCT
TABLENAME.FIELD1, TABLENAME.FIELD2
FROM TABLENAME
WHERE (TABLENAME.FIELD3 = N'Z003') AND TABLENAME.FIELD2 = 1
ORDER BY TABLENAME.FIELD1
;
which I achieve with the above three steps, but I'd like to do it in one Find/Replace if possible.
I'm pretty new to RegEx, and StackOverflow for that matter. Your help is greatly appreciated.
Step 1
I'm not so sure about it, but I'm guessing that maybe we would want an expression similar to:
[\s\S]*?(CREATE VIEW[\s\S]*GO\s*)[\s\S]*
to be replaced with $1, where our desired data is in this capturing group:
(CREATE VIEW[\s\S]*GO\s*)
and we can even remove \s*:
(CREATE VIEW[\s\S]*GO)
and just try:
[\s\S]*?(CREATE VIEW[\s\S]*GO)[\s\S]*
with an m flag.
In the right panel of this demo, the expression is further explained, if you might be interested.
Step 2
We can likely try:
(CREATE VIEW)(.*)
and replace with:
SET SCHEMA schemaname;\n\n$1 "viewname"
Demo
Step 3
This step would probably be done with an expression similar to:
TOP \(100\) PERCENT |oldschema1\.
being replaced with an empty string.
Demo
Step 4:
\s*GO being replaced with \n; or just ; and we might likely have the desired output, not sure though.
Demo

How to exclude delimiters inside text qualifiers using Regex?

I am trying to exclude delimiters within text qualifiers. For this, I am trying to use Regex. However, I am new to Regex and am not able to fully accomplish my needs. I would be very greatful if someone can help me out.
In Alteryx, I load a delimited flat text file as 'non-delimited' and say that it does not have text qualifiers. Thus, the input will look something like this:
"aabb"|ccdd|eeff|gghh
"aa|bb"|ccdd|eeff|gghh
"aa|bb"|ccdd|"ee|ff"|gghh
"aa|bb"|"cc|dd"|"ee|ff"|"gg|hh"
"aabb"|"ccdd"|"eeff"|"gghh"
"aabb"|"ccdd"|"eeff"|"gg|hh"
aabb|ccdd|eeff|gghh
"aa|bb"|ccdd|eeff|"gg|hh"
aabb|cc|dd|eeff|gghh
aabb|"cc||dd"|eeff|gghh
aabb|"c|c|dd"|eeff|gghh
"aa||bb"|ccdd|eeff|gghh
"a|a|b|b"|ccdd|eeff|gghh
"aabb"|ccdd|eeff|"g|g|hh"
"aabb"|ccdd|eeff|"gg||hh"
I want to exclude all delimiters that are in between text qualifiers.
I have tried to use Regex to replace the delimiters within text qualifiers with nothing.
So far, I have tried the following Regex code for my target:
(")(.*?[^"])\|+(.*?)(")
And I have used the following for my replace:
$1$2$3$4
However, this will not fix te lines 11, 13, 14 and 15.
I wish to obtain the following results:
"aabb"|ccdd|eeff|gghh
"aabb"|ccdd|eeff|gghh
"aabb"|ccdd|"eeff"|gghh
"aabb"|"ccdd"|"eeff"|"gghh"
"aabb"|"ccdd"|"eeff"|"gghh"
"aabb"|"ccdd"|"eeff"|"gghh"
aabb|ccdd|eeff|gghh
"aabb"|ccdd|eeff|"gghh"
aabb|cc|dd|eeff|gghh
aabb|"ccdd"|eeff|gghh
aabb|"ccdd"|eeff|gghh
"aabb"|ccdd|eeff|gghh
"aabb"|ccdd|eeff|gghh
"aabb"|ccdd|eeff|"gghh"
"aabb"|ccdd|eeff|"gghh"
Thank you in advance for helping me out!
With kind regards,
Robin
I can't think of the correct syntax in REGEX unless you are putting in each pattern that could be found.
However, an easier way (maybe not as performant), would be to use a Text to Columns selecting Ignore delimiters in quotes. If you need it back together in one cell afterwards, you can transpose, then remove delimiters followed by a Summarize to concatenate each RecordID Group.

Sublime Regex Search and Replace

Trying to use Sublime to update the urls of only some lines in a sql table dump.
in this case the line that I need to single out has the string 'themo_showcase_\d_image' which is easy to match. In the same string what I actually need to replace is the url column so that it reads 'https://www.example.com/' to 'http://www.example.com'
Anyone able to help shed some light on this? I've got thousands of these insert records that I need to modify.
ex:
original string:
('8630', '1328', 'themo_showcase_1_image', 'https://www.example.com/'),
to:
('8630', '1328', 'themo_showcase_1_image', 'http://www.example.com/'),
Find: 'themo_showcase_\d_image', 'http\Ks you could use \d+ if there are more than 1 digit
Replace: LEAVE EMPTY

How do I use regex to return text following specific prefixes?

I'm using an application called Firemon which uses regex to pull text out of various fields. I'm unsure what specific version of regex it uses, I can't find a reference to this in the documentation.
My raw text will always be in the following format:
CM: 12345
APP: App Name
BZU: Dept Name
REQ: First Last
JST: Text text text text.
CM will always be an integer, JST will be sentence that may span multiple lines, and the other fields will be strings that consist of 1-2 words - and there's always a return after each section.
The application, Firemon, has me create a regex entry for each field. Something simple that looks for each prefix and then a return should work, because I return after each value. I've tried several variations, such as "BZU:\s*(.*)", but can't seem to find something that works.
EDIT: To be clear I'm trying to get the value after each prefix. Firemon has a section for each field. "APP" for example is a field. I need a regex example to find "APP:" and return the text after it. So something as simple as regex that identifies "APP:", and grabs everything after the : and before the return would probably work.
You can use (?=\w+ )(.*)
Positive lookahead will remove prefix and space character from match groups and you will in each match get text after space.
I am a little late to the game, but maybe this is still an issue.
In the more recent versions of FireMon, sample regexes are provided. For instance:
jst:\s*([^;]?)\s;
will match on:
jst:anything in here;
and result in
anything in here

Postgres: remove second occurrence of a string

I tried to fix bad data in postgres DB where photo tags are appended twice.
The trip is wonderful.<photo=2-1-1601981-7-1.jpg><photo=2-1-1601981-5-2.jpg>We enjoyed it very much.<photo=2-1-1601981-5-2.jpg><photo=2-1-1601981-7-1.jpg>
As you can see in the string, photo tags were added already, but they were appended to the text again. I want to remove the second occurrence: . The first occurrence has certain order and I want to keep them.
I wrote a function that could construct a regex pattern:
CREATE OR REPLACE FUNCTION dd_trip_photo_tags(tagId int) RETURNS text
LANGUAGE sql IMMUTABLE
AS $$
SELECT string_agg(concat('<photo=',media_name,'>.*?(<photo=',media_name,'>)'),'|') FROM t_ddtrip_media WHERE tag_id=tagId $$;
This captures the second occurrence of a certain photo tag.
Then, I use regex_replace to replace the second occurrence:
update t_ddtrip_content set content = regexp_replace(content,dd_trip_photo_tags(332761),'') from t_ddtrip_content where tag_id=332761;
Yet, it would remove all matched tags. I looked up online for days but still couldn't figure out a way to fix this. Appreciate any help.
This Should Work.
Regex 1:
<photo=.+?>
See: https://regex101.com/r/thHmlq/1
Regex 2:
<.+?>
See: https://regex101.com/r/thHmlq/2
Input:
The trip is wonderful.<photo=2-1-1601981-7-1.jpg><photo=2-1-1601981-5-2.jpg>We enjoyed it very much.<photo=2-1-1601981-5-2.jpg><photo=2-1-1601981-7-1.jpg>
Output:
<photo=2-1-1601981-7-1.jpg>
<photo=2-1-1601981-5-2.jpg>
<photo=2-1-1601981-5-2.jpg>
<photo=2-1-1601981-7-1.jpg>