I tried to fix bad data in postgres DB where photo tags are appended twice.
The trip is wonderful.<photo=2-1-1601981-7-1.jpg><photo=2-1-1601981-5-2.jpg>We enjoyed it very much.<photo=2-1-1601981-5-2.jpg><photo=2-1-1601981-7-1.jpg>
As you can see in the string, photo tags were added already, but they were appended to the text again. I want to remove the second occurrence: . The first occurrence has certain order and I want to keep them.
I wrote a function that could construct a regex pattern:
CREATE OR REPLACE FUNCTION dd_trip_photo_tags(tagId int) RETURNS text
LANGUAGE sql IMMUTABLE
AS $$
SELECT string_agg(concat('<photo=',media_name,'>.*?(<photo=',media_name,'>)'),'|') FROM t_ddtrip_media WHERE tag_id=tagId $$;
This captures the second occurrence of a certain photo tag.
Then, I use regex_replace to replace the second occurrence:
update t_ddtrip_content set content = regexp_replace(content,dd_trip_photo_tags(332761),'') from t_ddtrip_content where tag_id=332761;
Yet, it would remove all matched tags. I looked up online for days but still couldn't figure out a way to fix this. Appreciate any help.
This Should Work.
Regex 1:
<photo=.+?>
See: https://regex101.com/r/thHmlq/1
Regex 2:
<.+?>
See: https://regex101.com/r/thHmlq/2
Input:
The trip is wonderful.<photo=2-1-1601981-7-1.jpg><photo=2-1-1601981-5-2.jpg>We enjoyed it very much.<photo=2-1-1601981-5-2.jpg><photo=2-1-1601981-7-1.jpg>
Output:
<photo=2-1-1601981-7-1.jpg>
<photo=2-1-1601981-5-2.jpg>
<photo=2-1-1601981-5-2.jpg>
<photo=2-1-1601981-7-1.jpg>
Related
I am converting exported SQL views as files to a different syntax using a separate specialized conversion tool. This tool can't handle certain commands and formatting so I'm using Notepad++ with RegEx to alter the files ahead of time.
So far I am getting the results that I want, but it takes three separate Find/Replace actions. I'd like to reduce these three RegEx actions down to one if possible.
Find: (.*)(CREATE VIEW.*\nGO)(.*)
Replace: \2
Find: (CREATE VIEW )(.*)(\r\nAS)
Replace: \1"\2"\3
Find: (oldschema1\.|\[oldschema1\]\.|\[|\]|TOP \(100\) PERCENT|oldschema2\.)|(^GO$)|(\A^(.*?))
Replace: (?1)(?2\;)(?3SET SCHEMA schemaname\; \n\n\1)```
I'm using Notepad++ 7.7.1 64-bit, Find/Replace with Regular Expression search mode - ". matches newline" check on.
You'll see in my code that I'm already using capture groups with alternation. I thought I could combine the first two RegEx steps as additional capture groups to Step 3 but it doesn't work out, possibly because they are nested.
I tried referencing the nested groups by incrementing the referencing number accordingly, but it doesn't work (blanks out the result).
Here is an example SQL view file. It's not a working view because I added "oldschema2" so the RegEx would have something to find for one of the replacements, but it's representative as an example here.
garbage
text
beforehand
CREATE VIEW [oldschema1].[viewname]
AS
SELECT DISTINCT
TOP (100) PERCENT oldschema1.TABLENAME.FIELD1, oldschema1.TABLENAME.FIELD2
FROM oldschema1.TABLENAME
WHERE (oldschema1.TABLENAME.FIELD3 = N'Z003') AND oldschema2.TABLENAME.FIELD2 = 1
ORDER BY oldschema1.TABLENAME.FIELD1
GO
garbage
text
after
Here is some additional details of what I'm trying to achieve with each pass.
Notepad++ RegEx Step 1 - isolate view block from CREATE VIEW to GO
Find:
(.*)(CREATE VIEW.*\nGO)(.*)
Replace:
\2
Step 2 - put quotes around view name
Find:
(CREATE VIEW )(.*)(\r\nAS)
Replace:
\1"\2"\3
Step 3 - remove/replace various texts and insert a line at the beginning of the file
Find:
(oldschema1\.|\[oldschema1\]\.|\[|\]|TOP \(100\) PERCENT|oldschema2\.)|(^GO$)|(\A^(.*?))
Replace:
(?1)(?2\;)(?3SET SCHEMA schemaname\; \n\n\1)
The expected output from the above example would be:
SET SCHEMA schemaname;
CREATE VIEW "viewname"
AS
SELECT DISTINCT
TABLENAME.FIELD1, TABLENAME.FIELD2
FROM TABLENAME
WHERE (TABLENAME.FIELD3 = N'Z003') AND TABLENAME.FIELD2 = 1
ORDER BY TABLENAME.FIELD1
;
which I achieve with the above three steps, but I'd like to do it in one Find/Replace if possible.
I'm pretty new to RegEx, and StackOverflow for that matter. Your help is greatly appreciated.
Step 1
I'm not so sure about it, but I'm guessing that maybe we would want an expression similar to:
[\s\S]*?(CREATE VIEW[\s\S]*GO\s*)[\s\S]*
to be replaced with $1, where our desired data is in this capturing group:
(CREATE VIEW[\s\S]*GO\s*)
and we can even remove \s*:
(CREATE VIEW[\s\S]*GO)
and just try:
[\s\S]*?(CREATE VIEW[\s\S]*GO)[\s\S]*
with an m flag.
In the right panel of this demo, the expression is further explained, if you might be interested.
Step 2
We can likely try:
(CREATE VIEW)(.*)
and replace with:
SET SCHEMA schemaname;\n\n$1 "viewname"
Demo
Step 3
This step would probably be done with an expression similar to:
TOP \(100\) PERCENT |oldschema1\.
being replaced with an empty string.
Demo
Step 4:
\s*GO being replaced with \n; or just ; and we might likely have the desired output, not sure though.
Demo
Trying to use Sublime to update the urls of only some lines in a sql table dump.
in this case the line that I need to single out has the string 'themo_showcase_\d_image' which is easy to match. In the same string what I actually need to replace is the url column so that it reads 'https://www.example.com/' to 'http://www.example.com'
Anyone able to help shed some light on this? I've got thousands of these insert records that I need to modify.
ex:
original string:
('8630', '1328', 'themo_showcase_1_image', 'https://www.example.com/'),
to:
('8630', '1328', 'themo_showcase_1_image', 'http://www.example.com/'),
Find: 'themo_showcase_\d_image', 'http\Ks you could use \d+ if there are more than 1 digit
Replace: LEAVE EMPTY
I'm using an application called Firemon which uses regex to pull text out of various fields. I'm unsure what specific version of regex it uses, I can't find a reference to this in the documentation.
My raw text will always be in the following format:
CM: 12345
APP: App Name
BZU: Dept Name
REQ: First Last
JST: Text text text text.
CM will always be an integer, JST will be sentence that may span multiple lines, and the other fields will be strings that consist of 1-2 words - and there's always a return after each section.
The application, Firemon, has me create a regex entry for each field. Something simple that looks for each prefix and then a return should work, because I return after each value. I've tried several variations, such as "BZU:\s*(.*)", but can't seem to find something that works.
EDIT: To be clear I'm trying to get the value after each prefix. Firemon has a section for each field. "APP" for example is a field. I need a regex example to find "APP:" and return the text after it. So something as simple as regex that identifies "APP:", and grabs everything after the : and before the return would probably work.
You can use (?=\w+ )(.*)
Positive lookahead will remove prefix and space character from match groups and you will in each match get text after space.
I am a little late to the game, but maybe this is still an issue.
In the more recent versions of FireMon, sample regexes are provided. For instance:
jst:\s*([^;]?)\s;
will match on:
jst:anything in here;
and result in
anything in here
I have a column "verbatim" where each entry contains multiple lines. Here's an example:
Dummy field1:Text
Tell Us More:Text to capture
Dummy field2:Text
I'd like to capture only Text to capture text in the second line Tell Us More: and put that value into the column verbatim_scrubbed. In the example above, Text to capture would be the entry in verbatim_scrubbed.
I'm not that great with postgres and regexp, so I was hoping somebody could help me out here. Was thinking of something similar to the following:
update TABLE
set verbatim_trimmed = array_to_string(regexp_matches(verbatim,'tell us more:(.*)','gi'));
This doesn't work, but I have a feeling something similar may work.
Perhaps there is a direct way to capture the: Text to capture without the cariage return \r and the new line \n charracters (without using the regexp_replace).
Here is what you can do:
select regexp_replace(array_to_string(regexp_matches(verbatim, '^Tell Us More:(.*)$','n'),'',''), E'[\\r\\n]', '' ) from my_table;
Can someone assist in creating a Regex for the following situation:
I have about 2000 records for which I need to do a search/repleace where I need to make a replacement for a known item in each record that looks like this:
<li>View Product Information</li>
The FILEPATH and FILE are variable, but the surrounding HTML is always the same. Can someone assist with what kind of Regex I would substitute for the "FILEPATH/FILE" part of the search?
you may match the constant part and use grouping to put it back
(<li>View Product Information</li>)
then you should replace the string with $1your_replacement$2, where $1 is the first matching group and $2 the second (if using python for instance you should call Match.group(1) and Match.group(2))
You would have to escape \ chars if you're using Java instead.