EditPad: Find and Replace with RegEx Backreferences - regex

I'm trying my hand at regex again, in particular using a backreference to the matched text in the replace string in the EditPad text editor.
Subject:
Product1 Desc,12 PIN,GradeA Qty Price
Product2 Desc,28 PIN,GradeA Qty Price
Goal:
Since the text is currently space-separated, I need to replace 12 PIN with 12||PIN, and 28 PIN with 28||PIN.
What I'm trying:
[(0-9)]+[(\s)]PIN seems to be finding what I want just fine.
When I try to replace with backreferences, though, the only one I can get to work is \0.
For example, using \0||PIN as my replace gives me 12 PIN||PIN.
When I try to replace with \1||PIN, however, it gives ||PIN.
What am I missing?

I could have sworn that I saw a previous poster answer this...
Using this as your find string:
([0-9]+)[\s]*PIN
and this as your replace string:
\1||PIN
should do it.
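As for what was missing in the original attempt: inside square brackets, parentheses are literal characters, so [(0-9)] defines a character class rather than a capture group, and \1 has nothing to refer to. The sketch below illustrates this with Python's re module rather than EditPad's own engine (an assumption made for the sake of a runnable example; the variable names are made up for illustration).

```python
import re

subject = (
    "Product1 Desc,12 PIN,GradeA Qty Price\n"
    "Product2 Desc,28 PIN,GradeA Qty Price"
)

# Original attempt: the parentheses sit inside [...] and are therefore
# literal characters, so this pattern defines no capture groups at all.
original = re.compile(r"[(0-9)]+[(\s)]PIN")

# Suggested fix: the parentheses wrap the digit run, creating group 1.
fixed = re.compile(r"([0-9]+)[\s]*PIN")

print(original.groups)  # number of capture groups in the original pattern
print(fixed.groups)     # number of capture groups in the fixed pattern

# \1 inserts the captured digits; the whitespace before PIN is dropped.
result = fixed.sub(r"\1||PIN", subject)
print(result)
```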


How to extract all IMDb ID's from string

I have a block of text that I want to search for IMDb links; where one is found, I want to extract the IMDb ID.
Here is an example string:
http://www.imdb.com/Title/tt2618986
http://www.google.com/tt2618986
https://www.imdb.com/Title/tt2618986
http://www.imdb.com/title/tt1979376/?ref_=nv_sr_1?ref_=nv_sr_1
I want to extract only the numeric ID (e.g. 2618986) from lines 1, 3 and 4.
Here is the regex I am currently using, without any luck:
(?:http|https)://(?:.*\.|.*)imdb.com/(?:t|T)itle(?:\?|/)(..\d+)(.+)?
https://regex101.com/r/ERtoRz/1
If you are interested in extracting only the numeric ID (2618986), none of the comments quite nail it, since they match tt2618986. Building on @The fourth bird's answer, you need to split tt2618986 into two parts, tt and 2618986. So instead of a single ([a-zA-Z0-9]+), use [a-zA-Z]+([0-9]+).
^https?://www\.imdb\.com/[Tt]itle[?/][a-zA-Z]+([0-9]+)
You can then extract the 2618986 part by calling group 1.
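A minimal sketch of reading group 1, assuming Python's re module (the question does not name a language, so that choice and the variable names are assumptions). The MULTILINE flag is added so that ^ anchors at the start of every line of the block.

```python
import re

text = """http://www.imdb.com/Title/tt2618986
http://www.google.com/tt2618986
https://www.imdb.com/Title/tt2618986
http://www.imdb.com/title/tt1979376/?ref_=nv_sr_1?ref_=nv_sr_1"""

# Same expression as above; group 1 captures only the digits after the
# letter prefix ("tt").
IMDB_ID = re.compile(
    r"^https?://www\.imdb\.com/[Tt]itle[?/][a-zA-Z]+([0-9]+)",
    re.MULTILINE,
)

# With exactly one capture group, findall returns the group-1 strings.
ids = IMDB_ID.findall(text)
print(ids)
```

The google.com line is skipped because the host is matched literally.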
This expression might simply extract those desired digits:
^(?:https?://)(?:www\.)?imdb\.com/title/[a-z]+([0-9]+).*$
If you wish to explore, simplify, or modify the expression, it is explained in the top-right panel of regex101.com. If you'd like, you can also see in this link how it would match against some sample inputs.

Postgres: remove second occurrence of a string

I am trying to fix bad data in a Postgres DB where photo tags were appended twice.
The trip is wonderful.<photo=2-1-1601981-7-1.jpg><photo=2-1-1601981-5-2.jpg>We enjoyed it very much.<photo=2-1-1601981-5-2.jpg><photo=2-1-1601981-7-1.jpg>
As you can see in the string, the photo tags were already present, but they were appended to the text again. I want to remove the second occurrence: <photo=2-1-1601981-5-2.jpg><photo=2-1-1601981-7-1.jpg>. The tags in the first occurrence are in a specific order, and I want to keep them.
I wrote a function that could construct a regex pattern:
CREATE OR REPLACE FUNCTION dd_trip_photo_tags(tagId int) RETURNS text
LANGUAGE sql IMMUTABLE
AS $$
  SELECT string_agg(concat('<photo=', media_name, '>.*?(<photo=', media_name, '>)'), '|')
  FROM t_ddtrip_media
  WHERE tag_id = tagId
$$;
This captures the second occurrence of a certain photo tag.
Then I use regexp_replace to replace the second occurrence:
update t_ddtrip_content set content = regexp_replace(content,dd_trip_photo_tags(332761),'') from t_ddtrip_content where tag_id=332761;
Yet it removes all matched tags. I have searched online for days but still can't figure out how to fix this. I'd appreciate any help.
This should work.
Regex 1:
<photo=.+?>
See: https://regex101.com/r/thHmlq/1
Regex 2:
<.+?>
See: https://regex101.com/r/thHmlq/2
Input:
The trip is wonderful.<photo=2-1-1601981-7-1.jpg><photo=2-1-1601981-5-2.jpg>We enjoyed it very much.<photo=2-1-1601981-5-2.jpg><photo=2-1-1601981-7-1.jpg>
Output:
<photo=2-1-1601981-7-1.jpg>
<photo=2-1-1601981-5-2.jpg>
<photo=2-1-1601981-5-2.jpg>
<photo=2-1-1601981-7-1.jpg>
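One possible explanation for the behaviour described in the question is that a regex replace substitutes the whole matched substring, not just the parenthesized part, so everything from the first tag through its repeat disappears. A common way around this is to capture the part you want to keep and put it back with a backreference. The sketch below shows the idea in Python rather than in Postgres; the function name and the hard-coded media names are hypothetical stand-ins for the rows of t_ddtrip_media.

```python
import re

content = (
    "The trip is wonderful."
    "<photo=2-1-1601981-7-1.jpg><photo=2-1-1601981-5-2.jpg>"
    "We enjoyed it very much."
    "<photo=2-1-1601981-5-2.jpg><photo=2-1-1601981-7-1.jpg>"
)


def remove_second_occurrence(text, media_names):
    """Drop the first repeat of each photo tag, keeping the original one."""
    for name in media_names:
        tag = re.escape(f"<photo={name}>")
        # Group 1 holds the first tag plus everything up to its repeat;
        # substituting \1 puts that back and discards only the repeat.
        text = re.sub(f"({tag}.*?){tag}", r"\1", text, count=1, flags=re.DOTALL)
    return text


# Hypothetical media names; in the question they come from t_ddtrip_media.
cleaned = remove_second_occurrence(
    content, ["2-1-1601981-7-1.jpg", "2-1-1601981-5-2.jpg"]
)
print(cleaned)
```

The same idea should carry over to regexp_replace, whose replacement text can refer to a captured group as \1, although that variant is untested here.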

What is regex doing in the background?

I played around with regex today and stumbled on something: I don't really understand why it behaves like this.
This is my working regex (I formatted it for better readability):
(?<name>[a-z\ ]+[a-zA-Z]+|[a-zA-Z]+)\
(?<firstname>[a-z-A-Z\ ]+)\n
(?<title>[a-zA-Z\.\ ]+)\n?
(?<company>[a-zA-Zäöü\.\ ]+)?\n
(?<street>[a-zA-Zäöü]+)\ (?<housenumber>[0-9]+)\n?
(?<postfach>Postfach [0-9]+)?\n
(?<zip>[0-9]+)\ (?<place>[a-zA-Zäöü]+)
And this is the string I want to parse through:
Smith John
Dr.
Foobar AG
Smithstrasse 1
Postfach 1
6500 Bellinzona
With this regex it works perfectly. But previously the \n before group street was optional and the \n before group company was not. The catch is that there is a case where the string has no company in it. The result with the previous version: the whole street except for the last char was in group company, and the last char of the street was in group street (I used regex101 for testing). Although group company is optional, it looks like it was "forced" to be part of the match, which is definitely not what I want.
And that's where my question comes in. How exactly does regex work in the background? I thought regex tries to pick the best solution out of all the possible groupings it can find in the string, but I have no clue why it considers this solution the best one.
Here's a link to regex101 where you can see how it behaved previously: https://regex101.com/r/OmuPBn/1
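A reduced reproduction may help to see what is going on. Backtracking engines (which regex101's PCRE flavour and Python's re both are) do not search for a "best" grouping: each quantifier first takes as much as it can, and the engine gives characters back one at a time only when the rest of the pattern fails, accepting the first complete match it reaches. The sketch below assumes Python's re module and trims the pattern down to the title, company and street parts; the trimmed patterns and variable names are illustrative, not the full expression from the question.

```python
import re

text = "Dr.\nSmithstrasse 1"  # an address with no company line

# Earlier layout: the newline after the optional company group is optional.
earlier = re.compile(
    r"Dr\.\n(?P<company>[a-zA-Z. ]+)?\n?(?P<street>[a-zA-Z]+) (?P<housenumber>[0-9]+)"
)

# Working layout: the newline before the street is mandatory.
working = re.compile(
    r"Dr\.\n?(?P<company>[a-zA-Z. ]+)?\n(?P<street>[a-zA-Z]+) (?P<housenumber>[0-9]+)"
)

# company greedily grabs "Smithstrasse ", then hands back characters until
# street can match at least one letter followed by the space.
earlier_groups = earlier.match(text).groupdict()

# Here company can never be followed by a newline, so the engine backs all
# the way out of the optional group and street gets the full name.
working_groups = working.match(text).groupdict()

print(earlier_groups)
print(working_groups)
```

In the earlier layout nothing forces a line break between company and street, so the first successful split is the one where company keeps everything except the single letter that street needs.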

RegEx backreference followed by a number in Dreamweaver

I want to search for a specific pattern which contains a numeric "1" and replace it with the same string followed by the numeric "2". But if I call $12 then the output is the literal "$12". The regex engine seemingly tries to find the memory slot 12, but I intended to address the memory slot 1, and then write "2".
I tried to create a fiddle, but it doesn't reproduce the error, so apparently it has something to do with my editor. I am using Dreamweaver CS6. If it is not Dreamweaver itself, then maybe it is my Dreamweaver settings.
Also, I just found this question which refers to my exact same problem – but the answer provided there doesn't work for me. $012 just writes "$012". I guess the Dreamweaver RegExp engine is peculiar like that.
Any ideas?
EDIT:
Given the example text …
This is item 1
This is house 3
… and the pattern ((?:item|house) )\d
what I tried | what I'm getting
$12 | $12
$012 | $012
\g{1}2 | \g{1}2
$g{1}2 | $g{1}2
$1&#50; | item&#50; // or "house"
${1}2 | ${1}2
"$1"+2 | "item"+2
The desired result is always:
This is item 2
This is house 2
Because it was asked: yes, I am sure that the RegExp checkbox is activated and yes, I am sure that I'm in the Code view, not the Design view. I always work in Code view.
My Dreamweaver is CS6 Version 12.0 Build 5861.
This is a well-known bug in Dreamweaver. Fortunately, there are workarounds.
For argument's sake, let's say you are looking for letters and want to append a 2.
Method 1
I tested the following in Dreamweaver CS6.
Input: abc
Search in code view: ([a-z]+)
Replace: $1&#50;
Output in code view: abc&#50;
Output in design view: abc2
Note that the output in code view is abc&#50;, but because &#50; encodes 2, on the web page you see abc2.
Method 2: Two-step approach
Same search.
Replace: $1SOMETHINGDISTINCTIVE
Then search for SOMETHINGDISTINCTIVE and replace with 2
Finally
Of course some would argue that the real workaround is to work in Komodo IDE (or whatever editor they fancy), but that is not your question. :)
Let's say the test string, i.e. the string to match or select, is
aabbbccbbbaacc2
Case 1: Using Backreference for Matching or Selecting
Find/Search:
(a+)(b+)(c+)\2\1\3\d
Case 2: Using Backreference for Match or Select & Replace
Say I expect the result to be
aacc9bbb
Find/Search:
(a+)(b+)(c+)\2\1\3\d
Replace With:
\1\039\2
or
\1$039\2
So it's not \3 but \03 or $03 in the Replace With field when the backreference is followed by a numeric character.
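For comparison, the same ambiguity exists in other engines, each with its own escape hatch. The sketch below uses Python's re module, where \g<1> marks the end of the group number explicitly; this is Python's syntax only and is not expected to work in Dreamweaver's Replace field. The variable names are made up for illustration.

```python
import re

text = "This is item 1\nThis is house 3"
pattern = r"((?:item|house) )\d"

# r"\12" would be read as a reference to group 12, so \g<1> is used to
# delimit the group number before the literal 2.
result = re.sub(pattern, r"\g<1>2", text)
print(result)

# The reordering example above, written with explicit group references.
reordered = re.sub(
    r"(a+)(b+)(c+)\2\1\3\d",
    r"\g<1>\g<3>9\g<2>",
    "aabbbccbbbaacc2",
)
print(reordered)
```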

How do I extract a postcode from one column in SSIS using regular expression

I'm trying to use a custom regex clean transformation (information found here) to extract a post code from a mixed address column (Address3) and move it to a new column (Post Code).
Example of incoming data:
Address3: "London W12 9LZ"
Incoming data could be any combination of place names with a post code at the start, middle or end (or not at all).
Desired outcome:
Address3: "London"
Post Code: "W12 9LZ"
Essentially, in plain English, "move (not copy) any post code found from Address3 into Post Code".
My regex skills aren't brilliant but I've managed to get as far as extracting the post code and getting it into its own column using the following regex, matching from Address3 and replacing into Post Code:
Match Expression:
(?<stringOUT>([A-PR-UWYZa-pr-uwyz]([0-9]{1,2}|([A-HK-Ya-hk-y][0-9]|[A-HK-Ya-hk-y][0-9] ([0-9]|[ABEHMNPRV-Yabehmnprv-y]))|[0-9][A-HJKS-UWa-hjks-uw])\ {0,1}[0-9][ABD-HJLNP-UW-Zabd-hjlnp-uw-z]{2}|([Gg][Ii][Rr]\ 0[Aa][Aa])|([Ss][Aa][Nn]\ {0,1}[Tt][Aa]1)|([Bb][Ff][Pp][Oo]\ {0,1}([Cc]\/[Oo]\ )?[0-9]{1,4})|(([Aa][Ss][Cc][Nn]|[Bb][Bb][Nn][Dd]|[BFSbfs][Ii][Qq][Qq]|[Pp][Cc][Rr][Nn]|[Ss][Tt][Hh][Ll]|[Tt][Dd][Cc][Uu]|[Tt][Kk][Cc][Aa])\ {0,1}1[Zz][Zz])))
Replace Expression:
${stringOUT}
So this leaves me with:
Address3: "London W12 9LZ"
Post Code: "W12 9LZ"
My next thought is to keep the above match/replace, then add another to match anything that doesn't match the above regex. I think it might be a negative lookahead but I can't seem to make it work.
I'm using SSIS 2008 R2 and I think the regex clean transformation uses the .NET regex implementation.
Thanks.
Just solved this. As usual, it was simpler logic than I thought it should be. Instead of trying to match the non-post code strings and replace them with themselves, I have added another line matching the postcode again and replacing it with "".
So in total, I have:
Match the post code using the above regex and move it to the Post Code column
Match the post code using the above regex and replace it with "" in the Address3 column
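The two steps above can be sketched outside SSIS as follows. This is only an illustration in Python: the POSTCODE pattern is a deliberately simplified stand-in for the full expression in the question, and split_postcode is a hypothetical helper, not part of the regex clean transformation.

```python
import re

# Simplified UK postcode shape (an assumption for brevity): one or two
# letters, a digit, an optional letter or digit, an optional space, then
# a digit and two letters.
POSTCODE = re.compile(r"\b[A-Za-z]{1,2}[0-9][A-Za-z0-9]? ?[0-9][A-Za-z]{2}\b")


def split_postcode(address):
    """Return (address without the post code, post code or None)."""
    match = POSTCODE.search(address)
    if match is None:
        return address, None
    # Step 1: copy the matched post code out for the new column.
    postcode = match.group(0)
    # Step 2: replace the same match with "" in the address column,
    # then tidy up any whitespace left behind.
    remainder = address[:match.start()] + address[match.end():]
    return " ".join(remainder.split()), postcode


print(split_postcode("London W12 9LZ"))
print(split_postcode("W12 9LZ London"))
print(split_postcode("Manchester"))
```

Running the same expression twice, once to extract and once to blank out, mirrors the solution described above and avoids having to write a pattern for "everything that is not a post code".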