Regex - How to remove last instance of the search value it finds?

I have multiple XML files from which I need to delete a line. The same line exists in different sections of each file, but I only need to delete the last instance. For example:
<Simple name="DisplayValue" value="{?Consumer}" />
<Simple name="DisplayValue" value="{?Consumer}" />
<Simple name="DisplayValue" value="{?Consumer}" />   (this is the line I need to delete)
This is how the line appears in the file.
I am using the Find in Files feature in Notepad++ to achieve this. TIA.

Try the following find and replace in regular expression mode, with ". matches newline" (dot-all) enabled:
Find: (.*)Same Text(?:\r?\n|$)(.*)
Replace: $1$2
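Here Same Text stands for the literal text of the line, with regex metacharacters escaped. For the line above, that is <Simple name="DisplayValue" value="\{\?Consumer\}" />.
This works because the initial (.*) group is greedy and, with dot-all enabled, can run across line breaks, so it matches and captures all content up to, but not including, the last occurrence of Same Text. The second (.*) then captures all content after that occurrence and its line break. Replacing with just the two capture groups splices out the line you want to remove.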
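Here is a minimal sketch of the same idea in Python's `re` module, for use outside Notepad++. It assumes the file fits in memory. `re.DOTALL` plays the role of the ". matches newline" checkbox, and `re.escape` escapes the braces and question mark in the line. The function name `remove_last_line` is hypothetical.

```python
import re


def remove_last_line(text: str, line: str) -> str:
    """Delete the last occurrence of `line` and the line break after it.

    Same shape as the Notepad++ pattern (.*)Same Text(?:\r?\n|$)(.*):
    the greedy first group backtracks only as far as the LAST occurrence.
    """
    pattern = re.compile(
        r"(.*)" + re.escape(line) + r"(?:\r?\n|$)(.*)",
        re.DOTALL,  # like Notepad++ ". matches newline"
    )
    # The match spans the whole text, so at most one substitution happens.
    return pattern.sub(r"\1\2", text, count=1)


LINE = '<Simple name="DisplayValue" value="{?Consumer}" />'
sample = f"<A>\n{LINE}\n<B>\n{LINE}\n<C>\n{LINE}\n<D>\n"
print(remove_last_line(sample, LINE))
```

If the line does not occur, the pattern does not match and the text is returned unchanged.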

Related

In Regex, how can I globally replace only the first matching character *after* a matching capturing group

I want to use a regex to select the first quote after each instance of img_ in the code below. In this example, it would be the " after the jpg.
I tried using the following regex:
/(?<=img_.*)"+?/g
but all the quotes after img_ in each line were selected, including the quotes around "533". How do I match the quotes after .jpg without matching any other quotes?
<img class="confluence-embedded-image confluence-content-image-border" height="399" src="img_78.jpg" width="533" />
<img class="confluence-embedded-image confluence-content-image-border" height="399" src="img_78.jpg" width="533" />
<img class="confluence-embedded-image confluence-content-image-border" height="399" src="img_78.jpg" width="533" />
I want to avoid using .jpg because the extension could be many other things (.png, .jpeg, etc.). I want to get the first quote after img_ regardless of what comes between img_ and the quote.
I basically want to search a file using that regex and return only the first quote after each instance of img_. I'm using the replace-in-file module in Node.js, which takes this regular expression and replaces each match with a given expression. I tried the above regular expression, but it replaces the entire match.
To match only the quote after .jpg:
\.jpg(")
To get the first quote after img_ regardless of the extension (.png, .jpeg, etc.), use a lazy quantifier, which stops at the first quote:
img_.*?(")
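Here is a sketch in Python's `re` module of the lazy pattern from the answer, applied to the tag from the question. The match still starts at img_, so replacing the whole match would lose the filename, which is the problem the question describes. This version captures that prefix and writes it back. The replacement text `?v=2"` and the function name are hypothetical.

```python
import re

# The tag from the question, repeated three times.
TAG = ('<img class="confluence-embedded-image confluence-content-image-border" '
       'height="399" src="img_78.jpg" width="533" />')
html = "\n".join([TAG] * 3)

# Lazy .*? stops at the FIRST quote after img_, so the quotes around "533"
# are never reached. Group 1 holds img_ plus the filename.
FIRST_QUOTE_AFTER_IMG = re.compile(r'(img_.*?)"')


def replace_first_quote(text: str, replacement: str) -> str:
    """Swap only the first quote after each img_ for `replacement`."""
    # A function as the replacement avoids backslash handling in `replacement`.
    return FIRST_QUOTE_AFTER_IMG.sub(lambda m: m.group(1) + replacement, text)


print(replace_first_quote(html, '?v=2"'))
```

In JavaScript, `String.prototype.replace` accepts the same idea as `html.replace(/(img_.*?)"/g, '$1?v=2"')`. Engines with variable-length lookbehind, such as V8, also accept `/(?<=img_[^"]*)"/g`. That pattern matches only the quote itself, because `[^"]*` cannot cross a quote. Python's `re` rejects it, because its lookbehind must be fixed-width.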

Regex Pattern to Match A Href and Remove

I am trying to create a regex that matches all a href links containing my domain, so that I can remove those links. It works fine until I run into an a href link that has another HTML tag inside it.
Regex Statement:
(<a[^<]*coreyjansen\.com[^<]*>)([^"]*?)(<\/a>)
It matches the a href links in this statement with no problem
Need a lawyer? Contact <span style="color: #000000">Random text is great Corey is awesome</span>
It is unable to match both of the a href links in this statement:
<strong><a href="http://coreyjansen.com/"><img class="alignright size-full
wp-image-12" src="http://50h0.com/wp-content/uploads/2014/06/lawyers.jpg"
alt="lawyers" width="250" height="250" /></a>
I have been trying to play with the negated character set, with no luck. If I remove the negated character set, it matches two links that are right next to each other (such as in example 2) as a single match.
The issue here is that [^<]*> matches everything up to the last > before the next <. That's the greedy behaviour of the * asterisk. You can make it non-greedy by appending ? after the asterisk (which you already do in another part of your pattern). It will then match everything up to the first occurrence of >. You also have to change the middle part of your regex: [^"]*? cannot match across the quotes in the attributes of the nested <img> tag. Change it to catch everything up to the first </a>, like this:
(<a[^<]*coreyjansen\.com[^<]*?>)(.*?)(<\/a>)
Use the regex below, which matches only the opening a tag:
(<a[^>]*coreyjansen\.com[^>]*>)
Example data
<strong><a href="http://coreyjansen.com/"><img class="alignright size-full
wp-image-12" src="http://50h0.com/wp-content/uploads/2014/06/lawyers.jpg"
alt="lawyers" width="250" height="250" /><a href="http://coreyjansen.com/"/>
The above regex will match every a tag that contains your domain.
I'm playing with the following regex and it seems to be working:
<a.*coreyjansen\.com.*</a>
It matches anything between anchor tags that contain your site name. I am using JavaScript pattern matching on www.regexpal.com; depending on the language, it could be slightly different.
You need to match the start of the tag, <a, and then the address before the > character; you are excluding the wrong character. Once that is matched, everything between <a ...> and </a> is the displayed link. I don't know why you exclude quotes there. Attribute values are usually quoted (as in the nested <img> tag), so you need to match everything except the closing </a> tag. That is done with ((?!string to not match).)*, which should be followed by </a>. The resulting regex is:
(<a[^>]*coreyjansen\.com[^>]*>)((?!<\/a>).)*(<\/a>)
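Here is a Python sketch of the last answer's pattern. It assumes the goal is either to unlink (keep the link's contents) or to delete the anchors entirely. `remove_domain_links` and `keep_text` are hypothetical names. There are two changes from the answer:

- `re.DOTALL` lets the link body span the line breaks in the `<img>` example.
- The repetition is wrapped as `((?:(?!</a>).)*)` so that group 2 holds the whole body. A repeated group such as `((?!<\/a>).)*` keeps only its last iteration, which is a single character.

```python
import re

# (?:(?!</a>).)* consumes one character at a time as long as </a> does not
# start at that position, so the body can never run past the first </a>.
DOMAIN_LINK = re.compile(
    r"(<a[^>]*coreyjansen\.com[^>]*>)((?:(?!</a>).)*)(</a>)",
    re.DOTALL,  # let the link body span line breaks
)


def remove_domain_links(html: str, keep_text: bool = True) -> str:
    """Remove <a> links to coreyjansen.com.

    keep_text=True keeps what was inside the link (group 2);
    keep_text=False deletes the whole anchor.
    """
    return DOMAIN_LINK.sub(lambda m: m.group(2) if keep_text else "", html)


example = ('<strong><a href="http://coreyjansen.com/"><img class="alignright size-full\n'
           'wp-image-12" src="http://50h0.com/wp-content/uploads/2014/06/lawyers.jpg"\n'
           'alt="lawyers" width="250" height="250" /></a>')
print(remove_domain_links(example))
```

Because the body stops at the first `</a>`, two links on the same line are matched separately rather than as one match.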

Regular Expression, only replace first occurrence of HTML tag

I've got several files that have two <body> tags in them (either on purpose or by accident). I'm looking to find only the first occurrence of the <body> tag and append additional HTML code after it; the second occurrence shouldn't be affected. I'm using TextWrangler. The regex I'm using now replaces both occurrences rather than just the first.
Text:
<body someattribute=...>
existing content
<body onUnload=...>
RegEx I'm using:
Find: (\<body.*\>)
Replace with:
\n\1
appended HTML code
Current result:
<body someattribute=...>
appended HTML code
existing content
<body onUnload=...>
appended HTML code
So it's adding my appended code twice. I just want it to happen to the first <body...> only.
Regex:
(?s)(<body.*?>)(.*)
Replace:
\1\nappended content\n\2
Explanation:
(?s) makes the . character match newlines. Without it, . matches any character except a newline.
(<body.*?>) finds the first <body ...> tag and captures it as group 1 (\1).
(.*) finds everything after the first <body ...> tag and captures it as group 2 (\2).
The replacement writes group 1 + newline + appended content + newline + group 2 in place of everything that was found. Because (.*) consumes the rest of the file, there is only one match, so the second <body> tag is left alone.
Tested in Notepad++
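The answer's pattern can also be checked with Python's `re` module, where the `(?s)` inline flag has the same meaning. Below is a minimal sketch using the question's placeholder text; the function names are hypothetical. Where the tool or API can limit the number of replacements, as Python's `count` argument does, a simpler pattern with `count=1` is an alternative.

```python
import re

EXTRA = "appended HTML code"  # placeholder text from the question


def append_after_first_body(html: str, extra: str = EXTRA) -> str:
    """The answer's approach: (.*) with (?s) swallows the rest of the file,
    so there is only one match and the second <body> is left alone."""
    return re.sub(
        r"(?s)(<body.*?>)(.*)",
        lambda m: m.group(1) + "\n" + extra + "\n" + m.group(2),
        html,
    )


def append_after_first_body_count(html: str, extra: str = EXTRA) -> str:
    """Alternative when the API can cap replacements: count=1 stops after one."""
    return re.sub(r"<body[^>]*>", lambda m: m.group(0) + "\n" + extra, html, count=1)


page = "<body someattribute=...>\nexisting content\n<body onUnload=...>\n"
print(append_after_first_body(page))
```

The answer's replacement adds a newline before group 2, and group 2 starts with the newline that followed the first tag. Its output therefore contains a blank line after the appended content; the `count=1` variant does not.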

How to add </center> after img tag using regex in Dreamweaver CS6

I have many pages that contain images. I need to add a </center> after each IMG tag. I'm using Dreamweaver CS6, and this is the regex I have so far:
find <img [^>]+> and replace $&</center>
But it doesn't work. It finds and replaces the <img> tags OK, but it doesn't add the </center>.
Thanks in advance.
I don't know how this is done in Dreamweaver, but to keep the "found" value you reference it with \1 for the first capturing group, \2 for the second, and so on (the part you want to keep has to be wrapped in parentheses):
\1</center>
Or try $1, as in .htaccess, but \1 is your best bet.
Try this as your regex:
(<img [^>]+>)
and this as your replace string:
$1</center>
You need to add round brackets to create a capturing group which you can then reference with $1.
NOTE: Make sure you have changed the Search field to Source Code, deselected the Ignore whitespace and Match whole word checkboxes and selected the Use regular expression checkbox in the Find and Replace dialog.
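Here is a Python sketch of both answers' idea, since Python's `re.sub` also reads `\1` as a group reference. The sample HTML is made up for illustration. Python has no `$&` in replacement strings (it would be inserted literally), but `\g<0>` refers to the whole match, so the question's original pattern works without adding a group.

```python
import re

html = '<p><img src="a.jpg" alt="a"></p>\n<p><img src="b.png"></p>'  # made-up sample

# With a capturing group, \1 puts the matched <img ...> tag back before </center>.
with_group = re.sub(r"(<img [^>]+>)", r"\1</center>", html)

# Without a group, \g<0> refers to the whole match (the role $& plays in other tools).
whole_match = re.sub(r"<img [^>]+>", r"\g<0></center>", html)

print(with_group)
```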

Regex - Not pick up second set of tags

Given the following line, how do I stop the regex below from picking up the second set of SPAN tags? I want the ZIP code, not the extended ZIP code.
<TD width="20%">Zip Code: <B><SPAN class="TableBody clsBold">06902</SPAN>-<SPAN class="TableBody clsBold"> 2630</SPAN></B></TD></TR>
Regex:
<TD.+>([(\s)A-Za-z#]+:)\s*<B><SPAN class="TableBody.*">([\d\s#a-zA-Z$,]+)</SPAN>
<TD.+>([(\s)A-Za-z#]+:)\s*<B><SPAN class="TableBody.*?">([\d\s#a-zA-Z$,]+)</SPAN>
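Your regex was close, but the .* after TableBody is greedy. It backtracks only as far as the last "> on the line that is followed by the ZIP characters and </SPAN>, which is the second SPAN. Adding ? after .* makes it lazy, so it stops at the first "> and doesn't grab the next set of tags.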
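To see the difference, run the question's and the answer's patterns against the same line with Python's `re` module, which reads these character classes the same way. Only the `?` after `.*` differs between the two.

```python
import re

LINE = ('<TD width="20%">Zip Code: <B><SPAN class="TableBody clsBold">06902</SPAN>-'
        '<SPAN class="TableBody clsBold"> 2630</SPAN></B></TD></TR>')

# Question's pattern: greedy .* after TableBody.
GREEDY = r'<TD.+>([(\s)A-Za-z#]+:)\s*<B><SPAN class="TableBody.*">([\d\s#a-zA-Z$,]+)</SPAN>'
# Answer's pattern: identical except for the lazy .*?
LAZY = r'<TD.+>([(\s)A-Za-z#]+:)\s*<B><SPAN class="TableBody.*?">([\d\s#a-zA-Z$,]+)</SPAN>'

for name, pattern in (("greedy", GREEDY), ("lazy", LAZY)):
    label, value = re.search(pattern, LINE).groups()
    print(name, repr(label), repr(value))
```

The greedy version captures the extension " 2630" (with its leading space, which `\s` allows). The lazy version captures "06902".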