Invert regex match across lines - regex

How would you invert this expression to match everything BUT the contents between the <!-- LIST --> and <!-- /LIST --> tags?
((?s)<!-- LIST -->.*?<!-- /LIST -->)
Meaning I'd like to remove everything before <!-- LIST --> and after <!-- /LIST -->

The regex you already have matches the section between the two tags. You simply have to add patterns for the text before and after it, and then replace the whole match with a backreference to the saved group (the slash / usually has to be escaped as well, since it is the delimiter in the s/// syntax).
This is a generic regex code:
s/(?s).*(<!-- LIST -->.*?<!-- \/LIST -->).*/\1/
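As a rough illustration, the same substitution can be sketched in Python. The function name and the sample text are made up for this example; re.DOTALL stands in for the inline (?s) flag, and the slash needs no escaping because Python has no s/// delimiter.

```python
import re

# Same pattern as the s/// expression above; re.DOTALL lets . match newlines.
LIST_BLOCK = re.compile(r".*(<!-- LIST -->.*?<!-- /LIST -->).*", re.DOTALL)

def keep_list_block(text: str) -> str:
    """Keep only the LIST block; text without such a block is returned unchanged."""
    return LIST_BLOCK.sub(r"\1", text, count=1)

# Hypothetical input, invented for the example.
sample = "header\n<!-- LIST -->\nitem 1\nitem 2\n<!-- /LIST -->\nfooter\n"
print(keep_list_block(sample))
```

Because the leading .* is greedy, a text with several LIST blocks would keep only the last one.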

Related

Regex - How to remove last instance of the search value it finds?

I have multiple XML files that I need to delete a line from. The same line exists in different sections of the file but I only need to delete the last instance it finds. For example -
(Opening tag here)Simple name="DisplayValue" value="{?Consumer}" />
(Opening tag here)Simple name="DisplayValue" value="{?Consumer}" />
(Opening tag here)Simple name="DisplayValue" value="{?Consumer}" /> - This is the line I need to delete
This is the line in file.
I am using the Find in Files feature in Notepad++ to achieve this. Tia.
Try the following find and replace, in regex mode (with dot all enabled):
Find: (.*)Same Text(?:\r?\n|$)(.*)
Replace: $1$2
This should work because the initial (.*) capture group is greedy, so it matches and captures all content up to, but not including, the last occurrence of Same Text (which stands for the line you want to delete). Then we also match and capture all content after this last occurrence. Finally, we replace with just the two capture groups, effectively splicing out the line you want to remove.
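Outside Notepad++, the same find and replace can be sketched in Python. This is only an illustration: the function name and the surrounding <A>/<B> tags are invented, and re.DOTALL stands in for the "dot all" option.

```python
import re

def remove_last_occurrence(text: str, line: str) -> str:
    """Remove the last occurrence of `line`, plus its line break, from `text`."""
    pattern = re.compile(
        r"(.*)" + re.escape(line) + r"(?:\r?\n|$)(.*)",
        re.DOTALL,  # let . cross line breaks, like "dot all" in the editor
    )
    # The greedy first group swallows everything up to the LAST occurrence.
    return pattern.sub(r"\1\2", text, count=1)

# Hypothetical file content; only the repeated line comes from the question.
target = '<Simple name="DisplayValue" value="{?Consumer}" />'
xml = f"<A>\n{target}\n</A>\n<B>\n{target}\n</B>\n"
print(remove_last_occurrence(xml, target))
```

re.escape is used so that characters such as { and ? in the line are matched literally.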

How to Match Redundant Lines From Contenteditable Div in Regex

I'm trying to process the html inside a contenteditable div. It might look like:
<div>Hi I'm Jack...</div>
<div><br></div>
<div><br></div>
<div>More text.</div> *<div><br></div>*
*<div><br></div>**<div><br></div>*
*<div><br></div>*
*<div>
<br>
</div>*
What regex expression would match all trailing <div><br></div> but not the ones sandwiched between useful divs containing text, i.e., <div> text (not html) </div>?
I have enclosed all expressions I want to match in asterisks. The asterisks are for reference only and are not part of my string.
Thanks,
Jack
You can use the pattern:
(?:<div>[\n\s]*<br>[\n\s]*<\/div>)(?!.*?<div>[^<]+<\/div>)
Let me know if this works for all your cases and I will write a detailed explanation of the pattern.
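A quick way to check the pattern is a small Python sketch run against the sample from the question. Two assumptions here: re.DOTALL is enabled so that the lookahead can see text divs on later lines, and [\n\s] is shortened to \s, which already includes \n.

```python
import re

# An empty <div><br></div> that is NOT followed, anywhere later,
# by a div containing plain text.
TRAILING_EMPTY_DIV = re.compile(
    r"<div>\s*<br>\s*</div>(?!.*?<div>[^<]+</div>)", re.DOTALL
)

html = (
    "<div>Hi I'm Jack...</div>\n"
    "<div><br></div>\n"
    "<div><br></div>\n"
    "<div>More text.</div> <div><br></div>\n"
    "<div><br></div><div><br></div>\n"
    "<div><br></div>\n"
    "<div>\n<br>\n</div>"
)

matches = TRAILING_EMPTY_DIV.findall(html)
cleaned = TRAILING_EMPTY_DIV.sub("", html).rstrip()
print(len(matches))
print(cleaned)
```

The two empty divs between the text divs are left alone, because a text div still follows them.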

Regex Pattern to Match A Href and Remove

I am trying to create a regex to match all a href links that contain my domain and I will end up removing the links. It is working fine until I run into an a href link that has another HTML tag within the tag.
Regex Statement:
(<a[^<]*coreyjansen\.com[^<]*>)([^"]*?)(<\/a>)
It matches the a href links in this statement with no problem
Need a lawyer? Contact <span style="color: #000000">Random text is great Corey is awesome</span>
It is unable to match both of the a href links in this statement:
<strong><a href="http://coreyjansen.com/"><img class="alignright size-full
wp-image-12" src="http://50h0.com/wp-content/uploads/2014/06/lawyers.jpg"
alt="lawyers" width="250" height="250" /></a>
I have been trying to play with the negated character set with no luck. If I remove the negated character set, it ends up matching two links that are right after each other, such as in example 2, as one match.
The issue here is that [^<]*> is greedy: the asterisk * consumes as much as it can, so the match runs to the last > it can reach. You can make it non-greedy by appending ? after the asterisk (which you already do in another part of your pattern); it will then stop at the first occurrence of >. You also have to change the middle part of your regex so that it captures everything up to the first </a> tag, like this:
(<a[^<]*coreyjansen\.com[^<]*?>)(.*?)(<\/a>)
Use the regex below, which matches only the opening a tag:
(<a[^>]*coreyjansen\.com[^>]*>)
Example data
<strong><a href="http://coreyjansen.com/"><img class="alignright size-full
wp-image-12" src="http://50h0.com/wp-content/uploads/2014/06/lawyers.jpg"
alt="lawyers" width="250" height="250" /><a href="http://coreyjansen.com/"/>
The above regex will match all three a tags with your required domain.
I'm playing with the following regex and it seems to be working:
<a.*coreyjansen\.com.*</a>
It captures anything between anchor tags that contain your site name. I am using JavaScript pattern matching from www.regexpal.com; depending on the language, it could be slightly different.
You need to match the start of the tag, <a, and then the address before the > character; you are matching against the wrong character. Once that matches, everything between <a> and </a> is the displayed link. I don't know why you require the content not to contain quotes: every tag attribute (in HTML5) has its value inside quotes, so you need to match everything except the link's closing tag </a>. That is done with ((?!string to not match).)*, which should be followed by </a>. The resulting regex is:
(<a[^>]*coreyjansen\.com[^>]*>)((?!<\/a>).)*(<\/a>)
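To see this last pattern in action, here is a minimal Python sketch. It assumes that "removing the link" means dropping the a tags while keeping what they wrap; the function name, the closing </strong> and the example.org link are invented for the example. The quantifier is moved inside group 2 so that the group holds the whole link content rather than only its last character, and re.DOTALL lets the content span lines.

```python
import re

LINK = re.compile(
    r"(<a[^>]*coreyjansen\.com[^>]*>)"  # opening tag that mentions the domain
    r"((?:(?!</a>).)*)"                 # any content that does not contain </a>
    r"(</a>)",                          # closing tag
    re.DOTALL,
)

def unwrap_links(html: str) -> str:
    """Drop matching <a ...> and </a> tags, keeping the content between them."""
    return LINK.sub(r"\2", html)

html = (
    '<strong><a href="http://coreyjansen.com/"><img class="alignright size-full\n'
    'wp-image-12" src="http://50h0.com/wp-content/uploads/2014/06/lawyers.jpg"\n'
    'alt="lawyers" width="250" height="250" /></a></strong> '
    '<a href="http://example.org/">other site</a>'
)
result = unwrap_links(html)
print(result)
```

Replacing with an empty string instead of \2 would remove the link together with its content.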

Remove content from string if it starts and ends with a certain string

Hello I am having a bit of trouble with this regex
I want to remove all content that starts and ends with the following
Starts with: <!--
Ends with: -->
Example of string to be removed:
<!-- <rdf:RDF xmlns:rdf="/web/20120124023607/http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="/web/20120124023607/http://purl.org/dc/elements/1.1/"
xmlns:trackback="/web/20120124023607/http://madskills.com/public/xml/rss/module/trackback/">
<rdf:Description rdf:about="/web/20120124023607/http://www.flatfeets.com/flat-feet-foot-definition/"
dc:identifier="/web/20120124023607/http://www.flatfeets.com/flat-feet-foot-definition/"
dc:title="What Is Flat Feet Or Flat Foot ?"
trackback:ping="/web/20120124023607/http://www.flatfeets.com/flat-feet-foot-definition/trackback/" />
-->
My code that is not working:
Function StripGarbage(ByVal article As String) As String
Return Regex.Replace(article, "<!--(.+?)-->", "")
End Function
Use [\S\s] instead of . so that the regular expression can match across multiple lines; by default, . does not match a newline.
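The question's code is VB.NET; the following is only a Python sketch of the same idea, with a made-up sample string. In .NET, passing RegexOptions.Singleline to Regex.Replace is an alternative that makes . match newlines as well.

```python
import re

# [\s\S] matches any character, including newlines, without needing a flag.
COMMENT = re.compile(r"<!--[\s\S]+?-->")

def strip_garbage(article: str) -> str:
    """Remove every <!-- ... --> block, even when it spans several lines."""
    return COMMENT.sub("", article)

# Hypothetical input, invented for the example.
article = "before<!-- <rdf:RDF\nxmlns:dc='x'>\n-->after<!-- short -->end"
print(strip_garbage(article))
```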

Regular expression to select from X to Y

I have the following HTML:
<Some Html above....../>
<!--Template Start -->
<div>
<p>Some text</p>
...
<div>
<!--Template End -->
<Some Html below/>
Now how can I write a regular expression to match all text from Template Start to Template End?
According to "Notepad++ non-greedy regular expressions", Notepad++ uses the Scintilla engine.
<!--Template Start -->(.*?)<!--Template End -->
The s modifier should be switched on.
Assuming that there are no nested templates:
<!--Template Start -->(.*?)<!--Template End -->
Remember to switch on DOT_ALL mode so that . also matches newlines.
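For comparison outside the editor, the pattern can be tried in Python, where re.DOTALL is the DOT_ALL switch. This is just a sketch using the HTML from the question; the variable names are made up.

```python
import re

TEMPLATE = re.compile(
    r"<!--Template Start -->(.*?)<!--Template End -->",
    re.DOTALL,  # without this flag, . stops at the first newline
)

html = """<Some Html above....../>
<!--Template Start -->
<div>
<p>Some text</p>
...
<div>
<!--Template End -->
<Some Html below/>"""

match = TEMPLATE.search(html)
body = match.group(1) if match else None
print(body)
```

Group 1 holds only the text between the two markers; match.group(0) would include the markers themselves.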
Unfortunately, Notepad++ doesn't natively support matching newlines (\r\n) in regex mode; it supports matching newlines only in extended mode.
However, it DOES support INSERTING newlines in both modes.
To achieve the desired result, you can use a workaround:
1. Delete all newlines in extended mode (replace \r\n with nothing) so that you have a one-liner.
2. Do the regex manipulations in regex mode.
3. Add the newlines back in extended mode (e.g. replace <div> with <div>\r\n and so on) or in regex mode.
I've read somewhere that the PythonScript plugin for N++ adds better regex support, but I haven't checked it.