Regex lookahead and behind?

So I have an unordered list that looks like:
<ul class='radio' id='input_16_5'>
<li>
<input name='input_5' type='radio' value='location_1' id='choice_16_5_0' />
<label for='choice_16_5_0' id='label_16_5_0'>Location 1</label></li>
<li>
<input name='input_5' type='radio' value='location_2' id='choice_16_5_1' />
<label for='choice_16_5_1' id='label_16_5_1'>Location 2</label></li>
<li>
<input name='input_5' type='radio' value='location_3' id='choice_16_5_2' />
<label for='choice_16_5_2' id='label_16_5_2'>Location 3</label></li>
</ul>
I would like to pass a value (e.g. location_2) to a regular expression that will then capture the whole list item that it's part of, in order to remove it. So if I pass it location_2, it should match everything from the opening <li> through the closing </li> of the list item that contains it.
I can match up to the end of the list item with /location_3.+?(?=<li|<\/ul)/ but is there something I can do to match back to the opening <li> without capturing other items?

This should get what you want
<li>(?:(?!<li>)[\S\s])+location_1[\S\s]+?<\/li>
Explanation
<li>: the opening li tag,
(?:(?!<li>)[\S\s])+: matches any characters, including newlines, while the negative lookahead makes sure the match never runs across another <li> tag,
location_1: the keyword that identifies the <li> item you want,
[\S\s]+?: any characters, including newlines, as few as possible (thanks to @Tensibai, whose comment simplified this regex by making it non-greedy),
<\/li>: the closing li tag.
DEMO: https://regex101.com/r/cU4eC6/5
Additional information:
/<li>(?:(?!<li>).)+location_2.+?<\/li>/s
This regex also works; it uses the s modifier so that . matches newlines, instead of [\S\s]. (Thanks again to @Tensibai)
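As a minimal sketch of how the pattern above could be used to remove an item, here is a Python version. The helper name remove_item is hypothetical, and re.DOTALL plays the role of the s modifier. Note that the value is matched as a plain substring, so location_1 would also match inside location_10.

```python
import re

# The list from the question.
html = """<ul class='radio' id='input_16_5'>
<li>
<input name='input_5' type='radio' value='location_1' id='choice_16_5_0' />
<label for='choice_16_5_0' id='label_16_5_0'>Location 1</label></li>
<li>
<input name='input_5' type='radio' value='location_2' id='choice_16_5_1' />
<label for='choice_16_5_1' id='label_16_5_1'>Location 2</label></li>
<li>
<input name='input_5' type='radio' value='location_3' id='choice_16_5_2' />
<label for='choice_16_5_2' id='label_16_5_2'>Location 3</label></li>
</ul>"""


def remove_item(markup, value):
    """Remove the first <li>...</li> whose content contains `value` (hypothetical helper)."""
    pattern = re.compile(
        # (?:(?!<li>).)+ walks forward one character at a time and refuses
        # to step onto another "<li>", so the match cannot start in an
        # earlier list item.
        r"<li>(?:(?!<li>).)+" + re.escape(value) + r".+?</li>\n?",
        re.DOTALL,  # let "." match newlines, like the s modifier
    )
    return pattern.sub("", markup, count=1)


out = remove_item(html, "location_2")
print(out)
```

re.escape is used so that a value containing regex metacharacters is still treated literally.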

Related

Regex to match two factors in one?

Using <div dir=.*?> works fine to match <div dir="auto">.
However, why does <div dir=.*?><br \/> not match <div dir="auto"><br />?
Code: https://regex101.com/r/5pP38n/1

The regexp starts matching at the first <div dir= in the input. Then it looks for the next ><br \/> in the input. .*? will match everything between them, which is
"auto">Please 🙏 sir my youtube channel delete <div dir="auto"
You don't match <div dir="auto"><br /> because it's contained inside this match, and a regexp doesn't return overlapping matches.
If you don't want .*? to match across multiple tags, you can use [^>]* instead.
<div dir=[^>]*><br \/>
DEMO
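The difference can be seen in a small Python sketch. The input string is an assumption, reconstructed from the match quoted in this answer (the emoji is left out), not the exact text from the linked demo.

```python
import re

# Assumed input, reconstructed from the match quoted in the answer.
text = ('<div dir="auto">Please sir my youtube channel delete '
        '<div dir="auto"><br />')

# Lazy dot: starts at the FIRST <div dir= and expands until "><br />" appears.
lazy = re.search(r'<div dir=.*?><br />', text)

# Negated class: cannot cross a ">", so it has to stay inside one tag.
strict = re.search(r'<div dir=[^>]*><br />', text)

print(lazy.group(0))
print(strict.group(0))
```

A lazy quantifier only controls where the match ends; it does not move the starting point, which is why the negated character class is needed here.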

How to remove li tags within a particular DIV tag in Notepad++ using regex

I have content like below
<div class="content1">
<ul>
<li>line1</li>
<li>line2</li>
<li>line3</li>
</ul>
</div>
<div class="content2">
<ul>
<li>line4</li>
<li>line5</li>
<li>line6</li>
</ul>
</div>
I want to strip all li tags within the first div (content1) and keep the contents inside them, like below:
<div class="content1">
<ul>
line1
line2
line3
</ul>
</div>
<div class="content2">
<ul>
<li>line4</li>
<li>line5</li>
<li>line6</li>
</ul>
</div>
I have about 500 HTML files to edit. Is there a regex to achieve this in Notepad++?

You can use a regex like this
<li>(.*?)<\/li>
With the replacement string:
$1
Working demo
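On its own, <li>(.*?)<\/li> strips the li tags in every div, while the expected output in the question leaves content2 untouched. One possible way to limit the replacement to a single div is sketched below in Python. It assumes the target div contains no nested div, and the helper name strip_li is hypothetical.

```python
import re

# The content from the question.
html = """<div class="content1">
<ul>
<li>line1</li>
<li>line2</li>
<li>line3</li>
</ul>
</div>
<div class="content2">
<ul>
<li>line4</li>
<li>line5</li>
<li>line6</li>
</ul>
</div>"""


def strip_li(match):
    """Drop the <li> wrappers inside one matched div block (hypothetical helper)."""
    return re.sub(r"<li>(.*?)</li>", r"\1", match.group(0))


# Assumption: content1 has no nested <div>, so the lazy .*? stops at
# the </div> that really closes it.
result = re.sub(
    r'<div class="content1">.*?</div>',
    strip_li,
    html,
    flags=re.DOTALL,
)
print(result)
```

The outer substitution selects the block and the inner one does the replacement from the answer, with \1 standing in for $1.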

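The regexes that match those tags are
<li>
<\/li>
Neither < nor > is special in a regex, so they need no backslash. In fact, in GNU sed and in Notepad++, \< and \> are word-boundary assertions rather than literal angle brackets. The / is escaped only because it is often used as the pattern delimiter.
If you use a terminal you can use the stream editor sed:
sed -E 's#</?li>##g' input.txt > output.txt
In Notepad++ I believe you can do the same with Find and Replace (Ctrl+H) in regular expression mode.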

regex to remove recurring instances of comment tag

Hello, I want to remove every instance of the comment tag that occurs in my data.
The data I am using is shown below:
<!-- <li><a class="topitemlink" href="/About-Us/Career-Centre.aspx">Career Centre</a></li>
<li><img alt="" width="7" height="22" src="/images/common/separator.gif" /></li>-->
<li><a class="topitemlink" href="/ContactUs">Contact Us</a> <!-- <ul class="topcontactusmenu"><li>Contact Us</li><li>Contact the IR Team</li><li>Contact the Media Team</li></ul> --></li>
</ul>
</div>
<!--<img width="92" height="40" src="/ABMB/media/MyLibrary/Shared/Images/bizSmart_logo.gif" alt="" /><img width="76" height="40" src="/ABMB/media/MyLibrary/Shared/Images/sabah-run2015_top-icon.jpg" alt="" />-->
The regex I am using just captures the first instance but I want all instances to be captured.
<!--.*\s.*-->

You could use something like this: <!--.+?--> (Example here). Make sure that you have the s and g flags enabled.
The s flag allows the period character to also match newlines, so the pattern can capture comments that span multiple lines.
The g flag applies the pattern globally, that is, to every match in the text rather than only the first.
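For comparison, a rough Python equivalent follows. Python has no g flag: findall and sub already work on every match, and re.DOTALL corresponds to the s flag. The data is an abbreviated copy of the sample from the question.

```python
import re

# Abbreviated version of the data from the question.
data = """<!-- <li><a class="topitemlink" href="/About-Us/Career-Centre.aspx">Career Centre</a></li>
<li><img alt="" width="7" height="22" src="/images/common/separator.gif" /></li>-->
<li><a class="topitemlink" href="/ContactUs">Contact Us</a> <!-- <ul class="topcontactusmenu"><li>Contact Us</li></ul> --></li>
</ul>
</div>
<!--<img width="92" height="40" src="/ABMB/media/MyLibrary/Shared/Images/bizSmart_logo.gif" alt="" />-->"""

# re.DOTALL lets "." cross line breaks; the lazy +? stops at the first "-->".
comment = re.compile(r"<!--.+?-->", re.DOTALL)

found = comment.findall(data)    # every comment, including the multi-line one
cleaned = comment.sub("", data)  # the data with all comments removed

print(len(found))
print(cleaned)
```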

You didn't specify the language you're using but for php you can use /<!--.*?-->/s , i.e.:
$html = '<!-- <li><a class="topitemlink" href="/About-Us/Career-Centre.aspx">Career Centre</a></li>
<li><img alt="" width="7" height="22" src="/images/common/separator.gif" /></li>-->
<li><a class="topitemlink" href="/ContactUs">Contact Us</a> <!-- <ul class="topcontactusmenu"><li>Contact Us</li><li>Contact the IR Team</li><li>Contact the Media Team</li></ul> --></li>
</ul>
</div>
<!--<img width="92" height="40" src="/ABMB/media/MyLibrary/Shared/Images/bizSmart_logo.gif" alt="" /><img width="76" height="40" src="/ABMB/media/MyLibrary/Shared/Images/sabah-run2015_top-icon.jpg" alt="" />-->';
$html = preg_replace('/<!--.*?-->/s', '', $html);
echo $html;
/*<li><a class="topitemlink" href="/ContactUs">Contact Us</a> </li>
</ul>
</div>*/
DEMO:
https://ideone.com/It6HvW
EXPLANATION:
<!--.*?-->
Options: Case sensitive; Exact spacing; Dot matches line breaks; ^$ don’t match at line breaks; Greedy quantifiers; Regex syntax only
Match the character string “<!--” literally «<!--»
Match any single character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character string “-->” literally «-->»

Regular expression for exactly one match

I am using the following regular expression in my code editor (Sublime Text) to search for ASP.NET comments.
<%--.*(\n.*)*--%>
I want this regular expression to stop as soon as the first --%> is found, but it keeps going until the last comment's --%>. I have the idea that I need some kind of flag to make it stop at the first --%>, but I am unable to figure it out.
Can anyone please tell me how I can modify this regex?
UPDATE
I forgot to post some sample markup. Here it is:
<div class="modal-footer">
<%--<button class="btn" data-dismiss="modal">
Close</button>
<button id="btnAddCountry" class="btn btn-primary" data-dismiss="modal">
Save changes</button>--%>
</div>
</div>
<div class="row-fluid">
<div class="span12">
<div class="box paint_hover">
<div class="title">
<h3>Sale Voucher</span>
</h3>
</div>
<div class="content">
<ul id="tabExample1" class="nav nav-tabs">
<li class="active"><a id="lnkAddEditVoucher" href="#AddEditVoucher" data-toggle="tab">Add/Update Sale Voucher</a></li>
<li><a id="lnkViewVouchers" href="#ViewVouchers" data-toggle="tab">Search Sale Voucher</a></li>
<%-- <li><a id="lnkViewParties" href="#ViewParties" data-toggle="tab">Search Parties</a></li>--%>
</ul>
I just want to match the first comment and not the second one.

You need to make the * quantifiers non-greedy. Usually this is done by adding a ? after them, e.g. .*? instead of just .*.
I've also simplified the regex a bit. Sublime Text supports the (?s) modifier at the beginning of the pattern to make the dot match even newlines:
(?s)<%--.*?--%>
If you prefer matching the newline explicitly:
<%--(.|\n)*?--%>

The problem you seem to have is that you use the greedy version of .*, which matches anything (including --%>). Try using <%--.*?(\n.*?)*?--%> instead to make it non-greedy.
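The greedy and non-greedy behaviour can be compared with a short Python sketch. The markup is a shortened version of the sample from the question, and Python's re module stands in for Sublime Text's engine here, so treat it as an illustration rather than a test of the editor itself.

```python
import re

# Shortened version of the sample markup: two separate ASP.NET comments.
markup = """<div class="modal-footer">
<%--<button class="btn" data-dismiss="modal">
Close</button>--%>
</div>
<ul id="tabExample1" class="nav nav-tabs">
<%-- <li><a id="lnkViewParties" href="#ViewParties" data-toggle="tab">Search Parties</a></li>--%>
</ul>"""

# Original pattern: the greedy quantifiers run to the end of the text and
# then backtrack to the LAST "--%>".
greedy = re.search(r"<%--.*(\n.*)*--%>", markup)

# Suggested pattern: (?s) lets "." match newlines, *? stops at the FIRST "--%>".
lazy = re.search(r"(?s)<%--.*?--%>", markup)

print(greedy.group(0))
print("-----")
print(lazy.group(0))
```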

Edit html document using regex replace and matching contents of only immediate child

I have html that looks like so:
<ul style="list-style-type: square;">
<br />
<li margin-left="80px">
<br />first line
<br />
<br />second line
</li>
<br />
<li margin-left="80px">
<br />text line 1
</li>
<br />
<li margin-left="80px">
<br />text line 2
</li>
<br />
</ul>
I want to match the contents of the ul, but not the contents of the li elements.
The end goal is to get rid of the <br /> tags that are directly under the <ul></ul> and not under the <li></li>.
Note: for clarity I formatted the HTML above, but in my real-world scenario it comes as a single giant string without any \r\n's.
Here it is:
<p margin-left="40px"><br /> <b>[What is the nature of the Services?]</b></p><br /><p><br /> [What are the overarching goals, objectives and outcomes you want to achieve?]</p><br /><p margin-left="80px"><br /> <b><i><u>[How should the Services be delivered?]</u></i></b></p><br /><ul style="list-style-type: square;"><br /> <li margin-left="80px"><br /> gfhsdfsdf<br /><br /> some line here</li><br /> <li margin-left="80px"><br /> sfdsfsdfsdf</li><br /> <li margin-left="80px"><br /> sdfsdfsdf</li><br /></ul><br /><p><br /> [Is the appointment of this Supplier exclusive?]</p><br /><p><br /> [Refer to any proposal prepared by the Supplier if this helps describes any aspects of the Service]</p><br />
Anyway, the first thing that came to mind was to
use this to extract the contents of the <ul>
<ul[^>]*>(.*)</ul>
and then maybe run a second regex to select all the li elements
<li[^>]*>.*</li>
and then somehow get rid of anything that's left over,
but that's kind of lame, and then again
<li[^>]*>.*</li>
matches a whole bunch of li's.
This entire string gets captured:
<li margin-left="80px"><br />\t\tgfhsdfsdf<br /><br />\t\tsome line here</li><br />\t<li margin-left="80px"><br />\t\tsfdsfsdfsdf</li><br />\t<li margin-left="80px"><br />\t\tsdfsdfsdf</li>
I know it's because .* is greedy, but I'm not sure how to avoid it.
Something like [^</li>]* wouldn't work because it is treated as a set of characters, not a string.
Any help is much appreciated.
So I have 2 problems:
1) I don't like the way I'm approaching this, so better ideas are needed (I'm considering using the set operations of LINQ to XML to achieve this). I still hope to do this with regex, so if anyone knows exactly how, please share.
2) How do I capture each li separately instead of capturing everything from the first opening <li> to the last closing </li>?

I think you should go look at this:
RegEx match open tags except XHTML self-contained tags
Then recognize that parsing HTML with a regex is not quite that easy. Personally, I would load the HTML into an HTML DOM object and then crawl the document. You might look at this project for some help:
http://htmlagilitypack.codeplex.com/

Since you don't say which regex flavor you're using, here's a JavaScript-compatible regex to match a <br /> that's inside a <ul> element but not inside a <li> element:
<br\s*/>(?=[^<]*(?:<(?!/?ul\b)[^<]*)*</ul>)(?![^<]*(?:<(?!/?li\b)[^<]*)*</li>)
Breaking that down,
<br\s*/> matches the BR tag, of course.
(?=[^<]*(?:<(?!/?ul\b)[^<]*)*</ul>) looks ahead for the next occurrence of </ul>, but only if it doesn't encounter a <ul> tag first.
(?![^<]*(?:<(?!/?li\b)[^<]*)*</li>) does the same thing with </li> and <li> tags, but this time negating the result.
Being JS compatible, this should work in Dreamweaver as well as in editors with solid regex support, like EditPad and TextMate. It's also compatible with most Perl-derived flavors (Python, .NET, Java, etc.), though some syntactic tweaking will probably be needed.
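As a sketch of how this could look in Python, the pattern can be passed to re.sub unchanged. The sample string below is a shortened, made-up version of the single-line HTML from the question. Like the original pattern, it assumes well-formed markup with no nested lists.

```python
import re

# Shortened, made-up version of the single-line HTML from the question.
html = (
    '<p><br /> intro</p><br />'
    '<ul style="list-style-type: square;"><br /> '
    '<li margin-left="80px"><br /> a<br /><br /> b</li><br /> '
    '<li margin-left="80px"><br /> c</li><br /></ul><br />'
    '<p><br /> outro</p><br />'
)

pattern = re.compile(
    r"<br\s*/>"
    # A </ul> must come next, before any other <ul> or </ul> tag:
    # this means we are inside a list.
    r"(?=[^<]*(?:<(?!/?ul\b)[^<]*)*</ul>)"
    # A </li> must NOT come next before any <li> or </li> tag:
    # this means we are not inside a list item.
    r"(?![^<]*(?:<(?!/?li\b)[^<]*)*</li>)"
)

cleaned = pattern.sub("", html)
print(cleaned)
```

Only the <br /> tags sitting directly under the <ul> are removed; those inside the li elements and those outside the list are kept.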
