XPath - invert selection - xslt

I've written an XPath expression that matches some nodes in the document tree.
But I'd like to match all the other elements instead, i.e. invert the selection in the context of the whole document.
So let's say we have the following XPath
//*[@id='menu']//*
and this document
<body>
<div>
<h1>title</h1>
</div>
<div>
<div id="menu">
<UL>
<LI>home</LI>
<LI>about</LI>
</UL>
</div>
<div id="sidebar">
<ul>
<li>one</li>
<li>two</li>
</ul>
</div>
</div>
</body>
The uppercase nodes are matched, but what I want to achieve is to select all the lowercase nodes.
Thanks for your help

Your original XPath expression
//*[@id='menu']//*
matches all element nodes that are descendants of the <div id="menu"> (but not the div element itself). Depending on exactly what you mean by the "inverse", you could try something like
//*[not(ancestor::*[@id='menu'])]
which matches all element nodes that do not have an ancestor with id="menu". This includes the div with id="menu" itself but not its children or grandchildren. If you want to exclude the div as well, use ancestor-or-self:: instead of ancestor::.
If you have XPath 2.0, a more general answer to your original question is the except operator: X except Y selects all the nodes matched by expression X that are not also matched by Y.
//* except //*[@id='menu']//*
There are also union (nodes matched by either X or Y) and intersect (both X and Y) operators.
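With an XPath 1.0 processor that lacks except, the same set difference can be computed in the host language. A minimal sketch in Python, assuming the standard library's xml.etree.ElementTree (which supports only a subset of XPath, so the selection here is done by plain iteration) and a well-formed copy of the question's document:

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the sample document (assumed closing tags).
DOC = """<body>
  <div><h1>title</h1></div>
  <div>
    <div id="menu"><UL><LI>home</LI><LI>about</LI></UL></div>
    <div id="sidebar"><ul><li>one</li><li>two</li></ul></div>
  </div>
</body>"""

root = ET.fromstring(DOC)

# X: every element in the document, in document order (root included).
all_elements = list(root.iter())

# Y: every element strictly below an element with id="menu".
excluded_ids = set()
for candidate in all_elements:
    if candidate.get("id") == "menu":
        for node in candidate.iter():
            if node is not candidate:
                excluded_ids.add(id(node))

# "X except Y": keep everything that is not in Y.
inverse = [el for el in all_elements if id(el) not in excluded_ids]
inverse_tags = [el.tag for el in inverse]
print(inverse_tags)
```

The div with id="menu" itself stays in the result, which mirrors the ancestor:: variant above rather than ancestor-or-self::.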

Related

Regexp: negative lookahead that doesn't allow an open tag inside a tag

I'm looking for a negative lookahead that doesn't allow an opening tag inside a tag. I tried:
failing negative lookahead #1
/(<(\w+)[^>]*>)((?!<\2).*?)(<\/\2>)/gs
failing negative lookahead #2
/(<(\w+)[^>]*>)((?!<\2).*)(<\/\2>)/gs
alpha
<div>
alpha<div>
beta<div>
x < y divided by 4
</div>
</div>
</div>
<div>
<span style="font-size: 8pt;" disabled title="data">
<span>
infinite
</span>
<?= $record->id ?>
</span>
<div> equal </div>
</div>
<div> sum </div>
When x y and y > 0
<div style="font-size: 8pt;" >Summary</div>
Equation id <?= $equation->id ?>
In this example, they're the ones containing:
x < y divided by 4
infinite
equal
sum
summary
Here is a regex you may want to use:
.*?<(\w+)[^>]*>((?:(?!<\1>).)*?)<\/\1>|.*
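As a rough illustration of the underlying technique (a tempered lookahead that refuses to step over another opening tag of the same name), here is a small Python sketch. The pattern is a simplified variant and not the exact regex above, and the sample is shortened from the question's input:

```python
import re

# Shortened sample based on the question's input.
SAMPLE = """<div>
alpha<div>
beta<div>
x < y divided by 4
</div>
</div>
</div>
<div> sum </div>"""

INNERMOST = re.compile(
    r"<(\w+)\b[^>]*>"       # opening tag, name captured in group 1
    r"((?:(?!<\1\b).)*?)"   # content that never starts another <name
    r"</\1>",               # matching closing tag
    re.DOTALL,              # let . cross newlines, like the s flag
)

innermost = [m.group(2).strip() for m in INNERMOST.finditer(SAMPLE)]
print(innermost)
```

Outer tags fail to match because their content runs into a nested opening tag before any closing tag, so only the innermost elements are reported.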

Match values in pairs from Html using RegEx

I need to use Regex only to extract the following output:
Match 1: (Group 1: Packaged Quantity) (Group 2: 1)
Match 2: (Group 1: Width) (Group 2: 14.7 cm)
Given the following input:
<li>
<div class="col-3"> Packaged Quantity </div>
<div class="col-5"> 1 </div>
</li>
<li>
<div class="col-3"> Width </div>
<div class="col-5"> 14.7 cm </div>
</li>
So far I have tried:
(?<=class=\"col-3\">)[^<]+|(?<=class=\"col-5\">)[^<]+
This gives me 4 different matches, but I want two matches with two groups in each match. I know I could use XPath to do the same, but I am limited to regex by constraints that I can't comment on.
You can match the col-3"> at the start, then capture non-< characters for the first group, match </div> followed by non-> characters, and capture non-< characters again for the second group:
col-3">([^<]+)<\/div>[^>]+>([^<]+)
https://regex101.com/r/YAZFvV/1
(that said, if at all possible, it would be better to use a proper HTML parser for this sort of thing)
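A quick way to check the pattern is to run it over the question's input. This sketch assumes Python's re module; the surrounding whitespace in each group is stripped afterwards:

```python
import re

HTML = """<li>
<div class="col-3"> Packaged Quantity </div>
<div class="col-5"> 1 </div>
</li>
<li>
<div class="col-3"> Width </div>
<div class="col-5"> 14.7 cm </div>
</li>"""

# Same pattern as above; the slash needs no escaping in Python.
PAIR = re.compile(r'col-3">([^<]+)</div>[^>]+>([^<]+)')

# One match per <li>, with the label in group 1 and the value in group 2.
pairs = [(m.group(1).strip(), m.group(2).strip()) for m in PAIR.finditer(HTML)]
print(pairs)
```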

Regex lookahead and behind?

So I have an unordered list that looks like:
<ul class='radio' id='input_16_5'>
<li>
<input name='input_5' type='radio' value='location_1' id='choice_16_5_0' />
<label for='choice_16_5_0' id='label_16_5_0'>Location 1</label></li>
<li>
<input name='input_5' type='radio' value='location_2' id='choice_16_5_1' />
<label for='choice_16_5_1' id='label_16_5_1'>Location 2</label></li>
<li>
<input name='input_5' type='radio' value='location_3' id='choice_16_5_2' />
<label for='choice_16_5_2' id='label_16_5_2'>Location 3</label></li>
</ul>
I would like to pass a value (e.g. location_2) to a regular expression that will then capture the whole list item that it's a part of, in order to remove it. So if I pass it location_2, it should match from the <li> to the </li> (both included) of the list item that it's in.
I can match up to the end of the list item with /location_3.+?(?=<li|<\/ul)/ but is there something I can do to also match the part before it without capturing other items?
This should get what you want
<li>(?:(?!<li>)[\S\s])+location_1[\S\s]+?<\/li>
Explanation
<li>: opening li tag,
(?:(?!<li>)[\S\s])+: matches any characters, including newlines, and uses a negative lookahead to make sure the match does not run across another <li> tag,
location_1: the keyword used to pick out the whole <li> element,
[\S\s]+?: any characters, including newlines (thanks to @Tensibai for the comment that made this regex simpler by using a non-greedy quantifier),
<\/li>: closing li tag.
DEMO: https://regex101.com/r/cU4eC6/5
Additional information:
/<li>(?:(?!<li>).)+location_2.+?<\/li>/s
This regex also works; it uses the s modifier to handle newlines instead of [\S\s]. (Thanks again to @Tensibai.)
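To show the removal itself, here is a minimal Python sketch built around the second pattern. The helper name remove_item is hypothetical, and re.escape is added on the assumption that the passed value might contain regex metacharacters:

```python
import re

LIST_HTML = """<ul class='radio' id='input_16_5'>
<li>
<input name='input_5' type='radio' value='location_1' id='choice_16_5_0' />
<label for='choice_16_5_0' id='label_16_5_0'>Location 1</label></li>
<li>
<input name='input_5' type='radio' value='location_2' id='choice_16_5_1' />
<label for='choice_16_5_1' id='label_16_5_1'>Location 2</label></li>
<li>
<input name='input_5' type='radio' value='location_3' id='choice_16_5_2' />
<label for='choice_16_5_2' id='label_16_5_2'>Location 3</label></li>
</ul>"""


def remove_item(html: str, value: str) -> str:
    """Drop the single <li>...</li> whose content mentions `value`."""
    pattern = re.compile(
        r"<li>(?:(?!<li>).)+"   # stay inside one item: never cross another <li>
        + re.escape(value)      # the value to look for, taken literally
        + r".+?</li>\n?",       # up to the nearest closing tag
        re.DOTALL,              # equivalent of the s modifier
    )
    return pattern.sub("", html, count=1)


trimmed = remove_item(LIST_HTML, "location_2")
print(trimmed)
```

Note that a value such as location_1 would also match inside location_10; matching the full value='...' attribute would be stricter.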

Parse specific div from raw text using regex?

I'm in a situation that requires parsing raw HTML data as a string; unfortunately this is unavoidable, otherwise I wouldn't post this. I only need a regex to match the class of a div that has an img tag as a child.
So this is the code example that I'm dealing with:
<div class="summary">
<h3>Example</h3>
<div class="explanation">
<span>This serves as an example for the site.</span>
</div>
<div class="user-details">
mheathershaw<br>
<img src="res/badge522.png"/> <span class="score">522</span>
</div>
<div class="help">
Help
</div>
</div>
And the div that I'd like to retrieve the class from is the div that contains the image. The exact capture from this example that I'd like (optimally) is user-details. The criteria for capturing it is simply if it has <img ... /> as a child.
Anyone able to help? Thanks!
You may try this,
/<div\b[^>]*\bclass="([^"]*)"[^>]*>(?:(?!<\/div>)[\s\S])*?<img\b[^>]*>(?:(?!<\/div>)[\s\S])*?<\/div>/
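A small check of this pattern against the question's markup, sketched in Python with the re module. The pattern is unchanged apart from dropping the slash escapes, which Python does not need:

```python
import re

RAW = """<div class="summary">
<h3>Example</h3>
<div class="explanation">
<span>This serves as an example for the site.</span>
</div>
<div class="user-details">
mheathershaw<br>
<img src="res/badge522.png"/> <span class="score">522</span>
</div>
<div class="help">
Help
</div>
</div>"""

DIV_WITH_IMG = re.compile(
    r'<div\b[^>]*\bclass="([^"]*)"[^>]*>'  # opening div, class in group 1
    r"(?:(?!</div>)[\s\S])*?"              # content up to the image, no </div> crossed
    r"<img\b[^>]*>"                        # the image tag
    r"(?:(?!</div>)[\s\S])*?"              # rest of the content
    r"</div>"                              # first closing div after the image
)

classes = DIV_WITH_IMG.findall(RAW)
print(classes)
```

Because the lookahead stops at the first </div>, an image that sits after a nested, already closed div would not be found by this approach.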

Templating regex versus DOM nodes; Modifying attributes?

I've asked a few questions as of recent on this topic, and whether answered or not, I've been learning a fair amount about the tech involved. In any case;
I've been reworking a templating engine I had created previously, moving the parsing engine from being regular expression driven, to node (XML) driven. For comparison's sake here are the two:
Regex driven:
<body>
<!-- {{ region:myRegion }} -->
<div class="myClass">
<h1>{{ var:myHeading format:trim[200] }}</h1>
</div>
<!-- {{ region:myRegion }} -->
</body>
Node driven:
<body>
<zuq:region name="myRegion">
<div class="myClass">
<h1>
<zuq:data name="myHeading">
<zuq:format type="trim">
<zuq:param name="length" value="200" />
</zuq:format>
</zuq:data>
</h1>
</div>
</zuq:region>
</body>
Now while much more verbose, I figure the node driven approach here is preferred, giving much more flexibility for situations like formatting, where multiple format nodes can be inserted and processed in order of appearance.
Anyways, my problem lies in attributes. With the regex driven approach, if I want to have a template generated value in an attribute, it's as simple as:
<a href="page.php?param={{ var:myParam }}">Link</a>
I'm trying to figure out how to incorporate a clean implementation of generating attribute values, while keeping the documents well formed. Something to consider is again, the formatting options, among other possible elements that the parser would read as modifiers to data.
Any ideas?
<a>
<zuq:attr name="href">page.php?param=<zuq:data name="myParam" /></zuq:attr>
Link
</a>
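How a parser might consume such an attribute node can be sketched as follows. This is only an illustration in Python with xml.etree.ElementTree, not the engine from the question: the namespace URI, the render_attrs helper, and the values mapping are all hypothetical, and the zuq prefix is taken from the question's templates.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:zuq"  # hypothetical namespace URI for the zuq prefix
TEMPLATE = f"""<body xmlns:zuq="{NS}">
<a><zuq:attr name="href">page.php?param=<zuq:data name="myParam" /></zuq:attr>Link</a>
</body>"""


def render_attrs(root, values):
    """Turn each attr child into a real attribute on its parent element."""
    for parent in list(root.iter()):
        for attr in parent.findall(f"{{{NS}}}attr"):
            # Build the value from literal text and data placeholders.
            parts = [attr.text or ""]
            for child in attr:
                if child.tag == f"{{{NS}}}data":
                    parts.append(str(values[child.get("name")]))
                parts.append(child.tail or "")
            parent.set(attr.get("name"), "".join(parts).strip())

            # Keep the text that followed the attr node before removing it.
            tail = attr.tail or ""
            siblings = list(parent)
            index = siblings.index(attr)
            if index == 0:
                parent.text = (parent.text or "") + tail
            else:
                previous = siblings[index - 1]
                previous.tail = (previous.tail or "") + tail
            parent.remove(attr)


root = ET.fromstring(TEMPLATE)
render_attrs(root, {"myParam": 42})
link = root.find("a")
print(ET.tostring(link, encoding="unicode"))
```

Format nodes could be handled in the same loop over the attr node's children, applied in order of appearance.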