XPath descendant and descendant-or-self work completely differently - python-2.7

I am trying to find every second td among the descendants of the div with the specified id, i.e. 22 and 222. The first solution that came to mind was:
//div[@id='indicator']//td[2]
but it selects only the first such table cell, i.e. 22, not both 22 and 222.
Then I replaced // with /descendant-or-self::node()/ and got the same result (obviously). But when I removed '-or-self', the XPath expression started to work as expected:
test1 = test_tree.xpath(u"//div[@id='indicator']/descendant-or-self::node()/td[2]")
print len(test1) # prints 1 (first one: 22)
test1 = test_tree.xpath(u"//div[@id='indicator']/descendant::node()/td[2]")
print len(test1) # prints 2 (22 and 222)
Here is the test HTML:
<html>
<body>
<div id='indicator'>
<table>
<tbody>
<tr>
<th>1</th>
<th>2</th>
<th>3</th>
</tr>
<tr>
<td>11</td>
<td>22</td>
<td>33</td>
</tr>
<tr>
<td>111</td>
<td>222</td>
<td>333</td>
</tr>
</tbody>
</table>
</div>
</body>
</html>
I'm wondering why the two expressions don't behave identically, since all the tds are descendants of the div element whether or not the div itself is included.

I think you have found a bug in your XPath processor.
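For comparison, here is a minimal sketch of what a conforming evaluation returns for the question's markup. It uses the standard-library xml.etree.ElementTree instead of the asker's XPath processor; ElementTree supports only a subset of XPath (no explicit axes such as descendant::), so only the abbreviated // form is shown. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The test markup from the question, condensed.
HTML = """<html><body><div id='indicator'><table><tbody>
<tr><th>1</th><th>2</th><th>3</th></tr>
<tr><td>11</td><td>22</td><td>33</td></tr>
<tr><td>111</td><td>222</td><td>333</td></tr>
</tbody></table></div></body></html>"""

root = ET.fromstring(HTML)

# td[2] is tested against each parent separately,
# so every row that has a second td contributes one cell.
second_cells = [td.text for td in root.findall(".//div[@id='indicator']//td[2]")]
print(second_cells)
```

Both rows contribute their second cell here, which is the result the question expected from the // form.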

I think I've found the cause of this issue:
http://www.w3.org/TR/xpath20/#id-errors-and-opt
"In some cases, a processor can determine the result of an expression without accessing all the data that would be implied by the formal expression semantics. For example, the formal description of filter expressions suggests that $s[1] should be evaluated by examining all the items in sequence $s, and selecting all those that satisfy the predicate position()=1. In practice, many implementations will recognize that they can evaluate this expression by taking the first item in the sequence and then exiting."
So there is no remedy. The behavior depends on the XPath processor implementation; however, I still don't understand why //div[@id='indicator']/descendant-or-self::node()/td[2] and //div[@id='indicator']/descendant::node()/td[2] produce different results.

I created a web page containing the HTML you provided in your question.
When you use this XPath:
.//div[@id='indicator']//tr/td[2]
it works as expected and the result is:
[u'<td>22</td>', u'<td>222</td>']
However, according to your comment, you were asking why .//td[2] doesn't work. The reason is that .//td gives you a list of all the tds in your DOM. Adding an index such as [2] will result in the second td in that list.
To sum up:
These are the results of applying .//td and .//td[2] respectively:
and if you want to take the text inside these tds, you should add /text() as the following:
Update:
The OP said:
So why then //div[@id='indicator']/descendant::node()/td[2] produces ['22', '222']? According to your comment: "Adding an index such as [2] will result in the second td in that list" it should populate only ['22'].
I will try to explain what is going on here:
descendant::node() is not equivalent to //.
The equivalent of // is /descendant-or-self::node()/.
This is explained in the W3C specification:
I hope this code could help you:
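A minimal sketch of the distinction (not the answerer's original code), assuming Python's standard-library xml.etree.ElementTree and a condensed copy of the question's table. Under the XPath specification, the predicate in .//td[2] applies to the td children of each parent, whereas (.//td)[2] indexes the whole flattened list. ElementTree has no parenthesised form, so the second case is emulated by indexing the Python list.

```python
import xml.etree.ElementTree as ET

ROWS = """<table>
<tr><td>11</td><td>22</td><td>33</td></tr>
<tr><td>111</td><td>222</td><td>333</td></tr>
</table>"""

root = ET.fromstring(ROWS)

# Positional predicate: the second td within each parent row.
per_parent = [td.text for td in root.findall(".//td[2]")]

# Emulates (.//td)[2]: the second item of the flattened list of all tds.
flattened_second = root.findall(".//td")[1].text

print(per_parent)
print(flattened_second)
```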

Is there documentation for uncommon arguments in R Shiny renderTable?

I am using the solution to the following question:
How can symbols be used in a Shiny table header?
My question: does anyone know where there might be some reference material for the uncommon arguments? I've looked at the R documentation and have come up short.
I'm referring to arguments such as 'include.colnames' and 'add.to.row' from @Minnow's code in the answer to the original question. Here is the code:
output$mytable2 <- renderTable({mytable()},include.colnames=FALSE,
add.to.row = list(pos = list(0),
command = " <tr> <th> &#931 </th> <th> σ</th> <th> ẟ</th> <th> 🂡</th> <th> ☺ </th> </tr>" ))
Any breadcrumbs are appreciated!
Yes, there is more documentation inside shiny and the packages it uses, but it's a bit hidden. If you look at help(renderTable), you see that besides the documented arguments there is a ... argument. This means that the function passes further arguments on to the functions it calls. It is specified that renderTable will pass these additional arguments to xtable::xtable() and xtable::print.xtable(). So it's a good idea to look at those help pages, and indeed you will find the documentation for add.to.row there.

XSLT 2.0 moving a node (created in first step of a multi-step transformation)

XML and the XSLT 2.0 files for this question are found at https://xsltfiddle.liberty-development.net/6qVRKwX/3
I am trying to 'move' an element ahead of outputting a section of HTML. This element was created during the first part of the transformation, which uses @mode to insert footnote numbers into the text. The first mode, fn-add-marker, creates <fn-marker/> to hold the footnote number. The second mode, number, then inserts incremented footnote numbers. All of this works fine (through to line 52 and then after line 68 in the XSLT fiddle).
Now I need to 'move' an element into the sibling element that spawned it in the mode above. I've combined this with the HTML output: the final idea is that element <tei:seg> is transformed into HTML <p> such that this :
<seg type="dep_event">text</seg><fn-marker>incremented no.</fn-marker>
now becomes this HTML (where seg = p, and fn-marker = sup):
<p>text<sup>incremented no.</sup></p>
ie. where a condition is met, the footnote is brought inside a sibling element to be contained in <p>.
The code I inserted (below) works for 3 of 4 needed steps to accomplish this move. It seems the code associated with step 3 does not locate a value in <fn-marker/>. But if I remove all this, the value is in fact there! It makes me think this is a problem of modes.
The code below does this:
1. output each instance of <tei:seg type="dep_event"> into a <p> (works)
2. create the <sup> inside the <seg> that meets the sibling condition (works)
3. copy the text() content of <fn-marker> into the <sup> that meets the sibling condition (does not work)
4. destroy the old <fn-marker>1</fn-marker> (works)
Referring to line numbers at https://xsltfiddle.liberty-development.net/6qVRKwX/3:
line 56-63:
<xsl:template match="tei:seg[@type='dep_event']">
<p>
<xsl:apply-templates/>
<xsl:if test="following-sibling::node()[1][self::tei:fn-marker]">
<!-- next line of code does not find a value in /text() -->
<sup><xsl:value-of select="./following-sibling::node()[1][self::tei:fn-marker/]text()"/></sup>
</xsl:if>
</p>
</xsl:template>
line 66:
<xsl:template match="tei:fn-marker[preceding-sibling::node()[1][self::tei:seg[@type='dep_event']]]"/>
Thanks in advance.

pugixml: selecting nodes fails

I'm using pugixml to parse the following xml:
<td class="title">
<div class="random" />
Link1
</td>
<td class="title">
<div class="random" />
Link2
</td>
etc...
I want the value of every 'a href' in a td class ="title" (which appears an indeterminate number of times) but only the first such instance.
I am using the following code to try and get these values:
pugi::xpath_node_set link_nodes = list_doc.select_nodes("//td[@class='title']");
for (pugi::xpath_node_set::const_iterator it = link_nodes.begin(); it != link_nodes.end(); ++it)
{
    pugi::xpath_node single_link_node = *it;
    std::cout << single_link_node.node().select_single_node("//a").node().attribute("href").value() << std::endl;
}
which doesn't seem to work (it produces output the expected number of times, but with a value that doesn't even seem to appear within that element).
Thanks.
"//a" selects all "a" nodes in the document; you probably meant ".//a" that selects all "a" nodes in the subtree.
You can also use one XPath expression instead of multiple:
//td[@class='title']//a[1]
This selects the first a tag for each td, i.e. [1] applies only to //a, not to the full expression.
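The difference between searching from the document root and searching relative to the current node can be sketched outside pugixml as well. The following uses Python's standard-library xml.etree.ElementTree purely as an illustration; the href values and the wrapping table/tr elements are hypothetical, since the question's markup is not shown in full.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the href values and the wrapping elements are made up.
DOC = """<table><tr>
<td class="title"><div class="random"/><a href="/link1">Link1</a></td>
<td class="title"><div class="random"/><a href="/link2">Link2</a></td>
</tr></table>"""

root = ET.fromstring(DOC)
cells = root.findall(".//td[@class='title']")

# Searching from the document root inside the loop
# returns the same first link every time.
from_root = [root.find(".//a").get("href") for _ in cells]

# Searching relative to each cell returns that cell's own link.
from_cell = [td.find(".//a").get("href") for td in cells]

print(from_root)
print(from_cell)
```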

How to use contains() with a set of strings in XSLT

I have the following XML snippet:
<figure customer="ABC DEF">
<image customer="ABC"/>
<image customer="XYZ"/>
</figure>
I'd like to check if the figure element's customer attribute contains the customer attributes of the image elements.
<xsl:if test="contains(@customer, image/@customer)">
...
</xsl:if>
I get an error saying:
a sequence of more than one item is not allowed as the second argument of contains
It's important to note that I cannot tell the values of the customer attributes in advance, thus using xsl:choose is not an option here.
Is it possible to solve this without using xsl:for-each?
In XSLT 2.0 you can use:
test="image/@customer/contains(../../@customer, .) = true()"
and you will get a true() result if any of them are true. Actually, that leads me to suggest:
test="some $cust in image/@customer satisfies contains(@customer, $cust)"
but that won't address the situation where one customer string is a substring of another customer string.
Therefore, perhaps this is best:
test="tokenize(@customer,'\s+') = image/@customer"
... as that will do a string-by-string comparison and give you true() if any of the tokenized values of the figure attribute is equal to one of the image attributes.
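The substring pitfall can be illustrated outside XSLT. This is a rough Python analogue, not XSLT code: substring_match mimics the contains()-based tests and token_match mimics the tokenize()-based comparison. Both function names and the "ABCD EF" value are hypothetical.

```python
def substring_match(figure_value, candidates):
    # Analogue of contains(): true if any candidate occurs anywhere in the string.
    return any(candidate in figure_value for candidate in candidates)


def token_match(figure_value, candidates):
    # Analogue of tokenize(...) = ...: true if any whitespace-separated token
    # equals one of the candidates exactly.
    return bool(set(figure_value.split()) & set(candidates))


image_customers = ["ABC", "XYZ"]

print(substring_match("ABC DEF", image_customers))
print(token_match("ABC DEF", image_customers))

# "ABC" is only a prefix of the token "ABCD", so the substring test
# still matches while the token test does not.
print(substring_match("ABCD EF", image_customers))
print(token_match("ABCD EF", image_customers))
```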

C++, subtract certain strings?

This is homework, so I hope you will guide me to the solution rather than give me the answer or code directly.
My problem is that I have this XXX.html file, which contains thousands of lines of code. What I need is to extract this portion:
<html>
...
<table>
<thead>
<tr>
<th class="xxx">xxx</th>
<th>xxx</th> <th>xxx</th> </tr>
</thead>
<tbody>
<tr class=xxx>
<td class="xxx"><a href="xxx" >ZZZ ZZ ZZZ</a></td>
<td>ZZZZ</td> <td class="xxx">ZZZZ</td> </tr> <tr class=xxx>
<td class="xxx"><a href="xxx" >ZZZ ZZ ZZZ</a></td>
<td>ZZZZ</td> <td class="xxx">ZZZZ</td> </tr> <tr class=xxx>
<td class="xxxx"><a href="xxxx" >ZZZ ZZ ZZZ</a></td>
<td>ZZZZ</td> <td class="xxxx">zzzz</td> </tr> <tr class=xxx>
<td class="xxx"><a href="xxxx" >ZZZ ZZ ZZZ</a></td>
... and so on
This is my code so far:
// after opening the file
while(!fileOpened.eof()){
    getline(fileOpened, reader);
    if(reader.find("ZZZ")){
        cout << reader << endl;
    }
}
"reader" is a string variable that holds each line of the HTML file. The value of ZZZZ is live data and will change, so what method should I use instead of find? (I am really sorry for not mentioning this earlier.)
Instead of displaying the value that I want, it displays some other portions of the HTML file. Why? Is my method wrong? If it is, how do I extract the ZZZZZ value?
std::string::find does not return a boolean value. It returns an index into the string where the substring match occurs if it is successful, else it returns std::string::npos.
So you would want to say:
if (reader.find("ZZZ") != std::string::npos){
    cout << reader << endl;
}
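The same trap exists in other languages whose find returns an index. As a rough analogue (Python, not C++), str.find returns -1 on failure instead of std::string::npos, and using the index as a truth value fails in the same way: a miss counts as true and a match at position 0 counts as false. The helper names below are hypothetical.

```python
def buggy_contains(line, needle):
    # Mirrors `if (reader.find("ZZZ"))`: the returned index is used as a boolean.
    return bool(line.find(needle))


def fixed_contains(line, needle):
    # Mirrors the corrected test: compare against the "not found" sentinel.
    return line.find(needle) != -1


print(buggy_contains("ZZZ at the start", "ZZZ"))  # match at index 0 is treated as false
print(buggy_contains("no match here", "ZZZ"))     # a miss (-1) is treated as true
print(fixed_contains("ZZZ at the start", "ZZZ"))
print(fixed_contains("no match here", "ZZZ"))
```

This is why the original loop printed lines that did not contain the search string.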
In general using string matching just won't work to extract values from an HTML file. A proper HTML parser would be required -- they are available for C++ as standard code.
Otherwise I'd suggest using a regex library (boost::regex until C++0x comes out). You'll be able to write better expressions to capture the part of the file you are interested in.
Reading by line probably won't work, since an HTML file could be one large line. Outputting each line you find would then simply emit the entire file. So try regexes, look for small sections of the code, and output those. The regex library will have a "match all" command (I forgot the exact name).
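The "match all" idea can be sketched as follows. This uses Python's re module rather than boost::regex, so it shows the approach and not the C++ API; the pattern is an assumption that only suits simple, non-nested markup like the sample in the question.

```python
import re

# One long line, as an HTML file may well be.
html = ('<tr class=xxx> <td class="xxx"><a href="xxx" >ZZZ ZZ ZZZ</a></td> '
        '<td>ZZZZ</td> <td class="xxx">ZZZZ</td> </tr>')

# Capture everything between an opening <td ...> and the next </td>.
cell_pattern = re.compile(r"<td[^>]*>(.*?)</td>")
raw_cells = cell_pattern.findall(html)

# Remove any tags left inside a cell, such as the <a> wrapper.
cell_texts = [re.sub(r"<[^>]+>", "", cell).strip() for cell in raw_cells]

print(cell_texts)
```

Because the pattern works on the whole string, it does not depend on where the line breaks fall.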
The skeleton code for reading lines from a file should look like this:
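// file is assumed to be an already-opened std::ifstream
if (!file.good())
    throw "opening file failed!";
for (;;) {
    std::string line;
    // getline fails only when nothing could be extracted, so a final
    // line without a trailing newline is still processed
    if (!std::getline(file, line))
        break;
    // reading succeeded, process line
}
if (!file.eof()) {
    // error before reaching EOF
}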
(That funny-looking loop is one that checks for the ending condition in the middle of the loop. There is no such construct in C++, so you have to use an endless loop with a break in the middle.)
However, as I said in a comment to your question, reading HTML code line by line isn't necessarily useful, as HTML doesn't rely on specific whitespace.