How to get text from a preceding node using xpath - xslt

I am an experienced developer, but I am something of a newbie to XPath and XSLT, which my company has suddenly decided to adopt. I am hoping someone can help me with a challenge.
If I have XML like this:
<tr layoutcode="" type="categoryhead" level="2">
<td colname="caption">Alex</td> <!-- (a) -->
</tr>
<tr layoutcode="" type="categoryhead" level="3">
<td colname="caption">Miscellaneous</td>
</tr>
<tr layoutcode="" type="detail" level="4">
<td colname="caption">Something</td>
</tr>
<tr layoutcode="" type="detail" level="4">
<td colname="caption">This is a test</td> <!-- (b) -->
</tr>
...and I am on the 'This is a test' node (b), how do I get the text from (a), i.e. 'Alex'?
All I know is that I am looking for the first preceding 'tr' node with 'type' = 'categoryhead' and 'level' = 2, and then I want the text of its 'td' child. I guess I need the right XPath query so that I can assign it to a variable.
Many thanks in advance,
Alex

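The path ../preceding-sibling::tr[@type = 'categoryhead' and @level = 2][1]/td[@colname = 'caption'] should select the td element. Because preceding-sibling is a reverse axis, [1] picks the nearest matching row above the current one. To assign its text to a variable, use the path in the select attribute of xsl:variable, wrapping it in string() if you want the text rather than the node.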
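To make the axis logic concrete, here is a small Python sketch that mimics that XPath with the standard library. This is an emulation, not the answer's method: xml.etree.ElementTree only supports a subset of XPath and has no preceding-sibling axis, so the sketch walks the earlier rows backwards by hand. The <table> wrapper and the function name nearest_preceding_caption are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# The question's rows, wrapped in a hypothetical <table> root so they parse.
ROWS = """<table>
<tr layoutcode="" type="categoryhead" level="2"><td colname="caption">Alex</td></tr>
<tr layoutcode="" type="categoryhead" level="3"><td colname="caption">Miscellaneous</td></tr>
<tr layoutcode="" type="detail" level="4"><td colname="caption">Something</td></tr>
<tr layoutcode="" type="detail" level="4"><td colname="caption">This is a test</td></tr>
</table>"""


def nearest_preceding_caption(table, current_caption, level="2"):
    """Emulate ../preceding-sibling::tr[@type='categoryhead' and @level=2][1]/td[@colname='caption']."""
    rows = table.findall("tr")
    for i, row in enumerate(rows):
        cell = row.find("td[@colname='caption']")
        if cell is not None and cell.text == current_caption:
            # preceding-sibling is a reverse axis, so [1] is the nearest row
            # above; walking the earlier rows backwards gives the same order.
            for prev in reversed(rows[:i]):
                if prev.get("type") == "categoryhead" and prev.get("level") == level:
                    return prev.find("td[@colname='caption']").text
            return None  # no matching category row above
    return None  # current caption not found


table = ET.fromstring(ROWS)
print(nearest_preceding_caption(table, "This is a test"))  # Alex
```

In XSLT itself none of this is needed; the single XPath expression above does the same walk.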

Related

XPath - Retrieving a text value when the condition contains a tag

I have a section of a table, and I am trying to get the value "Distributor 10":
<table class="d">
<tr>
<td class="ah">supplier<td>
<td class="ad">
Supplier 10
</td>
</tr>
<tr>
<td class="ah">distributor<pre><td>
<td class="ad">
Distributor 10
</td>
</tr>
</table>
In Chrome Developer Tools, I get this value with the following XPath:
//tr/td[text()="distributor"]/following-sibling::td[@class="ad"]/a/text()
But when I run it from Python, it returns an empty list. As far as I can tell, this is because of the <pre> tag next to "distributor".
When I change the XPath above to look for "supplier" instead of "distributor", it works perfectly.
Any suggestions would be welcome.
Assuming you're using lxml, you can use either of the following XPath expressions:
//tr[contains(.,"distributor")]//a/text()
//a[parent::td[@class="ad"] and starts-with(@href,"/D")]/text()
Code:
from lxml import etree
from io import StringIO
html = '''<table class="d">
<tr>
<td class="ah">supplier<td>
<td class="ad">
Supplier 10
</td>
</tr>
<tr>
<td class="ah">distributor<pre><td>
<td class="ad">
Distributor 10
</td>
</tr>
</table>'''
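# HTMLParser tolerates the broken markup (stray <td>, <pre>)
parser = etree.HTMLParser()
tree = etree.parse(StringIO(html), parser)
# Match the whole row by its string value, so the <pre> in the label cell doesn't matter
data = tree.xpath('//tr[contains(.,"distributor")]//a/text()')
print(data)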
Output: ['Distributor 10']
Alternatively, use the lxml.html Cleaner class (its remove_tags option) to remove the pre element from your page.
References:
https://lxml.de/api/lxml.html.clean.Cleaner-class.html
https://lxml.de/lxmlhtml.html#cleaning-up-html
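The first expression works because contains(., ...) tests the row's string value, which is all of its descendant text joined together, so an inline <pre> cannot hide the label. The sketch below illustrates that idea with the standard library's ElementTree and itertext() instead of lxml. It assumes a cleaned-up, well-formed copy of the table, and the <a> elements and their href values are hypothetical, since the sample above lost its links.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed stand-in for the question's table; the <a> links
# and their hrefs are invented for illustration.
TABLE = """<table class="d">
<tr><td class="ah">supplier</td><td class="ad"><a href="/S10">Supplier 10</a></td></tr>
<tr><td class="ah">distributor<pre/></td><td class="ad"><a href="/D10">Distributor 10</a></td></tr>
</table>"""


def link_texts_in_row_containing(table, needle):
    """Rough analogue of //tr[contains(., needle)]//a/text()."""
    results = []
    for row in table.iter("tr"):
        # A node's string value is all of its descendant text, so the
        # <pre> element inside the label cell does not break the match.
        if needle in "".join(row.itertext()):
            results.extend(a.text.strip() for a in row.iter("a") if a.text)
    return results


table = ET.fromstring(TABLE)
print(link_texts_in_row_containing(table, "distributor"))  # ['Distributor 10']
```

Like XPath's contains(), the Python in operator is case-sensitive, which is why "distributor" does not also match "Supplier 10" or pick up the capitalized "Distributor 10" by accident.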

Multidimensional list in Thymeleaf (Java) - List<List<Object>>

Does anyone have a suggestion for how to iterate through a multidimensional list in Thymeleaf?
My multidimensional list is built as follows:
@Override
public List<List<PreferredZone>> findZonesByPosition(List<Position> positionList) {
    List<PreferredZone> prefZone = new ArrayList<>();
    List<List<PreferredZone>> listPrefZone = new ArrayList<>();
    long positionId = 0;
    for (int i = 0; i < positionList.size(); i++) {
        positionId = positionList.get(i).getPositionId();
        prefZone = prefZoneDAO.findFilteredZone(positionId);
        listPrefZone.add(prefZone);
    }
    return listPrefZone;
}
In my controller, I add it as an attribute:
List<List<PreferredZone>> prefZoneList = prefZoneService.findZonesByPosition(positionList);
model.addAllAttributes(prefZoneList);
Finally, I try to iterate over this two-dimensional list in an HTML table:
<table th:each="prefList :#{prefZoneList}" class="table table-striped display hover">
<thead>
<tr>
<th>ISO</th>
<th>Name</th>
<th>Ausschluss</th>
</tr>
</thead>
<!-- Loop over the data -->
<tr th:each="row, iterState :${prefList}" class="clickable-row">
<td th:text="${row[__${iterState.index}__]}.zoneIso"></td>
<td th:text="${row[__${iterState.index}__]}.zoneName"></td>
<td style="text-align:center;">
<input type="checkbox" th:value="${${row[__${iterState.index}__]}.zoneId}" id="zone" class="checkbox-round" />
</td>
</tr>
</table>
However, it doesn't work, and I have no other ideas for how to solve this.
I need a multidimensional list because I have a table with multiple records, and each record contains a button that opens a modal window. Each of these windows contains an HTML table in which I have to display the records.
Do you have any suggestions?
There are mistakes in #{prefZoneList} and (as noted in the comments) in how iterState.index is used.
Try this:
<table th:each="prefList : ${prefZoneList}" class="table table-striped display hover">
<thead>
<tr>
<th>ISO</th>
<th>Name</th>
<th>Ausschluss</th>
</tr>
</thead>
<tr th:each="row : ${prefList}" class="clickable-row">
<td th:text="${row.zoneIso}"></td>
<td th:text="${row.zoneName}"></td>
<td style="text-align:center;">
<input type="checkbox" th:value="${row.zoneId}" id="zone" class="checkbox-round" />
</td>
</tr>
</table>
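The #{...} syntax is for message expressions (externalized text), not for model attributes; read model attributes with ${...}.
iterState.index is the current iteration index, starting at 0. It is only needed when indexing into a list, e.g. ${prefList[__${iterState.index}__].element}, where element is a field of the objects in prefList. Inside th:each, row is already the current object, so you can access its fields directly.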

regular expression: what's wrong with my expression?

I'm having difficulty building a regex.
Suppose there is an HTML snippet like the one below.
I want to use JavaScript to extract each <tbody> block that contains the "apple" link (the <a> is inside the <td class="by">).
I built the following expression:
/<tbody.*?text[\s\S]*?<td class="by"[\s\S]*?<a.*?>apple<\/a>[\s\S]*?<\/tbody>/g
But the result is not what I wanted: each match contains more than one <tbody> block. What should the expression be?
(I tested it on https://regex101.com/ and got the unexpected selection. I can't figure out the problem.)
<tbody id="text_0">
<td class="by">
...lots of other tags
cat
...lots of other tags
</td>
</tbody>
<tbody id="text_1">
...lots of other tags
<td class="by">
apple
</td>
...lots of other tags
</tbody>
<tbody id="text_2">
...lots of other tags
<td class="by">
cat
</td>
...lots of other tags
</tbody>
<tbody id="text_3">
...lots of other tags
<td class="by">
...lots of other tags
tiger
</td>
...lots of other tags
</tbody>
<tbody id="text_4">
<td class="by">
banana
</td>
</tbody>
<tbody id="text_5">
<td class="by">
peach
</td>
</tbody>
<tbody id="text_6">
<td class="by">
apple
</td>
</tbody>
<tbody id="text_7">
<td class="by">
banana
</td>
</tbody>
And this is what I expect to get:
<tbody id="text_1">
<td class="by">
apple
</td>
</tbody>
<tbody id="text_6">
<td class="by">
apple
</td>
</tbody>
This is not an answer to the regex part of the question, but shouldn't the td elements be embedded in tr elements? tr stands for "table row", while tbody stands for "table body". tbody usually groups the table rows. It is not prohibited to have more than one tbody in the same table, but it is usually not necessary. (tbody is actually optional; you can have tr directly inside the table element.)
First, regex is not a good tool for parsing HTML or XML.
I can fix your pattern to work with this specific example but I can't guarantee that it will work in all cases. Regex just is not the right tool for the job.
But anyway, replace the first two instances of [\s\S] in your pattern with [^<]:
/<tbody.*?text[^<]*?<td class="by"[^<]*?<a.*?>apple<\/a>[\s\S]*?<\/tbody>/g
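Why the original pattern over-matches: a lazy [\s\S]*? only keeps each match as short as possible from the position where the match starts. A match that starts at text_0 is still allowed to run past text_0's </tbody> until it finds an apple link. The sketch below reproduces this in Python's re, which treats these constructs the same way as JavaScript here. It also shows one alternative to the [^<] fix, swapped in by me and not taken from the answers: a "tempered" token, (?:(?!</tbody>)[\s\S]), which consumes any character except the start of </tbody>, so a match can never leave its block. The <a> links in the sample are hypothetical, because the HTML above lost its anchor tags.

```python
import re

# Trimmed stand-in for the question's HTML; the <a> links are hypothetical.
HTML = """<tbody id="text_0">
<td class="by">
<a href="#">cat</a>
</td>
</tbody>
<tbody id="text_1">
<td class="by">
<a href="#">apple</a>
</td>
</tbody>
<tbody id="text_2">
<td class="by">
<a href="#">banana</a>
</td>
</tbody>
<tbody id="text_6">
<td class="by">
<a href="#">apple</a>
</td>
</tbody>"""

# The question's pattern, without JavaScript's /.../g delimiters and \/ escapes.
NAIVE = re.compile(
    r'<tbody.*?text[\s\S]*?<td class="by"[\s\S]*?<a.*?>apple</a>[\s\S]*?</tbody>'
)

# Tempered token: never consumes the "<" that starts "</tbody>", so each
# match stays inside the <tbody> it began in.
STAY_IN_BLOCK = r'(?:(?!</tbody>)[\s\S])*?'
TEMPERED = re.compile(
    r'<tbody[^>]*>' + STAY_IN_BLOCK + r'<td class="by"' + STAY_IN_BLOCK
    + r'>apple</a>[\s\S]*?</tbody>'
)


def first_ids(pattern):
    """Return the id of the first <tbody> inside each match."""
    return [re.search(r'id="(text_\d+)"', m).group(1) for m in pattern.findall(HTML)]


print(first_ids(NAIVE))     # ['text_0', 'text_2']  (each match spans two blocks)
print(first_ids(TEMPERED))  # ['text_1', 'text_6']
```

JavaScript regexes also support the (?!...) lookahead, so the same tempered group should carry over, with / written as \/ inside a regex literal. It is still a regex over HTML, so the caveats in these answers apply.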
Start with this working regexp and go from there:
/<a href="(.*?)">apple<\/a>/g
If that is too broad and you want to make it more specific, add the next surrounding tag:
/<td.*?>\s*<a href="(.*?)">apple<\/a>/g
Then continue:
/<tbody.*?>\s*<td.*?>\s*<a href="(.*?)">apple<\/a>/g
Also, consider an alternative such as XPath. Regular expressions can't really handle all variations of HTML.
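As an illustration of the parser-based route, here is a minimal Python sketch using the standard library's ElementTree. It assumes the fragment has been made well-formed by wrapping it in a <table> root, and the <a> links are hypothetical (the sample above lost them). A full XPath 1.0 engine, such as a browser's document.evaluate, would accept //tbody[td[@class='by']/a = 'apple'] directly. ElementTree's XPath subset cannot express that nested predicate, so the sketch filters in Python instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed version of the question's fragment.
FRAGMENT = """<table>
<tbody id="text_0"><td class="by"><a href="#">cat</a></td></tbody>
<tbody id="text_1"><td class="by"><a href="#">apple</a></td></tbody>
<tbody id="text_4"><td class="by"><a href="#">banana</a></td></tbody>
<tbody id="text_6"><td class="by"><a href="#">apple</a></td></tbody>
</table>"""

root = ET.fromstring(FRAGMENT)

# [@class='by'] filters on an attribute; [.='apple'] (Python 3.7+) compares
# an element's full text content.
apple_blocks = [
    tbody
    for tbody in root.findall("tbody")
    if tbody.find("td[@class='by']/a[.='apple']") is not None
]

print([tbody.get("id") for tbody in apple_blocks])  # ['text_1', 'text_6']
```

Because the parser knows where each <tbody> ends, there is no way for a match to leak into the next block.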

Validating html-style table data with XSLT

I need to check XML containing HTML-style table data to ensure that it is "rectangular". For example, this is rectangular (2x2):
<table>
<tr>
<td>Foo</td>
<td>Bar</td>
</tr>
<tr>
<td>Baz</td>
<td>Qux</td>
</tr>
</table>
This is not:
<table>
<tr>
<td>Foo</td>
<td>Bar</td>
</tr>
<tr>
<td>Baz</td>
</tr>
</table>
This is complicated by row and column spans, and by the fact that I need to accept two styles of markup: either spanned cells are included as empty td elements, or spanned cells are omitted.
<!-- good (3x2), spanned cells included -->
<table>
<tr>
<td colspan="2">Foo</td>
<td/>
<td rowspan="2">Bar</td>
</tr>
<tr>
<td>Baz</td>
<td>Qux</td>
<td/>
</tr>
</table>
<!-- also good (3x2), spanned cells omitted -->
<table>
<tr>
<td colspan="2">Foo</td>
<td rowspan="2">Bar</td>
</tr>
<tr>
<td>Baz</td>
<td>Qux</td>
</tr>
</table>
Here are some examples of bad tables, where it is ambiguous how to deal with them:
<!-- bad, looks like spanned cells are included but more cells in row 1 than 2 -->
<table>
<tr>
<td colspan="2">Foo</td>
<td/>
<td rowspan="2">Bar</td>
<td>BAD</td>
</tr>
<tr>
<td>Baz</td>
<td>Qux</td>
<td/>
</tr>
</table>
<!-- bad, looks like spanned cells are omitted but more cells in row 1 than 2 -->
<table>
<tr>
<td colspan="2">Foo</td>
<td rowspan="2">Bar</td>
<td>BAD</td>
</tr>
<tr>
<td>Baz</td>
<td>Qux</td>
</tr>
</table>
<!-- bad, can't tell if spanned cells are included or omitted -->
<table>
<tr>
<td colspan="2">Foo</td>
<td rowspan="2">Bar</td>
</tr>
<tr>
<td>Baz</td>
<td>Qux</td>
<td/>
</tr>
</table>
<!-- bad, looks like spanned cells are omitted but a non-empty cell is overspanned -->
<table>
<tr>
<td colspan="2">Foo</td>
<td rowspan="2">Bar</td>
</tr>
<tr>
<td>Baz</td>
<td>Qux</td>
<td>BAD</td>
</tr>
</table>
I already have a working XSLT 2.0 solution for this problem that normalizes the data to the "spanned cells included" style and then validates it. However, it is cumbersome and starts to perform poorly for tables with more than 1000 cells. My normalization and validation routines iterate sequentially over the cells, passing along a parameter holding the cells that should be created by spans and inserting them when I reach their coordinates in the table. I'm not happy with either routine.
I'm looking for suggestions for cleverer ways to achieve this validation, ideally with better performance on large tables. I need to account for both th and td, but I omitted th from the examples for the sake of simplicity; answers can include or ignore it. I'm not checking whether thead, tbody, and/or tfoot have the same width; this can also be included or omitted. I'm currently using XSLT 2.0, but I'd be interested in 3.0 solutions if they were significantly better than a 2.0 implementation.
I don't think this kind of problem is suited to XSLT, especially if you have to process very large tables.
I'd suggest developing a solution in a procedural language, perhaps using XSLT to pre- or post-process the XML.
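As a rough illustration of the procedural approach, here is a Python sketch that checks both markup styles with an explicit grid, touching each grid slot a constant number of times, so the work grows roughly linearly with table area. The acceptance rule is my assumption and not the question's: a table passes if it is rectangular under at least one interpretation. It agrees with every good and bad example in the question. The function names are hypothetical, th is treated like td, and thead/tbody/tfoot are flattened into one grid.

```python
import xml.etree.ElementTree as ET


def _span(cell, name):
    # Missing colspan/rowspan attributes default to 1, as in HTML.
    return int(cell.get(name, "1"))


def _is_empty(cell):
    return len(cell) == 0 and not (cell.text or "").strip()


def _rows(table):
    # th is treated exactly like td; all tr elements form one grid.
    return [[c for c in tr if c.tag in ("td", "th")] for tr in table.iter("tr")]


def rectangular_spans_included(table):
    """Every grid slot is written out; slots covered by a span must be empty."""
    rows = _rows(table)
    width = len(rows[0]) if rows else 0
    if any(len(cells) != width for cells in rows):
        return False
    for r, cells in enumerate(rows):
        for c, cell in enumerate(cells):
            rs, cs = _span(cell, "rowspan"), _span(cell, "colspan")
            if r + rs > len(rows) or c + cs > width:
                return False  # span runs off the edge of the grid
            for dr in range(rs):
                for dc in range(cs):
                    if (dr, dc) != (0, 0) and not _is_empty(rows[r + dr][c + dc]):
                        return False  # a non-empty cell is overspanned
    return True


def rectangular_spans_omitted(table):
    """HTML-style layout: slots covered by a span are not written out."""
    rows = _rows(table)
    occupied = [set() for _ in rows]
    for r, cells in enumerate(rows):
        col = 0
        for cell in cells:
            while col in occupied[r]:  # skip slots taken by rowspans from above
                col += 1
            rs, cs = _span(cell, "rowspan"), _span(cell, "colspan")
            if r + rs > len(rows):
                return False  # rowspan runs off the bottom of the table
            for dr in range(rs):
                for dc in range(cs):
                    if col + dc in occupied[r + dr]:
                        return False  # two cells claim the same slot
                    occupied[r + dr].add(col + dc)
            col += cs
    width = len(occupied[0]) if occupied else 0
    return all(slots == set(range(width)) for slots in occupied)


def is_rectangular(table):
    # Assumed rule: rectangular under at least one of the two markup styles.
    return rectangular_spans_included(table) or rectangular_spans_omitted(table)


EXAMPLES = {
    "good, spans included": (
        '<table><tr><td colspan="2">Foo</td><td/><td rowspan="2">Bar</td></tr>'
        '<tr><td>Baz</td><td>Qux</td><td/></tr></table>'
    ),
    "good, spans omitted": (
        '<table><tr><td colspan="2">Foo</td><td rowspan="2">Bar</td></tr>'
        '<tr><td>Baz</td><td>Qux</td></tr></table>'
    ),
    "bad, can't tell": (
        '<table><tr><td colspan="2">Foo</td><td rowspan="2">Bar</td></tr>'
        '<tr><td>Baz</td><td>Qux</td><td/></tr></table>'
    ),
}

for name, markup in EXAMPLES.items():
    print(f"{name}: {is_rectangular(ET.fromstring(markup))}")
```

This prints True, True and False for the three samples. Note that the "spans included" example also lays out as a 4x2 rectangle under the omitted-style reading, which is why the sketch accepts either interpretation rather than requiring exactly one.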

jmeter grab value from response data

I have a question about grabbing a certain value from the HTML response data in JMeter.
I've tried both the Regular Expression Extractor and the XPath Extractor (see below), with no luck.
This is part of the response data I receive:
<table border="0" cellpadding="2" cellspacing="1" style="border-collapse: collapse" id="AutoNumber2" bordercolorlight="#999999" bordercolordark="#999999" width="100%">
<tr>
<td class="head" align="center" colspan="2">Routing Sheet</td>
</tr>
<tr class="altrow">
<td align="right" width="50%" class="formtext">Today's Date:</td>
<td valign="top" width="50%" class="formtext">06/19/2012</td>
</tr>
<tr class="altrow">
<td align="right" width="50%" class="formtext"> HCSC Received Date:</td>
<td valign="top" width="50%" class="formtext">06/19/2012</td>
</tr>
<tr class="tablerow">
<td align="right" width="50%" class="formtext"> Package Log Date:</td>
<td valign="top" width="50%" class="formtext">06/19/2012 04:21PM</td>
</tr>
<tr class="altrow">
<td align="right" width="50%" class="formtext"> Group Specialist:</td>
<td valign="top" width="50%" class="formtext">WATTS, JOHN</td>
</tr>
<tr class="tablerow">
<td align="right" width="50%" class="formtext"> Case Underwriter:</td>
<td valign="top" width="50%" class="formtext">N/A</td>
</tr>
<tr class="altrow">
<td align="right" width="50%" class="formtext"> Medical Underwriter:</td>
<td valign="top" width="50%" class="formtext">N/A</td>
</tr>
<tr class="tablerow">
<td align="right" width="50%" class="formtext">Case Number:</td>
<td valign="top" width="50%" class="formtext">7402628</td>
</tr>
And I'm trying to grab the case number.
I have been trying the regex extractor:
Case Number:</td><td valign="top" width="50%" class="formtext">(.+?)</td>
But I got a null value back.
And for xpath extractor I tried this:
//table[@id='AutoNumber2']/tbody/tr[8]/td[2]
but it's not working either.
I've been thinking of using BeanShell to grab the source code as a string and parse out the number.
Is there a better way of grabbing that number?
And how can I use BeanShell to grab the source code of the response data?
I tried an XPath of /html but had no luck.
Thanks a lot
Try this; I tested it on your sample and it works:
Let me know if that works for you
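Try this XPath (the raw response has no tbody element; browsers insert one into the DOM, so XPaths copied from browser tools often include it):
//table[@id='AutoNumber2']/tr[8]/td[2]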
If you are using the XPath Extractor to parse an HTML (not XML) response, make sure the Use Tidy (tolerant parser) option is checked.
Your XPath query should then return the value you want to extract.
You can also refine your XPath query, e.g.:
//table[@id='AutoNumber2']/tbody/tr[td/text()='Case Number:']/td[2]/text()
To use BeanShell for parsing, look into this: Using jmeter variables in xpath extractor.
You can test your XPath query first in another tool, for example one of these Firefox add-ons:
XPath Checker
XPather
XPath Finder
You can use the View Results Tree listener to test and tweak your regular expression against your actual response data.
To find out what happens at runtime, use a Debug component (Debug Sampler or Debug PostProcessor).
At first glance, your expression doesn't match because it is missing the newline that follows Case Number:</td> in the response.
See here for special characters that match a newline.
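To illustrate the newline problem, here is a small Python sketch. It uses Python's re only as a stand-in for JMeter's extractor. The question's pattern requires </td><td with nothing in between, while the response has a line break there. Adding \s* (any run of whitespace, including newlines) fixes that, and <td[^>]*> also stops the pattern from depending on the exact attribute list. I am assuming \s behaves the same in the Perl-style syntax JMeter's Regular Expression Extractor uses, so it is worth confirming with the RegExp Tester in View Results Tree.

```python
import re

# The relevant rows of the response above, with the line break intact.
RESPONSE = """<tr class="tablerow">
<td align="right" width="50%" class="formtext">Case Number:</td>
<td valign="top" width="50%" class="formtext">7402628</td>
</tr>"""

# The question's pattern expects the two cells to be adjacent.
original = r'Case Number:</td><td valign="top" width="50%" class="formtext">(.+?)</td>'

# \s* absorbs the newline (and any indentation) between the cells.
tolerant = r'Case Number:</td>\s*<td[^>]*>(.+?)</td>'

print(re.search(original, RESPONSE))           # None
print(re.search(tolerant, RESPONSE).group(1))  # 7402628
```

In the extractor, group 1 is what the template $1$ refers to.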