Extra characters in XSLT output - xslt

I'm delving into XML and XSLT and am trying to generate a basic, tabular web page. The basic layout of the table seems to be OK, but I'm getting a column of " characters before my two-column table (which itself is in the second column of the web page). This is shown below:
There are exactly as many " characters as there are elements in the XML file this is built from. The code that I think is causing the problem is listed below:
<tbody>
<xsl:for-each select="command">
<tr>
<td width="50%">
<xsl:value-of select="description"/>
</td>"
<td width="50%">
<xsl:value-of select="TLC"/>
</td>
</tr>
</xsl:for-each>
</tbody>
Is the character being generated on each xsl:for-each iteration? In case the snippet above looks fine, I'll include the entire XSLT file below. Feel free to tell me how clunky my code looks, as I'm coming from a firmware and .NET background.
Thanks.
EDIT: Removed the full body of the code, since the answer (so obvious that I should smack myself) doesn't involve it.

The problem is right here:
</td>"
<td width="50%">
You're inserting a " that is not inside any cell, so the browser displays it outside the table, once for every row generated by the xsl:for-each.
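To see where the quote comes from, here is a small Python sketch using the standard-library xml.etree.ElementTree (Python has no built-in XSLT processor, so this only inspects the stylesheet fragment rather than running it). It reports any non-whitespace text sitting directly inside a <tr> but outside its cells. An XSLT processor strips whitespace-only text nodes from the stylesheet but copies any other literal text to the output, so that stray " is written once per xsl:for-each iteration. The <root> wrapper and the stray_row_text helper are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET

# The snippet from the question. The <root> wrapper (hypothetical) only exists
# to declare the xsl prefix so ElementTree can parse the fragment.
SNIPPET = """<root xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<tbody>
<xsl:for-each select="command">
<tr>
<td width="50%">
<xsl:value-of select="description"/>
</td>"
<td width="50%">
<xsl:value-of select="TLC"/>
</td>
</tr>
</xsl:for-each>
</tbody>
</root>"""


def stray_row_text(xml_text):
    """Return non-whitespace text that sits directly inside a <tr>,
    i.e. outside every <td>/<th>."""
    root = ET.fromstring(xml_text)
    found = []
    for tr in root.iter("tr"):
        # ElementTree keeps text before the first child in .text and the
        # text after each child in that child's .tail.
        pieces = [tr.text or ""] + [child.tail or "" for child in tr]
        found.extend(p.strip() for p in pieces if p.strip())
    return found


print(stray_row_text(SNIPPET))                             # ['"']
print(stray_row_text(SNIPPET.replace('</td>"', "</td>")))  # []
```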

The code that I think is causing the problem is listed below:
<tbody>
<xsl:for-each select="command">
<tr>
<td width="50%">
<xsl:value-of select="description"/>
</td>"
<td width="50%">
<xsl:value-of select="TLC"/>
</td>
</tr>
</xsl:for-each>
</tbody>
This is obvious: just remove the " character at the end of the 6th line of the snippet above (</td>").

Related

How to get text from a preceding node using xpath

I am an experienced developer, but I am something of a newbie to XPath and XSLT, which my company has suddenly decided to adopt, and I am hoping someone can help me with a challenge I have.
If I have XML like this:
<tr layoutcode="" type="categoryhead" level="2">
<td colname="caption">Alex</td> (a)
</tr>
<tr layoutcode="" type="categoryhead" level="3">
<td colname="caption">Miscellaneous</td>
</tr>
<tr layoutcode="" type="detail" level="4">
<td colname="caption">Something</td>
</tr>
<tr layoutcode="" type="detail" level="4">
<td colname="caption">This is a test</td> (b)
</tr>
...and I am on the 'This is a test' node (b), how do I get the text from (a), i.e. 'Alex'?
All I know is that I am looking for the first preceding 'tr' node with the attributes 'type' = 'categoryhead' and 'level' = 2, and then I want the text of its 'td' child node. I guess I am looking for the right XPath query so I can assign it to a variable.
Many thanks in advance,
Alex
Starting from the td node (b), the path ../preceding-sibling::tr[@type = 'categoryhead' and @level = 2][1]/td[@colname = 'caption'] should select the td element.
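Python's standard-library xml.etree.ElementTree supports only a subset of XPath and has no preceding-sibling axis, so this sketch emulates the suggested path by walking backwards through the earlier siblings of the current row. Because preceding-sibling is a reverse axis, the [1] predicate picks the nearest matching row, which is exactly what a backwards walk finds first. The <table> wrapper and the preceding_caption helper are assumptions for illustration; in an XSLT stylesheet you would use the XPath expression directly, e.g. in the select of an xsl:variable.

```python
import xml.etree.ElementTree as ET

# The rows from the question (with the missing ">" added), wrapped in a
# hypothetical <table> element.
ROWS = """<table>
<tr layoutcode="" type="categoryhead" level="2"><td colname="caption">Alex</td></tr>
<tr layoutcode="" type="categoryhead" level="3"><td colname="caption">Miscellaneous</td></tr>
<tr layoutcode="" type="detail" level="4"><td colname="caption">Something</td></tr>
<tr layoutcode="" type="detail" level="4"><td colname="caption">This is a test</td></tr>
</table>"""


def preceding_caption(parent, row, type_="categoryhead", level="2"):
    """Emulate ../preceding-sibling::tr[@type=... and @level=...][1]
    /td[@colname='caption'] for the given row."""
    siblings = list(parent)
    # preceding-sibling is a reverse axis, so [1] means the nearest match:
    # scan the earlier siblings from closest to farthest.
    for candidate in reversed(siblings[:siblings.index(row)]):
        # XPath compares @level = 2 numerically; a string comparison is
        # enough for this data.
        if (candidate.tag == "tr" and candidate.get("type") == type_
                and candidate.get("level") == level):
            cell = candidate.find("td[@colname='caption']")
            return None if cell is None else cell.text
    return None


table = ET.fromstring(ROWS)
current = next(tr for tr in table.iter("tr")
               if tr.findtext("td") == "This is a test")
print(preceding_caption(table, current))  # Alex
```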

Find a table's last cell by regular expression

I want to use a regular expression (PCRE-compatible) to select a table cell in an XML or HTML file. The cell spans several lines containing other elements with their attributes and values, and it is supposed to be in the last column.
For various reasons I can't, and don't want to, use the ". matches newline" option.
For example, in this code:
EDITED:
<table colcount="4">
<tr>
<td colspan="2">
<para><text> Mike</text></para>
</td>
<td>
<tab />
</td>
<td1>
<para><text>Jack</text></para>
<para><text>Sarah</text></para>
</td>
</tr1>
<tr>
<td>
<para><text>Bob</text></para>
<para><text>Rita</text></para>
</td>
<td2 colspan="3" with>
<para><text>Helen</text></para>
</td>
</tr2>
<tr>
<td style="with:445px;">
<para><text>Sam</text></para>
</td>
<td>
<para><text>Emma</text></para>
<para><text>George</text></para>
</td>
<td>
</td>
<td3 colspan="">
<tab />
</td>
</tr3>
</table>
/EDITED
I want to find and select the whole last cell, including its start and end tags (<td and </td>), together with the end tag of the corresponding row (</tr>), that is:
EDITED:
Here is what I want to select in a table like the one above, using a regex:
Either from <td1 to </tr1> - or from <td2 to </tr2> - or from <td3 to </tr3>
/EDITED
The formatting (indentation and newlines) has to be preserved; I can't, for example, put </tr> directly in front of the cell's closing tag (</td>).
Indentation uses only space characters.
Thanks for any help...
The best you can do with a regex is:
<td(([^<]|<(?!\/td>))*)<\/td>\s*<\/tr>(?!(.|\r|\n)*<tr)
But this is kinda ugly and resource-intensive, and it breaks when you have nested tables. A better route is to use an XML or HTML parser in whichever programming language you're using.
If you want to select the last cell from EVERY row, as your updated question suggests, leave out the negative lookahead like so:
<td(([^<]|<(?!\/td>))*)<\/td>\s*<\/tr>
Working example here: http://refiddle.com/gt2
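A quick way to check both patterns is Python's re module (an assumption: Python's engine is not PCRE, but these patterns only use lookahead, alternation and character classes, which behave the same here). The negated class [^<] also matches newlines, which is why neither pattern needs the ". matches newline" option. The sample is a trimmed version of the table in the question, with the numbered marker tags (td1, tr1, ...) written as plain td/tr; the matches helper is hypothetical.

```python
import re

# Both patterns from the answer, unchanged apart from being raw strings.
EVERY_ROW = re.compile(r"<td(([^<]|<(?!\/td>))*)<\/td>\s*<\/tr>")
LAST_ROW_ONLY = re.compile(r"<td(([^<]|<(?!\/td>))*)<\/td>\s*<\/tr>(?!(.|\r|\n)*<tr)")

# Trimmed sample: the numbered marker tags from the question are plain td/tr here.
SAMPLE = """<table colcount="4">
  <tr>
    <td colspan="2">
      <para><text> Mike</text></para>
    </td>
    <td>
      <para><text>Jack</text></para>
      <para><text>Sarah</text></para>
    </td>
  </tr>
  <tr>
    <td>
      <para><text>Bob</text></para>
    </td>
    <td colspan="3">
      <para><text>Helen</text></para>
    </td>
  </tr>
</table>"""


def matches(pattern, text):
    """All full matches of pattern in text, in order."""
    return [m.group(0) for m in pattern.finditer(text)]


for m in matches(EVERY_ROW, SAMPLE):
    print(m, end="\n---\n")
```

Each match keeps the original line breaks and indentation, since the regex only selects text and never rewrites it.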

Extract attribute value from html element

Been struggling with this for a couple of hours now...
I have the following regex:
(?<=\bdata-video-id=""."">)(.*?)(title=.*?>)
The following input:
<div class="cameras">
<table class="results">
<colgroup>
<col class="col0">
<col class="col1">
</colgroup>
<thead>
<tr>
<th title="Name">
Name
</th>
<th title="Date">
Date
</th>
</tr>
</thead>
<tbody>
<tr data-video-id="1">
<td title="149 - Cam123">
149 - Cam123
</td>
<td title="Feb 18 2013">
Feb 18 2013
</td>
</tr>
<tr data-video-id="2">
<td title="150 - Cam456">
150 - Cam456
</td>
<td title="Feb 18 2013">
Feb 18 2013
</td>
</tr>
</tbody>
</table>
</div>
The regex outputs this:
<td title="149 - Cam123">
<td title="150 - Cam456">
But what I'd like to get is the contents of the title attribute of the first cell in every table row:
149 - Cam123
150 - Cam456
The number of rows may obviously vary but the number of columns is fixed.
Please help me fine-tune the above regex.
Thanks
NOTE: The solution MUST be a regular expression. I do not have access to the code base therefore an HTML parser or any other kind of code intervention is not possible. The only way I can hook into the application is by injecting a different regex.
Given the OP's requirement that it MUST be a regex, my suggestion would be to wrap the title value in a capturing group:
(?<=\bdata-video-id=""."">).*?title="(.*?)">
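Here is the suggested regex in Python, where findall returns just the captured title. Two assumptions: the doubled quotes ("") in the question look like string-literal escaping (as in a C# verbatim string), so the pattern below uses a single ", and DOTALL is set so .*? can cross the line break between <tr> and its first <td> (your engine may need its own "singleline" option for the same reason). Note that the single . inside the lookbehind only admits one-character ids; Python's re requires a fixed-width lookbehind, so it cannot simply be widened to .+ here.

```python
import re

# Assumption: "" in the question is string-literal escaping for a single ".
# DOTALL lets .*? cross the newline between <tr ...> and the first <td>.
FIRST_TITLE = re.compile(r'(?<=\bdata-video-id=".">).*?title="(.*?)">', re.DOTALL)

HTML = """<tbody>
<tr data-video-id="1">
<td title="149 - Cam123">
149 - Cam123
</td>
<td title="Feb 18 2013">
Feb 18 2013
</td>
</tr>
<tr data-video-id="2">
<td title="150 - Cam456">
150 - Cam456
</td>
<td title="Feb 18 2013">
Feb 18 2013
</td>
</tr>
</tbody>"""

# With one capturing group, findall returns only the title text.
print(FIRST_TITLE.findall(HTML))  # ['149 - Cam123', '150 - Cam456']
```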
Otherwise, the general solution is to not use a regex:
Why are you using a regex? Because of the complexity of HTML tags, the typical solution is to use an HTML parser.
Here is an SO question about this topic.
Here is another, even more popular answer about using regex on XHTML, which Jeff Atwood pointed out in this blog post.
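If a parser were an option, Python's standard-library html.parser is enough for this markup. The sketch below (the FirstCellTitles class is hypothetical) records the title attribute of the first td in every tr that carries a data-video-id attribute, so ids of any length and arbitrary whitespace between tags are handled without any regex.

```python
from html.parser import HTMLParser


class FirstCellTitles(HTMLParser):
    """Collect the title of the first <td> in each <tr data-video-id=...>."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._waiting_for_cell = False  # inside a video row, before its first <td>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            # Header rows have no data-video-id, so they are ignored.
            self._waiting_for_cell = "data-video-id" in attrs
        elif tag == "td" and self._waiting_for_cell:
            self._waiting_for_cell = False
            if attrs.get("title") is not None:
                self.titles.append(attrs["title"])


HTML = """<table class="results">
<thead><tr><th title="Name">Name</th><th title="Date">Date</th></tr></thead>
<tbody>
<tr data-video-id="1"><td title="149 - Cam123">149 - Cam123</td><td title="Feb 18 2013">Feb 18 2013</td></tr>
<tr data-video-id="2"><td title="150 - Cam456">150 - Cam456</td><td title="Feb 18 2013">Feb 18 2013</td></tr>
</tbody>
</table>"""

parser = FirstCellTitles()
parser.feed(HTML)
print(parser.titles)  # ['149 - Cam123', '150 - Cam456']
```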

Validating html-style table data with XSLT

I need to be able to check XML with HTML-style table data to ensure that it's "rectangular". For example, this is rectangular (2x2):
<table>
<tr>
<td>Foo</td>
<td>Bar</td>
</tr>
<tr>
<td>Baz</td>
<td>Qux</td>
</tr>
</table>
This is not:
<table>
<tr>
<td>Foo</td>
<td>Bar</td>
</tr>
<tr>
<td>Baz</td>
</tr>
</table>
This is complicated by row and column spans, and by the fact that I need to accept two styles of markup: one where spanned cells are included as empty td elements, and one where spanned cells are omitted.
<!-- good (3x2), spanned cells included -->
<table>
<tr>
<td colspan="2">Foo</td>
<td/>
<td rowspan="2">Bar</td>
</tr>
<tr>
<td>Baz</td>
<td>Qux</td>
<td/>
</tr>
</table>
<!-- also good (3x2), spanned cells omitted -->
<table>
<tr>
<td colspan="2">Foo</td>
<td rowspan="2">Bar</td>
</tr>
<tr>
<td>Baz</td>
<td>Qux</td>
</tr>
</table>
Here are a bunch of examples of bad tables where it's ambiguous how to deal with them:
<!-- bad, looks like spanned cells are included but more cells in row 1 than 2 -->
<table>
<tr>
<td colspan="2">Foo</td>
<td/>
<td rowspan="2">Bar</td>
<td>BAD</td>
</tr>
<tr>
<td>Baz</td>
<td>Qux</td>
<td/>
</tr>
</table>
<!-- bad, looks like spanned cells are omitted but more cells in row 1 than 2 -->
<table>
<tr>
<td colspan="2">Foo</td>
<td rowspan="2">Bar</td>
<td>BAD</td>
</tr>
<tr>
<td>Baz</td>
<td>Qux</td>
</tr>
</table>
<!-- bad, can't tell if spanned cells are included or omitted -->
<table>
<tr>
<td colspan="2">Foo</td>
<td rowspan="2">Bar</td>
</tr>
<tr>
<td>Baz</td>
<td>Qux</td>
<td/>
</tr>
</table>
<!-- bad, looks like spanned cells are omitted but a non-empty cell is overspanned -->
<table>
<tr>
<td colspan="2">Foo</td>
<td rowspan="2">Bar</td>
</tr>
<tr>
<td>Baz</td>
<td>Qux</td>
<td>BAD</td>
</tr>
</table>
I already have a working XSLT 2.0 solution for this problem that involves normalizing the data to the "spanned cells included" style and then validating. However, my solution is cumbersome and starts to perform poorly for tables with an area greater than 1000 cells. My normalization and validation routines iterate sequentially over the cells, passing along a param of the cells that should be created by spans and inserting them when I reach their coordinates in the table. I'm not happy with either of them.
I'm looking for suggestions for cleverer ways to achieve this validation, ideally with better performance on large tables. I need to account for both th and td, but I omitted th from the examples for simplicity; answers can include or ignore it. I'm not checking whether thead, tbody and/or tfoot have the same width; answers can include or omit that too. I'm currently using XSLT 2.0, but I'd be interested in 3.0 solutions if they were significantly better than a 2.0 implementation.
I don't think this kind of problem is suited to XSLT, especially if you have to process very large tables.
I'd suggest developing a solution in a procedural language, perhaps using XSLT to pre- or post-process the XML.
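As a rough illustration of what a procedural check could look like, here is a Python sketch (Python is my choice; nothing in the question prescribes a language) that places cells on an occupancy grid, similar in spirit to the HTML table model. Each style is tried separately: in the "spanned cells omitted" style, slots already covered by an earlier rowspan/colspan are skipped; in the "spanned cells included" style, each covered slot must hold an empty placeholder cell. A table is accepted if it fits either style with every row covering the same columns. Treating all tr elements as one grid (ignoring thead/tbody/tfoot boundaries), counting th and td alike, and treating missing or non-positive spans as 1 are my assumptions. Each grid slot is touched a bounded number of times, so the work grows roughly with the table's area.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def _spans(cell):
    """(colspan, rowspan); missing or non-positive values count as 1 (an assumption)."""
    def num(name):
        value = cell.get(name, "1").strip()
        return int(value) if value.isdigit() and int(value) > 0 else 1
    return num("colspan"), num("rowspan")


def _is_placeholder(cell):
    """An empty stand-in cell: no children, no text and no spans."""
    return len(cell) == 0 and not (cell.text or "").strip() and _spans(cell) == (1, 1)


def _fits(rows, spans_included):
    """Place the cells on a grid under one markup style and test the result."""
    occupied = set()  # (row, col) slots already claimed by some cell
    for r, row in enumerate(rows):
        c = 0
        for cell in row:
            if spans_included and (r, c) in occupied:
                # Slot covered by an earlier span: it must hold an empty placeholder.
                if not _is_placeholder(cell):
                    return False
                c += 1
                continue
            if not spans_included:
                while (r, c) in occupied:  # spanned slots are simply skipped
                    c += 1
            colspan, rowspan = _spans(cell)
            for dr in range(rowspan):
                for dc in range(colspan):
                    if (r + dr, c + dc) in occupied:
                        return False  # two cells claim the same slot
                    occupied.add((r + dr, c + dc))
            # Included style: the spanned slots to the right get their own placeholders.
            c += 1 if spans_included else colspan
    cols_by_row = defaultdict(set)
    for r, c in occupied:
        if r >= len(rows):
            return False  # a rowspan runs past the last row
        cols_by_row[r].add(c)
    width = max((c + 1 for _, c in occupied), default=0)
    if any(cols_by_row[r] != set(range(width)) for r in range(len(rows))):
        return False
    # Included style: every slot, spanned or not, has its own cell element.
    return not spans_included or all(len(row) == width for row in rows)


def is_rectangular(table_xml):
    """True if the table is rectangular in either the included or omitted style."""
    table = ET.fromstring(table_xml)
    rows = [[cell for cell in tr if cell.tag in ("td", "th")]
            for tr in table.iter("tr")]
    return _fits(rows, spans_included=True) or _fits(rows, spans_included=False)


print(is_rectangular('<table><tr><td colspan="2">Foo</td><td rowspan="2">Bar</td></tr>'
                     '<tr><td>Baz</td><td>Qux</td></tr></table>'))  # True
```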

Something about regular expression

If I want to get the current price (416.00) from the following content, what regexp can I use? There are other places in the web page with similar content, but the one I want has the word Discount a few lines after the current price. The values 416, 520 and 20% vary. Thanks.
<tr>
<td class="txt_11px_b_EB6495" width="50" nowrap>Current Price?</td>
<td class="txt_11px_b_EB6495">HK$ 416.00</td>
</tr>
<tr>
<td class="txt_11px_n_999999">Original price?</td>
<td class="txt_11px_n_999999">HK$ 520.00</td>
</tr>
<tr>
<td class="txt_9px_n_999999"> </td>
<td class="txt_9px_n_999999">Discount 20%</td>
</tr>
You can use
" (\d+\.\d*)</td>"
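That will match 520.00, 2.00 and 123.1, as well as "123." with a trailing dot; a bare 123 will not match, because \. makes the decimal point mandatory.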
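Running this pattern in Python (assuming the surrounding double quotes in the answer are string delimiters, not part of the pattern) shows that it captures every price cell, not only the one followed by Discount, and that \. makes the decimal point mandatory.

```python
import re

# The pattern from the answer, without the surrounding quotes.
PRICE_CELL = re.compile(r" (\d+\.\d*)</td>")

html = """<td class="txt_11px_b_EB6495">HK$ 416.00</td>
<td class="txt_11px_n_999999">HK$ 520.00</td>
<td class="txt_9px_n_999999">Discount 20%</td>"""

print(PRICE_CELL.findall(html))                   # ['416.00', '520.00']
print(PRICE_CELL.search(" 123.</td>") is None)    # False: trailing dot is fine
print(PRICE_CELL.search(" 123</td>") is None)     # True: the dot is required
```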
Use an HTML parser to get the text node, then extract the price using a regex.
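A minimal sketch of that two-step approach in Python, using the standard-library html.parser to collect the cell texts and re to pull out the number. To pick the right price it looks for a "Current Price" cell followed by a "Discount" cell within the next few cells; the CellTexts class, the discounted_current_price helper and the six-cell window are all assumptions for illustration.

```python
import re
from html.parser import HTMLParser


class CellTexts(HTMLParser):
    """Collect the text content of every <td>, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data


PRICE = re.compile(r"HK\$\s*(\d+(?:\.\d+)?)")


def discounted_current_price(html, window=6):
    """Price from the 'Current Price' row that has a 'Discount' cell
    within the next `window` cells (the window size is an assumption)."""
    parser = CellTexts()
    parser.feed(html)
    parser.close()
    cells = [text.strip() for text in parser.cells]
    for i in range(len(cells) - 1):
        if cells[i].startswith("Current Price"):
            if any("Discount" in c for c in cells[i + 1:i + 1 + window]):
                match = PRICE.search(cells[i + 1])
                if match:
                    return match.group(1)
    return None


HTML = """<tr>
<td class="txt_11px_b_EB6495" width="50" nowrap>Current Price?</td>
<td class="txt_11px_b_EB6495">HK$ 416.00</td>
</tr>
<tr>
<td class="txt_11px_n_999999">Original price?</td>
<td class="txt_11px_n_999999">HK$ 520.00</td>
</tr>
<tr>
<td class="txt_9px_n_999999"> </td>
<td class="txt_9px_n_999999">Discount 20%</td>
</tr>"""

print(discounted_current_price(HTML))  # 416.00
```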
You would use something like...
\d+(?:\.\d{2}|%)
I just tested it and it matched...
416.00
520.00
20%
I assumed (it was unclear to me) that you want both the prices and the percentage discount. I also matched the % so you can tell which matches are percentages.
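Running the pattern over the sample from the question in Python (other engines with non-capturing groups should behave the same) gives those three matches, with the % kept on the percentage:

```python
import re

SAMPLE = """<tr>
<td class="txt_11px_b_EB6495" width="50" nowrap>Current Price?</td>
<td class="txt_11px_b_EB6495">HK$ 416.00</td>
</tr>
<tr>
<td class="txt_11px_n_999999">Original price?</td>
<td class="txt_11px_n_999999">HK$ 520.00</td>
</tr>
<tr>
<td class="txt_9px_n_999999"> </td>
<td class="txt_9px_n_999999">Discount 20%</td>
</tr>"""

# Digits followed by either exactly two decimals or a percent sign.
AMOUNT = re.compile(r"\d+(?:\.\d{2}|%)")
print(AMOUNT.findall(SAMPLE))  # ['416.00', '520.00', '20%']
```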