XSLT/XPath - count adjacent nodes

I want to transform a table with row spans from DOCX XML to HTML. The problem is counting the number of cells that are spanned: I need the number of rows for the HTML attribute "rowspan".
In DOCX XML, the first cell of a row span is marked with <span-element/attribute>. The following spanned cells are marked only by <span-element>, without the attribute. So I need to count the <span-element>s that immediately follow the <span-element> with the attribute.
I need to count the cells in every column separately, since each column can have a different rowspan.
I also can't just count the total number of <span-element>s in a column, since a column can contain more than one row span.
I tried different approaches: I counted with <xsl:value-of select="count(following-sibling::w:tc[1][//w:vmerge])"/>, which works as long as there is only one row span. I also tried to group adjacent nodes with for-each-group, but I don't know how to count the nodes in a group.
Simplified structure of original code:
<tbl>
<tr>
<tc>...</tc>
<tc>...</tc>
<tc>...</tc>
<tc>...</tc>
</tr>
<tr>
<tc>
<span-element/start-attribute>
</tc>
<tc>...</tc>
<tc>...</tc>
<tc>...</tc>
</tr>
<tr>
<tc>
<span-element>
</tc>
<tc>...</tc>
<tc>...</tc>
<tc>...</tc>
</tr>
<tr>
<tc>...</tc>
<tc>...</tc>
<tc>...</tc>
<tc>...</tc>
</tr>
<tr>
<tc>
<span-element/start-attribute>
</tc>
<tc>
<span-element/start-attribute>
</tc>
<tc>...</tc>
<tc>...</tc>
</tr>
<tr>
<tc>
<span-element>
</tc>
<tc>
<span-element>
</tc>
<tc>...</tc>
<tc>...</tc>
</tr>
<tr>
<tc>
<span-element>
</tc>
<tc>...</tc>
<tc>...</tc>
<tc>...</tc>
</tr>
<tr>
<tc>...</tc>
<tc>...</tc>
<tc>...</tc>
<tc>...</tc>
</tr>
<tr>
<tc>...</tc>
<tc>...</tc>
<tc>...</tc>
<tc>...</tc>
</tr>
</tbl>
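The counting rule described above (per column, 1 plus the number of continuation cells directly below each start cell) can be sketched outside XSLT to check the expected numbers. This is a minimal Python sketch, not the XSLT solution: the labels "start"/"cont" are hypothetical stand-ins for <span-element/start-attribute> and <span-element>, and it assumes every tc sits in the same visual column as its index (no gridspan).

```python
def rowspans_by_column(grid):
    """Return {(row, col): rowspan} for every start cell in grid.

    grid[r][c] is "start" (span-element with the start attribute),
    "cont" (span-element without it) or None (ordinary cell).
    Each column is scanned separately; a start cell's rowspan is 1 plus
    the number of "cont" cells directly below it, so several spans in
    one column are counted independently.
    """
    spans = {}
    n_rows = len(grid)
    for col in range(len(grid[0]) if grid else 0):
        row = 0
        while row < n_rows:
            if grid[row][col] == "start":
                span = 1
                while row + span < n_rows and grid[row + span][col] == "cont":
                    span += 1
                spans[(row, col)] = span
                row += span  # skip the cells this span already covers
            else:
                row += 1
    return spans


# The simplified table from the question: rows 0-8, columns 0-3.
S, C, _ = "start", "cont", None
TABLE = [
    [_, _, _, _],
    [S, _, _, _],
    [C, _, _, _],
    [_, _, _, _],
    [S, S, _, _],
    [C, C, _, _],
    [C, _, _, _],
    [_, _, _, _],
    [_, _, _, _],
]
```

For the table above this yields rowspans 2, 3 and 2, which matches the wanted HTML result.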
My template:
<xsl:template match="tc">
<xsl:choose>
<xsl:when test="span-element[@start-attribute]">
<td>
<xsl:attribute name="rowspan">
<xsl:value-of select="count(following-sibling::tc[1][span-element])"/>
</xsl:attribute>
<xsl:value-of select="."/>
</td>
</xsl:when>
<xsl:when test="span-element[not(@start-attribute)]"/>
<xsl:otherwise>
<td>
<xsl:apply-templates/>
</td>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
The second <xsl:when> removes the cells that contain a "span-element" without the "start-attribute", as required in the wanted HTML result. The real document also contains gridspan, which I need to integrate later (this is why the 4th and 7th rows of the actual result below have one cell fewer than expected).
Actual result:
<!DOCTYPE HTML>
<table xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint" border="1">
<tr>
<th>Zeile 1, Zelle 1</th>
<th>Zeile 1, Zelle 2</th>
<th>Zeile 1, Zelle 3</th>
<th>Zeile 1, Zelle 4</th>
</tr>
<tr>
<td rowspan="3">Zeile 2 + 3, Zelle 1</td>
<td>Zeile 2, Zelle 2</td>
<td>Zeile 2, Zelle 3</td>
<td>Zeile 2, Zelle 4</td>
</tr>
<tr>
<td>Zeile 3, Zelle 2</td>
<td>Zeile 3, Zelle 3</td>
<td>Zeile 3, Zelle 4</td>
</tr>
<tr>
<td>Zeile 4, Zelle 1</td>
<td>Zeile 4, Zelle 2 + 3</td>
<td>Zeile 4, Zelle 4</td>
</tr>
<tr>
<td rowspan="3">Zeile 5 + 6 +7, Zelle 1</td>
<td rowspan="2">Zeile 5 + 6, Zelle 2</td>
<td>Zeile 5, Zelle 3</td>
<td>Zeile 5, Zelle 4</td>
</tr>
<tr>
<td>Zeile 6, Zelle 3</td>
<td>Zeile 6, Zelle 4</td>
</tr>
<tr>
<td>Zeile 7, Zelle 2 + 3</td>
<td>Zeile 7, Zelle 4</td>
</tr>
<tr>
<td>Zeile 8, Zelle 1</td>
<td>Zeile 8, Zelle 2</td>
<td>Zeile 8, Zelle 3</td>
<td>Zeile 8, Zelle 4</td>
</tr>
<tr>
<td>Zeile 9, Zelle 1</td>
<td>Zeile 9, Zelle 2</td>
<td>Zeile 9, Zelle 3</td>
<td>Zeile 9, Zelle 4</td>
</tr>
</table>
Wanted result:
<!DOCTYPE HTML>
<table xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint" border="1">
<tr>
<th>Zeile 1, Zelle 1</th>
<th>Zeile 1, Zelle 2</th>
<th>Zeile 1, Zelle 3</th>
<th>Zeile 1, Zelle 4</th>
</tr>
<tr>
<td rowspan="2">Zeile 2 + 3, Zelle 1</td>
<td>Zeile 2, Zelle 2</td>
<td>Zeile 2, Zelle 3</td>
<td>Zeile 2, Zelle 4</td>
</tr>
<tr>
<td>Zeile 3, Zelle 2</td>
<td>Zeile 3, Zelle 3</td>
<td>Zeile 3, Zelle 4</td>
</tr>
<tr>
<td>Zeile 4, Zelle 1</td>
<td>Zeile 4, Zelle 2 + 3</td>
<td>Zeile 4, Zelle 4</td>
</tr>
<tr>
<td rowspan="3">Zeile 5 + 6 +7, Zelle 1</td>
<td rowspan="2">Zeile 5 + 6, Zelle 2</td>
<td>Zeile 5, Zelle 3</td>
<td>Zeile 5, Zelle 4</td>
</tr>
<tr>
<td>Zeile 6, Zelle 3</td>
<td>Zeile 6, Zelle 4</td>
</tr>
<tr>
<td>Zeile 7, Zelle 2 + 3</td>
<td>Zeile 7, Zelle 4</td>
</tr>
<tr>
<td>Zeile 8, Zelle 1</td>
<td>Zeile 8, Zelle 2</td>
<td>Zeile 8, Zelle 3</td>
<td>Zeile 8, Zelle 4</td>
</tr>
<tr>
<td>Zeile 9, Zelle 1</td>
<td>Zeile 9, Zelle 2</td>
<td>Zeile 9, Zelle 3</td>
<td>Zeile 9, Zelle 4</td>
</tr>
</table>

In an XSLT/XPath 1.0 context, traversing the following axis is a better way to group adjacent nodes. The nodes in the current group are separated from the nodes in the following groups with set operations, using the Kaysian method.
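The Kaysian method relies on one fact: a node belongs to a node set exactly when adding it to that set does not change the set's size. A minimal Python sketch of that trick, assuming (purely for illustration) that nodes are modeled as integers in document order; the function names are hypothetical:

```python
def kaysian_intersection(ns1, ns2):
    # XPath 1.0: $ns1[count(. | $ns2) = count($ns2)]
    return {n for n in ns1 if len({n} | ns2) == len(ns2)}


def kaysian_difference(ns1, ns2):
    # XPath 1.0: $ns1[count(. | $ns2) != count($ns2)]
    return {n for n in ns1 if len({n} | ns2) != len(ns2)}


def rowspan_via_kaysian(followings, starts):
    """followings: document-order positions of the span-elements below a
    start cell in the same column; starts: those carrying the start attribute.
    """
    marks = sorted(n for n in followings if n in starts)
    # $mark | $mark/following::* restricted to $followings: the next start
    # marker and everything after it. With no marker this set is empty, so
    # every following span-element is counted.
    cut = {n for n in followings if marks and n >= marks[0]}
    return len(kaysian_difference(set(followings), cut)) + 1
```

For the first column of the sample, the start cell in row 1 sees span-elements at rows 2, 4, 5 and 6, with the next start marker at row 4, so only row 2 is counted and the rowspan is 2; the start cell in row 4 sees rows 5 and 6 and no further marker, giving 3.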
This stylesheet
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:strip-space elements="*"/>
<!-- identity transform: copy everything not matched below -->
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<!-- drop the start marker itself; its cell becomes a <td rowspan="..."> -->
<xsl:template match="span-element[@start-attribute]"/>
<!-- drop continuation cells: they are covered by the rowspan above them -->
<xsl:template match="tc[span-element[not(@start-attribute)]]"/>
<xsl:template match="tc">
<td>
<xsl:apply-templates select="." mode="rowspan">
<xsl:with-param name="position" select="position()"/>
</xsl:apply-templates>
<xsl:apply-templates select="@*|node()"/>
</td>
</xsl:template>
<!-- by default, a cell gets no rowspan attribute -->
<xsl:template match="node()" mode="rowspan"/>
<xsl:template match="tc[span-element/@start-attribute]" mode="rowspan">
<xsl:param name="position"/>
<!-- span-elements in the same column in all later rows -->
<xsl:variable
name="followings"
select="../following-sibling::tr/tc[$position]/span-element"/>
<!-- the next start marker in this column, i.e. the start of the next group -->
<xsl:variable
name="mark"
select="$followings[@start-attribute][1]"/>
<!-- count the followings that are neither $mark nor after it (Kaysian difference), plus this cell -->
<xsl:attribute name="rowspan">
<xsl:value-of
select="count($followings[count(.|$mark|$mark/following::*)
!=count($mark|$mark/following::*)]) + 1"/>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
With this well-formed input:
<tbl>
<tr>
<tc>...</tc>
<tc>...</tc>
<tc>...</tc>
<tc>...</tc>
</tr>
<tr>
<tc>
<span-element start-attribute=""/>
</tc>
<tc>...</tc>
<tc>...</tc>
<tc>...</tc>
</tr>
<tr>
<tc>
<span-element/>
</tc>
<tc>...</tc>
<tc>...</tc>
<tc>...</tc>
</tr>
<tr>
<tc>...</tc>
<tc>...</tc>
<tc>...</tc>
<tc>...</tc>
</tr>
<tr>
<tc>
<span-element start-attribute=""/>
</tc>
<tc>
<span-element start-attribute=""/>
</tc>
<tc>...</tc>
<tc>...</tc>
</tr>
<tr>
<tc>
<span-element/>
</tc>
<tc>
<span-element/>
</tc>
<tc>...</tc>
<tc>...</tc>
</tr>
<tr>
<tc>
<span-element/>
</tc>
<tc>...</tc>
<tc>...</tc>
<tc>...</tc>
</tr>
<tr>
<tc>...</tc>
<tc>...</tc>
<tc>...</tc>
<tc>...</tc>
</tr>
<tr>
<tc>...</tc>
<tc>...</tc>
<tc>...</tc>
<tc>...</tc>
</tr>
</tbl>
Output:
<tbl>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td rowspan="2"/>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td rowspan="3"/>
<td rowspan="2"/>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
</tbl>
Note: this assumes that the input source is correctly formatted (e.g. no overlapping spans).

Related

Show a List of value vertically in jsp page

<table class="table">
<thead>
<tr>
<th scope="col">Book Id</th>
<th scope="col">Book Name</th>
<th scope="col">Book Author</th>
</tr>
</thead>
<tbody>
<tr>
<td><c:forEach items="${searchedBook}" var="book">
${book.book_id}
</c:forEach></td>
</tr>
<tr>
<td><c:forEach items="${searchedBook}" var="book">
${book.book_name}
</c:forEach></td>
</tr>
<tr>
<td><c:forEach items="${searchedBook}" var="book">
${book.book_author}
</c:forEach></td>
</tr>
</tbody>
</table>
[Image: how the table looks now]
But I want it to look like this:
[Image: the way I want it to look]
Is there a way to make this possible?
You declared the forEach loop inside the <td> tag, so it prints all the data in a single <td>.
Here is the modified code:
<table class="table">
<thead>
<tr>
<th scope="col">Book Id</th>
<th scope="col">Book Name</th>
<th scope="col">Book Author</th>
</tr>
</thead>
<tbody>
<tr>
<c:forEach items="${searchedBook}" var="book">
<td>${book.book_id}</td>
</c:forEach>
</tr>
<tr>
<c:forEach items="${searchedBook}" var="book">
<td>${book.book_name}</td>
</c:forEach>
</tr>
<tr>
<c:forEach items="${searchedBook}" var="book">
<td>${book.book_author}</td>
</c:forEach>
</tr>
</tbody>
</table>
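The difference between the two JSP versions is only where the loop sits relative to the <td>. A small Python sketch that mimics both placements as string building; the values are hypothetical book ids, and JSP's whitespace output is simplified to a single space:

```python
def row_with_loop_inside_td(values):
    # Like <td><c:forEach ...>${v}</c:forEach></td>:
    # the loop runs inside one cell, so every value lands in that single <td>.
    return "<tr><td>" + " ".join(values) + "</td></tr>"


def row_with_td_inside_loop(values):
    # Like <c:forEach ...><td>${v}</td></c:forEach>:
    # a <td> is emitted on every iteration, so each value gets its own cell.
    return "<tr>" + "".join(f"<td>{v}</td>" for v in values) + "</tr>"
```

With the ids ["1", "2", "3"], the first gives one cell containing "1 2 3", the second gives three cells.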

Graphviz/ Table/ How to merge cells

I would like to draw a graph like this:
[Image: the desired graph]
I have Graphviz code like this:
digraph G {
"test" [
label = <<table border="0" cellspacing="0">
<tr>
<td port="f0" border="1" bgcolor="darkorange">TEST</td>
<td port="f1" border="1" bgcolor="darkorange"></td>
</tr>
<tr>
<td port="f2" border="1" bgcolor="cyan">A</td>
<td>
<table border="0" cellspacing="0">
<tr><td port="f3" border="1" bgcolor="azure">A1</td></tr>
<tr><td port="f4" border="1" bgcolor="azure">A2</td></tr>
<tr><td port="f5" border="1" bgcolor="azure">A3</td></tr>
</table>
</td>
</tr>
<tr>
<td port="f5" border="1" bgcolor="gray">Else</td>
<td port="f6" border="1" bgcolor="gray"></td>
</tr>
</table>>
shape = "none"
];
}
But it produces this graph:
[Image: the current output]
Could you please suggest how to tweak the code to achieve the objective: merging f0 and f1 at the top and f5 and f6 at the bottom?
You can use <td> elements with colspan and rowspan attributes in Graphviz HTML-like labels. These allow one cell to span multiple columns and/or rows inside a table.
This also simplifies your digraph, as only one table is needed.
digraph G {
"test" [
label = <<table border="0" cellspacing="0">
<tr>
<td colspan="2" port="f0" border="1" bgcolor="darkorange">TEST</td>
</tr>
<tr>
<td rowspan="3" port="f5" border="1" bgcolor="blue">A</td>
<td port="f6" border="1" bgcolor="white">A1</td>
</tr>
<tr>
<td port="f6" border="1" bgcolor="white">A2</td>
</tr>
<tr>
<td port="f6" border="1" bgcolor="white">A3</td>
</tr>
<tr>
<td colspan="2" port="f0" border="1" bgcolor="grey">Else</td>
</tr>
</table>>
shape = "none"
];
}
This gives you the following basic output, which you can then customize for spacing, line colors, etc:
This one also works. What's the difference?
digraph G {
"test" [
label = <<table border="0" cellspacing="0">
<tr><td colspan="2" port="f0" border="1" bgcolor="darkorange">TEST</td> </tr>
<tr><td rowspan="4" port="f5" border="1" bgcolor="blue">A</td></tr>
<tr><td port="f6" border="1" bgcolor="white">A1</td></tr>
<tr><td port="f6" border="1" bgcolor="white">A2</td></tr>
<tr><td port="f6" border="1" bgcolor="white">A3</td></tr>
<tr><td colspan="2" port="f0" border="1" bgcolor="grey">Else</td></tr>
</table>>
shape = "none"
];
}
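One way to see the difference is to place both tables' cells into grid slots. The Python sketch below assumes Graphviz assigns slots the way HTML tables do (each cell takes the next free column in its row, skipping slots occupied by a rowspan from above); that assumption is not taken from the Graphviz documentation, and the sketch does not model how an empty slot is sized when rendered.

```python
def layout(rows):
    """Map (row, col) -> label for cells given as (label, rowspan, colspan)."""
    grid = {}
    for r, cells in enumerate(rows):
        c = 0
        for label, rowspan, colspan in cells:
            while (r, c) in grid:  # skip slots filled by a rowspan from above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = label
            c += colspan
    return grid


# First answer: A starts on the same row as A1 and spans 3 rows.
V1 = [[("TEST", 1, 2)], [("A", 3, 1), ("A1", 1, 1)],
      [("A2", 1, 1)], [("A3", 1, 1)], [("Else", 1, 2)]]
# Second version: A sits alone on its row and spans 4 rows.
V2 = [[("TEST", 1, 2)], [("A", 4, 1)], [("A1", 1, 1)],
      [("A2", 1, 1)], [("A3", 1, 1)], [("Else", 1, 2)]]
```

Under this model, the second version has one extra grid row: in row 1 only A is present and the second column holds no cell, so A1 to A3 move down one row. Both end with Else spanning the two columns.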

How to transpose into the "same" column?

I have a table like
<table class="tg">
<tr>
<th class="tg-0lax">date</th>
<th class="tg-0lax">organic</th>
<th class="tg-0lax">referrer</th>
<th class="tg-0lax">direct</th>
</tr>
<tr>
<td class="tg-0lax">01.01.2019</td>
<td class="tg-0lax">12345</td>
<td class="tg-0lax">123</td>
<td class="tg-0lax">23</td>
</tr>
<tr>
<td class="tg-0lax">25.01.2019</td>
<td class="tg-0lax">23456</td>
<td class="tg-0lax">234</td>
<td class="tg-0lax">34</td>
</tr>
<tr>
<td class="tg-0lax">03.03.2019</td>
<td class="tg-0lax">34567</td>
<td class="tg-0lax">345</td>
<td class="tg-0lax">56</td>
</tr>
<tr>
<td class="tg-0lax">15.04.2019</td>
<td class="tg-0lax">45678</td>
<td class="tg-0lax">456</td>
<td class="tg-0lax">78</td>
</tr>
</table>
I want to get the data into this layout, where all the data are placed in the same three columns, regardless of whether values in the first column are repeated:
<table class="tg">
<tr>
<th class="tg-0lax">type</th>
<th class="tg-0lax">source</th>
<th class="tg-0lax">date</th>
</tr>
<tr>
<td class="tg-0lax">organic</td>
<td class="tg-0lax">12345</td>
<td class="tg-0lax">01.01.2019</td>
</tr>
<tr>
<td class="tg-0lax">referrer</td>
<td class="tg-0lax">123</td>
<td class="tg-0lax">01.01.2019</td>
</tr>
<tr>
<td class="tg-0lax">direct</td>
<td class="tg-0lax">23</td>
<td class="tg-0lax">01.01.2019</td>
</tr>
<tr>
<td class="tg-0lax">organic</td>
<td class="tg-0lax">23456</td>
<td class="tg-0lax">25.01.2019</td>
</tr>
<tr>
<td class="tg-0lax">referrer</td>
<td class="tg-0lax">234</td>
<td class="tg-0lax">25.01.2019</td>
</tr>
<tr>
<td class="tg-0lax">direct</td>
<td class="tg-0lax">34</td>
<td class="tg-0lax">25.01.2019</td>
</tr>
<tr>
<td class="tg-0lax">organic</td>
<td class="tg-0lax">34567</td>
<td class="tg-0lax">03.03.2019</td>
</tr>
<tr>
<td class="tg-0lax">referrer</td>
<td class="tg-0lax">345</td>
<td class="tg-0lax">03.03.2019</td>
</tr>
<tr>
<td class="tg-0lax">direct</td>
<td class="tg-0lax">56</td>
<td class="tg-0lax">03.03.2019</td>
</tr>
<tr>
<td class="tg-0lax">organic</td>
<td class="tg-0lax">45678</td>
<td class="tg-0lax">15.04.2019</td>
</tr>
<tr>
<td class="tg-0lax">referrer</td>
<td class="tg-0lax">456</td>
<td class="tg-0lax">15.04.2019</td>
</tr>
<tr>
<td class="tg-0lax">direct</td>
<td class="tg-0lax">78</td>
<td class="tg-0lax">15.04.2019</td>
</tr>
</table>
Plain transposing gets pretty close to what I want, but not exactly, and I can't work out how to pivot the data.
Here is another example, where I got an error:
What am I doing wrong? The formula is:
=ARRAYFORMULA({"type","source","date";SPLIT(TRANSPOSE(SPLIT(CONCATENATE(IF(B2:D<>"","♠"&B1:D1&"♦"&B2:D&"♦"&A2:A, )),"♠")),"♦")})
I've removed the line breaks from the formula. Could the error be caused by my Google Sheets being set to German, i.e. a formula-language issue?
=ARRAYFORMULA({"type", "source", "date";
SPLIT(TRANSPOSE(SPLIT(CONCATENATE(IF(B2:D<>"",
"♠"&B1:D1&"♦"&B2:D&"♦"&A2:A, )), "♠")), "♦")})
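The formula works by encoding every non-empty value cell as one token "♠type♦value♦date", concatenating all tokens, splitting on ♠ to get one record per token, and splitting each record on ♦. A Python sketch of the same steps, assuming the range is read row by row (which matches the wanted output order); the function name is hypothetical:

```python
def unpivot(header, rows):
    """Reshape date/organic/referrer/direct rows into type/source/date rows."""
    types = header[1:]
    joined = ""
    for row in rows:
        date, values = row[0], row[1:]
        for type_, value in zip(types, values):
            if value != "":  # IF(B2:D<>"", ...) skips empty cells
                joined += "♠" + type_ + "♦" + value + "♦" + date
    # SPLIT removes empty text by default, e.g. the "" before the first ♠
    records = [token.split("♦") for token in joined.split("♠") if token]
    return [["type", "source", "date"]] + records
```

With the question's four date rows this produces the header plus 12 records, starting with ["organic", "12345", "01.01.2019"].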

DAX - filtering out whole row from data table that meets certain condition(s)

I have a table like following:
<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg .tg-baqh{text-align:center;vertical-align:top}
</style>
<table class="tg">
<tr>
<th class="tg-baqh">Store Name</th>
<th class="tg-baqh">Week Number</th>
<th class="tg-baqh">Sales</th>
<th class="tg-baqh">Sales-LY</th>
</tr>
<tr>
<td class="tg-baqh">Store A</td>
<td class="tg-baqh">1</td>
<td class="tg-baqh">20</td>
<td class="tg-baqh">15</td>
</tr>
<tr>
<td class="tg-baqh">Store A</td>
<td class="tg-baqh">2</td>
<td class="tg-baqh">25</td>
<td class="tg-baqh">20</td>
</tr>
<tr>
<td class="tg-baqh">Store A</td>
<td class="tg-baqh">3</td>
<td class="tg-baqh">30</td>
<td class="tg-baqh">25</td>
</tr>
<tr>
<td class="tg-baqh">Store B</td>
<td class="tg-baqh">1</td>
<td class="tg-baqh">15</td>
<td class="tg-baqh">10</td>
</tr>
<tr>
<td class="tg-baqh">Store B</td>
<td class="tg-baqh">2</td>
<td class="tg-baqh">15</td>
<td class="tg-baqh">15</td>
</tr>
<tr>
<td class="tg-baqh">Store B</td>
<td class="tg-baqh">3</td>
<td class="tg-baqh">20</td>
<td class="tg-baqh">15</td>
</tr>
<tr>
<td class="tg-baqh">Store C</td>
<td class="tg-baqh">1</td>
<td class="tg-baqh">30</td>
<td class="tg-baqh">25</td>
</tr>
<tr>
<td class="tg-baqh">Store C</td>
<td class="tg-baqh">2</td>
<td class="tg-baqh">0</td>
<td class="tg-baqh">20</td>
</tr>
<tr>
<td class="tg-baqh">Store C</td>
<td class="tg-baqh">3</td>
<td class="tg-baqh">25</td>
<td class="tg-baqh">20</td>
</tr>
</table>
I would like to return (let's say) a pivot table with
Sales IDX = SUM(Sales)/SUM(Sales-LY) as a measure, ignoring the data points for "Week 2 for Store C".
So it's not a filter on just Week Number or Store, but a filter on specific row(s) identified by multiple parameters.
Essentially, I would like to get like-for-like results, excluding any weeks where the Sales or Sales-LY column is zero (or null).
Any ideas?
I would add a calculated column to concatenate Store and Week, something like this:
=[Store Name]&", Week: "&[Week Number]
Then you can use a filter or slicer to exclude the Store & Week combination you want.
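The effect of excluding such rows can be checked with the numbers from the question's table. This is a plain Python sketch, not DAX: row_key builds the same text as the suggested calculated column, and the like_for_like flag additionally drops any row where Sales or Sales-LY is zero or missing (the question's stated goal). The function names are hypothetical.

```python
# (store, week, sales, sales_ly) rows copied from the question's table
ROWS = [
    ("Store A", 1, 20, 15), ("Store A", 2, 25, 20), ("Store A", 3, 30, 25),
    ("Store B", 1, 15, 10), ("Store B", 2, 15, 15), ("Store B", 3, 20, 15),
    ("Store C", 1, 30, 25), ("Store C", 2, 0, 20), ("Store C", 3, 25, 20),
]


def row_key(store, week):
    # Same text as the calculated column [Store Name]&", Week: "&[Week Number]
    return f"{store}, Week: {week}"


def sales_idx(rows, exclude_keys=frozenset(), like_for_like=True):
    """SUM(Sales)/SUM(Sales-LY) over the rows that survive the filters."""
    kept = [
        (store, week, sales, sales_ly)
        for store, week, sales, sales_ly in rows
        if row_key(store, week) not in exclude_keys
        # 0 and None are both falsy, so either one drops the row
        and (not like_for_like or (sales and sales_ly))
    ]
    return sum(r[2] for r in kept) / sum(r[3] for r in kept)
```

Dropping "Store C, Week: 2" changes the index from 180/165 to 180/145, because that row adds nothing to Sales but 20 to Sales-LY.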

Linux (sed|PCRE|grep) Regular expression to isolate XML (KML) style data conditionally

Updates are below the sample code.
Solution: provided by BeniBela, who figured out what I had failed to make clear: it has to be a command-line tool, not necessarily a regex. He offered this solution:
xpath -e '//Placemark[contains(description, "Iron")]'
as promised:
|
( )
/ \
_______
| _ |
| | | | All must enter and pay homage! (Shrine of BeniBela)
Problem: I need some form of command-line regex to do the following: in a file containing a set of Placemarks, detect the Placemarks whose contained CDATA section includes a keyword (in this case Iron), without grabbing Placemarks that do not contain the keyword. (All data from <Placemark> to </Placemark> needs to be captured.)
Explanation:
Two code samples are given below, one showing three full placemarks, two of which are useless to me, the third of which I want. The second code sample shows just the one I am interested in.
I need to extract the valid Placemark from the data file (which contains hundreds of placemarks) and append it into another file. I will then merge this file into a properly formatted KML later. The data sets are from the US Geological Survey and are very large.
The idea here is to recover placemarks for mines which are extracting a given kind of Ore (Iron for this example), and create a specialized KML (Keyhole Markup Language) file for display in a Google Earth type application.
sample1 (Multiple data with one valid entry):
<Placemark>
<name>
Las Antos Prospect</name>
<Snippet>
Record 10005251</Snippet>
<description>
<![CDATA[<p>
Record <a href="http://mrdata.usgs.gov/mrds/show.php?labno=10005251">
10005251</a>
of the <a href="http://mrdata.usgs.gov/mrds/">
Mineral Resources Data System</a>
</p>
<table border='1' padding='3' cellspacing='0'>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
oper_type</th>
<td>
Unknown</td>
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
dev_stat</th>
<td>
Occurrence</td>
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
ore</th>
<td>
Limestone</td>
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
model</th>
<td>
</td>
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
commod1</th>
<td>
Limestone, General</td>
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
commod2</th>
<td>
</td>
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
commod3</th>
<td>
</td>
</tr>
</table>
]]>
</description>
<styleUrl>
#defaultStyleMap</styleUrl>
<Point>
<altitudeMode>
relativeToGround</altitudeMode>
<coordinates>
-64.88273,-24.87527,0</coordinates>
</Point>
</Placemark>
<Placemark>
<name>
Unnamed Occurence</name>
<Snippet>
Record 10005252</Snippet>
<description>
<![CDATA[<p>
Record <a href="http://mrdata.usgs.gov/mrds/show.php?labno=10005252">
10005252</a>
of the <a href="http://mrdata.usgs.gov/mrds/">
Mineral Resources Data System</a>
</p>
<table border='1' padding='3' cellspacing='0'>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
oper_type</th>
<td>
Unknown</td>
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
dev_stat</th>
<td>
Occurrence</td>
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
ore</th>
<td>
</td>
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
model</th>
<td>
</td>
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
commod1</th>
<td>
Iron</td> ######################Iron here makes it valid
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
commod2</th>
<td>
</td>
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
commod3</th>
<td>
</td>
</tr>
</table>
]]>
</description>
<styleUrl>
#defaultStyleMap</styleUrl>
<Point>
<altitudeMode>
relativeToGround</altitudeMode>
<coordinates>
-64.81607,-24.67527,0</coordinates>
</Point>
</Placemark>
<Placemark>
<name>
Merced I Quarry</name>
<Snippet>
Record 10005254</Snippet>
<description>
<![CDATA[<p>
Record <a href="http://mrdata.usgs.gov/mrds/show.php?labno=10005254">
10005254</a>
of the <a href="http://mrdata.usgs.gov/mrds/">
Mineral Resources Data System</a>
</p>
<table border='1' padding='3' cellspacing='0'>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
oper_type</th>
<td>
Unknown</td>
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
dev_stat</th>
<td>
Producer</td>
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
ore</th>
<td>
Limestone</td>
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
model</th>
<td>
</td>
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
commod1</th>
<td>
Limestone, General</td>
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
commod2</th>
<td>
</td>
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
commod3</th>
<td>
</td>
</tr>
</table>
]]>
</description>
<styleUrl>
#ProducerStyleMap</styleUrl>
<Point>
<altitudeMode>
relativeToGround</altitudeMode>
<coordinates>
-65.46052,-24.9586,0</coordinates>
</Point>
</Placemark>
The above sample contains two Placemarks which I have no use for, bracketing one which I need to extract.
Sample 2 (Showing just a 'valid' entry):
(The capture would need to grab all of this)
<Placemark>
<name>
Unnamed Occurence</name>
<Snippet>
Record 10005252</Snippet>
<description>
<![CDATA[<p>
Record <a href="http://mrdata.usgs.gov/mrds/show.php?labno=10005252">
10005252</a>
of the <a href="http://mrdata.usgs.gov/mrds/">
Mineral Resources Data System</a>
</p>
<table border='1' padding='3' cellspacing='0'>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
oper_type</th>
<td>
Unknown</td>
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
dev_stat</th>
<td>
Occurrence</td>
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
ore</th>
<td>
</td>
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
model</th>
<td>
</td>
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
commod1</th>
<td>
Iron</td> ######################Iron here makes it valid
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
commod2</th>
<td>
</td>
</tr>
<tr valign='top'>
<th align='right' bgcolor='#ddffee'>
commod3</th>
<td>
</td>
</tr>
</table>
]]>
</description>
<styleUrl>
#defaultStyleMap</styleUrl>
<Point>
<altitudeMode>
relativeToGround</altitudeMode>
<coordinates>
-64.81607,-24.67527,0</coordinates>
</Point>
</Placemark>
Update 1:
I got this to work in a regex tester, but I am still working on how to get it into grep et al.
<Placemark>\n<name>\n.*</name>\n<Snippet>\n.*\n<description>\n(?:(?:.*\n){48}.*Iron.*\n|(?:.*\n){41}.*Iron.*\n|(?:.*\n){35}.*Iron.*\n)(?:.*\n){3,16}\]\]>\n</description>\n(?:.*\n){8,12}</Placemark>
That is trivial with XPath instead of regex:
/Placemark[contains(description, "Iron")]
(or /*/Placemark[contains(description, "Iron")] if your XML contains a (required) root element)
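Where the xpath utility is not available, a similar filter can be sketched with Python's standard xml.etree.ElementTree. Its XPath subset has no contains(), so the substring test is done in Python. Assumptions: the input is a bare sequence of <Placemark> elements as in the samples, so it is wrapped in a dummy root to be well-formed; the function name is hypothetical. CDATA content is parsed as ordinary text, so the keyword test sees the table markup inside it, but on output it is re-escaped (&lt;td&gt;...) rather than written back as a CDATA section.

```python
import xml.etree.ElementTree as ET


def placemarks_containing(xml_text, keyword, wrap=True):
    """Return each <Placemark> whose <description> contains keyword, serialized."""
    if wrap:
        # The samples have no root element; add one so the parser accepts them.
        xml_text = "<root>" + xml_text + "</root>"
    root = ET.fromstring(xml_text)
    matches = []
    for placemark in root.iter("Placemark"):
        description = placemark.findtext("description", default="")
        if keyword in description:
            placemark.tail = None  # don't serialize whitespace after the element
            matches.append(ET.tostring(placemark, encoding="unicode"))
    return matches
```

A real KML file usually declares the namespace http://www.opengis.net/kml/2.2; in that case the tags to search for become "{http://www.opengis.net/kml/2.2}Placemark" and "{http://www.opengis.net/kml/2.2}description".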