Access attributes from XML in shell - xslt

I'm trying to parse out values from a Widget config.xml using shell. I do not want to use sed for this task. If there is something that sucks less than xsltproc, I'd love to know.
In this example I am after the id attribute value from the config.xml below:
<?xml version="1.0" encoding="UTF-8"?>
<widget xmlns="http://www.w3.org/ns/widgets" id="http://example.org/exampleWidget" version="2.0 Beta" height="200" width="200">
<name short="123">Foo Widget</name>
</widget>
I wish it was as simple as jQuery's attr: var id = $("widget").attr("id");
Currently this shell code utilising xsltproc fails:
snag () {
TMP=$(tempfile)
cat << EOF > $TMP
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text" encoding="utf-8" indent="no"/>
<xsl:template>
<xsl:value-of select="$1"/>
</xsl:template>
</xsl:stylesheet>
EOF
echo $(xsltproc $TMP config.xml)
rm -f $TMP
}
ID=$(snag "widget/@id")
if test "$ID" = "http://example.org/exampleWidget"
then
echo Mission accomplished.
else
echo "<$ID> is wrong."
fi

XMLStarlet (http://xmlstar.sourceforge.net/) is a nice command-line tool that supports such queries:
xmlstarlet sel -N w=http://www.w3.org/ns/widgets -T -t -m "/w:widget/@id" -v . -n config.xml

template match="widget"
value-of select="@id"

<xsl:template xmlns:wgt="http://www.w3.org/ns/widgets" match="/wgt:widget">
<xsl:value-of select="@id" />
</xsl:template>

You don't need XSLT if you're not doing a transform.
If you only need to grab a value use XPath.
There's an xpath program that comes with Perl's XML::XPath module.
From the shell:
ID=$(xpath config.xml 'string(/widget/@id)' )
(The string() function returns only the value of the id;
/widget/@id by itself returns id="value".)
If you only need to produce some other output depending on the value, you could
do it all in xslt. There are also other XPath implementations available from
other scripting languages: I've used Java's XPath from both rhino and Jython.
There's also XQuery from the command line with Saxon.
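As a hedged sketch of the XPath-only approach, the function below uses xmllint (from libxml2) rather than the Perl xpath program. The name widget_id is made up for illustration, and matching on local-name() is an assumption made here to sidestep the default namespace declared on the <widget> element, which a plain /widget/@id does not match in a namespace-aware processor.

```shell
#!/bin/sh
# widget_id FILE: print the id attribute of the root <widget> element.
# Hypothetical helper name; assumes xmllint (libxml2) is installed.
widget_id() {
    # local-name() ignores the default namespace on <widget>;
    # string() yields only the attribute value, not id="...".
    xmllint --xpath 'string(/*[local-name()="widget"]/@id)' "$1"
}
```

It would be called as ID=$(widget_id config.xml); an empty result means no id attribute was found on a root element named widget.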

Related

update xsl file with xmlstarlet, grep or similar

I have a series of similar XSL files and to all of them, I have to add the same XSL element in a specific position.
Here you can find a portion of the XSL to be updated and the element to be inserted is
<xsl:call-template name="distributor.xsl"/>
and it has to be after the </mrd:distributionFormat> and before the <mrd:transferOptions> tags
Is there a way to automate this update for all my XSL files using XmlStarlet, grep or similar?
...
<mdb:distributionInfo>
<mrd:MD_Distribution>
<mrd:distributionFormat>
<mrd:MD_Format>
<mrd:formatSpecificationCitation>
<cit:CI_Citation>
<cit:title>
<gco:CharacterString>WCS</gco:CharacterString>
</cit:title>
<cit:date gco:nilReason="unknown"/>
<cit:edition>
<gco:CharacterString>2.0</gco:CharacterString>
</cit:edition>
</cit:CI_Citation>
</mrd:formatSpecificationCitation>
</mrd:MD_Format>
</mrd:distributionFormat>
<!-- call-template -->
<xsl:call-template name="distributor.xsl"/>
<!-- call-template -->
<mrd:transferOptions>
...
I tried with
xmlstarlet ed -P -S -L -s //mrd:MD_Distribution -t elem -i xsl:include -t attr -n "name" -v "distributor.xsl" main.xsl
where main.xsl is the file to be updated
After the requirements changed:
The following command inserts the desired xsl:call-template node where
you want it (for details see my first posting),
xmlstarlet edit \
-i '//mrd:transferOptions[1]' \
-t elem -n 'xsl:call-template' -v '' \
-s '$prev' -t attr -n name -v 'distributor.xsl' \
main.xsl
when run on the following XML file (adjust namespaces as needed):
<xsl:transform version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:cit="urn:so70244776_cit"
xmlns:gco="urn:so70244776_gco"
xmlns:mdb="urn:so70244776_mdb"
xmlns:mrd="urn:so70244776_mrd"
>
<xsl:template name="x">
<mdb:distributionInfo>
<mrd:MD_Distribution>
<mrd:distributionFormat>
<mrd:MD_Format>
<mrd:formatSpecificationCitation>
<cit:CI_Citation>
<cit:title>
<gco:CharacterString>WCS</gco:CharacterString>
</cit:title>
<cit:date gco:nilReason="unknown"/>
<cit:edition>
<gco:CharacterString>2.0</gco:CharacterString>
</cit:edition>
</cit:CI_Citation>
</mrd:formatSpecificationCitation>
</mrd:MD_Format>
</mrd:distributionFormat>
<mrd:transferOptions/>
</mrd:MD_Distribution>
</mdb:distributionInfo>
</xsl:template>
</xsl:transform>
In a POSIX shell the following xmlstarlet command:
xmlstarlet edit \
-N xsl="http://www.w3.org/1999/XSL/Transform" \
-N mdb="urn:so70244776_mdb" \
-N mrd="urn:so70244776_mrd" \
--var templatename "'distributor-N.xsl'" \
--var anchornode '//mrd:distributionFormat[1]' \
-d '$anchornode/following-sibling::xsl:call-template' \
-a '$anchornode' -t elem -n 'xsl:call-template' -v '' \
-a '$xstar:prev' -t attr -n name -v '' \
-u '$xstar:prev' -x '$templatename' \
main.xsl
declares a number of namespace bindings
selects //mrd:distributionFormat[1] as anchor node
deletes any existing xsl:call-template sibling nodes following the anchor
appends a new xsl:call-template element with a name attribute
xmlstarlet edit code can use the convenience $xstar:prev (aka
$prev) node to refer to the node created by the most recent
-i / --insert, -a / --append, or -s / --subnode option.
Examples of $xstar:prev are given in
doc/xmlstarlet.txt
and the source code's examples/ed-backref*.
In the command shown its first use refers to the xsl:call-template
element, the second to the name attribute.
EDIT: It turns out xmlstarlet edit isn't as picky as I thought
it was so an alternative, shorter command would be (attribute nodes
can be added with -s, -i, or -a):
xmlstarlet edit \
--var anchor '//mrd:distributionFormat[1]' \
-d '$anchor/following-sibling::xsl:call-template' \
-a '$anchor' -t elem -n 'xsl:call-template' -v '' \
-s '$prev' -t attr -n name -v 'distributor-N.xsl' \
main.xsl
Given the following (demo) input:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:mdb="urn:so70244776_mdb"
xmlns:mrd="urn:so70244776_mrd"
>
<xsl:template name="q">
<mdb:distributionInfo>
<mrd:MD_Distribution>
<mrd:distributionFormat>
<mrd:MD_Format>
<mrd:formatSpecificationCitation/>
</mrd:MD_Format>
</mrd:distributionFormat>
<xsl:call-template name="distributor.xsl"/>
<!-- x -->
<xsl:call-template name="distributor-1.xsl"/>
<mrd:transferOptions/>
</mrd:MD_Distribution>
</mdb:distributionInfo>
</xsl:template>
</xsl:transform>
either command above produces the following output:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:mdb="urn:so70244776_mdb" xmlns:mrd="urn:so70244776_mrd" version="1.0">
<xsl:template name="q">
<mdb:distributionInfo>
<mrd:MD_Distribution>
<mrd:distributionFormat>
<mrd:MD_Format>
<mrd:formatSpecificationCitation/>
</mrd:MD_Format>
</mrd:distributionFormat>
<xsl:call-template name="distributor-N.xsl"/>
<!-- x -->
<mrd:transferOptions/>
</mrd:MD_Distribution>
</mdb:distributionInfo>
</xsl:template>
</xsl:transform>
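Since the question asks about updating a whole series of files, here is a minimal sketch that wraps the shorter command in a loop. The function name add_call_template is hypothetical, the mrd namespace URI is the demo placeholder used above and must be replaced with the real one, and -L (edit in place) is assumed to be acceptable, so try it on copies first.

```shell
#!/bin/sh
# add_call_template FILE...: make sure each FILE has exactly one
# <xsl:call-template name="distributor.xsl"/> directly after the first
# <mrd:distributionFormat> element. Hypothetical helper name.
add_call_template() {
    for f in "$@"; do
        # The mrd URI below is the demo placeholder, not a real namespace.
        xmlstarlet edit -L \
            -N xsl="http://www.w3.org/1999/XSL/Transform" \
            -N mrd="urn:so70244776_mrd" \
            --var anchor '//mrd:distributionFormat[1]' \
            -d '$anchor/following-sibling::xsl:call-template' \
            -a '$anchor' -t elem -n 'xsl:call-template' -v '' \
            -s '$prev' -t attr -n name -v 'distributor.xsl' \
            "$f" || return 1
    done
}
```

Called as add_call_template *.xsl, it stops at the first file xmlstarlet fails on. Because existing xsl:call-template siblings are deleted before the new one is appended, running it twice should not produce duplicates.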

shell script to recognise jira key

The lines below, from the Execute shell step of my Jenkins job configuration, retrieve the Jira key:
JIRA_KEY=$(curl --request GET "http://jenkins-server/job/myProject/job/mySubProject/job/myComponent/${BUILD_NUMBER}/api/xml?xpath=/*/changeSet/item/comment" | sed -e "s/<comment>\(.*\)<\/comment>/\1/")
JIRA_KEY=$(echo $JIRA_KEY | cut -c10-17)
But if the text doesn't start with a Jira key, the current logic assigns whatever text falls in character positions 10-17. I need to store an empty string "" in the variable JIRA_KEY when no Jira key is present in the <comment>. How can we do that?
Here is the xml
<freeStyleBuild _class="hudson.model.FreeStyleBuild">
<changeSet _class="hudson.plugins.git.GitChangeSetList">
<item _class="hudson.plugins.git.GitChangeSet">
<comment>
JRA-1011 This is commit
message.
</comment>
</item>
</changeSet>
</freeStyleBuild>
As I mentioned in the comments, it is not clear which output you need, so, based on some assumptions, could you please try the following and let me know how it goes.
I- If you need all the lines between <comment> and </comment>, then you could run the following.
awk '/<\/comment>/{a="";next}/<comment>/{a=1;next}a' Input_file
II- If you need to find only the JRA string between <comment> and </comment>, then you could do the following.
awk '/<\/comment>/{a="";next}/<comment>/{a=1;next} a && /JRA/{match($0,/[a-zA-Z]+[^ ]*/);print substr($0,RSTART,RLENGTH)}' Input_file
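For the stated goal (an empty JIRA_KEY when the comment does not start with a key), here is a minimal sketch that matches a pattern instead of cutting fixed character positions. The function name extract_jira_key is hypothetical, and the KEY-123 shape (uppercase project key, hyphen, digits) is an assumption about how the keys look.

```shell
#!/bin/sh
# extract_jira_key: read a commit comment on stdin and print the Jira key it
# starts with (e.g. JRA-1011), or print nothing when there is none.
# Hypothetical helper; the KEY-123 pattern is an assumption about key syntax.
extract_jira_key() {
    # Join all lines so a key on the line after <comment> still counts as leading.
    { tr '\n' ' '; echo; } |
        sed -n \
            -e 's/<[^>]*>//g' \
            -e 's/^[[:space:]]*\([A-Z][A-Z0-9]*-[0-9][0-9]*\).*/\1/p'
}

# Example with the comment text from the question:
JIRA_KEY=$(printf '<comment>\nJRA-1011 This is commit\nmessage.\n</comment>\n' | extract_jira_key)
echo "JIRA_KEY=[$JIRA_KEY]"   # prints JIRA_KEY=[JRA-1011]
```

In the Jenkins job, the curl output would be piped into extract_jira_key in place of the sed and cut steps. A comment such as "Fix typo in README" leaves JIRA_KEY empty, because sed -n prints only when the substitution matches.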

Searching and replacing a block of XML formatted text in Bash

I have been trying to figure out how to search a block of XML formatted text and modify it in Bash. The file I want to process is a simulation file with XML formatting. Assume that the file contains multiple blocks of XML statements such as:
<mote>
<breakpoints />
<interface_config>
org.contikios.cooja.interfaces.Position
<x>0.0</x>
<y>75.0</y>
<z>0.0</z>
</interface_config>
<interface_config>
org.contikios.cooja.mspmote.interfaces.MspClock
<deviation>1.0</deviation>
</interface_config>
<interface_config>
org.contikios.cooja.mspmote.interfaces.MspMoteID
<id>4</id>
</interface_config>
<motetype_identifier>sky2</motetype_identifier>
</mote>
What I want to search is a block of XML statements here:
<id>4</id>
</interface_config>
<motetype_identifier>sky2</motetype_identifier>
And replace it with
<id>4</id>
</interface_config>
<motetype_identifier>sky3</motetype_identifier>
The rest of the XML statements before and after these statements will remain unchanged. This will enable me to change the mote type of node 4 from sky2 to sky3 from a Bash script.
xmlstarlet ed --omit-decl -u "//mote[interface_config/id='4']/motetype_identifier" -v 'sky3' file.xml
Output:
<mote>
<breakpoints/>
<interface_config>
org.contikios.cooja.interfaces.Position
<x>0.0</x>
<y>75.0</y>
<z>0.0</z>
</interface_config>
<interface_config>
org.contikios.cooja.mspmote.interfaces.MspClock
<deviation>1.0</deviation>
</interface_config>
<interface_config>
org.contikios.cooja.mspmote.interfaces.MspMoteID
<id>4</id>
</interface_config>
<motetype_identifier>sky3</motetype_identifier>
</mote>
If you want to edit file.xml in place, add option -L.
See: xmlstarlet ed --help
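To use this from a script for arbitrary nodes, the command can be wrapped in a small function. This is only a sketch: set_mote_type is a made-up name, the id is assumed to be a plain number (it is pasted into the XPath expression unescaped), and the file is edited in place with -L.

```shell
#!/bin/bash
# set_mote_type FILE ID TYPE: set <motetype_identifier> to TYPE for the
# <mote> whose <id> equals ID. Hypothetical helper; edits FILE in place.
set_mote_type() {
    local file=$1 id=$2 type=$3
    # ID is interpolated into the XPath, so it must not contain quotes.
    xmlstarlet ed -L --omit-decl \
        -u "//mote[interface_config/id='${id}']/motetype_identifier" \
        -v "$type" "$file"
}
```

For the example above the call would be set_mote_type file.xml 4 sky3; motes with other ids are left untouched.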

BASH script to rename XML file to an attribute value

I have a lot of .xml files structured the same way:
<parent id="idvalue" attr1="val1" attr2="val2" ...>
<child attr3="val3" attr4="val4" ... />
<child attr3="val5" attr4="val6" ... />
...
</parent>
Each file has exactly one <parent> element with exactly one id attribute.
All of those files (almost 1,700,000 of them) are named as part.xxxxx where xxxxx is a random number.
I want to name each of those files as idvalue.xml, according to the sole id attribute from the file's content.
I believe doing it with a bash script would be the fastest and most automated way. But if there are other suggestions, I would love to hear them.
My main problem is that I am not able (don't know how) to get the idvalue in a specific file, so that I could use it with the mv file.xxxxx idvalue.xml command.
First I would iterate through the xml files using find:
find . -maxdepth 1 -name 'part.*' -exec ./rename_xml.sh {} \;
The line above will execute rename_xml.sh for every part.xxxxx file, passing the file name as a command argument to the script.
rename_xml.sh should look like this:
#!/bin/bash
# Get the id using XPath. You might need to install
# xmllint for that if it is not already present.
# The XPath query will return a string like this (try it!):
#
# id="idvalue"
#
# We are using sed to extract the value from that.
id=$(xmllint --xpath '//parent/@id' "$1" | sed -r 's/[^"]+"([^"]+).*/\1/')
mv -v "$1" "$id.xml"
Don't forget to
chmod +x rename_xml.sh
Use a proper XML handling tool to extract the id from the files. For example,
xsh:
for file in part.* ; do
mv "$file" "$(xsh -aC 'open { shift }; echo /parent/@id' "$file").xml"
done
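With so many files it may also help to guard against missing ids and name collisions. The loop below is a sketch under a few assumptions: rename_by_id is a hypothetical name, xmllint is installed, string(/parent/@id) returns the bare value, and skipping a file is preferable to overwriting an existing target.

```shell
#!/bin/bash
# rename_by_id DIR: rename every part.* file in DIR to <id>.xml, where <id>
# is the id attribute of the root <parent> element. Hypothetical helper.
rename_by_id() {
    local dir=$1 file id
    for file in "$dir"/part.*; do
        [ -f "$file" ] || continue
        # string() returns just the value, so no sed post-processing is needed.
        id=$(xmllint --xpath 'string(/parent/@id)' "$file" 2>/dev/null)
        # Skip files without an id, and ids that cannot be used as a file name.
        case $id in
            ''|*/*) echo "skipping $file" >&2; continue ;;
        esac
        # -n refuses to overwrite a target that already exists.
        mv -n -- "$file" "$dir/$id.xml"
    done
}
```

This still starts one xmllint process per file, so for 1,700,000 files the run time is dominated by process start-up; the loop itself keeps the logic in one place.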
As I mentioned in my comment, I am not sure how the performance of XSLT compares to bash scripts, but I created the XSLT for you to try out.
In the stylesheet below, Dire is the directory that contains the XML files. The select expression "tokenize(document-uri(.), '/')[last()]"
retrieves the file name, and the next line concatenates the directory name with the file name to get the path of the file. The xsl:copy-of line copies the entire XML document.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxml="urn:schemas-microsoft-com:xslt" xmlns:random="http://www.microsoft.com/msxsl">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<xsl:for-each select="collection('Dire/?select=*.xml')" >
<xsl:variable name="filename" select="tokenize(document-uri(.), '/')[last()]"/>
<xsl:variable name="filepath" select="concat('Dire/',$filename)"/>
<xsl:variable name="doc" select="document($filepath)"/>
<xsl:variable name="outname" select="$doc/parent/@id"/>
<xsl:result-document href="{$outname}.xml" method="xml">
<xsl:copy-of select="$doc/node()"/>
</xsl:result-document>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
I ran the XSLT using saxon8. Unfortunately, I could not find any way to rename the XML files directly, but the above code should be worth a try.

Removing specific tags in a KML file

I have a KML file which is a list of places around the world with coordinates and some other attributes. It looks like this for one place:
<Placemark>
<name>Albania - Durrës</name>
<open>0</open>
<visibility>1</visibility>
<description>(Spot ID: 275801) show <![CDATA[forecast]]></description>
<styleUrl>#wgStyle001</styleUrl><Point>
<coordinates>19.489747,41.277806,0</coordinates>
</Point>
<LookAt><range>200000</range><longitude>19.489747</longitude><latitude>41.277806</latitude></LookAt>
</Placemark>
I would like to remove everything except the name of the place. So in this case that would mean I would like to remove everything except
<name>Albania - Durrës</name>
The problem is, this KML file includes more than 1000 of these places. Doing this manually obviously isn't an option, so then how can I remove all tags except for the name tags for all of the items in the list? Can I use some kind of program for that?
Use a specialized command line tool that understands XML documents.
One such tool is xmlstarlet, which is available here for Linux, Windows and Solaris.
To address your particular problem, I used the xmlstarlet executable xml.exe like this (on Windows):
xml.exe sel -N ns=http://www.opengis.net/kml/2.2 -t -v /ns:kml/ns:Document/ns:Placemark/ns:name places.kml
This produces this output:
Albania - Durrës
Second Name
Third Name
...
Final Name
If you can guarantee that <name> occurs only as a child of <Placemark>, then this abbreviated version will produce the same result:
xml.exe sel -N ns=http://www.opengis.net/kml/2.2 -t -v //ns:name places.kml
(This is because this shorter version finds all <name> elements no matter where they occur in the document.)
If you really want an XML document, you'll need to do a little post-processing. Here's an example of a complete XML document:
<?xml version='1.0' encoding='utf-8'?>
<items>
<item>Albania - Durrës</item>
<item>Second Name</item>
<item>Third Name</item>
<!-- ... -->
<item>Final Name</item>
</items>
This first line is the XML declaration. It declares the Unicode encoding utf-8. You'll need to include this line so that XML processors recognize that your document includes Unicode characters. (As in Durrës.)
More: Here's an enhanced 'xmlstarlet' command that will produce the XML document above:
xml.exe sel -N ns=http://www.opengis.net/kml/2.2 -T -t -o "<?xml version='1.0' encoding='utf-8'?>" -n -t -v "'<items>'" -n -t -m //ns:Placemark -v "concat('<item>',ns:name,'</item>')" -n -t -o "</items>" -n places.kml
If you are on Linux or similar:
grep "<name>" your_file.kml > file_with_only_name_tags
On Windows, see What are good grep tools for Windows?