Schematron rule to check if files have corresponding elements - xslt

I have a file structure like this:
bookmap.ditamap
├── en-US/
│ └── CTR_MyProduct.ditamap
├── es-ES/
│ └── CTR_MyProduct.ditamap
└── fr-FR/
  └── CTR_MyProduct.ditamap
The content of bookmap.ditamap is:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE bookmap PUBLIC "-//OASIS//DTD DITA BookMap//EN" "bookmap.dtd">
<bookmap>
<booktitle><mainbooktitle/></booktitle>
<part>
<mapref href="en-US/CTR_Product.ditamap"/>
</part>
<part>
<mapref href="fr-FR/CTR_Product.ditamap"/>
</part>
</bookmap>
I'd like a Schematron rule that crawls through the subdirectories looking for files whose names start with CTR_ and end with .ditamap, and that complains if a file has no corresponding <part> element, like:
<part>
<mapref href="xx-YY/CTR_Product.ditamap"/>
</part>
In this example, there is no <part> element for the Spanish (es-ES) map. This should be reported. Do you think this is possible to validate in Schematron?

With a Schematron implementation backed by XSLT/XPath 3 and Saxon 9, 10 or 11, you can probably do something like:
every $uri in uri-collection('?select=CTR_Product.ditamap;recurse=yes')
satisfies
  some $part in part
  satisfies contains($uri, $part/mapref/@href)
If those elements are in a namespace you will need to set that up too as the default XPath selection namespace.
You can also use wildcards e.g. uri-collection('?select=CTR_*.ditamap;recurse=yes').
Perhaps using ends-with($uri, $part/mapref/@href) instead of contains($uri, $part/mapref/@href) is a better way to check.
More complete in the context of Schematron:
<sch:rule context="bookmap">
  <sch:let name="parts" value="part"/>
  <sch:let name="uris" value="uri-collection('?select=CTR_*.ditamap;recurse=yes')"/>
  <sch:assert test="every $uri in $uris
                    satisfies
                      some $part in $parts
                      satisfies ends-with($uri, $part/mapref/@href)">
    Not every map has a corresponding part element:
    <sch:value-of select="$uris[not(some $ref in $parts/mapref satisfies ends-with(., $ref/@href))]"/>
  </sch:assert>
</sch:rule>
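For comparison, the same check can be run outside Schematron. This is only a minimal Python sketch, assuming bookmap.ditamap sits next to the language folders and its elements are not in a namespace; the function name find_unreferenced_maps is made up for illustration.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def find_unreferenced_maps(bookmap_path, pattern="CTR_*.ditamap"):
    """Return maps under the bookmap's folder that no <part>/<mapref> points to."""
    bookmap = Path(bookmap_path)
    root = ET.parse(bookmap).getroot()
    # hrefs exactly as written in the bookmap, e.g. "en-US/CTR_Product.ditamap"
    referenced = {m.get("href") for m in root.iterfind("part/mapref")}
    # rglob walks the subdirectories, similar to recurse=yes in uri-collection()
    found = {p.relative_to(bookmap.parent).as_posix()
             for p in bookmap.parent.rglob(pattern)}
    return sorted(found - referenced)
```

Calling find_unreferenced_maps("bookmap.ditamap") returns the relative paths of the maps that are missing a <part>, which corresponds to what the sch:value-of above reports.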

Related

Is this a bug in xmllint or xmlstarlet pattern matching?

Here is my product.xml file contents:
<ProductCode>ABC</ProductCode>
And here is the corresponding validating schema, product.xsd file contents:
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema
version="1.0"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="ProductCode">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:minLength value="1"/>
<xsd:maxLength value="15"/>
<xsd:pattern value="[\P{Ll}]*"></xsd:pattern>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
</xsd:schema>
I opened a command-line shell and used xmlstarlet to validate the XML:
xmlstarlet val -e --xsd product.xsd product.xml
product.xml:1.31: Element 'ProductCode': [facet 'pattern'] The value 'ABC' is not accepted by the pattern '[\P{Ll}]*'.
product.xml:1.31: Element 'ProductCode': 'ABC' is not a valid value of the local atomic type.
product.xml - invalid
Then I tried to use xmllint to validate the XML:
xmllint -schema product.xsd product.xml
<?xml version="1.0"?>
<ProductCode>ABC</ProductCode>
Element 'ProductCode': [facet 'pattern'] The value 'ABC' is not accepted by the pattern '[\P{Ll}]*'.
Element 'ProductCode': 'ABC' is not a valid value of the local atomic type.
product.xml fails to validate
I spent a couple of hours tinkering with it and I found that I can make it work by removing the enclosing brackets:
<xsd:pattern value="\P{Ll}*"></xsd:pattern>
I can also keep the enclosing brackets and make it work by using the inclusive \p category escape preceded by the negation ^:
<xsd:pattern value="[^\p{Ll}]*"></xsd:pattern>
It seems that there is a bug in the underlying implementation of xmllint and xmlstarlet, and I would like confirmation of whether this is indeed the case.
The versions I used are:
xmllint:
xmllint --version
xmllint: using libxml version 20904
compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude Iconv ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib Lzma
xmlstarlet:
xmlstarlet --version
1.6.1
compiled against libxml2 2.9.1, linked with 20904
compiled against libxslt 1.1.28, linked with 10129
Additional Info
Using Python, with the code from the Python XML schema validation snippets, I found that product.xsd does not validate product.xml either. It is hard to believe that Python also has this bug, so I am now looking for an explanation of why the pattern expression in product.xsd does not work.
The question is: why do the enclosing brackets not work with the exclusive \P{Ll}?
More Additional Info
On the other hand, using the Scala snippet here, product.xml does validate against product.xsd. So the pattern syntax in product.xsd appears to be correct, yet xmllint, xmlstarlet and Python reject it. What is going on here?
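As a sanity check on what the pattern is meant to accept, the facets can be restated without any regex engine. This is a minimal sketch, assuming the intent is "one to fifteen characters, none of them a lowercase letter"; is_valid_product_code is a hypothetical helper, not part of any of the tools above.

```python
import unicodedata


def is_valid_product_code(value):
    """Mirror the facets in product.xsd: length 1..15, no character in category Ll."""
    if not 1 <= len(value) <= 15:
        return False
    # \P{Ll} means "any character whose Unicode general category is not Ll"
    return all(unicodedata.category(ch) != "Ll" for ch in value)
```

Under this reading 'ABC' is acceptable, which is consistent with the Scala result and with the two rewritten patterns that do work.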

Delete comments from xml file while parsing it using libxml

The following XML file has one of its nodes (<date>) commented out.
<?xml version="1.0"?>
<story>
<info>
<author>Abc Xyz</author>
<!--<date>June 2, 2017</date> -->
<keyword>example keyword</keyword>
</info>
</story>
I want to remove that commented-out line/node completely from the XML file using the libxml library, so that it looks like this:
<?xml version="1.0"?>
<story>
<info>
<author>Abc Xyz</author>
<keyword>example keyword</keyword>
</info>
</story>
I also consulted the libxml documentation, but it didn't help me much with comments in an XML file.
I tried a different approach and it worked. It looks like xmlreader is not much help for modifying the XML, so instead I called xmlReadMemory() and then, while walking the tree, did the following check:
if (node->type == XML_COMMENT_NODE) { /* node is an xmlNodePtr */
    /* node is freed below, so read node->next before this point if you keep iterating */
    xmlUnlinkNode(node);
    xmlFreeNode(node);
}
Finally, I called xmlDocDumpFormatMemory() to dump the modified XML into a memory buffer.
You can use NodeType() while parsing the XML to check whether each node is a comment (8 means comment, see http://xmlsoft.org/xmlreader.html#Extracting) and then remove it with xmlUnlinkNode() and xmlFreeNode().
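The same walk-and-unlink idea can be sketched in Python with the standard library's minidom. This is not libxml; strip_comments is a made-up name, and the sketch assumes the whole document fits in memory. Note that the child list is copied before anything is removed, for the same reason you would save the next pointer before calling xmlFreeNode().

```python
from xml.dom import Node, minidom


def strip_comments(xml_text):
    """Parse xml_text, remove every comment node, and return the serialized result."""
    doc = minidom.parseString(xml_text)

    def walk(node):
        # iterate over a copy, because removeChild mutates node.childNodes
        for child in list(node.childNodes):
            if child.nodeType == Node.COMMENT_NODE:
                node.removeChild(child)
                child.unlink()
            else:
                walk(child)

    walk(doc)
    return doc.toxml()
```

The whitespace text nodes that surrounded the comment stay in place, so the output may contain an empty line where the comment used to be.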

Removing specific element from xml using sax with xerces library in c++

I want to remove the lastfiles element and all its child elements using SAX with the Xerces API in C++. How can I do that?
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE config>
<config datecreated="20011210">
<user>
John Smith
</user>
<login>jsmith</login>
<password>topsecret</password>
<lastfiles>
<lastfile timestamp="20011210T1002">accounts.txt</lastfile>
<lastfile timestamp="20011190T1132">/home/jsmith/docs/letter.doc</lastfile>
</lastfiles>
</config>
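One common way to drop a subtree in a SAX pipeline is a filter that keeps a depth counter and swallows every event while the counter is above zero. The sketch below uses Python's xml.sax rather than Xerces, so the class and function names (DropElementFilter, drop_element) are illustrative only; the assumption is that the same counter logic can be placed in the startElement, endElement and characters callbacks of a Xerces handler.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class DropElementFilter(XMLFilterBase):
    """Forward every SAX event except those inside the named element."""

    def __init__(self, parent, name):
        super().__init__(parent)
        self._name = name
        self._depth = 0  # greater than 0 while inside the element being dropped

    def startElement(self, name, attrs):
        if self._depth or name == self._name:
            self._depth += 1
        else:
            super().startElement(name, attrs)

    def endElement(self, name):
        if self._depth:
            self._depth -= 1
        else:
            super().endElement(name)

    def characters(self, content):
        if not self._depth:
            super().characters(content)


def drop_element(xml_bytes, name):
    """Return the document as text with every <name> subtree removed."""
    out = io.StringIO()
    reader = DropElementFilter(xml.sax.make_parser(), name)
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(io.BytesIO(xml_bytes))
    return out.getvalue()
```

Calling drop_element(data, "lastfiles") on the config file above would write everything except the lastfiles block. SAX itself never modifies the input; the result is a new serialized document.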

Syntax mode for Coda not working - Regex issue

I have the following SyntaxDefinition.xml file that I am using to create syntax highlighting for SilverStripe(.ss) files. However, I get a regex error with the following code:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE syntax SYSTEM "syntax.dtd">
<syntax>
<head>
<name>SilverStripe Syntax</name>
<charsintokens><![CDATA[_0987654321abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ#]]></charsintokens>
</head>
<states>
<default id="Base" color="#000">
<state id="String" color="#760f15">
<begin><regex>"</regex></begin>
<end><regex>(((?<!\\)(\\\\)*)|^)"</regex></end>
</state>
<state id="Variable" color="#ff0000">
<begin><regex>^\$([a-z])(?:)</regex></begin>
<end><regex>[\n\r]</regex></end>
</state>
<import mode="PHP-HTML"/>
</default>
</states>
</syntax>
I want the "Variable" part of this code to color any code starting with a dollar sign, e.g. $Content.
Try:
<begin><regex>^\$[^\r\n]+</regex></begin>
or
<begin><regex>^\$</regex></begin>
depending on how it works
I found a bit of code which seems to work:
<regex>(\$([\w\d])+)</regex>
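To try the pattern outside Coda, here is a minimal Python sketch. It assumes Coda's regex engine treats \$ and \w the same way Python's re module does, which may not hold in every detail; find_variables is a hypothetical helper.

```python
import re

# \w already covers digits, so this is the same idea as (\$([\w\d])+)
VARIABLE = re.compile(r"\$\w+")


def find_variables(template):
    """Return every $Name token found in a template string."""
    return VARIABLE.findall(template)
```

Unlike the original ^\$ attempt, this pattern is not anchored to the start of a line, so it also matches variables that appear in the middle of markup.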

xslt test if a variable value is contained in a node set

I have the following two files:
<?xml version="1.0" encoding="utf-8" ?>
<!-- D E F A U L T H O S P I T A L P O L I C Y -->
<xas DefaultPolicy="open" DefaultSubjectsFile="subjects.xss">
<rule id="R1" access="deny" object="record" subject="roles/*[name()!='Staff']"/>
<rule id="R2" access="deny" object="diagnosis" subject="roles//Nurse"/>
<rule id="R3" access="grant" object="record[@id=$user]" subject="roles/member[@id=$user]"/>
</xas>
and the other xml file called subjects.xss is:
<?xml version="1.0" encoding="utf-8" ?>
<subjects>
<users>
<member id="dupont" password="4A-4E-E9-17-5D-CE-2C-DD-43-43-1D-F1-3F-5D-94-71">
<name>Pierre Dupont</name>
</member>
<member id="durand" password="3A-B6-1B-E8-C0-1F-CD-34-DF-C4-5E-BA-02-3C-04-61">
<name>Jacqueline Durand</name>
</member>
</users>
<roles>
<Staff>
<Doctor>
<member idref="dupont"/>
</Doctor>
<Nurse>
<member idref="durand"/>
</Nurse>
</Staff>
</roles>
</subjects>
I am writing an XSL stylesheet that reads the subject value of each rule in policy.xas and, if the currently logged-in user (available as the variable "user" in the stylesheet) is contained in that subject value (say roles//Nurse), does something.
I have not been able to test whether the currently logged-in user ($user, which is equal to, say, "durand") is contained in roles//Nurse in the subjects file (which is a different XML file). I hope that clarifies my question. Any ideas? Thanks in advance.
I suspect your $user variable holds a member node, correct? In which case the test would be:-
/roles/Nurse[member/@idref=$user/@id]
BTW, using tag names to carry data such as "Nurse" and "Doctor" is not a good practice. You are effectively saying that each new role is a new type. Better would be:-
<roles>
<role>
<name>Nurse</name>
<member idref="durand" />
</role>
...
</roles>
Your test would be:-
/roles/role[name='Nurse' and member/@idref=$user/@id]
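The lookup against the original subjects.xss layout can also be sketched outside XSLT. This Python version assumes $user is a plain string id such as "durand" (as stated in the question) rather than a node, and user_in_subject is a made-up name. ElementTree only supports a subset of XPath, so simple paths like roles//Nurse work, while a subject such as roles/*[name()!='Staff'] would need a full XPath engine.

```python
import xml.etree.ElementTree as ET


def user_in_subject(subjects_root, subject_path, user):
    """True if a <member idref=user> sits directly under subject_path, e.g. 'roles//Nurse'."""
    members = subjects_root.iterfind(f"{subject_path}/member")
    return any(m.get("idref") == user for m in members)
```

Here subjects_root would be the <subjects> element, for example ET.parse("subjects.xss").getroot(), and subject_path is the subject attribute of a rule.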