Is this a bug in xmllint or xmlstarlet pattern matching? - regex

Here is my product.xml file contents:
<ProductCode>ABC</ProductCode>
And here is the corresponding validating schema, product.xsd file contents:
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema
version="1.0"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="ProductCode">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:minLength value="1"/>
<xsd:maxLength value="15"/>
<xsd:pattern value="[\P{Ll}]*"></xsd:pattern>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
</xsd:schema>
I opened a command-line shell and used xmlstarlet to validate the XML:
xmlstarlet val -e --xsd product.xsd product.xml
product.xml:1.31: Element 'ProductCode': [facet 'pattern'] The value 'ABC' is not accepted by the pattern '[\P{Ll}]*'.
product.xml:1.31: Element 'ProductCode': 'ABC' is not a valid value of the local atomic type.
product.xml - invalid
Then I tried xmllint to validate the XML:
xmllint -schema product.xsd product.xml
<?xml version="1.0"?>
<ProductCode>ABC</ProductCode>
Element 'ProductCode': [facet 'pattern'] The value 'ABC' is not accepted by the pattern '[\P{Ll}]*'.
Element 'ProductCode': 'ABC' is not a valid value of the local atomic type.
product.xml fails to validate
I spent a couple of hours tinkering with it and I found that I can make it work by removing the enclosing brackets:
<xsd:pattern value="\P{Ll}*"></xsd:pattern>
I can also retain the enclosing brackets and make it work by using the inclusive \p category and preceding it with a negation ^:
<xsd:pattern value="[^\p{Ll}]*"></xsd:pattern>
It seems that there is a bug in the underlying implementation of xmllint and xmlstarlet, and I need confirmation whether this is indeed the case.
The versions I used are:
xmllint:
xmllint --version
xmllint: using libxml version 20904
compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude Iconv ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib Lzma
xmlstarlet:
xmlstarlet --version
1.6.1
compiled against libxml2 2.9.1, linked with 20904
compiled against libxslt 1.1.28, linked with 10129
Additional Info
Using Python, as coded in the python XML schema validation snippets, I found that product.xsd does not validate product.xml either. It's hard to believe that Python also has this bug, so I am now looking for an explanation of why the pattern expression in product.xsd is not working.
The question is: why does the enclosing bracket not work with the exclusive \P{Ll}?
More Additional Info
On the other hand, the Scala snippet here is able to validate product.xml against product.xsd. So the pattern syntax in product.xsd appears to be correct. Yet xmllint, xmlstarlet and Python could not validate it. What is going on here?
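For reference, the intent of the pattern can be checked without any regex engine. As I read the XML Schema regex grammar, [\P{Ll}]*, \P{Ll}* and [^\p{Ll}]* should all accept the same strings: those with no character in the Unicode category Ll (lowercase letter). The sketch below uses Python's unicodedata module to test that property directly. The helper name is made up for illustration, and it models only the pattern facet, not minLength/maxLength.

```python
import unicodedata


def matches_not_lowercase(value):
    """Return True if every character of value lies outside Unicode category Ll."""
    return all(unicodedata.category(ch) != "Ll" for ch in value)


# 'ABC' has no lowercase letters, so the pattern is expected to accept it.
print(matches_not_lowercase("ABC"))  # True
# A single lowercase letter is enough to reject the value.
print(matches_not_lowercase("ABc"))  # False
```

If a validator rejects 'ABC' for the bracketed form but accepts it for the other two, the difference lies in how that validator handles \P inside a character class, not in the value.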

Related

Custom np++ functionlist expressions

Oh, I'm having trouble just understanding what I'm doing here.
I'm creating a custom Notepad++ FunctionList. I know how to add it, and it's parsing, but I can't figure out from the docs how to specify the regex correctly.
In my code file (for a program called Squiffy), it has sections, kind of like an .ini file, so I started by copying the ini file's functionlist code.
I'm looking for sections like this: [[something]]: on its own line, and for subsections like this: [somesubsection]:.
<?xml version="1.0" encoding="UTF-8" ?>
<NotepadPlus>
<functionList>
<!-- File format used for: .sq -->
<parser displayName="Squiffy" id="Squiffy"
commentExpr=""
>
<function mainExpr="^\[\[.*\]\]\:$" >
<functionName>
<nameExpr expr="^\[\[.*\]\]*"/>
</functionName>
</function>
</parser>
</functionList>
</NotepadPlus>
My problem is that I don't really understand what mainExpr and nameExpr are supposed to find. I know I can find the sections with the regex I have in mainExpr, but I'm not sure what to do with the nameExpr field.
Hmmmph. Nobody had anything to say! So I posted on the notepad++ forum, which is where this solution comes from:
<parser
id="Squiffy" displayName="Squiffy" commentExpr=""
>
<function
mainExpr="(?-s)^\[(?:\[([^\[\]\r\n]+)\]|(?1))\]:"
>
<functionName>
<nameExpr expr="[^\[\]\r\n]+" />
</functionName>
</function>
</parser>
And here is a thread describing just how Notepad++ function lists are configured.
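To make the division of labour concrete: mainExpr locates the whole header line, and nameExpr is then applied to that matched text to pick out what is shown in the list. The sketch below imitates this two-step matching in Python. It is only an approximation: Python's re module has no (?1) subroutine call, so the alternative is written out by hand, and the (?-s) prefix is dropped. The function name and sample text are invented for illustration.

```python
import re

# Approximation of mainExpr: "[[name]]:" or "[name]:" at the start of a line.
MAIN_EXPR = re.compile(r"^\[(?:\[[^\[\]\r\n]+\]|[^\[\]\r\n]+)\]:", re.MULTILINE)
# nameExpr as given in the answer: the first run of non-bracket characters.
NAME_EXPR = re.compile(r"[^\[\]\r\n]+")


def list_sections(text):
    """Return the names that would be listed for each matched header."""
    names = []
    for header in MAIN_EXPR.finditer(text):
        # nameExpr is searched only inside the text matched by mainExpr.
        name = NAME_EXPR.search(header.group(0))
        if name:
            names.append(name.group(0))
    return names


sample = "[[start]]:\nSome text.\n\n[look around]:\nMore text.\n"
print(list_sections(sample))  # ['start', 'look around']
```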

java.lang.IllegalArgumentException:An invalid character [34] was present in the Cookie value

This is what my tomcat-users file looks like:
<tomcat-users>
<role rolename="admin"/>
<role rolename="analyst"/>
<role rolename="user"/>
<role rolename="kie-server"/>
<role rolename="developer"/>
<role rolename="manager"/>
<user username="w" password="w" roles="admin"/>
<user username="k" password="k" roles="kie-server"/>
<user username="u" password="u" roles="user,developer,analyst"/>
</tomcat-users>
After entering correct credentials in the KIE IDE WORKBENCH, I get the following exception:
java.lang.IllegalArgumentException: An invalid character [34] was present in the Cookie value
org.apache.tomcat.util.http.Rfc6265CookieProcessor.validateCookieValue(Rfc6265CookieProcessor.java:182)
org.apache.tomcat.util.http.Rfc6265CookieProcessor.generateHeader(Rfc6265CookieProcessor.java:115)
org.apache.catalina.connector.Response.generateCookieString(Response.java:1019)
org.apache.catalina.connector.Response.addCookie(Response.java:967)
org.apache.catalina.connector.ResponseFacade.addCookie(ResponseFacade.java:386)
org.uberfire.ext.security.server.SecurityIntegrationFilter.doFilter(SecurityIntegrationFilter.java:61)
CookieProcessor is a new configuration element, introduced in Tomcat 8.0.15.
The CookieProcessor element allows different cookie parsing configuration in each web application, or globally in the default conf/context.xml file.
According to the official Apache Tomcat 8 Configuration Reference, version 8.0.47:
The standard implementation of CookieProcessor is: org.apache.tomcat.util.http.LegacyCookieProcessor. Note that it is anticipated that this will change to org.apache.tomcat.util.http.Rfc6265CookieProcessor in a future Tomcat 8 release.
Later, according to the Apache Tomcat 8 Configuration Reference, version 8.5.23:
The standard implementation of CookieProcessor is org.apache.tomcat.util.http.Rfc6265CookieProcessor
To resolve this issue, add this line to conf/context.xml under %CATALINA_HOME% (i.e. C:\apache-tomcat-8.5.20\conf\context.xml in my case):
<CookieProcessor className="org.apache.tomcat.util.http.LegacyCookieProcessor" />
This is what it looks like after adding the line:
<?xml version="1.0" encoding="UTF-8"?>
<Context reloadable="true">
<WatchedResource>WEB-INF/web.xml</WatchedResource>
<WatchedResource>${catalina.base}/conf/web.xml</WatchedResource>
<Transaction factory="bitronix.tm.BitronixUserTransactionObjectFactory"/>
<CookieProcessor className="org.apache.tomcat.util.http.LegacyCookieProcessor" />
</Context>
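The [34] in the exception message is the character code of the double quote ("). RFC 6265 defines the characters allowed in a cookie value (cookie-octet) as %x21, %x23-2B, %x2D-3A, %x3C-5B and %x5D-7E, which excludes the double quote, comma, semicolon, backslash, whitespace and control characters. The sketch below checks a value against that grammar. It is an illustration of the RFC rule, not Tomcat's actual code, and the function name is invented.

```python
def invalid_cookie_octets(value):
    """Return the character codes in value that RFC 6265 cookie-octet disallows."""

    def allowed(code):
        # cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
        return (
            code == 0x21
            or 0x23 <= code <= 0x2B
            or 0x2D <= code <= 0x3A
            or 0x3C <= code <= 0x5B
            or 0x5D <= code <= 0x7E
        )

    return [ord(ch) for ch in value if not allowed(ord(ch))]


print(invalid_cookie_octets('abc"def'))  # [34]
print(invalid_cookie_octets("abcdef"))   # []
```

Switching to LegacyCookieProcessor sidesteps the stricter check. The other route is for the application to avoid putting such characters in the cookie value in the first place.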

Delete comments from xml file while parsing it using libxml

The following XML file has one of its nodes (i.e. <date>) commented out.
<?xml version="1.0"?>
<story>
<info>
<author>Abc Xyz</author>
<!--<date>June 2, 2017</date> -->
<keyword>example keyword</keyword>
</info>
</story>
What I want is to remove that commented line/node completely from the XML file using the libxml library, so that it looks like this:
<?xml version="1.0"?>
<story>
<info>
<author>Abc Xyz</author>
<keyword>example keyword</keyword>
</info>
</story>
I also looked at the libxml documentation, but that didn't help me much with comments in an XML file.
I tried a different way and it worked. It looks like xmlreader is not much help for modifying the XML, so instead I called xmlReadMemory() and then, while walking the tree, did the following check:
/* node is of type xmlNodePtr. If you are iterating over siblings, */
/* save node->next before this check, because the node is freed here. */
if (node->type == XML_COMMENT_NODE) {
    xmlUnlinkNode(node); /* detach the comment from the tree */
    xmlFreeNode(node);   /* release its memory */
}
And finally I called xmlDocDumpFormatMemory() to store the modified XML in a buffer.
You can use NodeType() while parsing the xml and check for each node if it’s a comment (8 means comment, see here: http://xmlsoft.org/xmlreader.html#Extracting) and then remove it with xmlUnlinkNode() and xmlFreeNode().
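The same unlink-and-free idea can be sketched outside libxml. The example below uses Python's standard xml.dom.minidom instead (a swapped-in library, not libxml itself): it collects all comment nodes first and removes them afterwards, so the tree is never modified while it is being walked. The function name is invented for illustration.

```python
from xml.dom import Node, minidom


def strip_comments(xml_text):
    """Parse xml_text, remove every comment node, and return the serialized XML."""
    doc = minidom.parseString(xml_text)
    comments = []
    stack = [doc]
    # First pass: walk the tree and only collect the comment nodes.
    while stack:
        node = stack.pop()
        for child in node.childNodes:
            if child.nodeType == Node.COMMENT_NODE:
                comments.append(child)
            else:
                stack.append(child)
    # Second pass: detach each comment from its parent and release it.
    for comment in comments:
        comment.parentNode.removeChild(comment)
        comment.unlink()
    return doc.toxml()


source = (
    '<?xml version="1.0"?>'
    "<story><info><author>Abc Xyz</author>"
    "<!--<date>June 2, 2017</date> -->"
    "<keyword>example keyword</keyword></info></story>"
)
print(strip_comments(source))
```

Whitespace text nodes around a removed comment are left in place, so an indented input may keep a blank line where the comment used to be.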

How to read a value from property file and replace it on xml using shell script

I have an XML file that contains a path in multiple places.
Now I want to fetch a value from a .properties file and replace part of the path wherever it occurs in the XML.
For example, consider the XML file below.
<?xml version="1.0" encoding="ISO-8859-1"?>
...
...
<classpath>
<pathelement location="/profiles/sh/finalFolder/Apache/example.jar" />
</classpath>
<property name="executable" value="/profiles/sh/finalFolder/Apache/instjamr/install" />
<fileset dir="/profiles/sh/finalFolder/Apache/ant"/>
This XML file contains the path /profiles/sh/finalFolder, with some suffix, in many places.
Now, I have a path.properties file which contains (key, value) pairs such as
FinalFolder=/new/final/exit (the user can edit the value in the property file at any time)
I want to replace the path with the value given in the .properties file for the key FinalFolder.
So, finally, I need to write code in a .sh file to do the job.
Please help. Thanks in advance.
(Please don't mark this question as a duplicate, as I didn't find an appropriate/implementable answer for my question.)
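The two steps involved (look up the key, then substitute the path prefix) can be sketched as follows. The question asks for a shell script; this is a Python sketch of the same logic, which a .sh file could call. It works on strings so that it is easy to try, and the function names are invented for illustration. It assumes simple key=value lines and treats the XML as plain text.

```python
def read_property(properties_text, key):
    """Return the value for key from simple key=value lines."""
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without a key=value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == key:
            return value.strip()
    raise KeyError(key)


def replace_path(xml_text, old_prefix, new_prefix):
    """Replace every occurrence of old_prefix with new_prefix."""
    return xml_text.replace(old_prefix, new_prefix)


properties = "FinalFolder=/new/final/exit\n"
xml = '<fileset dir="/profiles/sh/finalFolder/Apache/ant"/>'
new_folder = read_property(properties, "FinalFolder")
print(replace_path(xml, "/profiles/sh/finalFolder", new_folder))
# <fileset dir="/new/final/exit/Apache/ant"/>
```

In a real script the two strings would be read from path.properties and the XML file, and the result written back to the XML file.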

Unable to load '#' symbol with loadxml

I tried to execute the following lines
VARIANT_BOOL vBoolTestConnection;
vBoolTestConnection=m_spXMLDom->loadXML(bstrInput.m_str);
bstrInput contains the XML shown below. loadXML returns false for this XML. bstrInput has '#' in the Password tag. If I replace the # symbol with any other password characters, loadXML works fine. Could you please help me find a solution?
"<?xml version="1.0" encoding="UTF-8"?><SOAP-ENV:Envelope
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"><SOAP-ENV:Body><CheckValidUser
xmlns="http://systemsys"><UserName>HGDXJHSAD</UserName><Password>&</Password></CheckValidUser></SOAP-ENV:Body></SOAP-ENV:Envelope>"
BSTR is usually UTF-16. The XML string that you posted claims that it's using encoding UTF-8.