SAXParseException that only occurs locally. Works on WebServers - xslt

I am writing a JUnit test for an older piece of code. This code works on our iPlanet web servers and our local Tomcat servers and runs with no problems. However, when it is run by the JUnit test, I get this exception.
Background: It pulls an XSL file from a JAR, then uses it to transform an XML document that is read in from a resource file.
I have tried changing transformer factories, changing the encoding, and checking all files for null characters using a hex editor. Any ideas?
[Fatal Error] :2251:46: An invalid XML character (Unicode: 0x0) was found in the value of attribute "test" and element is "xsl:when".
SystemId Unknown; Line #2251; Column #46; org.xml.sax.SAXParseException; lineNumber: 2251; columnNumber: 46; An invalid XML character (Unicode: 0x0) was found in the value of attribute "test" and element is "xsl:when".
**UPDATE
I have found that if I use the project's class folder where the XSL is held and move it above the JAR dependency on the classpath, it works, but if it uses the XSL out of the JAR, it breaks.
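Since the behaviour depends on which copy of the stylesheet wins on the classpath, one way to see what the test is actually loading is to print the resource URL before compiling it. The sketch below is an illustration under assumptions, not the original code: the class and method names are hypothetical, and it uses the JDK's default `TransformerFactory`. It also passes each URL as the `systemId` of its `StreamSource`; the "SystemId Unknown" in the error above suggests the original sources had none, which is likely why the message cannot name the file.

```java
import java.io.InputStream;
import java.io.StringWriter;
import java.net.URL;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ClasspathTransform {

    /** "jar" when the URL points inside a JAR, "file" when it points into a class folder. */
    static String origin(URL resource) {
        return resource.getProtocol();
    }

    /** Looks up a classpath resource (names passed in are hypothetical) and reports where it came from. */
    static URL find(String name) {
        URL url = ClasspathTransform.class.getClassLoader().getResource(name);
        if (url == null) {
            throw new IllegalArgumentException("not on classpath: " + name);
        }
        // Prints e.g. jar:file:/.../lib.jar!/x.xsl or file:/.../target/classes/x.xsl
        System.err.println(name + " -> " + url + " (" + origin(url) + ")");
        return url;
    }

    /** Applies the stylesheet at xslUrl to the document at xmlUrl and returns the output. */
    static String transform(URL xslUrl, URL xmlUrl) throws Exception {
        try (InputStream xsl = xslUrl.openStream(); InputStream xml = xmlUrl.openStream()) {
            // A systemId lets error messages name the file and lets relative
            // xsl:include / xsl:import / document() hrefs resolve against it.
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(xsl, xslUrl.toExternalForm()));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(xml, xmlUrl.toExternalForm()), new StreamResult(out));
            return out.toString();
        }
    }
}
```

In the test this might be called as `transform(find("report.xsl"), find("input.xml"))`, with hypothetical resource names; running it once with each classpath ordering should show whether the failing run reads a `jar:` URL.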

SUGGESTIONS:
1) Make sure your library versions all match.
2) I suspect that "An invalid XML character (Unicode: 0x0) was found" might be caused by any of several completely different things. You should investigate each of them.
3) First, most obvious - check your input for a null character :)
4) Second, check your encoding - perhaps your sender is writing UTF-16, but your reader is expecting UTF-8. Here's a good link:
Error about invalid XML characters on Java
This is an encoding issue. Either you read the input stream as UTF-8 and it isn't,
or the other way around.
You should specify the encoding explicitly when you read the content.
For example, via
new InputStreamReader(getInputStream(), "UTF-8")
Another problem could be Tomcat. Try adding URIEncoding="UTF-8" to
your Tomcat connector settings in the server.xml file.
5) The root cause might also be a failed read or a missing object of some kind. A missing definition, perhaps.
Q: What is "SystemId"? What might cause it to "go missing"?
6) One possibility is that "resolveEntity()" is failing:
InputSource resolveEntity(String publicId, String systemId)
Here are a couple of links regarding that problem:
Java SAX Parser raises UnknownHostException
how to disable dtd at runtime in java's xpath?
7) Both of these links suggest that resolveEntity() might be failing because you can't connect to a specified host. Check the network host names listed in your XML, and make sure you can "ping" them.
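Suggestions 4 and 6 can both be tested in a few lines: decode the input explicitly as UTF-8, and install an `EntityResolver` that never goes to the network. This is a minimal sketch with hypothetical class and field names. Returning an empty `InputSource` is one common way to stub out external DTDs and entities, not something the links above prescribe. Note that passing a `Reader` makes the parser ignore the encoding declared in the XML prolog, so forcing UTF-8 is only correct if the file really is UTF-8.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class ExplicitParse {

    /** Never touches the network: every external DTD or entity resolves to empty content. */
    static final EntityResolver NO_NETWORK = (publicId, systemId) -> {
        System.err.println("Skipping external entity " + systemId);
        return new InputSource(new StringReader(""));
    };

    /** Parses the stream as UTF-8, regardless of the platform default charset. */
    static Document parseUtf8(InputStream in) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        db.setEntityResolver(NO_NETWORK);
        try (Reader r = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            return db.parse(new InputSource(r));
        }
    }
}
```

If the error disappears with the resolver installed, a failing resolveEntity() lookup is a likely culprit; if it persists, the bytes themselves are the better suspect.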

If it's got as far as line 2251, that strongly suggests there's something wrong with the file contents around that location. If there's nothing wrong with the file at that location, my next suggestion would be that something's wrong with the parser. I know it sounds crazy, but the XML parser built into the JDK is seriously buggy, and I would check whether the problem goes away if you install the Apache version of Xerces in its place. In many cases this is simply a question of putting the relevant JAR files in the lib/endorsed directory of the JDK installation (this applies to JDK 8 and earlier; the endorsed-standards mechanism was removed in Java 9).
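A hex editor only checks the files on disk. A complementary check is to scan the exact byte stream the transformer receives, for example the one returned by `getResourceAsStream` when the XSL comes out of the JAR, and report where any 0x00 bytes sit. The sketch below is a hypothetical helper, not part of any library. Its columns are byte offsets, so they can differ from the parser's character column when a line contains multi-byte characters.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class NulScanner {

    /**
     * Returns "line:column" for every 0x00 byte in the stream. Lines are counted
     * by '\n'; columns are 1-based byte offsets within the line.
     */
    static List<String> findNuls(InputStream raw) throws IOException {
        List<String> hits = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(raw)) {
            int line = 1;
            int col = 0;
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\n') {
                    line++;
                    col = 0;
                    continue;
                }
                col++;
                if (b == 0) {
                    hits.add(line + ":" + col);
                }
            }
        }
        return hits;
    }
}
```

Calling it as `NulScanner.findNuls(SomeTest.class.getResourceAsStream("/report.xsl"))` (hypothetical names) and looking for a hit near 2251:46 would confirm or rule out a real NUL in the JAR copy. If nearly every other byte is reported, the stream is probably UTF-16 rather than containing stray NULs.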

This was caused because the XSL files I was trying to transform were still in a JAR. I had to have Maven extract the files into the target directory first.
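If changing the Maven build isn't an option, a rough runtime counterpart is to copy the stylesheet out of the JAR into a temporary file and compile it from there, for example with `new StreamSource(path.toFile())`. This is a sketch with a hypothetical class name. Whether it avoids the error depends on the root cause, which the fix above doesn't pin down. Any stylesheets pulled in through relative `xsl:include` or `xsl:import` would need to be extracted alongside it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ExtractResource {

    /** Copies a resource (possibly inside a JAR) to a temp file that can be read as a plain file. */
    static Path extract(URL resource, String suffix) throws IOException {
        Path tmp = Files.createTempFile("extracted-", suffix);
        tmp.toFile().deleteOnExit(); // remove the copy when the JVM (e.g. the test run) exits
        try (InputStream in = resource.openStream()) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }
}
```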

Related

PyAIML failing with use of "secure" predicate

I've just started playing with PyAIML. My notes are at http://webseitz.fluxent.com/wiki/AIML.
I'm trying to use a starting AIML file that loads a whole directory of AIML files, but I keep getting the "You are not permitted to load AIML sets" response, even after adding k.setBotPredicate("secure", "yes") to the code.
What am I missing?

web.config vs. text file for storing a comma-separated value

We have a collection of VB.NET / IIS web services on some of our servers, and they have web.config files in the websites' root directories that they already read configurations from. A new configuration value needs to be added; it will immediately be quite a bit longer than the others, and it will only grow. It's essentially a comma-separated value, and I want to keep it specifically in a configuration file of some sort.
At first I started doing this with a text file, but there was a problem with that. The text file's contents could change while web service threads and processes are running, so they would need to essentially re-read the file every time they needed to access its values. I thought about using some sort of caching, but unless the web services are completely restarted each time the file is updated, caching would block updates to the file from being used immediately. But reading from a text file each time is slow...
Then came the idea of putting that value in web.config, along with the other configurations the services are already using. When web.config is altered, the changes take effect immediately and can still be cached in the code. However, web.config is, well, web.config: it's not just a plain text file that the code reads from. IIS treats web.config in a special manner.
I'm tempted to think that any negative consequences of putting a comma-separated value in web.config would be outweighed by the benefits compared with storing it in a text file (or a database, which probably can't be used for this anyway), but I guess I'd better ask.
What are the implications of storing a possibly lengthy, comma-separated value in web.config instead of in its own little text file? Is either option a particularly good or bad idea? To me, it seems like web.config would be easy to get along with without having to re-read the file over and over, but there's certainly more to it than the common user is aware of. Thanks!
I recommend using the Application Cache for this:
http://msdn.microsoft.com/en-us/library/vstudio/6hbbsfk6(v=vs.100).aspx

xmlReadFile() (C++ Ubuntu) core dumps on broken XML

I am using the libxml2 libraries to parse XML sent to my program as a file from another program. With care, that should mean that I never get bad XML, but twice already I've made hand tweaks that broke the XML in the received file. By broken I mean that the elements have errors: end tags not matching start tags, random characters in between tags, etc.
The file is small so there are no particular memory worries about loading all of it into the parser, so I use xmlReadFile() to read in the doc.
My problem comes when the XML is broken. xmlReadFile() does an abend and core dumps. I can't catch it with an exception nor does setting the flag to "recover" work.
I've searched Google with minimal success. I found xmllint, but I would really rather not call system() or popen() every time I get a new XML file. I looked at DTDs but can't seem to figure out how to tell a DTD to actually validate the value passed in a . (Many of the tags in the doc have values that are one of a set of, say, 5 possible answers.) Of course, if a DTD worked, I at least wouldn't crash xmlReadFile().
Any suggestions on how to validate the XML before xmlReadFile() or with xmlReadFile() and how to prevent the crashes? Does xmllint have a C++ interface that I just haven't found?
No boost. No changing libraries.
Have you tried xmlReaderForFile(... XML_PARSE_RECOVER ...) ?

document() function for a file on another computer/server

I understand the use of document() as follows.
<xsl:value-of select="document('path\to\document.xml')/RootElement/Element"/>
And this has to be a path relative to the parent XSL file. But what if I need to reference a file that is hosted on another server on the local network? I've tried things such as:
<xsl:value-of select="document('\\servername\path\to\document.xml')/RootElement/Element"/>
But this throws an error, because it looks in
C:\path\to\xsl\\servername\path\to\document.xml
Which of course doesn't exist.
This solution only relates to the Saxon-HE 9.4.0.3N XSLT processor, in the console application form, on Windows 7.
In my experimentation, I found that the document() function will accept filenames or URIs. However, I would avoid filenames because they need to be in short form. If you use the long form, the filename will be rejected.
Suppose your document is ...
c:\path\to\document.xml
on server 'servername' which is mapped to drive 'j'.
To form a URI from this use as the document() parameter value...
file:///j:/path/to/document.xml
In relation to the URI, I was mistaken about Saxon not accepting long-form. This only applies to filenames. However, there are a number of gotchas...
Note the forward slashes. Backslashes will not work.
I have not found a way to build a workable file: URI with just UNC names. You need to make a drive mapping to a letter.
Any failure to open the document for any reason will be reported as the same error. With file system, there are so many things that can go wrong, that if you can't open the file, it is not safe to assume that the URI is wrong. There could be many mundane reasons why a file cannot be opened at a particular time.
Beware of firewall issues; these can play a role.
Many text editors, such as Notepad++, assume, in the absence of a BOM, that a text file not encoded in one of the two UTF-16 encodings is encoded in the system code page. Saxon will make the default assumption that the file is encoded in UTF-8, so if you have a character that looks like this in Notepad++ (ä) with my code page, Saxon will spit the dummy and report that it is unable to open the file. (Aside: I'm not sure what my code page is. My OS is Win7 and the current system locale is English (Australia). It is the system locale that determines the system code page.) The reason Saxon will not open the document is that (ä) encoded in some code page results in a sequence of bytes that is not a valid UTF-8 sequence.
URI paths which are not URL paths are not supported by the underlying operating system. Saxon may well truthfully say that it supports URIs in relation to the document() function, but that doesn't boil any cabbages, because in practice you can't use them - well, at least not on the Windows family of operating systems.
Please ignore the MSDN page on the file protocol. The form of URL suggested on that page (with the | character etc) is not accepted by the Saxon document() function. Use the form that I have suggested above. I have tested it and it works.
Your understanding of document() is incorrect. It expects a URI, not a filename.
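The path in the question's error is what you would expect when the argument is treated as a relative URI reference and resolved against the stylesheet's base URI. The Java sketch below uses `java.net.URI`, which follows RFC 3986-style resolution, as a stand-in for an XSLT processor's resolver. The file locations and class name are hypothetical, and a given processor may differ in details. For example, `java.net.URI` rejects the backslash UNC form outright, while a more lenient processor may treat it as a relative path, as in the question.

```java
import java.net.URI;

public class DocumentUriDemo {

    /** Resolves an href against the stylesheet's URI, as document() does with its base URI. */
    static URI resolveHref(String stylesheetUri, String href) {
        return URI.create(stylesheetUri).resolve(href);
    }

    public static void main(String[] args) {
        String base = "file:///C:/path/to/xsl/main.xsl"; // hypothetical stylesheet location
        // A relative reference lands under the stylesheet's folder
        System.out.println(resolveHref(base, "path/to/document.xml").getPath());
        // An absolute file: URI is returned unchanged
        System.out.println(resolveHref(base, "file:///j:/path/to/document.xml"));
        try {
            // Backslashes are not legal URI characters
            resolveHref(base, "\\\\servername\\path\\to\\document.xml");
        } catch (IllegalArgumentException e) {
            System.out.println("not a URI: " + e.getMessage());
        }
    }
}
```

This is why the forward-slash `file:///j:/...` form suggested above works where a raw Windows path does not.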

Xerces/Xalan: UNC path as argument for document function?

I'm transforming an XML document by using Xerces-C 2.5 and Xalan-C 1.8. The XSL contains a "document" function, that references a file on the network. Unfortunately I cannot access this file by HTTP. I've only got the UNC path.
Xerces refuses to parse the referenced document: because the "file" protocol is only accepted for local files, Xerces calls WinSockNetAccessor::makeNew, which is implemented for HTTP only, so an exception is thrown and the file is ignored.
Is there a way to fool Xerces in order to accept the unc path as local file or any other known workaround without writing my own parser or manipulating Xerces?
A simple workaround would be, I guess, to just create a mapping, so you can call the network drive O: or whatever. That often fools programs that can't work directly with a UNC path (such as cmd.exe itself).
Does the UNC as it appears in the XSL have a "file:" prefix?
BTW, Xerces C V2.5 is several years old. Have you tried the latest version - V3.0.1 at the moment?