BOM not expected in CF but sent by IIS/SharePoint - web-services

I'm trying to consume a SharePoint webservice from ColdFusion via cfinvoke ('cause I don't want to deal with (read: parse) the SOAP response itself).
The SOAP response includes a byte-order-mark character (BOM), which produces the following exception in CF:
"Cannot perform web service invocation GetList.
The fault returned when invoking the web service operation is:
'AxisFault
faultCode: {http://www.w3.org/2003/05/soap-envelope}Server.userException
faultSubcode:
faultString: org.xml.sax.SAXParseException: Content is not allowed in prolog."
The standard for UTF-8 encoding optionally includes the BOM character (http://unicode.org/faq/utf_bom.html#29). Microsoft almost universally includes the BOM character with UTF-8 encoded streams. From what I can tell there’s no way to change that in IIS. The XML parser that JRun (ColdFusion) uses by default doesn’t handle the BOM character for UTF-8 encoded XML streams. So, it appears that the way to fix this is to change the XML parser used by JRun (http://www.bpurcell.org/blog/index.cfm?mode=entry&entry=942).
Adobe says that it doesn't handle the BOM character (see comments from anonymous and halL on May 2nd and 5th).
http://livedocs.adobe.com/coldfusion/8/htmldocs/Tags_g-h_09.html#comments

I'm going to say that the answer to your question (is it possible?) is no. I don't know that definitively, but the poster who commented just above halL (in the comments on this page) gave a work-around for the problem -- so I assume it is possible to deal with when parsing manually.
You say that you're using CFInvoke because you don't want to deal with the soap response yourself. It looks like you don't have any choice.

As Adam Tuttle said already, the workaround is on the page that you linked to:
<!--- Remove BOM from the start of the string, if it exists --->
<cfif Left(responseText, 1) EQ chr(65279)>
<cfset responseText = mid(responseText, 2, len(responseText) - 1)>
</cfif>

It sounds like ColdFusion is using Apache Axis under the covers.
This doesn't apply exactly to your solution, but I've had to deal with this issue once before when consuming a .NET web service with Apache Axis/Java. The only solution I was able to find (since the owner of the web service was unwilling to change anything on his end) was to write a Handler class that Axis would plug into the pipeline which would delete the BOM from the message if it existed.
So perhaps it's possible to configure Axis through ColdFusion? If so you can add additional Handlers to the message handling flow.
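The BOM-stripping step that such a handler would perform can be sketched in plain Java. This is only a minimal sketch of the byte-level check, assuming the message is available as a byte array; the class and method names are hypothetical, and the Axis handler wiring itself is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative helper (hypothetical name), not part of Axis or ColdFusion. */
final class BomStripper {
    // UTF-8 encoding of U+FEFF, the byte-order mark.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns the payload without a leading UTF-8 BOM; returns the input unchanged if none is present. */
    static byte[] stripUtf8Bom(byte[] payload) {
        if (payload.length >= 3
                && payload[0] == UTF8_BOM[0]
                && payload[1] == UTF8_BOM[1]
                && payload[2] == UTF8_BOM[2]) {
            return Arrays.copyOfRange(payload, 3, payload.length);
        }
        return payload;
    }

    public static void main(String[] args) {
        byte[] withBom = "\uFEFF<soap:Envelope/>".getBytes(StandardCharsets.UTF_8);
        String cleaned = new String(stripUtf8Bom(withBom), StandardCharsets.UTF_8);
        System.out.println(cleaned); // prints "<soap:Envelope/>"
    }
}
```

Working on bytes rather than on a decoded string means the check runs before any XML parser sees the prolog.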

Related

Google Cloud Dataflow removing accents and special chars with '??'

This is going to be quite a hit-or-miss question, as I don't really know which context or piece of code to give you: it's a case of "it works locally", and it really does.
The situation is that I have several services, and there's a step where messages are put in a PubSub topic, waiting for the Dataflow consumer to handle them and save them as .parquet files (I also have another consumer that sends the payload to an HTTP endpoint).
The thing is, the message in that service prior to sending it to the PubSub topic seems to be correct; Stackdriver logs show all the chars as they should be.
However, when I check the final output in the .parquet file or at the HTTP endpoint, I see, for example, h?? instead of hí, which seems pretty weird, as running everything locally produces correct output.
The only explanation I can think of is a server-side encoding difference when the Dataflow pipeline is deployed as a job rather than run locally.
I hope someone can shed some light on something this abstract.
The strange thing is that it works locally.
But as a workaround, the first thing that comes to mind is to use encoding.
Are you using, at some point, a function to convert your string input to bytes?
If yes, you could try to force getBytes() to use UTF-8 encoding by passing the charset as an argument, as in the following example from this Stack Overflow thread:
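import java.nio.charset.StandardCharsets;

// Encode with an explicit charset instead of the platform default
byte[] bytes = string.getBytes(StandardCharsets.UTF_8);
// feed bytes to Base64
// get bytes from Base64
// Decode with the same explicit charset
String decoded = new String(bytes, StandardCharsets.UTF_8);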
Also:
- Have you tried setting the parquet.enable.dictionary option?
- Are your original files written in utf-8 before conversion?
Google Cloud Dataflow (at least the Java SDK) replaces Spanish characters like 'ñ' or accented letters such as 'á' and 'é' with the symbol � since the default charset of the JVM installed on service workers is US-ASCII. So, if UTF-8 is not explicitly declared when you instantiate strings or convert them to and from byte arrays, the platform default encoding will be used.
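To make the effect concrete, here is a small sketch that decodes UTF-8 bytes as US-ASCII, which is what happens implicitly when the default charset is US-ASCII and no charset is named. The class and method names are made up for illustration and nothing here is Dataflow-specific.

```java
import java.nio.charset.StandardCharsets;

final class CharsetDemo {
    /** Decodes UTF-8 bytes with the wrong charset, as a JVM defaulting to US-ASCII would. */
    static String decodeAsAscii(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.US_ASCII);
    }

    /** Round-trips through bytes with the charset named explicitly on both sides. */
    static String roundTripUtf8(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "h\u00ED"; // "hí", written with an escape to avoid source-encoding issues
        // 'í' is two bytes in UTF-8; each becomes one replacement character in US-ASCII.
        System.out.println(decodeAsAscii(original).length());         // prints 3
        System.out.println(roundTripUtf8(original).equals(original)); // prints true
    }
}
```

The two replacement characters for a single 'í' match the "h??" pattern described in the question.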

Implementing Telegram bot webhooks in ColdFusion

I am developing an application in ColdFusion (CFML) to create generic, stateful, bots to be run on the Telegram messaging platform. I've found so far plenty of examples in PHP, some in other languages (Ruby,...), none in CFML. So, here I am.
The "getUpdates" (i.e., polling) way runs like a breeze, but polling the Telegram server for new updates is not feasible at a rate decent for interactive use (some 30 sec). So, I've turned to webhooks.
I will skip over the webhook setup for a self-signed certificate, since it's out of scope here, but I am ready to explain how I overcame that issue.
My problem is: how do I decode the POSTs received from the Telegram server when an update occurs?
What my application server (ColdFusion + Tomcat + Apache2) gets from Telegram is an HTTP request with a header like this:
struct
accept-encoding: gzip, deflate
connection: keep-alive
content-length: 344
content-type: application/json
host: demo.bigopen.eu
and a content section like this:
binary
1233411711210097116101951..... (*cut*)
Please note that the data section (ASCII) contains only decimal digits, not hex. I've been struggling to decode it; what I want is a JSON representation of a single message.
I've tried the CFML tools I have, such as BinaryDecode(), CharsetEncode(), Java GZip libraries, etc., with no success so far. I was expecting some serialized JSON in the reply, but it's encoded in a way I cannot decode. I've found no hint in the literature, since the examples only show calls to language-specific libraries (such as file_get_contents for PHP).
I don't expect to be given the actual CFML code, but I do hope to learn what kind of encoding is performed on the Telegram side.
After some effort I was able to solve this issue. The encoding is handled by ColdFusion itself. The data sent by Telegram in a webhook update is binary, and CF treats it as a ByteArray (actually, it's declared as "Array" but is not directly addressable). Nonetheless, the ToString() function, if applied, returns a fully valid string.
So, the first thing to do is:
<cfset reply = DeserializeJSON(ToString(StructFind(GetHttpRequestData(), "content"))) >
BTW, StructFind() just extracts the "content" section from the structure returned by GetHttpRequestData().
After that, reply is a structure holding what is needed, such as:
<cfset message_id = reply.message.message_id />
<cfset message_text = reply.message.text />
and so on.
I hope this is useful to someone.
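For readers outside CFML, the same idea in Java is a plain byte-to-string decode. In this sketch the byte values are my own hand-split reading of the first digits of the dump in the question (123, 34, 117, ...), which is an assumption; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class WebhookBodyDemo {
    /** Decodes a raw request body as UTF-8 text, the encoding JSON payloads normally use. */
    static String bodyToString(byte[] body) {
        return new String(body, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Leading digits of the dump, split by hand into decimal byte values (an assumption).
        byte[] prefix = {123, 34, 117, 112, 100, 97, 116, 101, 95};
        System.out.println(bodyToString(prefix)); // prints {"update_
    }
}
```

Read this way, the "decimal digits" are just the byte values of the JSON text printed back to back, so no decompression or binary decoding step is needed.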

How to load a shapefile in gwt-openlayers

We're building an application using GWT-Openlayers (not OpenLayers) and need to allow the user to load a polygon from a shapefile. Surprisingly, there doesn't seem to be an evident solution. The closest solutions are javascript libraries for interpreting shapefiles, but a javascript solution doesn't really help in a GWT application. Any recommendations?
Thanks in advance!
Lacking a simpler solution, the approach I used was as follows:
1) Use GWT FormPanel and FileUpload to allow the user to select the file to upload
2) Create a custom servlet for handling the request
3) FormPanel sends a multipart POST of the file contents to the servlet
4) Servlet feeds the file content to a parser to convert it to Well Known Text (WKT)
5) Servlet returns the WKT in the HttpResponse
6) Client side code converts the WKT to a gwt-openlayers vector feature and adds it to the map
Certainly not an elegant solution but seems to work. If anyone finds a better solution, it would be great to hear.
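As a rough illustration of what the servlet hands back, the sketch below formats one polygon ring as WKT. It assumes the shapefile parser has already produced the ring as an array of [x, y] pairs; the parser, the servlet plumbing, and the class and method names are all hypothetical and not taken from the answer.

```java
final class WktWriter {
    /** Formats one ring of [x, y] points as a WKT POLYGON, closing the ring if needed. */
    static String toWktPolygon(double[][] ring) {
        if (ring.length < 3) {
            throw new IllegalArgumentException("a polygon ring needs at least 3 points");
        }
        StringBuilder sb = new StringBuilder("POLYGON((");
        for (int i = 0; i < ring.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(ring[i][0]).append(' ').append(ring[i][1]);
        }
        // WKT rings must end on their starting point; repeat it if the input did not.
        double[] first = ring[0];
        double[] last = ring[ring.length - 1];
        if (first[0] != last[0] || first[1] != last[1]) {
            sb.append(", ").append(first[0]).append(' ').append(first[1]);
        }
        return sb.append("))").toString();
    }

    public static void main(String[] args) {
        double[][] triangle = {{0, 0}, {4, 0}, {4, 3}};
        System.out.println(toWktPolygon(triangle));
        // prints POLYGON((0.0 0.0, 4.0 0.0, 4.0 3.0, 0.0 0.0))
    }
}
```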

Http Digest authentication and utf-8 symbols in request headers

I'm trying to implement HTTP Digest authentication in a server based on cpp-netlib, and I'm not sure how to handle the fact that the username attribute in the Authorization header could contain Unicode symbols; the actual Digest authentication RFC is not specific on this. In practice, Chrome, for example, just sends a UTF-8 encoded username. That would be fine, except that cpp-netlib parses the incoming stream and checks whether the header contents are alphanumeric using Boost and std::isalnum and friends (on Linux I could just set the current locale to UTF-8, but I'm on Windows), and that of course triggers assertions and what not. So, I'm just asking for a general opinion, based on the facts given:
1) Should I just dump this (and I'm really close to that) and use a customized POST/GET for authentication?
2) Can I somehow customize Boost's behavior (since the functions that verify alphanumeric values come from boost\algorithm\string\classification) to handle this?
3) Maybe such issues are somehow handled in POCO or other web server frameworks that could serve as replacements in this situation?

Java WebService call - null parameters

I have a Java WS developed with JAX-WS. This service has only one method, with two int parameters as input. Every time I try to call this service, the parameters are 0. If I change the type to Integer, the 0 becomes null.
To figure these things out, you need to trace the messages into and out of the service. If it uses http, then consider a debugging HTTP Proxy like Fiddler2 or (I can't remember the Java version of the proxy). Fiddler2 is not written in Java but it works fine for Java-based apps.
If the service doesn't use HTTP then you'll need some other way to trace the messages.
Normally the problem here is one of XML schema agreement. An incorrect XML namespace on an incoming message will cause the input to be deserialized "null" or zero. Even a one character difference in the namespace - let's say a missing trailing slash - can cause this.
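The sensitivity to a single character can be shown with a plain namespace-aware DOM lookup. This is not JAX-WS itself, only a sketch of the exact-string matching of namespace URIs; the request body, the http://example.com/calc namespace, and the class name are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NamespaceMismatchDemo {
    /** Counts elements with the given namespace URI and local name in an XML string. */
    static int countElements(String xml, String namespaceUri, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace-based lookups
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagNameNS(namespaceUri, localName).getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical request body; its namespace has no trailing slash.
        String xml = "<add xmlns=\"http://example.com/calc\"><a>2</a><b>3</b></add>";
        System.out.println(countElements(xml, "http://example.com/calc", "a"));  // prints 1
        System.out.println(countElements(xml, "http://example.com/calc/", "a")); // prints 0
    }
}
```

A lookup that finds nothing leaves the parameter at its default, which is consistent with seeing 0 for int and null for Integer.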