Implementing Telegram bot webhooks in ColdFusion

I am developing an application in ColdFusion (CFML) to create generic, stateful bots to run on the Telegram messaging platform. So far I've found plenty of examples in PHP and a few in other languages (Ruby, ...), but none in CFML. So, here I am.
The getUpdates (i.e., polling) approach runs like a breeze, but polling the Telegram server for new updates at a rate decent for interactive use (around every 30 seconds) is not feasible. So I've turned to webhooks.
I'll skip over setting up the webhook with a self-signed certificate; it's out of scope here, but I'm happy to explain how I overcame that issue.
My problem is: how do I decode the POSTs received from the Telegram server when an update occurs?
What my application server (ColdFusion + Tomcat + Apache2) gets from Telegram is an HTTP request with a header like this:
struct
accept-encoding: gzip, deflate
connection: keep-alive
content-length: 344
content-type: application/json
host: demo.bigopen.eu
and a content section like this:
binary
1233411711210097116101951..... (*cut*)
Please note that the data section (ASCII) contains only decimal digits, not hex. I've been struggling to decode that content; what I'm after is a JSON representation of a single message.
I've tried the CFML tools I have, such as BinaryDecode(), CharsetEncode(), the Java GZip libraries, etc., but with no success so far. I was expecting some serialized JSON in the payload, but it's encoded in a way I cannot decode. I've found no hint in the literature, since only calls to language-specific libraries (such as file_get_contents for PHP) are shown.
I don't expect to be handed the actual CFML code, but I would like to know what kind of encoding is performed on the Telegram side.

After some effort I was able to solve this issue. The encoding is handled by ColdFusion itself. The data sent by Telegram in a webhook update is binary, and CF treats it as a byte array (it is reported as "Array", but it is not directly addressable). Nonetheless, applying the ToString() function to it returns a fully valid string.
So, the first thing to do is:
<cfset reply = DeserializeJSON(ToString(StructFind(GetHttpRequestData(), "content"))) >
By the way, StructFind() just extracts the "content" section from the structure returned by GetHttpRequestData().
After that, reply is a structure holding everything that's needed, for example:
<cfset message_id = reply.message.message_id />
<cfset message_text = reply.message.text />
and so on.
Hoping this may be useful to someone.

Related

Manage HTTP response (uint_8 decimal vector) and use http parser to read JSON values

I am working in Simulink on a TCP connection between client (my computer) and server.
Through standard Ethernet blocks, I send an HTTP request like the following:
GET /status HTTP/1.1
Host:...
Accept: application/json
To send it, I convert it with the uint8('GET /status...') command.
The server responds, always with a uint8 array (decimal bytes) like this:
resp = [72 84 84 80 47 49 46 49 32 50 48 48 ...] // ("HTTP/1.1 200 OK" ...)
Within the content of "resp", again as decimal bytes, there is text in JSON format such as {"VarA":1000, "VarB":2000, ...}.
My goal is to create an S-function (but more generally, C++ code) that takes the vector "resp" as input and returns the values of VarA and VarB (1000 and 2000).
I know there are plenty of single-header libraries like nlohmann/json or PicoHTTPParser, but I don't know how to use them in this specific case.
My idea is to convert "resp" into a string and then pass it to the functions already written in those .h files for JSON/HTTP handling. Is this correct? Can it be done?
I ask because I don't know what input format the parse functions of those .h files expect.
I also wonder whether it is right to convert it into a string and then work on that; does anyone know of something that works directly on the uint8 vector?
Should I extract the message body from "resp" before I work on it?
Sorry for all these questions, but I'm a little bit confused. I don't even know how to search for this problem on Google!
Actually, I'm looking for the easiest way to reach the final goal.
Thank you everyone!
I used https://github.com/nlohmann/json.
Very easy to use library.
Please note that the message body capacity in the Boost library is limited to 8 KB.
I had to write my own handlers (which in the end turned out faster and more reliable). Yes, a little presumptuous, but I threw out all the extra checks and other heavy machinery. When you expect only JSON, and only in a specific format, you can forgo generality and simply throw an exception if the input data is incorrect.
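A minimal C++ sketch of that approach (not from the original thread): it assumes the response bytes are already in a std::vector<uint8_t> named resp, that the JSON body follows the blank line ending the HTTP headers, and that the field names VarA/VarB are as in the question; everything else here is illustrative.
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>
#include <nlohmann/json.hpp>  // single-header JSON library

int main() {
    // Example response bytes: status line + headers + JSON body.
    std::string raw =
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: application/json\r\n"
        "Content-Length: 26\r\n"
        "\r\n"
        "{\"VarA\":1000, \"VarB\":2000}";
    std::vector<uint8_t> resp(raw.begin(), raw.end());

    // 1. Convert the uint8 vector into a string (the bytes are plain ASCII/UTF-8).
    std::string text(resp.begin(), resp.end());

    // 2. The body starts after the blank line that terminates the headers.
    const std::string separator = "\r\n\r\n";
    std::size_t pos = text.find(separator);
    if (pos == std::string::npos) {
        std::cerr << "No header/body separator found\n";
        return 1;
    }
    std::string body = text.substr(pos + separator.size());

    // 3. Parse the body and pull out the two fields.
    nlohmann::json j = nlohmann::json::parse(body);
    int varA = j["VarA"].get<int>();
    int varB = j["VarB"].get<int>();
    std::cout << "VarA=" << varA << " VarB=" << varB << "\n";
    return 0;
}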

Google Cloud Dataflow removing accents and special chars with '??'

This is going to be quite a hit-or-miss question, as I don't really know which context or piece of code to give you; it's one of those "it works locally" situations (and it does!).
The situation is that I have several services, and there's a step where messages are put in a PubSub topic awaiting the Dataflow consumer, which handles them and saves them as .parquet files (I also have another consumer which sends the payload to an HTTP endpoint).
The thing is, the message in that service, prior to being sent to the PubSub topic, seems to be correct; Stackdriver logs show all the characters as they should be.
However, when I check the final output in the .parquet files or at the HTTP endpoint, I see, for example, h?? instead of hí, which seems pretty weird, since running everything locally produces the correct output.
The only thing I can think of is a server-side encoding difference when deploying Dataflow as a job instead of running it locally.
Hope someone can shed some light on something this abstract.
The strange thing is that it works locally.
As a workaround, the first thing that comes to mind is to set the encoding explicitly.
Are you at some point using a function to convert your string input to bytes?
If so, you could try forcing getBytes() to use UTF-8 encoding by passing it as an argument, as in the following example from this Stack Overflow thread:
byte[] bytes = string.getBytes("UTF-8");
// feed bytes to Base64
// get bytes from Base64
String string = new String(bytes, "UTF-8");
Also:
- Have you tried setting the parquet.enable.dictionary option?
- Are your original files written in utf-8 before conversion?
Google Cloud Dataflow (at least the Java SDK) replaces Spanish characters like 'ñ' or accented ones like 'á', 'é', etc. with the symbol � because the default charset of the JVM installed on the service workers is US-ASCII. So, if UTF-8 is not declared explicitly when you instantiate strings or their corresponding byte-array transformations, the platform default encoding will be used.

Http Digest authentication and utf-8 symbols in request headers

I'm trying to implement HTTP Digest authentication in a server based on cpp-netlib, and I'm not sure how to tackle the fact that the username attribute in the Authorization header could contain Unicode symbols; the Digest authentication RFC is not specific on this. In practice, Chrome, for example, just sends the username UTF-8 encoded, which would be fine, except that cpp-netlib parses the incoming stream and checks whether the header contents are alphanumeric using Boost and std::isalnum and friends (on Linux I could just set the current locale to UTF-8, but I'm on Windows), and that of course causes assertions and so on. So, I'm asking for a general opinion, based on the facts given:
1) Should I just dump this (and I'm really close to that) and use a customized POST/GET for authentication?
2) Can I somehow customize Boost's behavior (since the functions that verify alphanumeric values come from boost/algorithm/string/classification) to tackle this?
3) Are such issues perhaps handled in POCO or other web server frameworks that could serve as replacements in this situation?
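One locale-independent way to avoid the std::isalnum assertions, whichever framework does the validation, is to cast each byte to unsigned char before classifying it and to treat any byte with the high bit set as part of a UTF-8 sequence. A minimal sketch (the function name and the accepted punctuation set are illustrative assumptions, not cpp-netlib API):
#include <cctype>
#include <string>

// Returns true if every byte of the header value is a 7-bit alphanumeric
// character, one of a few harmless punctuation characters, or a byte >= 0x80
// (i.e. part of a UTF-8 multi-byte sequence). Casting to unsigned char avoids
// the undefined behaviour / debug assertions that std::isalnum triggers on
// negative char values (e.g. on MSVC).
bool is_plausible_utf8_username(const std::string& value) {
    for (char c : value) {
        unsigned char b = static_cast<unsigned char>(c);
        if (b >= 0x80) continue;               // UTF-8 lead/continuation byte
        if (std::isalnum(b)) continue;         // ASCII letters and digits
        if (b == '.' || b == '-' || b == '_' || b == ' ') continue;
        return false;                          // reject anything else
    }
    return !value.empty();
}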

How to correctly parse incoming HTTP requests

I've created a C++ application using Winsock which has a small HTTP server implemented (it handles just the few features I need). It is used to communicate with the outside world over HTTP requests. It works, but sometimes the requests are not handled correctly because the parsing fails. I'm quite sure the requests are correctly formed, since they are sent by major web browsers like Firefox/Chrome or by Perl/C# (which have HTTP modules/DLLs).
After some debugging I found out that the problem is in fact in receiving the message. When the message comes in more than one part (it is not read in a single recv() call), the parsing sometimes fails. I have gone through numerous attempts to resolve this, but nothing seems reliable enough.
What I do now is read in data until I find the "\r\n\r\n" sequence, which indicates the end of the headers. If WSAGetLastError() reports something other than 10035 (WSAEWOULDBLOCK), i.e. the connection was closed or failed, before such a sequence is found, I discard the message. Once I know I have the whole header, I parse it and look for information about the body length. However, I'm not sure whether this information is mandatory (I think not), and what I should do if it is missing; does that mean there will be no body? Another problem is that I do not know whether I should look for a "\r\n\r\n" after the body (if its length is greater than zero).
Does anybody know how to reliably parse an HTTP message?
Note: I know there are implementations of HTTP servers out there. I want my own for various reasons. And yes, reinventing the wheel is bad, I know that too.
If you're set on writing your own parser, I'd take the Zed Shaw approach: use the Ragel state machine compiler and build your parser based on that. Ragel can handle input arriving in chunks, if you're careful.
Honestly, though, I'd just use something like this.
Your go-to resource should be RFC 2616, which describes HTTP 1.1, which you can use to construct a parser. Good luck!
You could try looking at their code to see how they handle an HTTP message.
Or you could look at the spec; there are message-length fields you should use. Only buggy browsers send additional CRLFs at the end, apparently.
Anyway, an HTTP request has "\r\n\r\n" at the end of the request headers and before the request data, if any, even if the request is just "GET / HTTP/1.0\r\n\r\n".
If the method is "POST", you should read as many bytes after the "\r\n\r\n" as are specified in the Content-Length field.
So the pseudocode is:
read_until(buf, "\r\n\r\n");
if (buf.starts_with("POST"))
{
    contentLength = regex("^Content-Length: (\d+)$").find(buf)[1];
    read_all(buf, contentLength);
}
There will be a "\r\n\r\n" after the content only if the content itself happens to include one. The content may be binary data; it has no terminating sequence, and the only way to get its size is to use the Content-Length field.
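Putting the above together, here is a rough C++ sketch of such a receive loop for a connected, blocking socket (not from the original answers; it assumes an exact-case "Content-Length" header and ignores chunked transfer encoding, which a real server would also have to handle):
#include <cstddef>
#include <cstdlib>
#include <string>

#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET socket_t;
#else
#include <sys/socket.h>
typedef int socket_t;
#endif

// Reads one HTTP request from a connected, blocking socket. Returns false if
// the connection closed or an error occurred before a complete request arrived.
bool read_http_request(socket_t s, std::string& headers, std::string& body) {
    std::string buf;
    char chunk[4096];

    // 1. Accumulate data until the blank line that terminates the headers.
    std::size_t header_end;
    while ((header_end = buf.find("\r\n\r\n")) == std::string::npos) {
        int n = recv(s, chunk, sizeof(chunk), 0);
        if (n <= 0) return false;             // closed or failed mid-header
        buf.append(chunk, n);
    }
    headers = buf.substr(0, header_end + 4);
    body = buf.substr(header_end + 4);        // bytes already read past the headers

    // 2. Look up Content-Length (no header => assume no body, e.g. GET/HEAD).
    std::size_t cl = headers.find("Content-Length:");
    std::size_t content_length = 0;
    if (cl != std::string::npos)
        content_length = std::strtoul(headers.c_str() + cl + 15, nullptr, 10);

    // 3. Keep reading until the whole body has arrived.
    while (body.size() < content_length) {
        int n = recv(s, chunk, sizeof(chunk), 0);
        if (n <= 0) return false;             // closed or failed mid-body
        body.append(chunk, n);
    }
    body.resize(content_length);              // discard extra pipelined bytes
    return true;
}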
HTTP GET/HEAD requests have no body, and a POST request can have an empty body too. You have to check whether it's a GET/HEAD; if it is, then no content (body/message) was sent. If it was a POST, do what the spec says about parsing a message of known/unknown length, as #gbjbaanb said.

BOM not expected in CF but sent by IIS/SharePoint

I'm trying to consume a SharePoint webservice from ColdFusion via cfinvoke ('cause I don't want to deal with (read: parse) the SOAP response itself).
The SOAP response includes a byte-order-mark character (BOM), which produces the following exception in CF:
"Cannot perform web service invocation GetList.
The fault returned when invoking the web service operation is:
'AxisFault
faultCode: {http://www.w3.org/2003/05/soap-envelope}Server.userException
faultSubcode:
faultString: org.xml.sax.SAXParseException: Content is not allowed in prolog."
The UTF-8 encoding standard optionally allows a BOM (http://unicode.org/faq/utf_bom.html#29), and Microsoft almost universally includes the BOM with UTF-8 encoded streams. From what I can tell there's no way to change that in IIS. The XML parser that JRun (ColdFusion) uses by default doesn't handle the BOM for UTF-8 encoded XML streams. So, it appears that the way to fix this is to change the XML parser used by JRun (http://www.bpurcell.org/blog/index.cfm?mode=entry&entry=942).
Adobe says that it doesn't handle the BOM character (see comments from anoynomous and halL on May 2nd and 5th).
http://livedocs.adobe.com/coldfusion/8/htmldocs/Tags_g-h_09.html#comments
I'm going to say that the answer to your question (is it possible?) is no. I don't know that definitively, but the poster who commented just above halL (in the comments on that page) gave a workaround for the problem, so I assume it is possible to deal with when parsing manually.
You say that you're using CFInvoke because you don't want to deal with the SOAP response yourself. It looks like you don't have any choice.
As Adam Tuttle said already, the workaround is on the page that you linked to:
<!--- Remove BOM from the start of the string, if it exists --->
<cfif Left(responseText, 1) EQ chr(65279)>
<cfset responseText = mid(responseText, 2, len(responseText))>
</cfif>
It sounds like ColdFusion is using Apache Axis under the covers.
This doesn't apply exactly to your situation, but I've had to deal with this issue once before when consuming a .NET web service with Apache Axis/Java. The only solution I was able to find (since the owner of the web service was unwilling to change anything on his end) was to write a Handler class, plugged into the Axis pipeline, that would delete the BOM from the message if it existed.
So perhaps it's possible to configure Axis through ColdFusion? If so, you can add additional Handlers to the message-handling flow.