How do I avoid body_limit error in boost's beast and correctly handle large messages - c++

I have cases when there is a chunked response which is too big for beast, and I want to stop before I get to beast's body_limit, and continue handling the message from that point using plain boost::asio. Mind that this (obviously) means I already received the header and a large part of the body.
I'm using it for a reverse proxy, so basically what I want to do is somehow send the incomplete response to the http client, while continuing relaying the remaining response data using boost::asio.
I'm guessing I'll need to somehow serialize the incomplete response, maybe using operator<< to std::stringstream, send that to the client using boost::asio, and continue the communication from there.
Will this work? Is this the correct way of doing that, or is there a better way, maybe even using beast api? Is there another way to handle chunked messages that are about to exceed body_limit in beast's api?
Thanks in advance,
David.
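For reference, relaying bytes onward on a chunked connection means re-emitting the chunk framing yourself once you leave the parser behind. A minimal sketch of HTTP/1.1 chunked-transfer framing (function names are mine, not from any library):

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Wrap a piece of body data in HTTP/1.1 chunked-transfer framing:
// hex size, CRLF, payload, CRLF.
std::string make_chunk(const std::string& payload) {
    std::ostringstream out;
    out << std::hex << payload.size() << "\r\n" << payload << "\r\n";
    return out.str();
}

// The zero-length terminator that tells the client the chunked body is done.
std::string last_chunk() {
    return "0\r\n\r\n";
}
```

Each piece of body data relayed to the client would be passed through `make_chunk`, with `last_chunk` sent once the upstream body ends.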
UPDATE
I finally abandoned the idea of falling back to Boost.Asio, and am now trying to receive the HTTP message (chunked or regular) in pieces with a fixed-size buffer, so that I never reach the body limit. I've just finished skimming Receive/parse the message body one chunk at a time · Issue #154 · boostorg/beast, and it seems to be exactly what I need, since I'm implementing a reverse proxy as well. I tried to use Incremental Read 💡 - 1.70.0 but get a "reference to non-static member function must be called" error when trying to compile this line:
ctx->response.get().body().data = response_buffer;
Maybe the incremental read example page is not updated with the latest syntax? Do you have an example relevant for the reverse proxy I'm trying to write?
Thanks in advance,
David

Maybe the incremental read example page is not updated with the latest syntax? Do you have an example relevant for the reverse proxy I'm trying to write?
The examples in the docs are compiled, so they can't possibly be out of date. Perhaps you are mixing different versions of the example and Beast? Are you using http::buffer_body? What does the declaration of your message look like?
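The pattern behind Beast's `http::buffer_body` incremental read can be sketched with plain standard-library streams, independent of Beast itself: point the reader at a small fixed buffer, fill it, hand the bytes to a sink (e.g. the downstream client socket), and repeat. Names here are mine; `std::istream` stands in for the socket/parser pair:

```cpp
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>

// Read an arbitrarily large payload through a fixed-size buffer, invoking
// `sink` once per filled (or final, partially filled) buffer. Memory use is
// bounded by buf_size no matter how large the payload is.
void read_in_chunks(std::istream& in, std::size_t buf_size,
                    const std::function<void(const char*, std::size_t)>& sink) {
    std::string buf(buf_size, '\0');
    while (in) {
        in.read(&buf[0], static_cast<std::streamsize>(buf_size));
        std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;
        sink(buf.data(), got);  // relay this piece, then reuse the buffer
    }
}
```

With `buffer_body`, the equivalent of "point at the buffer" is assigning the buffer pointer and size to the body's `data`/`size` members before each `read_some` call, which is why a `data` *member function* error usually means the message was not declared with `buffer_body` in the first place.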

By default, Beast's parser limits the size of the body to 1MB for requests and 8MB for responses, to prevent trivial resource-exhaustion attacks. You can always raise the limit, or eliminate it entirely (by setting it to the largest uint64_t), by calling parser::body_limit:
https://www.boost.org/doc/libs/1_71_0/libs/beast/doc/html/beast/ref/boost__beast__http__parser/body_limit.html
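The call shape, sketched against a stand-in class so the snippet stays self-contained (the real class is `boost::beast::http::response_parser<Body>`; the default value below is the 8MB response default mentioned above):

```cpp
#include <cstdint>
#include <limits>

// Stand-in for the parser interface, just to show the call shape; the
// setter has the same name as Beast's parser::body_limit.
struct parser_stub {
    std::uint64_t limit = 8 * 1024 * 1024;          // response default
    void body_limit(std::uint64_t n) { limit = n; }
};

// "Eliminate the limit entirely": pass the largest std::uint64_t.
constexpr std::uint64_t no_limit = std::numeric_limits<std::uint64_t>::max();
```

In real code this would be `parser.body_limit(std::numeric_limits<std::uint64_t>::max());` called before the first read.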

Related

How do I check if gsoap is failing to serialize/send a big amount of data?

I'm running a web service client which we developed using gsoap library and we need to send 3.5 GB of data through a mutually authenticated connection - meaning we've got encrypted traffic on the network.
The server - which I have no access to - says it is receiving "empty" data.
I've made a network traffic capture and noticed a pause (a few seconds) during transmission and then an "Encrypted alert (21)" and connection closure.
Checking my code it seems there's some problem to serialize or send the data, but I haven't been able to find out exactly what's going on.
My suspicion is that gsoap is not able to allocate the necessary memory to serialize/send data.
How should I go about analyzing this?
EDIT1:
I've dropped the serialization theory, and my suspicion now rests on the attachment functions. It seems defining attachment callbacks may solve my problem. Still interested in people's opinions and suggestions.
From your description it seems less likely that XML serialization is the cause of the problem. The standard HTTP-based communication in gsoap (i.e. without HTTP chunking) serializes the C/C++ data before sending to determine the HTTP content length. A serialization failure would show up immediately and not during transfer.
To speed up the transfer of large XML chunks of data, you can optimize XML serialization with the SOAP_XML_TREE context flag when initializing the struct soap context. This reduces SOAP encoding overhead substantially. The reason: the SOAP 1.1/1.2 encoding protocol makes the gsoap engine analyze data-structure pointers to determine co-referenced elements (e.g. graphs, possibly with cycles). When many pointers are used, this can hurt the XML serializer's performance. Also, debug mode (-DDEBUG) slows the gsoap engine down substantially, so it's best avoided for large transfers.
Perhaps a tool such as ssldump can shed some more light on the TLS/SSL communication issue you are experiencing.

Download progress using SFML's HTTP class?

So I'm wondering what is a good way of getting the progress of a download when using SFML's HTTP class, or HTTP in general. The only way I've thought of is using tons of ranged GET requests in a separate thread, but that of course makes the download take much longer with all the extra requests.
Any ideas?
You can't. If you want progress information, you should either implement it yourself (not recommended) or use another library for networking.
From the documentation of sf::Http::sendRequest:
Warning: this function waits for the server's response and may not return instantly; use a thread if you don't want to block your application, or use a timeout to limit the time to wait.
In other words, it's a blocking method that returns only on timeout or completion (success or error).
Maybe have a look at libcurl, cpp-netlib or maybe some other libraries.
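Whatever library you switch to, the progress figure itself is just bytes received so far against the Content-Length announced in the response header (libcurl, for instance, hands you exactly these numbers through its transfer-info callback). A minimal sketch of the calculation, with my own function name:

```cpp
#include <cstddef>

// Progress as a percentage, given the Content-Length from the response
// header and the bytes received so far. Returns -1 when the server sent
// no Content-Length (e.g. a chunked response), in which case only
// "bytes so far" can be reported.
int progress_percent(std::size_t received, std::size_t content_length) {
    if (content_length == 0) return -1;
    if (received >= content_length) return 100;
    return static_cast<int>(received * 100 / content_length);
}
```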

Python: reading from rfile in SimpleHTTPRequestHandler

While overloading SimpleHTTPRequestHandler, my function blocks on self.rfile.read(). How do I find out if there is any data in rfile before reading? Alternatively, is there a non-blocking read call that returns in absence of data?
For the record, the solution is to only read as many bytes as specified in the Content-Length header, i.e. something like:
content_length = int(self.headers['Content-Length'])
payload = self.rfile.read(content_length).decode('utf-8')
I just solved a case like this.
I'm pretty sure the gotcha here is an effectively endless stream of bytes being written to your socket by the client you're connected to. This is often called "pre-connect", and it happens because HTTP/TCP/HTTPServer doesn't make a significant distinction between "chunks" and single bytes being fed slowly into the connection. If the response header contains Transfer-Encoding: chunked, you are a candidate for this happening. Google Chrome works this way and is a good test: if Firefox and IE work but Chrome doesn't when you receive a response from the same website, then this is probably what's happening.
Possible solution:
In CPython 3.7, the http.server module gained ThreadingHTTPServer, which handles each request in its own thread; look it up in the docs.
Put the request handling in a separate thread, and time that thread out when the read operation takes too long.
Both of these are potentially painful solutions, but at least it should get you started.
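The second suggestion can be sketched in C++ with the standard library: run the blocking read via std::async and wait on the future with a deadline. Names and the callback shape are mine, and note the caveat in the comment:

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <string>
#include <thread>

// "Put the read in a thread and time it out": run the blocking read on a
// worker via std::async, then wait on the future with a deadline. On
// timeout the caller can e.g. close the socket, which unblocks the worker.
// Caveat: the async task itself keeps running until the read returns.
bool read_with_timeout(const std::function<std::string()>& blocking_read,
                       std::chrono::milliseconds timeout,
                       std::string& out) {
    auto fut = std::async(std::launch::async, blocking_read);
    if (fut.wait_for(timeout) != std::future_status::ready)
        return false;  // timed out; the data may still arrive later
    out = fut.get();
    return true;
}
```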

XMLRPCPP asynchronously handling multiple calls?

I have a remote server which handles various different commands, one of which is an event fetching method.
The event fetch returns right away if there is 1 or more events listed in the queue ready for processing. If the event queue is empty, this method does not return until a timeout of a few seconds. This way I don't run into any HTTP/socket timeouts. The moment an event becomes available, the method returns right away. This way the client only ever makes connections to the server, and the server does not have to make any connections to the client.
This event mechanism works nicely. I'm using the boost library to handle queues, event notifications, etc.
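The server side of such a long-poll fetch can be sketched with a condition variable: return immediately when events are queued, otherwise wait up to the timeout and return an empty batch so the client simply polls again. This is a stdlib sketch of the mechanism described above, not the poster's actual Boost-based code:

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <vector>

// Long-poll event queue: fetch() returns right away if events are pending,
// otherwise blocks up to `timeout` and then returns an empty batch.
class EventQueue {
public:
    void push(std::string ev) {
        {
            std::lock_guard<std::mutex> lk(m_);
            q_.push_back(std::move(ev));
        }
        cv_.notify_one();  // wake a waiting fetch() immediately
    }

    std::vector<std::string> fetch(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait_for(lk, timeout, [this] { return !q_.empty(); });
        std::vector<std::string> out(q_.begin(), q_.end());
        q_.clear();  // hand over everything queued so far
        return out;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::string> q_;
};
```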
Here's the problem. While the server is holding back on returning from the event fetch method, during that time, I can't issue any other commands.
In the source code, XmlRpcDispatch.cpp, I'm seeing in the "work" method, a simple loop that uses a blocking call to "select".
Seems like while the handling of a method is busy, no other requests are processed.
Question: am I not seeing something and can XmlRpcpp (xmlrpc++) handle multiple requests asynchronously? Does anyone know of a better xmlrpc library for C++? I don't suppose the Boost library has a component that lets me issue remote commands?
I actually don't care about the XML or over-HTTP feature. I simply need to issue (asynchronous) commands over TCP in any shape or form?
I look forward to any input anyone might offer.
I had some problems with XMLRPC too, and investigated many solutions like gSOAP and XMLRPC++, but in the end I gave up and wrote the whole HTTP+XMLRPC stack from scratch using Boost.ASIO and TinyXML++ (later I swapped TinyXML for Expat). It wasn't really that much work; I did it myself in about a week, starting from scratch and ending up with many RPC calls fully implemented.
Boost.ASIO gave great results. It is, as its name says, totally async, and with excellent performance with little overhead, which to me was very important because it was running in an embedded environment (MIPS).
Later, and this might be your case, I changed XML to Google's Protocol Buffers, and was even happier. Its API, as well as its message containers, are all type safe (i.e. you send an int and a float, and it never gets converted to a string and back, as is the case with XML), and once you get the hang of it, which doesn't take very long, it's a very productive solution.
My recommendation: if you can ditch XML, go with Boost.ASIO + Protobuf. If you need XML: Boost.ASIO + Expat.
Doing this stuff from scratch is really worth it.

Webservice protection against big messages

I am developing a WebService in Java upon the jax-ws stack and glassfish.
Now I am a bit concerned about a couple of things.
I need to pass in an unknown amount of binary data that will be processed by an MDB. It is written this way to be asynchronous (so the user does not have to wait for the calculation to take place), somewhat fault tolerant, as well as very scalable.
The input message can however be split into chunks and sent to the MDB or split in the client and sent to the WS itself in chunks.
What I am looking for is a way to specify the max size of the input so I won't blow the heap even if someone deliberately tries to send a too-big message. I have noticed that things tend to get a bit unstable once you hit the ceiling, and I must be able to keep running.
Is it possible to be safe against big messages or should I try to use another method instead of WS. Which options do I have?
Well I am rather new to Java EE..
If you're passing binary data, take a look at enabling MTOM for the endpoint. It utilizes streaming and has a 'threshold' parameter.