Crossbar out-of-memory when pub runs faster than sub - C++

I was doing some pub/sub testing with autobahn-cpp. I found that when you publish data at a rate faster than the subscriber can consume it, the router (Crossbar) buffers the data and its memory usage grows. Eventually the router uses up all the memory and is killed by the OS.
For example
publisher:
while (1)
{
    session->publish("com.pub.test", std::make_tuple(std::string("hello, world")));
    std::this_thread::sleep_for(std::chrono::seconds(1)); // sleep 1 s
} // publish a string every second
subscriber:
void topic1(const autobahn::wamp_event& event)
{
    try
    {
        auto s = event.argument<std::string>(0);
        std::cerr << s << std::endl;
        std::this_thread::sleep_for(std::chrono::seconds(2)); // needs 2 s to finish the job
    }
    catch (std::exception& e)
    {
        std::cerr << e.what() << std::endl;
    }
}
main()
{
    ...
    session->subscribe("com.pub.test", &topic1);
    ...
} // pub runs faster than the sub can consume
After several hours:
2016-01-7 10:11:32+0000 [Controller 16142] Worker 16145: Process connection gone (A process has ended with a probable error condition: process ended by signal 9.)
dmesg:
Out of memory: Kill process 16145(Crossbar.io Wor) score 4 or sacrifice child
My questions:
Is this normal (using up all the memory and being killed by the OS)?
Or is there a config option that can be set to limit the memory usage?
I found a similar issue; see https://github.com/crossbario/crossbar/issues/48
System info: Ubuntu 14.04 (32-bit), CPython 2.7.6, Crossbar.io 0.11.1, Autobahn 0.10.9

The router is filling up with messages it hasn't been able to deliver to the slow client yet.
This is a "feature" of message-based protocols.
Instead of request -> response,
it's request -> response + response + response + etc.
You're running into "backpressure": the queue of messages to send is filling up faster than the client can receive them.
You should either stop producing or drop messages. Do you need all the messages, or just the latest?
uWebSockets has some good documentation on backpressure.
There is also the "Observable" pattern (similar to Promises) that can help; RxJS is for JavaScript, and RxCpp is a comparable library for C++. It's like a streaming promise library.

Related

Receiving large binary data over Boost::Beast websocket

I am trying to receive a large amount of data using a boost::beast::websocket, fed by another boost::beast::websocket. Normally this data is sent to a connected browser, but I'd like to set up a purely C++ unit test validating certain components of the traffic. I set auto-fragmentation to true on the sender with a max size of 1 MB, but after a few messages the receiver spits out:
Read 258028 bytes of binary
Read 1547176 bytes of binary
Read 168188 bytes of binary
"Failed read: The WebSocket message exceeded the locally configured limit"
Now, I should have no expectation that my possibly poorly architected unit test will exhibit the same characteristics as a fully developed and well-supported browser, and it does not: the browser has no issue reading 25 MB messages over the websocket, while my boost::beast::websocket hits a limit.
So before I go down a rabbit hole, I'd like to see if anyone has any thoughts on this. My read section looks like this:
void on_read(boost::system::error_code ec, std::size_t bytes_transferred)
{
    boost::ignore_unused(bytes_transferred);
    if (ec)
    {
        m_log.error("Failed read: " + ec.message());
        // Stop the websocket
        stop();
        return;
    }
    std::string data(boost::beast::buffers_to_string(m_buffer.data()));
    // Yes I know this looks dangerous. The sender always sends as binary but occasionally sends JSON
    if (data.at(0) == '{')
        m_log.debug("Got message: " + data);
    else
        m_log.debug("Read " + utility::to_string(boost::beast::buffer_bytes(m_buffer.data())) + " of binary data");
    // Do the things with the incoming data
    for (auto&& callback : m_read_callbacks)
        callback(data);
    // Toss the data
    m_buffer.consume(bytes_transferred);
    // Wait for some more data
    m_websocket.async_read(
        m_buffer,
        std::bind(
            &WebsocketClient::on_read,
            shared_from_this(),
            std::placeholders::_1,
            std::placeholders::_2));
}
I saw in a separate example that instead of doing an async read, you can do a for/while loop reading some data until the message is done (https://www.boost.org/doc/libs/1_67_0/libs/beast/doc/html/beast/using_websocket/send_and_receive_messages.html). Would this be the right approach for an always open websocket that could send some pretty massive messages? Would I have to send some indicator to the client that the message is indeed done? And would I run into the exceeded buffer limit issue using this approach?
If your use pattern is fixed:
std::string data(boost::beast::buffers_to_string(m_buffer.data()));
And then, in particular
callback(data);
Then there is no point in reading block-wise, since you will be allocating the same memory anyway. Instead, you can raise the "locally configured limit":
ws.read_message_max(20ull << 20); // sets the limit to 20 MiB
The default value is 16 MiB (as of Boost 1.75).
Side Note
You can probably also use ws.got_binary() to detect whether the last message received was binary or not.
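Putting both together, a minimal sketch assuming a plain websocket::stream over a TCP socket (read_message_max, got_binary, buffer_bytes, and buffers_to_string are the real Beast names; the surrounding function names are illustrative):

#include <boost/asio/ip/tcp.hpp>
#include <boost/beast/core.hpp>
#include <boost/beast/websocket.hpp>
#include <cstddef>
#include <string>

namespace beast = boost::beast;
namespace websocket = beast::websocket;
using tcp = boost::asio::ip::tcp;

void configure(websocket::stream<tcp::socket>& ws)
{
    // Raise the per-message cap from the 16 MiB default to 32 MiB.
    ws.read_message_max(32ull << 20);
}

void on_message(websocket::stream<tcp::socket>& ws, beast::flat_buffer& buffer)
{
    if (ws.got_binary())
    {
        // Binary frame: no need to peek at the first byte for '{'.
        std::size_t n = beast::buffer_bytes(buffer.data());
        // ... hand the bytes to the binary consumer ...
        (void)n;
    }
    else
    {
        // Text frame: the occasional JSON message from the question.
        std::string data = beast::buffers_to_string(buffer.data());
        // ... parse the JSON ...
        (void)data;
    }
    buffer.consume(buffer.size());
}

Checking got_binary() also removes the need to peek at data.at(0) for '{'.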

Concurrent request processing with Boost Beast

I'm referring to this sample program from the Beast repository: https://www.boost.org/doc/libs/1_67_0/libs/beast/example/http/server/fast/http_server_fast.cpp
I've made some changes to the code to check the ability to process multiple requests simultaneously.
boost::asio::io_context ioc{1};
tcp::acceptor acceptor{ioc, {address, port}};
std::list<http_worker> workers;
for (int i = 0; i < 10; ++i)
{
    workers.emplace_back(acceptor, doc_root);
    workers.back().start();
}
ioc.run();
My understanding of the above is that I will now have 10 worker objects to run I/O, i.e. handle incoming connections.
So, my first question: is the above understanding correct?
Assuming that the above is correct, I've made some changes to the lambda (handler) passed to the tcp::acceptor:
void accept()
{
    // Clean up any previous connection.
    boost::beast::error_code ec;
    socket_.close(ec);
    buffer_.consume(buffer_.size());
    acceptor_.async_accept(
        socket_,
        [this](boost::beast::error_code ec)
        {
            if (ec)
            {
                accept();
            }
            else
            {
                boost::system::error_code ec2;
                boost::asio::ip::tcp::endpoint endpoint = socket_.remote_endpoint(ec2);
                // Request must be fully processed within 60 seconds.
                request_deadline_.expires_after(
                    std::chrono::seconds(60));
                std::cerr << "Remote Endpoint address: " << endpoint.address() << " port: " << endpoint.port() << "\n";
                read_request();
            }
        });
}
And also in process_request():
void process_request(http::request<request_body_t, http::basic_fields<alloc_t>> const& req)
{
    switch (req.method())
    {
    case http::verb::get:
        std::cerr << "Simulate processing\n";
        std::this_thread::sleep_for(std::chrono::seconds(30));
        send_file(req.target());
        break;
    default:
        // We return responses indicating an error if
        // we do not recognize the request method.
        send_bad_response(
            http::status::bad_request,
            "Invalid request-method '" + req.method_string().to_string() + "'\r\n");
        break;
    }
}
And here's my problem: if I send 2 simultaneous GET requests to my server, they are processed sequentially. I know this because the 2nd "Simulate processing" statement is printed ~30 seconds after the previous one, which means that execution blocks on the first request.
I've tried to read the documentation of boost::asio to better understand this, but to no avail.
The documentation for acceptor::async_accept says:
Regardless of whether the asynchronous operation completes immediately or not, the handler will not be invoked from within this function. Invocation of the handler will be performed in a manner equivalent to using boost::asio::io_service::post().
And the documentation for boost::asio::io_service::post() says:
The io_service guarantees that the handler will only be called in a thread in which the run(), run_one(), poll() or poll_one() member functions is currently being invoked.
So, if 10 workers are in the run() state, then why would the two requests get queued?
And also, is there a way to workaround this behavior without adapting to a different example? (e.g. https://www.boost.org/doc/libs/1_67_0/libs/beast/example/http/server/async/http_server_async.cpp)
io_context does not create threads internally to execute tasks; rather, it uses the threads that call io_context::run explicitly. In the example, io_context::run is called from just one thread (the main thread), so you have just one thread for executing tasks. That thread gets blocked in the sleep, and there is no other thread to execute the other tasks.
To make this example work you have to:
Add more threads to the pool (like in the second example you referred to):
size_t const threads_count = 4;
std::vector<std::thread> v;
v.reserve(threads_count - 1);
for (size_t i = 0; i < threads_count - 1; ++i) // add threads_count - 1 threads to the pool
{
    v.emplace_back([&ioc] { ioc.run(); });
}
ioc.run(); // add the main thread to the pool as well
Add synchronization (for example, using a strand like in the second example) wherever it is needed (at least for socket reads and writes), because your application is now multi-threaded; a sketch follows.
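A minimal sketch of the strand part, assuming Boost >= 1.66 (net::strand and bind_executor are the real Asio facilities; the http_worker shape here is illustrative, not the exact class from the example):

#include <boost/asio.hpp>
#include <boost/beast/core.hpp>
#include <boost/beast/http.hpp>

namespace net = boost::asio;
namespace beast = boost::beast;
namespace http = boost::beast::http;
using tcp = net::ip::tcp;

struct http_worker
{
    tcp::socket socket_;
    net::strand<net::io_context::executor_type> strand_;
    beast::flat_buffer buffer_;
    http::request<http::string_body> req_;

    explicit http_worker(net::io_context& ioc)
        : socket_(ioc), strand_(ioc.get_executor()) {}

    void read_request()
    {
        // Every completion handler for this connection goes through the
        // same strand, so they never run concurrently even when several
        // threads are calling ioc.run().
        http::async_read(socket_, buffer_, req_,
            net::bind_executor(strand_,
                [this](beast::error_code ec, std::size_t)
                {
                    if (!ec)
                    {
                        // process_request(req_); // serialized by the strand
                    }
                }));
    }
};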
UPDATE 1
Answering the question: "What is the purpose of the list of workers in the Beast example (the first one referred to) if in fact io_context is only running on one thread?"
Notice that, regardless of the thread count, the IO operations here are asynchronous, meaning http::async_write(socket_, ...) does not block the thread. (And note that I am explaining the original example, not your modified version.) One worker deals with one 'request-response' round-trip. Imagine two clients, client1 and client2. client1 has a poor internet connection (or requests a very big file) and client2 has the opposite conditions. client1 makes a request, then client2 makes a request. If there were just one worker, client2 would have to wait until client1 finished the whole 'request-response' round-trip. But because there is more than one worker, client2 gets its response immediately, without waiting for client1 (keep in mind that IO does not block your single thread). The example is optimized for the situation where the bottleneck is IO, not the actual work. In your modified example you have the opposite situation: the work (30 s) is very expensive compared to the IO. For that case, better to use the second example.
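If the per-request work really is that expensive, another option (not from either linked example, and with all names illustrative) is to keep the IO threads free by posting the blocking work to a boost::asio::thread_pool and sending the response when it completes; a minimal sketch:

#include <boost/asio.hpp>
#include <chrono>
#include <thread>
#include <utility>

namespace net = boost::asio;

net::thread_pool g_workers{4}; // blocking work runs here, not on the IO thread

template <class Request, class SendResponse>
void process_request(Request req, SendResponse send)
{
    net::post(g_workers, [req = std::move(req), send = std::move(send)]() mutable
    {
        std::this_thread::sleep_for(std::chrono::seconds(30)); // the simulated work
        send(std::move(req)); // hop back (e.g. via a strand) to write the response
    });
}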

How to detect cause of Dart VM crash

I have two Dart apps running on Amazon (AWS Ubuntu), which are:
Self-hosted http API
Worker that handles background tasks on a timer
Both apps use PostgreSQL. They were occasionally crashing so, in addition to trying to find the root causes, I also implemented a supervisor script that just detects whether those 2 main apps are running and restarts them as needed.
Now the problem I need to solve is that the supervisor script is crashing, or the VM is crashing. It happens every few days.
I don't think it is a memory leak because if I increase the polling rate from 10s to much more often (1 ns), it correctly shows in the Dart Observatory that it exhausts 30MB and then garbage-collects and starts over at low memory usage, and keeps cycling.
I don't think it's an uncaught exception because the infinite loop is completely enclosed in try/catch.
I'm at a loss for what else to try. Is there a VM dump file that can be examined if the VM really crashed? Is there any other technique to debug the root cause? Is Dart just not stable enough to run apps for days at a time?
This is the main part of the code in the supervisor script:
///never ending function checks the state of the other processes
Future pulse() async {
  while (true) {
    sleep(new Duration(milliseconds: 100)); //DEBUG - was seconds:10
    try {
      //detect restart (as signaled from existence of restart.txt)
      File f_restart = new File('restart.txt');
      if (await f_restart.exists()) {
        log("supervisor: restart detected");
        await f_restart.delete();
        await endBoth();
        sleep(new Duration(seconds: 10));
      }
      //if restarting or either proc crashed, restart it
      bool apiAlive = await isRunning('api_alive.txt', 3);
      if (!apiAlive) await startApi();
      bool workerAlive = await isRunning('worker_alive.txt', 8);
      if (!workerAlive) await startWorker();
      //if it's time to send mail, run that process
      if (utcNow().isAfter(_nextMailUtc)) {
        log("supervisor: starting sendmail");
        Process.start('dart', [rootPath() + '/sendmail.dart'], workingDirectory: rootPath());
        _nextMailUtc = utcNow().add(_mailInterval);
      }
    } catch (ex) {}
  }
}
If you have the Observatory up you can get a crash dump with:
curl localhost:<your observatory port>/_getCrashDump
I'm not totally sure if this is related, but Process.start returns a Future, which I don't believe will be caught by your try/catch if it completes with an error...
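To make that last point concrete, a minimal Dart sketch (matching that answer's language): awaiting the Future keeps a spawn failure inside the try/catch, whereas the bare call in pulse() lets it escape as an unhandled async error. The helper name is illustrative:

import 'dart:io';

Future<void> startSendmail(String root) async {
  try {
    // Awaiting the Future means a failure to spawn the process
    // (missing binary, bad permissions, ...) is caught below.
    await Process.start('dart', ['$root/sendmail.dart'], workingDirectory: root);
  } catch (ex) {
    print('sendmail failed to start: $ex');
  }
}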

Multithreading in C++, receive message from socket

I have studied Java for 8 months but decided to learn some C++ in my spare time.
I'm currently making a multithreaded server in Qt with MinGW. My problem is that when a client connects, I create an instance of Client (which is a class) and pass the socket to the Client constructor.
Then I start a thread in the client object (startClient()) which is supposed to wait for messages, but it doesn't. Btw, startClient is the method that I create a thread from. See code below.
What happens then? When I try to send messages to the server I only get errors, the server won't print out that a new client connected, and for some reason my computer starts working really hard. And Qt Creator gets super slow until I close the server program.
What I am actually trying to achieve is an object which derives from the thread class, but I have heard that it isn't a very good idea to do so in C++.
The listener loop in the server:
for (;;)
{
    if ((sock_CONNECTION = accept(sock_LISTEN, (SOCKADDR*)&ADDRESS, &AddressSize)))
    {
        cout << "\nClient connected" << endl;
        Client client(sock_CONNECTION); // new object and pass the socket
        std::thread t1(&Client::startClient, client); // create thread of the method
        t1.detach();
    }
}
the Client class:
Client::Client(SOCKET socket)
{
    this->socket = socket;
    cout << "hello from clientconstructor ! " << endl;
}

void Client::startClient()
{
    cout << "hello from clientmethod ! " << endl;
    // WHEN I ADD THE CODE BELOW I DON'T GET ANY OUTPUT ON THE CONSOLE!
    // No messages get received either.
    char RecvdData[100] = "";
    int ret;
    for (;;)
    {
        try
        {
            ret = recv(socket, RecvdData, sizeof(RecvdData), 0);
            cout << RecvdData << endl;
        }
        catch (int e)
        {
            cout << "Error sending message to client" << endl;
        }
    }
}
It looks like your Client object is going out of scope after you detach the thread.
if (/* ... */)
{
    Client client(sock_CONNECTION);
    std::thread t1(&Client::startClient, client);
    t1.detach();
} // GOING OUT OF SCOPE HERE
You'll need to create your Client object through a pointer and manage its lifetime, or define it at a higher level where it won't go out of scope. A sketch of the pointer approach follows.
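A minimal sketch of the pointer approach, assuming a stripped-down stand-in for the question's Client class (the std::shared_ptr captured by the thread keeps the object alive until the thread finishes):

#include <chrono>
#include <iostream>
#include <memory>
#include <thread>

using SOCKET = int; // stand-in for the WinSock typedef in the question

struct Client
{
    SOCKET socket;
    explicit Client(SOCKET s) : socket(s) {}
    void startClient() { std::cout << "client thread running" << std::endl; }
};

int main()
{
    SOCKET sock_CONNECTION = 42; // placeholder for the accepted socket

    // The lambda captures the shared_ptr by value, so the Client outlives
    // the accept-loop scope and is destroyed when the thread ends.
    auto client = std::make_shared<Client>(sock_CONNECTION);
    std::thread t1([client] { client->startClient(); });
    t1.detach();

    std::this_thread::sleep_for(std::chrono::milliseconds(100)); // let it run
}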
The fact that you never see any output from the Server likely means that your client is unable to connect to your Server in the first place. Check that you are doing your IP addressing correctly in your connect calls. If that looks good, then maybe there is a firewall blocking the connection. Turn that off or open the necessary ports.
Your connecting client is likely getting an error from connect that it is interpreting as success and then trying to send lots of traffic on an invalid socket as fast as it can, which is why your machine seems to be working hard.
You definitely need to check the return values from accept, connect, read and write more carefully. Also, make sure that you aren't running your Server's accept socket in non-blocking mode. I don't think that you are because you aren't seeing any output, but if you did it would infinitely loop on error spawning tons of threads that would also infinitely loop on errors and likely bring your machine to its knees.
If I misunderstood what is happening and you do actually get a client connection with "Client connected" and "hello from clientmethod ! " output, then it is highly likely that your calls to recv() are failing and you are ignoring the failure, so you are in a tight infinite loop that repeatedly outputs "" as fast as possible.
You also probably want to change your catch block to catch (...) rather than catch (int). I doubt either recv() or cout throws an int. Even so, that catch block won't be invoked when recv fails, because recv doesn't throw any exceptions AFAIK; it reports failure through its return value.
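For completeness, a sketch of the receive loop with the return value actually checked; this is meant to slot into the question's Client::startClient:

// Inside Client::startClient(), replacing the try/catch loop:
char buf[100];
for (;;)
{
    int ret = recv(socket, buf, sizeof(buf) - 1, 0);
    if (ret == 0) // orderly shutdown by the peer
    {
        cout << "client disconnected" << endl;
        break;
    }
    if (ret < 0) // error: inspect WSAGetLastError() (or errno on POSIX)
    {
        cout << "recv failed" << endl;
        break;
    }
    buf[ret] = '\0'; // recv does not null-terminate
    cout << buf << endl;
}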

boost asio "A non-recoverable error occurred during database lookup"

I'm currently stress-testing my server.
Sometimes I get the error "A non-recoverable error occurred during database lookup", coming from error.message().
The error is passed to my handling function through boost::asio::placeholders::error, bound in the async_read call.
I have no idea what this error means, and I am not able to reproduce it on purpose; it only happens sometimes and seems to be random (of course it isn't, but it seems that way).
Has anyone ever gotten this error message, and if so, do you know where it comes from?
EDIT 1
Here's what I found in the Boost sources; the error is:
no_recovery = BOOST_ASIO_NETDB_ERROR(NO_RECOVERY)
But I can't figure out what this means...
EDIT 2
Just so you know everything about my problem, here's the design:
I have only one io_service.
Every time a user connects, an async_read is started, waiting for something to read.
When it reads something, most of the time it does some work on a thread (coming from a pool) and writes something synchronously back to the user (using boost write).
Even though boost 1.37 claims that synchronous write is thread-safe, I'm really worried that the problem comes from this.
If the user sends different messages really quickly, it can happen that async_read and write are called simultaneously; can that do any harm?
EDIT 3
Here's the portion of my code that Dave S asked about:
void TCPConnection::listenForCMD() {
    boost::asio::async_read(m_socket,
        boost::asio::buffer(m_inbound_data, 3),
        boost::asio::transfer_at_least(3),
        boost::bind(&TCPConnection::handle_cmd,
                    shared_from_this(),
                    boost::asio::placeholders::error));
}

void TCPConnection::handle_cmd(const boost::system::error_code& error) {
    if (error) {
        std::cout << "ERROR READING : " << error.message() << std::endl;
        return;
    }
    std::string str1(m_inbound_data);
    std::string str = str1.substr(0, 3);
    std::cout << "COMMAND FUNCTION: " << str << std::endl;
    a_fact func = CommandFactory::getInstance()->getFunction(str);
    if (func == NULL) {
        std::cout << "command doesn't exist: " << str << std::endl;
        return;
    }
    protocol::in::Command::pointer cmd = func(m_socket, client);
    cmd->setCallback(boost::bind(&TCPConnection::command_is_done,
                                 shared_from_this()));
    cmd->parse();
}
m_inbound_data is a char[3].
Once cmd->parse() is done, it will call the callback command_is_done:
void TCPConnection::command_is_done() {
    m_inbound_data[0] = '0';
    m_inbound_data[1] = '0';
    m_inbound_data[2] = '0';
    listenForCMD();
}
The error occurs in handle_cmd when checking for an error on the first line.
As I said before, cmd->parse() parses the command it just got, sometimes launching blocking code in a thread coming from a pool. On that thread it sends data back to the client with a synchronous write.
IMPORTANT THING: The callback command_is_done will always be called before said thread is launched. This means that listenForCMD has already been called by the time the thread may send something back to the client with a synchronous write. Hence my initial worry.
When it reads something, most of the time it does some work on a thread (coming from a pool) and writes something synchronously back to the user (using boost write). Even though boost 1.37 claims that synchronous write is thread-safe, I'm really worried that the problem comes from this.
Emphasis added by me: this is incorrect. A single boost::asio::ip::tcp::socket is not thread-safe; the documentation is very clear:
Thread Safety
Distinct objects: Safe.
Shared objects: Unsafe.
It is also very odd to mix async_read() with a blocking write().
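A minimal sketch of the usual fix, in the question's own style (io_service-era Asio, boost::bind, shared_from_this): queue outgoing messages and funnel everything through a strand, so the read chain and the writes never touch the socket concurrently. The m_strand and m_write_queue members are assumptions, not the asker's code:

// Assumed additions to TCPConnection (sketch only):
//   boost::asio::io_service::strand m_strand;  // constructed with the io_service
//   std::deque<std::string> m_write_queue;

void TCPConnection::deliver(const std::string& msg) {
    // Worker threads call this instead of writing synchronously.
    m_strand.post(boost::bind(&TCPConnection::do_write, shared_from_this(), msg));
}

void TCPConnection::do_write(std::string msg) {
    bool write_in_progress = !m_write_queue.empty();
    m_write_queue.push_back(msg);
    if (!write_in_progress)
        boost::asio::async_write(m_socket,
            boost::asio::buffer(m_write_queue.front()),
            m_strand.wrap(boost::bind(&TCPConnection::handle_write,
                                      shared_from_this(),
                                      boost::asio::placeholders::error)));
}

void TCPConnection::handle_write(const boost::system::error_code& error) {
    if (error) return;
    m_write_queue.pop_front();
    if (!m_write_queue.empty())
        boost::asio::async_write(m_socket,
            boost::asio::buffer(m_write_queue.front()),
            m_strand.wrap(boost::bind(&TCPConnection::handle_write,
                                      shared_from_this(),
                                      boost::asio::placeholders::error)));
}

With this shape the worker threads never call write() themselves; they only post to the strand, which serializes all access to m_socket.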