I have a Producer running on the main thread and a Consumer running on its own thread (std::thread). I have a simple program that sends a message using the Producer and then puts the main thread to sleep before trying to send another message.
Whenever my main thread goes to sleep, the program just exits. No exception, nothing. The same thing happens when I try to properly stop and delete my Consumer/Producer. Clearly I'm doing something wrong, but I cannot tell what, since I am not getting any kind of error out of my program. The last log message I see is the one I print right before putting the main thread to sleep.
I've put try-catch blocks inside main and inside my Consumer thread. I've also called std::set_terminate and added logging in there. When my program exits, neither the try-catch blocks nor the terminate handler catch anything.
Any suggestions?
UPDATE #1 [Source]
As Sid S pointed out I'm missing the obvious source.
main.cc
int main(int argc, char** argv) {
std::cout << "% Main started." << std::endl;
std::set_terminate([](){
std::cerr << "% Terminate occurred in main." << std::endl;
abort();
});
try {
using com::anya::core::networking::KafkaMessenger;
using com::anya::core::common::MessengerCode;
KafkaMessenger messenger;
auto promise = std::promise<bool>();
auto future = promise.get_future();
messenger.Connect([&promise](MessengerCode code, std::string& message) {
promise.set_value(true);
});
future.get();
std::cout << "% Main connection successful." << std::endl;
// Produce 5 messages 5 seconds apart.
int number_of_messages_sent = 0;
while (number_of_messages_sent < 5) {
std::stringstream message;
message << "message-" << number_of_messages_sent;
auto message_send_promise = std::promise<bool>();
auto message_send_future = message_send_promise.get_future();
messenger.SendMessage(message.str(), [&message_send_promise](MessengerCode code) {
std::cout << "% Main message sent" << std::endl;
message_send_promise.set_value(true);
});
message_send_future.get();
number_of_messages_sent++;
std::cout << "% Main going to sleep for 5 seconds." << std::endl;
std::this_thread::sleep_for(std::chrono::seconds(5));
}
// Disconnect from Kafka and cleanup.
auto disconnect_promise = std::promise<bool>();
auto disconnect_future = disconnect_promise.get_future();
messenger.Disconnect([&disconnect_promise](MessengerCode code, std::string& message) {
disconnect_promise.set_value(true);
});
disconnect_future.get();
std::cout << "% Main disconnect complete." << std::endl;
} catch (std::exception& exception) {
std::cerr << "% Exception caught in main with error: " << exception.what() << std::endl;
exit(1);
}
std::cout << "% Main exited." << std::endl;
exit(0);
}
KafkaMessenger.cc [Consumer Section]
void KafkaMessenger::Connect(std::function<void(MessengerCode , std::string&)> impl) {
assert(!running_.load());
running_.store(true);
// For the sake of brevity I've removed a whole bunch of Kafka configuration setup from the sample code.
RdKafka::ErrorCode consumer_response = consumer_->start(topic_for_consumer, 0, RdKafka::Topic::OFFSET_BEGINNING);
if (consumer_response != RdKafka::ERR_NO_ERROR) {
running_.store(false);
delete consumer_;
delete producer_;
error = RdKafka::err2str(consumer_response);
impl(MessengerCode::CONNECT_FAILED, error);
}
auto consumer_thread_started_promise = std::promise<bool>();
auto consumer_thread_started_future = consumer_thread_started_promise.get_future();
consumer_thread_ = std::thread([this, &topic_for_consumer, &consumer_thread_started_promise]() {
try {
std::cout << "% Consumer thread started." << std ::endl;
consumer_thread_started_promise.set_value(true);
while (running_.load()) {
RdKafka::Message* message = consumer_->consume(topic_for_consumer, 0, 5000);
switch (message->err()) {
case RdKafka::ERR_NO_ERROR: {
std::string message_string((char*) message->payload());
std::cout << "% Consumer received message: " << message_string << std::endl;
delete message;
break;
}
default:
std::cerr << "% Consumer consumption failed: " << message->errstr() << " error code=" << message->err() << std::endl;
break;
}
}
std::cout << "% Consumer shutting down." << std::endl;
if (consumer_->stop(topic_for_consumer, 0) != RdKafka::ERR_NO_ERROR) {
std::cerr << "% Consumer error while trying to stop." << std::endl;
}
} catch (std::exception& exception) {
std::cerr << "% Caught exception in consumer thread: " << exception.what() << std::endl;
}
});
consumer_thread_started_future.get();
std::string message("Consumer connected");
impl(MessengerCode::CONNECT_SUCCESS, message);
}
KafkaMessenger.cc [Producer Section]
void KafkaMessenger::SendMessage(std::string message, std::function<void(MessengerCode)> impl) {
assert(running_.load());
std::cout << "% Producer sending message." << std::endl;
RdKafka::ErrorCode producer_response = producer_->produce(
producer_topic_,
RdKafka::Topic::PARTITION_UA,
RdKafka::Producer::RK_MSG_COPY,
static_cast<void*>(&message), message.length(), nullptr, nullptr);
switch (producer_response) {
case RdKafka::ERR_NO_ERROR: {
std::cout << "% Producer Successfully sent (" << message.length() << " bytes)" << std::endl;
impl(MessengerCode::MESSAGE_SEND_SUCCESS);
break;
}
case RdKafka::ERR__QUEUE_FULL: {
std::cerr << "% Sending message failed: " << RdKafka::err2str(producer_response) << std::endl;
impl(MessengerCode::MESSAGE_SEND_FAILED);
break;
}
case RdKafka::ERR__UNKNOWN_PARTITION: {
std::cerr << "% Sending message failed: " << RdKafka::err2str(producer_response) << std::endl;
impl(MessengerCode::MESSAGE_SEND_FAILED);
break;
}
case RdKafka::ERR__UNKNOWN_TOPIC: {
std::cerr << "% Sending message failed: " << RdKafka::err2str(producer_response) << std::endl;
impl(MessengerCode::MESSAGE_SEND_FAILED);
break;
}
default: {
std::cerr << "% Sending message failed: " << RdKafka::err2str(producer_response) << std::endl;
impl(MessengerCode::MESSAGE_SEND_FAILED);
break;
}
}
}
Output
When I run the main method this is the output that I see in the console.
% Main started.
% Consumer thread started.
% Main connection successful.
% Producer sending message.
% Producer Successfully sent (9 bytes)
% Main message sent
% Main going to sleep for 5 seconds.
% Consumer received message: message-
After closer examination I do not think that the sleep is the cause of this because when I remove the sleep this still happens. As you can see in the last log line the Consumer prints the message that it received with the last character truncated. The payload should read message-0. So something somewhere is dying.
UPDATE #2 [Stack Trace]
I came across this old but very useful post about catching signals and printing out the stack. I implemented this solution and now I can see more information about where things are crashing.
Error: signal 11:
0 main 0x00000001012e4eec _ZN3com4anya4core10networking7handlerEi + 28
1 libsystem_platform.dylib 0x00007fff60511f5a _sigtramp + 26
2 ??? 0x0000000000000000 0x0 + 0
3 main 0x00000001012f2866 rd_kafka_poll_cb + 838
4 main 0x0000000101315fee rd_kafka_q_serve + 590
5 main 0x00000001012f5d46 rd_kafka_flush + 182
6 main 0x00000001012e7f1a _ZN3com4anya4core10networking14KafkaMessenger10DisconnectENSt3__18functionIFvNS1_6common13MessengerCodeENS4_12basic_stringIcNS4_11char_traitsIcEENS4_9allocatorIcEEEEEEE + 218
7 main 0x00000001012dbc45 main + 3221
8 libdyld.dylib 0x00007fff60290115 start + 1
9 ??? 0x0000000000000001 0x0 + 1
As part of my shutdown method I call producer_->flush(1000) and this causes the resulting stack trace. If I remove it then the shutdown is fine. Clearly I am misconfiguring something that is then causing this seg-fault when I attempt to flush.
UPDATE #3 [Solution]
So turns out that my classes that handled logging of Kafka events and delivery reports were scoped to a method. This was a problem because the librdkafka library takes these by reference so when my main runner method exited and cleanup commenced these objects disappeared. I scoped the loggers to the class level and this fixed the crash.
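The lifetime problem described above can be sketched without librdkafka. Everything below is hypothetical: EventSink, FakeClient, and Messenger are stand-ins invented for illustration, not librdkafka types. The sketch only shows that an object registered by pointer or reference must outlive the client that calls it, which holding it as a class member ensures.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a callback object that a client library
// invokes later (for example while flushing).
struct EventSink {
    std::vector<std::string> events;
    void OnEvent(const std::string& what) { events.push_back(what); }
};

// Hypothetical stand-in for a client that keeps only a non-owning pointer
// to the sink, so the caller is responsible for the sink's lifetime.
class FakeClient {
public:
    explicit FakeClient(EventSink* sink) : sink_(sink) { assert(sink != nullptr); }
    void Produce(const std::string& payload) { pending_.push_back(payload); }
    // Delivery reports fire here, possibly long after Produce() returned.
    void Flush() {
        for (const auto& p : pending_) sink_->OnEvent("delivered:" + p);
        pending_.clear();
    }
private:
    EventSink* sink_;
    std::vector<std::string> pending_;
};

// The sink is a class member declared before the client, so it is
// constructed first and destroyed last. It is therefore still alive
// whenever Flush() runs, including during shutdown.
class Messenger {
public:
    Messenger() : client_(&sink_) {}
    void Send(const std::string& payload) { client_.Produce(payload); }
    void Disconnect() { client_.Flush(); }
    const std::vector<std::string>& Events() const { return sink_.events; }
private:
    EventSink sink_;     // must outlive client_
    FakeClient client_;
};
```

Had the sink been a local variable in the method that created the client, the pointer held by the client would dangle by the time Disconnect() called Flush().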
Kafka message payloads are just binary data. Unless you send a string with a trailing nul byte, the payload will not include one. This causes your std::string constructor to read into adjacent memory looking for a nul, possibly accessing unmapped memory, which will crash your application or at least garble your terminal.
Use the message length in conjunction with the payload to construct a std::string that is limited to the actual number of bytes. It will still not be safe to print, but it is a start:
std::string message_string((char*) message->payload(), message->len());
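As a minimal, librdkafka-free sketch of the same idea: the helper PayloadToString below is a hypothetical name, and it takes the raw pointer and length that payload() and len() would supply. It copies exactly len bytes and never searches for a terminating nul.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: build a std::string from a binary payload using an
// explicit length instead of relying on a terminating nul byte.
std::string PayloadToString(const void* payload, std::size_t len) {
    if (payload == nullptr || len == 0) return std::string();
    return std::string(static_cast<const char*>(payload), len);
}
```

Embedded nul bytes are preserved rather than treated as terminators, which is why the resulting string may still contain bytes that are not safe to print as-is.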
I am currently trying to implement a process pool that the parent process can communicate with. No child process should exit until the parent tells it to (likely using a signal). A couple of questions have come up so far, and I would be happy to get some input on my MWE:
#include <iostream>
#include <boost/thread.hpp>
#include <boost/process/async_pipe.hpp>
#include <boost/asio.hpp>
#include <boost/array.hpp>
static const std::size_t process_count = 3;
static void start_reading(boost::process::async_pipe& ap)
{
static boost::array<char, 256> buf;
ap.async_read_some(boost::asio::buffer(buf.data(), buf.size()), [](const boost::system::error_code& error, std::size_t bytes_transferred)
{
if(!error)
{
std::cout << "received " << bytes_transferred << " from pid " << getpid() << " " << buf[0] << "...." << std::endl;
// perform some heavy computations here..
}
});
}
static void start_writing(boost::process::async_pipe& ap)
{
boost::array<char, 256> buf;
buf.fill('A');
ap.async_write_some(boost::asio::buffer(buf.data(), buf.size()), [&ap](const boost::system::error_code& error, std::size_t bytes_transferred)
{
if(!error)
{
std::cout << "parent " << getpid() << " sent " << bytes_transferred << " to [" << ap.native_source() << "," << ap.native_sink() << "]" << std::endl;
}
});
}
int main()
{
try
{
boost::asio::io_service io_context;
// prevent the associated executor from stopping
boost::asio::executor_work_guard<boost::asio::io_context::executor_type> guard = boost::asio::make_work_guard(io_context);
pid_t main_process = getpid();
std::cout << "before forks " << main_process << std::endl;
std::vector<boost::process::async_pipe> pipes;
pipes.reserve(process_count);
for(std::size_t i = 0; i < process_count; i++)
{
pipes.emplace_back(io_context);
io_context.notify_fork(boost::asio::io_service::fork_prepare);
pid_t pid = fork();
if(pid == 0)
{
io_context.notify_fork(boost::asio::io_service::fork_child);
// perform some costly initialization here...
boost::process::async_pipe& ap = pipes[i];
std::cout << "child " << getpid() << " listening to [" << ap.native_source() << "," << ap.native_sink() << "]" << std::endl;
start_reading(ap);
io_context.run();
}
else if(pid > 0)
{
io_context.notify_fork(boost::asio::io_service::fork_parent);
}
else
{
std::cerr << "fork() failed" << std::endl;
}
}
// only parent gets there
start_writing(pipes[0]);
start_writing(pipes[0]);
start_writing(pipes[1]);
start_writing(pipes[2]);
io_context.run();
}
catch(const std::exception& e)
{
std::cerr << e.what() << std::endl;
}
return 1;
}
The program outputs
before forks 15603
child 15611 listening to [8,9]
child 15612 listening to [10,11]
parent 15603 sent 256 to [8,9]
parent 15603 sent 256 to [8,9]
parent 15603 sent 256 to [10,11]
parent 15603 sent 256 to [21,22]
received 256 from pid 15612 A....
received 256 from pid 15611 A....
child 15613 listening to [21,22]
received 256 from pid 15613 A....
My main concern at the moment is how to keep reading data in the worker processes (the children) indefinitely, as long as the process is not already busy. As soon as the worker enters the async_read_some handler, it performs some computations as stated in the comment (which might take a few seconds). While doing this, the process should and will block. Afterwards I want to notify my parent that the worker is ready again and can accept new reads over the pipe. So far I have no good idea how to do this. Notifying the parent from the child is not necessary per se, but the parent needs to keep track of all idle child processes at all times, so it can send new input via the corresponding pipe.
Apart from that there is one thing I didn't get yet:
Notice that boost::array<char, 256> buf; is static in start_reading. If I remove the static modifier I never get into the completion handler of async_read_some, why is that?
EDIT:
Calling start_reading again in the completion routine continues reading, but without the parent process "knowing" about it.
EDIT2:
So far I have figured out one possible way (I guess there are several) that might work. I am not finished with the implementation, but the shared mutex works as expected. Here is some pseudo-code:
process_pool
{
worker get_next_worker()
ScopedLock(memory_mapped_mutex);
free_worker = *available.rbegin()
available.pop_back()
return free_worker;
memory_mapped_vec<worker> available;
};
server::completion_handler_async_connect()
get_next_worker().socket().write(request)
worker::completion_handler_async_read()
// do something very long before locking
ScopedLock(memory_mapped_mutex);
process_pool::available.push_back(self);
Apart from that there is one thing I didn't get yet: Notice that boost::array<char, 256> buf; is static in start_reading. If I remove the static modifier I never get into the completion handler of async_read_some, why is that?
That's because buf is a local variable, and it no longer exists after start_reading exits. However, async_read (or any other async_XXXX call) returns immediately, without waiting for the operation to complete. So if the buffer doesn't persist, you are writing through a dangling reference into unspecified stack space, leading to Undefined Behaviour.
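One common way to keep a buffer alive without making it static is to allocate it on the heap and let the queued operation hold a std::shared_ptr to it. The sketch below does not use Asio: DeferredLoop is a hypothetical stand-in for the event loop, and StartReading only simulates a read. The same capture pattern should carry over to a real completion handler.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <queue>
#include <utility>

// Hypothetical stand-in for an event loop: operations are queued now and
// executed later by Run(), much like handlers under io_context::run().
class DeferredLoop {
public:
    void Post(std::function<void()> op) { ops_.push(std::move(op)); }
    void Run() {
        while (!ops_.empty()) {
            auto op = std::move(ops_.front());
            ops_.pop();
            op();
        }
    }
private:
    std::queue<std::function<void()>> ops_;
};

using Buffer = std::array<char, 256>;

// Starts a simulated asynchronous read and returns immediately. The buffer
// lives on the heap and the queued operation holds a shared_ptr to it, so
// it stays valid until the operation has run and the handler has seen it.
void StartReading(DeferredLoop& loop,
                  std::function<void(const Buffer&, std::size_t)> handler) {
    auto buf = std::make_shared<Buffer>();
    loop.Post([buf, handler] {
        buf->fill('A');              // the simulated I/O writes into the buffer later
        handler(*buf, buf->size());  // the buffer is still alive here
    });
}   // a plain local Buffer would already be destroyed at this point
```

Making the buffer a member of an object that outlives the operation is an equally valid alternative; the shared_ptr capture is just the most local change.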
As for communicating back and forth, that is unnecessarily complicated between processes. Is there any reason you can't use multi-threading? That way all workers can simply monitor a shared queue.
Of course, you can set up the same thing with a queue shared between processes (in which case I would advise against doing it via pipes with Asio, and would instead use message_queue).
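A minimal sketch of the multi-threaded variant, using only the standard library: WorkerPool is a hypothetical class written for this illustration, not an existing API. Idle workers block on a shared queue, so the producer never needs to know which worker is free.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal thread pool: every worker waits on the same queue and picks up
// the next job as soon as it becomes idle.
class WorkerPool {
public:
    explicit WorkerPool(std::size_t worker_count) {
        assert(worker_count > 0);
        for (std::size_t i = 0; i < worker_count; ++i) {
            workers_.emplace_back([this] { WorkLoop(); });
        }
    }

    // Lets the workers drain the queue, then joins every worker.
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_all();
        for (auto& w : workers_) w.join();
    }

    void Submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        ready_.notify_one();
    }

private:
    void WorkLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // heavy computation runs outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;  // declared last: started after the rest exists
};
```

With this layout the "which worker is idle" bookkeeping disappears, because whichever thread finishes first simply takes the next job from the queue.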
I am trying to write a class for reading from a specific serial device.
To start the process, I have to send the char '1' and then wait for a response (254 and 255).
Within 10 milliseconds I must send the next command to the device, but this time the command is 5 chars long.
If the command is not sent in time, the device runs into a timeout and sends me 255,255,255,2,4.
So I need reads of different sizes, and the most important thing for me is a timeout for the communication, because otherwise the system will stop working when some values are missing.
Therefore I have tried to write a class using boost::asio::async_read.
It works correctly: I can define the timeout and also the number of bytes to be read. When the device doesn't send the correct number of bytes, the routine returns.
But this only works the first time. When I try it a second time, the device doesn't send me anything. I have tried calling .open again, but that doesn't solve the issue. Removing the close call doesn't solve the issue either; then the routine runs into an error.
Can someone give me a small tip for my issue? Maybe I am too blind to see my problem... Bernd
ConnectionWithTimeout::ConnectionWithTimeout(int timeout_)
: timer_(io_service_, boost::posix_time::milliseconds(timeout_))
, serial_port_(io_service_) {
}
void ConnectionWithTimeout::ReadNumberOfChars(int numberOfCharactersToRead_)
{
buffer_.resize(numberOfCharactersToRead_);
for (int i = 0; i < numberOfCharactersToRead_; ++i) {
std::cout << "Clear Buffer[" << i << "]" << std::endl;
buffer_[i] = 0;
}
timer_.async_wait(boost::bind(&::ConnectionWithTimeout::Stop, this));
//async read from serial port
boost::asio::async_read(serial_port_, boost::asio::buffer(buffer_),
boost::bind(&ConnectionWithTimeout::ReadHandle, this,
boost::asio::placeholders::error));
io_service_.run();
}
void ConnectionWithTimeout::Stop() {
std::cout << "Connection is being closed." << std::endl;
serial_port_.close();
std::cout << "Connection has been closed." << std::endl;
}
void ConnectionWithTimeout::ReadHandle(const boost::system::error_code& ec) {
if (ec) {
std::cout << "The amount of data is to low: " << ec << std::endl;
for (std::vector<char>::iterator it = buffer_.begin();
it != buffer_.end(); ++it)
{
std::cout << int(*it) << std::endl;
}
}
else {
std::cout << "The amount of data is correct: " << ec << std::endl;
for (std::vector<char>::iterator it = buffer_.begin(); it !=
buffer_.end(); ++it)
{
std::cout << int(*it) << std::endl;
}
}
}
I am seeing unusual signal numbers (for example 50, 80 or 117) from the following code when waiting for a child process to terminate. I am only seeing this from one particular child process, and I have no access to the process source code and it only happens some of the time.
I want to know what these unusual values mean, given NSIG == 32, and where I can find some documentation in the headers or man pages?
Note that this code runs in a loop sending progressively more menacing signals until the child terminates.
int status, signal;
if (waitpid(m_procId, &status, WNOHANG) < 0) {
LOGERR << "Failed to wait for process " << name() << ": " <<
strerror(errno) << " (" << errno << ")";
break;
} else if (WIFEXITED(status)) {
m_exitCode = WEXITSTATUS(status);
terminated = true;
LOGINF << "Process " << name() << " terminated with exit code " << m_exitCode;
} else if (WIFSIGNALED(status)) {
signal = WTERMSIG(status); // !!! signal is sometimes 50, 80 or 117 !!!
terminated = true;
LOGINF << "Process " << name() << " terminated by signal " << signal;
} else {
LOGWRN << "Process " << name() << " changed state but did not terminate. status=0x" <<
hex << status;
}
This is running under OSX 10.8.4, but I have also seen it in 10.9 GM seed.
EDIT Modifying the code as below makes the code more robust, however sometimes the child process gets orphaned as I guess the loop doesn't do enough to kill the child process.
else if (WIFSIGNALED(status)) {
signal = WTERMSIG(status);
if (signal < NSIG) {
terminated = true;
LOGINF << "Process " << name() << " terminated by signal " << signal;
} else {
LOGWRN << "Process " << name() << " produced unusual signal " << signal
<< "; assuming it's not terminated";
}
}
Note this code is part of the Process::unload() method of this class.
From the OS X manpage for waitpid: when specifying WNOHANG, you should check for a return of 0:
When the WNOHANG option is specified and no processes wish to report status, wait4() returns a process
id of 0.
The waitpid() call is identical to wait4() with an rusage value of zero. The older wait3() call is the
same as wait4() with a pid value of -1.
The code posted does not check for this, which suggests to me that the value of status is likely junk (the value of the int is never initialized). This could cause what you are seeing.
EDIT: status is indeed only set when waitpid returns > 0.
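The check described above might look like the following sketch. PollChild, ChildState, and ChildPoll are hypothetical names introduced here, not part of the question's code. The status word is inspected only when waitpid() returns the child's pid.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum class ChildState { StillRunning, Exited, Signaled, Other, WaitFailed };

struct ChildPoll {
    ChildState state;
    int detail;  // exit code, signal number, or raw status, depending on state
};

// Non-blocking check of one child. With WNOHANG, a return value of 0 means
// "no state change yet", and status must not be interpreted in that case.
ChildPoll PollChild(pid_t pid) {
    assert(pid > 0);  // pid <= 0 would select a group or any child instead
    int status = 0;   // initialized, but only meaningful when ret > 0
    const pid_t ret = waitpid(pid, &status, WNOHANG);
    if (ret < 0)  return {ChildState::WaitFailed, 0};
    if (ret == 0) return {ChildState::StillRunning, 0};
    if (WIFEXITED(status))   return {ChildState::Exited, WEXITSTATUS(status)};
    if (WIFSIGNALED(status)) return {ChildState::Signaled, WTERMSIG(status)};
    return {ChildState::Other, status};
}
```

A loop that escalates signals can then keep sending them while PollChild reports StillRunning, and stop as soon as it reports Exited or Signaled.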
I am sending
std::string cmdStr = "setxkbmap us";
int res = system( cmdStr.c_str() );
and the result is
res: 65280
What can be the problem?
That value indicates that the child process exited normally with a value of 255.
This could happen if:
/bin/sh couldn't find setxkbmap. (note: I might be wrong on this one. On my PC, /bin/sh returns 127 in that case.)
setxkbmap couldn't open the X server at $DISPLAY, including if DISPLAY is unset
I'm sure that there are many other possibilities. Check stdout for error messages.
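To see where the 255 comes from: 65280 is 0xFF00, and on Linux the exit code sits in the second-lowest byte of the status word while the low bits describe how the process ended. The helper name ExitCodeOf below is hypothetical; it only wraps the standard macros from sys/wait.h.

```cpp
#include <cassert>
#include <sys/wait.h>

// Decodes a status word as returned by system() or filled in by waitpid().
// Returns the exit code for a normal exit, or -1 otherwise.
int ExitCodeOf(int status) {
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Passing 65280 to this helper yields 255, which matches the interpretation given above.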
When interpreting the return value from system on Linux, do this:
#include <sys/wait.h>
int res = system(foo);
if(WIFEXITED(res)) {
std::cout << "Normal exit: " << WEXITSTATUS(res) << "\n";
} else {
if(WIFSIGNALED(res)) {
std::cout << "Killed by signal #" << WTERMSIG(res);
if(WCOREDUMP(res)) {
std::cout << " Core dumped";
}
std::cout << "\n";
} else {
std::cout << "Unknown failure\n";
}
}
I have a blocking task that is performed by the find_the_question() function. However, I do not want the thread executing this function to take more than 10 seconds. If it does take more than 10 seconds, I want to stop that thread and clean up all its resources.
I tried to write code for that, but somehow I am not able to get an interrupt in find_the_question() if the thread takes more than 10 seconds. Could you please tell me what I am doing wrong?
void find_the_question(std::string value)
{
//allocate x resources
try{
//do some process on resources
sleep(14);
//clean resources
}
catch(boost::thread_interrupted const& )
{
//clean resources
std::cout << "Worker thread interrupted" << std::endl;
}
}
int main()
{
boost::posix_time::time_duration timeout = boost::posix_time::milliseconds(10000);
std::cout << "In main" << std::endl;
boost::thread t1(find_the_question, "Can you block me");
t1.interrupt();
if (t1.timed_join(timeout))
{
//finished
std::cout << "Worker thread finished" << std::endl;
}
else
{
//Not finished;
std::cout << "Worker thread not finished" << std::endl;
}
std::cout << "In main end" << std::endl;
}
Output:
If t1 takes more than 10 seconds to complete, I get the following console output.
std::cout << "In main" << std::endl;
std::cout << "Worker thread not finished" << std::endl;
std::cout << "In main end" << std::endl;
whereas I am expecting the following output
std::cout << "In main" << std::endl;
std::cout << "Worker thread interrupted" << std::endl;
std::cout << "Worker thread not finished" << std::endl;
std::cout << "In main end" << std::endl;
Could you please tell me what I am doing wrong?
Thanks in advance
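For boost::thread::interrupt() to work, the thread has to reach an interruption point. The POSIX sleep() call is not one, so you have to use a Boost interruption point such as boost::thread::sleep() or boost::this_thread::sleep() instead.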
A running thread can be interrupted by invoking the interrupt() member
function of the corresponding boost::thread object. When the
interrupted thread next executes one of the specified interruption
points (or if it is currently blocked whilst executing one) with
interruption enabled, then a boost::thread_interrupted exception will
be thrown in the interrupted thread. If not caught, this will cause
the execution of the interrupted thread to terminate. As with any
other exception, the stack will be unwound, and destructors for
objects of automatic storage duration will be executed
Predefined interruption points:
The following functions are interruption points, which will throw
boost::thread_interrupted if interruption is enabled for the current
thread, and interruption is requested for the current thread:
boost::thread::join()
boost::thread::timed_join()
boost::condition_variable::wait()
boost::condition_variable::timed_wait()
boost::condition_variable_any::wait()
boost::condition_variable_any::timed_wait()
boost::thread::sleep()
boost::this_thread::sleep()
boost::this_thread::interruption_point()
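The underlying idea is that interruption is cooperative: the worker only notices the request at points where it checks for one. The sketch below is not Boost's mechanism but a standard-library analogue, with StopSignal as a hypothetical class name. The worker waits on a condition variable instead of calling sleep(), so another thread can wake it early.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Standard-library analogue of an interruptible sleep. This is an
// illustration of the concept, not how Boost.Thread implements it.
class StopSignal {
public:
    void RequestStop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_requested_ = true;
        }
        changed_.notify_all();
    }

    // Sleeps for up to `timeout`. Returns true if a stop was requested
    // (before or during the wait), false if the full timeout elapsed.
    template <class Rep, class Period>
    bool SleepFor(const std::chrono::duration<Rep, Period>& timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return changed_.wait_for(lock, timeout, [this] { return stop_requested_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable changed_;
    bool stop_requested_ = false;
};
```

A worker that calls SleepFor and cleans up when it returns true plays the same role as catching boost::thread_interrupted in the question's code; a worker blocked in plain sleep() would notice neither.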