Unusual signal numbers from WTERMSIG macro after waitpid() - c++

I am seeing unusual signal numbers (for example 50, 80, or 117) from the following code when waiting for a child process to terminate. I only see this from one particular child process, I have no access to its source code, and it only happens some of the time.
What do these unusual values mean, given that NSIG == 32, and where can I find documentation for them in the headers or man pages?
Note that this code runs in a loop sending progressively more menacing signals until the child terminates.
int status, signal;
if (waitpid(m_procId, &status, WNOHANG) < 0) {
    LOGERR << "Failed to wait for process " << name() << ": " <<
        strerror(errno) << " (" << errno << ")";
    break;
} else if (WIFEXITED(status)) {
    m_exitCode = WEXITSTATUS(status);
    terminated = true;
    LOGINF << "Process " << name() << " terminated with exit code " << m_exitCode;
} else if (WIFSIGNALED(status)) {
    signal = WTERMSIG(status); // !!! signal is sometimes 50, 80 or 117 !!!
    terminated = true;
    LOGINF << "Process " << name() << " terminated by signal " << signal;
} else {
    LOGWRN << "Process " << name() << " changed state but did not terminate. status=0x" <<
        hex << status;
}
This is running under OSX 10.8.4, but I have also seen it in 10.9 GM seed.
EDIT: Modifying the code as below makes it more robust; however, sometimes the child process gets orphaned, as I guess the loop does not do enough to kill it.
else if (WIFSIGNALED(status)) {
    signal = WTERMSIG(status);
    if (signal < NSIG) {
        terminated = true;
        LOGINF << "Process " << name() << " terminated by signal " << signal;
    } else {
        LOGWRN << "Process " << name() << " produced unusual signal " << signal
            << "; assuming it's not terminated";
    }
}
Note this code is part of the Process::unload() method of this class.

From the OS X man page for waitpid: when specifying WNOHANG, you should check for a return value of 0:
When the WNOHANG option is specified and no processes wish to report status, wait4() returns a process id of 0.
The waitpid() call is identical to wait4() with an rusage value of zero. The older wait3() call is the same as wait4() with a pid value of -1.
The code posted does not check for this, which suggests to me that the value of status is likely junk (the value of the int is never initialized). This could cause what you are seeing.
EDIT: status is indeed only set when waitpid returns > 0.
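To make that concrete, here is a minimal, self-contained sketch (not the poster's Process class) of a WNOHANG poll that initializes status and treats a return value of 0 as "no state change yet", so the status macros are never applied to an uninitialized value:
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <sys/types.h>
#include <sys/wait.h>

// Returns true once the child identified by pid has terminated.
bool pollChild(pid_t pid) {
    int status = 0;                                   // initialized defensively
    const pid_t ret = waitpid(pid, &status, WNOHANG);
    if (ret < 0) {
        std::fprintf(stderr, "waitpid failed: %s (%d)\n", std::strerror(errno), errno);
        return false;
    }
    if (ret == 0) {
        // WNOHANG and no state change yet: status has NOT been written,
        // so it must not be fed to WIFEXITED()/WIFSIGNALED().
        return false;
    }
    if (WIFEXITED(status)) {
        std::printf("child exited with code %d\n", WEXITSTATUS(status));
        return true;
    }
    if (WIFSIGNALED(status)) {
        std::printf("child terminated by signal %d\n", WTERMSIG(status));
        return true;
    }
    return false;   // stopped/continued etc.: not a termination
}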

Related

QProcess writing to stdin of child process failing

I have a Qt C++ GUI application A (parent) that sometimes starts a second Qt C++ GUI application B (child). Parent 'A' communicates with child 'B' by writing bytes to the stdin of QProcess B and also receiving bytes from the stdout of B.
This code (snippet below) works perfectly on some machines but not on others. The target environment is Windows 10, and all machines tested were running Windows 10.
On the machines where the application is failing, I see the following in the logs:
Critical error: QWindowsPipeWriter::write failed. (The handle is invalid.)
Is there a Windows 10 setting or some other operating system setting that would prevent the parent process from writing to the stdin of the child process?
QProcess* child_pr = new QProcess();
QString program = "C:/Program Files/Child/child_pr.exe";
child_pr->setEnvironment(QProcess::systemEnvironment());
child_pr->setProcessChannelMode(QProcess::MergedChannels); //.. merges output of running process to standard output channel
child_pr->setProgram(program);
connect(child_pr, &QProcess::readyReadStandardOutput, this, &ParentAppClass::updateReceived);
connect(child_pr, SIGNAL(finished(int, QProcess::ExitStatus)), this, SLOT(finishReceived(int, QProcess::ExitStatus)));
QFileInfo fileInfo(program);
child_pr->setWorkingDirectory(fileInfo.absolutePath());
child_pr->start(QProcess::Unbuffered | QProcess::ReadWrite);
bool started = child_pr->waitForStarted();
if (!started) {
    qDebug() << __FUNCTION__ << __LINE__ << " Child not started as a process." << "\n";
}
int counter = 0;
while (false == (receivedAckFlags.value(child_pr->processId()))) { // note: slot method "updateReceived" handles whether ack is received or not and updates "receivedAckFlags"
    child_pr->write(QString("spawned"));
    counter++;
    bool writeSuccess = child_pr->waitForBytesWritten();
    if (!writeSuccess) {
        qDebug() << __FUNCTION__ << __LINE__ << " Writing spawned message timed out or other issue." << "\n";
    }
    qDebug() << __FUNCTION__ << __LINE__ << " Sent spawned message " << counter << " times." << "\n";
    QThread::sleep(1); //.. wait one sec before writing again.
    QApplication::processEvents(); //.. to make sure slots for handling stdin reads are called
}
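One way to get more information out of the failing machines (a debugging suggestion, not a confirmed fix) is to hook QProcess's own error and state reporting before writing; errorOccurred and stateChanged are standard QProcess signals (errorOccurred requires Qt 5.6 or later):
// Debugging aid only: log why the pipe write fails on the affected machines.
connect(child_pr, &QProcess::errorOccurred, this,
        [](QProcess::ProcessError error) {
            qDebug() << "QProcess error:" << error;
        });
connect(child_pr, &QProcess::stateChanged, this,
        [](QProcess::ProcessState state) {
            qDebug() << "QProcess state changed to:" << state;
        });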

librdkafka Program Quits with No Error

I have a Producer running on the main thread and a Consumer running on its own thread (std::thread). I have a simple program that sends a message using the Producer and then puts the main thread to sleep before trying to send another message.
Whenever my main thread goes to sleep, the program just exits. No exception, nothing. The same thing happens when I try to properly stop and delete my Consumer/Producer. Clearly I'm doing something wrong, but I cannot tell what, since I am not getting any kind of error out of my program. The last log message I see is the one I print right before putting the main thread to sleep.
I've put a try-catch inside main and inside my Consumer thread. I've also called std::set_terminate and added logging in there. When my program exits, neither the try-catch nor the terminate handler catches anything.
Any suggestions?
UPDATE #1 [Source]
As Sid S pointed out, I was missing the obvious: the source.
main.cc
int main(int argc, char** argv) {
    std::cout << "% Main started." << std::endl;
    std::set_terminate([]() {
        std::cerr << "% Terminate occurred in main." << std::endl;
        abort();
    });
    try {
        using com::anya::core::networking::KafkaMessenger;
        using com::anya::core::common::MessengerCode;
        KafkaMessenger messenger;
        auto promise = std::promise<bool>();
        auto future = promise.get_future();
        messenger.Connect([&promise](MessengerCode code, std::string& message) {
            promise.set_value(true);
        });
        future.get();
        std::cout << "% Main connection successful." << std::endl;
        // Produce 5 messages 5 seconds apart.
        int number_of_messages_sent = 0;
        while (number_of_messages_sent < 5) {
            std::stringstream message;
            message << "message-" << number_of_messages_sent;
            auto message_send_promise = std::promise<bool>();
            auto message_send_future = message_send_promise.get_future();
            messenger.SendMessage(message.str(), [&message_send_promise](MessengerCode code) {
                std::cout << "% Main message sent" << std::endl;
                message_send_promise.set_value(true);
            });
            message_send_future.get();
            number_of_messages_sent++;
            std::cout << "% Main going to sleep for 5 seconds." << std::endl;
            std::this_thread::sleep_for(std::chrono::seconds(5));
        }
        // Disconnect from Kafka and cleanup.
        auto disconnect_promise = std::promise<bool>();
        auto disconnect_future = disconnect_promise.get_future();
        messenger.Disconnect([&disconnect_promise](MessengerCode code, std::string& message) {
            disconnect_promise.set_value(true);
        });
        disconnect_future.get();
        std::cout << "% Main disconnect complete." << std::endl;
    } catch (std::exception& exception) {
        std::cerr << "% Exception caught in main with error: " << exception.what() << std::endl;
        exit(1);
    }
    std::cout << "% Main exited." << std::endl;
    exit(0);
}
KafkaMessenger.cc [Consumer Section]
void KafkaMessenger::Connect(std::function<void(MessengerCode, std::string&)> impl) {
    assert(!running_.load());
    running_.store(true);
    // For the sake of brevity I've removed a whole bunch of Kafka configuration setup from the sample code.
    RdKafka::ErrorCode consumer_response = consumer_->start(topic_for_consumer, 0, RdKafka::Topic::OFFSET_BEGINNING);
    if (consumer_response != RdKafka::ERR_NO_ERROR) {
        running_.store(false);
        delete consumer_;
        delete producer_;
        error = RdKafka::err2str(consumer_response);
        impl(MessengerCode::CONNECT_FAILED, error);
    }
    auto consumer_thread_started_promise = std::promise<bool>();
    auto consumer_thread_started_future = consumer_thread_started_promise.get_future();
    consumer_thread_ = std::thread([this, &topic_for_consumer, &consumer_thread_started_promise]() {
        try {
            std::cout << "% Consumer thread started." << std::endl;
            consumer_thread_started_promise.set_value(true);
            while (running_.load()) {
                RdKafka::Message* message = consumer_->consume(topic_for_consumer, 0, 5000);
                switch (message->err()) {
                    case RdKafka::ERR_NO_ERROR: {
                        std::string message_string((char*) message->payload());
                        std::cout << "% Consumer received message: " << message_string << std::endl;
                        delete message;
                        break;
                    }
                    default:
                        std::cerr << "% Consumer consumption failed: " << message->errstr() << " error code=" << message->err() << std::endl;
                        break;
                }
            }
            std::cout << "% Consumer shutting down." << std::endl;
            if (consumer_->stop(topic_for_consumer, 0) != RdKafka::ERR_NO_ERROR) {
                std::cerr << "% Consumer error while trying to stop." << std::endl;
            }
        } catch (std::exception& exception) {
            std::cerr << "% Caught exception in consumer thread: " << exception.what() << std::endl;
        }
    });
    consumer_thread_started_future.get();
    std::string message("Consumer connected");
    impl(MessengerCode::CONNECT_SUCCESS, message);
}
KafkaMessenger.cc [Producer Section]
void KafkaMessenger::SendMessage(std::string message, std::function<void(MessengerCode)> impl) {
    assert(running_.load());
    std::cout << "% Producer sending message." << std::endl;
    RdKafka::ErrorCode producer_response = producer_->produce(
        producer_topic_,
        RdKafka::Topic::PARTITION_UA,
        RdKafka::Producer::RK_MSG_COPY,
        static_cast<void*>(&message), message.length(), nullptr, nullptr);
    switch (producer_response) {
        case RdKafka::ERR_NO_ERROR: {
            std::cout << "% Producer Successfully sent (" << message.length() << " bytes)" << std::endl;
            impl(MessengerCode::MESSAGE_SEND_SUCCESS);
            break;
        }
        case RdKafka::ERR__QUEUE_FULL: {
            std::cerr << "% Sending message failed: " << RdKafka::err2str(producer_response) << std::endl;
            impl(MessengerCode::MESSAGE_SEND_FAILED);
            break;
        }
        case RdKafka::ERR__UNKNOWN_PARTITION: {
            std::cerr << "% Sending message failed: " << RdKafka::err2str(producer_response) << std::endl;
            impl(MessengerCode::MESSAGE_SEND_FAILED);
            break;
        }
        case RdKafka::ERR__UNKNOWN_TOPIC: {
            std::cerr << "% Sending message failed: " << RdKafka::err2str(producer_response) << std::endl;
            impl(MessengerCode::MESSAGE_SEND_FAILED);
            break;
        }
        default: {
            std::cerr << "% Sending message failed: " << RdKafka::err2str(producer_response) << std::endl;
            impl(MessengerCode::MESSAGE_SEND_FAILED);
            break;
        }
    }
}
Output
When I run the main method this is the output that I see in the console.
% Main started.
% Consumer thread started.
% Main connection successful.
% Producer sending message.
% Producer Successfully sent (9 bytes)
% Main message sent
% Main going to sleep for 5 seconds.
% Consumer received message: message-
After closer examination, I do not think the sleep is the cause, because this still happens when I remove the sleep. As you can see in the last log line, the Consumer prints the message it received with the last character truncated; the payload should read message-0. So something somewhere is dying.
UPDATE #2 [Stack Trace]
I came across this old but very useful post about catching signals and printing out the stack. I implemented this solution and now I can see more information about where things are crashing.
Error: signal 11:
0 main 0x00000001012e4eec _ZN3com4anya4core10networking7handlerEi + 28
1 libsystem_platform.dylib 0x00007fff60511f5a _sigtramp + 26
2 ??? 0x0000000000000000 0x0 + 0
3 main 0x00000001012f2866 rd_kafka_poll_cb + 838
4 main 0x0000000101315fee rd_kafka_q_serve + 590
5 main 0x00000001012f5d46 rd_kafka_flush + 182
6 main 0x00000001012e7f1a _ZN3com4anya4core10networking14KafkaMessenger10DisconnectENSt3__18functionIFvNS1_6common13MessengerCodeENS4_12basic_stringIcNS4_11char_traitsIcEENS4_9allocatorIcEEEEEEE + 218
7 main 0x00000001012dbc45 main + 3221
8 libdyld.dylib 0x00007fff60290115 start + 1
9 ??? 0x0000000000000001 0x0 + 1
As part of my shutdown method I call producer_->flush(1000), and that is what produces the stack trace above. If I remove it, the shutdown is fine. Clearly I am misconfiguring something that then causes this segfault when I attempt to flush.
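For reference, the handler behind that trace is essentially the classic <execinfo.h> trick from the linked post; a rough sketch of its shape (the handler name in the trace is the question's own, this is only the generic pattern):
#include <csignal>
#include <cstdio>
#include <cstdlib>
#include <execinfo.h>
#include <unistd.h>

// Generic crash handler that prints a backtrace and then exits.
// (Async-signal-safety is ignored here, as it usually is in this debugging trick.)
static void crash_handler(int sig) {
    void* frames[64];
    const int count = backtrace(frames, 64);
    std::fprintf(stderr, "Error: signal %d:\n", sig);
    backtrace_symbols_fd(frames, count, STDERR_FILENO);
    std::exit(1);
}

int main() {
    std::signal(SIGSEGV, crash_handler);
    // ... rest of the program ...
}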
UPDATE #3 [Solution]
So it turns out that my classes that handled logging of Kafka events and delivery reports were scoped to a method. This was a problem because librdkafka takes these by reference, so when my main runner method exited and cleanup commenced, these objects disappeared. I scoped the loggers to the class level and this fixed the crash.
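In code, the difference is roughly the following sketch (hypothetical class names; RdKafka::EventCb and RdKafka::DeliveryReportCb are the librdkafka callback interfaces that are registered by pointer via Conf::set() and must outlive the producer/consumer handles):
#include <iostream>
#include <librdkafka/rdkafkacpp.h>

// Callback objects registered with conf->set("event_cb", ...) / set("dr_cb", ...)
// are held by pointer inside librdkafka, so they must live at least as long as
// the producer and consumer that use them.
class LoggingEventCb : public RdKafka::EventCb {
 public:
  void event_cb(RdKafka::Event& event) override {
    std::cerr << "% Kafka event: " << RdKafka::err2str(event.err()) << std::endl;
  }
};

class LoggingDeliveryCb : public RdKafka::DeliveryReportCb {
 public:
  void dr_cb(RdKafka::Message& message) override {
    std::cout << "% Delivered " << message.len() << " bytes" << std::endl;
  }
};

class KafkaMessengerSketch {
 private:
  // Class scope: these outlive Connect()/Disconnect() and the final flush(),
  // unlike locals declared inside the method that calls conf->set().
  LoggingEventCb event_cb_;
  LoggingDeliveryCb delivery_cb_;
};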
Kafka message payloads are just binary data; unless you send a string with a trailing NUL byte, the payload will not include one. This causes your std::string constructor to read into adjacent memory looking for a NUL, possibly accessing unmapped memory, which will crash your application or at least garble up your terminal.
Use the message length in conjunction with the payload to construct a std::string that is limited to the actual number of bytes. It will still not necessarily be safe to print, but it is a start:
std::string message_string((char*) message->payload(), message->len());
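Applied to the ERR_NO_ERROR case of the consumer loop from UPDATE #1, that would look like:
case RdKafka::ERR_NO_ERROR: {
    std::string message_string(static_cast<const char*>(message->payload()),
                               message->len());
    std::cout << "% Consumer received message: " << message_string << std::endl;
    delete message;
    break;
}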

Linux: fork & execv, wait for child process hangs

I wrote a helper function to start a process using fork() and execv() inspired by this answer. It is used to start e.g. mysqldump to make a database backup.
The code works totally fine in a couple of different locations with different programs.
Now I have hit one configuration where it fails:
It is a call to systemctl to stop a unit. Running systemctl works and the unit is stopped, but in the intermediate process, when wait()ing for the child process, wait() hangs until the timeout process ends.
If I check whether the worker process has finished using kill(), I can tell that it did.
Important: The program does not misbehave or segfault, other than that wait() does not signal the end of the worker process!
Is there anything in my code (see below) that is incorrect that could trigger that behavior?
I've read Threads and fork(): think twice before mixing them but I cannot find anything in there that relates to my problem.
What's strange:
Deep, deep, deep in the program, JSON-RPC is used. If I deactivate the code that uses JSON-RPC, everything works fine!?
Environment:
The program that uses the function is a multi-threaded application. Signals are blocked for all threads. The main thread handles signals via sigtimedwait().
Code (production code in which logging got traded for output via std::cout) with sample main function:
#include <cerrno>     // errno, ESRCH
#include <csignal>    // kill(), SIGKILL
#include <iostream>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
namespace {
    bool checkStatus(const int status) {
        return( WIFEXITED(status) && ( WEXITSTATUS(status) == 0 ) );
    }
}
bool startProcess(const char* const path, const char* const argv[], const unsigned int timeoutInSeconds, pid_t& processId, const int* const fileDescriptor) {
    auto result = true;
    const pid_t intermediatePid = fork();
    if(intermediatePid == 0) {
        // intermediate process
        std::cout << "Intermediate process: Started (" << getpid() << ")." << std::endl;
        const pid_t workerPid = fork();
        if(workerPid == 0) {
            // worker process
            if(fileDescriptor) {
                std::cout << "Worker process: Redirecting file descriptor to stdin." << std::endl;
                const auto dupResult = dup2(*fileDescriptor, STDIN_FILENO);
                if(-1 == dupResult) {
                    std::cout << "Worker process: Duplication of file descriptor failed." << std::endl;
                    _exit(EXIT_FAILURE);
                }
            }
            execv(path, const_cast<char**>(argv));
            std::cout << "Intermediate process: Worker failed!" << std::endl;
            _exit(EXIT_FAILURE);
        } else if(-1 == workerPid) {
            std::cout << "Intermediate process: Starting worker failed!" << std::endl;
            _exit(EXIT_FAILURE);
        }
        const pid_t timeoutPid = fork();
        if(timeoutPid == 0) {
            // timeout process
            std::cout << "Timeout process: Started (" << getpid() << ")." << std::endl;
            sleep(timeoutInSeconds);
            std::cout << "Timeout process: Finished." << std::endl;
            _exit(EXIT_SUCCESS);
        } else if(-1 == timeoutPid) {
            std::cout << "Intermediate process: Starting timeout process failed." << std::endl;
            kill(workerPid, SIGKILL);
            std::cout << "Intermediate process: Finished." << std::endl;
            _exit(EXIT_FAILURE);
        }
        // ---------------------------------------
        // This code is only used for double checking if the worker is still running.
        // The if condition never evaluated to true in my tests.
        const auto killResult = kill(workerPid, 0);
        if((-1 == killResult) && (ESRCH == errno)) {
            std::cout << "Intermediate process: Worker is not running." << std::endl;
        }
        // ---------------------------------------
        std::cout << "Intermediate process: Waiting for child processes." << std::endl;
        int status = -1;
        const pid_t exitedPid = wait(&status);
        // ---------------------------------------
        // This code is only used for double checking if the worker is still running.
        // The if condition evaluates to true in the case of an error.
        const auto killResult2 = kill(workerPid, 0);
        if((-1 == killResult2) && (ESRCH == errno)) {
            std::cout << "Intermediate process: Worker is not running." << std::endl;
        }
        // ---------------------------------------
        std::cout << "Intermediate process: Child process finished. Status: " << status << "." << std::endl;
        if(exitedPid == workerPid) {
            std::cout << "Intermediate process: Killing timeout process." << std::endl;
            kill(timeoutPid, SIGKILL);
        } else {
            std::cout << "Intermediate process: Killing worker process." << std::endl;
            kill(workerPid, SIGKILL);
            std::cout << "Intermediate process: Waiting for worker process to terminate." << std::endl;
            wait(nullptr);
            std::cout << "Intermediate process: Finished." << std::endl;
            _exit(EXIT_FAILURE);
        }
        std::cout << "Intermediate process: Waiting for timeout process to terminate." << std::endl;
        wait(nullptr);
        std::cout << "Intermediate process: Finished." << std::endl;
        _exit(checkStatus(status) ? EXIT_SUCCESS : EXIT_FAILURE);
    } else if(-1 == intermediatePid) {
        // error
        std::cout << "Parent process: Error starting intermediate process!" << std::endl;
        result = false;
    } else {
        // parent process
        std::cout << "Parent process: Intermediate process started. PID: " << intermediatePid << "." << std::endl;
        processId = intermediatePid;
    }
    return(result);
}
bool waitForProcess(const pid_t processId) {
    int status = 0;
    const auto waitResult = waitpid(processId, &status, 0);
    auto result = false;
    if(waitResult == processId) {
        result = checkStatus(status);
    }
    return(result);
}
int main() {
    pid_t pid = 0;
    const char* const path = "/bin/ls";
    const char* argv[] = { "/bin/ls", "--help", nullptr };
    const unsigned int timeoutInS = 5;
    const auto startResult = startProcess(path, argv, timeoutInS, pid, nullptr);
    if(startResult) {
        const auto waitResult = waitForProcess(pid);
        std::cout << "waitForProcess returned " << waitResult << "." << std::endl;
    } else {
        std::cout << "startProcess failed!" << std::endl;
    }
}
Edit
The expected output should contain
Intermediate process: Waiting for child processes.
Intermediate process: Child process finished. Status: 0.
Intermediate process: Killing timeout process.
In the case of error the output looks like this
Intermediate process: Waiting for child processes.
Intermediate process: Child process finished. Status: -1
Intermediate process: Killing worker process.
When you run the sample code you will most likely see the expected output. I cannot reproduce the incorrect result in a simple example.
I found the problem:
Within the mongoose sources (the JSON-RPC implementation uses mongoose), in the function mg_start, I found the following code:
#if !defined(_WIN32) && !defined(__SYMBIAN32__)
// Ignore SIGPIPE signal, so if browser cancels the request, it
// won't kill the whole process.
(void) signal(SIGPIPE, SIG_IGN);
// Also ignoring SIGCHLD to let the OS to reap zombies properly.
(void) signal(SIGCHLD, SIG_IGN);
#endif // !_WIN32
The line
(void) signal(SIGCHLD, SIG_IGN);
causes the behavior that
"if the parent does a wait(), this call will return only when all children have exited, and then returns -1 with errno set to ECHILD."
as mentioned here in the section 5.5 Voodoo: wait and SIGCHLD.
This is also described in the man page for WAIT(2):
ERRORS [...]
ECHILD [...] (This can happen for one's own child if the action for SIGCHLD is set to SIG_IGN. See also the Linux Notes section about threads.)
Stupid on my part not to check the return value correctly.
Before trying
if(exitedPid == workerPid) {
I should have checked that exitedPid is != -1.
If I do so, errno gives me ECHILD. If I had known that in the first place, I would have read the man page and probably found the problem faster...
It is naughty of mongoose to mess with signal handling regardless of what the application wants to do about it. Additionally, mongoose does not revert its changes to the signal handling when it is stopped with mg_stop.
Additional info:
The code that caused this problem was changed in mongoose in September 2013 with this commit.
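A minimal sketch of the two defensive measures that follow from this analysis (illustrative only, using nothing beyond standard POSIX calls): reset SIGCHLD to its default disposition before fork()ing children that will be wait()ed for, and always check wait()'s return value for -1/ECHILD:
#include <cerrno>
#include <csignal>
#include <cstdio>
#include <cstring>
#include <sys/types.h>
#include <sys/wait.h>

// Call this before fork()ing children that will be wait()ed for; if a library
// (here: mongoose) has set SIGCHLD to SIG_IGN, terminated children are reaped
// automatically and wait() fails with ECHILD.
void ensureChildrenAreWaitable() {
    signal(SIGCHLD, SIG_DFL);
}

// Always check wait()'s return value before comparing it to a child PID.
pid_t waitAndReport(int& status) {
    const pid_t exitedPid = wait(&status);
    if (exitedPid == -1) {
        if (errno == ECHILD) {
            std::fprintf(stderr, "wait: no waitable children (is SIGCHLD set to SIG_IGN?)\n");
        } else {
            std::fprintf(stderr, "wait failed: %s\n", std::strerror(errno));
        }
    }
    return exitedPid;
}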
We faced a similar issue in our application: in an intense situation of repeated child-process fork()s, the child process never returned. One can monitor the PID of the child process and, if it does not return within an application-defined threshold, terminate that process by sending a KILL/TERM signal.

odd behavior, where cout makes the program work

I have two threads in C++. One thread, called the alarm thread, runs the function raiseAlarm(), and the other, called the print thread, runs the function printMetrics(). At a fixed interval, raiseAlarm sets an atomic variable to true. When the variable is true, the printMetrics thread, which is spinning on the value of this atomic variable, prints some data. When I run this application, nothing happens, but if I put a cout anywhere in raiseAlarm, everything works fine. Why?
void Client::raiseAlarm()
{
    bool no = false;
    while(!stop.load(std::memory_order_acquire))
    {
        //cout << "about to sleep\n";
        this_thread::sleep_for(std::chrono::seconds(captureInterval));
        while(!alarm.compare_exchange_weak(no, true, std::memory_order_acq_rel))
        {
            no = false;
        }
    }
}

void Client::printMetrics()
{
    bool yes = true;
    while(!stop.load(std::memory_order_acquire))
    {
        while(!alarm.compare_exchange_weak(yes, false, std::memory_order_acq_rel))
        {
            yes = true;
        }
        cout << "Msgs Rcvd: " << metrics.rcv_total.load(std::memory_order_acquire);
        cout << "Msgs Sent: " << metrics.snd_total.load(std::memory_order_acquire);
        cout << "Min latency: " << metrics.min_latency.load(std::memory_order_acquire);
        cout << "Max latency: " << metrics.max_latency.load(std::memory_order_acquire);
        metrics.reset();
    }
}
Just a suggestion because I'm not so savvy with concurrency in C++, but make sure you don't forget to flush your output stream. Either stick a cout << flush; after all of your cout lines or add an << endl to each one (which will automatically flush your stream).
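For example, appending endl to the existing print statements is enough to see the output as soon as it is produced:
cout << "Msgs Rcvd: " << metrics.rcv_total.load(std::memory_order_acquire) << endl;
cout << "Msgs Sent: " << metrics.snd_total.load(std::memory_order_acquire) << endl;
cout << "Min latency: " << metrics.min_latency.load(std::memory_order_acquire) << endl;
cout << "Max latency: " << metrics.max_latency.load(std::memory_order_acquire) << endl;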

setxkbmap returns 65280 when executed from system call

I am sending
std::string cmdStr = "setxkbmap us";
int res = system( cmdStr.c_str() );
and the result is
res: 65280
What could be the problem?
That value indicates that the child process exited normally with an exit status of 255: 65280 is 0xFF00, and the exit status is stored in the high byte of the wait-style value that system() returns, so WEXITSTATUS(65280) == 255.
This could happen if:
/bin/sh couldn't find setxkbmap. (note: I might be wrong on this one. On my PC, /bin/sh returns 127 in that case.)
setxkbmap couldn't open the X server at $DISPLAY, including if DISPLAY is unset
I'm sure that there are many other possibilities. Check stdout for error messages.
When interpreting the return value from system on Linux, do this:
#include <cstdlib>   // system()
#include <iostream>
#include <sys/wait.h>

int res = system(foo);
if(WIFEXITED(res)) {
    std::cout << "Normal exit: " << WEXITSTATUS(res) << "\n";
} else {
    if(WIFSIGNALED(res)) {
        std::cout << "Killed by signal #" << WTERMSIG(res); // note: res, not an undeclared 'status'
        if(WCOREDUMP(res)) {
            std::cout << " Core dumped";
        }
        std::cout << "\n";
    } else {
        std::cout << "Unknown failure\n";
    }
}