It seems like boost::process::system is leaking fds:
Let's say I have this simple code to flush iptables config every 3 seconds (just an example):
#include <boost/process.hpp>
#include <chrono>
#include <thread>

int main(void)
{
    while (true)
    {
        std::this_thread::sleep_for(std::chrono::seconds(3));
        boost::process::system(boost::process::search_path("iptables"), "-F");
    }
    return 0;
}
If I watch the count of open file descriptors with ls /proc/PID/fd | wc -l, I can see that it increases by one every 3 seconds. Eventually, when it reaches 1024, the program aborts, because the system call throws an exception whose what() says there are too many open files!
How can I avoid this fd leak? I'm using boost 1.69.
EDIT:
Replacing boost::process::system with boost::process::child does not seem to help; the child appears to leak fds as well, whether or not it gets detached.
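Roughly what the child-based attempt looked like (a sketch; the fd count grows just the same):

#include <boost/process.hpp>
#include <chrono>
#include <thread>

int main()
{
    while (true)
    {
        std::this_thread::sleep_for(std::chrono::seconds(3));
        boost::process::child c(boost::process::search_path("iptables"), "-F");
        c.wait();      // reaping the child does not stop the fd growth
        // c.detach(); // detaching instead makes no difference either
    }
}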
EDIT 2:
Valgrind log with --track-fds=yes:
https://termbin.com/d6ud
The problem seems to be a bug in this specific version of boost (1.69), not in the posted code itself, so upgrading boost (or patching the bug) solves the problem.
The bug report can be found here: https://github.com/boostorg/process/issues/62
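If upgrading isn't an option right away, one workaround is to bypass boost::process for this one call and spawn the command with plain POSIX primitives. A minimal sketch (error handling mostly omitted):

#include <sys/wait.h>
#include <unistd.h>

// Spawn "iptables -F" without boost::process; fork/exec leaves no extra fds behind.
int run_iptables_flush()
{
    pid_t pid = fork();
    if (pid == 0)
    {
        execlp("iptables", "iptables", "-F", (char *)0); // execlp searches PATH
        _exit(127);                                      // only reached if exec failed
    }
    int status = 0;
    waitpid(pid, &status, 0); // reap the child so it doesn't linger as a zombie
    return status;
}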
I am throwing an exception throw std::exception("dummy") (as a test) which is not being caught anywhere.
Without ProcDump attached this immediately crashes the process as it should.
When I attach ProcDump with -e to a debug build, ProcDump properly detects the unhandled exception, creates a crash dump, and exits.
But the program continues executing as if the exception had never been thrown.
I could manually crash the process after ProcDump exits, but I really don't like the idea that code continues to run after a crash that is supposed to be fatal, even if it is just for a few milliseconds.
What causes this? How can I make sure that my program crashes (and the crash dump properly represents the point of the crash)? Is this an issue with ProcDump or with how I am using it?
Here is a minimal example to reproduce this:
#include <exception>
#include <iostream>

int main() {
    char c;
    std::cin >> c;
    if (c == 'e')
        throw std::exception("dummy"); // MSVC extension: std::exception ctor taking a message
    std::cout << "clean exit" << std::endl;
    return 0;
}
I've tried it with Microsoft's clang-cl and with MSVC. I've tried every ProcDump switch even vaguely relevant to my issue, in all possible combinations, with multiple binaries.
I don't have a good answer, unfortunately. It looks like there is a bug in procdump. You may report it on the Sysinternals forum or contact Mark Russinovich (@markrussinovich) or Andrew Richards (@arichardmsft). I can confirm that it happens when you attach to the process, for example, procdump -e prog. It behaves as expected when you run the app under procdump (procdump.exe -e -x . prog.exe). Procdump runs as a debugger attached to the process, so it might 'swallow' exceptions. Of course, it should not, but the debugging API allows it to do so.
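To illustrate the mechanism: a debugger that answers a second-chance exception with DBG_CONTINUE marks it as handled, and the debuggee simply resumes. A schematic Win32 debugger loop (not procdump's actual code, just the idea):

#include <windows.h>

// Schematic debugger loop showing how an exception can be 'swallowed'.
void debug_loop()
{
    DEBUG_EVENT ev;
    while (WaitForDebugEvent(&ev, INFINITE))
    {
        DWORD reply = DBG_EXCEPTION_NOT_HANDLED; // default: hand it back to the app
        if (ev.dwDebugEventCode == EXCEPTION_DEBUG_EVENT &&
            !ev.u.Exception.dwFirstChance)
        {
            // ... write the crash dump here ...
            reply = DBG_CONTINUE; // replying DBG_CONTINUE makes the debuggee
                                  // resume as if the exception were handled
        }
        ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, reply);
    }
}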
As an alternative, before procdump gets fixed, you may consider using minidumper (I contributed to it in the past). It does not have as many command-line options as procdump, but the -e option works as expected, for example, MiniDumper.exe -ma -e2 12824.
Internally, minidumper has a very similar design to procdump and also implements a debugger engine. Here is the line handling the exception event:
https://github.com/goldshtn/minidumper/blob/master/MiniDumper/Debugger.cs#L106.
Try using the -k option on ProcDump.
I'm observing some strange behavior when I use a file_sink (in boost::iostreams) and then fork() a child process.
The child continues in the same codebase, i.e., there is no exec() call, because this is done as part of daemonizing the process. My full code fully daemonizes the process, of course, but I have omitted the steps that are unnecessary for reproducing the behavior.
The following code is a simplified example that demonstrates the behavior:
#include <boost/iostreams/device/file.hpp>
#include <boost/iostreams/stream_buffer.hpp>
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <unistd.h>

using namespace std;
namespace io = boost::iostreams;

void daemonize(std::ostream& log);

int main(int argc, char** argv)
{
    io::stream_buffer<io::file_sink> logbuf;
    std::ostream filelog(&logbuf);
    //std::ofstream filelog;

    // Step 1: open log
    if (argc > 1)
    {
        //filelog.open(argv[1]);
        logbuf.open(io::file_sink(argv[1]));
        daemonize(filelog);
    }
    else
        daemonize(std::cerr);
    return EXIT_SUCCESS;
}
void daemonize(std::ostream& log)
{
    log << "Log opened." << endl;

    // Step 2: fork - parent stops, child continues
    log.flush();
    pid_t pid = fork(); // error checking omitted
    if (pid > 0)
    {
        log << "Parent exiting." << endl;
        exit(EXIT_SUCCESS);
    }
    assert(0 == pid); // child continues

    // Step 3: write to log
    sleep(1); // give parent process time to exit
    log << "Hello World!" << endl;
}
If I run this with no argument (e.g., ./a.out), so that it logs to stderr, then I get the expected output:
Log opened.
Parent exiting.
Hello World!
However, if I do something like ./a.out temp; sleep 2; cat temp then I get:
Log opened.
Hello World!
So the parent is somehow no longer writing to the file after the fork. That's puzzle #1.
Now suppose I just move io::stream_buffer<io::file_sink> logbuf; outside of main so that it's a global variable. Doing that and simply running ./a.out gives the same expected output as in the previous case, but writing to a file (e.g., temp) now gives a new puzzling behavior:
Log opened.
Parent exiting.
Log opened.
Hello World!
The line that writes "Log opened." runs before the fork(), so I don't see why it should appear twice in the output. (I even put an explicit flush() immediately before the fork() to make sure that line of output wasn't simply buffered, with the buffer then copied during the fork() and both copies eventually flushed to the stream...) So that's puzzle #2.
Of course, if I comment out the whole fork() process (the entire section labeled as "Step 2") then it behaves as expected for both file and stderr output, and regardless of whether logbuf is global or local to main().
Also, if I switch filelog to be an ofstream instead of stream_buffer<file_sink> (see commented out lines in main()) then it also behaves as expected for both file and stderr output, and regardless of whether filelog/logbuf are global or local to main().
So it really seems that it's an interaction between file_sink and fork() producing these strange behaviors... If anyone has ideas on what may be causing these, I'd appreciate the help!
I think I got it figured out... creating this answer for posterity / anyone who stumbles on this question looking for an answer.
I observed this behavior in boost 1.40, but when I tried it using boost 1.46 everything behaved in the expected manner in all cases, i.e.:
Log opened.
Parent exiting.
Hello World!
So my assumption right now is that this was actually a bug in boost that was fixed somewhere between versions 1.41 and 1.46. I didn't see anything in the release notes that made it really obvious that they found and fixed this bug, but it's possible the release notes discussed fixing some underlying cause and I wasn't able to make the connection between that underlying cause and this scenario.
In any case, the solution seems to be to install boost version >= 1.46.
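If you're stuck on an older boost, the plain std::ofstream variant from the question (the commented-out lines) is a safe stand-in, since it behaved correctly in every case I tried. Roughly:

#include <cstdlib>
#include <fstream>
#include <iostream>

void daemonize(std::ostream& log); // same daemonize() as in the question

int main(int argc, char** argv)
{
    std::ofstream filelog; // a plain ofstream survives the fork() cleanly
    if (argc > 1)
    {
        filelog.open(argv[1]);
        daemonize(filelog);
    }
    else
        daemonize(std::cerr);
    return EXIT_SUCCESS;
}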
There is another process continuously creating files that need processing by this code.
This code constantly scans the file-system for new files that need processing by comparing the contents of the file-system against a sqlite database that contains the processing results - one record for each file. This process is running at nice -n 19 so as not to interfere with the creation of new files by the other process.
It all works perfectly for a large number (>1k) of files, but then blows up with BUG: scheduling while atomic.
According to this
"Scheduling while atomic" indicates that you've tried to sleep
somewhere that you shouldn't
But the only sleep in the code is like this
void doFiles(void) {
    for (...) { // for each file in the file-system
        ...     // check database - do processing if needed
    }
    sleep(1);
}

int main(int argc, char *argv[], char *envp[]) {
    while (true) doFiles();
    return -1;
}
The code will hit this sleep after it has checked every file in the file-system against the database. The process needs to be repeated since new files will be added from time to time. There is no multi-threading in this code. Are there other possible causes for "BUG: scheduling while atomic" besides a misplaced sleep?
Edit: additional error output:
note: mirlin[1083] exited with preempt_count 1
BUG: scheduling while atomic: mirlin/1083/0x40000002
Modules linked in: g_cdc_ms musb_hdrc nop_usb_xceiv irqk edmak dm365mmap cmemk
Backtrace:
[<c002a5a0>] (dump_backtrace+0x0/0x110) from [<c028e56c>] (dump_stack+0x18/0x1c)
r6:c1099460 r5:c04ea000 r4:00000000 r3:20000013
[<c028e554>] (dump_stack+0x0/0x1c) from [<c00337b8>] (__schedule_bug+0x58/0x64)
[<c0033760>] (__schedule_bug+0x0/0x64) from [<c028e864>] (schedule+0x84/0x378)
r4:c10992c0 r3:00000000
[<c028e7e0>] (schedule+0x0/0x378) from [<c0033a80>] (__cond_resched+0x28/0x38)
[<c0033a58>] (__cond_resched+0x0/0x38) from [<c028ec6c>] (_cond_resched+0x34/0x44)
r4:00013000 r3:00000001
[<c028ec38>] (_cond_resched+0x0/0x44) from [<c0082f64>] (unmap_vmas+0x570/0x620)
[<c00829f4>] (unmap_vmas+0x0/0x620) from [<c0085c10>] (exit_mmap+0xc0/0x1ec)
[<c0085b50>] (exit_mmap+0x0/0x1ec) from [<c0037610>] (mmput+0x40/0xfc)
r9:00000001 r8:80000005 r6:c04ea000 r5:00000000 r4:c0427300
[<c00375d0>] (mmput+0x0/0xfc) from [<c003b5e4>] (exit_mm+0x150/0x158)
r5:c10992c0 r4:c0427300
[<c003b494>] (exit_mm+0x0/0x158) from [<c003cd44>] (do_exit+0x198/0x67c)
r7:c03120d1 r6:c10992c0 r5:0000000b r4:c10992c0
...
As others have said, you can sleep() anytime you want to in user code.
This looks like a problem with a driver on your platform. The driver may not call sleep() or schedule() directly, but often it will call a kernel function that will, in turn, call one of these.
It also looks like it is using memory-mapped file I/O on an embedded TI ARM processor.
This error was caused by a bad build.
A clean build by itself did not help.
A fresh checkout and build was required to resolve this issue.
I have valgrind 3.6.0, I've searched everywhere and found nothing.
The problem is that when the program tries to format a float while running under valgrind, I get a segfault, but when I run the program as is, without valgrind, everything goes as expected.
This is the piece of code:
#include <csignal>
#include <cstdlib>
#include <iostream>
#include <sstream>

class MyClass {
public:
    void end() {
        float f;
        f = 1.23;
        std::stringstream ss;
        ss << f;
        std::cout << ss.str();
    }
};

MyClass *mc; // declared before the handler that uses it

extern "C" void clean_exit_on_sig(int sig) {
    //Code logging the error
    mc->end();
    exit(1);
}

int main(int argc, char *argv[]) {
    signal(SIGINT , clean_exit_on_sig);
    signal(SIGABRT, clean_exit_on_sig);
    signal(SIGILL , clean_exit_on_sig);
    signal(SIGFPE , clean_exit_on_sig);
    signal(SIGSEGV, clean_exit_on_sig);
    signal(SIGTERM, clean_exit_on_sig);
    mc = new MyClass();
    while (true) {
        // Main program loop
    }
}
When I press Control+C, the program catches the signal correctly and everything goes fine, but when I run the program under valgrind, a segfault is thrown when it tries to execute ss << f; (inside MyClass) :-/
I've tried this too:
std::string stm = boost::lexical_cast<std::string>(f);
But I keep receiving a segfault when boost accesses the float, too.
This is the backtrace when I get segfault with boost:
./a.out(_Z17clean_exit_on_sigi+0x1c)[0x420e72]
/lib64/libc.so.6(+0x32920)[0x593a920]
/usr/lib64/libstdc++.so.6(+0x7eb29)[0x51e6b29]
/usr/lib64/libstdc++.so.6(_ZNKSt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE15_M_insert_floatIdEES3_S3_RSt8ios_baseccT_+0xd3)[0x51e8f43]
/usr/lib64/libstdc++.so.6(_ZNKSt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE6do_putES3_RSt8ios_basecd+0x19)[0x51e9269]
/usr/lib64/libstdc++.so.6(_ZNSo9_M_insertIdEERSoT_+0x9f)[0x51fc87f]
./a.out(_ZN5boost6detail26lexical_stream_limited_srcIcSt15basic_streambufIcSt11char_traitsIcEES4_E9lcast_putIfEEbRKT_+0x8f)[0x42c251]
./a.out(_ZN5boost6detail26lexical_stream_limited_srcIcSt15basic_streambufIcSt11char_traitsIcEES4_ElsEf+0x24)[0x42a150]
./a.out(_ZN5boost6detail12lexical_castISsfLb0EcEET_NS_11call_traitsIT0_E10param_typeEPT2_m+0x75)[0x428349]
./a.out(_ZN5boost12lexical_castISsfEET_RKT0_+0x3c)[0x426fbb]
./a.out(This line of code corresponds to the line where boost tries to do the conversion)
and this is with the default stringstream conversion:
./a.out(_Z17clean_exit_on_sigi+0x1c)[0x41deaa]
/lib64/libc.so.6(+0x32920)[0x593a920]
/usr/lib64/libstdc++.so.6(+0x7eb29)[0x51e6b29]
/usr/lib64/libstdc++.so.6(_ZNKSt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE15_M_insert_floatIdEES3_S3_RSt8ios_baseccT_+0xd3)[0x51e8f43]
/usr/lib64/libstdc++.so.6(_ZNKSt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE6do_putES3_RSt8ios_basecd+0x19)[0x51e9269]
/usr/lib64/libstdc++.so.6(_ZNSo9_M_insertIdEERSoT_+0x9f)[0x51fc87f]
./a.out(This line of code corresponds to the line where I try to do the conversion)
a.out is my program, and I run valgrind this way: valgrind --tool=memcheck ./a.out
Another weird thing is that when I call mc->end(); while the program is running fine (no signal received, the object has just finished its work), I don't get a segfault either way (as is or under valgrind).
Please, don't tell me 'Don't close your program with Control+C blah blah...'; this piece of code is for logging any error the program might have, without losing data, in case of a segfault, killing it because of a deadlock, or something else.
EDIT: Maybe it's a valgrind bug (I don't know; I searched on Google and found nothing, don't kill me); any workaround will be accepted too.
EDIT2: Just realized that boost calls into ostream too (it's clearer here than in vim :-/); going to try a sprintf float conversion.
EDIT3: Tried sprintf(fl, "%.1g", f); but it still crashes. Backtrace:
./a.out(_Z17clean_exit_on_sigi+0x40)[0x41df24]
/lib64/libc.so.6(+0x32920)[0x593a920]
/lib64/libc.so.6(sprintf+0x56)[0x5956be6]
./a.out(Line where sprintf is)
Ok, after some hours of reading and research, I found the problem. I'm going to answer my own question because no one else has; there was only a comment by @Kerrek SB [ https://stackoverflow.com/users/596781/kerrek-sb ], and I cannot accept a comment. (Thank you)
It's as simple as this: inside a signal handler, you can only safely call a limited set of async-signal-safe functions: http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html
If you call non-async-safe functions, they may appear to work, but not always.
If you want to call non-async-safe functions inside a signal handler, you can do this (a sketch follows below):
Create 2 pipes: int pip1[2]; int pip2[2]; pipe(pip1); pipe(pip2);
Create a new thread and make it block waiting for data on the first pipe: read(pip1[0], msg, 1);
When the signal handler is called, use the async-safe write function to write to the first pipe: write(pip1[1], "0", 1);
Then make the signal handler wait on the second pipe with read(pip2[0], msg, 1);
The thread will wake up and do all the work it has to do (saving data to the database in this case); after that, make the thread write to the second pipe: write(pip2[1], "0", 1);
Now the main thread will wake up and finish with _Exit(1) or something else.
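A minimal sketch of those steps (POSIX; the names and the saving step are placeholders for your own):

#include <csignal>
#include <cstdlib>
#include <pthread.h>
#include <unistd.h>

static int pip1[2]; // signal handler -> worker thread: "a signal arrived"
static int pip2[2]; // worker thread -> signal handler: "cleanup is done"

void* worker(void*)
{
    char msg;
    read(pip1[0], &msg, 1);  // block until the handler wakes us
    // ... do the non-async-safe work here: format floats, save to database ...
    write(pip2[1], "0", 1);  // tell the handler we are finished
    return 0;
}

extern "C" void clean_exit_on_sig(int)
{
    char msg;
    write(pip1[1], "0", 1);  // async-safe: wake the worker thread
    read(pip2[0], &msg, 1);  // async-safe: wait until it has saved everything
    _Exit(1);                // async-safe exit
}

int main()
{
    pipe(pip1);
    pipe(pip2);
    pthread_t tid;
    pthread_create(&tid, 0, worker, 0);
    signal(SIGSEGV, clean_exit_on_sig);
    while (true) {
        // Main program loop
    }
}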
Info:
I'm using 2 pipes because if I write to a pipe and read from it right after, it's possible that the 2nd thread never wakes up, because the main thread may have already consumed the data it just wrote. And I'm using the second pipe to block the main thread, because I don't want it to exit while the 2nd thread is saving data.
Keep in mind that the signal handler may have been called while a shared resource was being modified; if your 2nd thread accesses that resource, it is possible to hit a second segfault, so be careful when accessing shared resources from the 2nd thread (global variables or anything else).
If you are testing with valgrind and don't want to see 'false' memory leaks when a signal is received, you can call pthread_join(2ndthread, NULL) and then exit(1) instead of _Exit(1) before exiting. These are non-async-safe functions, but at least you can test for memory leaks and close your app with a signal without the 'false' leak reports.
Hope this helps someone. Thanks again, @Kerrek SB.
Debuggers and similar tools sometimes deliver signals to the process that you don't normally get. I had to alter a function that used recv to work under gdb, for example. Check which signal you received, and verify that mc is not null before trying to use it. See if that starts getting you closer to an answer.
I am thinking your use of new (or something else) may be causing valgrind to deliver a signal that is caught by your handler before mc is initialized.
It's also clear you didn't paste your actual code, because the snippet as originally posted (with mc declared after the handler that uses it) should not compile.
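A minimal guard along those lines, as a drop-in replacement for the handler in the question (just a sketch):

#include <unistd.h> // for _exit

extern "C" void clean_exit_on_sig(int sig)
{
    if (mc)        // mc may still be null if the signal fires before main() sets it
        mc->end();
    _exit(1);      // async-signal-safe, unlike exit(1)
}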
I'm using Debian x64 (kernel 2.6.26) to host a server application we've written in C++. Sometimes GDB gets activated on its own and uses 100% CPU time, leaving no room for other processes to run. The GDB version is 6.8-debian. I don't know why this happens or how I can prevent it. It seems to happen only when our server application is running. I need to know how to stop this from happening, or, if there is something wrong in our application, how I can find it. Any help is much appreciated.
Thanks
I am inclined to believe that GDB is being invoked by a signal handler in some code. Another suspect is a system-monitoring daemon like 'monit': when a rogue process eats too much memory or CPU, it might try to take a backtrace or dump using GDB. One way to troubleshoot is to run 'lsof' on the GDB process and see what files GDB has open, and whether that gives you any clue. Using 'ps -ef -o cmd,pid,ppid | grep -i gdb', you can figure out how GDB was launched, and if it gives you the PID of the attached process, you will know which process is being inspected.
A sledgehammer approach to stop such automatic execution is to replace 'GDB' with a stub 'GDB' that does nothing. The absence of a real GDB might trigger an error somewhere, though. I have done such dirty tricks when I had no time to dig deeper into a problem. In the stub GDB, you can log all the command-line arguments and the calling process name.
A sample stub in 'C':
#include <stdio.h>

int
main(int argc, char *argv[]) {
    int i;
    FILE *fp = fopen("/tmp/gdbstub.log", "a");
    if (fp) {
        fprintf(fp, "\n%s invoked:", argv[0]);
        for (i = 1; i < argc; i++) {  /* log every argument passed to the stub */
            fprintf(fp, " %s", argv[i]);
        }
        fclose(fp);
    }
    return 0;
}