Tools for tracing a program abortion - c++

I have a multithreaded C++ program running on an Ubuntu machine; each thread is responsible for a large number of functions and sub-functions.
The program runs, but roughly every 30 minutes it stops, and I'm trying to understand why. So far I have tried the following:
1. Putting try-catch blocks all over the code, in main and in every thread. The program still stops without anything being caught:
try
{
//code
}
catch(const std::exception & e)
{
}
catch(...)
{
}
2. Using strace: when the program stops, the last lines of the output file are:
nanosleep({0, 10000}, NULL) = 0
nanosleep({0, 10000}, NULL) = 0
nanosleep({0, 10000}, NULL) = 0
nanosleep({0, 10000}, NULL) = 0
nanosleep({0, 10000}, NULL) = 0
nanosleep({0, 10000}, NULL) = 0
nanosleep({0, 10000}, NULL) = 0
nanosleep({0, 10000}, NULL) = 0
nanosleep({0, 10000}, <ptrace(SYSCALL):No such process>
+++ killed by SIGABRT +++
I cannot work out what aborts the program from either the "killed by SIGABRT" message or the <ptrace(SYSCALL):No such process> line.
3. Using gdb: I entered
(gdb) catch throw
(gdb) run
The program starts to run, but then gdb seems to stop:
Starting program: *****
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff2d6a700 (LWP 13305)]
[Thread 0x7ffff2d6a700 (LWP 13305) exited]
[Inferior 1 (process 13304) exited normally]
(gdb)
If I'm doing something wrong here, I'd be happy to know what it is; if not, are there other ways or tools to trace the problem?
I'm starting to think that something external to the program may be causing this issue (?).
Thanks.

Put breakpoints on everything that exits
b exit
b _exit
b __exit
b exit_group
And maybe also kill variants, if you don’t use them elsewhere
b kill
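The strace output shows the process dying from SIGABRT, which is the signal abort() raises, and std::terminate() calls abort() by default when an exception escapes a thread function. A minimal sketch of logging what std::terminate() sees just before the abort follows. It assumes the abort comes from an uncaught exception; an assert failure or an abort() issued by the C library would not pass through it. The names describe_current_exception, log_and_abort and install_terminate_logger are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <exception>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical helper: describes the exception currently being handled, if any.
std::string describe_current_exception() {
    std::exception_ptr p = std::current_exception();
    if (!p) return "no active exception";
    try {
        std::rethrow_exception(p);
    } catch (const std::exception& e) {
        return std::string("std::exception: ") + e.what();
    } catch (...) {
        return "unknown exception type";
    }
}

// Hypothetical terminate handler: log the reason, then abort as usual so that
// a core dump or an attached debugger still sees SIGABRT.
[[noreturn]] void log_and_abort() {
    std::cerr << "std::terminate called: " << describe_current_exception()
              << std::endl;
    std::abort();
}

// Call once at the start of main(); the handler applies to all threads.
void install_terminate_logger() {
    std::set_terminate(log_and_abort);
}
```

Calling install_terminate_logger() first thing in main() is enough. In gdb, a breakpoint on abort (b abort) complements the breakpoints listed above.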

Related

Segmentation fault incrementing a map

I am trying to trace and fix a segmentation fault in my program. The program works fine when perform() has only one iteration of "protos", but not with two: I get a segmentation fault after the first iteration. I am pretty sure that the way I iterate over and delete elements from my map in write_blacklist() is correct, but gdb still reports the error there. I thought it might be because the map is empty, but I added checks to avoid that and it still throws a segmentation fault.
write_blacklist() should simply iterate safely and delete the map elements that meet the conditions.
(gdb) run
Starting program: /root/BruteBlock/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
openssh
vsftpd
115.239.198.235 already in blacklist, skipping...
121.14.7.244 already in blacklist, skipping...
42.7.26.88 already in blacklist, skipping...
143.137.151.22 already in blacklist, skipping...
58.87.67.58 already in blacklist, skipping...
60.173.82.156 already in blacklist, skipping...
[New Thread 0x7ffff2d34700 (LWP 2087)]
[Thread 0x7ffff2d34700 (LWP 2087) exited]
[New Thread 0x7ffff2d34700 (LWP 2088)]
[Thread 0x7ffff2d34700 (LWP 2088) exited]
[New Thread 0x7ffff2d34700 (LWP 2089)]
[Thread 0x7ffff2d34700 (LWP 2089) exited]
Detaching after fork from child process 2090.
115.239.198.235 already in iptables, skipping...
121.14.7.244 already in iptables, skipping...
42.7.26.88 already in iptables, skipping...
143.137.151.22 already in iptables, skipping...
58.87.67.58 already in iptables, skipping...
60.173.82.156 already in iptables, skipping...
[New Thread 0x7ffff2d34700 (LWP 2091)]
[Thread 0x7ffff2d34700 (LWP 2091) exited]
Program received signal SIGSEGV, Segmentation fault.
0x000000000040c1e6 in std::__detail::_Hash_node<std::pair<std::string const, int>, true>::_M_next (this=0x0)
at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/hashtable_policy.h:285
285 { return static_cast<_Hash_node*>(this->_M_nxt); }
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.26-21.el7.x86_64 glibc-2.17-196.el7_4.2.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libcom_err-1.42.9-10.el7.x86_64 libcurl-7.29.0-42.el7_4.1.x86_64 libgcc-4.8.5-16.el7_4.1.x86_64 libidn-1.28-4.el7.x86_64 libselinux-2.5-11.el7.x86_64 libssh2-1.4.3-10.el7_2.1.x86_64 libstdc++-4.8.5-16.el7_4.1.x86_64 nspr-4.13.1-1.0.el7_3.x86_64 nss-3.28.4-15.el7_4.x86_64 nss-pem-1.0.3-4.el7.x86_64 nss-softokn-3.28.3-8.el7_4.x86_64 nss-softokn-freebl-3.28.3-8.el7_4.x86_64 nss-sysinit-3.28.4-15.el7_4.x86_64 nss-util-3.28.4-3.el7.x86_64 openldap-2.4.44-5.el7.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 sqlite-3.7.17-8.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 0x000000000040c1e6 in std::__detail::_Hash_node<std::pair<std::string const, int>, true>::_M_next (this=0x0)
at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/hashtable_policy.h:285
#1 0x000000000040a829 in std::__detail::_Node_iterator_base<std::pair<std::string const, int>, true>::_M_incr (
this=0x7fffffffde20) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/hashtable_policy.h:314
#2 0x0000000000409612 in std::__detail::_Node_iterator<std::pair<std::string const, int>, false, true>::operator++ (
this=0x7fffffffde20) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/hashtable_policy.h:369
Python Exception <class 'gdb.error'> There is no member or method named _M_bbegin.:
#3 0x000000000040597e in BruteBlock::write_blacklist (this=0x7fffffffe130, ips=std::unordered_map with 0 elements,
output_file="/etc/blacklist.lst") at BruteBlock.cpp:68
#4 0x0000000000406552 in BruteBlock::perform (this=0x7fffffffe130) at BruteBlock.cpp:188
#5 0x0000000000404e8c in main () at main.cpp:18
main.cpp:
while (true) {
18: b.perform();
sleep(b.get_interval());
}
BruteBlock.cpp::perform():
void BruteBlock::perform() {
// Hopefully this will become more elegant!
for (auto i : protos) {
std::unordered_map<std::string, int> r(retr_fails(i.logfile, i.expr));
if (r.empty()) {
} else {
write_blacklist(r, blacklist_);
188: block(blacklist_);
}
}
}
BruteBlock.cpp::write_blacklist():
void BruteBlock::write_blacklist(std::unordered_map<std::string, int> &ips, const std::string &output_file) {
std::ifstream is(output_file.c_str());
if (!is) throw std::runtime_error("Error opening blacklist");
if (ips.empty()) return;
// ignore duplicates
std::string buf;
while (std::getline(is, buf)) {
if (ips.find(buf) != ips.end()) {
ips.erase(buf);
std::cout << buf << " already in blacklist, skipping..." << '\n';
}
}
// delete the IPs that don't meet the criteria
auto a = ips.begin();
while (a != ips.end()) {
if (a->second < max_attempts_) {
a = ips.erase(a);
} else {
if (a->second > max_attempts_) {
if (check_reports(a->first) < max_reports_) {
a = ips.erase(a);
}
}
68: ++a;
}
}
// write the remaining IPs to the blacklist
std::ofstream os(output_file.c_str(), std::ios_base::app);
if (!os) throw std::invalid_argument("Error opening blacklist file");
for (auto f : ips) {
if ((f.second > max_attempts_) && (check_reports(f.first) > max_reports_)) {
os << f.first << '\n';
std::cout << f.first << " had " << f.second << " failed attempts and " << check_reports(f.first)
<< " abuse reports, adding to blacklist...\n";
}
}
}
In your last loop, three lines above the line you've labeled 68:, you have a = ips.erase(a);. If you erase the last node in the map, a equals ips.end() after that erase. When you then increment a on line 68, you get the segmentation fault, because the end iterator cannot be incremented. Even when the erased node is not the last one, the extra increment skips the element that erase() returned.
The solution would be to not increment a if you're erasing it.
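The fix can be sketched as a loop that advances the iterator only on the branch that does not erase. This is a minimal illustration, not the question's full filter: erase_below and its min_attempts parameter are hypothetical stand-ins for the max_attempts_ and check_reports() conditions.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the question's filter: removes every entry whose
// count is below min_attempts.
void erase_below(std::unordered_map<std::string, int>& ips, int min_attempts) {
    for (auto it = ips.begin(); it != ips.end(); /* no increment here */) {
        if (it->second < min_attempts) {
            it = ips.erase(it);  // erase() returns the next valid iterator
        } else {
            ++it;                // advance only when nothing was erased
        }
    }
}
```

Every path through the loop body either assigns the result of erase() or increments, never both, so the iterator can reach end() but never move past it.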

Mysterious crash in cppwinrt example

I am using Visual Studio 17 v15.0 and Win 10 Anniversary Update SDK.
I build the following code (basically the sample in the GitHub repo) with cl /EHsc /O2 /DUNICODE /bigobj /await /std:c++latest, with either /MT or /MD. It compiles without errors.
If I run it when "message.png" is not present in the current directory, an exception is thrown, caught, and reported with printf, and the program exits without crashing.
If I run it when "message.png" is present in the current directory, "Hello World!" is printed and then the program crashes for no apparent reason.
The weird thing is that if I run it inside the GDB debugger, GDB always says the program exits normally (and indeed no crash happens).
GDB output:
[New Thread 1364.0x2324]
[New Thread 1364.0x624]
[New Thread 1364.0x12cc]
[New Thread 1364.0x58c]
[New Thread 1364.0x1134]
[New Thread 1364.0x10d8]
[New Thread 1364.0x18a8]
[New Thread 1364.0x1794]
[New Thread 1364.0x20e8]
[New Thread 1364.0x2204]
[New Thread 1364.0x1030]
[New Thread 1364.0x1474]
Hello world!
[Thread 1364.0x10d8 exited with code 0]
[Thread 1364.0x624 exited with code 0]
[Thread 1364.0x20e8 exited with code 0]
[Thread 1364.0x1794 exited with code 0]
[Thread 1364.0x18a8 exited with code 0]
[Thread 1364.0x58c exited with code 0]
[Thread 1364.0x1134 exited with code 0]
[Thread 1364.0x12cc exited with code 0]
[Thread 1364.0x8d0 exited with code 0]
[Thread 1364.0x2324 exited with code 0]
[Thread 1364.0x1b38 exited with code 0]
[Thread 1364.0x2204 exited with code 0]
[Thread 1364.0x1030 exited with code 0]
[Thread 1364.0x1474 exited with code 0]
[Inferior 1 (process 1364) exited normally]
Code:
#pragma comment(lib, "windowsapp")
#pragma comment(lib, "pathcch")
#include <winrt/Windows.Storage.Streams.h>
#include <winrt/Windows.Graphics.Imaging.h>
#include <winrt/Windows.Media.Ocr.h>
#include <winrt/Windows.Networking.Sockets.h>
#include <pathcch.h>
using namespace winrt;
using namespace std::chrono;
using namespace Windows::Foundation;
using namespace Windows::Storage;
using namespace Windows::Storage::Streams;
using namespace Windows::Graphics::Imaging;
using namespace Windows::Media::Ocr;
hstring MessagePath()
{
wchar_t buffer[1024]{};
GetCurrentDirectory(_countof(buffer), buffer);
check_hresult(PathCchAppendEx(buffer, _countof(buffer), L"message.png", PATHCCH_ALLOW_LONG_PATHS));
return buffer;
}
IAsyncOperation<hstring> AsyncSample()
{
StorageFile file = co_await StorageFile::GetFileFromPathAsync(MessagePath());
IRandomAccessStream stream = co_await file.OpenAsync(FileAccessMode::Read);
BitmapDecoder decoder = co_await BitmapDecoder::CreateAsync(stream);
SoftwareBitmap bitmap = co_await decoder.GetSoftwareBitmapAsync();
OcrEngine engine = OcrEngine::TryCreateFromUserProfileLanguages();
OcrResult result = co_await engine.RecognizeAsync(bitmap);
return result.Text();
}
int main()
{
init_apartment();
try
{
printf("%ls\n", AsyncSample().get().c_str());
}
catch (hresult_error const & e)
{
printf("hresult_error: (0x%8X) %ls\n", e.code(), e.message().c_str());
}
return 0;
}
It turns out that the hstring returned by AsyncSample().get() is not null-terminated, so printf crashes. Printing the characters one at a time, up to size(), avoids the problem:
try
{
    auto ans = AsyncSample().get();
    printf("[%u]: ", ans.size());
    auto s = ans.c_str();
    for (uint32_t i = 0; i < ans.size(); i++) {
        printf("%lc", s[i]);
    }
    putchar('\n');
}

How does gdb retrieve the exit code of target program?

On the command line, I know that echo $? gives me the exit code. In gdb, I use "r" to run the program until it terminates, so how does gdb get this exit code? Is there a command for it inside gdb?
Thanks!
When a program exits, gdb sets the convenience variable $_exitcode to the exit code.
So given:
int main() {
return 23;
}
Running it in gdb, I get:
(gdb) run
Starting program: /tmp/q
[Inferior 1 (process 3677) exited with code 027]
(gdb) print $_exitcode
$1 = 23
gdb simply prints the exit code when the program terminates, or prints "exited normally" for an exit code of 0. See the debug session below for this test program:
#include <stdlib.h>
int main(int argc, char *argv[]) {
return atoi(argv[1]);
}
Debug session:
[ksemenov@NB824RIH ~]$ gdb -q ./a.out
Reading symbols from ./a.out...(no debugging symbols found)...done.
(gdb) r 0
Starting program: /home/ksemenov/a.out 0
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.23.1-10.fc24.x86_64
[Inferior 1 (process 19162) exited normally]
(gdb) r 1
Starting program: /home/ksemenov/a.out 1
[Inferior 1 (process 19166) exited with code 01]
(gdb) r 6
Starting program: /home/ksemenov/a.out 6
[Inferior 1 (process 19167) exited with code 06]
(gdb)
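Note that gdb prints the code in its "exited with code" message in octal, which is why the first session shows 027 while $_exitcode prints 23. As for where the value comes from: on Linux a parent or tracing process learns a child's exit code from the wait status the kernel reports. The sketch below shows that mechanism with plain fork() and waitpid(). It is an illustration of the mechanism and not gdb's actual code, and exit_code_of_child is a hypothetical helper name.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs a child that exits with `code` and returns the
// exit code observed by the parent, or -1 on failure or abnormal termination.
int exit_code_of_child(int code) {
    pid_t pid = fork();
    if (pid < 0) return -1;          // fork failed
    if (pid == 0) _exit(code);       // child: terminate immediately
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    // WIFEXITED is true for a normal exit; WEXITSTATUS extracts the low 8 bits.
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A child killed by a signal, such as SIGABRT, makes WIFEXITED false, which is the case where the shell reports 128 plus the signal number instead of an exit code.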

Print or examine semaphore count value in GDB

I am trying to implement a thread pool using the ACE semaphore library. It does not provide an API like sem_getvalue, which POSIX semaphores have. I need to debug some flow that is not behaving as expected. Can I examine the semaphore in GDB? I am using CentOS.
I initialized two semaphores using the constructor, with counts of 0 and 10. They are declared static in the class and initialized in the .cpp file as
DP_Semaphore ThreadPool::availableThreads(10);
DP_Semaphore ThreadPool::availableWork(0);
But when I print the semaphores in GDB using the print command, I get output like this:
(gdb) p this->availableWork
$7 = {
sema = {
semaphore_ = {
sema_ = 0x6fe5a0,
name_ = 0x0
},
removed_ = false
}
}
(gdb) p this->availableThreads
$8 = {
sema = {
semaphore_ = {
sema_ = 0x6fe570,
name_ = 0x0
},
removed_ = false
}
}
Is there a tool that can help me here, or should I switch to POSIX threads and rewrite all my code?
EDIT: As requested by @timrau, the output of call this->availableWork->dump():
(gdb) p this->availableWork.dump()
[Switching to Thread 0x2aaaae97e940 (LWP 28609)]
The program stopped in another thread while making a function call from GDB.
Evaluation of the expression containing the function
(DP_Semaphore::dump()) will be abandoned.
When the function is done executing, GDB will silently stop.
(gdb) call this->availableWork.dump()
[Switching to Thread 0x2aaaaf37f940 (LWP 28612)]
The program stopped in another thread while making a function call from GDB.
Evaluation of the expression containing the function
(DP_Semaphore::dump()) will be abandoned.
When the function is done executing, GDB will silently stop.
(gdb) info threads
[New Thread 0x2aaaafd80940 (LWP 28613)]
6 Thread 0x2aaaafd80940 (LWP 28613) 0x00002aaaac10a61e in __lll_lock_wait_private ()
from /lib64/libpthread.so.0
* 5 Thread 0x2aaaaf37f940 (LWP 28612) ThreadPool::fetchWork (this=0x78fef0, worker=0x2aaaaf37f038)
at ../../CallManager/src/DP_CallControlTask.cpp:1043
4 Thread 0x2aaaae97e940 (LWP 28609) DP_Semaphore::dump (this=0x6e1460) at ../../Common/src/DP_Semaphore.cpp:21
2 Thread 0x2aaaad57c940 (LWP 28607) 0x00002aaaabe01ff3 in __find_specmb () from /lib64/libc.so.6
1 Thread 0x2aaaacb7b070 (LWP 28604) 0x00002aaaac1027c0 in __nptl_create_event () from /lib64/libpthread.so.0
(gdb)
sema.semaphore_.sema_ in your code looks like a pointer. Try to find its type in the ACE headers, then cast the address to a pointer of that type and dereference it:
(gdb) p *(sem_t *)0x6fe570
Update: try casting the address from the structure you posted to sem_t *. If you are on Linux, ACE should be using POSIX semaphores, so the type sem_t should be visible to gdb.

gdb 7.0, signal SIGCONT doesn't break from a pause() call

I'd built a version of gdb 7.0 for myself after being pointed to a new feature, and happened to have that in my path still.
Attempting to step through some new code, I'd added a pause() call, expecting to be able to get out like so:
(gdb) b 5048
Breakpoint 1 at 0x2b1811b25052: file testca.C, line 5048.
(gdb) signal SIGCONT
Continuing with signal SIGCONT.
Breakpoint 1, FLUSH_SUDF_TEST (h=@0x2b1811b061c0) at testca.C:5048
5048 rc = h.SAL_testcaFlushPagesByUDF( uPrimary - 1, uPrimary ) ;
(that was with the system gdb, version 6.6).
With gdb 7.0 I never hit the post-pause() breakpoint when I try this. With the various multi process debugging changes in gdb 7, does anybody know if signal handling has to be handled differently and how?
The pause() function does not return unless a signal handler is called (see the specification and the man page).
To make it return after your program receives SIGCONT, you must install a handler for SIGCONT. Try it with the following example:
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

volatile sig_atomic_t caught_signal = 0; /* safe to write from a handler */

void handler(int sig)
{
    caught_signal = sig;
}

int main()
{
    signal(SIGCONT, handler);
    pause();
    printf("Caught signal: %d, %s\n",
           caught_signal, strsignal(caught_signal));
    return 0;
}
The behavior is correct with gdb 7.0: pause() completely ignores ignored signals (like SIGCHLD), returns on caught signals (SIGCONT), and no signal is delivered when the continue command is issued.
(gdb) break 17
Breakpoint 1 at 0x80484b3: file pause.c, line 17.
(gdb) continue
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x0012d422 in __kernel_vsyscall ()
(gdb) signal SIGCHLD
Continuing with signal SIGCHLD.
^C
Program received signal SIGINT, Interrupt.
0x0012d422 in __kernel_vsyscall ()
(gdb) signal SIGCONT
Continuing with signal SIGCONT.
Breakpoint 1, main () at pause.c:17
17 printf("Caught signal: %d, %s\n",
(gdb)