Why does a call to write(3) terminate the thread? - c++

I have a thread that saves to a file. This works fine probably more than 99.9% of the time. Sometimes, though, the thread terminates unexpectedly in the same place every time, a call to write() (the BSD system call). The actual call to write() is very simple:
ssize_t wr = ::write(m_fd, buffer, (size_t)bytes);
At the time of the exit, the buffer is 4 bytes of zeros, bytes is 4, and the file descriptor is the same as all the previous writes made in this thread. I installed a clean-up function using pthread_cleanup_push() that is called from pthread_exit() so I can poke around in the debugger when this unexpected exit occurs.
The question is, why does write() do this at all instead of returning an error? Can I stop it from exiting and handle an error without causing the thread to terminate? The process continues seemingly unaffected, and there are a dozen other threads that seem to continue to function normally after this unexpected exit occurs. This is OS X 10.11.6.
I notice a call to tramp_cerror() that does not seem to be documented anywhere, in fact if you Google it nothing comes up anywhere at all. Am I the first person in the world to wonder what it does and why it seems empowered to terminate my thread?
This is the stack:
#0 0x01c1e388 in cleanup(void*) at ThreadAny.cpp:32
#1 0x0728900b in _pthread_exit ()
#2 0x07289adc in pthread_exit ()
#3 0x07286bc2 in _pthread_exit_if_canceled ()
#4 0x9189ed9d in cerror ()
#5 0x918a7415 in tramp_cerror ()
#6 0x13fda508 in 0x13fda508 ()
#7 0x024f6388 in long long Standard::File::write<unsigned int>(unsigned int const*, long long) at FileIO.h:54
#8 0x024e5fab in SaveFile(std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >, std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >, bool, std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >*) at SaveFile.cpp:4947
#9 0x01c1eeaa in SaveFileThread(void*) at ThreadAny.cpp:103
#10 0x01afe81d in PThread::execute() at PThread.cpp:129
#11 0x01ac7c95 in PThread::run() at POSIXThread.cpp:91
#12 0x01ac7d8a in PThread::threadFunction(void*) at POSIXThread.cpp:131
#13 0x07287928 in _pthread_body ()
#14 0x0728789e in _pthread_start ()
#15 0x07284f2e in thread_start ()

Related

Shared library problems between C++11 and C++14

I have had to switch from C++11 to C++14 in my project to use the Catch testing framework, written for C++14 (it doesn't compile with anything less). Everything compiles fine. However, while running the program, the following function causes a series of errors which seem linked to a mismatch between C++11 and C++14 shared standard library files:
//it is called as such:
string p = "...";
handle_file(p);
//...
int handle_file (string p) {
if (is_valid(p)) {
return 0;
}
else return -1;
}
The error trace in gdb reads:
Program received signal SIGABRT, Aborted.
0x00007ffff7299ef5 in raise () from /usr/lib/libc.so.6
(gdb) bt
#0 0x00007ffff7299ef5 in raise () from /usr/lib/libc.so.6
#1 0x00007ffff7283862 in abort () from /usr/lib/libc.so.6
#2 0x00007ffff72dbf38 in __libc_message () from /usr/lib/libc.so.6
#3 0x00007ffff72e3bea in malloc_printerr () from /usr/lib/libc.so.6
#4 0x00007ffff72e5113 in _int_free () from /usr/lib/libc.so.6
#5 0x00007ffff72e8ca8 in free () from /usr/lib/libc.so.6
#6 0x00007ffff76b76ea in operator delete (ptr=<optimized out>) at /build/gcc/src/gcc/libstdc++-v3/libsupc++/del_op.cc:49
#7 0x00007ffff76b76fa in operator delete (ptr=<optimized out>) at /build/gcc/src/gcc/libstdc++-v3/libsupc++/del_ops.cc:33
#8 0x000055555556c893 in __gnu_cxx::new_allocator<char>::deallocate (__t=<optimized out>, __p=<optimized out>, this=0x7fffffffd910)
at /usr/include/c++/10.2.0/ext/new_allocator.h:133
#9 std::allocator_traits<std::allocator<char> >::deallocate (__n=<optimized out>, __p=<optimized out>, __a=...)
at /usr/include/c++/10.2.0/bits/alloc_traits.h:492
#10 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_destroy (__size=<optimized out>, this=0x7fffffffd910)
at /usr/include/c++/10.2.0/bits/basic_string.h:237
#11 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_dispose (this=0x7fffffffd910)
at /usr/include/c++/10.2.0/bits/basic_string.h:232
#12 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string (this=0x7fffffffd910, __in_chrg=<optimized out>)
at /usr/include/c++/10.2.0/bits/basic_string.h:658
#13 Manager::handle_file (p ="./data/20pix", this=0x5555556c6d20)
at src/manager.hpp:149
Does anyone know how to solve this?
The error is inside malloc, after it has printed some error message (usually double-free or something similar) indicating heap corruption.
While it is possible that you have some kind of library mismatch, it is much more likely that you have a plain heap corruption bug in your own code (overflowing heap buffer, double delete, delete of non-allocated, etc. etc.), and that bug is simply getting exposed now, while it remained hidden before.
Your first step should be to run the program under Valgrind or with Address Sanitizer, and fix any errors these tools report.

gRPC C++: AddressSanitizer: bad-free

gRPC v1.30.0
I created a grpc service and tried to run it. The execution goes smooth till the last return statement at server side.
Status theService(ServerContext *context, const Request* req, Response* res)
{
Status status = actualLogic(req,res);
//execution goes fine till here
return status;
}
Status actualLogic(req,res)
{
Response_NestedMsg msg;
msg.set_something(...);
res->mutable_nestedmsg()->CopyFrom(msg);
return Status::OK
}
//server startup code
ServerBuilder builder;
builder.AddListeningPort((address),grpc::InsecureServerCredentials());
builder.RegisterService(&serviceClassObj);
std::unique_ptr<Server> server(builder.BuildAndStart());
server->Wait();
Running this code, I get following runtime error
==14394==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x61b00000fcc8 in thread T5 (grpcpp_sync_ser)
#0 0x7fe9d35602c0 in operator delete(void*) (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xe12c0)
#1 0x55cb87299afd in __gnu_cxx::new_allocator<std::_List_node<grpc_impl::Server const*> >::deallocate(std::_List_node<grpc_impl::Server const*>*, unsigned long) (/home/john/Desktop/my_executable+0xd0afd)
#2 0x55cb87297ba1 in std::allocator_traits<std::allocator<std::_List_node<grpc_impl::Server const*> > >::deallocate(std::allocator<std::_List_node<grpc_impl::Server const*> >&, std::_List_node<grpc_impl::Server const*>*, unsigned long) (/home/john/Desktop/my_executable+0xceba1)
#3 0x55cb8729448d in std::__cxx11::_List_base<grpc_impl::Server const*, std::allocator<grpc_impl::Server const*> >::_M_put_node(std::_List_node<grpc_impl::Server const*>*) (/home/john/Desktop/my_executable+0xcb48d)
#4 0x55cb8728bb5a in std::__cxx11::_List_base<grpc_impl::Server const*, std::allocator<grpc_impl::Server const*> >::_M_clear() (/home/john/Desktop/my_executable+0xc2b5a)
#5 0x55cb87287307 in std::__cxx11::_List_base<grpc_impl::Server const*, std::allocator<grpc_impl::Server const*> >::~_List_base() (/home/john/Desktop/my_executable+0xbe307)
#6 0x55cb87278d29 in std::__cxx11::list<grpc_impl::Server const*, std::allocator<grpc_impl::Server const*> >::~list() (/home/john/Desktop/my_executable+0xafd29)
#7 0x55cb87278e2c in grpc_impl::CompletionQueue::~CompletionQueue() (/home/john/Desktop/my_executable+0xafe2c)
#8 0x7fe9d1826998 in grpc_impl::Server::SyncRequest::CallData::ContinueRunAfterInterception() (/usr/local/lib/libgrpc++.so.1+0x6f998)
#9 0x7fe9d18278ee in grpc_impl::Server::SyncRequestThreadManager::DoWork(void*, bool, bool) (/usr/local/lib/libgrpc++.so.1+0x708ee)
#10 0x7fe9d182c4ca in grpc::ThreadManager::MainWorkLoop() (/usr/local/lib/libgrpc++.so.1+0x754ca)
#11 0x7fe9d182c68b in grpc::ThreadManager::WorkerThread::Run() (/usr/local/lib/libgrpc++.so.1+0x7568b)
#12 0x7fe9cf5a78d2 in grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::{lambda(void*)#1}::_FUN(void*) (/usr/local/lib/libgpr.so.10+0x118d2)
#13 0x7fe9d1eef6da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
#14 0x7fe9d0ba8a3e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x121a3e)
0x61b00000fcc8 is located 72 bytes inside of 1448-byte region [0x61b00000fc80,0x61b000010228)
allocated by thread T5 (grpcpp_sync_ser) here:
#0 0x7fe9d355f448 in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xe0448)
#1 0x7fe9d18274f2 in grpc_impl::Server::SyncRequestThreadManager::DoWork(void*, bool, bool) (/usr/local/lib/libgrpc++.so.1+0x704f2)
Thread T5 (grpcpp_sync_ser) created by T0 here:
#0 0x7fe9d34b6d2f in __interceptor_pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x37d2f)
#1 0x7fe9cf5a7a92 in grpc_core::Thread::Thread(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&) (/usr/local/lib/libgpr.so.10+0x11a92)
SUMMARY: AddressSanitizer: bad-free (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xe12c0) in operator delete(void*)
==14394==ABORTING
None of my code tries to free any pointer and error seems to be coming from some auto generated file only. Please suggest if some more code/details needed.
I briefly checked the error message and code but it looks strange to me because both allocation and destruction were done by C++ new & delete. This is also consistent with your error message.
### Destruction (with operator delete)
#0 0x7fe9d35602c0 in operator delete(void*) (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xe12c0)
### Allocation (with operator new)
#0 0x7fe9d355f448 in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xe0448)
This might be caused by other issues like buggy ASAN or customized memory allocator.

GDB not giving details at segfault cygwin

So I am trying to debug a segfault in a C++ project for school. The project uses a makefile to compile. When I run the program under gdb (using cygwin) I get the segfault but when I try to use the "l" command to show the code I get different code than expected, and when I use "frame", gdb only prints the name of the function instead of the code that last executed.
Program received signal SIGSEGV, Segmentation fault.
0x000000010041570b in NodeCorrectness::eval(Alignment const&) ()
(gdb) l
1 /* advapi32.cc: Win32 replacement functions.
2
3 This file is part of Cygwin.
4
5 This software is a copyrighted work licensed under the terms of the
6 Cygwin license. Please consult the file "CYGWIN_LICENSE" for
7 details. */
8
9 #include "winsup.h"
10 #include <winioctl.h>
(gdb) frame
#0 0x000000010041570b in NodeCorrectness::eval(Alignment const&) ()
(gdb)
Using backtrace I get the following:
(gdb) bt
#0 0x000000000042781b in NodeCorrectness::eval(Alignment const&) ()
#1 0x0000000000421a2f in MeasureCombination::printMeasures(Alignment const&, std::ostream&) const ()
#2 0x00000000004fbedf in makeReport(Graph const&, Graph&, Alignment const&, MeasureCombination const&, Method*, std::basic_ofstream<char, std::char_traits<char> >&)
()
#3 0x00000000004fe6cd in saveReport(Graph const&, Graph&, Alignment const&, MeasureCombination const&, Method*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
#4 0x00000000004d06a7 in NormalMode::run(ArgumentParser&) ()
#5 0x000000000040a655 in main ()
(gdb)
So gdb does not let me see where exactly the segfault occurred. Any reason for this?

glibc malloc deadlock during string reserve

From some time I am trying to catch a problem with my product hanging. This is the call stack I got after coredumping the process with gcore:
/lib64/libc.so.6
#1 0x00007f36fb339ba6 in _L_lock_12192 () from /lib64/libc.so.6
#2 0x00007f36fb337121 in malloc () from /lib64/libc.so.6
#3 0x00007f36fbbef0cd in operator new(unsigned long) () from /lib64/libstdc++.so.6
#4 0x00007f36fbc4d7e9 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /lib64/libstdc++.so.6
#5 0x00007f36fbc4e42b in std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) () from /lib64/libstdc++.so.6
#6 0x00007f36fbc4e4d4 in std::string::reserve(unsigned long) () from /lib64/libstdc++.so.6
#7 0x00007f36fbc2b3c6 in std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::overflow(int) () from /lib64/libstdc++.so.6
#8 0x00007f36fbc2fb36 in std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long) () from /lib64/libstdc++.so.6
#9 0x00007f36fbc26885 in std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long) () from /lib64/libstdc++.so.6
#10 0x00007f36fe85c83a in ?? ()
#11 0x00007f36fb2ffdff in vfprintf () from /lib64/libc.so.6
#12 0x000000000055c110 in fgetc#plt ()
#13 0xffffffffffffffa8 in ?? ()
#14 0xffffffffffffffa8 in ?? ()
#15 0x0000000000bc42a0 in ?? ()
#16 0x000000000055b8b0 in setsockopt#plt ()
#17 0x000000000055bf30 in log4cxx::NDC::push ()
This is multithreaded code executed after forking a process in log4cxx ndc push seemingly. All the other threads sleep on condition variables and this is the only one deadlocked. What could possibly make malloc deadlock?
You shouldn't fork processes that use threading.
If one of the other threads has a lock acquired (let's say, the lock in malloc) during the fork, the lock is cloned in locked state, but the thread that locked it isn't running in the forked process. This means that the lock will never be released.

Segmentation fault in malloc_consolidate (malloc.c) that valgrind doesn't detect [duplicate]

This question already has answers here:
Segfaults in malloc() and malloc_consolidate()
(2 answers)
Closed 7 years ago.
My program goes in segmentation faults, and I cannot find the cause.
The worst part is, the function in question does not always lead to segfault.
GDB confirms the bug and yields this backtrace:
Program received signal SIGSEGV, Segmentation fault.
0xb7da6d6e in malloc_consolidate (av=<value optimized out>) at malloc.c:5169
5169 malloc.c: No such file or directory.
in malloc.c
(gdb) bt
#0 0xb7da6d6e in malloc_consolidate (av=<value optimized out>) at malloc.c:5169
#1 0xb7da9035 in _int_malloc (av=<value optimized out>, bytes=<value optimized out>) at malloc.c:4373
#2 0xb7dab4ac in __libc_malloc (bytes=525) at malloc.c:3660
#3 0xb7f8dc15 in operator new(unsigned int) () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#4 0xb7f72db5 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Rep::_S_create(unsigned int, unsigned int, std::allocator<char> const&) ()
from /usr/lib/i386-linux-gnu/libstdc++.so.6
#5 0xb7f740bf in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Rep::_M_clone(std::allocator<char> const&, unsigned int) ()
from /usr/lib/i386-linux-gnu/libstdc++.so.6
#6 0xb7f741f1 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::reserve(unsigned int) () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#7 0xb7f6bfec in std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::overflow(int) () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#8 0xb7f70e1c in std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, int) () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#9 0xb7f5b498 in std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<unsigned long>(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, unsigned long) const () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#10 0xb7f5b753 in std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::do_put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, unsigned long) const () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#11 0xb7f676ac in std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<unsigned long>(unsigned long) ()
from /usr/lib/i386-linux-gnu/libstdc++.so.6
#12 0xb7f67833 in std::basic_ostream<char, std::char_traits<char> >::operator<<(unsigned int) () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#13 0x08049c42 in sim::Address::GetS (this=0xbfffec40) at address.cc:27
#14 0x0806a499 in sim::UserGenerator::ProcessEvent (this=0x80a1af0, e=...) at user-generator.cc:59
#15 0x0806694b in sim::Simulator::CommunicateEvent (this=0x809f970, e=...) at simulator.cc:144
#16 0x0806685d in sim::Simulator::ProcessNextEvent (this=0x809f970) at simulator.cc:133
#17 0x08065d76 in sim::Simulator::Run (seed=0) at simulator.cc:53
#18 0x0807ce85 in main (argc=1, argv=0xbffff454) at main.cc:75
(gdb) f 13
#13 0x08049c42 in sim::Address::GetS (this=0xbfffec40) at address.cc:27
27 oss << m_address;
(gdb) p this->m_address
$1 = 1
Method GetS of class Address translates a number (uint32_t m_address) into a string and returns it. The code (very simple) is the following:
std::string
Address::GetS () const
{
std::ostringstream oss;
oss << m_address;
return oss.str ();
}
Besides, as can be seen in the backtrace, m_address is properly defined.
Now, I have tried to run my program using valgrind.
The program doesn't crash, likely due to the fact that valgrind replaces malloc () among other functions.
The error summary shows no memory leaking:
LEAK SUMMARY:
definitely lost: 0 bytes in 0 blocks
indirectly lost: 0 bytes in 0 blocks
possibly lost: 4,367 bytes in 196 blocks
still reachable: 9,160 bytes in 198 blocks
suppressed: 0 bytes in 0 blocks
All possibly lost refer to backtraces like this:
80 bytes in 5 blocks are possibly lost in loss record 3 of 26
at 0x4024B64: operator new(unsigned int) (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
by 0x40DBDB4: std::string::_Rep::_S_create(unsigned int, unsigned int, std::allocator<char> const&) (in /usr/lib/i386-linux-gnu/libstdc++.so.6.0.16)
by 0x40DE077: char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) (in /usr/lib/i386-linux-gnu/libstdc++.so.6.0.16)
by 0x40DE1E5: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) (in /usr/lib/i386-linux-gnu/libstdc++.so.6.0.16)
by 0x806AF62: sim::UserGenerator::CreateUser(unsigned int) (user-generator.cc:152)
I don't think this is related to the bug. However, the code in question can be found following this link.
I am thinking of a bug in libstdc++. However, how likely would that be?
I have also upgraded such library. Here's the versions currently installed on my system.
$ dpkg -l | grep libstdc
ii libstdc++5 1:3.3.6-23 The GNU Standard C++ Library v3
ii libstdc++6 4.6.1-1 GNU Standard C++ Library v3
ii libstdc++6-4.1-dev 4.1.2-27 The GNU Standard C++ Library v3 (development files)
ii libstdc++6-4.3-dev 4.3.5-4 The GNU Standard C++ Library v3 (development files)
ii libstdc++6-4.4-dev 4.4.6-6 GNU Standard C++ Library v3 (development files)
ii libstdc++6-4.5-dev 4.5.3-3 The GNU Standard C++ Library v3 (development files)
ii libstdc++6-4.6-dev 4.6.1-1 GNU Standard C++ Library v3 (development files)
Now the thing is, I am not sure which version g++ uses, and whether there's some means to enforce the use of a particular version.
What I am pondering is to modify GetS. But this is the only method I know. Do you suggest any alternative?
Eventually, I am even considering to replace std::string with simpler char*.
Maybe a little drastic, but I wouldn't set it aside.
Any thought in merit?
Thank you all in advance.
Best,
Jir
Ok. This is NOT the problem:
I am thinking of a bug in libstdc++
The problem is that you overwrote some memory buffer and corrupted one of the structures used by the memory manager. The hard part is going to be finding it. Does not valgrind give you information about writting past the end of an allocated piece of memory.
Don't do this:
Eventually, I am even considering to replace std::string with simpler char*. Maybe a little drastic, but I wouldn't set it aside.
You already have enough problems with memory management. This will just add more problems. There is absolutely NOTHING wrong with std::string or the memory management routines. They are heavily tested and used. If there was something wrong people all over the world would start screaming (it would be big news).
Reading your code at http://mercurial.intuxication.org/hg/lte_sim/file/c2ef6e0b6d41/src/ it seems like you are still stuck in a C style of writting code (C with Classes). So you have the power of C++ to automate (the blowing up of your code) but still have all the problems associated with C.
You need to re-look at your code in terms of ownership. You pass things around by pointer way too much. As a result it is hard to follow the ownership of the pointer (and thus who is responsible for deleting it).
I think you best bet at finding the bug is to write unit tests for each class. Then run the unit tests through val-grind. I know its a pain (but you should have done it to start with now you have the pain all in one go).