Class LocalT have member of other class that realized read-write-mutex. Mutex initialized at constructor and use pthread_rwlock_rdlock(&aMutex); for reading lock. So, seems, its all ok with mutex class. But program crashed when some LocalT object lock his mutex member for reading.
CSerialize.cpp:2054 line is MUTEX.lock_reading();
Thread 6 (Thread 0x80d4e00 (runnable)):
#0 0x4864f11d in pthread_mutex_lock () from /lib/libpthread.so.2
#1 0x4864b558 in pthread_rwlock_init () from /lib/libpthread.so.2
#2 0x4864b659 in pthread_rwlock_rdlock () from /lib/libpthread.so.2
#3 0x0807ae14 in LocalT::serialize (this=0x80d4e00, outbin=#0x7574736b)
at CSerialize.cpp:2054
Other two running threads:
1) at socket accept();
2) next runnable thread at popen() call, seems its execute or read from pipe. But does not know what is __error() ?????
Thread 1 (Thread 0x8614800 (LWP 100343)):
#0 0x4865b8f9 in __error () from /lib/libpthread.so.2
#1 0x4865a15a in pthread_testcancel () from /lib/libpthread.so.2
#2 0x486425bf in read () from /lib/libpthread.so.2
#3 0x08056340 in UT::execute_popen (command=#0x4865e6bc,
ptr_output=0xbf2f7d30) at Utils.cpp:75
3) all other thread sleeping.
I have no ideas why its crashed? Maybe somebody can assume something or suggest?
==EDIT==
and here is one system(?) thread ( i does not create it, but program always have +1 thread). It always:
Thread 8 (Thread 0x80d4a00 (LWP 100051)):
#0 0x4865a79b in pthread_testcancel () from /lib/libpthread.so.2
#1 0x48652412 in pthread_mutexattr_init () from /lib/libpthread.so.2
#2 0x489fd450 in ?? ()
==EDIT2 - bt as requested==
(gdb) bt
#0 0x4865a7db in pthread_testcancel () from /lib/libpthread.so.2
#1 0x48652412 in pthread_mutexattr_init () from /lib/libpthread.so.2
#2 0x489fd450 in ?? ()
strangely... why ?? () ?
==EDIT3 - when loading core==
Program terminated with signal 11, Segmentation fault.
[skiped]
#0 0x4865a7db in pthread_testcancel () from /lib/libpthread.so.2
[New Thread 0x8614800 (LWP 100343)]
[New Thread 0x8614600 (sleeping)]
[New Thread 0x8614400 (sleeping)]
[New Thread 0x8614200 (sleeping)]
[New Thread 0x8614000 (sleeping)]
[New Thread 0x80d4e00 (runnable)]
[New Thread 0x80d4c00 (sleeping)]
[New Thread 0x80d4a00 (LWP 100051)]
[New Thread 0x80d4000 (runnable)]
[New LWP 100802]
(gdb) info thread
* 10 LWP 100802 0x4865a7db in pthread_testcancel () from /lib/libpthread.so.2
9 Thread 0x80d4000 (runnable) 0x486d7bd3 in accept () from /lib/libc.so.6 -- MAIN() THREAD
8 Thread 0x80d4a00 (LWP 100051) 0x4865a79b in pthread_testcancel ()
from /lib/libpthread.so.2 ( UNIDENTIFIED THREAD system()? )
7 Thread 0x80d4c00 (sleeping) 0x48651cb6 in pthread_mutexattr_init ()
from /lib/libpthread.so.2 (SIGNAL PROCESSOR THREAD)
6 Thread 0x80d4e00 (runnable) 0x4864f11d in pthread_mutex_lock ()
from /lib/libpthread.so.2 (MAINTENANCE THREAD)
5 Thread 0x8614000 (sleeping) 0x48651cb6 in pthread_mutexattr_init ()
from /lib/libpthread.so.2 (other mutex cond_wait - worker 1)
4 Thread 0x8614200 (sleeping) 0x48651cb6 in pthread_mutexattr_init ()
from /lib/libpthread.so.2 (other mutex cond_wait - worker 2 )
3 Thread 0x8614400 (sleeping) 0x48651cb6 in pthread_mutexattr_init ()
from /lib/libpthread.so.2 (other mutex cond_wait - worker 3 )
2 Thread 0x8614600 (sleeping) 0x48651cb6 in pthread_mutexattr_init ()
from /lib/libpthread.so.2 (other mutex cond_wait - worker 4)
1 Thread 0x8614800 (LWP 100343) 0x4865b8f9 in __error ()
from /lib/libpthread.so.2 ( popen() thread see below)
I created: 1 maintenance thread (serializing), 1 popen() thread, 4 workers, 1 main, 1 signal thread = 8 threads....
the thread that you are referring to as system thread is actually your program's main thread.
Secondly with information shared by you so far, it looks like you are acquiring the mutex but never releasing it. that leads to an unstable state (some parameters having wrong values) which leads to a crash. I am sure you will also be observing an intermittent hang.
could you share the backtrace when it crashes ?
Related
My binary (generated from C++) has two threads. When I care about exceptions, I care about exceptions thrown in one of them (the worker) but not the other.
Is there a way to tell gdb only to pay attention to one of the threads when using catch throw? The gdb manual (texinfo document) and googling suggest to me that this isn't possible, although I think I could request catching a specific type of exception, which hopefully only one of the threads would throw, using catch throw REGEXP.
Is there a way to tell gdb only to pay attention to one of the threads
The catch throw is really just a fancy way to set a breakpoint on __cxxabiv1::__cxa_throw (or similar), and you can make a breakpoint conditional on thread number, achieving the equivalent result.
Example:
#include <pthread.h>
#include <unistd.h>
void *fn(void *)
{
while(true) {
try {
throw 1;
} catch (...) {}
sleep(1);
}
return nullptr;
}
int main() {
pthread_t tid;
pthread_create(&tid, nullptr, fn, nullptr);
fn(nullptr);
return 0;
}
g++ -g -pthread t.cc
Using catch throw, you would get a breakpoint firing on the main and the second thread. But using break ... thread 2 you would only get the one breakpoint you care about:
gdb -q a.out
Reading symbols from a.out...
(gdb) start
Temporary breakpoint 1 at 0x11d6: file t.cc, line 17.
Starting program: /tmp/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Temporary breakpoint 1, main () at t.cc:17
17 pthread_create(&tid, nullptr, fn, nullptr);
(gdb) n
[New Thread 0x7ffff7a4c640 (LWP 1225199)]
## Note: GDB refuses to set thread-specific breakpoint until the thread actually exists.
18 fn(nullptr);
(gdb) break __cxxabiv1::__cxa_throw thread 2
Breakpoint 2 at 0x7ffff7e40370: file ../../../../src/libstdc++-v3/libsupc++/eh_throw.cc, line 77.
(gdb) c
Continuing.
[Switching to Thread 0x7ffff7a4c640 (LWP 1225199)]
Thread 2 "a.out" hit Breakpoint 2, __cxxabiv1::__cxa_throw (obj=0x7ffff0000be0, tinfo=0x555555557dc8 <typeinfo for int#CXXABI_1.3>, dest=0x0) at ../../../../src/libstdc++-v3/libsupc++/eh_throw.cc:77
77 ../../../../src/libstdc++-v3/libsupc++/eh_throw.cc: No such file or directory.
(gdb) bt
#0 __cxxabiv1::__cxa_throw (obj=0x7ffff0000be0, tinfo=0x555555557dc8 <typeinfo for int#CXXABI_1.3>, dest=0x0) at ../../../../src/libstdc++-v3/libsupc++/eh_throw.cc:77
#1 0x00005555555551b5 in fn () at t.cc:8
#2 0x00007ffff7d7eeae in start_thread (arg=0x7ffff7a4c640) at pthread_create.c:463
#3 0x00007ffff7caea5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) c
Continuing.
Thread 2 "a.out" hit Breakpoint 2, __cxxabiv1::__cxa_throw (obj=0x7ffff0000be0, tinfo=0x555555557dc8 <typeinfo for int#CXXABI_1.3>, dest=0x0) at ../../../../src/libstdc++-v3/libsupc++/eh_throw.cc:77
77 in ../../../../src/libstdc++-v3/libsupc++/eh_throw.cc
(gdb) bt
#0 __cxxabiv1::__cxa_throw (obj=0x7ffff0000be0, tinfo=0x555555557dc8 <typeinfo for int#CXXABI_1.3>, dest=0x0) at ../../../../src/libstdc++-v3/libsupc++/eh_throw.cc:77
#1 0x00005555555551b5 in fn () at t.cc:8
#2 0x00007ffff7d7eeae in start_thread (arg=0x7ffff7a4c640) at pthread_create.c:463
#3 0x00007ffff7caea5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
VoilĂ -- thread-specific catch throw equivalent.
I am wrote on C++ multithread TCP server, for synchronization using boost:scoped_lock
After connecting to server client freezes.
in gdb i saw more threads in pthread_kill after call boost::mutex::lock
(gdb) info thread
277 Thread 808779c00 (LWP 245289330/xgps) 0x0000000802579d5c in poll () at poll.S:3
276 Thread 808779800 (LWP 245289329/xgps) 0x00000008019799bc in pthread_kill () from /lib/libthr.so.3
275 Thread 808779400 (LWP 245289328/xgps) 0x00000008019799bc in pthread_kill () from /lib/libthr.so.3
.....
246 Thread 808c92800 (LWP 245289296/xgps) 0x00000008019799bc in pthread_kill () from /lib/libthr.so.3
245 Thread 808643800 (LWP 245289295/xgps) 0x00000008019799bc in pthread_kill () from /lib/libthr.so.3
244 Thread 808643400 (LWP 245289294/xgps) 0x00000008019799bc in pthread_kill () from /lib/libthr.so.3
243 Thread 806c8f400 (LWP 245289292/xgps) 0x00000008019799bc in pthread_kill () from /lib/libthr.so.3
242 Thread 808643000 (LWP 245286262/xgps) 0x00000008019799bc in pthread_kill () from /lib/libthr.so.3
241 Thread 808c92400 (LWP 245289288/xgps) 0x00000008019799bc in pthread_kill () from /lib/libthr.so.3
[Switching to thread 205 (Thread 80863a000 (LWP 245289251/xgps))]#0 0x00000008019799bc in pthread_kill () from /lib/libthr.so.3
(gdb) where
#0 0x00000008019799bc in pthread_kill () from /lib/libthr.so.3
#1 0x0000000801973cfc in pthread_getschedparam () from /lib/libthr.so.3
#2 0x00000008019782fc in pthread_mutex_getprioceiling () from /lib/libthr.so.3
#3 0x000000080197838b in pthread_mutex_lock () from /lib/libthr.so.3
#4 0x0000000000442b2e in boost::mutex::lock (this=0x803835f10) at mutex.hpp:62
#5 0x0000000000442c36 in boost::unique_lock<boost::mutex>::lock (this=0x7fffe7334270) at lock_types.hpp:346
#6 0x0000000000442c7c in unique_lock (this=0x7fffe7334270, m_=#0x803835f10) at lock_types.hpp:124
#7 0x0000000000466e31 in XDevice::getDeviceIMEI (this=0x803835e20) at /home/xgps_app/device.cpp:639
#8 0x000000000049071f in XDevicePool::get_device (this=0x7fffffffd9c0, device_imei=868683024674230) at /home/xgps_app/pool_devices.cpp:351
Code at line device.cpp:639
IMEI
XDevice::getDeviceIMEI()
{
try {
boost::mutex::scoped_lock lock(cn_mutex);
return device_imei;
}
catch (std::exception &e )
{
cout << " ERROR in getDeviceIMEI " << e.what() << "\n";
}
return 0;
}
Code in pool_device
XDevicePtr
XDevicePool::get_device(IMEI device_imei)
{
XDevicePtr device;
unsigned int i = 0;
while(i < this->devices.size())
{
device = devices[i];
if (device->getDeviceIMEI() == device_imei) {
LOG4CPLUS_DEBUG(logger, "XDevicePool::get_device found!");
return device;
}
i++;
}
device.reset();
return device;
}
XDevicePtr
XDevicePool::get_device_mt(IMEI device_imei)
{
try
{
boost::mutex::scoped_lock lock(pool_mutex);
}
catch (std::exception & e)
{
LOG4CPLUS_ERROR(logger, "XDevicePool::get_device error! " << e.what());
}
// boost::mutex::scoped_lock lock(pool_mutex);
return get_device(device_imei);
}
Why after call to mutex lock thread terminating?
I think dead lock not reason for that behavior
Please help!
You have multiple locks.
Whenever you have multiple locks that can be required simultaneously you need to obtain them in a fixed order, to avoid dead-locking.
It seems likely that you have such a deadlock occurring. See Boost Thread's free function boost::lock http://www.boost.org/doc/libs/1_63_0/doc/html/thread/synchronization.html#thread.synchronization.lock_functions.lock_multiple for help acquiring multiple lock in reliable order.
You will also want to know about std::defer_lock.
Other than this, there might be interference from fork in multi-threaded programs. I think it's beyond the scope now to explain, unless you are indeed using fork in your process
tl;dr pthread_kill is likely a red herring.
Why after call to mutex lock thread terminating?
It doesn't. Your threads have not been terminated (as evidenced by them still appearing on info thread).
You seem to assume that pthread_kill kills the current thread. In fact, what pthread_kill does is send a signal to another thread. And even the sending is optional (if sig=0).
See the man page for further details.
I got a core dump on embeded linux platform when playing sound.
I used QT for GUI, the callstack is:
Program terminated with signal 11, Segmentation fault.
#0 0x411740e4 in dlopen () from /lib/libdl.so.0
(gdb) bt
#0 0x411740e4 in dlopen () from /lib/libdl.so.0
#1 0x40194a04 in snd_dlopen () from /nand/usr/lib/libasound.so.2
#2 0x40190cc8 in ?? () from /nand/usr/lib/libasound.so.2
(gdb) thread apply all bt
Thread 7 (process 511):
#0 0x405e15c8 in ?? () from /nand/usr/lib/libQtGui.so.4
Cannot access memory at address 0x2
Thread 6 (process 538):
#0 0x410dda1c in read () from /lib/libc.so.0
#1 0x40fdbe14 in read () from /lib/libpthread.so.0
#2 0x4014b924 in SysUtil_GetALSEventValue () at ALSPSensor.cpp:46
#3 0x00040af0 in ALSensor::operateBackLight (this=<value optimized out>) at src/PALSSensorHandle.cpp:97
#4 0x00040c04 in ALSensor::run (this=0x12a218) at src/PALSSensorHandle.cpp:76
#5 0x40d4d4f8 in ?? () from /nand/usr/lib/libQtCore.so.4
Thread 5 (process 537):
#0 0x410dda1c in read () from /lib/libc.so.0
#1 0x40fdbe14 in read () from /lib/libpthread.so.0
#2 0x4014b5fc in SysUtil_GetPSEventValue () at ALSPSensor.cpp:163
#3 0x00040c54 in PSensor::operateBackLight (this=<value optimized out>) at src/PALSSensorHandle.cpp:83
#4 0x00040d20 in PSensor::run (this=0x11ecf8) at src/PALSSensorHandle.cpp:66
#5 0x40d4d4f8 in ?? () from /nand/usr/lib/libQtCore.so.4
Thread 4 (process 536):
#0 0x4117e730 in mq_receive () from /lib/librt.so.0
#1 0x40219fa4 in BidirctMsgQueue::get (this=0x130ba8, msg_len=0xbe5ffd3c) at BidirctMsgQueue.cpp:56
#2 0x0003fb0c in MqReceiver::run (this=0x12b270) at src/MqReceiver.cpp:45
#3 0x40d4d4f8 in ?? () from /nand/usr/lib/libQtCore.so.4
Thread 3 (process 535):
#0 0x410ddfd8 in select () from /lib/libc.so.0
#1 0x4009897c in ipc_block_recv (sock=27, recv_message=0xbe7ffc90) at ipc.cpp:123
#2 0x40022c74 in CSipMsgControl::InitSipMsg (this=0x145ea8) at sipmsgcontrol.cpp:170
#3 0x40022cb8 in CSipMsgControl::run (this=0xfffffdfe) at sipmsgcontrol.cpp:119
#4 0x40d4d4f8 in ?? () from /nand/usr/lib/libQtCore.so.4
Thread 2 (process 534):
#0 0x410dd5ac in poll () from /lib/libc.so.0
#1 0x40fd87d4 in __pthread_manager () from /lib/libpthread.so.0
Backtrace stopped: frame did not save the PC
Thread 1 (process 1416):
#0 0x411740e4 in dlopen () from /lib/libdl.so.0
#1 0x40194a04 in snd_dlopen () from /nand/usr/lib/libasound.so.2
#2 0x40190cc8 in ?? () from /nand/usr/lib/libasound.so.2
(gdb)
I am trying to implement a thread pool using ACE Semaphore library. It does not provide any API like sem_getvalue which is in Posix semaphore. I need to debug some flow which is not behaving as expected. Can I examine the semaphore in GDB. I am using Centos as OS.
I initialized two semaphores using the default constructor providing count 0 and 10. I have declared them as static in the class and initialized it in the cpp file as
DP_Semaphore ThreadPool::availableThreads(10);
DP_Semaphore ThreadPool::availableWork(0);
But when I am printing the semaphore in GDB using the print command, I am getting the similar output
(gdb) p this->availableWork
$7 = {
sema = {
semaphore_ = {
sema_ = 0x6fe5a0,
name_ = 0x0
},
removed_ = false
}
}
(gdb) p this->availableThreads
$8 = {
sema = {
semaphore_ = {
sema_ = 0x6fe570,
name_ = 0x0
},
removed_ = false
}
}
Is there a tool which can help me here, or shall I switch to Posix thread and re-write all my code.
EDIT: As requested by #timrau the output of call this->availableWork->dump()
(gdb) p this->availableWork.dump()
[Switching to Thread 0x2aaaae97e940 (LWP 28609)]
The program stopped in another thread while making a function call from GDB.
Evaluation of the expression containing the function
(DP_Semaphore::dump()) will be abandoned.
When the function is done executing, GDB will silently stop.
(gdb) call this->availableWork.dump()
[Switching to Thread 0x2aaaaf37f940 (LWP 28612)]
The program stopped in another thread while making a function call from GDB.
Evaluation of the expression containing the function
(DP_Semaphore::dump()) will be abandoned.
When the function is done executing, GDB will silently stop.
(gdb) info threads
[New Thread 0x2aaaafd80940 (LWP 28613)]
6 Thread 0x2aaaafd80940 (LWP 28613) 0x00002aaaac10a61e in __lll_lock_wait_private ()
from /lib64/libpthread.so.0
* 5 Thread 0x2aaaaf37f940 (LWP 28612) ThreadPool::fetchWork (this=0x78fef0, worker=0x2aaaaf37f038)
at ../../CallManager/src/DP_CallControlTask.cpp:1043
4 Thread 0x2aaaae97e940 (LWP 28609) DP_Semaphore::dump (this=0x6e1460) at ../../Common/src/DP_Semaphore.cpp:21
2 Thread 0x2aaaad57c940 (LWP 28607) 0x00002aaaabe01ff3 in __find_specmb () from /lib64/libc.so.6
1 Thread 0x2aaaacb7b070 (LWP 28604) 0x00002aaaac1027c0 in __nptl_create_event () from /lib64/libpthread.so.0
(gdb)
sema.semaphore_.sema_ in your code looks like a pointer. Try to find it's type in the ACE headers, then convert it to a type and print:
(gdb) p *((sem_t)0x6fe570)
Update: try to convert the address within the structure you posted to sem_t. If you use linux, ACE should be using posix semaphores, so type sem_t must be visible to gdb.
I've been doing threaded networking for a game, but the server dies randomly, while i've been testing the networking so that I have several clients connecting and sending bunch of packets and disconnecting then connecting back again.
I am using c++ with SFML/Network and SFML/System threads. I have thread which listens for connections in the server once connection is established it creates two new threads for sending and receiving packets. The event handler and the send/receive threads share data with two std::queues. I've been trying to debug the crash with gdb, but i'm not that experienced with this so i'm looking for help.
Here is gdb console input when the crash happens.
OUT: 10 1 HELLO
IN: 10 0 LOLOLOL
OUT: 10 1 HELLO
IN: 10 0 LOLOLOL
OUT: 10 1 HELLO
Out thread killed by in thread!
In thread died!
New client connected!
[Thread 0x34992b70 (LWP 16167) exited]
[New Thread 0x3118bb70 (LWP 16186)]
terminate called without an active exception
Program received signal SIGABRT, Aborted.
[Switching to Thread 0x35193b70 (LWP 16166)]
0x00110416 in __kernel_vsyscall ()
Here is the backtrace:
(gdb) backtrace
#0 0x00110416 in __kernel_vsyscall ()
#1 0x46a0967f in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0x46a0afb5 in abort () at abort.c:92
#3 0x47b8af0d in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#4 0x47b88c84 in __cxxabiv1::__terminate (handler=0x47b8adc0 <__gnu_cxx::__verbose_terminate_handler()>) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:40
#5 0x47b88cc0 in std::terminate () at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:50
#6 0x47b8878f in __cxxabiv1::__gxx_personality_v0 (version=1, actions=10, exception_class=890844228, ue_header=0x35193dc0, context=0x35192ea0)
at ../../../../libstdc++-v3/libsupc++/eh_personality.cc:669
#7 0x46bdbfbe in _Unwind_ForcedUnwind_Phase2 (exc=0x35193dc0, context=0x35192ea0) at ../../../gcc/unwind.inc:175
#8 0x46bdc3a9 in _Unwind_ForcedUnwind (exc=0x35193dc0, stop=0x46b76fc0 <unwind_stop>, stop_argument=0x35193444) at ../../../gcc/unwind.inc:207
#9 0x46b794e2 in _Unwind_ForcedUnwind (exc=0x35193dc0, stop=0x46b76fc0 <unwind_stop>, stop_argument=0x35193444) at ../nptl/sysdeps/pthread/unwind-forcedunwind.c:132
#10 0x46b77141 in __pthread_unwind (buf=<optimized out>) at unwind.c:130
#11 0x46b6f5bb in __do_cancel () at ../nptl/pthreadP.h:265
#12 sigcancel_handler (sig=<optimized out>, si=<optimized out>, ctx=<optimized out>) at nptl-init.c:202
#13 sigcancel_handler (sig=32, si=0x35192f7c, ctx=0x35192ffc) at nptl-init.c:155
#14 <signal handler called>
#15 0x08049930 in out (data=0xb761c798) at src/layer7.cpp:40
#16 0x0804b8d7 in sf::priv::ThreadFunctorWithArg<void (*)(networkdata*), networkdata*>::Run (this=0xb761c7c8) at /usr/local/include/SFML/System/Thread.inl:48
#17 0x00116442 in sf::Thread::Run() () from /home/toni/ProjectRepos/sfml/build/lib/libsfml-system.so.2
#18 0x001166df in sf::priv::ThreadImpl::EntryPoint(void*) () from /home/toni/ProjectRepos/sfml/build/lib/libsfml-system.so.2
#19 0x46b70c5e in start_thread (arg=0x35193b70) at pthread_create.c:305
#20 0x46ab4b4e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:133
Here is the thread code from src/layer7.cpp
void out(networkdata * data) {
bool running = true;
while(running) {
if(data->pipe_out->pipe_empty() == false) {
sf::Packet packet = data->pipe_out->pop_message();
if(data->socket->Send(packet) == sf::Socket::Disconnected) {
data->thread_in->Terminate();
std::cout << "In thread killed by out thread!" << std::endl;
running = false;
}
}
}
std::cout << "Out thread died!" << std::endl;
}
Line 40 is the first if keyword after the while(running).
The data->pipe_out->pipe_empty() is call to the queue->empty()
The data->pipe_out->pop_message() is call which pops the front from the queue.
Then it sends the packet and checks if the connection is not disconnected
if socket is disconnected it terminates the "in" thread and stops the own thread.
You need locks around data to protect against concurrent access to the same data structure from multiple threads.
One possible reason for is an exception: exception should be caught withing thread. Also, looks like data->thread_in->Terminate() sends cancelation request, make sure that all established cancellation handlers are working correctly in that case.