I've run into some problems debugging a multi-threaded process using GDB. I have a multi-threaded process that splinters off into several (8 or 9) different threads, and I am trying to determine what the contents of variables are when the constructor for a class called XML_File_Data is called. However, I've run into a problem where, after I apply the correct function breakpoint to all threads and it's apparent one of the thread's break point is getting hit (the program temporarily halts execution), I'm not able to determine which thread hit the breakpoint. The command
(gdb) thread apply all where
is giving me shockingly useless information in the form:
#0 0x004ab410 in __kernel_vsyscall ()
#1 0x05268996 in nanosleep () from /lib/libc.so.6
#2 0x052a215c in usleep () from /lib/libc.so.6
#3 0x082ee313 in frame_clock_frame_end (clock=0xb4bfd2f8)
at frame_clock.c:143
#4 0x003a349a in ?? ()
#5 0x00b5cfde in thread_proxy ()
from /cets_development_libraries/install/lib/libboost_thread-gcc41-mt-1_38.so.1.38.0
#6 0x02c1f5ab in start_thread () from /lib/libpthread.so.0
#7 0x052a8cfe in clone () from /lib/libc.so.6
Of the 9 processes, 7 or so are giving me almost exactly that output, and the information about the last 2 isn't really much more helpful (functions far down the call stack have recognizable names, but any recent #0-#4 functions aren't recognizable).
This is what I have so far:
(gdb) gdb
(gdb) gdb attach <processid>
(gdb) thread apply all 'XML_File_Data::XML_File_Data()'
and (after the breakpoint is hit)
(gdb) thread apply all where
Could any experienced debuggers offer me some hints on what I am doing wrong or what is normally done in this situation?
Cheers,
Charlie
EDIT: Fortunately, I was able to find out that the cause of the ??'s was optimized code being run through the debugger, in addition to not running the debugger in the directory of the executable file. Still not much success with the debugging though.
I find myself doing this all the time:
> t a a f
Short for:
> thread apply all frame
Of course, other variants are possible:
> t a a bt 3
Which prints the bottom 3 frames of each thread's stack. (You can also use negative numbers to get the top N frames of the stack)
You can use command thread or info threads to find out the current thread number after breakpoint hit
(gdb) thread
[Current thread is 1 (Thread 0xb790d6c0 (LWP 2519))]
(gdb)
(gdb) info threads
17 Thread 0xb789cb90 (LWP 2536) 0xb7fc6402 in __kernel_vsyscall ()
16 Thread 0xb769bb90 (LWP 2537) 0xb7fc6402 in __kernel_vsyscall ()
15 Thread 0xb749ab90 (LWP 2543) 0xb7fc6402 in __kernel_vsyscall ()
14 Thread 0xb7282b90 (LWP 2544) 0xb7fc6402 in __kernel_vsyscall ()
13 Thread 0xb5827b90 (LWP 2707) 0xb7fc6402 in __kernel_vsyscall ()
12 Thread 0xb5626b90 (LWP 2708) 0xb7fc6402 in __kernel_vsyscall ()
11 Thread 0xb5425b90 (LWP 2709) 0xb7fc6402 in __kernel_vsyscall ()
10 Thread 0xb5161b90 (LWP 2713) 0xb7fc6402 in __kernel_vsyscall ()
9 Thread 0xb4ef9b90 (LWP 2715) 0xb7fc6402 in __kernel_vsyscall ()
8 Thread 0xb4af7b90 (LWP 2717) 0xb7fc6402 in __kernel_vsyscall ()
7 Thread 0xb46ffb90 (LWP 2718) 0xb7fc6402 in __kernel_vsyscall ()
6 Thread 0xb44feb90 (LWP 2726) 0xb7fc6402 in __kernel_vsyscall ()
5 Thread 0xb42fdb90 (LWP 2847) 0xb7fc6402 in __kernel_vsyscall ()
4 Thread 0xb40fcb90 (LWP 2848) 0xb7fc6402 in __kernel_vsyscall ()
3 Thread 0xb3efbb90 (LWP 2849) 0xb7fc6402 in __kernel_vsyscall ()
2 Thread 0xb3cfab90 (LWP 2850) 0xb7fc6402 in __kernel_vsyscall ()
* 1 Thread 0xb790d6c0 (LWP 2519) 0xb7fc6402 in __kernel_vsyscall ()
(gdb)
An asterisk `*' to the left of the gdb thread number indicates the current thread. See here.
Related
When I start my program in cuda-gdb, I get an output like:
[New Thread 0x7fffef8ea700 (LWP 8003)]
[New Thread 0x7fffe35b2700 (LWP 8010)]
[New Thread 0x7fffe2db1700 (LWP 8011)]
[New Thread 0x7fffe25b0700 (LWP 8012)]
I do not understand why these multiple threads are launched in the beginning. I have not launched my program in multi-threaded mode. I am using MPI, but I start one process. So, where are these threads coming from?
This does not affect my debugging process in any way. Its just that I don't understand what this means.
These threads you see are created by the CUDA runtime library, and aren't directly related to cuda-gdb itself. If you run the same code with gdb, you will also see the same messages.
If you want to see what happens what these threads are doing or where they're coming from, simply compile your code with the -g flag, set a breakpoint in your code (e.g., immediately before a CUDA kernel starts), run it, and then run the following command in the gdb console:
thread apply all backtrace
This command has the same effect of gdb's backtrace, except that it will show the backtrace for all threads created by your program.
In my case, I get the following messages after starting my program:
[New Thread 0x7fffeffb3700 (LWP 7141)]
[New Thread 0x7fffef731700 (LWP 7142)]
[New Thread 0x7fffeef30700 (LWP 7143)]
When I run the command mentioned above in my gdb console, I see the following output:
(gdb) thread apply all backtrace
Thread 4 (Thread 0x7fffeef30700 (LWP 7143)):
#0 pthread_cond_timedwait##GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007ffff63c19b7 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff6386bb7 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff63c0f48 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff79bf064 in start_thread (arg=0x7fffeef30700) at pthread_create.c:309
#5 0x00007ffff6cce62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 3 (Thread 0x7fffef731700 (LWP 7142)):
#0 0x00007ffff6cc5aed in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007ffff63bf6a3 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff642261e in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff63c0f48 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff79bf064 in start_thread (arg=0x7fffef731700) at pthread_create.c:309
#5 0x00007ffff6cce62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 2 (Thread 0x7fffeffb3700 (LWP 7141)):
#0 0x00007ffff6ccfa9f in accept4 (fd=13, addr=..., addr_len=0x7fffeffb2e18, flags=-1) at ../sysdeps/unix/sysv/linux/accept4.c:45
#1 0x00007ffff63c0556 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff63b404d in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff63c0f48 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff79bf064 in start_thread (arg=0x7fffeffb3700) at pthread_create.c:309
#5 0x00007ffff6cce62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 1 (Thread 0x7ffff7fc0740 (LWP 7136)):
#0 main () at cuda_heap.cu:66
As you can verify, all threads that have been created at the beginning match both thread addresses and LWP (Light Weight Process) IDs. You can see that all of them come from libcuda.so.1.
In cuda-gdb, you're able to see some more detailed information:
(cuda-gdb) thread apply all bt
Thread 4 (Thread 0x7fffeef30700 (LWP 10019)):
#0 0x00007ffff79c33f8 in pthread_cond_timedwait##GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007ffff63c19b7 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff6386bb7 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff63c0f48 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff79bf064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5 0x00007ffff6cce62d in clone () from /lib/x86_64-linux-gnu/libc.so.6
Thread 3 (Thread 0x7fffef731700 (LWP 10018)):
#0 0x00007ffff6cc5aed in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff63bf6a3 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff642261e in cuVDPAUCtxCreate () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff63c0f48 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff79bf064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5 0x00007ffff6cce62d in clone () from /lib/x86_64-linux-gnu/libc.so.6
Thread 2 (Thread 0x7fffeffb3700 (LWP 10017)):
#0 0x00007ffff6ccfa9f in accept4 () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff63c0556 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff63b404d in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff63c0f48 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff79bf064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5 0x00007ffff6cce62d in clone () from /lib/x86_64-linux-gnu/libc.so.6
Thread 1 (Thread 0x7ffff7fc0740 (LWP 10007)):
#0 main () at cuda_heap.cu:66
i don't know what exactly it is, but I think cuda-gdb need to create multiple thread to catch the errors/exceptions like: memory violation or bank conflicts.
This question already has an answer here:
How to find which thread caused SEGFAULT in a post-mortem gdb session?
(1 answer)
Closed 8 years ago.
My application uses more than 8 threads. When I run info threads in gdb I see the threads and the last function they were executing. It does not seem obvious to me exactly which thread caused the SIGSEGV. Is it possible to tell it? Is it thread 1? How are the threads numbered?
When you use gdb to analyze the core dump file, the gdb will stop at the function which causes program core dump. And the current thread will be the murder. Take the following program as an example:
#include <stdio.h>
#include <pthread.h>
void *thread_func(void *p_arg)
{
while (1)
{
printf("%s\n", (char*)p_arg);
sleep(10);
}
}
int main(void)
{
pthread_t t1, t2;
pthread_create(&t1, NULL, thread_func, "Thread 1");
pthread_create(&t2, NULL, thread_func, NULL);
sleep(1000);
return;
}
The t2 thread will cause program down because it refers a NULL pointer. After the program down, use gdb to analyze the core dump file:
[root#localhost nan]# gdb -q a core.32794
Reading symbols from a...done.
[New LWP 32796]
[New LWP 32795]
[New LWP 32794]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./a'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00000034e4281451 in __strlen_sse2 () from /lib64/libc.so.6
(gdb)
The gdb stops at __strlen_sse2 function, this means this function causes the program down. Then use bt command to see it is called by which thread:
(gdb) bt
#0 0x00000034e4281451 in __strlen_sse2 () from /lib64/libc.so.6
#1 0x00000034e4268cdb in puts () from /lib64/libc.so.6
#2 0x00000000004005cc in thread_func (p_arg=0x0) at a.c:7
#3 0x00000034e4a079d1 in start_thread () from /lib64/libpthread.so.0
#4 0x00000034e42e8b6d in clone () from /lib64/libc.so.6
(gdb) i threads
Id Target Id Frame
3 Thread 0x7ff6104c1700 (LWP 32794) 0x00000034e42accdd in nanosleep () from /lib64/libc.so.6
2 Thread 0x7ff6104bf700 (LWP 32795) 0x00000034e42accdd in nanosleep () from /lib64/libc.so.6
* 1 Thread 0x7ff60fabe700 (LWP 32796) 0x00000034e4281451 in __strlen_sse2 () from /lib64/libc.so.6
The bt command shows the stack frame of the current thread(which is the murder). "i threads" commands shows all the threads, the thread number which begins with * is the current thread.
As for "How are the threads numbered?", it depends on the OS. you can refer the gdb manual for more information.
I'm debugging an app with many threads, so I've named them using prctl. This works great with gdb's info threads option, but it would be nice if thread * apply all operations showed it as well. Any way to coerce gdb to do this?
(gdb) info threads
Id Target Id Frame
...
3 Thread 0x7ffff6ffe700 (LWP 30048) "poll_uart_threa" 0x00007ffff78eb823 in select ()
at ../sysdeps/unix/syscall-template.S:82
2 Thread 0x7ffff77ff700 (LWP 30047) "signal hander" do_sigwait (set=<optimized out>,
sig=0x7ffff77feed8)
at ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:65
* 1 Thread 0x7ffff7fcc700 (LWP 30046) "simulator" __lll_lock_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132
Pointer, PID {well, thread ID, but LWP threads == processes, ish}, and name
(gdb) thread apply all bt
...
Thread 3 (Thread 0x7ffff6ffe700 (LWP 30048)):
#0 0x00007ffff78eb823 in select () at ../sysdeps/unix/syscall-template.S:82
#1 0x0000000000403bb3 in poll_uart_thread (unused=0x0) at uart.c:96
#2 0x00007ffff7bc4e9a in start_thread (arg=0x7ffff6ffe700) at pthread_create.c:308
#3 0x00007ffff78f24bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4 0x0000000000000000 in ?? ()
Thread 2 (Thread 0x7ffff77ff700 (LWP 30047)):
<call stack>
#2 0x0000000000417a89 in sig_thread (arg=0x7fffffffbb60) at simulator.c:879
#3 0x00007ffff7bc4e9a in start_thread (arg=0x7ffff77ff700) at pthread_create.c:308
#4 0x00007ffff78f24bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5 0x0000000000000000 in ?? ()
Thread 1 (Thread 0x7ffff7fcc700 (LWP 30046)):
<call stack>
#9 0x00000000004182e3 in simulator (flash_file=0x7fffffffe0e4 "../programs/blink.bin")
at simulator.c:1005
#10 0x0000000000401f14 in main (argc=3, argv=0x7fffffffdd48) at cli.c:167
While I can find the name by hunting the call stack, it'd be nice / convenient / etc if it would print in the summary line, which here only has PID and pointer.
There's no easy way, you have to patch GDB. It's a simple patch, you can find it here.
it'd be nice / convenient / etc if it would print in the summary line, which here only has PID and pointer.
Please file an ehnancement request in GDB bugzilla.
If you are using GDB with embedded python, you might be able to script "thread apply" to do what you want, but it really ought to do the right thing already.
I have application (server) written in C++ that are crashing around few hours, looks random probably.
Worst part is i trying to debug any of core file using gdb and i see that result:
gdb --core=core.668 --symbols=selectserver
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
Core was generated by `./selectserver'.
Program terminated with signal 11, Segmentation fault.
[New process 672]
[New process 671]
[New process 670]
[New process 669]
[New process 668]
#0 0xb7866896 in ?? ()
(gdb) info threads
5 process 668 0xffffe410 in __kernel_vsyscall ()
4 process 669 0xffffe410 in __kernel_vsyscall ()
3 process 670 0xffffe410 in __kernel_vsyscall ()
2 process 671 0xffffe410 in __kernel_vsyscall ()
* 1 process 672 0xb7866896 in ?? ()
(gdb) bt
#0 0xb7866896 in ?? ()
#1 0x082da4b0 in ?? ()
#2 0xb79e4252 in ?? ()
#3 0xa2ba9014 in ?? ()
#4 0x0825e14c in ?? ()
#5 0x082da4b0 in ?? ()
#6 0xb56175e8 in ?? ()
#7 0x00000080 in ?? ()
#8 0xb5fe723f in ?? ()
#9 0xa2ba9014 in ?? ()
#10 0xa2ba9008 in ?? ()
#11 0xb7a32ff4 in ?? ()
#12 0x00000000 in ?? ()
(gdb) thread 2
[Switching to thread 2 (process 671)]#0 0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb7889486 in ?? ()
#2 0x00000000 in ?? ()
(gdb) thread 3
[Switching to thread 3 (process 670)]#0 0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb7889486 in ?? ()
#2 0x00000000 in ?? ()
(gdb) thread 4
[Switching to thread 4 (process 669)]#0 0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb7889486 in ?? ()
#2 0x00000000 in ?? ()
(gdb) thread 5
[Switching to thread 5 (process 668)]#0 0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb78b7de1 in ?? ()
#2 0x00000032 in ?? ()
#3 0xbf849ae8 in ?? ()
#4 0xbf8499e8 in ?? ()
#5 0x00000000 in ?? ()
(gdb) quit
I dont know what is going on, why addresses on stack excluding __kernel_vsyscall are so wired not maps to symbol.
What i need to do to find the problem, debug memory dump of that problem.
Thanks for help!
You need to compile the program with debugging symbols or get a separate file with debugging symbols. Pass the -g flag to gcc to enable these.
If you want to see what all of the functions are, even the ones inside library functions (for instance, standard library functions) you also need to get a version of the library with debugging symbols.
Starting gdb --core=core.668 selectserver fixed problem.
UPDATE Well, it looks like adding PyEval_InitThreads() before the call to PyGILState_Ensure() does the trick. In my haste to figure things out I incorrectly attributed my "hanging" to PyEval_InitThreads().
However, after reading some Python documentation I am wondering if this is the correct solution.
It is not safe to call this function when it is unknown which thread (if any) currently has the global interpreter lock.
First of all, I am working on some modified GNU Radio code - particularly a modified gr_bin_statistics_f block. Now, there is a bug report (albeit an old one) which pretty much describes my exact situation.
http://gnuradio.org/redmine/issues/show/199
Now, usrp_spectrum_sense.py which is mentioned in the bug report calls gr_bin_statistics_f (C++) which then calls back to Python periodically to re-tune the USRP (radio).
Here is what happens when the Python code is called:
PyGILState_STATE d_gstate;
d_gstate = PyGILState_Ensure();
// call python code
PyGILState_Release(d_gstate);
So, once we return from the Python code a segmentation fault occurs when PyGILState_Release(d_gstate) is called. While there are differences between my code and the original gr_bin_statistics_f, nothing seems to be remotely related to this.
I read that calling PyEval_InitThreads() before PyGILState_Ensure() has solved the problem for some people, but it just causes my program to hang.
Can anyone shed light on this for me? Or is it simply time to send a message to the GNU Radio mailing list?
Using Python2.7 on Fedora 14 x86_64.
Here is the GDB backtrace:
(gdb) c
Continuing.
[New Thread 0x7fabd3a8d700 (LWP 23969)]
[New Thread 0x7fabd328c700 (LWP 23970)]
[New Thread 0x7fabd2a8b700 (LWP 23971)]
[New Thread 0x7fabd228a700 (LWP 23972)]
[New Thread 0x7fabd1a89700 (LWP 23973)]
[New Thread 0x7fabd1288700 (LWP 23974)]
[New Thread 0x7fabd0a87700 (LWP 23975)]
[New Thread 0x7fabbbfff700 (LWP 23976)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fabbbfff700 (LWP 23976)]
0x00000036b3e0db00 in sem_post () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00000036b3e0db00 in sem_post () from /lib64/libpthread.so.0
#1 0x00000036c1317679 in PyThread_release_lock () from /usr/lib64/libpython2.7.so.1.0
#2 0x00007fabd6159c1f in ~ensure_py_gil_state (this=0x2dc6fc0, x=887000000)
at gnuradio_swig_py_general.cc:5593
#3 gr_py_feval_dd::calleval (this=0x2dc6fc0, x=887000000) at gnuradio_swig_py_general.cc:5605
#4 0x00007fabd77c4b6e in gr_noise_level_f::tune_window (this=0x2db3ca0,
target_freq=) at gr_noise_level_f.cc:97
#5 0x00007fabd77c554b in gr_noise_level_f::work (this=0x2db3ca0, noutput_items=7,
input_items=, output_items=)
at gr_noise_level_f.cc:115
#6 0x00007fabd7860714 in gr_sync_block::general_work (this=0x2db3ca0,
noutput_items=, ninput_items=,
input_items=, output_items=) at gr_sync_block.cc:64
#7 0x00007fabd7846ce4 in gr_block_executor::run_one_iteration (this=0x7fabbbffed90)
at gr_block_executor.cc:299
#8 0x00007fabd7864332 in gr_tpb_thread_body::gr_tpb_thread_body (this=0x7fabbbffed90, block=...)
at gr_tpb_thread_body.cc:49
#9 0x00007fabd785cce7 in operator() (function_obj_ptr=...) at gr_scheduler_tpb.cc:42
#10 operator() (function_obj_ptr=...)
at /home/tja/Research/energy/detector/gnuradio-3.3.0/gruel/src/include/gruel/thread_body_wrapper.h:49
#11 boost::detail::function::void_function_obj_invoker0, void>::invoke (function_obj_ptr=...) at /usr/include/boost/function/function_template.hpp:153
---Type to continue, or q to quit---
#12 0x00007fabd74914ef in operator() (this=)
at /usr/include/boost/function/function_template.hpp:1013
#13 boost::detail::thread_data >::run (this=)
at /usr/include/boost/thread/detail/thread.hpp:61
#14 0x00007fabd725ca55 in thread_proxy () from /usr/lib64/libboost_thread-mt.so.1.44.0
#15 0x00000036b3e06d5b in start_thread () from /lib64/libpthread.so.0
#16 0x00000036b3ae4a7d in clone () from /lib64/libc.so.6
(gdb)
Thanks for looking!
Python expects a certain amount of initialization to be done by the main thread before anything attempts to call back in from a subthread.
If the main thread is an application that is embedding Python, then it should call PyEval_InitThreads() immediately after calling Py_Initialize().
If the main thread is instead the Python interpreter itself (as seems to be the case here), then the module using the multithreaded extension module should include an "import threading" early to ensure that PyEval_InitThreads() is called correctly before any subthreads are spawned.
I ran into this exact problem as well. The documentation for anything relating to threads in CPython is unfortunately patchy at best.
Essentially, you need to do the following:
In your main thread, BEFORE any other threads are spawned, you need to call PyEval_InitThreads(). A good place to do this is right after you call PyInitialize().
Now, PyEval_InitThreads() not only initializes the Python interpreter thread-state, it also implicitly acquires the Global Interpreter Lock. This means, you need to release the lock before you call PyGILEnsure_State() in some other thread, otherwise your program will hang. You can do this with the function PyEval_ReleaseLock().
So basically, in your main thread, before any other threads are launched, you want to say:
PyInitialize();
PyEval_InitThreads();
PyEval_ReleaseLock();
Then, in any additional thread, anytime you use the Python API you need to say:
PyGILState_STATE gstate;
gstate = PyGILState_Ensure();
/* ... some code that does things with Python ... */
PyGILState_Release(gstate);