I have a multi-threaded program where each thread is assigned a unique name with the help of pthread_setname_np. If I attach to the running program with gdb, I can see their names:
(gdb) i th
Id Target Id Frame
* 1 Thread 0x7f883dba1200 (LWP 3867757) "main" __pthread_clockjoin_ex (threadid=140222870320896, thread_return=0x0, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>)
at pthread_join_common.c:145
2 Thread 0x7f883477f700 (LWP 3867759) "log_aggregator" 0x00007f883dc8026f in __GI___clock_nanosleep (clock_id=clock_id#entry=0, flags=flags#entry=0, req=0x7f883477b220, rem=0x7f883477b220)
at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
Note the main and log_aggregator names. However, if I make a core dump and then load that dump with gdb, I don't see the names:
$ gcore 3867757
...
Saved corefile core.3867757
...
$ gdb -c core.3867757 my_program
GNU gdb (Ubuntu 10.2-0ubuntu1~20.04~1) 10.2
...
(gdb) i th
Id Target Id Frame
* 1 Thread 0x7f883dba1200 (LWP 3867757) __pthread_clockjoin_ex (threadid=140222870320896, thread_return=0x0, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>)
at pthread_join_common.c:145
2 Thread 0x7f883477f700 (LWP 3867759) 0x00007f883dc8026f in __GI___clock_nanosleep (clock_id=clock_id#entry=0, flags=flags#entry=0, req=0x7f883477b220, rem=0x7f883477b220)
at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
How do I get thread names in this case? Or should I create the core file differently so that the names end up in it? I examined the coredump_filter option but did not find anything related. I also found a discussion in the narkive mailing list archive, but it does not provide an answer, only a suggestion that this might be impossible.
My next question would be: is there any way to get this /proc/self/task/[tid]/comm information into the core file?
The /proc/.../comm is not a real file, it's a representation of kernel data structures related to this task generated on-demand (when you try to read that "file").
The current->comm field that you are looking for is not mentioned in the binfmt_elf.c file (which is where other core dumping code is), so it's probably not saved.
In fact, this is easily confirmed by running a test program that generates a random string, calls pthread_setname_np with it, and dumps core; running strings -a core | grep $string then produces no output.
There is no fundamental reason current->comm cannot be saved in the core, but currently it isn't.
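For reference, a minimal version of such a test program might look like this (a sketch; here the name is a fixed marker string rather than a random one, and it must be compiled with -pthread):
#define _GNU_SOURCE
#include <pthread.h>
#include <unistd.h>

static void *worker(void *arg)
{
    (void)arg;
    /* Thread names are limited to 15 characters plus the terminating NUL. */
    pthread_setname_np(pthread_self(), "qzx_marker_name");
    pause();                      /* keep the thread alive so gcore can dump it */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);
    return 0;
}
Attaching gdb to the running process shows the name in info threads, while (per the experiment described above) strings -a core | grep qzx_marker_name on a gcore dump finds nothing.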
P.S. Usually looking at thread apply all where is enough to distinguish different threads, so the unique thread name is redundant and unnecessary.
With boost::log is it safe to add or remove log sinks while still logging from other threads? Is there some manual locking that I need to do to make these operations thread-safe?
What I am trying to do is to start a new text file that contains a subset of entries of an existing log file between two points in my program.
For example if I have the following entries going to a log file "main_debug.log"
line 1
line 2
line 3
line 4
If I then add a new sink after line 1 and remove it after line 3, I would expect "new_debug.log" to contain the following entries:
line 2
line 3
What I have seems to work most of the time, but I am occasionally seeing segmentation faults occurring within boost::log. Here is an example of one occurrence that I managed to catch with gdb:
Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 8760]
boost::intrusive::list_impl<boost::intrusive::derivation_value_traits<boost::log::v2_mt_posix::attribute_value_set::node, boost::log::v2_mt_posix::attribute_value_set::implementation::node_traits, (boost::intrusive::link_mode_type)0>, unsigned int, true, void>::clear_and_dispose<boost::log::v2_mt_posix::attribute_value_set::implementation::disposer> (this=0x5e64aff4, disposer=...) at /boost-1.60.0/boost/intrusive/list.hpp:738
738 /boost-1.60.0/boost/intrusive/list.hpp: No such file or directory.
(gdb) bt
#0 boost::intrusive::list_impl<boost::intrusive::derivation_value_traits<boost::log::v2_mt_posix::attribute_value_set::node, boost::log::v2_mt_posix::attribute_value_set::implementation::node_traits, (boost::intrusive::link_mode_type)0>, unsigned int, true, void>::clear_and_dispose<boost::log::v2_mt_posix::attribute_value_set::implementation::disposer> (this=0x5e64aff4, disposer=...) at /boost-1.60.0/boost/intrusive/list.hpp:738
#1 boost::log::v2_mt_posix::attribute_value_set::implementation::~implementation (this=0x5e64afe8, __in_chrg=<optimized out>) at /boost-1.60.0/libs/log/src/attribute_value_set.cpp:150
#2 boost::log::v2_mt_posix::attribute_value_set::implementation::destroy (p=0x5e64afe8) at /boost-1.60.0/libs/log/src/attribute_value_set.cpp:239
#3 boost::log::v2_mt_posix::attribute_value_set::~attribute_value_set (this=0x5e64b3e4, __in_chrg=<optimized out>) at /boost-1.60.0/libs/log/src/attribute_value_set.cpp:519
#4 0x76e3bbac in boost::log::v2_mt_posix::record_view::public_data::~public_data (this=0x5e64b3e0, __in_chrg=<optimized out>) at /boost-1.60.0/boost/log/core/record_view.hpp:86
#5 boost::log::v2_mt_posix::record_view::private_data::~private_data (this=0x5e64b3e0, __in_chrg=<optimized out>) at /boost-1.60.0/libs/log/src/core.cpp:79
#6 boost::log::v2_mt_posix::record_view::private_data::destroy (this=0x5e64b3e0) at /boost-1.60.0/libs/log/src/core.cpp:131
#7 boost::log::v2_mt_posix::record_view::public_data::destroy (p=0x5e64b3e0) at /boost-1.60.0/libs/log/src/core.cpp:184
#8 0x0020b030 in boost::log::v2_mt_posix::sinks::asynchronous_sink<boost::log::v2_mt_posix::sinks::text_file_backend, boost::log::v2_mt_posix::sinks::unbounded_fifo_queue>::run() ()
#9 0x76d4be6c in boost::(anonymous namespace)::thread_proxy (param=<optimized out>) at /boost-1.60.0/libs/thread/src/pthread/thread.cpp:167
#10 0x76c22f00 in ?? () from /lib/libpthread.so.0
To add a new sink I am doing the following:
const auto pDebugBackend = boost::make_shared<boost::log::sinks::text_file_backend>(
boost::log::keywords::file_name = "debug.log",
boost::log::keywords::channel = "InfoConsole" );
const auto pNewDebugSink = boost::make_shared<boost::log::sinks::asynchronous_sink<boost::log::sinks::text_file_backend>>( pDebugBackend );
// Other code to set the filter and formatter for the sink.
boost::log::core::get()->add_sink( pNewDebugSink );
And to remove the sink some time later I do the following, which follows the order described in https://www.boost.org/doc/libs/1_60_0/libs/log/doc/html/log/detailed/sink_frontends.html#log.detailed.sink_frontends.async:
boost::log::core::get()->remove_sink( pNewDebugSink );
pNewDebugSink->stop();
pNewDebugSink->flush();
pNewDebugSink.reset();
I am using boost-1.60.0, and it is built with threading support enabled.
With boost::log is it safe to add or remove log sinks while still logging from other threads?
Yes, although adding and removing sinks are two distinct operations, and you may miss some log records while the old sink is removed and the new one is not added yet.
Regarding the crashes you're seeing, the crash appears to happen while the dedicated logging thread is still running (i.e. before the stop method completes), so it is possible that removing the sink is not related. This may be a bug in Boost.Log or some other library it uses, but your Boost version is rather old. Try updating, and if it still reproduces, report a bug against Boost.Log with a code sample that reproduces it.
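If it does still reproduce on a recent Boost, a minimal reproducer might look roughly like the sketch below (hedged: the file name, thread count, iteration count and sleep interval are arbitrary choices, not anything prescribed by Boost.Log):
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

#include <boost/log/core.hpp>
#include <boost/log/trivial.hpp>
#include <boost/log/keywords/file_name.hpp>
#include <boost/log/sinks/async_frontend.hpp>
#include <boost/log/sinks/text_file_backend.hpp>
#include <boost/make_shared.hpp>

namespace logging = boost::log;
namespace sinks = boost::log::sinks;

int main()
{
    std::atomic<bool> done{ false };

    // A few threads logging continuously through the core.
    std::vector<std::thread> loggers;
    for (int i = 0; i < 4; ++i)
        loggers.emplace_back([&done, i]
        {
            while (!done)
                BOOST_LOG_TRIVIAL(info) << "thread " << i << " says hello";
        });

    // Repeatedly add and remove an asynchronous file sink while the
    // logging threads are running, using the documented shutdown order.
    for (int n = 0; n < 100; ++n)
    {
        auto backend = boost::make_shared<sinks::text_file_backend>(
            logging::keywords::file_name = "new_debug.log");
        auto sink = boost::make_shared<
            sinks::asynchronous_sink<sinks::text_file_backend>>(backend);

        logging::core::get()->add_sink(sink);
        std::this_thread::sleep_for(std::chrono::milliseconds(10));

        logging::core::get()->remove_sink(sink);
        sink->stop();
        sink->flush();
        sink.reset();
    }

    done = true;
    for (auto& t : loggers)
        t.join();
}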
My app crashes randomly (about once a day) and I have tried several ways to find the reason, but no luck.
In other core dump or segmentation fault cases I can locate where the crash happens with gdb, but in this case gdb doesn't give me much of a hint.
I need some advice on how to continue debugging; please help.
GDB output when my app crashed
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/greystone/myapp/myapp'.
Program terminated with signal SIGTRAP, Trace/breakpoint trap.
#0 0x00007f5d3a435afb in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
[Current thread is 1 (Thread 0x7f5cea3d4700 (LWP 14353))]
(gdb) bt full
#0 0x00007f5d3a435afb in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
No symbol table info available.
#1 0x00007f5d3a435c6f in g_log () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
No symbol table info available.
#2 0x00007f5d3a472742 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
No symbol table info available.
#3 0x00007f5d3a42cab3 in g_main_context_new () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
No symbol table info available.
#4 0x00007f5d3f4894c9 in QEventDispatcherGlibPrivate::QEventDispatcherGlibPrivate(_GMainContext*) () from /opt/Qt5.9.2/5.9.2/gcc_64/lib/libQt5Core.so.5
No symbol table info available.
#5 0x00007f5d3f4895a1 in QEventDispatcherGlib::QEventDispatcherGlib(QObject*) () from /opt/Qt5.9.2/5.9.2/gcc_64/lib/libQt5Core.so.5
No symbol table info available.
#6 0x00007f5d3f266870 in ?? () from /opt/Qt5.9.2/5.9.2/gcc_64/lib/libQt5Core.so.5
No symbol table info available.
#7 0x00007f5d3f267758 in ?? () from /opt/Qt5.9.2/5.9.2/gcc_64/lib/libQt5Core.so.5
No symbol table info available.
#8 0x00007f5d3efa76ba in start_thread (arg=0x7f5cea3d4700) at pthread_create.c:333
__res = <optimized out>
pd = 0x7f5cea3d4700
now = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140037043603200, 4399946704104667801, 0, 140033278038543, 8388608, 140037073195984, -4344262468029171047, -4344357617020880231}, mask_was_saved = 0}},
priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
pagesize_m1 = <optimized out>
sp = <optimized out>
freesize = <optimized out>
__PRETTY_FUNCTION__ = "start_thread"
#9 0x00007f5d3e43c41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.
Solutions I have tried
Searched for topics related to SIGTRAP
People said it happens in debug mode when a breakpoint is set somewhere in the code. However, my app is compiled in release mode and sets no breakpoints.
Caught the signal with a handler and ignored SIGTRAP
No success; I can only ignore a SIGTRAP sent with "kill -5 pid". When SIGTRAP occurs randomly at runtime, my app still crashes.
Fixed memory leaks in the code
Initialized pointers with nullptr
Double-checked MySQL C API race conditions
Double-checked array deletions and writes to indices outside array boundaries
Check signals and slots
My app is a GUI application built on the Qt framework; I have checked many signals and slots, but I have no idea how they could be related to a SIGTRAP core dump.
Checked exceptions for OpenCV
I use OpenCV for image processing tasks and have checked the exception cases.
Shared memory
Memory shared between the main process and sub-processes was carefully checked.
Example code
There is a lot of code in my app, and because gdb doesn't tell me exactly where the crash happens, I don't know which code to share. If you need it to make a suggestion, please tell me which part of the code you would like to see. My app has the following parts:
MySQL C API, MySQL 5.7.29
User interface (a lot of it) with the Qt framework 5.9.2
Image processing with OpenCV 2.4.9
Multithreaded process flow with the Qt framework 5.9.2
If you have any ideas, please share some keywords so I can research them and apply them to my app. Thanks for your help.
in this case gdb doesn't give me much of a hint
GDB tells you exactly what happened, you just didn't understand it.
What's happening is that some code in libglib called g_logv(..., G_LOG_FLAG_FATAL, ...), which eventually calls _g_log_abort(), which executes an int3 (debug breakpoint) instruction.
You should be able to (gdb) x/i 0x00007f5d3a435afb and see that instruction.
It looks like g_main_context_new() may have failed to allocate memory.
In any case, you should look in the application stderr logs for the reason libglib is terminating your program (effectively, libglib calls an equivalent of abort, because some precondition has failed).
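If the message is not making it to stderr (for example because it is being redirected or lost), you can also install a GLib log handler to capture it right before the abort. A rough sketch, assuming the failing message comes from the "GLib" log domain and that writing to a file is acceptable:
#include <glib.h>
#include <stdio.h>

/* Called for matching GLib log messages; runs before the fatal int3/abort. */
static void log_to_file(const gchar *domain, GLogLevelFlags level,
                        const gchar *message, gpointer user_data)
{
    FILE *f = (FILE *) user_data;
    fprintf(f, "[%s] level=0x%x: %s\n",
            domain ? domain : "(null)", (unsigned) level, message);
    fflush(f);
    /* Keep the default behaviour (printing to stderr) as well. */
    g_log_default_handler(domain, level, message, NULL);
}

void install_glib_log_hook(void)
{
    FILE *f = fopen("/tmp/glib-log.txt", "a");
    g_log_set_handler("GLib",
                      (GLogLevelFlags) (G_LOG_LEVEL_MASK | G_LOG_FLAG_FATAL | G_LOG_FLAG_RECURSION),
                      log_to_file, f);
}
Call install_glib_log_hook() early in main(); the captured message should tell you which GLib precondition failed (an allocation failure in g_main_context_new() would be consistent with the backtrace above).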
I have a process which uses around 16 threads. One of the threads exits by itself after working for some time (please note that the process does not crash). I am not sure how to look for reasons for the thread exiting. I tried using print statements, but that doesn't seem to help, and trying to capture it through gdb didn't help either. If this were some sort of memory corruption, the process should crash and the core file should tell everything (most of the time), but the process keeps running; only the thread exits, which is making my job difficult.
Can you please suggest some way to debug this issue?
-Arpit
I am not sure how to look for reasons for the thread exiting
There are only two ways that I can think of for this to happen:
The thread function returns
Something the thread function calls executes syscall(SYS_exit, ...)
You can use (gdb) catch syscall exit and then where to find out where this is happening.
If where shows something similar to this:
(gdb) where
#0 0x00007ffff7bc3556 in __exit_thread () at ../sysdeps/unix/sysv/linux/exit-thread.h:36
#1 start_thread (arg=0x7ffff781c700) at pthread_create.c:478
#2 0x00007ffff7905a8f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
then you have case 1, and you need to step through your thread function to find out where it is returning (https://rr-project.org/ may help a lot).
If instead you see something like this:
(gdb) where
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x0000555555554746 in do_work () at t.c:9
#2 0x0000555555554762 in fn (p=0x0) at t.c:15
#3 0x00007ffff7bc3494 in start_thread (arg=0x7ffff781c700) at pthread_create.c:333
#4 0x00007ffff7905a8f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
then the culprit is whatever routine called syscall.
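For reference, a source file consistent with that second backtrace might look roughly like this (hypothetical: do_work and fn are simply the names that appear in the trace, and the surrounding code is invented):
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

static void do_work(void)
{
    /* SYS_exit terminates only the calling thread, not the whole process. */
    syscall(SYS_exit, 0);
}

static void *fn(void *p)
{
    do_work();
    return p;               /* never reached */
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, fn, NULL);
    pthread_join(t, NULL);  /* the join completes because the thread has exited */
    pause();                /* the process itself keeps running */
    return 0;
}
Running this under gdb with catch syscall exit stops at the syscall, and where points straight at do_work.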
I have a strange problem that I can't solve. Please help!
The program is a multithreaded C++ application that runs on an ARM Linux machine. Recently I began testing it in long runs, and sometimes it crashes after 1-2 days like so:
*** glibc detected *** /root/client/my_program: free(): invalid pointer: 0x002a9408 ***
When I open the core dump I see that the main thread seems to have a corrupt stack: all I can see is an endless chain of abort() calls.
GNU gdb (GDB) 7.3
...
This GDB was configured as "--host=i686 --target=arm-linux".
[New LWP 706]
[New LWP 700]
[New LWP 702]
[New LWP 703]
[New LWP 704]
[New LWP 705]
Core was generated by `/root/client/my_program'.
Program terminated with signal 6, Aborted.
#0 0x001c44d4 in raise ()
(gdb) bt
#0 0x001c44d4 in raise ()
#1 0x001c47e0 in abort ()
#2 0x001c47e0 in abort ()
#3 0x001c47e0 in abort ()
#4 0x001c47e0 in abort ()
#5 0x001c47e0 in abort ()
#6 0x001c47e0 in abort ()
#7 0x001c47e0 in abort ()
#8 0x001c47e0 in abort ()
#9 0x001c47e0 in abort ()
#10 0x001c47e0 in abort ()
#11 0x001c47e0 in abort ()
And it goes on and on. I tried to get to the bottom of it by moving up the stack (frame 3000 or even higher), but eventually the core dump runs out of frames and I still can't see why this happened.
When I examine the other threads everything seems normal there.
(gdb) info threads
Id Target Id Frame
6 LWP 705 0x00132f04 in nanosleep ()
5 LWP 704 0x001e7a70 in select ()
4 LWP 703 0x00132f04 in nanosleep ()
3 LWP 702 0x00132318 in sem_wait ()
2 LWP 700 0x00132f04 in nanosleep ()
* 1 LWP 706 0x001c44d4 in raise ()
(gdb) thread 5
[Switching to thread 5 (LWP 704)]
#0 0x001e7a70 in select ()
(gdb) bt
#0 0x001e7a70 in select ()
#1 0x00057ad4 in CSerialPort::read (this=0xbea7d98c, string_buffer=..., delimiter=..., timeout_ms=1000) at CSerialPort.cpp:202
#2 0x00070de4 in CScanner::readResponse (this=0xbea7d4cc, resp_recv=..., timeout=1000, delim=...) at PidScanner.cpp:657
#3 0x00071198 in CScanner::sendExpect (this=0xbea7d4cc, cmd=..., exp_str=..., rcv_str=..., timeout=1000) at PidScanner.cpp:604
#4 0x00071d48 in CScanner::pollPid (this=0xbea7d4cc, mode=1, pid=12, pid_str=...) at PidScanner.cpp:525
#5 0x00072ce0 in CScanner::poll1 (this=0xbea7d4cc)
#6 0x00074c78 in CScanner::Poll (this=0xbea7d4cc)
#7 0x00089edc in CThread5::Thread5Poll (this=0xbea7d360)
#8 0x0008c140 in CThread5::run (this=0xbea7d360)
#9 0x00088698 in CThread::threadFunc (p=0xbea7d360)
#10 0x0012e6a0 in start_thread ()
#11 0x001e90e8 in clone ()
#12 0x001e90e8 in clone ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(Class and function names are a bit weird because I changed them. :-)
So, thread #1 is where the stack is corrupt; the backtrace of every other thread (2-6) ends with
Backtrace stopped: previous frame identical to this frame (corrupt stack?).
That happens because threads 2-6 are created in thread #1.
The thing is that I can't run the program in gdb because it runs on an embedded system, and I can't use a remote gdb server. The only option is examining the core dumps, which don't occur very often.
Could you please suggest something that could move me forward with this? (Maybe there is something else I can extract from the core dump, or maybe there is a way to add hooks in the code to catch the abort() call.)
UPDATE: Basile Starynkevitch suggested using Valgrind, but it turns out it is only ported to ARMv7. I have an ARM926, which is ARMv5, so this won't work for me. There are some efforts to compile Valgrind for ARMv5, though: Valgrind cross-compilation for ARMv5tel, Valgrind on the ARM9.
UPDATE 2: I couldn't make Electric Fence work with my program. The program uses C++ and pthreads. The version of Efence I got, 2.1.13, crashed in an arbitrary place after I started a thread and tried to do something more or less complicated (for example, putting a value into an STL vector). I saw people mentioning some patches for Efence on the web but didn't have time to try them. I tried this on my Linux PC, not on the ARM, and other tools like Valgrind or Dmalloc don't report any problems with the code. So, anyone using version 2.1.13 of Efence should be prepared to have problems with pthreads (or maybe pthreads + C++ + STL, I don't know).
My guess for the "infinite" aborts is that either abort() causes a loop (e.g. abort -> signal handler -> abort -> ...) or gdb can't correctly interpret the frames on the stack.
In either case I would suggest manually checking out the stack of the problematic thread. If abort causes a loop, you should see a pattern or at least the return address of abort repeating every so often. Perhaps you can then more easily find the root of the problem by manually skipping large parts of the (repeating) stack.
Otherwise, you should find that there is no repeating pattern and hopefully the return address of the failing function somewhere on the stack. In the worst case such addresses are overwritten due to a buffer overflow or such, but perhaps then you can still get lucky and recognise what it is overwritten with.
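For example, from the core you could dump a chunk of the raw stack of the problematic thread and look for repeating return addresses (the count of 256 below is arbitrary):
(gdb) thread 1
(gdb) x/256a $sp
The a format prints each stack slot as an address and symbolizes any that fall inside known functions, so a repeating abort/raise return address, or a long run of identical garbage left by an overflow, should stand out.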
One possibility here is that something in that thread has very, very badly smashed the stack by vastly overwriting an on-stack data structure, destroying all the needed data on the stack in the process. That makes postmortem debugging very unpleasant.
If you can reproduce the problem at will, the right thing to do is to run the thread under gdb and watch what is going on precisely at the moment when the stack gets nuked. This may, in turn, require some sort of careful search to determine where exactly the error is happening.
If you cannot reproduce the problem at will, the best I can suggest is very carefully looking for clues in the thread local storage for that thread to see if it hints at where the thread was executing before death hit.
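If it does reproduce under gdb, one concrete way to catch the moment of corruption is a watchpoint: break somewhere before the suspect code runs, then watch an on-stack variable you believe gets clobbered and let the program continue until the offending write. Both names below are hypothetical placeholders:
(gdb) break SomeClass::someFunctionBeforeTheCorruption
(gdb) run
(gdb) watch suspect_local_variable
(gdb) continue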
UPDATE: Well, it looks like adding PyEval_InitThreads() before the call to PyGILState_Ensure() does the trick. In my haste to figure things out I incorrectly attributed my "hanging" to PyEval_InitThreads().
However, after reading some Python documentation I am wondering if this is the correct solution.
It is not safe to call this function when it is unknown which thread (if any) currently has the global interpreter lock.
First of all, I am working on some modified GNU Radio code - particularly a modified gr_bin_statistics_f block. Now, there is a bug report (albeit an old one) which pretty much describes my exact situation.
http://gnuradio.org/redmine/issues/show/199
Now, usrp_spectrum_sense.py, which is mentioned in the bug report, calls gr_bin_statistics_f (C++), which then periodically calls back into Python to re-tune the USRP (radio).
Here is what happens when the Python code is called:
PyGILState_STATE d_gstate;
d_gstate = PyGILState_Ensure();
// call python code
PyGILState_Release(d_gstate);
So, once we return from the Python code a segmentation fault occurs when PyGILState_Release(d_gstate) is called. While there are differences between my code and the original gr_bin_statistics_f, nothing seems to be remotely related to this.
I read that calling PyEval_InitThreads() before PyGILState_Ensure() has solved the problem for some people, but it just causes my program to hang.
Can anyone shed light on this for me? Or is it simply time to send a message to the GNU Radio mailing list?
Using Python2.7 on Fedora 14 x86_64.
Here is the GDB backtrace:
(gdb) c
Continuing.
[New Thread 0x7fabd3a8d700 (LWP 23969)]
[New Thread 0x7fabd328c700 (LWP 23970)]
[New Thread 0x7fabd2a8b700 (LWP 23971)]
[New Thread 0x7fabd228a700 (LWP 23972)]
[New Thread 0x7fabd1a89700 (LWP 23973)]
[New Thread 0x7fabd1288700 (LWP 23974)]
[New Thread 0x7fabd0a87700 (LWP 23975)]
[New Thread 0x7fabbbfff700 (LWP 23976)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fabbbfff700 (LWP 23976)]
0x00000036b3e0db00 in sem_post () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00000036b3e0db00 in sem_post () from /lib64/libpthread.so.0
#1 0x00000036c1317679 in PyThread_release_lock () from /usr/lib64/libpython2.7.so.1.0
#2 0x00007fabd6159c1f in ~ensure_py_gil_state (this=0x2dc6fc0, x=887000000)
at gnuradio_swig_py_general.cc:5593
#3 gr_py_feval_dd::calleval (this=0x2dc6fc0, x=887000000) at gnuradio_swig_py_general.cc:5605
#4 0x00007fabd77c4b6e in gr_noise_level_f::tune_window (this=0x2db3ca0,
target_freq=<optimized out>) at gr_noise_level_f.cc:97
#5 0x00007fabd77c554b in gr_noise_level_f::work (this=0x2db3ca0, noutput_items=7,
input_items=<optimized out>, output_items=<optimized out>)
at gr_noise_level_f.cc:115
#6 0x00007fabd7860714 in gr_sync_block::general_work (this=0x2db3ca0,
noutput_items=<optimized out>, ninput_items=<optimized out>,
input_items=<optimized out>, output_items=<optimized out>) at gr_sync_block.cc:64
#7 0x00007fabd7846ce4 in gr_block_executor::run_one_iteration (this=0x7fabbbffed90)
at gr_block_executor.cc:299
#8 0x00007fabd7864332 in gr_tpb_thread_body::gr_tpb_thread_body (this=0x7fabbbffed90, block=...)
at gr_tpb_thread_body.cc:49
#9 0x00007fabd785cce7 in operator() (function_obj_ptr=...) at gr_scheduler_tpb.cc:42
#10 operator() (function_obj_ptr=...)
at /home/tja/Research/energy/detector/gnuradio-3.3.0/gruel/src/include/gruel/thread_body_wrapper.h:49
#11 boost::detail::function::void_function_obj_invoker0, void>::invoke (function_obj_ptr=...) at /usr/include/boost/function/function_template.hpp:153
---Type <return> to continue, or q <return> to quit---
#12 0x00007fabd74914ef in operator() (this=<optimized out>)
at /usr/include/boost/function/function_template.hpp:1013
#13 boost::detail::thread_data >::run (this=<optimized out>)
at /usr/include/boost/thread/detail/thread.hpp:61
#14 0x00007fabd725ca55 in thread_proxy () from /usr/lib64/libboost_thread-mt.so.1.44.0
#15 0x00000036b3e06d5b in start_thread () from /lib64/libpthread.so.0
#16 0x00000036b3ae4a7d in clone () from /lib64/libc.so.6
(gdb)
Thanks for looking!
Python expects a certain amount of initialization to be done by the main thread before anything attempts to call back in from a subthread.
If the main thread is an application that is embedding Python, then it should call PyEval_InitThreads() immediately after calling Py_Initialize().
If the main thread is instead the Python interpreter itself (as seems to be the case here), then the module using the multithreaded extension module should include an "import threading" early to ensure that PyEval_InitThreads() is called correctly before any subthreads are spawned.
I ran into this exact problem as well. The documentation for anything relating to threads in CPython is unfortunately patchy at best.
Essentially, you need to do the following:
In your main thread, BEFORE any other threads are spawned, you need to call PyEval_InitThreads(). A good place to do this is right after you call Py_Initialize().
Now, PyEval_InitThreads() not only initializes the Python interpreter thread-state, it also implicitly acquires the Global Interpreter Lock. This means you need to release the lock before you call PyGILState_Ensure() in some other thread, otherwise your program will hang. You can do this with the function PyEval_ReleaseLock().
So basically, in your main thread, before any other threads are launched, you want to say:
Py_Initialize();        /* initialize the interpreter */
PyEval_InitThreads();   /* create the GIL; the calling thread now holds it */
PyEval_ReleaseLock();   /* release the GIL so other threads can acquire it */
Then, in any additional thread, anytime you use the Python API you need to say:
PyGILState_STATE gstate;
gstate = PyGILState_Ensure();   /* acquire the GIL (and a thread state for this thread if needed) */
/* ... some code that does things with Python ... */
PyGILState_Release(gstate);     /* release the GIL */
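Putting the two pieces together, a minimal embedding program following this recipe might look like the sketch below (written against the Python 2.7 era C API mentioned in the question; interpreter finalization is omitted for brevity):
#include <Python.h>
#include <pthread.h>

static void *worker(void *arg)
{
    /* Acquire the GIL (creating a thread state for this thread if needed). */
    PyGILState_STATE gstate = PyGILState_Ensure();
    PyRun_SimpleString("print 'hello from a subthread'");
    PyGILState_Release(gstate);   /* release the GIL again */
    return arg;
}

int main(void)
{
    Py_Initialize();        /* initialize the interpreter */
    PyEval_InitThreads();   /* create the GIL; the calling thread now holds it */
    PyEval_ReleaseLock();   /* release it so the worker can acquire it */

    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);
    return 0;
}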