Python, Threads, the GIL, and C++

Is there some way to make boost::python control the Python GIL for every interaction with Python?
I am writing a project with boost::python. I am trying to write a C++ wrapper for an external library and control that library from Python scripts. I cannot change the external library, only my wrapper program. (I am writing a functional testing application for said external library.)
The external library is written in C and uses function pointers and callbacks to do a lot of the heavy lifting. It's a messaging system, so when a message comes in, a callback function gets called, for example.
I implemented an observer pattern in my library so that multiple objects could listen to one callback. I have all the major players exported properly and I can control things very well up to a certain point.
The external library creates threads to handle messages, send messages, do processing, etc. Some of these callbacks might be called from different threads, and I recently found out that Python is not thread safe.
These observers can be defined in Python, so I need to be able to call into Python, and Python needs to call into my program at any point.
I set up the object and observer like so:
class TestObserver( MyLib.ConnectionObserver ):
    def receivedMsg( self, msg ):
        print("Received a message!")

ob = TestObserver()
cnx = MyLib.Connection()
cnx.attachObserver( ob )
Then I create a source to send to the connection and the receivedMsg function is called.
So a regular source.send('msg') goes into my C++ app and on to the C library, which sends the message. The connection receives it and calls the callback, which goes back into my C++ library, where the connection tries to notify all observers. At this point the observer is the Python class above, so its method gets called.
And of course the callback is called from the connection thread, not the main application thread.
Yesterday everything was crashing; I could not send a single message. Then, after digging around in the Cplusplus-sig archives, I learned about the GIL and a couple of nifty functions to lock things up.
So my C++ Python wrapper for my observer class now looks like this:
struct IConnectionObserver_wrapper : Observers::IConnectionObserver, wrapper<Observers::IConnectionObserver>
{
    void receivedMsg( const Message* msg )
    {
        // Acquire the GIL: this runs on the library's connection thread
        PyGILState_STATE gstate = PyGILState_Ensure();
        if( override receivedMsg_func = this->get_override( "receivedMsg" ) )
            receivedMsg_func( msg );
        Observers::IConnectionObserver::receivedMsg( msg );
        PyGILState_Release( gstate );
    }
};
And that WORKS. However, when I try to send over 250 messages like so
for i in range(250):
    source.send('msg')
it crashes again, with the same message and symptoms it had before:
PyThreadState_Get: no current thread
so I am thinking that this time I have a problem calling into my C++ app, rather than calling into Python.
My question is: is there some way to make boost::python handle the GIL itself for every interaction with Python? I cannot find anything in the code, and it's really hard to find where the source.send call enters boost_python :(

I found a really obscure post on the mailing list that said to use
PyEval_InitThreads();
in BOOST_PYTHON_MODULE
and that actually seemed to stop the crashes.
It's still a crap shoot whether the program reports all the messages it got. If I send 2000, most of the time it says it got 2000, but sometimes it reports significantly fewer.
I suspect this might be due to the threads accessing my counter at the same time, so I am answering this question because that is a different problem.
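To fix the crash, just do
BOOST_PYTHON_MODULE(MyLib)
{
    // Needed on older Python versions; since Python 3.7 the interpreter
    // sets up the GIL itself during initialization.
    PyEval_InitThreads();
    // class_<...> definitions go here
}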
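If the missing counts really do come from several threads updating one counter at the same time, guarding the increment with a lock is the usual remedy. The following is only a sketch under assumptions: it supposes the counter lives in a Python-side observer, and CountingObserver and simulate are hypothetical names. A plain class stands in for MyLib.ConnectionObserver so the example runs on its own. If the counter lives in C++ instead, a mutex or an atomic integer there plays the same role.

```python
import threading


class CountingObserver:
    """Hypothetical observer that counts messages under a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.count = 0

    def receivedMsg(self, msg):
        # The read-modify-write of count happens while holding the lock,
        # so concurrent callbacks cannot overwrite each other's update.
        with self._lock:
            self.count += 1


def simulate(n_threads=8, per_thread=250):
    """Deliver messages from several threads and return the final count."""
    ob = CountingObserver()

    def deliver():
        for _ in range(per_thread):
            ob.receivedMsg("msg")

    threads = [threading.Thread(target=deliver) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ob.count
```

Calling simulate(8, 250) delivers 2000 messages from eight threads, and the count matches because every increment is serialized.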

Don't know about your problem exactly, but take a look at CallPolicies:
http://www.boost.org/doc/libs/1_37_0/libs/python/doc/v2/CallPolicies.html#CallPolicies-concept
You can define new call policies (one call policy is "return_internal_reference" for instance) that will execute some code before and/or after the wrapped C++ function is executed. I have successfully implemented a call policy to automatically release the GIL before executing a C++ wrapped function and acquiring it again before returning to Python, so I can write code like this:
.def( "long_operation", &long_operation, release_gil<>() );
A call policy might help you in writing this code more easily.

I think the best approach is to avoid the GIL and ensure your interaction with python is single-threaded.
I'm designing a boost.python based test tool at the moment and think I'll probably use a producer/consumer queue to dispatch events from the multi-threaded libraries which will be read sequentially by the python thread.
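A minimal sketch of that producer/consumer idea, written in pure Python for illustration. The names library_thread and run_dispatcher are hypothetical. In a real tool the producers would be the library's callback threads pushing into a queue owned by the C++ wrapper, which is an assumption not shown here; Python threads stand in for them so the example is runnable.

```python
import queue
import threading


def library_thread(events, thread_id, n_messages):
    # Stand-in for one of the library's worker threads (hypothetical):
    # it only enqueues events and never runs observer code itself.
    for i in range(n_messages):
        events.put((thread_id, i))


def run_dispatcher(n_threads=4, n_messages=500):
    """Collect events from several producers on a single consumer thread."""
    events = queue.Queue()  # thread-safe FIFO
    workers = [
        threading.Thread(target=library_thread, args=(events, t, n_messages))
        for t in range(n_threads)
    ]
    for w in workers:
        w.start()

    received = []
    # Single consumer: every event is handled here, one at a time,
    # so the observer logic never runs concurrently.
    for _ in range(n_threads * n_messages):
        received.append(events.get())

    for w in workers:
        w.join()
    return received
```

Because only the consuming thread touches the observers, they need no locking of their own; the queue is the single synchronization point.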

Related

Injecting dll before windows executes target TLS callbacks

There's an app that uses TLS callbacks to remap its memory (NtCreateSection/NtUnmapViewOfSection/NtMapViewOfSection) with the SEC_NO_CHANGE flag.
Is there any way to hook NtCreateSection before the target app uses it in its TLS callback?

You could use API Monitor to check whether it is really that function call. If I understand you correctly, you want to modify its invocation, and API Monitor allows you to modify the parameters on the fly. If just "patching" the value when the application accesses the API is enough, you could then use x64dbg to craft a persistent binary patch for your application. But this requires you to at least know, or get familiar with, basic x64/x86 assembler.

I have no idea what you're trying to achieve exactly but if you're trying to execute setup code before the main() function is called (to setup hooks), you could use the constructor on a static object. You would basically construct an object before your main program starts.
// In a .cpp file (do not put in a header as that would create multiple static objects!)
#include <iostream>

class StaticInitializer {
public:
    StaticInitializer() {
        std::cout << "This will run before your main function...\n";
        /* This is where you would setup all your hooks */
    }
};
static StaticInitializer staticInitializer;
Beware though, as any object constructed this way might get constructed in any order depending on compilers, files order, etc. Also, some things might not be initialized yet and you might not be able to achieve what you want to setup.
That might be a good starting point, but as I said, I'm not sure exactly what you're trying to achieve here, so good luck and I hope it helps a little.

Easiest way to add immediate task to a `CFRunLoop`?

What is the easiest way to add an immediate one-time task to a CFRunLoop from a C/C++ program, that is, a callback which must be invoked by the run loop before it blocks again?
According to the documentation, we have CFRunLoopPerformBlock(), but the problem is that it uses block notation, which requires Objective-C compilation mode.
Is there something similar to CFRunLoopPerformBlock() which is available to a C/C++ program, or am I forced to use a zero-delay timer?

The block language feature does not require the use of Objective-C. It's also supported in C and C++ by Clang. So, you can go ahead and use CFRunLoopPerformBlock().
If you're still looking for alternatives and you wish to target the main thread's run loop (i.e. the main run loop), you can use dispatch_async_f(). Although it's most common to use the block-based functions when using GCD, the functions with the _f suffix take function pointers.
static void my_task_function(void *context)
{
// ...
}
...
dispatch_async_f(dispatch_get_main_queue(), any_pointer_you_like, my_task_function);

C++: repeated call of system()

I need some help with calling an external program from C++ code.
I have to call javap.exe (from the JDK) from my program many times (probably more than 100), but calling system("javap.exe some_parameters") is extremely slow. It works fine for one set of parameters, but the cost of repeated system() calls is not acceptable. I think it is because of the cost of accessing the hard disk and starting the application (but I'm not sure).
What can I do for better performance? Can I "save javap.exe in RAM" and call it "directly"?
Or maybe somebody knows how I can get a Java class description and method signatures without javap.exe?

The Java VM is not cheap to start running, and it's likely that its initialization is eating up the lion's share of your time. Luckily, the functionality of javap is available directly through Java code. I suggest that you write a small Java application which, while similar to javap, does with one invocation what you would otherwise need thousands for. (Though... maybe you could already use just one? javap will take multiple class files, after all...)
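To illustrate the "one invocation for many classes" idea, here is a small sketch that builds argument vectors passing a batch of class names to each javap run. It is shown in Python for brevity; the same vectors can be built in C++. The helper name javap_commands, the default -s option, and the batch size are assumptions for illustration; batching only guards against command-line length limits.

```python
def javap_commands(class_names, options=("-s",), batch_size=200):
    """Build argv lists that hand many classes to each javap run."""
    names = list(class_names)
    return [
        # One command per batch: javap, its options, then the class names.
        ["javap", *options, *names[i:i + batch_size]]
        for i in range(0, len(names), batch_size)
    ]
```

Each returned list can be handed to a process-launching call that takes an argument vector, for example subprocess.run(cmd, capture_output=True, text=True), so the JVM starts once per batch rather than once per class.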

Calling system() is easy, but very inefficient, primarily because you are not just launching whatever program you specify. Rather, you are launching one process (a shell), and that shell will examine your parameter and launch a second process.
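If you're on a system that supports fork() and exec*(), you're going to improve performance by using them instead. As a sketch (using execvp as one choice among the exec* functions), consider:
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* argv is a NULL-terminated argument vector; argv[0] is the program name */
void replace_system(char *const argv[])
{
    pid_t child = fork();
    if (child < 0) {
        perror("fork");
        return;
    }
    if (child) {
        /* this is the parent, wait for the child to finish */
        int status;
        while (waitpid(child, &status, 0) < 0 && errno == EINTR)
            ; /* retry only if interrupted by a signal */
        return;
    }
    /* this is the new process */
    execvp(argv[0], argv);
    /* only reached if exec failed */
    perror("failed to start the child");
    _exit(127);
}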
Choose one of the exec* functions based on how you want to arrange the parameters. You'll need to break your string of arguments into components, and possibly provide an environment of your liking. Once you call the exec* function, that function will never return (unless there is an error starting the command you've defined for it).
Beyond performance considerations, another reason to use this is that, if desired, it allows you to redirect the child's standard streams. For example, you might be interested in the output of a child; if you replace its stdout with a pipe available to you, you can simply read what it prints. Look at the source code of the standard popen() call for an example of this.
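The two ideas above, launching the program directly instead of through a shell and reading its stdout through a pipe, can be sketched compactly with Python's subprocess module. This is an illustration rather than a replacement for the C code: run_and_capture is a hypothetical helper name, and the child here is just the Python interpreter printing a line, so the example runs anywhere.

```python
import subprocess
import sys


def run_and_capture(argv):
    """Run argv directly (no shell) and return (exit_code, stdout_text)."""
    # Passing a list means no shell is started; stdout is connected to a
    # pipe that the parent reads once the child has finished.
    done = subprocess.run(argv, stdout=subprocess.PIPE, text=True, check=False)
    return done.returncode, done.stdout


# Example child: the current interpreter printing one line.
child_argv = [sys.executable, "-c", "print('hello from child')"]
```

Calling run_and_capture(child_argv) returns the child's exit code together with whatever it printed.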

How to wrap multithreaded C++ library using Python C/API?

This is a somewhat long question, but I hope I can express it clearly.
I am trying to wrap a C++ library using Python/C API. The main library, say, mylib, has its own object system (it is something like an interpreter for another language ) and uniquely identifies each object in its environment by an Id. It creates multiple threads in its init() function and does different things on different threads (say creating objects on one thread and interpreting commands in another thread).
Now I am trying to wrap it in two levels:
1. I created a Dummy class with the Id of an object in mylib. The Dummy constructor actually calls a function in mylib to create a new object and store its Id. Other methods in the Dummy class similarly call equivalent functions in mylib. This does not use the Python/C API.
2. I created mylibmodule.cpp, which uses the Python/C API to provide the functions that will be called from the Python interpreter.
I call the init() function of mylib in PyMODINIT_FUNC init_mylib().
I code functions like :
static PyObject * py_new_Dummy(PyObject* self, PyObject *args){
    // ... process arguments
    return reinterpret_cast<PyObject*>(new Dummy);
}
Note that the Dummy constructor does call functions in mylib that are executed on threads created by using pthreads.
I compile this into _mylib.so and I have a mylib.py:
import _mylib

class MyClass(object):
    def __init__(self, *args):
        self.__ptr = _mylib.py_new_Dummy()
Now to the actual problem: I can import mylib in the Python interpreter, but as soon as I try:
a = MyClass(some_args)
I get a segmentation fault. A gdb backtrace shows
Program received signal SIGSEGV, Segmentation fault.
__pthread_mutex_lock (mutex=0x0) at pthread_mutex_lock.c:50
Even funnier is that if I disable spawning multiple threads in the mylib code (still linked with pthreads), I can create MyClass instances, but I get a segmentation violation at exit from the Python interpreter.
The "Thin Ice" section in the Python documentation (http://docs.python.org/extending/) did not enlighten me. I am wondering if I should use PyGILState_Ensure and PyGILState_Release around all Python C/API calls in mylibmodule.cpp. Or should it be Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS?
Can anybody help? Is there any definitive documentation on how exactly Python plays with pthreads?

From your description it doesn't really sound like a threading issue at all: you say you define the Dummy class without using the Python API, but that would mean Dummy instances are not PyObjects, so the reinterpret_cast will do the wrong thing. You can't create PyObjects by just instantiating a C++ class; you need to play along with Python's object system, define a proper PyTypeObject and an instance struct that starts with a PyObject header, and properly initialize both. You also need to make sure your refcounts are correct.
Once you have that sorted, the main thing to remember about threads is that any call that touches Python objects or that uses any of the Python API (except the functions to grab the GIL) must have the GIL acquired. If any of the threads in your C++ library try to call back to Python code or touch Python objects, the access needs to be wrapped in PyGILState_Ensure/PyGILState_Release.

Thank you Thomas for pointing out the red herring. The problem was in the initialization of the threads in the C++ side.
And yes, it did not need any GIL manipulation, as none of the additional C++ threads were accessing the Python C/API.

How to trap System.exit() in code called from JNI

I'm writing a C interface to a java library that calls System.exit(). I call:
/* Calls the main method for the class */
printf("about to call main\n");
(*env)->CallStaticVoidMethod(env, mainClass, mainMethod, args);
printf("returning from main\n");
I (unfortunately) don't have the option of changing the library, but I'd still like for the JVM to return control back to the C calling function (so I can do various cleanup tasks, etc..). Is there a way to get JNI to do that, or am I SOL?

You do not need bytecode editing for such a simple case; a lot of security handling is implemented in plain old Java.
Install a SecurityManager with System.setSecurityManager(SecurityManager) whose checkExit() throws some Error (like ThreadDeath). Assuming System.exit(int) [that is, Runtime.getRuntime().exit(int)] is invoked in the same thread, that should do it.
