How to wrap a multithreaded C++ library using the Python/C API?

This is a somewhat long question, but I hope I can express it clearly.
I am trying to wrap a C++ library using the Python/C API. The main library, say mylib, has its own object system (it is something like an interpreter for another language) and uniquely identifies each object in its environment by an Id. It creates multiple threads in its init() function and does different things on different threads (say, creating objects on one thread and interpreting commands on another).
Now I am trying to wrap it in two levels:
I created a Dummy class with the Id of an object in mylib. The Dummy constructor actually calls a function in mylib to create a new object and store its Id. Other methods in Dummy class similarly call equivalent functions in mylib. This does not use Python/C API.
I created mylibmodule.cpp, which uses the Python/C API to provide the functions that will be called from the Python interpreter.
I call the init() function of mylib in PyMODINIT_FUNC init_mylib().
I write functions like:
static PyObject * py_new_Dummy(PyObject* self, PyObject *args){
    // ... process arguments
    return reinterpret_cast<PyObject*>(new Dummy);
}
Note that the Dummy constructor does call functions in mylib that are executed on threads created by using pthreads.
I compile this into _mylib.so and I have a mylib.py:
import _mylib
class MyClass(object):
    def __init__(self, *args):
        self.__ptr = _mylib.py_new_Dummy()
Now to the actual problem: I can import mylib in the Python interpreter, but as soon as I try:
a = MyClass(some_args)
I get a segmentation fault. A gdb backtrace shows
Program received signal SIGSEGV, Segmentation fault.
__pthread_mutex_lock (mutex=0x0) at pthread_mutex_lock.c:50
Even funnier is that if I disable spawning multiple threads in the mylib code (still linked with pthreads), I can create MyClass instances, but I get a segmentation violation at exit from the Python interpreter.
The "Thin Ice" section in the Python documentation (http://docs.python.org/extending/) did not enlighten me. I am wondering if I should use PyGILState_Ensure and PyGILState_Release around all Python C/API calls in mylibmodule.cpp. Or should it be Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS?
Can anybody help? Is there any definitive documentation on how exactly Python plays with pthreads?

From your description it doesn't really sound like a threading issue at all: you claim you define the Dummy class without using the Python API, but that would mean Dummy instances are not PyObjects, so the reinterpret_cast will do the wrong thing. You can't create PyObjects by just instantiating a C++ class; you need to play along with Python's object system and create a proper PyTypeObject and a matching PyObject struct, and properly initialize both. You also need to make sure your refcounts are correct.
Once you have that sorted, the main thing to remember about threads is that any call that touches Python objects or that uses any of the Python API (except the functions to grab the GIL) must have the GIL acquired. If any of the threads in your C++ library try to call back to Python code or touch Python objects, the access needs to be wrapped in PyGILState_Ensure/PyGILState_Release.
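To make that concrete, here is a rough sketch (Python 2 C API, to match the question's init_mylib; names such as PyDummy, PyDummyType and mylib_methods are my own, and error handling plus most type slots are omitted) of what a minimal proper extension type wrapping Dummy could look like, followed by the GIL bracketing a library thread would need:

#include <Python.h>

/* Stand-in for the question's Dummy class; in real code include its header. */
class Dummy { };

typedef struct {
    PyObject_HEAD
    Dummy* impl;                /* the wrapped C++ object */
} PyDummy;

static PyObject* PyDummy_new(PyTypeObject* type, PyObject* args, PyObject* kwds)
{
    PyDummy* self = (PyDummy*)type->tp_alloc(type, 0);
    if (self != NULL)
        self->impl = new Dummy();   /* calls into mylib, as in the question */
    return (PyObject*)self;
}

static void PyDummy_dealloc(PyDummy* self)
{
    delete self->impl;
    Py_TYPE(self)->tp_free((PyObject*)self);
}

static PyTypeObject PyDummyType = {
    PyVarObject_HEAD_INIT(NULL, 0)
    "_mylib.Dummy",             /* tp_name */
    sizeof(PyDummy),            /* tp_basicsize */
};

static PyMethodDef mylib_methods[] = {
    { NULL, NULL, 0, NULL }
};

PyMODINIT_FUNC init_mylib(void)
{
    PyDummyType.tp_flags   = Py_TPFLAGS_DEFAULT;
    PyDummyType.tp_new     = PyDummy_new;
    PyDummyType.tp_dealloc = (destructor)PyDummy_dealloc;
    if (PyType_Ready(&PyDummyType) < 0)
        return;

    PyObject* m = Py_InitModule("_mylib", mylib_methods);
    if (m == NULL)
        return;

    Py_INCREF(&PyDummyType);
    PyModule_AddObject(m, "Dummy", (PyObject*)&PyDummyType);

    /* mylib's own init() that spawns its worker threads would be called here. */
}

/* If one of mylib's threads ever needs to call back into Python,
   the access must be bracketed with the GIL-state functions: */
static void callback_from_mylib_thread(void)
{
    PyGILState_STATE gstate = PyGILState_Ensure();
    /* ... use the Python C API here ... */
    PyGILState_Release(gstate);
}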

Thank you Thomas for pointing out the red herring. The problem was in the initialization of the threads on the C++ side.
And yes, it did not need any GIL manipulation, as none of the additional C++ threads were accessing the Python C API.

Related

Calling a Python 2.7 function in C++ using the default API?

Say I have a function
def pyfunc():
    print("ayy lmao")
    return 4
and I want to call it in c++
int j = (int)python.pyfunc();
how exactly would I do that?
You might want to have a look at this: https://docs.python.org/2/extending/extending.html
In order to call a Python function from C++, you have to embed Python
in your C++ application. To do this, you have to:
Load the Python DLL. How you do this is system dependent:
LoadLibrary under Windows, dlopen under Unix. If the Python DLL is
in the usual path you use for DLLs (%path% under Windows,
LD_LIBRARY_PATH under Unix), this will happen automatically if you try
calling any function in the Python C interface. Manual loading will
give you more control with regards to version, etc.
Once the library has been loaded, you have to call the function
Py_Initialize() to initialize it. You may want to call
Py_SetProgramName() or Py_SetPythonHome() first to establish the
environment.
Your function is in a module, so you'll have to load that:
PyImport_ImportModule. If the module isn't in the standard path,
you'll have to add its location to sys.path: use
PyImport_ImportModule to get the module "sys", then
PyObject_GetAttrString to get the attribute "path". The path
attribute is a list, so you can use any of the list functions to add
whatever is needed to it.
Your function is an attribute of the module, so you use
PyObject_GetAttrString on the module to get an instance of the
function. Once you've got that, you pack the arguments into a tuple or
a dictionary (for keyword arguments), and use PyObject_Call to call
it.
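As a rough illustration of those steps (Python 2 C API; it assumes pyfunc lives in an importable module called mymodule, which is a name I made up, and keeps error handling minimal):

#include <Python.h>

// Returns pyfunc()'s result as an int, or -1 on failure.
int call_pyfunc()
{
    Py_Initialize();                                    // start the interpreter

    int value = -1;
    PyObject* module = PyImport_ImportModule("mymodule");   // import mymodule
    if (module != NULL) {
        PyObject* func = PyObject_GetAttrString(module, "pyfunc"); // mymodule.pyfunc
        if (func != NULL) {
            PyObject* result = PyObject_CallObject(func, NULL);    // pyfunc()
            if (result != NULL) {
                value = (int)PyInt_AsLong(result);      // convert the return value
                Py_DECREF(result);
            }
            Py_DECREF(func);
        }
        Py_DECREF(module);
    }
    if (PyErr_Occurred())
        PyErr_Print();

    Py_Finalize();
    return value;
}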
All of the functions, and everything that is necessary, is documented
(extremely well, in fact) in https://docs.python.org/2/c-api/. You'll
be particularly interested in the sections on "Embedding Python" and
"Importing Modules", along with the more general utilities ("Object
Protocol", etc.). You'll also need to understand the general principles
with regards to how the Python/C API works—things like reference
counting and borrowed vs. owned references; you'll probably want to read
all of the sections in the Introduction first.
And of course, despite the overall quality of the documentation, it's
not perfect. A couple of times, I've had to plunge into the Python
sources to figure out what was going on. (Typically, when I'm getting
an error back from Python, to find out what it's actually complaining
about.)

How to share a lib between a process and a called script subprocess using SWIG?

I have a C++ program foobar which starts with main() and then the flow of control goes through a first part, then the second part of the program. If I change main to foobar_main, I can then compile the whole program and a SWIG Python wrapper to a shared library foobar.so, and import this to Python, call foobar_main from within Python and everything works fine.
The second part communicates with the first one by some respectable C++ constructs. Specifically: the first part creates some single objects of some classes, and the second part uses class static methods to get those objects.
Now I want to run only the first part from main() and the second part from Python. That is, I want to start the C++ program foobar and then after the first part is finished, run a Python script (programmatically from within C++) that continues with the second part.
To do this, I:
compile the second part and a SWIG wrapper to foobar2.so
replace the second part of C++ code with system("python foobar2.py")
compile the modified C++ program to foobar1.so and load to foobar
write the script foobar2.py which imports foobar1 and foobar2 and then equivalent to the second part
Then I attempt to run foobar. It does not work: the routines in the second part complain that certain steps which should have been done in the first part have not been done.
This is embarrassing, but obviously I have some deep flaws here in my understanding of how computers work :) Can somebody clue me in on what I am missing, including possibly simplifying the above process?
I'm going to assume your C++ code looks like this:
void part1()
{}
void part2()
{}
int main()
{
    part1();
    part2();
}
And that you have a Python version of part2() that is implemented with some other wrapped C++ functions. If these assumptions are wrong let me know.
I think the easiest way to go is to wrap part1() along with the other wrapped part2-related functions, then have a Python script like this:
import foobar
foobar.part1()
py_part2()
This of course means that the program starts in Python. If you need to start a C++ program for some reason (i.e. you need main()) then in order to use py_part2() you'll have to embed the Python interpreter inside your C++ program. This is a much more difficult and involved process, this answer has good info about how to get started.
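A very rough sketch of that embedding route (names are illustrative and error handling is minimal): run part1() natively, then execute the part-2 script inside the same process, so it shares the objects part1() created.

#include <Python.h>
#include <cstdio>

void part1();   // the existing C++ first part

int main()
{
    part1();                                       // first part runs natively

    Py_Initialize();                               // in-process interpreter
    FILE* script = std::fopen("foobar2.py", "r");
    if (script != NULL) {
        PyRun_SimpleFile(script, "foobar2.py");    // part 2, in the same process
        std::fclose(script);
    }
    Py_Finalize();
    return 0;
}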
Since you're learning, I'll explain why system("python foobar2.py") doesn't work. In this scheme you have your C++ program start another process (program), named python, and then wait for it to finish. These are two completely different programs that, in your case, don't talk to each other and don't share anything in common. That is why it doesn't work.
In general, reconsider anything that involves system(). Its primary use seems to be pointing out beginner programmers.

Generate a List/Set by preprocessor/template from different modules and has to be filled before linking

I have some global variables (class instances) that, as I understand it, are created before main() is called. I need some technique to add data, from any part of my code, to a list that is either global or a member of such a class, and it has to be filled before linking, so that it can be used inside those globally created classes. Is this possible? I did not find any similar solution or even a similar question, so any help will be appreciated.
Detailed example:
I have a singleton class with a variable that is filled in the constructor. All other classes use data from that class at run time, some early, some later. The singleton is created at the beginning of main(). I use this singleton in different projects; the data it holds is mostly universal for all projects and modules, except for a special flag that should mark what a given module requires. If the singleton throws an exception, the program exits, and we know it in the first steps of execution because the singleton is created in the first lines of main(). So the singleton has a lot of universal data for all projects, but I need to add flags that tell the singleton which data is required for each module. I have created a template for main(), so for each project I have the same main() with just a simple PROJECT_NAME define; later I add .cpp files to the project, and they have to register which data is required and which is not.
This explanation is not perfect; if it is not understandable, don't hesitate to say so and I will organize it more carefully.
Edited:
I'm still looking for a solution on the Internet, and I found constexpr; it looks like what I need, but at the moment it is not supported by all compilers. Is there some workaround?
The first step is to not use global variables, but instead use a static variable within a global function. The first time the function is called the variable is initialized, and the function returns a reference to it.
my_class& instance() {
    static my_class i;
    return i;
}
A list example:
#include <list>

std::list<my_class>& global_list() {
    static std::list<my_class> m;
    return m;
}

static int x = []() -> int { global_list().push_back(my_class()); return 0; }();
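For the original goal of registering data from different parts of the code, each translation unit can add its entries during static initialization with a small helper object (illustrative names, reusing global_list() and my_class from above):

// module_a.cpp -- registers itself with the shared list before main() runs.
namespace {
    struct auto_register {
        auto_register() { global_list().push_back(my_class()); }
    };
    auto_register register_module_a;   // constructed during static initialization
}

This works because global_list() uses a function-local static, which is initialized on first use, so the registration does not depend on static initialization order. One caveat: if such a translation unit sits in a static library and nothing else references it, the linker may discard it together with its registration.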
If you want to accomplish this from different modules, one way is to do the registration in some sort of callback function that runs when your DLL is loaded.
On Windows it is DllMain.
An optional entry point into a dynamic-link library (DLL). When the
system starts or terminates a process or thread, it calls the
entry-point function for each loaded DLL using the first thread of the
process. The system also calls the entry-point function for a DLL when
it is loaded or unloaded using the LoadLibrary and FreeLibrary
functions.
On other platforms, with gcc you can use __attribute__((constructor)):
constructor The constructor attribute causes the function to be called
automatically before execution enters main (). Similarly, the
destructor attribute causes the function to be called automatically
after main () has completed or exit () has been called. Functions with
these attributes are useful for initializing data that will be used
implicitly during the execution of the program. These attributes are
not currently implemented for Objective-C.
Warning
It is very easy to shoot yourself in the foot with this method. At least on Windows there are a lot of things you cannot do in the DLL entry point. Please read your compiler/OS documentation to see what the limitations are.
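As a minimal sketch of the gcc constructor-attribute route (again reusing the illustrative global_list()/my_class names from above):

// Runs automatically when the shared object is loaded, before main().
__attribute__((constructor))
static void register_this_module()
{
    global_list().push_back(my_class());
}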

GObject warning: cannot register existing type

I'm a GStreamer user/programmer, but I had never used GLib directly. Recently I decided to use GLib to build a simple GObject and take advantage of the signal implementation. (I'm a Windows programmer.)
I have developed a simple static library with the GObject definition and implementation. The main app links statically with this library and links dynamically with another library that is also linked statically with the first one.
If I call
DummyObj *dummy = (DummyObj *) g_object_new(DUMMY_OBJ_TYPE, NULL);
from the main app, it works, but if I try to build a DummyObj instance with the same call inside the dynamic library, it fails; in the output I can read:
cannot register exisiting type ...
g_once_init_leave_ assertion 'initialization_value != 0' failed
g_object_new: assertion 'G_TYPE_IS_OBJECT (object_tye)' failed
Conversely, if the dynamic library is the first one to call
DummyObj *dummy = (DummyObj *) g_object_new(DUMMY_OBJ_TYPE, NULL);
then when the main app calls this function afterwards, it fails with the same error.
It is as if the first "context" that initializes the type is the only one that can create instances of this kind of object.
I'm a little bit confused about this. In GStreamer I can create new plugins in my main app, inside other plugins and dynamic libraries, and I have never seen these errors.
I hope I have explained this well; English is not my native language and I think the issue is not easy to explain.
Thanks a lot
It seems that the first call to g_object_new in every context tries to register the TYPE in a hash table. The first one can register the TYPE, but the second always fails with the same error. Looking at the code I'm not able to detect why the second call tries to register the type again... the function check_type_name_I in gtype.c fails, but I don't know why g_type_register_static is called in both cases.
Before glib 2.32 you were required to initialize the thread system (used by the g_once_... family of functions) by calling g_thread_init() once (and only once). Furthermore, before glib 2.36 you had to initialize the type system with g_type_init().
Knowing that g_type_init():
internally calls g_thread_init by itself, protecting from multiple calls by checking g_thread_get_initialized() on glib < 2.32;
resolves to a nop function on glib >= 2.36;
I think that you can solve your issue in a backward compatible way by just calling g_type_init() at startup.
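A minimal sketch of that suggestion, guarded so it also compiles cleanly against newer glib where g_type_init() is deprecated (and a no-op anyway):

#include <glib-object.h>

int main()
{
#if !GLIB_CHECK_VERSION(2, 36, 0)
    g_type_init();   /* initializes the type (and, per the notes above, thread) system */
#endif

    /* ... g_object_new(DUMMY_OBJ_TYPE, NULL) can now be called from the main
       app or from the dynamic library ... */
    return 0;
}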

Python, Threads, the GIL, and C++

Is there some way to make boost::python control the Python GIL for every interaction with python?
I am writing a project with boost::python. I am trying to write a C++ wrapper for an external library, and control the C++ library with python scripts. I cannot change the external library, only my wrapper program. (I am writing a functional testing application for said external library)
The external library is written in C and uses function pointers and callbacks to do a lot of heavy lifting. It's a messaging system, so when a message comes in, a callback function gets called, for example.
I implemented an observer pattern in my library so that multiple objects could listen to one callback. I have all the major players exported properly and I can control things very well up to a certain point.
The external library creates threads to handle messages, send messages, processing, etc. Some of these callbacks might be called from different processes, and I recently found out that python is not thread safe.
These observers can be defined in python, so I need to be able to call into python and python needs to call into my program at any point.
I set up the object and observer like so:
class TestObserver( MyLib.ConnectionObserver ):
    def receivedMsg( self, msg ):
        print("Received a message!")

ob = TestObserver()
cnx = MyLib.Connection()
cnx.attachObserver( ob )
Then I create a source to send to the connection and the receivedMsg function is called.
So a regular source.send('msg') will go into my C++ app, go to the C library, which will send the message, the connection will get it, then call the callback, which goes back into my C++ library and the connection tries to notify all observers, which at this point is the python class here, so it calls that method.
And of course the callback is called from the connection thread, not the main application thread.
Yesterday everything was crashing; I could not send a single message. Then, after digging around in the C++-sig archives, I learned about the GIL and a couple of nifty functions to lock things up.
So my C++ python wrapper for my observer class looks like this now
struct IConnectionObserver_wrapper : Observers::IConnectionObserver, wrapper<Observers::IConnectionObserver>
{
    void receivedMsg( const Message* msg )
    {
        PyGILState_STATE gstate = PyGILState_Ensure();
        if( override receivedMsg_func = this->get_override( "receivedMsg" ) )
            receivedMsg_func( msg );
        Observers::IConnectionObserver::receivedMsg( msg );
        PyGILState_Release( gstate );
    }
};
And that WORKS, however, when I try to send over 250 messages like so
for i in range(250):
    source.send('msg')
it crashes again, with the same message and symptoms as before:
PyThreadState_Get: no current thread
so I am thinking that this time I have a problem calling into my C++ app, rather than calling into Python.
My question is, is there some way to make boost::python handle the GIL itself for every interaction with Python? I cannot find anything in the code, and it's really hard trying to find where the source.send call enters boost::python :(
I found a really obscure post on the mailing list that said to use
PyEval_InitThreads();
in BOOST_PYTHON_MODULE
and that actually seemed to stop the crashes.
It's still a crapshoot whether the program reports all the messages it got or not. If I send 2000, most of the time it says it got 2000, but sometimes it reports significantly fewer.
I suspect this might be due to the threads accessing my counter at the same time, so I am answering this question because that is a different problem.
To fix just do
BOOST_PYTHON_MODULE(MyLib)
{
    PyEval_InitThreads();
    // class_<...> and def(...) registrations as before
}
Don't know about your problem exactly, but take a look at CallPolicies:
http://www.boost.org/doc/libs/1_37_0/libs/python/doc/v2/CallPolicies.html#CallPolicies-concept
You can define new call policies (one call policy is "return_internal_reference" for instance) that will execute some code before and/or after the wrapped C++ function is executed. I have successfully implemented a call policy to automatically release the GIL before executing a C++ wrapped function and acquiring it again before returning to Python, so I can write code like this:
.def( "long_operation", &long_operation, release_gil<>() );
A call policy might help you in writing this code more easily.
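If writing a full CallPolicies class feels heavy, a simpler variant of the same idea (not the answer's actual release_gil implementation, just an illustrative RAII helper with made-up names) is to release and reacquire the GIL inside a thin wrapper function:

#include <boost/python.hpp>

// RAII helper: releases the GIL on construction, reacquires it on destruction.
class scoped_gil_release
{
public:
    scoped_gil_release()  { m_state = PyEval_SaveThread(); }
    ~scoped_gil_release() { PyEval_RestoreThread(m_state); }
private:
    PyThreadState* m_state;
};

void long_operation();            // the real C++ call, assumed defined elsewhere

void long_operation_wrapper()
{
    scoped_gil_release no_gil;    // other Python threads may run meanwhile
    long_operation();
}

BOOST_PYTHON_MODULE(MyLib)
{
    PyEval_InitThreads();         // make sure the GIL machinery is initialized
    boost::python::def("long_operation", &long_operation_wrapper);
}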
I think the best approach is to avoid the GIL and ensure your interaction with Python is single-threaded.
I'm designing a boost::python based test tool at the moment and think I'll probably use a producer/consumer queue to dispatch events from the multi-threaded libraries, to be read sequentially by the Python thread.
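A bare-bones sketch of that producer/consumer idea (illustrative names, C++11): the library's callback threads only push plain C++ events onto a locked queue, and the Python thread drains it, so Python objects are never touched from the library's threads.

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>

struct Event { std::string payload; };           // plain data, no Python objects

class EventQueue
{
public:
    void push(Event e)                           // called from library threads
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_events.push(std::move(e));
        m_cv.notify_one();
    }

    Event pop()                                  // called from the Python thread
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return !m_events.empty(); });
        Event e = std::move(m_events.front());
        m_events.pop();
        return e;
    }

private:
    std::queue<Event> m_events;
    std::mutex m_mutex;
    std::condition_variable m_cv;
};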