Program crashing with embedded Python/C++ code across DLL boundary in Windows - c++

Sorry for the long post. I've searched around quite a bit and couldn't find an answer for this so here it goes:
I am working developing a Python extension library using C++ (BoostPython). For testing, we have a Python-based test harness but I also want to add a separate C++ executable (eg. using BoostUnitTest or similar) to add further testing of the library including testing of the functionality that is not directly exposed to Python.
I am currently running this in Linux without problems. I am building the library and this then is dynamically linked to an executable that uses BoostUnitTest. Everything compiles and runs as expected.
In Windows though, I'm having problems. I think it might a problem with the registering of the C++->Python type converters across DLL boundaries.
To show the problem I have the following example:
In my library I have defined:
namespace bp = boost::python;
namespace bn = boost::numpy;
class DLL_API DummyClass
{
public:
static std::shared_ptr<DummyClass> Create()
{
return std::make_shared<DummyClass>();
}
static void RegisterPythonBindings();
};
void DummyClass::RegisterPythonBindings()
{
bp::class_<DummyClass>("DummyClass", bp::init<>())
;
bp::register_ptr_to_python< std::shared_ptr<DummyClass> >();
}
where DLL_API is the usual _declspec(…) for Windows. The idea is that this dummy class would be exported as part of a bigger Python module with
BOOST_PYTHON_MODULE(module)
{
DummyClass::RegisterPythonBindings();
}
From within the executable linking to the library I have (omitting includes, etc):
void main()
{
Py_Initialize();
DummyClass::RegisterPythonBindings();
auto myDummy = DummyClass::Create();
auto dummyObj = bp::object( myDummy );
}
The last line where I wrap myDummy within a boost::python::object crashes with an unhandled exception in Windows. The exception is being thrown from Python (throw_error_already_set). I believe (but could be wrong) that it is not finding an appropriate converter of the C++ type to Python, even though I made the call to register the bindings.
KernelBase.dll!000007fefd91a06d()
msvcr110.dll!000007fef7bde92c()
TestFromMain.exe!boost::python::throw_error_already_set(void)
TestFromMain.exe!boost::python::converter::registration::to_python(void const volatile *)
TestFromMain.exe!boost::python::converter::detail::arg_to_python_base::arg_to_python_base(void const volatile *,struct boost::python::converter::registration const &)
TestFromMain.exe!main() Line 66
TestFromMain.exe!__tmainCRTStartup()
kernel32.dll!0000000077a259cd()
ntdll.dll!0000000077b5a561()
As a test, I copied the exact same code defining the DummyClass all inside the executable just before the main function, instead of linking to the dll, and this works as expected.
Is my model of compiling as a DLL using embedded python in both sides of the boundary even possible in Windows (this is only used for a testing harness so I’d always use the exact same toolchain all over).
Thanks very much.

In case anyone ever reads this again, the solution in Windows was to compile Boost as dynamic libraries and link everything dynamically. We had to change the structure of our code a bit, but it now works.
There is a (small) reference in the Boost documentation stating that in Windows the dynamic lib version of Boost has one common register of types used for conversion between Python/C+. The doc doesn't mention not having a common register for the static lib version (but I now know it doesn't work).

Related

Can I include a DLL generated by GCC in a MSVC project?

I have a library of code I'm working on upgrading from x86 to x64 for a Windows application.
Part of the code took advantage of MSVC inline assembly blocks. I'm not looking to go through and interpret the assembly but I am looking to keep functionality from this part of the application.
Can I compile the functions using the inline assembly using GCC to make a DLL and link that to the rest of the library?
EDIT 1:(7/7/21) The flexibility with which compiler the project uses is open and I am currently looking into using Clang for use with MSVC.(also the Intel C++ compiler as another possibility) As stated in the first sentence it is a Windows application that I want to keep on Windows and the purpose of using another compiler is due to me 1.) not wanting to rewrite the large amount of assembly and 2.) because I know that MSVC does not support x64 inline assembly. So far clang seems to be working with a couple issues of how it declares comments inside of the assembly block and a few commands. The function is built around doing mathematical operations on a block of data, in what was supposed to be as fast as possible when it was developed but now that it works as intended I'm not looking to upgrade just maintain functionality. So, any compiler that will support inline assembly is an option.
EDIT 2:(7/7/21) I forgot to mention in the first edit, I'm not necessarily looking to load the 32-bit DLL into another process because I'm worried about copying data into an out of shared memory. I've done a similar solution for another project but the data set is around 8 MB and I'm worried that slow copy times for the function would cause the time constraint on the math to cause issues in the runtime of the application.(slow, laggy, and buffering are effects I'm trying to avoid.) I'm not trying to make it any faster but it definitely can't get any slower.
In theory, if you manage to create a plain C interface for that DLL (all exported symbols from DLL are standard C functions) and don't use memory management functions across "border" (no mixed memory management) then you should be able to dynamically load that DLL from another another (MSVC) process and call its functions, at least.
Not sure about statically linking against it... probably not, because the compiler and linker must go hand in hand (MSVC compiler+MSVC linker or GCC compiler+GCC linker) . The output of GCC linker is probably not compatible with MSVC at least regarding name mangling.
Here is how I would structure it (without small details):
Header.h (separate header to be included in both DLL and EXE)
//... remember to use your preferred calling convention but be consistent about it
struc Interface{
void (*func0)();
void (*func1)(int);
//...
};
typedef Interface* (*GetInterface)();
DLL (gcc)
#include "Header.h"
//functions implementing specific functionality (not exported)
void f0)(){/*...*/}
void f1)(int){/*...*/}
//...
Interface* getInterface(){//this must be exported from DLL (compiler specific)
static Interface interface;
//initialize functions pointers from interface with corresponding functions
interface.func0 = &f0;
interface.func1 = &f1;
//...
return &interface;
}
EXE (MSVC)
#include "Header.h"
int main(){
auto dll = LoadLibrary("DLL.dll");
auto getDllInterface = (GetInstance)GetProcAddress(dll, "getInterface");
auto* dllInterface = getDllInterface();
dllInterface->func0();
dllInterface->func1(123);
//...
return 0;
}

How to dynamically register class in a factory class at runtime period with c++

Now, I implemented a factory class to dynamically create class with a idenification string, please see the following code:
void IOFactory::registerIO()
{
Register("NDAM9020", []() -> IOBase * {
return new NDAM9020();
});
Register("BK5120", []() -> IOBase * {
return new BK5120();
});
}
std::unique_ptr<IOBase> IOFactory::createIO(std::string ioDeviceName)
{
std::unique_ptr<IOBase> io = createObject(ioDeviceName);
return io;
}
So we can create the IO class with the registered name:
IOFactory ioFactory;
auto io = ioFactory.createIO("BK5120");
The problem with this method is if we add another IO component, we should add another register code in registerIO function and compile the whole project again. So I was wondering if I could dynamically register class from a configure file(see below) at runtime.
io_factory.conf
------------------
NDAM9020:NDAM9020
BK5120:BK5120
------------------
The first is identification name and the second is class name.
I have tried with Macros, but the parameter in Macros cann't be string. So I was wondering if there is some other ways. Thanks for advance.
Update:
I didn't expect so many comments and answers, Thank you all and sorry for replying late.
Our current OS is Ubuntu16.04 and we use the builtin compiler that is gcc/g++5.4.0, and we use CMake to manage the build.
And I should mention that it is not a must that I should register class at runtime period, it is also OK if there is a way to do this in compile period. What I want is just avoiding the recompiling when I want to register another class.
So I was wondering if I could dynamically register class from a configure file(see below) at runtime.
No. As of C++20, C++ has no reflection features allowing it. But you could do it at compile time by generating a simple C++ implementation file from your configuration file.
How to dynamically register class in a factory class at runtime period with c++
Read much more about C++, at least a good C++ programming book and see a good C++ reference website, and later n3337, the C++11 standard. Read also the documentation of your C++ compiler (perhaps GCC or Clang), and, if you have one, of your operating system. If plugins are possible in your OS, you can register a factory function at runtime (by referring to to that function after a plugin providing it has been loaded). For examples, the Mozilla firefox browser or recent GCC compilers (e.g. GCC 10 with plugins enabled), or the fish shell, are doing this.
So I was wondering if I could dynamically register class from a configure file(see below) at runtime.
Most C++ programs are running under an operating system, such as Linux. Some operating systems provide a plugin mechanism. For Linux, see dlopen(3), dlsym(3), dlclose(3), dladdr(3) and the C++ dlopen mini-howto. For Windows, dive into its documentation.
So, with a recent C++ implementation and some recent operating systems, y ou can register at runtime a factory class (using plugins), and you could find libraries (e.g. Qt or POCO) to help you.
However, in pure standard C++, the set of translation units is statically known and plugins do not exist. So the set of functions, lambda-expressions, or classes in a given program is finite and does not change with time.
In pure C++, the set of valid function pointers, or the set of valid possible values for a given std::function variable, is finite. Anything else is undefined behavior. In practice, many real-life C++ programs accept plugins thru their operating systems or JIT-compiling libraries.
You could of course consider using JIT-compiling libraries such as asmjit or libgccjit or LLVM. They are implementation specific, so your code won't be portable.
On Linux, a lot of Qt or GTKmm applications (e.g. KDE, and most web browsers, e.g. Konqueror, Chrome, or Firefox) are coded in C++ and do load plugins with factory functions. Check with strace(1) and ltrace(1).
The Trident web browser of MicroSoft is rumored to be coded in C++ and probably accepts plugins.
I have tried with Macros, but the parameter in Macros can't be string.
A macro parameter can be stringized. And you could play x-macros tricks.
What I want is just avoiding the recompiling when I want to register another class.
On Ubuntu, I recommend accepting plugins in your program or library
Use dlopen(3) with an absolute file path; the plugin would typically be passed as a program option (like RefPerSys does, or like GCC does) and dlopen-ed at program or library initialization time. Practically speaking, you can have lots of plugins (dozen of thousands, see manydl.c and check with pmap(1) or proc(5)). The dlsym(3)-ed C++ functions in your plugins should be declared extern "C" to disable name mangling.
A single C++ file plugin (in yourplugin.cc) can be compiled with g++ -Wall -O -g -fPIC -shared yourplugin.cc -o yourplugin.so and later you would dlopen "./yourplugin.so" or an absolute path (or configure suitably your $LD_LIBRARY_PATH -see ld.so(8)- and pass "yourplugin.so" to dlopen). Be also aware of Rpath.
Consider also (after upgrading your GCC to GCC 9 at least, perhaps by compiling it from its source code) using libgccjit (it is faster than generating temporary C++ code in some file and compiling that file into a temporary plugin).
For ease of debugging your loaded plugins, you might be interested by Ian Taylor's libbacktrace.
Notice that your program's global symbols (declared as extern "C") can be accessed by name by passing a nullptr file path to dlopen(3), then using dlsym(3) on the obtained handle. You want to pass -rdynamic -ldl when linking your program (or your shared library).
What I want is just avoiding the recompiling when I want to register another class.
You might registering classes in a different translation unit (a short one, presumably). You could take inspiration from RefPerSys multiple #include-s of its generated/rps-name.hh file. Then you would simply recompile a single *.cc file and relink your entire program or library. Notice that Qt plays similar tricks in its moc, and I recommend taking inspiration from it.
Read also J.Pitrat's book on Artificial Beings: the Conscience of a Conscious Machine ISBN which explains why a metaprogramming approach is useful. Study the source code of GCC (or of RefPerSys), use or take inspiration from SWIG, ANTLR, GNU bison (they all generate C++ code) when relevant
You seem to have asked for more dynamism than you actually need. You want to avoid the factory itself having to be aware of all of the classes registered in it.
Well, that's doable without going all the way runtime code generation!
There are several implementations of such a factory; but I am obviously biased in favor of my own: einpoklum's Factory class (gist.github.com)
simple example of use:
#include "Factory.h"
// we now have:
//
// template<typename Key, typename BaseClass, typename... ConstructionArgs>
// class Factory;
//
#include <string>
struct Foo { Foo(int x) { }; }
struct Bar : Foo { Bar(int x) : Foo(x) { }; }
int main()
{
util::Factory<std::string, Foo, int> factory;
factory.registerClass<Bar>("key_for_bar");
auto* my_bar_ptr factory.produce("key_for_bar");
}
Notes:
The std::string is used as a key; you could have a factory with numeric values as keys instead, if you like.
All registered classes must be subclasses of the BaseClass value chosen for the factory. I believe you can change the factory to avoid that, but then you'll always be getting void *s from it.
You can wrap this in a singleton template to get a single, global, static-initialization-safe factory you can use from anywhere.
Now, if you load some plugin dynamically (see #BasileStarynkevitch's answer), you just need that plugin to expose an initialization function which makes registerClass() class calls on the factory; and call this initialization function right after loading the plugin. Or if you have a static-initialization safe singleton factory, you can make the registration calls in a static-block in your plugin shared library - but be careful with that, I'm not an expert on shared library loading.
Definetly YES!
Theres an old antique post from 2006 that solved my life for many years. The implementation runs arround having a centralized registry with a decentralized registration method that is expanded using a REGISTER_X macro, check it out:
https://web.archive.org/web/20100618122920/http://meat.net/2006/03/cpp-runtime-class-registration/
Have to admit that #einpoklum factory looks awesome also. I created a headeronly sample gist containing the code and a sample:
https://gist.github.com/h3r/5aa48ba37c374f03af25b9e5e0346a86

DLL fails to load if unused ref class is removed

I'm running into a very strange problem trying to compile and use a windows runtime component within an UWP application (VS2017 community 15.9.13 with NetCore.UniversalWindowsPlatform 6.2.8, compiled without /clr but with /ZW).
It is basically something like the Grayscaletransform. The runtime component is actually working as expected, now I wanted to remove some unused code. However, as soon as I remove it from a particular file and recompile, it indeed compiles, links, but the DLL does not load any more.
Here's some example code that I have to put in:
ref class DummyU sealed
{
public:
DummyU() {}
};
DummyU^ CreateDummyU()
{
return ref new DummyU();
}
The code just makes it work, although it is a) not referenced at all and b) does not do anything useful.
The result of removing it:
Exception thrown at 0x0EFF322F (vccorlib140d_app.dll) in TestAppUWP.exe: 0xC0000005: Access violation reading location 0x00000000.
in
STDAPI DllGetActivationFactory(_In_ HSTRING activatibleClassId, _Deref_out_ IActivationFactory** ppFactory)
{
return Platform::Details::GetActivationFactory(Microsoft::WRL::Details::ModuleBase::module_, activatibleClassId, ppFactory);
}
function in dllexports.cpp which is part of VS. The module_ becomes NULL.
Does anyone have an idea if there are any known bugs with respect to the windows runtime not being initialized/used properly if there is no explicit instantiation of a ref class in a file?
EDIT 1:
Here's the link to the full source code:
What's happening here is that you're mixing modes a bit. Since you've compiled your C++ code with the /CX flag, you've told the compiler to enable winrt extensions to produce a WinRT DLL. In practice though, none of your code is actually using the CX extensions to implement classes. You're using WRL and standards C++. The compiler looks at the DLL, finds no CX-style WinRT classes, and doesn't set up the module, accordingly.
This is basically an untested & unsupported case, since you've chosen to say that you want to expose ref classes by picking a UWP component library project type, but then didn't actually provide any ref classes. It happens that under the hood, /CX effectively uses WRL, so you can nudge it along and initialize the state to work correctly, but you're kinda hacking the implementation details of the system.
There are two options I would recommend, either works: just make the project a non-CX Win32 DLL and init the module as described above. Or, better yet, flip over to C++ /WinRT, which will give you better support for the WinRT types than /CX and allow you to more easily mix in the classic COM types in your implementation. You can get started by just turning off the /CX flag in the compiler switches, then start updating the code accordingly.
Ben
You might have wrong .winmd file for your component. WinRT components made in C++ produce two outputs, dll and winmd. Both must match. It's possible you have them from different builds.
Another possible reason is error in manifest. The manifest of the app must include all referenced runtime components.
BTW, for native DLLs written in classic C++ and exposing C API, deployment is simpler, you include a DLL in the package and they just work, with [DllImport] if you're consuming them from C#.
Update: You can replace that ref class with the following code, works on my PC.
struct ModuleStaticInitialize
{
ModuleStaticInitialize()
{
Microsoft::WRL::Module<Microsoft::WRL::InProc>::GetModule();
}
};
static ModuleStaticInitialize s_moduleInit;
Probably a bug in Microsoft's runtime somewhere.

How does it work and compile a C++ extension of TCL with a Macro and no main function

I have a working set of TCL script plus C++ extension but I dont know exactly how it works and how was it compiled. I am using gcc and linux Arch.
It works as follows: when we execute the test.tcl script it will pass some values to an object of a class defined into the C++ extension. Using these values the extension using a macro give some result and print some graphics.
In the test.tcl scrip I have:
#!object
use_namespace myClass
proc simulate {} {
uplevel #0 {
set running 1
for {} {$running} { } {
moveBugs
draw .world.canvas
.statusbar configure -text "t:[tstep]"
}
}
}
set toroidal 1
set nx 100
set ny 100
set mv_dist 4
setup $nx $ny $mv_dist $toroidal
addBugs 100
# size of a grid cell in pixels
set scale 5
myClass.scale 5
The object.cc looks like:
#include //some includes here
MyClass myClass;
make_model(myClass); // --> this is a macro!
The Macro "make_model(myClass)" expands as follows:
namespace myClass_ns { DEFINE_MYLIB_LIBRARY; int TCL_obj_myClass
(mylib::TCL_obj_init(myClass),TCL_obj(mylib::null_TCL_obj,
(std::string)"myClass",myClass),1); };
The Class definition is:
class MyClass:
{
public:
int tstep; //timestep - updated each time moveBugs is called
int scale; //no. pixels used to represent bugs
void setup(TCL_args args) {
int nx=args, ny=args, moveDistance=args;
bool toroidal=args;
Space::setup(nx,ny,moveDistance,toroidal);
}
The whole thing creates a cell-grid with some dots (bugs) moving from one cell to another.
My questions are:
How do the class methods and variables get the script values?
How is possible to have c++ code and compile it without a main function?
What is that macro doing there in the extension and how it works??
Thanks
Whenever a command in Tcl is run, it calls a function that implements that command. That function is written in a language like C or C++, and it is passed in the arguments (either as strings or Tcl_Obj* values). A full extension will also include a function to do the library initialisation; the function (which is external, has C linkage, and which has a name like Foo_Init if your library is foo.dll) does basic setting up tasks like registering the implementation functions as commands, and it's explicit because it takes a reference to the interpreter context that is being initialised.
The implementation functions can do pretty much anything they want, but to return a result they use one of the functions Tcl_SetResult, Tcl_SetObjResult, etc. and they have to return an int containing the relevant exception code. The usual useful ones are TCL_OK (for no exception) and TCL_ERROR (for stuff's gone wrong). This is a C API, so C++ exceptions aren't allowed.
It's possible to use C++ instance methods as command implementations, provided there's a binding function in between. In particular, the function has to get the instance pointer by casting a ClientData value (an alias for void* in reality, remember this is mostly a C API) and then invoking the method on that. It's a small amount of code.
Compiling things is just building a DLL that links against the right library (or libraries, as required). While extensions are usually recommended to link against the stub library, it's not necessary when you're just developing and testing on one machine. But if you're linking against the Tcl DLL, you'd better make sure that the code gets loaded into a tclsh that uses that DLL. Stub libraries get rid of that tight binding, providing pretty strong ABI stability, but are little more work to set up; you need to define the right C macro to turn them on and you need to do an extra API call in your initialisation function.
I assume you already know how to compile and link C++ code. I won't tell you how to do it, but there's bound to be other questions here on Stack Overflow if you need assistance.
Using the code? For an extension, it's basically just:
# Dynamically load the DLL and call the init function
load /path/to/your.dll
# Commands are all present, so use them
NewCommand 3
There are some extra steps later on to turn a DLL into a proper Tcl package, abstracting code that uses the DLL away from the fact that it is exactly that DLL and so on, but they're not something to worry about until you've got things working a lot more.

How to compile app in C with module?

I want to do application, which can be compiled with external modules, for example like in php. In php you can load modules in runtime, or compile php with modules together, so modules are available without loading in runtime. But i don't understand how this can be done. If i have module in module.c and there is one function, called say_hello, how can i register it to main application, if you understand what i mean?
/* module.c */
#include <stdio.h>
// here register say_hello function, but how, if i can't in global scope
// call another function?
void say_hello()
{
printf("hello!");
}
If i compile all that files(main app + modules) together, there isn't some reference to say_hello function from main app, because it is called only if user call it in its code. So how can i say to my app, hey, there is say_hello function, if someone want to call it, you know it exists.
EDIT1: I need to have something like table at runtime, where i can see if user called function exists (have C equivavent). Header files doesn't help to me.
EDIT2: My app is interpret for my script langugage.
EDIT3: If someone call function in php, php interpret must know that function exists. I know about dynamic linking and if .so or .dll is loaded, then some start routine is called and you can simple register function in that dll, so php interpret can see, if some module registred for example function called "say_hello". But if i want compile php with for example gd support, then how gd functions are registred to some php function list, hashtable or whatever?
I guess what you are looking for is dynamic libraries (we call runtime loadable modules as dynamic/shared libraries in C and in the OS world, in general). Take, for example, Pidgin which supports plugins to extend it's functionalities. It gives a particular interface to it's plugin-makers to abide by, say functions to register, load, unload and use, which the plugins will have to follow.
When the program loads, it looks for such dynamic libraries in it's plugins directory, if present, it'll load and use it, else it'll skip exposing the functionality. The reason why an interface is needed is that since different modules can have different functionalities which are unknown uptil runtime, an app. has to have a common, agreed-upon way of "talking" to it's plugins/modules.
Every C program can be linked to a static or a dynamic library; static will copy the code from the library to the said program, there by leaving no dependencies for the program to run, while linking to a dynamic library expects the dynamic library to be present when the program is launched. A third way of doing it, is not to link to a DLL, but just asking the OS to perform a load operation of the library. If this succeeds, then the dynamic module is used, else ignored. The functionality which the dynamic library should perform is exposed to the user, only if the load call succeeds.
It is to be noted that this is a operating system provided feature and it has nothing to do with the language used (C or C++ or Python doesn't matter here); as far as C is concered, the compiler still links to known code i.e. code which is available # compile time. This is the reason for different operating system, one needs to write different code to load a dynamic module. Even more, the file type/format of syuch libraries vary from system to system. In Linux it's called shared objects (.so), in Mac it's called dynamic libraries (.dylib) and in Windows as Dynamic link libraries (.dll).
C is not interpreted language. So you need linking, you may want static linking or dynamic linking.
Program building consists of 2 major phases: compiling and linking. During compiling all c-files are translated into machine code, leaving called functions unresolved (obj or o files). Then linker merges all these files into one executable, resolving what was unresolved.
This is static linking. Linked module becomes integral part of executable.
Dynamic linking is platform specific. Under windows these are DLLs. You should issue a system call to load DLL after which you will be able to call functions from it.
What you need is dynamic library. Let's first take a look at the example provided in the Linux manpage of dlopen(3):
/* Load the math library, and print the cosine of 2.0: */
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
int main(int argc, char **argv) {
void *handle;
double (*cosine)(double);
char *error;
handle = dlopen("libm.so", RTLD_LAZY);
if (!handle) {
fprintf(stderr, "%s\n", dlerror());
exit(EXIT_FAILURE);
}
dlerror(); /* Clear any existing error */
/* Writing: cosine = (double (*)(double)) dlsym(handle, "cos");
would seem more natural, but the C99 standard leaves
casting from "void *" to a function pointer undefined.
The assignment used below is a workaround. */
*(void **) (&cosine) = dlsym(handle, "cos");
if ((error = dlerror()) != NULL) {
fprintf(stderr, "%s\n", error);
exit(EXIT_FAILURE);
}
printf("%f\n", (*cosine)(2.0));
dlclose(handle);
exit(EXIT_SUCCESS);
}
There's also a C++ dlopen mini HOWTO.
For more general information about dynamic loading, start from the wikipedia page first.
I think it is impossible, if i understand what you mean. Because it is compiled language.