Using Python CFFI with a .lib, a bunch of .dlls, and .h files - c++

I need to write a Python 2 wrapper for a proprietary library that consists of a few .h files (I merged them into one big header), a bunch of .dlls, and one .lib file to link all this stuff.
I think I need the API level, because of all the `typedef`s in the .h files.
The script that creates the wrapper, build_wrapper.py:
import os

from cffi import FFI

ffibuilder = FFI()
ffibuilder.set_unicode(enabled_flag=True)

curdir = os.path.dirname(os.path.abspath(__file__))
with open(os.path.join(curdir, 'include', 'ScadWrapper.h'), 'r') as f:
    source = f.read()

ffibuilder.set_source('_wrapper', source,
                      extra_link_args=[r'C:\Documents\python\pyScadApi\pyScadApi\include\SCADAPIX.lib'],
                      source_extension='.cpp')

if __name__ == '__main__':
    ffibuilder.compile(verbose=True)
This runs without errors:
Creating library .\Release\_wrapper.lib and object .\Release\_wrapper.exp
But, for example,
from _wrapper import ffi, lib
lp_api = ffi.new('ScadAPI *')
r = lib.ApiCreate(lp_api)
Fails with
lp_api = ffi.new('ScadAPI *')
ffi.error: undefined type name
ScadAPI *
ScadAPI is defined as
struct APIHandle_tag;
typedef APIHandle_tag * ScadAPI;
in ScadWrapper.h

You are never calling ffibuilder.cdef(). That's why the lib object is empty: it doesn't know about any type or function.
Sorry about being brief. My point is that the basics are explained at http://cffi.readthedocs.io/en/latest/overview.html. Following the "real example" there, the idea is to write in the cdef() only the pieces that are interesting to you, one function after the other, along with the type declarations, with suitable usage of `...;`. Every function or type you write in cdef() becomes available for calls (via lib.the_function_name()) or for ffi operations (ffi.new(), etc.).
There are different approaches to cffi (not directly documented or supported) that try to expose a whole large library at once, without needing any function-by-function work, however small that is. The idea is to extract from a .h file (possibly preprocessed with gcc -E) something that can be accepted by cdef(). The drawback, particularly if you use gcc -E, is that the result is likely to work only on your exact OS. Moreover, such an approach appears faster but often isn't: it seems to avoid per-function work, but that is never really true, because if you are going to use a function somewhere, then you'll take the time to write that function call anyway.
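For the types in the question, a minimal sketch of that cdef() could look like the following; the long return type of ApiCreate is an assumption for illustration and must be copied from the real SCADAPIX header:
from cffi import FFI

ffibuilder = FFI()
# Declare only the pieces you need. ApiCreate's 'long' return type
# below is a guess; take the real declaration from ScadWrapper.h.
ffibuilder.cdef("""
    typedef struct APIHandle_tag *ScadAPI;
    long ApiCreate(ScadAPI *);
""")
With those declarations in place, ffi.new('ScadAPI *') resolves and lib.ApiCreate() becomes callable.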


Load a ctypes dlopen handle with cffi

I have a weird case where I absolutely need to open a library using ctypes' _dlopen (an intercepted method that uses a weird memory layout), and then, so I can use it in a simpler way (with my header definitions), I need to load the handle (i.e. the buffer) via cffi.
Here is the code I have so far. My issue is: how can I pass the handle, which is just an int (a.k.a. the pointer address), as a void * to cffi without it screaming about it (remember, it needs an object of instance CData)?
from ctypes import CDLL, RTLD_GLOBAL, cast, c_void_p
import _cffi_backend

ffi = _cffi_backend.FFI()

ctypes_lib = CDLL("./_native__lib.so", mode=RTLD_GLOBAL)
print(ctypes_lib)

handle = ctypes_lib._handle
print(handle)

# handle = cast(handle, POINTER(ffi.CData))  # no such ctypes pointer type
handle = cast(handle, c_void_p)
print(handle)

lib = ffi.dlopen(handle)  # fails: cffi wants its own cdata 'void *', not a ctypes pointer
TL;DR: ffi.dlopen(ctypes._dlopen('lib.so')._handle)
NOTE: This is done in Python 2.7, so some of the hacks in the cffi source code will work (but this does not affect the core of the problem).
For reference, here is the relevant paragraph of the cffi documentation (https://cffi.readthedocs.io/en/latest/cdef.html?highlight=dlopen#id11):
New in version 1.14: ffi.dlopen(handle): instead of a file path, you can give an already-opened library handle, as a cdata of type void *. Such a call converts this handle into a regular FFI object with the functions and global variables declared by ffi.cdef(). Useful if you have special needs (e.g. you need the GNU extension dlmopen(), which you can itself declare and call using a different ffi object). Note that in this variant, dlclose() is not called automatically if the FFI object is garbage-collected (but you can still call ffi.dlclose() explicitly if needed).
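Following that paragraph, the missing step is to turn the integer handle into a cffi cdata of type void * with ffi.cast, which accepts a plain integer for pointer types. A minimal sketch, reusing the library name from the question:
from cffi import FFI
from ctypes import CDLL, RTLD_GLOBAL

ffi = FFI()
# ffi.cdef() declarations for the functions you need would go here.

ctypes_lib = CDLL("./_native__lib.so", mode=RTLD_GLOBAL)

# Convert the integer handle into a cdata 'void *', which is what
# ffi.dlopen() (cffi >= 1.14) expects instead of a file path.
handle = ffi.cast("void *", ctypes_lib._handle)
lib = ffi.dlopen(handle)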

Why am I getting some of my python functions, when I import my module, but not others?

I'm writing function libraries in Python 2.7.8, to use in some UAT testing using froglogic Squish. It's for my employer, so I'm not sure how much I can share and still conform to company privacy regulations.
Early in the development, I put some functions in some very small files. There was one file that contained only a single function. I could import the file and use the function with no problem.
I am at a point where I want to consolidate some of those tiny files into a larger file. For some reason that completely eludes me, some of the functions that I copied/pasted into this larger file are not being found, and a "NameError: global name 'My_variableStringVerify' is not defined" error is displayed, for example. (I just added the "My_", in case there was a name collision with some other function...)
This worked with the EXACT same simple function in a separate 'module'. Other functions in this Python file, appearing both before and after this function in the new, expanded module, are being found and used without problems. The only module this function needs is re, and I am importing that. I deleted all the .pyc files in the directory, in case one was not getting updated (I'm pretty sure it was, from the datetime on the .pyc file).
I have created and used dozens of functions in a dozen of my 'library modules', all with no issues. What's so special about this trivial, piece of crap function, as a part of a different module? It worked before, and it STILL works -- as long as I do not try to use it from the new library module.
I'm no Python guru, but I have been doing this kind of thing for years...
Ugh. What a fool. The answer was in the error, after all: "global name xxx is not found". I was trying to use the function directly inside a Squish API call, which runs in the global scope. Once I moved the call to my function outside of the Squish API call (using it in the local scope), it worked fine.
The detail that surprised me: I was using "from foo import *", in both cases (before and after adding it to another 'library' module of mine).
When this one function was THE ONLY function in foo, I was able to use it in the global scope successfully.
When it was just one of many functions in foo-extended (names have been changed, to protect the innocent), I could NOT use it in the global scope. I had to reference it in the local scope.
After spending more time reading https://docs.python.org/2.0/ref/import.html (yes, it's old), I'm surprised it appeared in the global scope in either case. That page did state that "(The current implementation does not enforce the latter two restrictions, but programs should not abuse this freedom, as future implementations may enforce them or silently change the meaning of the program.)" about scope restrictions with the "from foo import *" statement.
I guess I found an edge case that somehow skirted the restriction in this implementation.
Still... what a maroon! It verifies my statement that I am no Python guru.

Using C++ classes in LLVM Modules

Based on the Kaleidoscope and Kaleidoscope with MCJIT tutorials, I have code to create a Module and function and call it using MCJIT. The function needs a prototype:
auto ft = llvm::FunctionType::get(llvm::Type::getInt32Ty(Context), argTypes, false);
However, the example only covers doubles as parameters and return values (the code above uses an int). To do anything advanced, you need to pass things like classes and containers.
How do you use existing C++ classes in the module?
Sure, you can link to any library you want, but you need to declare function prototypes to use them. If the library API has classes, how do you declare them?
What I want is something like this:
auto ft = llvm::FunctionType::get(llvm::Type::getStructTy("class.std::string"), argTypes, false);
where class.std::string has been imported from string.h.
The LLVM API only has primitive types. You can define structs to represent the classes, but this is way too hard to do manually (and not portable).
A way to do it might be to compile the class to bitcode and read it into a module, but I want to avoid temporary files if possible. Also, I'm not sure how to extract the type from the module, but it should be possible. I tried this on a header file of one of my classes (I renamed the header file to a .cpp file, otherwise clang would turn it into a .gch precompiled header) and the result was just a constant... maybe it was optimised out? I tried it on the .cpp file and it resulted in 36,000 lines of code...
Then I found this page. Instead of using the LLVM API, I should use the Clang API because Clang, as a compiler, can compile the code into a Module. Then I can use the LLVM API with the imported Modules. Is this the right way to go? Any working source code is appreciated because it took forever just to get function calling working (the tutorials are out of date and documentation is scarce).
The way I would do it is to compile the class to LLVM IR and then link the two modules. Then there are two options to extract the type from the module:
First, you can use the llvm::TypeFinder. The way you use it is by creating it, and then calling run() on it with the module as an argument. This code snippet will print out all of the types in the module:
llvm::TypeFinder type_finder;
type_finder.run(module, /*onlyNamed=*/true);
for (auto *t : type_finder) {
    std::cout << t->getName().str() << std::endl;
}
Alternatively, it's possible to use Module's getIdentifiedStructTypes() method and iterate over the resulting vector in the same way as above.

Registering each C/C++ source file to create a runtime list of used sources

For a debugging and logging library, I want to be able to find, at runtime, a list of all of the source files that the project has compiled and linked. I assume I'll be including some kind of header in each source file, and the preprocessor __FILE__ macro can give me a character constant for that file, so I just need to somehow "broadcast" that information from each file to be gathered by a runtime function.
The question is how to do this elegantly, and especially whether it can be done from C as opposed to C++. In C++ I'd probably try to make a class with static storage to hold the list of filenames. Each header file would create a file-local static instance of that class, which on creation would append the __FILE__ pointer or whatever into the class's static data members, perhaps as a linked list.
But I don't think this will work in C, and even in C++ I'm not sure it's guaranteed that each element will be created.
I wouldn't do that sort of thing right in the code. I would write a tool which parses the project file (vcproj, makefile, or even just scans the project directory for *.c* files) and generates an additional C source file which contains the names of all the source files in some kind of pre-initialized data structure.
I would then make that tool part of the build process so that every time you do a build this would all happen automatically. At run time, all you would have to do is read that data structure that was built.
I agree with Ferruccio, the best way to do this is in the build system, not the code itself. As an expansion of his idea, add a target to your build system which dumps a list of the files (which it has to know anyway) to a C file as a string, or array of strings, and compile this file into your source. This avoids a lot of complication in the source, and is expandable, if you want to add additional information, like the version number from your source code control system, who built the executable, etc.
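As a concrete illustration, here is a minimal sketch of such a generator step in Python; the output file name source_list.c and the array name g_source_files are made-up names, and the scan rules would need adapting to a real build:
#!/usr/bin/env python
# Sketch of a pre-build step: scan the source tree and generate a C file
# holding a pre-initialized, NULL-terminated array of source file names.
import os

sources = []
for root, dirs, files in os.walk('.'):
    for name in files:
        if name.endswith(('.c', '.cpp', '.cc')):
            sources.append(os.path.join(root, name))

with open('source_list.c', 'w') as out:
    out.write('const char *g_source_files[] = {\n')
    for path in sorted(sources):
        out.write('    "%s",\n' % path.replace('\\', '\\\\'))
    out.write('    0\n};\n')
The generated file is compiled and linked like any other source, and the runtime "list of used sources" is then just the g_source_files array.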
There is a standard way on UNIX and Linux: ident. For every source file you create an ID tag, usually assigned by your version control system, e.g. via SVN keywords.
Then, to find out the name and revision of each source file, you just use the ident command. If you need to do it at runtime, check out how ident does it; its source should be freely available.
There's no way to do it in C. In C++ you can create a class like this:
struct Reg {
    Reg( const char * file ) {
        StaticDictionary::Register( file );
    }
};
where StaticDictionary is a singleton container for all your file names. Then in each source file:
static Reg regthisfile( __FILE__ );
You would want to make the dictionary a Meyers singleton to avoid order of creation problems.
I don't think you can do this in the way you outline, in a "passive" mode. That is, you are going to have to somehow run code for each source file to be added to the registry, and it's hard to get that to happen automatically.
Of course, it's possible to make that code very unobtrusive using macros. It might be problematic for C source files that don't have an "entry point", so if your code isn't already organised as "modules", with e.g. an init() function for each module, it might be hard. Static initializing code might be possible; I'm not 100% sure whether the order in which things are initialized creates problems here.
Using static storage in the registry module sounds like an excellent idea, a plain linked list or simple hash table should be easy enough to implement, if your project doesn't already include any general-purpose utility library.
In C++ your solution will work. It's guaranteed.
Edit: I just found a solution: change the rule in your makefile to add -include "cfiles_register.h" to each g++ invocation:
%.o : %.cpp
	$(CC) -include 'cfiles_register.h' -o $@ $<
Then put the implementation you proposed in the question into that cfiles_register.h.
Using static instances in C++ would work fine.
You could do this in C as well, but you would need to use runtime-specific features; for the MSVC CRT, take a look at http://www.codeguru.com/cpp/misc/misc/threadsprocesses/article.php/c6945/
For C, you could do it with a macro: define a variable whose name corresponds to your file, and then scan the symbols of your executable. Just as an idea:
#define TRACK_FILE(name) char _file_tracker_##name;
use it in your my_c_file.c like this:
TRACK_FILE(my_c_file_c)
and then grep all the file/variable names from the binary like this:
nm my-binary | grep _file_tracker
Not really nice, but...
Horrible idea, I'm sure, but: use a singleton, and in each file do something like
Singleton.register(__FILE__);
at global scope. It'll only work in .cpp files, though.
I did something like this years ago as a novice, and it worked. But I'd cringe to do it now. I'd add a build step now.
I agree with those who say that it is better to avoid doing this at run time, but at file scope you can initialize a static variable with a function call (this works in C++; standard C requires constant initializers for statics), that is, in every file:
static int doesntmatter = register_file( __FILE__ );
(Note that register itself is a reserved keyword, so the registration function needs another name, such as register_file here.)

python executing existent (&big) c++ code

I have a program in C++ that uses the cryptopp library to decrypt/encrypt messages.
It offers two interface methods, encrypt and decrypt, which receive a string and operate on it through cryptopp methods.
Is there some way to use both methods from Python without manually wrapping all of the cryptopp & included files?
Example:
import cppEncryptDecrypt
string foo="testing"
result = encrypt(foo)
print "Encrypted string:",result
If you can make a DLL from that C++ code, exposing those two methods (ideally as extern "C", which makes all interfacing tasks so much simpler), ctypes can be the answer, not requiring any third-party tool or extension. Otherwise, it's your choice between Cython, good old SWIG, SIP, Boost.Python, ... many, many such third-party tools will let your Python code call those two C++ entry points without any need for wrapping anything else but them.
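For instance, a minimal hypothetical sketch of that ctypes route, assuming the C++ code is rebuilt as crypt.dll (or libcrypt.so) exporting extern "C" functions encrypt and decrypt that take and return C strings:
# Python 2 sketch; the library name and exported function names are assumptions.
from ctypes import cdll, c_char_p

lib = cdll.LoadLibrary("./crypt.dll")  # "./libcrypt.so" on Linux
lib.encrypt.argtypes = [c_char_p]
lib.encrypt.restype = c_char_p
lib.decrypt.argtypes = [c_char_p]
lib.decrypt.restype = c_char_p

print "Encrypted string:", lib.encrypt("testing")
Note the C side must keep the returned buffer alive (e.g. a static buffer), since ctypes only borrows the char pointer.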
As Alex suggested, you can make a DLL, export the functions you want to access from Python, and use the ctypes module (http://docs.python.org/library/ctypes.html) to access them, e.g.
>>> from ctypes import cdll
>>> libc = cdll.LoadLibrary("libc.so.6")
>>> printf = libc.printf
>>> printf("Hello, %s\n", "World!")
Hello, World!
14
Or there is an alternate, simpler approach which many people do not consider but which is equally useful in many cases: directly call the program from the command line. You said you already have a working program, so I assume it does both encrypt and decrypt from the command line. If yes, why don't you just call the program via os.system or the subprocess module, instead of delving into the code, changing it, and maintaining it?
I would say go the second way unless it can't fulfill your requirements.
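A minimal sketch of that second approach; the executable name ./crypttool and its "encrypt" argument are assumptions standing in for the real program's command-line interface:
# Python 2 sketch: drive the existing command-line program via subprocess
# instead of wrapping the C++ code. Names and flags are hypothetical.
import subprocess

def encrypt(text):
    proc = subprocess.Popen(["./crypttool", "encrypt"],
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    out, _ = proc.communicate(text)
    return out.strip()

print "Encrypted string:", encrypt("testing")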