Including a C++ executable when freezing a Python script with PyInstaller - c++

I've tried to read through the PyInstaller documentation and using spec files, as well as Google/SO, but haven't found any clear answers.
I have written a Python script using Biopython and made it into an executable with PyInstaller, and it works fine. However, the script uses a Biopython function (NcbiBlastnCommandline()) that is a wrapper for the NCBI Blast+ blastn program (written in C++), and at the moment the user still needs to have NCBI Blast+ installed locally.
Is it possible to package the C++ .exe along with the rest, so the end user only needs to download my executable and nothing else?

Read the part of the tutorial about adding binary files:
To add binary files, make a list of tuples that describe the files needed. Assign the list of tuples to the binaries= argument of Analysis.
a = Analysis(...
binaries=[ ( '/path/to/blastn.exe', 'blastn.exe' ) ],
...

Related

Building a C++ project that links to a Haskell library, using Shake and Stack

I'm trying to build a simple C++ project (an executable) that calls a Haskell function,
using Shake for the build script and calling Stack from within the script to build the Haskell library.
Let's say the Haskell library is called haskell-simple-lib.
The shake script calls stack install haskell-simple-lib which outputs an .so file: libHShaskell-simple-lib-*version*-*unique identifier*.so
My Shake rules depend on filenames, so I can't use the aforementioned name, as I don't know in advance what the unique identifier will be. So the Shake script runs a cp to copy the file to _build/libHShaskell-simple-lib.so
The link options for the C++ executable include -L_build and -lhaskell-simple-lib.
When I try to run the executable I get an error saying:
error while loading shared libraries: libHShaskell-simple-lib-0.1.0.0-8DkaSm3F3d44RUd03fOuDx-ghc7.10.2.so: cannot open shared object file: No such file or directory
But, if I rename the file I copied to _build, to the original name that stack install outputted (the one with the unique identifier), the executable runs correctly.
One would think that all I need to do is simply cp the file to _build without erasing the unique identifier from the name; however, I need to know the name of the .so file in advance for the Shake script.
I don't understand why, when the executable is run, the original .so filename is searched for. The link flag doesn't mention the full name of the .so that stack install outputted, only libHShaskell-simple-lib.
Could it be that the original name is embedded in the .so file? If so, how does one go about solving this issue?
EDIT:
I'm aware this could be solved using a dummy file, but I'd like to know if there's a better way to do this.
The original identifier is embedded in the .so: a shared library carries its own name (the SONAME), and that is what the dynamic linker records in the executable and looks up at runtime, regardless of the filename you pointed the linker at. I don't remember all the details, but I do know that I've solved such problems using rpath twiddling in the past.

How to add and use .zip (or .pak) files to c++ project?

I'm compiling CEF (Chromium Embedded Framework) for our local html5 presentation.
I should say I'm very new to all this (CEF and C++).
I've already optimized the cefclient project for the presentation, but I need to embed all HTML/JS/CSS/etc. files into the project (reading from local storage is not an option).
As I understand it, I should use .zip or .pak (renamed zip) files for embedding. But how can I use them inside the project?
Should I use some lib for unzipping (zlib?), or is there another popular way? And how can I be sure that the files will be compiled into the project?
Sorry for such basic questions, but there is very little information about this (or Google hates me today).
Thank you for any help!
UPD: found a great tool, WBEA (http://asterclick.drclue.net/WBEA.html); it looks like exactly what I want, but it works pretty slowly (with JS).
UPD 2: It turns out that there are many ways to make an HTML5 desktop application, for example Node-Webkit.
Here is an article that compares some of them: http://clintberry.com/2013/html5-apps-desktop-2013/
You need to:
1. Create a zip file containing your resources.
2. Embed it as a Win32 resource (after this step you will get a correct executable with the .zip file inside; see the sketch after this list).
3. Create a custom scheme handler to access this zip file.
The CefZipReader class will be handy for implementing the handler from step 3.
Look around; maybe something like what you want already exists somewhere.
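For step 2, a minimal sketch of reading the embedded archive back out at runtime with the Win32 resource API. The resource ID IDR_RESOURCE_PAK is hypothetical and must match whatever your .rc file declares; the raw bytes can then be handed to whichever unzip library you pick:

#include <windows.h>
#include <cstddef>

// Hypothetical resource ID; must match the entry in your .rc file, e.g.
//   IDR_RESOURCE_PAK  RCDATA  "resources.zip"
#define IDR_RESOURCE_PAK 101

// Returns a pointer to the embedded zip data and writes its size,
// or returns nullptr on failure.
const void* GetEmbeddedZip(std::size_t* size)
{
    HMODULE module = GetModuleHandle(nullptr);  // the current executable
    HRSRC info = FindResource(module, MAKEINTRESOURCE(IDR_RESOURCE_PAK), RT_RCDATA);
    if (!info)
        return nullptr;

    HGLOBAL handle = LoadResource(module, info);
    if (!handle)
        return nullptr;

    *size = SizeofResource(module, info);
    return LockResource(handle);  // valid for the lifetime of the module
}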
This sounds very similar to self-extracting installers.
There is no need to compile anything in; just concatenate the zip onto the end of the executable. All you need to do is find the offset of the zip from the start of the executable at runtime. This can be done easily by writing a large magic number and looking for it later (a runtime sketch in C++ follows the examples below).
Example Linux:
cat app magic_number data > new_app
Example Windows:
copy app.exe /B + magic.dat /B + data.dat /B new_app.exe
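A rough sketch of the runtime side, assuming an 8-byte magic marker was concatenated between the executable and the zip data (the marker bytes below are made up; in practice exePath would come from argv[0] or GetModuleFileName):

#include <cstddef>
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical 8-byte marker; must match whatever was concatenated after the executable.
static const char kMagic[8] = { 'M', 'Y', 'Z', 'I', 'P', 'M', 'R', 'K' };

// Returns the offset of the first byte after the marker (i.e. where the zip
// data starts), or -1 if the marker is not found.
long long FindZipOffset(const std::string& exePath)
{
    std::ifstream in(exePath, std::ios::binary);
    std::vector<char> data((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());

    for (std::size_t i = 0; i + sizeof(kMagic) <= data.size(); ++i) {
        if (std::memcmp(&data[i], kMagic, sizeof(kMagic)) == 0)
            return static_cast<long long>(i + sizeof(kMagic));
    }
    return -1;
}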

Moving from sourceCpp to a package w/Rcpp

I currently have a .cpp file that I can compile using sourceCpp(). As expected the corresponding R function is created and the code works as expected.
Here it is:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector exampleOne(NumericVector vectorOne, NumericVector vectorTwo) {
    NumericVector outputVector = vectorOne + vectorTwo;
    return outputVector;
}
I am now converting my project over to a package using Rcpp. So I created the skeleton with rStudio and started looking at how to convert things over.
In Hadley's excellent primer on Cpp, he says in section "Using Rcpp in a Package":
If your package uses the Rcpp::export attribute then one additional step in the package build process is required. The compileAttributes function scans the source files within a package for Rcpp::export attributes and generates the code required to export the functions to R.
You should re-run compileAttributes whenever functions are added, removed, or have their signatures changed. Note that if you build your package using RStudio or devtools then this step occurs automatically.
So it looks like the code that compiled with sourceCpp() should work pretty much as is in a package.
I created the corresponding R file.
exampleOne <- function(vectorOne, vectorTwo) {
    outToR <- .Call("exampleOne", vectorOne, vectorTwo, PACKAGE = "testPackage")
    outToR
}
Then I (re)built the package and I get this error:
Error in .Call("exampleOne", vectorOne, vectorTwo, PACKAGE = "voteR") :
C symbol name "exampleOne" not in DLL for package "testPackage"
Does anyone have an idea as to what else I need to do when taking code that compiles with sourceCpp() and then using it in a package?
I should note that I have read: "Writing a package that uses Rcpp" http://cran.rstudio.com/web/packages/Rcpp/vignettes/Rcpp-package.pdf and understand the basic structure presented there. However, after looking at the RcppExamples source code, it appears that the structure in the vignettes is not exactly the same as that used in the example package. For example there are no .h files used. Also neither the vignette nor the source code use the [[Rcpp::export]] attribute. This all makes it difficult to track down exactly where my error is.
Here is my "walk through" of how to go from using sourceCpp() to a package that uses Rcpp. If there is an error please feel free to edit this or let me know and I will edit it.
[NOTE: I HIGHLY recommend using RStudio for this process.]
So you have the sourceCpp() thing down pat and now you need to build a package. This is not hard, but it can be a bit tricky, because the information out there about building packages with Rcpp ranges from the exhaustive, thorough documentation you want with any R package (but that is over your head as a newbie) to newbie-friendly introductions (that may leave out a detail you happen to need).
Here I use oneCpp.cpp and twoCpp.cpp as the names of two .cpp files you will use in your package.
Here is what I suggest:
A. First I assume you have a version of theCppFile.cpp that compiles with sourceCpp() and works as you expect it to. This is not a must, but if you are new to Rcpp OR packages, it is nice to make sure your code works in this simple situation before you move to the more complicated case below.
B. Now build your package using Rcpp.package.skeleton() or use the Project>Create Project>Package w/Rcpp wizard in RStudio (HIGHLY recommended). You can find details about using Rcpp.package.skeleton() in hadley/devtools or Rcpp Attributes Vignette. The full documentation for writing packages with Rcpp is in Writing a package that uses Rcpp, however this one assumes you know your way around C++ fairly well, and does not use the new "Attributes" way of doing Rcpp. It will be invaluable though if you move toward making more complex packages.
You should now have a directory structure for your package that looks something like this:
yourPackageName
- DESCRIPTION
- NAMESPACE
- \R\
- RcppExports.R
- Read-and-delete-me
- \man\
- yourPackageName-package.Rd
- \src\
- Makevars
- Makevars.win
- oneCpp.cpp
- twoCpp.cpp
- RcppExports.cpp
Once everything is set up, do a "Build & Reload" if using RStudio, or compileAttributes() if you are not in RStudio.
C. You should now see in your \R directory a file called RcppExports.R. Open it and check it out. In RcppExports.R you should see the R wrapper functions for all the .cpp files you have in your \src directory. Pretty sweet, eh?
D. Try out the R function that corresponds to the function you wrote in theCppFile.cpp. Does it work? If so, move on.
E. You can now just add new .cpp files like otherCpp.cpp to the \src directory as you create them. Then you just have to rebuild the package, and the R wrappers will be generated and added to RcppExports.R for you. In RStudio this is just "Build & Reload" in the Build menu. If you are not using RStudio, you should run compileAttributes().
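For reference, a new file like otherCpp.cpp only needs to follow the same attribute pattern as exampleOne above (the function name and body below are made up for illustration):

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector exampleTwo(NumericVector vectorOne, NumericVector vectorTwo) {
    // Element-wise product, just to have something different from exampleOne.
    NumericVector outputVector = vectorOne * vectorTwo;
    return outputVector;
}

After the next "Build & Reload" (or compileAttributes()), a matching exampleTwo() wrapper appears in RcppExports.R.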
In short, the trick is to call compileAttributes() from within the root of the package. So for instance for package foo
$ cd /path/to/foo
$ ls
DESCRIPTION man NAMESPACE R src
$ R
R> compileAttributes()
This command will generate the RcppExports.cpp and RcppExports.R that were missing.
You are missing the forest for the trees.
sourceCpp() is a recent function; it is part of what we call "Rcpp attributes", which has its own vignette (with the same title; in the package, on my website, and on CRAN) which you may want to read. Among other things, it details how to turn something you compiled and ran using sourceCpp() into a package, which is what you want.
Randomly jumping between documentation won't help you, and in the end the genuine documentation by the package authors may be preferable. Or to put a different spin on it: you are using a new feature but old documentation that doesn't reflect it. Try to write a basic package with Rcpp, i.e. come at it from the other end as well.
Lastly, there is a mailing list...

Py_Initialize fails - unable to load the file system codec

I am attempting to put together a simple C++ test project that uses an embedded Python 3.2 interpreter. The project builds fine, but Py_Initialize raises a fatal error:
Fatal Python error: Py_Initialize: unable to load the file system codec
LookupError: no codec search functions registered: can't find encoding
Minimal code:
#include <Python.h>

int main(int, char**)
{
    Py_Initialize();
    Py_Finalize();
    return 0;
}
The OS is 32-bit Vista.
The Python version used is a Python 3.2 debug build, built from source using VC++ 10.
The python_d.exe file from the same build runs without any problems.
Could someone explain the problem and how to fix it? My own google-fu fails me.
EDIT 1
After going through the Python source code I've found that, as the error says, no codec search functions have been registered. Both codec_register and PyCodec_Register are as they should be. It's just that nowhere in the code are any of these functions called.
I don't really know what this means, as I still have no idea when and from where these functions should have been called. The code that raises the error is entirely missing from the source of my other Python build (3.1.3).
EDIT 2
Answered my own question below.
Check the PYTHONPATH and PYTHONHOME environment variables and make sure they don't point to Python 2.x.
http://bugs.python.org/issue11288
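If it helps, you can check what the embedding process itself actually sees (which is not always what your shell shows) with plain standard C++ before initializing:

#include <cstdlib>
#include <iostream>

int main()
{
    // Print the variables exactly as the embedded interpreter will see them.
    const char* home = std::getenv("PYTHONHOME");
    const char* path = std::getenv("PYTHONPATH");
    std::cout << "PYTHONHOME = " << (home ? home : "(not set)") << "\n";
    std::cout << "PYTHONPATH = " << (path ? path : "(not set)") << "\n";
    return 0;
}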
Parts of this have been mentioned before, but in a nutshell this is what worked in my environment, where I have multiple Python installs and my global OS environment set up to point to a different install than the one I was attempting to work with when I encountered the problem.
Make sure your (local or global) environment is fully set up to point to the install you aim to work with. Say you have two (or more) installs, e.g. a python27 and a python33 (sorry, these are Windows paths, but the following should be valid for equivalent UNIX-style paths just as well; please let me know about anything I'm missing here, though the DLLs path will probably differ):
C:\python27_x86
C:\python33_x64
Now, if you intend to work with your python33 install but your global environment is pointing to python27, make sure you update your environment accordingly (PATH and PYTHONHOME may be optional, e.g. if you only work temporarily in a local shell):
PATH="C:\python33_x64;%PATH%"
PYTHONPATH="C:\python33_x64\DLLs;C:\python33_x64\Lib;C:\python33_x64\Lib\site-packages"
PYTHONHOME=C:\python33_x64
Note that you might need/want to append any other library paths to your PYTHONPATH if required by your development environment, but having your DLLs, Lib and site-packages directories properly set up is of prime importance.
Hope this helps.
The core reason is quite simple: Python does not find its modules directory, so of course it cannot load encodings either.
The Python doc on embedding says "Py_Initialize() calculates the module search path based upon its best guess" ... "In particular, it looks for a directory named lib/pythonX.Y".
Yet if the modules are installed in (just) lib, relative to the Python binary, the above guess is wrong.
Although the docs say that PYTHONHOME and PYTHONPATH are taken into account, we observed that this was not the case; their actual presence or content was completely irrelevant.
The only thing that had an effect was a call to Py_SetPath() with e.g. [path-to]\lib as argument before Py_Initialize().
Of course, this is only an option in an embedding scenario where one has direct access to and control over the code; with a ready-made solution, special steps may be necessary to solve the issue.
Ran into the same thing trying to install brew's python3 under Mac OS! The issue here is that in Mac OS, homebrew puts the "real" python a whole layer deeper than you think. You would think from the homebrew output that
$ echo $PYTHONHOME
/usr/local/Cellar/python3/3.6.2/
$ echo $PYTHONPATH
/usr/local/Cellar/python3/3.6.2/bin
would be correct, but invoking $PYTHONPATH/python3 immediately crashes with abort 6, "can't find encodings." This is because although that $PYTHONHOME looks like a complete installation, having a bin, lib, etc., it is NOT the actual Python, which is in a Mac OS "Framework". Do this:
PYTHONHOME=/usr/local/Cellar/python3/3.x.y/Frameworks/Python.framework/Versions/3.x
PYTHONPATH=$PYTHONHOME/bin
(substituting version numbers as appropriate) and it will work fine.
From Python 3 on, startup needs the encodings module, which can be found in the PYTHONHOME\Lib directory.
In fact, the Py_Initialize() API does the initialization and imports the encodings module.
Make sure PYTHONHOME\Lib is in sys.path and check that the encodings module is there.
I had this issue with Python 3.5, Anaconda 3, Windows 7 32-bit. I solved it by moving my pythonX.lib and pythonX.dll files into my working directory and calling
Py_SetPythonHome(L"C:\\Path\\To\\My\\Python\\Installation");
before Py_Initialize so that it could find the files it needed; my path was to "...\Anaconda3\". The extra step of calling Py_SetPythonHome was required for me, or else I'd end up getting other strange errors when Python imported files.
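For context, a minimal sketch of how that call fits around the usual initialization (the path is the same placeholder as above; the buffer is kept static because Python keeps a reference to it rather than copying the string):

#include <Python.h>

int main(int, char**)
{
    // Must be set before Py_Initialize; Python keeps a pointer to this buffer.
    static wchar_t home[] = L"C:\\Path\\To\\My\\Python\\Installation";
    Py_SetPythonHome(home);

    Py_Initialize();
    // ... embedded work ...
    Py_Finalize();
    return 0;
}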
I just ran into the exact same problem (same Python version, OS, code, etc).
You just have to copy Python's Lib/ directory into your program's working directory (on VC++ it's the directory where the .vcproj is).
There appears to be something going wrong with the release build: either it fails to include the appropriate codecs, or it misidentifies the codec to use for the system APIs. Since the python_d executable is working, what does it return for sys.getfilesystemencoding()? (Use the C API to call that between your Initialize/Finalize calls.)
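For example, a minimal way to check that from the embedding side (run it against the debug configuration that does start up; PyRun_SimpleString is used here just to print the value):

#include <Python.h>

int main(int, char**)
{
    Py_Initialize();
    // Ask the embedded interpreter which filesystem encoding it picked.
    PyRun_SimpleString("import sys; print(sys.getfilesystemencoding())");
    Py_Finalize();
    return 0;
}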
I had the same issue and found this question. However, the answers here did not solve my problem. I started debugging the CPython code and thought that I might have discovered a bug, so I opened an issue on the Python issue tracker.
My mistake was that I did not understand that Py_SetPath clears all inferred paths.
So one needs to set all paths when calling this function.
Link to the issue conversation
For completeness, I have also copied the most important part of the conversation below.
My original issue text
I compiled the source of CPython 3.7.3 myself on Windows with Visual Studio 2017, together with some packages, e.g. numpy. When I start the Python interpreter I am able to import and use numpy. However, when I run the same script via the C API I get a ModuleNotFoundError.
So the first thing I did was to check whether numpy is in my site-packages directory, and indeed there is a folder named numpy-1.16.2-py3.7-win-amd64.egg. (Makes sense, because the Python interpreter can find numpy.)
The next thing I did was to get some information about the sys.path variable created when running the script via the C API.
#### sys.path content ####
C:\Work\build\product\python37.zip
C:\Work\build\product\DLLs
C:\Work\build\product\lib
C:\PROGRAM FILES (X86)\MICROSOFT VISUAL STUDIO\2017\PROFESSIONAL\COMMON7\IDE\EXTENSIONS\TESTPLATFORM
C:\Users\rvq\AppData\Roaming\Python\Python37\site-packages
Examining the content of sys.path I noticed two things.
C:\Work\build\product\python37.zip has the correct path 'C:\Work\build\product\'. There was just no zip file; all my files and directories were unpacked. So I zipped the files into an archive named python37.zip, and this resolved the import error.
C:\Users\rvq\AppData\Roaming\Python\Python37\site-packages is wrong; it should be C:\Work\build\product\Lib\site-packages, but I don't know how this wrong path is created.
The next thing I tried was to use Py_SetPath(L"C:/Work/build/product/Lib/site-packages") before calling Py_Initialize(). This led to
Fatal Python Error 'unable to load the file system encoding'
ModuleNotFoundError: No module named 'encodings'
I created a minimal C++ project with exactly these two calls and started to debug CPython.
int main()
{
    Py_SetPath(L"C:/Work/build/product/Lib/site-packages");
    Py_Initialize();
}
I tracked the call of Py_Initialize() down to the call of
static int
zipimport_zipimporter___init___impl(ZipImporter *self, PyObject *path)
inside of zipimport.c
The comment above this function states the following:
Create a new zipimporter instance. 'archivepath' must be a path-like
object to a zipfile, or to a specific path inside a zipfile. For
example, it can be '/tmp/myimport.zip', or
'/tmp/myimport.zip/mydirectory', if mydirectory is a valid directory
inside the archive. 'ZipImportError' is raised if 'archivepath'
doesn't point to a valid Zip archive. The 'archive' attribute of the
zipimporter object contains the name of the zipfile targeted.
So to me it seems that the C API expects the path set with Py_SetPath to be a path to a zip file. Is this expected behaviour, or is it a bug?
If it is not a bug, is there a way to change this so that it can also detect directories?
PS: The ModuleNotFoundError did not occur for me when using Python 3.5.2+, which was the version I used in my project before. I also checked whether I had set any PYTHONHOME or PYTHONPATH environment variables, but neither was set on my system.
Answer
This is probably a documentation failure more than anything else. We're in the middle of redesigning initialization though, so it's good timing to contribute this feedback.
The short answer is that you need to make sure Python can find the Lib/encodings directory, typically by putting the standard library in sys.path. Py_SetPath clears all inferred paths, so you need to specify all the places Python should look. (The rules for where Python looks automatically are complicated and vary by platform, which is something I'm keen to fix.)
Paths that don't exist are okay, and that's what the zip file entry is. You can choose to put the stdlib into a zip, and it will be found automatically if you give it the default name, but you can also leave it unzipped and reference the directory.
A full walk-through on embedding is more than I'm prepared to type on my phone. Hopefully that's enough to get you going for now.
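To make that concrete, here is a minimal sketch using the directories that appear in the question above. The exact set of paths is an assumption for illustration; the point is that the standard library (which contains Lib\encodings), the DLLs directory and site-packages all have to be listed, separated by ';' on Windows (':' elsewhere):

#include <Python.h>

int main(int, char**)
{
    // Py_SetPath replaces the entire inferred search path, so every location
    // Python needs must be listed explicitly.
    Py_SetPath(L"C:\\Work\\build\\product\\Lib;"
               L"C:\\Work\\build\\product\\DLLs;"
               L"C:\\Work\\build\\product\\Lib\\site-packages");

    Py_Initialize();
    // ... run the script that imports numpy, etc. ...
    Py_Finalize();
    return 0;
}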
I had the problem and was tinkering with the different solutions mentioned here. Since I was running my project from Visual Studio, apparently I needed to set the environment path inside Visual Studio, not the system path.
Adding a simple PYTHONHOME=PATH\TO\PYTHON\DIR in the project solution\properties\environment solved the problem.
For me this happened when I updated 64-bit Python from 3.6.4 to 3.6.5. It threw some error like "unable to extract python.dll. Do you have permissions."
PyCharm also failed to load the interpreter, even though I reloaded it in settings. Running the python command gave the same error, with and without administrator mode.
Reason
There was an error in the installation of Python: the include folder in the Python installation directory C:\Users\USERNAME\AppData\Local\Programs\Python\Python36 was missing.
Reinstalling Python (not removing and installing it again) also didn't solve the issue.
Solution
Uninstall Python and install it again, because running the installer over the existing installation was just extracting the same files, excluding the include folder.
In my case, on Windows, if you have multiple Python versions installed and PYTHONPATH points to one version, the other ones don't work. I found that if you just remove PYTHONPATH, they all work fine.
For those working in Visual Studio, simply add the include, Lib and libs directories to the Include Directories and Library Directories under
Project Properties -> Configuration Properties -> VC++ Directories:
For example, I have Anaconda3 on my system and work with Visual Studio 2015; this is how the settings look (note the Include and Library directories).
Edit:
As also pointed out by bossi, setting PYTHONPATH in your user Environment Variables section seems necessary.
A sample input can look like this (in my case):
C:\Users\Master\Anaconda3\Lib;C:\Users\Master\Anaconda3\libs;C:\Users\Master\Anaconda3\Lib\site-packages;C:\Users\Master\Anaconda3\DLLs
It seems this is necessary.
Also, you need to restart Visual Studio after you set up the PYTHONPATH in your user Environment Variables for the changes to take effect.
Also note that:
Make sure the PYTHONHOME environment variable is set to the Python interpreter you want to use. The C++ projects in Visual Studio rely on this variable to locate files such as python.h, which are used when creating a Python extension.
So, for some reason the python dll fails to locate the encodings module. The python.exe executable apparently finds it because it has the expected relative path. Modifying the search path works.
The reason for all of this? I don't know, but at least it works. I highly suspect a typo on my part somewhere; that's usually the reason for odd bugs, it seems.

Using code generated by Py++ as a Python extension

I have a need to wrap an existing C++ library for use in Python. After reading through this answer on choosing an appropriate method to wrap C++ for use in Python, I decided to go with Py++.
I walked through the tutorial for Py++, using the tutorial files, and I got the expected output in generated.cpp, but I haven't figured out what to do in order to actually use the generated code as an extension I can import in Python. I'm sure I have to compile the code now, but with what? Am I supposed to use bjam?
Py++ generates the code that you use along with boost::python to create Python entry points in your app.
Assuming everything went well with Py++, you need to download the Boost framework, add the Boost include directory and the boost::python lib to your project, and then compile with the Py++-generated cpp.
You can use whatever build system you want for your project, but Boost is built with bjam. You need to choose whether you want a static or a dynamic boost::python lib, then follow the instructions for building Boost here.
On Windows, you need to change the extension of your built library from .dll to .pyd. And yes, it needs to be a library project; this does not work with executables.
Then place the .pyd where the Python on your machine can find it, go into Python, and execute import [Your-library-name], and hopefully everything will work.
One final note: the name given in generated.cpp in this macro:
BOOST_PYTHON_MODULE( -name- )
needs to be the exact name of your project, otherwise Python will complain.
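For illustration, a minimal hand-written module of that shape (this is not the actual Py++ output; the module name hw and the greet function are made up here, hw simply matching the library name used further down this page):

#include <boost/python.hpp>
#include <string>

// A trivial function to expose to Python.
std::string greet() { return "hello from C++"; }

// The name inside BOOST_PYTHON_MODULE must match the built .pyd/.so name.
BOOST_PYTHON_MODULE(hw)
{
    boost::python::def("greet", &greet);
}

If the built library is named hw.pyd (or hw.so on Linux), then import hw will pick it up.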
I just went through all this less than a month ago so I know about the confusion.
One thing I did to make my python extension very easy to use while building the library and testing, was to build boost::python and python myself in my build environment. That way the pyd ends up exactly where I want it and users do not need to install python in order to run with my extension. That may be overkill for what you are doing though.
Edit:
If you want your extension to be easily installed and compiled on a machine, check out Python's setuptools. With just a few simple lines you can have Python compile and install your package for you. One downside, though, is that it's not IDE-compatible for those of us who like developing in Visual Studio.
The following answer was provided to me by Roman Yakovenko on the Python C++-sig mailing list; I'm posting it here, with minor edits, for the benefit of the Stack Overflow community.
I don't fully comprehend the answer yet, but I felt it points me in the right direction.
After you have generated the code, you have to compile it. For this purpose, you can use your favorite build system. I use bjam only to compile boost. After this, I prefer to use scons (on Windows and on Linux).
The following is an example of an SConstruct file, which is used to compile one of the Py++ unit tests (this is generated code too :-) ):
import sys
env = Environment()
if 'linux' not in sys.platform:
    env['MSVS'] = {'VERSION': ''}
    env['MSVS_VERSION'] = ''
    Tool('msvc')(env)
t = env.SharedLibrary(
    target=r'abstract_classes',
    source=[r'/home/roman/language-binding/sources/pyplusplus_dev/unittests/temp/abstract_classes.cpp'],
    LIBS=[r"boost_python"],
    LIBPATH=[r"", r"/home/roman/include/libs"],
    CPPPATH=[
        r"/home/roman/boost_svn",
        r"/usr/include/python2.6",
        r"/home/roman/language-binding/sources/pyplusplus_dev/unittests/temp",
        r"/home/roman/language-binding/sources/pyplusplus_dev/unittests/data",
        r"/home/roman/boost_svn"
    ],
    CCFLAGS=[],
    SHLIBPREFIX='',
    SHLIBSUFFIX='.so'
)
Since your code generator is written in Python, you can continue where Py++ stops and generate your favorite "make" file. You can go even farther: the Py++ tests generate the code, compile it, load the new module and test the functionality. All this is done in a single, stand-alone process.
I wrote a small makefile with the following:
GNUmakefile:
PYTHON_INC=$(shell python-config --includes)
PYTHON_LIBS=$(shell python-config --libs)
BOOST_LIBS=-lboost_python
all:
	g++ -W -Wall $(PYTHON_INC) $(PYTHON_LIBS) $(BOOST_LIBS) -fPIC -shared generated.cpp -o hw.so
and then loaded the created .so into ipython to play around with it:
In [1]: import hw
In [2]: a = hw.animal('zebra')
In [3]: a.name()
Out[3]: 'zebra'